SPACE–TIME BLOCK CODES AND THE COMPLEXITY OF
SPHERE DECODING
Miia Ma¨ki
Master’s Thesis
July 2008
DEPARTMENT OF MATHEMATICS
UNIVERSITY OF TURKU
FIN-20014 TURKU
FINLAND
Contents
1 Introduction 1
1.1 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Lattices 3
2.1 Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 Lattices under Linear Transformation . . . . . . . . . . . . . . . 3
2.3 QR-decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.4 Closest Vector Problem . . . . . . . . . . . . . . . . . . . . . . . 5
3 Space–Time Block Codes 6
3.1 Lattice Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
3.2 MIMO Channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.3 Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.4 Criteria for Code Construction . . . . . . . . . . . . . . . . . . . 10
4 Sphere Decoder 13
4.1 Introduction to Sphere Decoding . . . . . . . . . . . . . . . . . . 13
4.2 The Pohst Enumeration . . . . . . . . . . . . . . . . . . . . . . . 15
4.3 The Schnorr–Euchner Enumeration and the Babai Point . . . . . 16
4.4 The Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
5 Complexity of Sphere Decoding 23
5.1 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
5.2 Choosing the Radius . . . . . . . . . . . . . . . . . . . . . . . . . 24
5.3 SNR and Complexity . . . . . . . . . . . . . . . . . . . . . . . . . 26
5.4 Preprocessing and Ordering . . . . . . . . . . . . . . . . . . . . . 27
5.5 Error Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
6 Collapsing Lattices 30
6.1 The Defect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
6.2 Collapsing Lattices and the Decoding Complexity . . . . . . . . . 34
7 Improving the Performance of Sphere Decoding 39
7.1 Adding Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . 39
7.2 Distributed Search . . . . . . . . . . . . . . . . . . . . . . . . . . 45
References 50
1 Introduction
In wireless communications the transmitted signals may be affected by noise.
The receiver must decode the received message, which can be mathematically
modelled as a search for the closest lattice point to a given vector. This problem
is known to be NP-hard in general, but for communications applications there
exist algorithms that, for a certain range of system parameters, offer polynomial
expected complexity.
The purpose of this thesis is to study the sphere decoding algorithm intro-
duced in [6] and especially its computational complexity when used in space–
time coding. Computer simulations are used to study how different system
parameters affect the computational complexity of the algorithm. The aim is to
find ways to improve the algorithm from the complexity point of view. The lim-
ited battery life of mobile devices provides a motivation for finding algorithms
with lower complexity and thus lower power consumption.
We start by reviewing the theory of lattices in Chapter 2 for some necessary
background knowledge. In Chapter 3 the principles of space–time coding are
explained and criteria for designing space–time block codes are introduced. In
Chapter 4 the sphere decoding algorithm is discussed in detail.
Chapter 5 contains an analysis of the complexity of sphere decoding with
computer simulation results. In Chapter 6 the concept of collapsing lattices is
introduced and its effect on the complexity of sphere decoding is explained.
The main contribution of this thesis is in Chapter 7, where we search for ways
to improve the sphere decoding algorithm and develop two new modifications
to the algorithm, that are shown to perform faster than the original within a
range of system parameters.
1
1.1 Notations
The following notation is used throughout this thesis.
R Field of real numbers
C Field of complex numbers
Z Ring of rational integers
i
√−1
Z[i] Ring of Gaussian integers
ω 12 (−1 + i
√
3)
Z[ω] Ring of Eisenstein integers
ℜ(x) Real part of x
ℑ(x) Imaginary part of x
x∗ Complex conjugate
x Vector
‖ · ‖ Euclidean norm
〈·, ·〉 Inner product on Rn
X Matrix
XT Transpose
X∗ Hermitian conjugate
detX Determinant of X
In n× n identity matrix
E Expectation
⌊·⌉ Rounding to the closest integer
2
2 Lattices
In this chapter the basic concepts of lattices are reviewed for later use in this
thesis. All of the results are given without proof. For more details see e.g. [5].
2.1 Bases
Letm and n be two positive integers such that n ≤ m. A subset Λ of Rm is called
a lattice of dimension n if there exist n linearly independent m-dimensional
vectors b1, . . . ,bn ∈ Rm such that
Λ =
n∑
i=1
Zbi = {r1b1 + . . .+ rnbn|ri ∈ Z, 1 ≤ i ≤ n}. (1)
The set of vectors b1, . . . ,bn is called the basis of the lattice Λ, and n is called
the rank of Λ. The vectors of a lattice Λ form a group under addition: if a ∈ Λ
then −a ∈ Λ; and if a, b ∈ Λ then a± b ∈ Λ.
The lattice can also be expressed in matrix form Λ = {x|x = Br} where B
is an m × n matrix B = [b1, . . . ,bn] and r is an integer column vector. B is
called the basis matrix of Λ.
The basis is not uniquely determined by the lattice, but the same lattice can
be generated by several different bases. Let b′i be the points
b′i =
∑
j
vijbj , (1 ≤ i, j ≤ n),
where vij are any integers such that det(vij) = ±1. Now
bi =
∑
j
wijb
′
j
with integral wij . It then follows that (1) and the lattice
Λ′ = r′1b
′
1 + . . .+ r
′
nb
′
n,
where r′1, . . . , r
′
n run through all integers, are precisely the same set of points,
Λ = Λ′. That is, b1, . . . ,bn and b′1, . . . ,b
′
n are bases of the same lattice.
Furthermore, every basis b′i of a lattice Λ may be obtained from a given basis
bi in this way [5].
2.2 Lattices under Linear Transformation
Let us consider briefly the effect of a non-singular affine transformation x →
y = αx of n-dimensional space into itself. Let the transformation y = αx be
given by
yi =
∑
1≤j≤n
αijxj (1 ≤ i ≤ n),
3
where y = (y1, . . . , yn) and x = (x1, . . . , xn) are corresponding points in the
transformation and αij are real numbers such that
det(α) = det(αij) 6= 0.
Let Λ be a lattice and denote by αΛ the set of points αx, x ∈ Λ. If
b1, . . . ,bn is a basis for Λ, then the general lattice point x = r1b1 + . . .+ rnbn
(with r1, . . . , rn integers) of Λ maps to
αx = α(r1b1 + . . .+ rnbn) = r1αb1 + . . .+ rnαbn.
Hence αΛ is a lattice with basis αb1, . . . ,αbn.
2.3 QR-decomposition
The basis vectors of a lattice are not necessarily orthogonal. One method for
orthogonalizing the basis is the following Gram–Schmidt procedure. However,
the resulting basis does not necessarily span the same lattice, since the coeffi-
cients are real numbers and not necessarily integers, but the basis vectors span
the same vector space as the original basis.
Gram–Schmidt orthogonalization: Let bi, . . . ,bn ∈ Rn be a set of lin-
early independent vectors. The vectors ui (1 ≤ i ≤ n) are inductively defined
by
u1 = b1
ui = bi −
i−1∑
j=1
φi,juj ,
where the Gram–Schmidt coefficients φi,j ∈ R are defined by
φi,j =
〈bi,uj〉
〈uj ,uj〉 =
〈bi,uj〉
‖uj‖2 .
In particular, φi,i = 1 and φi,j = 0 for j > i. Notice that ui is
the projection of bi on the orthogonal complement of
∑i−1
j=1 Rbj , and that∑i−1
j=1 Rbj =
∑i−1
j=1 Ruj , for 1 ≤ i ≤ n. It follows that u1, . . . ,un is an or-
thogonal basis of Rn.
With the help of the Gram–Schmidt procedure one can obtain the QR-
decomposition of the matrix B = [b1, . . . ,bn], where the vectors bi are the
columns of the matrix. Let us denote ek =
uk
‖uk‖ for k = 1, 2, ..., n. Since
φi,juj =
〈bi,uj〉
‖uj‖2 uj =
〈bi, ej‖uj‖〉
‖uj‖2 ej‖uj‖ = 〈bi, ej〉ej ,
4
by rearranging the equations above we have:
b1 = e1 ‖u1‖ ,
b2 = 〈b2, e1〉e1 + e2 ‖u2‖ ,
...
bn =
n−1∑
j=1
〈bn, ej〉ej + en ‖un‖ .
By writing the equations in matrix form we get the QR-decomposition of
the matrix B:
B = QR
=
[
e1 e2 . . . en
]


‖u1‖ 〈b2, e1〉 〈b3, e1〉 . . . 〈bn, e1〉
0 ‖u2‖ 〈b3, e2〉 . . . 〈bn, e2〉
0 0 ‖u3‖
...
...
...
. . . 〈bn, en−1〉
0 0 . . . 0 ‖un‖


.
Any m × n matrix A with linearly independent columns can be factorized
into a product A = QR, where the columns of Q are orthonormal and R is
upper triangular and invertible [8].
2.4 Closest Vector Problem
The closest vector problem (CVP) is the problem of finding, for a given lattice
Λ and a given input point y ∈ Rm, the lattice point that is closest to y. More
precisely, to find a vector xˆ ∈ Λ such that
‖y − xˆ‖ ≤ ‖y − x‖, for all x ∈ Λ.
For orthogonal lattices, such as the n-dimensional integer lattice Zn, the
solution is simple. However, in the general case the lattice is not necessarily
orthogonal, but skewed. It has been shown [1] that the general closest vec-
tor problem as a function of the dimension m is NP-hard. Thus, all known
algorithms for solving the problem optimally have exponential complexity.
For example, the maximum-likelihood decoding of space–time codes can be
reduced to a closest vector search. In the following chapters this will be ex-
plained in more detail.
5
3 Space–Time Block Codes
Space-–time coding is a method to improve the reliability of data transmission
by using multiple antennas. In this chapter the MIMO (multiple-input multiple-
output) channel model and two space–time lattice codes are introduced. In later
chapters the properties of these two lattice codes are compared with each other.
3.1 Lattice Codes
A lattice codeword is a matrix from a constellation
L =
{
k∑
i=1
aiXi|Xi ∈Mm×l(C), ai ∈ S ⊂ Z
}
,
where Xi, (i = 1, . . . , k), are constant basis matrices and ai, (i = 1, . . . , k), are
integer variables from some finite signal set S of size q. The information to
be transmitted is coded in the coefficients (a1, . . . , ak) ∈ Sk. Throughout the
thesis, the terms “lattice code” and “block code” are used interchangeably to
refer to a code used in space–time coding.
In this thesis we use the pulse amplitude modulation (PAM) signal set of
size q, i.e.,
S = {a = 2u− q + 1|u ∈ Zq}
with Zq = {0, 1, . . . , q − 1}.
The size of the signal set is normally some power of 2, i.e. q = 2b and b
information bits are mapped to each symbol. A commonly used mapping is
the Gray encoding [12], where the adjacent symbols differ from each other by
only one binary digit (as illustrated in Figure 1). This mapping is preferred,
because the most likely errors caused by noise involve the erroneous selection
of an adjacent symbol to the one that was transmitted. In this case a symbol
error results in only a single bit error in the b-bit sequence.
The following two examples of lattice codes were presented in the article
[10]. In later chapters these codes will be used in the simulations and their
performance with regard to decoding complexity will be analyzed.
The block code LNF :
MNF (c1, c2, c3, c4) =


c1 ic4 ic3 ic2
c2 c1 ic4 ic3
c3 c2 c1 ic4
c4 c3 c2 c1


6
(a)q=2
(b) q=4
(c) q=8
0 1
01 11 1000
000 001 011 010 110 111 101 100
Figure 1: Gray encoding for PAM signals
and the block code LQNF :
MQNF (c1, c2, c3, c4) =


c1 ic2 −c∗3 −c∗4
c2 c1 ic
∗
4 −c∗3
c3 ic4 c
∗
1 c
∗
2
c4 c3 −ic∗2 c∗1

 ,
where ci (1 ≤ i ≤ 4) are Gaussian integers; ci ∈ Z[i] = {a+ bi|a, b ∈ Z}.
Both of the above block codes have 8 basis matrices Xi, which are
X1 = M(1, 0, 0, 0), X2 = M(i, 0, 0, 0), X3 = M(0, 1, 0, 0), X4 = M(0, i, 0, 0),
X5 =M(0, 0, 1, 0),X6 =M(0, 0, i, 0),X7 =M(0, 0, 0, 1) andX8 =M(0, 0, 0, i).
For example, the codeword M(c1, c2, c3, c4) is generated by
M(c1, c2, c3, c4) =
4∑
i=1
(ℜ(ci)X2i−1 + ℑ(ci)X2i)
3.2 MIMO Channel
Consider a multiple antenna system with m transmit and n receive antennas.
The codewords X are m × l complex matrices from some constellation L ⊂
Mm×l(C), where l is the length of the code, l ≥ m,
X =


x1
x2
...
xm

 ∈Mm×l(C).
The matrix rows xi correspond to the different transmit antennas and the
columns to the different time slots. At each time slot t, signals xi,t (i = 1, . . . ,m)
are transmitted simultaneously from the m transmit antennas, as depicted in
Figure 2.
7
Figure 2: Space–time coding in a quasi-static Rayleigh fading channel
In a general fading channel the receiver obtains the signal
Y = HX+N
=


h1,1 . . . h1,m
...
. . .
...
hn,1 . . . hn,m




x1,1 . . . x1,l
...
. . .
...
xm,1 . . . xm,l

+


η1,1 . . . η1,l
...
. . .
...
ηn,1 . . . ηn,l


where hi,j denotes the path gain from transmit antenna j to receive antenna i
and N = (ηi,j) represents additive white Gaussian noise. It is assumed that the
antennas are sufficiently spaced such that the fadings are uncorrelated.
Here we use the Rayleigh fading channel model, where the path gains hi,j are
modelled as samples of independent and identically distributed (i.i.d.) complex
Gaussian random variables with zero mean and variance τ2 per dimension, as
it is assumed that signals received at different antennas experience independent
fading. The fading channel is assumed to be quasi-static, i.e., the path gains are
assumed to remain constant for the interval T of transmitting a single codeword
after which they change and again remain constant for the next codeword.
The noise components ηi,t at receive antenna i at time t are modelled as sam-
ples of i.i.d. complex Gaussian random variables with zero mean and variance
σ2 per dimension.
The component yi,j of the received matrix corresponds to the j-th signal
received by antenna i, which is a superposition of the faded versions of all of
the m transmitted signals, with additive noise. For 1 ≤ i ≤ n, 1 ≤ j ≤ m
yi,j = hi,1x1,j + hi,2x2,j + . . .+ hi,mxm,j + ηi,j .
8
The signal-to-noise ratio (SNR) is the ratio of the expected energy of the
transmitted signal to that of the additive noise per receive antenna.
SNR =
E[P (HX)]
E[P (ηr)]
,
where
E[P (ηr)] = E
[
l∑
i=1
ηr,iη
∗
r,i
]
= 2lσ2
is the expected energy of the noise affecting a single receive antenna r during
the transmission of one codeword, and
E[P (HX)] = 2τ2PL
is the expected energy of the corresponding transmitted signal, where PL
denotes the average energy of a codeword X from the constellation L =
{∑ki=1 aiXi|ai ∈ S}
PL =
1
|L|
∑
a1∈S
· · ·
∑
ak∈S
P
(
k∑
i=1
aiXi
)
=
1
|L|
∑
X∈L

 m∑
i=1
l∑
j=1
|xi,j |2

 .
The size of the constellation is |L| = |S|k = qk. Now
SNR =
2τ2PL
2lσ2
=
PL
l
τ2
σ2
.
3.3 Decoding
The receiver can be assumed to have perfect channel state information, i.e., the
channel matrix H is known at the receiver. It is usually measured with so-called
pilot signals. Then the set of possible messages is
HL =
{
k∑
i=1
aiHXi
}
⊆Mn×l(C).
The problem of decoding is, given the received matrix Y and the matrices
HX1, . . . ,HXk, to find the coefficients a1, . . . , ak for Xˆ = HX =
∑k
i=1 aiHXi
such that the metric
d(Y − Xˆ)2 =
n∑
i=1
l∑
j=1
|yi,j − xˆi,j |2 (2)
is minimized. This problem can be transformed into the problem of finding the
closest lattice point. In the next chapter an algorithm for finding the closest
lattice point is presented.
9
The decision metric (2) is based on the maximum likelihood (ML) principle,
in which the conditional probability p(Y|H,X) is maximized. In the follow-
ing it is assumed that all the codewords from the constellation have an equal
probability of being transmitted.
p(Y|H,X) =
n∏
i=1
l∏
j=1
p(yi,j |(HX)i,j)
=
n∏
i=1
l∏
j=1
(
1√
2piσ2
exp
−(yi,j − (HX)i,j)2
2σ2
)
=
(
1√
2piσ2
)n×l
exp
−∑ni=1∑lj=1(yi,j − (HX)i,j)2
2σ2
=
(
1√
2piσ2
)n×l
exp
−d(Y −HX)2
2σ2
The probability p(Y|H,X) is the probability of receiving Y after the code-
wordX has been sent through the channelH. This probability (over all possible
codewords) is maximum for the codeword X′ for which the metric d(Y−HX′)2
of the matrices Y and HX′ is minimized.
3.4 Criteria for Code Construction
The main priority for code design in space–time coding is the so-called rank
criterion [14]. The rank criterion states that the difference of any two distinct
codewords should have full rank. If the rank criterion is satisfied, the code is
said to have full diversity. In that case HX1 6= HX2 for all channel matrices
H 6= 0 and all codewords X1, X2 where X1 6= X2.
Another criterion for code construction is the determinant criterion [14].
Let us denote the difference of two codewords by ∆12 = X1 − X2 and let
E12 = ∆12∆
∗
12. The determinant criterion sets as a design target that the
minimum of the determinant of E12 taken over all pairs of distinct codewords
X1 and X2 should be maximized. In the article [14] it was shown that for
high SNR the probability of making an error between X1 and X2 is inversely
proportional to a power of detE12/σ. Furthermore, the rank criterion maximizes
that exponent.
The code rate is the number of symbols transmitted in one code block divided
by the number of time slots needed to send one block. In this context a symbol
refers to an element of a 2-dimensional alphabet S ⊂ C. Usually S = Z[i] or
Z[ω]. The code rate is limited by the number of antennas, so that the code rate
≤ min{# of Tx antennas, # of Rx antennas}.
10
A block code is said to have full rate if its rate equals the number of transmit
antennas. In that case there must be at least as many receive antennas in the
setting, too.
Block codes constructed from orthogonal designs have been shown to possess
many advantages [13]. They provide full diversity with maximal rate and have
simple maximum-likelihood decoding algorithms with linear processing at the
receiver.
Definition 3.1. A complex orthogonal design of size n is a n × n matrix G
whose entries are the indetermined variables ±x1,±x2, ...,±xn, their conjugates
±x∗1,±x∗2, ...,±x∗n and multiples of these by i =
√−1 such that
G∗G =
(|x1|2 + |x2|2 + · · ·+ |xn|2) In.
Example 3.1. The so-called Alamouti scheme is an example of a complex
orthogonal design. It uses 2 transmit antennas and has rate 1. The transmitted
2× 2 codeword is
A(c1, c2) =
(
c1 c2
−c∗2 c∗1
)
c1, c2 ∈ C
and the receiver obtains the signal y = [y1, y2] = hA+ n.
A∗A =
(
|c1|2 + |c2|2 0
0 |c1|2 + |c2|2
)
Let
A1 =
(
1 0
0 1
)
,A2 =
(
i 0
0 −i
)
,A3 =
(
0 1
−1 0
)
,A4 =
(
0 i
i 0
)
denote the basis matrices. Then for every non-zero h ∈ C2\{0} the vectors hAi
and hAj , i 6= j, are mutually orthogonal, since
〈hAi ,hAj〉R =

0 if i 6= j,|h1|2 + |h2|2 if i = j.
Due to this fact the Alamouti scheme has a very simple ML-decoding algorithm
[2].
However, it has been shown that such full rate complex orthogonal designs
do not exist for more than two transmit antennas [13]. Block codes derived from
so-called generalized complex orthogonal designs still provide orthogonality and
thus have simple ML-decoding, but they have smaller rate.
11
Definition 3.2. A generalized complex orthogonal design of size n is a p × n
matrix G whose entries are 0,±x1,±x∗1±x2,±x∗2, ...,±xk,±x∗k or their product
with i such that
G∗G = κ
(|x1|2 + |x2|2 + · · ·+ |xk|2) In,
where κ is a constant. G has rate R = k/p.
Example 3.2. For four antennas the next example of a generalized complex
orthogonal design provides rate 3/4, i.e., only three complex symbols are trans-
mitted in four time slots.
MOD(c1, c2, c3) =


c1 c2 c3 0
−c∗2 c∗1 0 c3
c∗3 0 −c∗1 c2
0 c∗3 −c∗2 −c1

 (c1, c2, c3 ∈ C).
It spans a 6-dimensional vector space over R with the basis
{MOD(1, 0, 0),MOD(i, 0, 0), . . . ,MOD(0, 0, i)}
and it has the property
M∗ODMOD = (|c1|2 + |c2|2 + |c3|2)I4.
The two block codes LNF and LQNF that were introduced in Section 3.1
do not have the structure of a general orthogonal design. This means that the
decoding at the receiver will be more complex using these codes. They do,
however, have higher rate as both of them have rate 1. In the next chapters
we will show that there exist algorithms that in practical situations provide a
reasonable average decoding complexity for these codes.
12
4 Sphere Decoder
In general the problem of finding the closest lattice point is prohibitively com-
plex. Indeed, certain cryptosystems are based on the difficulty of solving this
and the related problem of finding the shortest vector in a lattice of a rank rang-
ing in the hundreds. However, in communications applications the rank of the
lattice remains moderate and the search is usually constrained among the finite
set of points with coordinates within the prescribed signal set. Nevertheless,
the number of valid combinations of coordinates is still too large for the simple
approach of an exhaustive search. For example, with a lattice code of rank 8
and a signal set of size 8 there would be 88 ≈ 16.8 million different lattice points
through which to search.
In communications applications the given vector is not arbitrary but rather
an unknown lattice point that has been corrupted by an additive noise vector
with known statistical properties. Therefore it is meaningful to consider the
expected (average) complexity instead of the worst-case complexity. It has been
shown that the sphere decoder algorithm, which is described in this chapter,
has a polynomial time average complexity in many relevant cases, for a range of
system parameters [6]. This finding is also supported by the simulation results
in Chapter 5.
This chapter is based on the articles [6] and [9].
4.1 Introduction to Sphere Decoding
The basic idea of sphere decoding is to search for the closest lattice point only
among points within a certain hypersphere of radius r around the given vector y,
instead of an exhaustive search over all lattice points. Clearly the closest lattice
point inside the hypersphere is also the closest point in the whole lattice. Figure
3 illustrates this idea behind the sphere decoder in a 2-dimensional lattice.
Figure 3: Closest lattice point in a 2-dimensional lattice
First of all, it is necessary to know which lattice points lie inside the hyper-
13
sphere. Testing the distances between y and every other lattice point would
still result in an exhaustive search and would therefore offer no reduction to
the overall complexity of decoding. An efficient way of finding the points is
explained in the following. Instead of considering a general m-dimensional hy-
persphere, we can start by considering the one-dimensional case where m = 1.
A one-dimensional hypersphere is simply an interval and therefore the lattice
points inside this hypersphere are the integer values lying in this interval. We
can now use this fact to move from dimension k to k+1. Suppose that we have
determined all of the lattice points in a k-dimensional hypersphere of radius
r. Then for any such k-dimensional point, the set of admissible values for the
k+1-th dimensional coordinate that lies in the higher-dimensional hypersphere
of the same radius r forms an interval. In other words, we can determine all
lattice points in an m-dimensional hypersphere of radius r by successively de-
termining all lattice points in hyperspheres of lower dimensions 1, 2, . . . ,m and
the same radius r.
Such an algorithm for determining the lattice points inside anm-dimensional
hypersphere can be represented by a tree where the branches in the k-th level of
the tree correspond to the lattice points inside the k-dimensional hypersphere
of the same radius. Figure 4 below shows an example of such a tree generated
by finding the lattice points inside a 4-dimensional hypersphere. Furthermore,
the complexity of such an algorithm will depend on the size of the tree, i.e., on
the number of lattice points visited by the algorithm in different dimensions.
k=1
k=2
k=3
k=4
Figure 4: The search tree
14
4.2 The Pohst Enumeration
This strategy for enumerating all of the lattice points inside a sphere with a cer-
tain radius was first proposed by Pohst [11]. This so-called Pohst enumeration
can be more specifically outlined as follows [6].
Assume that B is an m× n real matrix with m ≥ n and rank(B) = n. For
B ∈ Rm×n and y ∈ Rm, consider the minimization
xˆ = arg min
x∈Zn
|y −Bx|2.
The set Λ = {Bx|x ∈ Zn} is an n-dimensional (infinite) lattice in Rm. Let C0
be the squared radius of an m-dimensional sphere S(y,
√
C0) centered at y. A
lattice point Bx lies inside a hypersphere of radius
√
C0 if and only if
C0 ≥ |y −Bx|2.
Now we wish to produce a list of all of these points of Λ ∩ S(y,√C0). In order
to break the problem into the subproblems of finding the points successively in
each dimension, it is useful to consider the QR factorization of the matrix B.
By performing the Gram–Schmidt orthonormalization of the columns of B, we
get
B = [Q,Q′]
[
R
0
]
where R is an n × n upper triangular matrix with positive diagonal elements,
0 is an (m− n)× n zero matrix, Q (resp. Q′) is an m× n (resp. m× (m− n))
matrix and [Q,Q′] is orthogonal. The condition Bx ∈ S(y,√C0) can be written
as
|y −Bx|2 ≤ C0∣∣∣∣[Q,Q′]Ty −
[
R
0
]
x
∣∣∣∣2 ≤ C0∣∣QTy −Rx∣∣2 ≤ C0 − ∣∣(Q′)Ty)∣∣2
|y′ −Rx|2 ≤ C ′0
where y′ , QTy and C ′0 , C0−|(Q′)Ty)|2. Because R has an upper triangular
form, the last inequality implies the set of conditions
n∑
j=i
∣∣∣∣y′j − n∑
l=j
rj,lxl
∣∣∣∣2 ≤ C ′0, i = 1, . . . , n. (3)
By considering the above conditions in the descending order from n to 1 (in the
same way as in back-substitution when solving a linear upper triangular system),
we obtain the set of suitable values of each symbol xi for fixed values of previous
15
symbols xi+1, . . . , xn. More explicitly, let x
n
l , (xl, xl+1, . . . , xn)
T denote the
last n− l+1 components of the vector x. When xni+1 is fixed, the component xi
can take on integer values in the range Ii(xni+1) = [Ai(xni+1), Bi(xni+1)] where
Ai(x
n
i+1) =
⌈
1
ri,i
(
y′i −
n∑
j=i+1
ri,jxj −
√√√√C ′0 − n∑
j=i+1
∣∣∣∣y′j − n∑
l=j
rj,lxl
∣∣∣∣2
)⌉
and
Bi(x
n
i+1) =
⌊
1
ri,i
(
y′i −
n∑
j=i+1
ri,jxj +
√√√√C ′0 − n∑
j=i+1
∣∣∣∣y′j − n∑
l=j
rj,lxl
∣∣∣∣2
)⌋
.
If for some i
n∑
j=i+1
∣∣∣∣y′j − n∑
l=j
rj,lxl
∣∣∣∣2 ≥ C ′0
or if Ai(x
n
i+1) > Bi(x
n
i+1), then Ii(xni+1) = ∅. In this case, there exists no such
value of xi that would satisfy the inequalities (3) and the points which have the
same values of xi+1, . . . , xn as x
n
i+1 do not lie inside the sphere S(y,
√
C0).
The Pohst enumeration starts from level i = n and proceeds climbing up to
levels i = n − 1, n − 2, . . . , 1. At each level i, the variable values chosen earlier
at lower levels determine the interval Ii(xni+1) for xi. If I1(xn2 ) is nonempty,
the vectors x = (x1, (x
n
2 )
T )T , for all x1 ∈ I1(xn2 ), correspond to lattice points
Bx ∈ S(y,√C0). The squared Euclidean distances between such points and y
are given by
d2(y,Bx) =
n∑
j=i
∣∣∣∣y′j − n∑
l=j
rj,lxl
∣∣∣∣2.
The output of the algorithm is the point xˆ for which this distance is minimum.
If no point is found inside the sphere it is declared empty and the search fails. In
this case the squared radius C0 must be increased and the search is performed
again.
4.3 The Schnorr–Euchner Enumeration and the Babai
Point
In the article [1] Agrell et al. proposed the use of Schnorr–Euchner refinement
of the Pohst enumeration in the closest lattice point search. They concluded,
based on numerical results, that their sphere decoding algorithm with Schnorr–
Euchner enumeration is more efficient than the Viterbo–Boutros implementation
that uses Pohst enumeration.
The Pohst enumeration is based on the so-called natural spanning of the
intervals Ii(xni+1) at each level i. In other words, xi takes on values in the as-
16
cending order Ai(x
n
i+1), Ai(x
n
i+1) + 1, . . . , Bi(x
n
i+1). The Schnorr–Euchner enu-
meration is a variation of the Pohst strategy where the intervals are spanned in
a zig-zag order, starting from the midpoint
Si(x
n
i+1) =
⌊
1
ri,i
(
y′i −
n∑
j=i+1
ri,jxj
)⌉
. (4)
Hence, the sequence of values produced by the Schnorr–Euchner enumeration
at each level i is
xi ∈ {Si(xni+1), Si(xni+1)+1, Si(xni+1)−1, Si(xni+1)+2, Si(xni+1)−2, . . .}∩Ii(xni+1)
if
y′i −
n∑
j=i+1
ri,jxj − ri,iSi(xni+1) ≥ 0
or the sequence of values
xi ∈ {Si(xni+1), Si(xni+1)−1, Si(xni+1)+1, Si(xni+1)−2, Si(xni+1)+2, . . .}∩Ii(xni+1)
if
y′i −
n∑
j=i+1
ri,jxj − ri,iSi(xni+1) < 0.
Similar to Pohst enumeration, when a given value of xi results in a point segment
xni outside the sphere, the next value of xi+1 (at level i+ 1) is produced.
Every time a vector x′ ∈ Zn is found such that Bx′ ∈ S(y,√C0), the
squared radius of the sphere can be dynamically updated to d2(y,Bx′), since
this will obviously be an upper bound for the distance of the closest point. Note
that with the Schnorr–Euchner enumeration one can set the squared radius to
C0 = ∞, in which case the event of declaring an empty sphere never occurs.
The first point found in this case is by definition the Babai point [1]. The Babai
point is not necessarily the closest point, but the error can be bounded, i.e.,
it is a nearby point. This point is also known as the nulling and cancelling
or zero-forcing decision-feedback equalization (ZF-DFE) point. Explicitly, it is
given by
xzf−dfei = Si(x
zf−dfe
i+1 , . . . , x
zf−dfe
n )
=
⌊
1
ri,i
(
y′i −
n∑
j=i+1
ri,jx
zf−dfe
j
)⌉
for i = m,m − 1, . . . , 1. The procedure begins by first finding the midpoint
of the interval In and setting xzf−dfen at it. Its effect is then cancelled out by
back-substitution, which is enabled by the upper triangular form of R. The
same process is then repeated for xzf−dfen−1 and so on.
17
4.4 The Algorithm
The following algorithm was proposed by Damen et al. [6] and was shown
to be very efficient in terms of receiver complexity when compared to other
known sphere decoding algorithms. It is a modification of the Schnorr–Euchner
enumeration in order to take into account a finite signal set boundary. Here the
lattice is not infinite, but x takes on values from the set {0, 1, . . . , Q − 1}, i.e.,
Λ = {Bx|x ∈ ZnQ}.
Algorithm II, Smart Implementation (Input C ′0, y
′, R, Output xˆ):
Step 1 (Initialization) Set i := n, Tn := 0, ξn := 0, and dc := C
′
0 (current
sphere squared radius).
Step 2 (DFE on xi) Set xi := ⌊(y′i − ξi)/ri,i⌉ and ∆i := sign(y′i − ξi − ri,ixi).
Step 3 (Main step) If dc < Ti + |y′i − ξi − ri,ixi|2, then go to Step 4 (i.e., we
are outside the sphere).
Else if xi /∈ [0, Q−1] go to Step 6 (i.e., we are inside the sphere but outside
the signal set boundaries).
Else (i.e., we are inside the sphere and signal set boundaries) if i > 1, then
{let ξi−1 :=
∑n
j=1 ri−1,jxj , Ti−1 := Ti + |y′i − ξi − ri,ixi|2, i := i− 1, and
go to Step 2}.
Else (i = 1) go to Step 5.
Step 4 If i = n, terminate, else set i := i+ 1 and go to Step 6.
Step 5 (A valid point is found) Let dc := T1 + |y′1 − ξ1 − r1,1x1|2, save xˆ := x.
Then, let i := i+ 1 and go to Step 6.
Step 6 (Schnorr–Euchner enumeration of level i) Let xi := xi + ∆i, ∆i :=
−∆i − sign(∆i), and go to Step 3.

Here Ti−1 corresponds to the squared distance between the points
(xi, . . . , xn) and (yi, . . . , yn). Note that only the coordinates i, . . . , n of the
lattice points are considered. The variable ξi, for i = n, . . . , 1, is the decision
feedback of a ZF-DFE when the decisions on the symbols from i + 1 to n are
the current values of (xi+1, . . . , xn). The variable ∆i holds the information of
which value of xi follows next according to the Schnorr–Euchner enumeration.
Every time a new value for xi is chosen, it corresponds to a new node in
the tree representation of the algorithm – see the Example 4.1 and Figure 6
below. The first step is initialization and we start from the root of the tree by
18
choosing a value for the coordinate xn in step 2. We come to step 2 also every
time we move to a lower level i in the tree. Here the midpoint of the interval
is chosen as the first value of xi, i.e., we choose the value of xi with which the
distance from y is the smallest when the previous values of xj , j = i+ 1, . . . , n
are known. In step 3 we calculate the distance of the point from y and compare
it to the radius of the sphere. If we are outside the sphere, we continue to step
4. In this case we go up one level in the tree to the next value of xi+1 which
is determined by ∆i+1. In case we are at the upper most level of the tree, the
search is terminated. If in step 3 we are inside the sphere but outside the signal
set boundaries, we go to step 6 and move to the next value of xi staying at the
same level in the tree. If we are both inside the sphere and inside the signal set
boundaries, a suitable value for xi was found and we move down one level in the
tree and go to step 2 again. We come to step 5 if a lattice point is found inside
the sphere. The radius is then updated and we continue by moving up one level
to the node determined by ∆i+1. Step 6 takes care of the zig-zag order of the
values of xi determined by the Schnorr–Euchner enumeration by adjusting the
variable ∆i.
Figure 5 on the next page presents a flow chart of Algorithm II.
Let us next consider a numerical example:
Example 4.1. Let x = (2, 7, 3, 2)T be the message to be transmitted (from the
signal set xi ∈ Z8) and
B =


3.7 1.6 −0.6 20.2
1.5 5.2 5.1 13.5
6.7 9.4 8.6 −21.9
−4.3 −20.7 −23.1 −10.7

 and N =


−0.8
0.1
1.2
−0.9


be the channel and noise matrices. This corresponds to the case of 4 transmit
and 4 receive antennas and block length l = 1. For simplicity we use real
numbers in this example instead of complex numbers. The receiver obtains the
signal
y = Bx+N =


56.4
81.8
62.4
−245.1

 .
The problem is now to determine the transmitted signal x, when y and B are
known. This corresponds to the problem of finding the closest lattice point to
the point y in the lattice {Bx|x ∈ Z48} . For this purpose, the QR-decomposition
19
Initialization:
i= n
T = 0i
xi = 0
d = CC
Set xi at the midpoint of the interval
and calculate :
x = (y - )/r
= sign(y - - r x )
D
ù
D x
i
i i i i,i
i i i i,i i
ë x
Is the distance
of (x ,...,x ) from (y ,...,y )
greater than the radius:
d < T + |y - - r x |
i n i n
C i i i i,i ix
2
Is i = n?
Yes
Is x in [0,Q-1]?i
No
Is i > 1?
Yes
The point is
outside the
sphere, return
to the previous
level:
i = i + 1
Select the next value for x :
= +
i
x x
= - -sign( )
i i i
i i i
D
D D D
No
Yes
A valid coordinate was found,
continue to the next level:
= r x , where j=1,...,n
T = T + |y - i - r x |
i = i -1
x S
x
i-1 i-1,j j
i-1 i i i,i i
2
Yes
A valid point was found,
save x' = x, update the
radius and move back to
the previous level:
d < T + |y - - r x |
i = i + 1
C 1 1 1 1,1 1x
2
No
Output x' if a valid
point was found
Input:
vector
upper diagonal matrix R
signal set size Q
squared radius C
y
Figure 5: Flow chart of Algorithm II
20
of B is first computed:
B = QR
=


0.42 −0.43 −0.68 0.43
0.17 0.15 −0.51 −0.83
0.75 −0.32 0.53 −0.23
−0.48 −0.83 0.06 −0.28




8.91 18.61 18.23 −0.64
0 14.15 17.34 9.19
0 0 0.98 −32.72
0 0 0 5.38

 .
The sphere decoding algorithm takes the upper triangular matrix R and the
vector y′ = QTy


202.5
170.2
−61.4
10.3

 as input. The squared radius of the sphere is set
to be C0 = 15. It then proceeds as depicted in Figure 6 below. The nodes in
the tree correspond to the different values of xi that the algorithm goes through
when searching for the closest lattice point, and the dashed arrows show how
the sphere decoder traverses through the tree.
Figure 6: The search tree for the numerical example
The first value for x4 is computed by x4 = ⌊y′4/r4,4⌉ = ⌊10.3/5.38⌉ =
⌊1.91⌉ = 2. This point is inside the sphere, which can be verified using the
condition (3), and the algorithm moves to the next level. The same happens at
the second and third levels after which a lattice point is found inside the sphere
with coordinates x = (2, 6, 4, 2)T . This point is also referred to as the Babai
point. Now the radius of the sphere is replaced by the distance between this
21
point and y′ in the lattice, since the closest point cannot be further away from
y′ than the one that was already found. The new value of the squared radius
C0 is 8.25.
Next, the algorithm jumps to the previous level with x2 = 5. However, all of
the points x = (x1, 5, 4, 2)
T are outside the sphere and the algorithm moves up
again to x3 = 5. This is inside, but at the next level the points x = (x1, 5, 5, 2)
T
are again outside the sphere. The next value for x3 is 3 and the algorithm
proceeds all the way to the level x1, where another lattice point is found at
x = (2, 7, 3, 2)T . The radius of the sphere is again updated to 2.9, and the
algorithm moves up in the tree. All of the following points at each level are
then outside the sphere and the algorithm moves up and finally stops. The
point which was found to be the closest one to y′ is xˆ = (2, 7, 3, 2)T , which is
exactly the transmitted point.
22
5 Complexity of Sphere Decoding
In e.g. [1], [6], [9], [10], [13] and [14] several factors that may affect the complex-
ity of the sphere decoder have been suggested. In this and the next chapter the
complexity of sphere decoding is studied using results from computer simula-
tions. The complexity comparisons are necessarily statistical in nature. In most
cases the average complexity is relevant (where the complexity is averaged over
several channel realizations and several choices of a transmitted lattice point).
In some cases the high complexity tail is more informative.
In the simulations the two block codes from Section 3.1 are used as code-
books. The number of nodes in the search tree is used as a measure of complex-
ity, so that the implementation details or the environment where the simulation
runs do not affect it.
5.1 System Model
In this section we describe the system model that was used in the simulations.
The system takes the SNR, the numbers of transmit and receive antennasm and
n, the initial squared radius C0 for the sphere decoder, and the size of the signal
set q as input parameters. The basis matricesX1, ...,Xk, whereXi ∈Mm×l(C),
of the code to be used for transmitting are read from a file that is also given as
an input parameter.
The transmission is simulated by first generating a random message a =
(a1, ..., ak) to be sent, where the components are from the PAM signal set ai =
{2ci − q + 1|ci ∈ Zq}. Then the channel matrix H ∈ Mn×m(C) and the noise
matrix N ∈ Mn×l(C) are generated. The elements ηi,j of N are i.i.d. complex
Gaussian random variables with zero mean and variance σ2 = 1 per dimension.
The elements hi,j of H are also i.i.d. complex Gaussian random variables with
zero mean and variance τ2 = SNR lPL per dimension, where PL is the average
energy of a codeword from the used constellation. The received matrix Y ∈
Mn×l(C) is computed by
Y = H
k∑
i=1
aiXi +N.
Next the system simulates the receiver by finding the closest point to the
received one using knowledge of the channel state. A translation and scaling is
applied to the received matrix so that instead of the coefficients ai we have the
coefficients ci ∈ Zq in order to use the sphere decoder. The matrix Y is then
turned into a real valued vector y′ ∈ R2ln by converting each row yi to real
valued vector y′i and then combining these together:
y′i = (ℜ(yi,1),ℑ(yi,1),ℜ(yi,2),ℑ(yi,2), . . . ,ℜ(yi,l),ℑ(yi,l)) for i = 1, ..., n
23
and
y′ =
[
y′1 y
′
2 · · · y′n
]
.
The matrices HX1, ...,HXk ∈ Mn×l(C) are computed and turned into
vectors bi ∈ R2ln in the same way as the received matrix. Then the QR-
decomposition is computed for the matrix B =
[
bT1 b
T
2 · · · bTk
]
and R, y′
and the initial squared radius C0 are given to the sphere decoding algorithm
as input. The sphere decoder then returns the closest point x, from which the
transmitted point a can be obtained by computing ai = 2xi−q+1 for i = 1, ..., k.
The system was implemented in C++ and the simulations were run on a
Windows XP Professional environment with a 1.83 GHz dual-core processor
and 2048MB RAM. In each simulation at least 1 000 000 channel realizations
were created, unless otherwise stated. In the cases where the performances of
different algorithms were being compared, the same channel realizations were
used for both algorithms.
5.2 Choosing the Radius
The selection of the initial radius affects the performance of the sphere decoding
algorithm. Too small an initial radius causes no lattice points to be found inside
the sphere and the sphere decoder must start the search again with an increased
radius. On the other hand, too large a radius may result in an unnecessarily
large search tree.
Algorithm II is in general not very sensitive to a large initial radius. This
is because in most cases it finds the Babai point right away and updates the
radius accordingly so that the large initial radius is not a problem. However,
with finite lattices the Babai point may be outside the constellation, in which
case the algorithm does not find it and may have to search for a long time before
finding a valid point inside the sphere.
Figures 7(a) and 7(b) show the average complexity of Algorithm II as a
function of C0 in systems using the lattice codes LQNF and LNF . We observe
that with LQNF after a certain point a larger initial radius does not have much
of an effect on the complexity. A system using the code LNF on the contrary is
much more sensitive to the selection of the initial radius, especially in the low
SNR range.
24
2 4 6 8 10 12 14 16
15
20
25
30
35
40
45
Algorithm II with LQNF, q=8, n=1
Initial squared radius C0
Av
er
ag
e 
co
m
pl
ex
ity
 (#
 vi
sit
ed
 po
int
s)
SNR=19 dB
SNR=25 dB
(a) The code LQNF is used for transmission
2 4 6 8 10 12 14 16
15
20
25
30
35
40
45
Algorithm II with LNF, q=8, n=1
Initial squared radius C0
Av
er
ag
e 
co
m
pl
ex
ity
 (#
 vi
sit
ed
 po
int
s)
SNR=25 dB
SNR=28 dB
SNR=31 dB
(b) The code LNF is used for transmission
Figure 7: The average complexity of Algorithm II as a function of the initial
squared radius C0 in a system with one receive antenna and q = 8.
25
5.3 SNR and Complexity
A high SNR typically reduces the complexity of decoding. When the SNR in-
creases, the influence of the noise gets smaller and the received point is expected
to be nearer to a lattice point. It also means that the Babai point is more often
the closest point. As the Babai point is the first lattice point that the sphere
decoder finds, the algorithm finds the solution very quickly. Figure 8 shows
the average complexity plotted against SNR for the code LNF . It can be seen
that as the SNR increases, the average complexity tends to 15. With an 8-level
search tree, 15 is the minimum number points the algorithm needs to visit. It
means that the sphere decoder goes down the search tree, finds a lattice point
and then comes straight up without any zigzagging. We also notice, that the
sphere decoder performs better with the lattice code LQNF than with LNF . We
study the reasons for this in Chapter 6.
22 24 26 28 30 32 34
10
15
20
25
30
35
40
45
50
55
60
Algorithm II, q=8, n=1, C0=6
SNR (dB)
Av
er
ag
e 
co
m
pl
ex
ity
 (#
 vi
sit
ed
 po
int
s)
LQNF
LNF
Figure 8: The average complexity of Algorithm II as a function of the SNR,
using the codes LQNF and LNF
The complexity depends also on the data rate, i.e., the choice of q. A greater
size of the signal set means that the codebook is larger and also the search tree
26
will be larger, since on each level of the tree there will be more valid options
for the coefficients. This effect is visible in Figure 9, which presents the average
complexity against SNR when different data rates are used.
5 10 15 20 25
10
15
20
25
30
35
40
Algorithm II with LQNF, q=8, n=1, C0=6
SNR (dB)
Av
er
ag
e 
co
m
pl
ex
ity
 (#
 vi
sit
ed
 po
int
s)
q=8
q=4
q=2
Figure 9: The average complexity of Algorithm II as a function of the SNR,
using the code LQNF and different data rates in the transmission
5.4 Preprocessing and Ordering
Preprocessing and the ordering in which the components of x are considered
has a significant effect on the performance of the sphere decoder. The standard
preprocessing and ordering consists of the QR-decomposition and the natural
back-substitution ordering xk, xk−1, . . . , x1. But this is not necessarily the op-
timal ordering.
One example of a preprocessing approach is the vertical Bell Labs layered
space–time (V-BLAST) optimal decision ordering [7]. Its purpose is to maximize
the minimum value of the R-matrix’s diagonal elements, since a large value of
ri,i gives a short search interval on the corresponding level of the search tree.
Ordering the diagonal elements so that rk,k is the largest and r1,1 the smallest
27
value also reduces the effect of error propagation. Since the sphere decoder
starts from the level k, selecting wrong values in the beginning can have an
adverse effect on the complexity.
The lattice code LQNF does not need this kind of preprocessing, because as
we will see in Section 7.2, the default ordering gives its R-matrix an advanta-
geous structure, which allows for the use of a faster, modified algorithm.
5.5 Error Probability
Since the sphere decoder finds the exact maximum-likelihood solution, the error
rates show exactly how often the closest lattice point detected at the receiver
has not been the point that was sent. The block error rate (BLER) and the
bit error rate (BER) indicate how often there have been symbol errors and
correspondingly bit errors in the detected message. In the simulations Gray
encoding has been used for the bit encoding.
The error rates naturally decrease as the SNR increases and the effect of
noise becomes smaller. Figures 10(a) and 10(b) display a comparison of the
block and bit error rates for the codes LQNF and LNF with different values of
SNR. It can be seen that the code LQNF performs better than the code LNF
in the sense of error probability.
Other effects are more subtle. Some lattices are more adversely affected by
certain anomalous channel realizations than others. In such cases the lattice is
severely skewed and high complexity comes jointly with high error probability.
This scenario will be explained in more detail in Chapter 6.
28
19 20 21 22 23 24 25 26 27 28
10−3
10−2
10−1
100
Block error rate, q=8, n=1
SNR (dB)
BL
ER
LNF
LQNF
(a) Block error rates for LQNF and LNF
19 20 21 22 23 24 25 26 27 28
10−4
10−3
10−2
10−1
Bit error rate, q=8, n=1
SNR (dB)
BE
R
LNF
LQNF
(b) Bit error rates for LQNF and LNF
Figure 10: Error rates for LQNF and LNF
29
6 Collapsing Lattices
In this section only the multiple-input single-output (MISO) channel setting
with a single receive antenna is considered. In the MISO case the channel
matrix H is a complex vector, which we here denote by h.
6.1 The Defect
Several different 8-dimensional lattice codes of 4×4 complex matrices have been
designed, but they all have the property that for some non-zero channel vector
h the dimension of the lattice hL is smaller than the dimension of L. In the
following we call this event the collapsing of the lattice. The closely related
notion of a lattice’s defect was presented in the article [10]:
Definition 6.1. Matrix lattice L has defect r, if its rank is m but the minimum
positive real dimension of the span of hL is m − r. In other words, the lattice
collapses by dimension r.
A lattice hL collapses if and only if there exists a vector h ∈ Cn\{0} such
that the set {hX1, . . . ,hXk} is linearly dependent over R, where X1, . . . ,Xk
are the basis matrices of L.
Let us consider next the lattice code LNF that was introduced in Section
3.1. The code matrices are of the form
MNF (c1, c2, c3, c4) =


c1 ic4 ic3 ic2
c2 c1 ic4 ic3
c3 c2 c1 ic4
c4 c3 c2 c1

 c1, c2, c3, c4 ∈ Z[i].
Let the basis matrices X1, . . . ,X8 be as defined in Section 3.1. Now LNF =
ZX1 ⊕ ZX2 ⊕ . . .⊕ ZX8 and it has dimension 8.
Let us define hj = (1, ζ
j , ζ2j , ζ3j) for j ∈ {1, 5, 9, 13}, where ζ = e2pii/16.
The vectors hj are the eigenvectors of all of the matrices of the lattice LNF , as
shown below. Note that ζ4 = i and ζ4j = ij = i.
30
hjMNF (c1, c2, c3, c4) = (1, ζ
j , ζ2j , ζ3j)


c1 ic4 ic3 ic2
c2 c1 ic4 ic3
c3 c2 c1 ic4
c4 c3 c2 c1


= (c1 + ζ
jc2 + ζ
2jc3 + ζ
3jc4︸ ︷︷ ︸
=λj
, . . . , ic2 + ζ
jic3 + ζ
2jic4 + ζ
3jc1)
= (λj , . . . , ζ
4jc2 + ζ
5jc3 + ζ
6jc4 + ζ
3jc1)
= (λj , . . . , ζ
3j(c1 + ζ
jc2 + ζ
2jc3 + ζ
3jc4))
= (λj , ζ
jλj , ζ
2jλj , ζ
3jλj)
= λj (1, ζ
j , ζ2j , ζ3j)
= λjhj
The vectors {h1, h5, h9, h13} form a basis of C4. The channel vector h can
be written in terms of the basis vectors: h = α1h1+α5h5+α9h9+α13h13, where
αj ∈ C. Now, suppose that the channel vector h is a scalar multiple of only one
of the hj . Then only two of the vectors hX1, . . . ,hX8 are linearly independent
over R, and the lattice hLNF has real dimension 2. Thus, the lattice LNF has
defect 6.
Theorem 6.1. With the lattice code LNF the image lattice hLNF collapses
⇔ αj = 0 for some j ∈ {1, 5, 9, 13}.
Proof. The basis vectors of the lattice hLNF are
hXk =
∑
j
αjhjXk
=
∑
j
αjλ(Xk, j)hj
where λ(Xk, j) is the scalar in hjXk = λ(Xk, j)hj . If αj = 0 for some j, say
α9 = 0, then ∀k = 1, . . . , 8
hXk = β1h1 + β5h5 + β13h13
∈ Ch1 + Ch5 + Ch13 , V9ˆ
where dimCV9ˆ= 3 and dimRV9ˆ= 6.
Conversely, assume that αj 6= 0 ∀j ∈ {1, 5, 9, 13}. Since hX2k = ihX2k−1
for k = 1, . . . , 4, it suffices to check that the vectors hX2k−1, k = 1, . . . , 4 are
linearly independent. For them λ(X2k−1, j) = ζ(k−1)j .
hX2k−1 =
∑
j∈{1,5,9,13}
αjζ
(k−1)jhj , k = 1, . . . , 4.
31
The determinant∣∣∣∣∣∣∣∣∣∣
α1 α1ζ α1ζ
2 α1ζ
3
α5 α5ζ
5 α5ζ
10 α5ζ
15
α9 α9ζ
9 α9ζ
18 α9ζ
27
α13 α13ζ
13 α13ζ
26 α13ζ
39
∣∣∣∣∣∣∣∣∣∣
= α1α5α9α13|VM | 6= 0
where |VM | is the determinant of a Van der Monde matrix, so the vectors
hX2k−1, k = 1, . . . , 4, are linearly independent.
Let us consider now the code LQNF , which was also introduced in Section
3.1. The code matrices are of the form
MQNF (c1, c2, c3, c4) =


c1 ic2 −c∗3 −c∗4
c2 c1 ic
∗
4 −c∗3
c3 ic4 c
∗
1 c
∗
2
c4 c3 −ic∗2 c∗1

 c1, c2, c3, c4 ∈ Z[i].
The channel vector h can be written in terms of vectors vi, (i = 1, . . . , 4),
which form a basis of C4:
h = h+ + h− = h1v1 + h2v2︸ ︷︷ ︸
h+
+h3v3 + h4v4︸ ︷︷ ︸
h−
where v1 = (1, ξ, 0, 0) , v2 = (0, 0, 1, ξ), v3 = (1,−ξ, 0, 0), v4 = (0, 0, 1,−ξ),
hj ∈ C (j = 1, . . . 4) and ξ = e2pii/8. We denote by V+ = LC(v1, v2) the
vector space generated by v1 and v2 and by V− = LC(v3, v4) the vector space
generated by v3 and v4.
Now
v1MQNF (c1, c2, c3, c4) = (c1 + ξc2)v1 − (c3 + ξc4)∗v2,
v2MQNF (c1, c2, c3, c4) = (c3 + ξc4)v1 + (c1 + ξc2)
∗v2,
v3MQNF (c1, c2, c3, c4) = (c1 − ξc2)v3 − (c3 − ξc4)∗v4 and
v4MQNF (c1, c2, c3, c4) = (c3 − ξc4)v3 + (c1 − ξc2)∗v4.
Theorem 6.2. With the lattice code LQNF the image lattice hLQNF collapses
⇔ h+ = 0 or h− = 0
Proof. Consider the vector space
V = {hMQNF (z1, z2, z3, z4) | zi ∈ C}.
First we show that if h+ 6= 0 and h− 6= 0 then dimR V = 8.
32
Now if z1 = ξz2 and z3 = ξz4 then v3MQNF (z1, z2, z3, z4) = 0 and
v4MQNF (z1, z2, z3, z4) = 0, and the dimension of V depends only on h+ and
not on h−.
Notice that ξ = e2pii/8 = 1+i√
2
, ξ2 = i and ξ∗ = −iξ.
i) If z1 =
1
2 , z2 =
1
2ξ
∗, z3 = z4 = 0 then v1MQNF = v1 and v2MQNF = v2
ii) If z1 =
i
2 , z2 =
1
2ξ, z3 = z4 = 0 then v1MQNF = iv1 and v2MQNF = −iv2
iii) If z1 = z2 = 0, z3 = − 12 , z4 = − 12ξ∗ then v1MQNF = v2 and v2MQNF =
−v1
iv) If z1 = z2 = 0, z3 =
i
2 , z4 =
1
2ξ then v1MQNF = iv2 and v2MQNF = iv1
From these we see that the four h+MQNF vectors below are all in V .
i) ⇒ h+MQNF = h1v1 + h2v2 ∈ V
ii) ⇒ h+MQNF = ih1v1 − ih2v2 ∈ V
iii) ⇒ h+MQNF = −h2v1 + h1v2 ∈ V
iv) ⇒ h+MQNF = ih2v1 + ih1v2 ∈ V
All of these belong to V+ and by showing that these four vectors are linearly
independent over R we prove that the whole V+ is included in V . Let h1 =
x1+ y1i and h2 = x2+ y2i where xj , yj ∈ R for j = 1, 2. In terms of the R-basis
{v1, iv1,v2, iv2} of V+, these vectors get the form
(x1, y1, x2, y2) ∈ V
(−y1, x1, y2,−x2) ∈ V
(−x2,−y2, x1, y1) ∈ V
(−y2, x2,−y1, x1) ∈ V
By looking at the determinant of the matrix
A =


x1 y1 x2 y2
−y1 x1 y2 −x2
−x2 −y2 x1 y1
−y2 x2 −y1 x1


we see that if h1 6= 0 or h2 6= 0 the vectors are linearly independent. Since
AAT = (x21 + y
2
1 + x
2
2 + y
2
2)I4 it follows that detA 6= 0 if h1 6= 0 or h2 6= 0.
Hence, if h+ 6= 0 then V+ ⊆ V .
33
Similarly by considering the code matrices for which z1 = −ξz2 and z3 =
−ξz4 we see that if h− 6= 0 then V− ⊆ V .
Now V+ ⊕ V− ⊆ V ⊆ LC(v1,v2,v3,v4). Since V+ ⊕ V− = LC(v1,v2,v3,v4)
it follows that V = LC(v1,v2,v3,v4) and hence dimR V = 8 if h+ 6= 0 and
h− 6= 0.
Conversely, if h+ = 0 or h− = 0 then the resulting vector space hMQNF
only has terms of v1 and v2 or terms of v3 and v4 and thus has real dimension
4. This means that LQNF has defect 4.
6.2 Collapsing Lattices and the Decoding Complexity
If the vectors {hX1, . . . ,hXk} are orthogonal then the closest lattice point is the
Babai point, which the sphere decoder finds right away without any zigzagging.
It is natural to assume that the decoding complexity would be significantly
higher if the lattice collapses or comes near to collapsing. In this section we
study the effect of the collapsing lattice on the decoding performance using
computer simulations.
When the lattice collapses or is near collapsing, it causes the diagonal el-
ements of R to be very small. This can be verified from Figure 11(a), which
shows the correlation between the minimum diagonal element of R and the
length of the shortest component αjhj of h for the lattice code LNF . Figure
13(a) shows the same for the lattice code LQNF , where h has components h+
and h−.
A small diagonal element of R means that the corresponding interval for the
sphere decoder on that level is large, which makes the search tree also large. It
can be seen from Figures 11(b) and 13(b), that a small diagonal element very
often corresponds to a high complexity.
Figures 12(a) and 12(b) show the complexity distributions of 10,000 trans-
missions for the lattice code LNF with different SNRs. The horizontal axis
indicates how close to collapsing the image lattice hL has been. Figures 14(a)
and 14(b) show the same for the lattice code LQNF . For both LNF and LQNF ,
the figures show that the smaller the minimum component of h, the higher the
complexity. We also notice that the lattice code LNF comes near to collapsing
more often than LQNF . This can be explained by the higher defect of LNF .
34
0 1 2 3 4 5 6 7
0
1
2
3
4
5
6
7
LNF, SNR=24dB, q=8
min(|αjhj|2)
m
in
(r i,
i)
(a) Minimum diagonal element of R vs. minimum |αjhj |
2
0 1 2 3 4 5 6 7
0
500
1000
1500
2000
2500
3000
3500
4000
LNF: complexity vs. defect, SNR=24dB, q=8
min(ri,i)
Co
m
pl
ex
ity
 (#
 vi
sit
ed
 po
int
s)
(b) Complexity vs. minimum diagonal element of R
Figure 11: Distributions of 10,000 transmissions using the code LNF
35
0 1 2 3 4 5 6 7
0
500
1000
1500
2000
2500
3000
3500
4000
LNF: complexity vs. defect, SNR=24dB, q=8
min(|αj hj|2)
Co
m
pl
ex
ity
 (#
 vi
sit
ed
 po
int
s)
(a) Complexity vs. defect with LNF and an SNR of 24dB
0 5 10 15 20 25 30 35
0
200
400
600
800
1000
1200
1400
LNF: complexity vs. defect, SNR=30dB, q=8
min(|αjhj|2)
Co
m
pl
ex
ity
 (#
 vi
sit
ed
 po
int
s)
(b) Complexity vs. defect with LNF and an SNR of 30dB
Figure 12: Complexity distributions of 10,000 transmissions using the code LNF
and different SNRs
36
0 1 2 3 4 5 6 7
0
0.5
1
1.5
2
2.5
3
3.5
4
LQNF, SNR=19dB, q=8
min(|H+|2,|H−|2)
m
in
(r i,
i)
(a) Minimum diagonal element of R vs. min{|h+|2,h−|2} with LQNF and an
SNR of 19dB
0 0.5 1 1.5 2 2.5 3 3.5 4
0
200
400
600
800
1000
1200
LQNF: complexity vs. min(ri,i), SNR=19dB, q=8
min(ri,i)
Co
m
pl
ex
ity
 (#
 vi
sit
ed
 po
int
s)
(b) Complexity vs. minimum diagonal element of R with LQNF and an SNR of
19dB
Figure 13: Distributions of 10,000 transmissions using the code LQNF
37
0 1 2 3 4 5 6 7
0
200
400
600
800
1000
1200
LQNF: complexity vs. defect, SNR=19dB, q=8
min(|H+|2,|H−|2)
Co
m
pl
ex
ity
 (#
 vi
sit
ed
 po
int
s)
(a) Complexity vs. defect with LQNF and an SNR of 19dB
0 5 10 15 20 25
0
50
100
150
200
250
LQNF: complexity vs. defect, SNR=25dB, q=8
min(|H+|2,|H−|2)
Co
m
pl
ex
ity
 (#
 vi
sit
ed
 po
int
s)
(b) Complexity vs. defect with LQNF and an SNR of 25dB
Figure 14: Complexity distributions of 10,000 transmissions using the code
LQNF and different SNRs
38
7 Improving the Performance of Sphere Decod-
ing
In this chapter we look for ways to reduce the complexity of sphere decoding by
two methods: first by modifying the algorithm and second, by taking advantage
of the properties of the lattice.
7.1 Adding Constraints
The fact that we are using finite code sets gives us an opportunity to improve
the algorithm. The idea is to avoid going through points that are outside the
signal set. The original algorithm picks the midpoint of the interval on each
level as the starting point and zigzags its way from there until it finds a valid
point or is outside the sphere. If the midpoint is far below or above the signal set
[0, Q− 1], the algorithm may have to go through a lot of points before reaching
the signal set values. As we know that any points outside the signal set cannot
be a part of the solution, we can improve the algorithm by jumping straight to
the first possible value on each level. If the midpoint of the interval is smaller
than 0, we can pick 0 as the starting point, as it is the first value belonging to
the signal set that the algorithm would come across. Correspondingly, if the
midpoint is greater than Q− 1, we can pick Q− 1 as the starting point.
Another case where Algorithm II may go through unnecessarily many points
in the search tree, is when it has already processed all of the values that belong
to the signal set on that level, but still has not gone outside the radius. After
that point all of the following values that are considered cannot belong to the
solution, since they are outside the signal set. We can now add another check
to the algorithm: if all of the values from 0 to Q − 1 have been processed, we
can finish on that level and move up a level.
Example 7.1. Let us consider only the first level of the search tree and assume
that the corresponding diagonal element of the R matrix rn,n = 0.1, the received
point y′n = 1.3, signal set size Q = 8 and the squared radius C
′
0 = 5. The interval
for Algorithm II on this level has a lower boundary An = ⌈(y′n −
√
C ′0)/rn,n⌉ =
−9 and an upper boundary Bn = ⌊(y′n +
√
C ′0)/rn,n⌋ = 35. Algorithm II starts
zigzagging from the middle of the interval xn := ⌊y′n/rn,n⌉ = 13 and continues
until it has gone through all of the values in the interval. Since the signal set
is [0, 7] the algorithm goes through points 13, 14, 12, 15, 11, 16, 10, 17, 9, 18,
8, 19 before reaching 7 as the first value belonging to the signal set. By adding
the first check described above to the algorithm we could leave out the points
in between and jump straight from 13 to 7 thus saving time. After 7 the next
values assigned to xn are 20, 6, 21, 5, 22, 4, 23, 3, 24, 2, 25, 1, 26, 0 and 27.
39
At xn := 27, all of the values of the signal set have been processed, but xn is
still not outside the radius. The first value outside the radius is 36. By adding
the second check to the algorithm we could now stop processing on this level
without going through the rest of the points in the interval.
The amount of these unnecessary points in the search tree depends greatly
on the R matrix and its diagonal elements. The smaller the diagonal element,
the longer the interval that Algorithm II has to go through on that level.
In practice the first modification is added to step 2 of the algorithm, right
after the part where the value for xi is calculated. The second modification is
added to step 3 after checking whether xi belongs to the signal set. There is
also a flow chart of the modified algorithm on page 41.
Modified algorithm (Input C ′0, y
′, R, Output xˆ):
Step 1 (Initialization) Set i := n, Tn := 0, ξn := 0, and dc := C
′
0 (current
sphere squared radius).
Step 2 (DFE on xi) Calculate xtemp := ⌊(y′i − ξi)/ri,i⌉.
If xtemp < 0 set xi := 0 and ∆i := 1.
Else if xtemp > Q− 1 set xi := Q− 1 and ∆i := −1.
Else set xi := xtemp and ∆i := sign(y
′
i − ξi − ri,ixi).
Step 3 (Main step) If dc < Ti + |y′i − ξi − ri,ixi|2, then go to Step 4 (i.e., we
are outside the sphere).
Else if xi /∈ [0, Q− 1] (i.e., we are inside the sphere but outside the signal
set boundaries) then {Set xnext := xi+∆i. If (xi < 0 and xnext > Q− 1)
or (xi > Q− 1 and xnext < 0) then go to Step 4, otherwise go to Step 6 }.
Else (i.e., we are inside the sphere and signal set boundaries) if i > 1, then
{let ξi−1 :=
∑n
j=1 ri−1,jxj , Ti−1 := Ti + |y′i − ξi − ri,ixi|2, i := i− 1, and
go to Step 2}.
Else (i = 1) go to Step 5.
Step 4 If i = n, terminate, else set i := i+ 1 and go to Step 6.
Step 5 (A valid point is found) Let dc := T1 + |y′1 − ξ1 − r1,1x1|2, save xˆ := x.
Then, let i := i+ 1 and go to Step 6.
Step 6 (Schnorr–Euchner enumeration of level i) Let xi := xi + ∆i, ∆i :=
−∆i − sign(∆i), and go to Step 3.

40
Initialization:
i= n, T = 0,
= 0, d = C
i
i Cx
Set xi at the midpoint of the interval
and calculate :
x = (y - )/r
= sign(y - - r x )
D
ù
D
i
i i i i,i
i i i,i i
ë x
xi
Is the distance of
(x ,...,x ) from
(y ,...,y ) greater
than the radius?
i n
i n
Is d smaller than
T +
C
i |y - - r x | ?i i i,i ix
2
Is i = n?
Yes
Is x in [0,Q-1]?i
No
Is i > 1?
Yes
Return to
the previous
level:
i = i + 1
Select the next value for x :
= +
i
x x
= - -sign( )
i i i
i i i
D
D D D
No
Yes
A valid coordinate was
found, continue to the
next level:
= r x , where j=1,...,n
T = T + |y - i - r x |
i = i -1
x S
x
i-1 i-1,j j
i-1 i i i,i i
2
Yes
A valid point was found,
save x' = x, update the
radius and move back to
the previous level:
d < T + |y - - r x |
i = i + 1
C 1 1 1 1,1 1x
2
No
Output x' if a valid
point was found
Input:
vector
upper diagonal matrix R
signal set size Q
squared radius C
y
No
Is x < 0?i
Is x > Q-1?i
No
x = 0
= 1
i
Di
x = Q-1
= -1
i
Di
No
Yes
Yes
Is <0
and Q-1
or
i >Q-1 and
?
x
x +
s x
x +
i
i i
i
i i
D >
D <0
No
Yes
Figure 15: Flow chart of the modified sphere decoding algorithm
41
Figures 16(a) and 16(b) show a comparison of Algorithm II and the modified
algorithm using the codes LQNF and LNF respectively. The complexity is
measured in average number of points visited by the algorithm. It can be
seen from the charts that the modifications indeed offer a slight reduction in
complexity for both codes.
The reduction is more significant in smaller SNRs. This is a natural result,
because as the SNR gets smaller, the influence of the noise grows and both the
probability of error and the complexity increase. Then also the size of the search
tree grows and it is more common that there are points in the search tree that
do not belong to the signal set.
Figures 17(a) and 17(b) show the same comparisons measured in average
CPU time used by the algorithms. It can be seen that the reduction in com-
plexity is also visible in terms of CPU time. This means that even though the
additional checks in the modified algorithm increase its complexity somewhat,
the reduction they offer in the number of visited points more than makes up for
the overhead.
42
17 18 19 20 21 22 23 24 25 26 27 28
15
20
25
30
35
40
LQNF, q=8, n=1, C0=6
SNR (dB)
Av
er
ag
e 
co
m
pl
ex
ity
 (#
 vi
sit
ed
 po
int
s)
Algorithm II
Modified algorithm
(a) The code LQNF is used for transmission
20 22 24 26 28 30 32
20
30
40
50
60
70
80
LNF, q=8, n=1, C0=6
SNR (dB)
Av
er
ag
e 
co
m
pl
ex
ity
 (#
 vi
sit
ed
 po
int
s)
Algorithm II
Modified algorithm
(b) The code LNF is used for transmission
Figure 16: The average complexity of Algorithm II compared with the modified
algorithm in a system with one receive antenna and q = 8. Complexity is
measured in average number of points in the search tree.
43
16 18 20 22 24 26 28
4.95
5
5.05
5.1
5.15
5.2
5.25
LQNF, q=8, n=1, C0=6
SNR (dB)
lo
g 8
(A
ve
rag
e C
PU
 tim
e)
Algorithm II
Modified algorithm
(a) The code LQNF is used for transmission
20 22 24 26 28 30 32
1.65
1.7
1.75
1.8
1.85
1.9
1.95
2
LNF, q=8, n=1, C0=6
SNR (dB)
lo
g 8
(A
ve
rag
e C
PU
 tim
e)
Algorithm II
Modified algorithm
(b) The code LNF is used for transmission
Figure 17: The average complexity of Algorithm II compared with the modified
algorithm in a system with one receive antenna and q = 8. Complexity is
measured in average CPU time used.
44
7.2 Distributed Search
Recently, new sphere decoding algorithms with lower decoding complexity have
been developed, that take advantage of the zeros in the R-matrix; see e.g. the
articles [4] and [3]. In the following we introduce a new approach to sphere
decoding with a similar idea to utilize the zeros in the R-matrix. It is enabled
by the structure of the lattice code LQNF .
At the receiver we compute the complex matrices HX1,HX2, . . . ,HX8 and
turn them into a real matrix B as described in Section 5.1. When the code
LQNF is used for encoding and there is only one receive antenna, we have
H = h1 = (h1, h2, h3, h4)
and the matrix B is of the form:

ℜ(h1) −ℑ(h1) ℜ(h2) −ℑ(h2) ℜ(h3) −ℑ(h3) ℜ(h4) −ℑ(h4)
ℑ(h1) ℜ(h1) ℑ(h2) ℜ(h2) ℑ(h3) ℜ(h3) ℑ(h4) ℜ(h4)
ℜ(h2) −ℑ(h2) −ℑ(h1) −ℜ(h1) ℜ(h4) −ℑ(h4) −ℑ(h3) −ℜ(h3)
ℑ(h2) ℜ(h2) ℜ(h1) −ℑ(h1) ℑ(h4) ℜ(h4) ℜ(h3) −ℑ(h3)
ℜ(h3) ℑ(h3) ℑ(h4) −ℜ(h4) −ℜ(h1) −ℑ(h1) −ℑ(h2) ℜ(h2)
ℑ(h3) −ℜ(h3) −ℜ(h4) −ℑ(h4) −ℑ(h1) ℜ(h1) ℜ(h2) ℑ(h2)
ℜ(h4) ℑ(h4) ℜ(h3) ℑ(h3) −ℜ(h2) −ℑ(h2) −ℜ(h1) −ℑ(h1)
ℑ(h4) −ℜ(h4) ℑ(h3) −ℜ(h3) −ℑ(h2) ℜ(h2) −ℑ(h1) ℜ(h1)


Let us denote the above matrix relating to row vector h1 as Bh1 . If there
are n receive antennas, then
H =


h1
...
hn


and B is a 8n× 8 matrix of the form
B =


Bh1
...
Bhn

 .
The QR-decomposition is then computed for B and the R-matrix is given
as input to the sphere decoder. Let us now take a look at the properties of the
R-matrix.
45
Theorem 7.1. For the code LQNF , the R-matrix is of the form:
R =


r1,1 r1,2 r1,3 r1,4 0 0 0 0
0 r2,2 r2,3 r2,4 0 0 0 0
0 0 r3,3 r3,4 0 0 0 0
0 0 0 r4,4 0 0 0 0
0 0 0 0 r5,5 r5,6 r5,7 r5,8
0 0 0 0 0 r6,6 r6,7 r6,8
0 0 0 0 0 0 r7,7 r7,8
0 0 0 0 0 0 0 r8,8


.
Proof. We prove this by showing that the Gram–Schmidt procedure introduced
in Chapter 2.3 gives the R-matrix the described form, when applied to matrix
B. Let RGS denote the R-matrix obtained by the Gram–Schmidt procedure
from B.
RGS =


‖u1‖ 〈b2, e1〉 〈b3, e1〉 . . . 〈bn, e1〉
0 ‖u2‖ 〈b3, e2〉 . . . 〈bn, e2〉
0 0 ‖u3‖
...
...
...
. . . 〈bn, en−1〉
0 0 . . . 0 ‖un‖


Now we prove that the upper right corner of RGS is all zeros. Note that the
matrix B has the properties that the first four column vectors are all orthogonal
with the last four column vectors, i.e. 〈bi,bj〉 = 0 for all i ∈ [1, 4] and j ∈ [5, 8].
Since
〈bj , ei〉 = 〈bj , ui‖ui‖〉 =
1
‖ui‖〈bj ,ui〉,
it suffices to show that 〈bj ,ui〉 = 0 for all i ∈ [1, 4] and j ∈ [5, 8]:
〈bj ,u1〉 = 〈bj ,b1〉 = 0,
〈bj ,u2〉 = 〈bj ,b2 − φ2,1u1〉 = 〈bj ,b2〉︸ ︷︷ ︸
=0
−φ2,1 〈bj ,u1〉︸ ︷︷ ︸
=0
= 0,
〈bj ,u3〉 = 〈bj ,b3 − φ3,2u2 − φ3,1u1〉
= 〈bj ,b3〉︸ ︷︷ ︸
=0
−φ3,2 〈bj ,u2〉︸ ︷︷ ︸
=0
−φ3,1 〈bj ,u1〉︸ ︷︷ ︸
=0
= 0,
〈bj ,u4〉 = 〈bj ,b4 − φ4,3u3 − φ4,2u2 − φ4,1u1〉
= 〈bj ,b4〉︸ ︷︷ ︸
=0
−φ4,3 〈bj ,u3〉︸ ︷︷ ︸
=0
−φ4,2 〈bj ,u2〉︸ ︷︷ ︸
=0
−φ4,1 〈bj ,u1〉︸ ︷︷ ︸
=0
= 0.
46
Since the upper right corner is all zeros, the R-matrix actually has form
R =
[
R1 0
0 R2
]
where R1 and R2 are upper diagonal matrices.
The zeros in the upper right corner mean that in the sphere decoder search
the first four values of xi found by the sphere decoder, where i = 8, . . . , 5, do
not have any effect on what the next four values of xi, where i = 4, . . . , 1,
will be. We can use this fact to divide the search into two parts. We denote
y′1 = (y
′
1, y
′
2, y
′
3, y
′
4)
T , y′2 = (y
′
5, y
′
6, y
′
7, y
′
8)
T , x1 = (x1, x2, x3, x4)
T and x2 =
(x5, x6, x7, x8)
T .
|y′ −Rx|2 ≤ C ′0∣∣∣∣∣
[
y′1
y′2
]
−
[
R1 0
0 R2
][
x1
x2
]∣∣∣∣∣
2
≤ C ′0∣∣∣∣∣
[
y′1
y′2
]
−
[
R1x1
R2x2
]∣∣∣∣∣
2
≤ C ′0
|y′1 −R1x1|2 + |y′2 −R2x2|2 ≤ C ′0
We can now divide the processing into two searches for four symbols instead
of one search for 8 symbols.
Distributed algorithm (Input C ′0, y
′, R, Output xˆ):
Step 1 Obtain R1 and R2 from R.
Step 2 Get x1 by performing Algorithm II with input C
′
0, y
′
1 and R1.
Step 3 Set radius for the second sphere decoder: dC := C
′
0 − |y′1 −R1x1|2
Step 4 Get x2 by performing Algorithm II with input dC , y
′
2 and R2.
Step 5 Return xˆ := [x1,x2].

In the first step R is divided into R1 and R2 as defined above. In the second
step we use the original sphere decoder to obtain the first four elements of x. In
step 3 we calculate a smaller radius for the second sphere decoder by subtracting
the squared distance between x1 and y
′
1 from C
′
0. In step 4 we use the sphere
decoder again to find the last four elements of x.
47
We could also leave out step 3 altogether and use the same squared radius
for both searches. In this case the two searches could be done in parallel.
The average number of points in the search tree for the Distributed algorithm
is calculated as the sum of the points visited by the sphere decoders in steps 2
and 3. As can be seen from the Figures 18(a) and 18(b), the reduction in
complexity is remarkable, especially with lower SNRs. This is due to the fact
that the search is divided into two distinct parts. The original sphere decoder
has an 8-level search tree. This means that every time it finds suitable values
for the first four xi’s, it continues the search to the lower levels of the search
tree. With the Distributed algorithm the first search finds the optimal solution
for the first four xi’s using a 4-level search tree and only after that will the next
search look for the last four values of xi. Also with an 8-level search tree the
minimum amount of points in the search tree is 15, as with two 4-level searches
the minimum is 2 ∗ 7 = 14.
Remark 7.1. The form of the R-matrix depends on the order of the basis
matrices.
Remark 7.2. The code LNF cannot be used together with Distributed algo-
rithm, because its R-matrix does not have zeros in the right upper corner.
48
14 16 18 20 22 24 26
0
10
20
30
40
50
60
70
LQNF, q=8, n=1, C0=6
SNR (dB)
Av
er
ag
e 
co
m
pl
ex
ity
 (#
 vi
sit
ed
 po
int
s)
Algorithm II
Distributed algorithm
(a) Complexity measured in average number of points in the search tree
14 16 18 20 22 24 26
1.55
1.6
1.65
1.7
1.75
1.8
1.85
1.9
1.95
2
LQNF, q=8, n=1, C0=6
SNR (dB)
lo
g 8
(A
ve
rag
e C
PU
 tim
e)
Algorithm II
Distributed algorithm
(b) Complexity measured in average CPU time
Figure 18: The average complexity of Algorithm II compared with that of the
Distributed algorithm in a system with one receive antenna and q = 8.
49
References
[1] E. Agrell, T. Eriksson, A. Vardy and K. Zeger: Closest point search in
lattices. IEEE Transactions on Information Theory, Vol. 48, pp.2201–2214,
August 2002.
[2] S. M. Alamouti: A Simple Transmit Diversity Scheme for Wireless Com-
munications. IEEE Journal on Selected Areas in Communications, Vol. 16,
pp. 1451–1458, October 1998.
[3] L. Azzam and E. Ayanoglu: Reduced Complexity Sphere Decoding for Square
QAM via a New Lattice Representation. arXiv:0705.2435v1 [cs.IT], May
2007.
[4] E. Biglieri, Y. Hong and E. Viterbo: On Fast-Decodable Space–Time Block
Codes. arXiv:0708.2804v1 [cs.IT], August 2007.
[5] J.W.S. Cassels: An Introduction to the Geometry of Numbers. Springer Ver-
lag, 1971.
[6] M.O. Damen, H. El Gamal and G. Caire: On Maximum-Likelihood Detec-
tion and the Search for the Closest Lattice Point. IEEE Transactions on
Information Theory, Vol. 49, pp.2389–2402, October 2003.
[7] G. Foschini, G. Golden, R. Valenzuela and P. Wolniansky: Simplified Pro-
cessing for High Spectral Efficiency Wireless Communication Employing
Multi-Element Arrays IEEE Journal on Selected Areas in Communications,
Vol. 17, pp. 1841–1852, November 1999.
[8] Gene H. Golub and Charles F. Van Loan: Matrix Computations, 3rd ed.
Johns Hopkins, 1996.
[9] B. Hassibi and H. Vikalo: Maximum-likelihood decoding and integer least-
squares: The expected complexity. Multiantenna Channels: Capacity, Coding
and Signal Processing, (editors J. Foschini and S. Verdu), AMS 2003.
[10] C. Hollanti, J. Lahtonen and H.-f. Lu: Maximal Orders in the Design of
Dense Space–Time Lattice Codes. Accepted for publication in IEEE Trans-
actions on Information Theory (2008).
[11] M. Pohst: On the Computation of Lattice Vectors of Minimal Length, Suc-
cessive Minima and Reduced Basis with Applications. ACM SIGSAM, vol.
15, pp. 37–44, 1981.
[12] J. Proakis: Digital Communications, 3rd ed. McGraw–Hill, 1995.
50
[13] V. Tarokh, H. Jafarkhani and A. R. Calderbank: Space–Time Block Codes
from Orthogonal Designs. IEEE Transactions on Communications, Vol. 45,
pp.1456–1467, July 2002.
[14] V. Tarokh, N. Seshadri and A. R. Calderbank: Space–Time Codes for High
Data Rate Wireless Communication: Performance Criterion and Code Con-
struction. IEEE Transactions on Information Theory, Vol. 44, pp.744–765,
July 1998.
51