SPACE–TIME BLOCK CODES AND THE COMPLEXITY OF SPHERE DECODING Miia Ma¨ki Master’s Thesis July 2008 DEPARTMENT OF MATHEMATICS UNIVERSITY OF TURKU FIN-20014 TURKU FINLAND Contents 1 Introduction 1 1.1 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 2 Lattices 3 2.1 Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2.2 Lattices under Linear Transformation . . . . . . . . . . . . . . . 3 2.3 QR-decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.4 Closest Vector Problem . . . . . . . . . . . . . . . . . . . . . . . 5 3 Space–Time Block Codes 6 3.1 Lattice Codes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3.2 MIMO Channel . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.3 Decoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 3.4 Criteria for Code Construction . . . . . . . . . . . . . . . . . . . 10 4 Sphere Decoder 13 4.1 Introduction to Sphere Decoding . . . . . . . . . . . . . . . . . . 13 4.2 The Pohst Enumeration . . . . . . . . . . . . . . . . . . . . . . . 15 4.3 The Schnorr–Euchner Enumeration and the Babai Point . . . . . 16 4.4 The Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 5 Complexity of Sphere Decoding 23 5.1 System Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 5.2 Choosing the Radius . . . . . . . . . . . . . . . . . . . . . . . . . 24 5.3 SNR and Complexity . . . . . . . . . . . . . . . . . . . . . . . . . 26 5.4 Preprocessing and Ordering . . . . . . . . . . . . . . . . . . . . . 27 5.5 Error Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 6 Collapsing Lattices 30 6.1 The Defect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 6.2 Collapsing Lattices and the Decoding Complexity . . . . . . . . . 34 7 Improving the Performance of Sphere Decoding 39 7.1 Adding Constraints . . . . . . . . . . . . . . . . . . . . . . . . . . 39 7.2 Distributed Search . . . . . . . . . . . . . . . . . . . . . . . . . . 45 References 50 1 Introduction In wireless communications the transmitted signals may be affected by noise. The receiver must decode the received message, which can be mathematically modelled as a search for the closest lattice point to a given vector. This problem is known to be NP-hard in general, but for communications applications there exist algorithms that, for a certain range of system parameters, offer polynomial expected complexity. The purpose of this thesis is to study the sphere decoding algorithm intro- duced in [6] and especially its computational complexity when used in space– time coding. Computer simulations are used to study how different system parameters affect the computational complexity of the algorithm. The aim is to find ways to improve the algorithm from the complexity point of view. The lim- ited battery life of mobile devices provides a motivation for finding algorithms with lower complexity and thus lower power consumption. We start by reviewing the theory of lattices in Chapter 2 for some necessary background knowledge. In Chapter 3 the principles of space–time coding are explained and criteria for designing space–time block codes are introduced. In Chapter 4 the sphere decoding algorithm is discussed in detail. Chapter 5 contains an analysis of the complexity of sphere decoding with computer simulation results. In Chapter 6 the concept of collapsing lattices is introduced and its effect on the complexity of sphere decoding is explained. The main contribution of this thesis is in Chapter 7, where we search for ways to improve the sphere decoding algorithm and develop two new modifications to the algorithm, that are shown to perform faster than the original within a range of system parameters. 1 1.1 Notations The following notation is used throughout this thesis. R Field of real numbers C Field of complex numbers Z Ring of rational integers i √−1 Z[i] Ring of Gaussian integers ω 12 (−1 + i √ 3) Z[ω] Ring of Eisenstein integers ℜ(x) Real part of x ℑ(x) Imaginary part of x x∗ Complex conjugate x Vector ‖ · ‖ Euclidean norm 〈·, ·〉 Inner product on Rn X Matrix XT Transpose X∗ Hermitian conjugate detX Determinant of X In n× n identity matrix E Expectation ⌊·⌉ Rounding to the closest integer 2 2 Lattices In this chapter the basic concepts of lattices are reviewed for later use in this thesis. All of the results are given without proof. For more details see e.g. [5]. 2.1 Bases Letm and n be two positive integers such that n ≤ m. A subset Λ of Rm is called a lattice of dimension n if there exist n linearly independent m-dimensional vectors b1, . . . ,bn ∈ Rm such that Λ = n∑ i=1 Zbi = {r1b1 + . . .+ rnbn|ri ∈ Z, 1 ≤ i ≤ n}. (1) The set of vectors b1, . . . ,bn is called the basis of the lattice Λ, and n is called the rank of Λ. The vectors of a lattice Λ form a group under addition: if a ∈ Λ then −a ∈ Λ; and if a, b ∈ Λ then a± b ∈ Λ. The lattice can also be expressed in matrix form Λ = {x|x = Br} where B is an m × n matrix B = [b1, . . . ,bn] and r is an integer column vector. B is called the basis matrix of Λ. The basis is not uniquely determined by the lattice, but the same lattice can be generated by several different bases. Let b′i be the points b′i = ∑ j vijbj , (1 ≤ i, j ≤ n), where vij are any integers such that det(vij) = ±1. Now bi = ∑ j wijb ′ j with integral wij . It then follows that (1) and the lattice Λ′ = r′1b ′ 1 + . . .+ r ′ nb ′ n, where r′1, . . . , r ′ n run through all integers, are precisely the same set of points, Λ = Λ′. That is, b1, . . . ,bn and b′1, . . . ,b ′ n are bases of the same lattice. Furthermore, every basis b′i of a lattice Λ may be obtained from a given basis bi in this way [5]. 2.2 Lattices under Linear Transformation Let us consider briefly the effect of a non-singular affine transformation x → y = αx of n-dimensional space into itself. Let the transformation y = αx be given by yi = ∑ 1≤j≤n αijxj (1 ≤ i ≤ n), 3 where y = (y1, . . . , yn) and x = (x1, . . . , xn) are corresponding points in the transformation and αij are real numbers such that det(α) = det(αij) 6= 0. Let Λ be a lattice and denote by αΛ the set of points αx, x ∈ Λ. If b1, . . . ,bn is a basis for Λ, then the general lattice point x = r1b1 + . . .+ rnbn (with r1, . . . , rn integers) of Λ maps to αx = α(r1b1 + . . .+ rnbn) = r1αb1 + . . .+ rnαbn. Hence αΛ is a lattice with basis αb1, . . . ,αbn. 2.3 QR-decomposition The basis vectors of a lattice are not necessarily orthogonal. One method for orthogonalizing the basis is the following Gram–Schmidt procedure. However, the resulting basis does not necessarily span the same lattice, since the coeffi- cients are real numbers and not necessarily integers, but the basis vectors span the same vector space as the original basis. Gram–Schmidt orthogonalization: Let bi, . . . ,bn ∈ Rn be a set of lin- early independent vectors. The vectors ui (1 ≤ i ≤ n) are inductively defined by u1 = b1 ui = bi − i−1∑ j=1 φi,juj , where the Gram–Schmidt coefficients φi,j ∈ R are defined by φi,j = 〈bi,uj〉 〈uj ,uj〉 = 〈bi,uj〉 ‖uj‖2 . In particular, φi,i = 1 and φi,j = 0 for j > i. Notice that ui is the projection of bi on the orthogonal complement of ∑i−1 j=1 Rbj , and that∑i−1 j=1 Rbj = ∑i−1 j=1 Ruj , for 1 ≤ i ≤ n. It follows that u1, . . . ,un is an or- thogonal basis of Rn. With the help of the Gram–Schmidt procedure one can obtain the QR- decomposition of the matrix B = [b1, . . . ,bn], where the vectors bi are the columns of the matrix. Let us denote ek = uk ‖uk‖ for k = 1, 2, ..., n. Since φi,juj = 〈bi,uj〉 ‖uj‖2 uj = 〈bi, ej‖uj‖〉 ‖uj‖2 ej‖uj‖ = 〈bi, ej〉ej , 4 by rearranging the equations above we have: b1 = e1 ‖u1‖ , b2 = 〈b2, e1〉e1 + e2 ‖u2‖ , ... bn = n−1∑ j=1 〈bn, ej〉ej + en ‖un‖ . By writing the equations in matrix form we get the QR-decomposition of the matrix B: B = QR = [ e1 e2 . . . en ]   ‖u1‖ 〈b2, e1〉 〈b3, e1〉 . . . 〈bn, e1〉 0 ‖u2‖ 〈b3, e2〉 . . . 〈bn, e2〉 0 0 ‖u3‖ ... ... ... . . . 〈bn, en−1〉 0 0 . . . 0 ‖un‖   . Any m × n matrix A with linearly independent columns can be factorized into a product A = QR, where the columns of Q are orthonormal and R is upper triangular and invertible [8]. 2.4 Closest Vector Problem The closest vector problem (CVP) is the problem of finding, for a given lattice Λ and a given input point y ∈ Rm, the lattice point that is closest to y. More precisely, to find a vector xˆ ∈ Λ such that ‖y − xˆ‖ ≤ ‖y − x‖, for all x ∈ Λ. For orthogonal lattices, such as the n-dimensional integer lattice Zn, the solution is simple. However, in the general case the lattice is not necessarily orthogonal, but skewed. It has been shown [1] that the general closest vec- tor problem as a function of the dimension m is NP-hard. Thus, all known algorithms for solving the problem optimally have exponential complexity. For example, the maximum-likelihood decoding of space–time codes can be reduced to a closest vector search. In the following chapters this will be ex- plained in more detail. 5 3 Space–Time Block Codes Space-–time coding is a method to improve the reliability of data transmission by using multiple antennas. In this chapter the MIMO (multiple-input multiple- output) channel model and two space–time lattice codes are introduced. In later chapters the properties of these two lattice codes are compared with each other. 3.1 Lattice Codes A lattice codeword is a matrix from a constellation L = { k∑ i=1 aiXi|Xi ∈Mm×l(C), ai ∈ S ⊂ Z } , where Xi, (i = 1, . . . , k), are constant basis matrices and ai, (i = 1, . . . , k), are integer variables from some finite signal set S of size q. The information to be transmitted is coded in the coefficients (a1, . . . , ak) ∈ Sk. Throughout the thesis, the terms “lattice code” and “block code” are used interchangeably to refer to a code used in space–time coding. In this thesis we use the pulse amplitude modulation (PAM) signal set of size q, i.e., S = {a = 2u− q + 1|u ∈ Zq} with Zq = {0, 1, . . . , q − 1}. The size of the signal set is normally some power of 2, i.e. q = 2b and b information bits are mapped to each symbol. A commonly used mapping is the Gray encoding [12], where the adjacent symbols differ from each other by only one binary digit (as illustrated in Figure 1). This mapping is preferred, because the most likely errors caused by noise involve the erroneous selection of an adjacent symbol to the one that was transmitted. In this case a symbol error results in only a single bit error in the b-bit sequence. The following two examples of lattice codes were presented in the article [10]. In later chapters these codes will be used in the simulations and their performance with regard to decoding complexity will be analyzed. The block code LNF : MNF (c1, c2, c3, c4) =   c1 ic4 ic3 ic2 c2 c1 ic4 ic3 c3 c2 c1 ic4 c4 c3 c2 c1   6 (a)q=2 (b) q=4 (c) q=8 0 1 01 11 1000 000 001 011 010 110 111 101 100 Figure 1: Gray encoding for PAM signals and the block code LQNF : MQNF (c1, c2, c3, c4) =   c1 ic2 −c∗3 −c∗4 c2 c1 ic ∗ 4 −c∗3 c3 ic4 c ∗ 1 c ∗ 2 c4 c3 −ic∗2 c∗1   , where ci (1 ≤ i ≤ 4) are Gaussian integers; ci ∈ Z[i] = {a+ bi|a, b ∈ Z}. Both of the above block codes have 8 basis matrices Xi, which are X1 = M(1, 0, 0, 0), X2 = M(i, 0, 0, 0), X3 = M(0, 1, 0, 0), X4 = M(0, i, 0, 0), X5 =M(0, 0, 1, 0),X6 =M(0, 0, i, 0),X7 =M(0, 0, 0, 1) andX8 =M(0, 0, 0, i). For example, the codeword M(c1, c2, c3, c4) is generated by M(c1, c2, c3, c4) = 4∑ i=1 (ℜ(ci)X2i−1 + ℑ(ci)X2i) 3.2 MIMO Channel Consider a multiple antenna system with m transmit and n receive antennas. The codewords X are m × l complex matrices from some constellation L ⊂ Mm×l(C), where l is the length of the code, l ≥ m, X =   x1 x2 ... xm   ∈Mm×l(C). The matrix rows xi correspond to the different transmit antennas and the columns to the different time slots. At each time slot t, signals xi,t (i = 1, . . . ,m) are transmitted simultaneously from the m transmit antennas, as depicted in Figure 2. 7 Figure 2: Space–time coding in a quasi-static Rayleigh fading channel In a general fading channel the receiver obtains the signal Y = HX+N =   h1,1 . . . h1,m ... . . . ... hn,1 . . . hn,m     x1,1 . . . x1,l ... . . . ... xm,1 . . . xm,l  +   η1,1 . . . η1,l ... . . . ... ηn,1 . . . ηn,l   where hi,j denotes the path gain from transmit antenna j to receive antenna i and N = (ηi,j) represents additive white Gaussian noise. It is assumed that the antennas are sufficiently spaced such that the fadings are uncorrelated. Here we use the Rayleigh fading channel model, where the path gains hi,j are modelled as samples of independent and identically distributed (i.i.d.) complex Gaussian random variables with zero mean and variance τ2 per dimension, as it is assumed that signals received at different antennas experience independent fading. The fading channel is assumed to be quasi-static, i.e., the path gains are assumed to remain constant for the interval T of transmitting a single codeword after which they change and again remain constant for the next codeword. The noise components ηi,t at receive antenna i at time t are modelled as sam- ples of i.i.d. complex Gaussian random variables with zero mean and variance σ2 per dimension. The component yi,j of the received matrix corresponds to the j-th signal received by antenna i, which is a superposition of the faded versions of all of the m transmitted signals, with additive noise. For 1 ≤ i ≤ n, 1 ≤ j ≤ m yi,j = hi,1x1,j + hi,2x2,j + . . .+ hi,mxm,j + ηi,j . 8 The signal-to-noise ratio (SNR) is the ratio of the expected energy of the transmitted signal to that of the additive noise per receive antenna. SNR = E[P (HX)] E[P (ηr)] , where E[P (ηr)] = E [ l∑ i=1 ηr,iη ∗ r,i ] = 2lσ2 is the expected energy of the noise affecting a single receive antenna r during the transmission of one codeword, and E[P (HX)] = 2τ2PL is the expected energy of the corresponding transmitted signal, where PL denotes the average energy of a codeword X from the constellation L = {∑ki=1 aiXi|ai ∈ S} PL = 1 |L| ∑ a1∈S · · · ∑ ak∈S P ( k∑ i=1 aiXi ) = 1 |L| ∑ X∈L   m∑ i=1 l∑ j=1 |xi,j |2   . The size of the constellation is |L| = |S|k = qk. Now SNR = 2τ2PL 2lσ2 = PL l τ2 σ2 . 3.3 Decoding The receiver can be assumed to have perfect channel state information, i.e., the channel matrix H is known at the receiver. It is usually measured with so-called pilot signals. Then the set of possible messages is HL = { k∑ i=1 aiHXi } ⊆Mn×l(C). The problem of decoding is, given the received matrix Y and the matrices HX1, . . . ,HXk, to find the coefficients a1, . . . , ak for Xˆ = HX = ∑k i=1 aiHXi such that the metric d(Y − Xˆ)2 = n∑ i=1 l∑ j=1 |yi,j − xˆi,j |2 (2) is minimized. This problem can be transformed into the problem of finding the closest lattice point. In the next chapter an algorithm for finding the closest lattice point is presented. 9 The decision metric (2) is based on the maximum likelihood (ML) principle, in which the conditional probability p(Y|H,X) is maximized. In the follow- ing it is assumed that all the codewords from the constellation have an equal probability of being transmitted. p(Y|H,X) = n∏ i=1 l∏ j=1 p(yi,j |(HX)i,j) = n∏ i=1 l∏ j=1 ( 1√ 2piσ2 exp −(yi,j − (HX)i,j)2 2σ2 ) = ( 1√ 2piσ2 )n×l exp −∑ni=1∑lj=1(yi,j − (HX)i,j)2 2σ2 = ( 1√ 2piσ2 )n×l exp −d(Y −HX)2 2σ2 The probability p(Y|H,X) is the probability of receiving Y after the code- wordX has been sent through the channelH. This probability (over all possible codewords) is maximum for the codeword X′ for which the metric d(Y−HX′)2 of the matrices Y and HX′ is minimized. 3.4 Criteria for Code Construction The main priority for code design in space–time coding is the so-called rank criterion [14]. The rank criterion states that the difference of any two distinct codewords should have full rank. If the rank criterion is satisfied, the code is said to have full diversity. In that case HX1 6= HX2 for all channel matrices H 6= 0 and all codewords X1, X2 where X1 6= X2. Another criterion for code construction is the determinant criterion [14]. Let us denote the difference of two codewords by ∆12 = X1 − X2 and let E12 = ∆12∆ ∗ 12. The determinant criterion sets as a design target that the minimum of the determinant of E12 taken over all pairs of distinct codewords X1 and X2 should be maximized. In the article [14] it was shown that for high SNR the probability of making an error between X1 and X2 is inversely proportional to a power of detE12/σ. Furthermore, the rank criterion maximizes that exponent. The code rate is the number of symbols transmitted in one code block divided by the number of time slots needed to send one block. In this context a symbol refers to an element of a 2-dimensional alphabet S ⊂ C. Usually S = Z[i] or Z[ω]. The code rate is limited by the number of antennas, so that the code rate ≤ min{# of Tx antennas, # of Rx antennas}. 10 A block code is said to have full rate if its rate equals the number of transmit antennas. In that case there must be at least as many receive antennas in the setting, too. Block codes constructed from orthogonal designs have been shown to possess many advantages [13]. They provide full diversity with maximal rate and have simple maximum-likelihood decoding algorithms with linear processing at the receiver. Definition 3.1. A complex orthogonal design of size n is a n × n matrix G whose entries are the indetermined variables ±x1,±x2, ...,±xn, their conjugates ±x∗1,±x∗2, ...,±x∗n and multiples of these by i = √−1 such that G∗G = (|x1|2 + |x2|2 + · · ·+ |xn|2) In. Example 3.1. The so-called Alamouti scheme is an example of a complex orthogonal design. It uses 2 transmit antennas and has rate 1. The transmitted 2× 2 codeword is A(c1, c2) = ( c1 c2 −c∗2 c∗1 ) c1, c2 ∈ C and the receiver obtains the signal y = [y1, y2] = hA+ n. A∗A = ( |c1|2 + |c2|2 0 0 |c1|2 + |c2|2 ) Let A1 = ( 1 0 0 1 ) ,A2 = ( i 0 0 −i ) ,A3 = ( 0 1 −1 0 ) ,A4 = ( 0 i i 0 ) denote the basis matrices. Then for every non-zero h ∈ C2\{0} the vectors hAi and hAj , i 6= j, are mutually orthogonal, since 〈hAi ,hAj〉R =  0 if i 6= j,|h1|2 + |h2|2 if i = j. Due to this fact the Alamouti scheme has a very simple ML-decoding algorithm [2]. However, it has been shown that such full rate complex orthogonal designs do not exist for more than two transmit antennas [13]. Block codes derived from so-called generalized complex orthogonal designs still provide orthogonality and thus have simple ML-decoding, but they have smaller rate. 11 Definition 3.2. A generalized complex orthogonal design of size n is a p × n matrix G whose entries are 0,±x1,±x∗1±x2,±x∗2, ...,±xk,±x∗k or their product with i such that G∗G = κ (|x1|2 + |x2|2 + · · ·+ |xk|2) In, where κ is a constant. G has rate R = k/p. Example 3.2. For four antennas the next example of a generalized complex orthogonal design provides rate 3/4, i.e., only three complex symbols are trans- mitted in four time slots. MOD(c1, c2, c3) =   c1 c2 c3 0 −c∗2 c∗1 0 c3 c∗3 0 −c∗1 c2 0 c∗3 −c∗2 −c1   (c1, c2, c3 ∈ C). It spans a 6-dimensional vector space over R with the basis {MOD(1, 0, 0),MOD(i, 0, 0), . . . ,MOD(0, 0, i)} and it has the property M∗ODMOD = (|c1|2 + |c2|2 + |c3|2)I4. The two block codes LNF and LQNF that were introduced in Section 3.1 do not have the structure of a general orthogonal design. This means that the decoding at the receiver will be more complex using these codes. They do, however, have higher rate as both of them have rate 1. In the next chapters we will show that there exist algorithms that in practical situations provide a reasonable average decoding complexity for these codes. 12 4 Sphere Decoder In general the problem of finding the closest lattice point is prohibitively com- plex. Indeed, certain cryptosystems are based on the difficulty of solving this and the related problem of finding the shortest vector in a lattice of a rank rang- ing in the hundreds. However, in communications applications the rank of the lattice remains moderate and the search is usually constrained among the finite set of points with coordinates within the prescribed signal set. Nevertheless, the number of valid combinations of coordinates is still too large for the simple approach of an exhaustive search. For example, with a lattice code of rank 8 and a signal set of size 8 there would be 88 ≈ 16.8 million different lattice points through which to search. In communications applications the given vector is not arbitrary but rather an unknown lattice point that has been corrupted by an additive noise vector with known statistical properties. Therefore it is meaningful to consider the expected (average) complexity instead of the worst-case complexity. It has been shown that the sphere decoder algorithm, which is described in this chapter, has a polynomial time average complexity in many relevant cases, for a range of system parameters [6]. This finding is also supported by the simulation results in Chapter 5. This chapter is based on the articles [6] and [9]. 4.1 Introduction to Sphere Decoding The basic idea of sphere decoding is to search for the closest lattice point only among points within a certain hypersphere of radius r around the given vector y, instead of an exhaustive search over all lattice points. Clearly the closest lattice point inside the hypersphere is also the closest point in the whole lattice. Figure 3 illustrates this idea behind the sphere decoder in a 2-dimensional lattice. Figure 3: Closest lattice point in a 2-dimensional lattice First of all, it is necessary to know which lattice points lie inside the hyper- 13 sphere. Testing the distances between y and every other lattice point would still result in an exhaustive search and would therefore offer no reduction to the overall complexity of decoding. An efficient way of finding the points is explained in the following. Instead of considering a general m-dimensional hy- persphere, we can start by considering the one-dimensional case where m = 1. A one-dimensional hypersphere is simply an interval and therefore the lattice points inside this hypersphere are the integer values lying in this interval. We can now use this fact to move from dimension k to k+1. Suppose that we have determined all of the lattice points in a k-dimensional hypersphere of radius r. Then for any such k-dimensional point, the set of admissible values for the k+1-th dimensional coordinate that lies in the higher-dimensional hypersphere of the same radius r forms an interval. In other words, we can determine all lattice points in an m-dimensional hypersphere of radius r by successively de- termining all lattice points in hyperspheres of lower dimensions 1, 2, . . . ,m and the same radius r. Such an algorithm for determining the lattice points inside anm-dimensional hypersphere can be represented by a tree where the branches in the k-th level of the tree correspond to the lattice points inside the k-dimensional hypersphere of the same radius. Figure 4 below shows an example of such a tree generated by finding the lattice points inside a 4-dimensional hypersphere. Furthermore, the complexity of such an algorithm will depend on the size of the tree, i.e., on the number of lattice points visited by the algorithm in different dimensions. k=1 k=2 k=3 k=4 Figure 4: The search tree 14 4.2 The Pohst Enumeration This strategy for enumerating all of the lattice points inside a sphere with a cer- tain radius was first proposed by Pohst [11]. This so-called Pohst enumeration can be more specifically outlined as follows [6]. Assume that B is an m× n real matrix with m ≥ n and rank(B) = n. For B ∈ Rm×n and y ∈ Rm, consider the minimization xˆ = arg min x∈Zn |y −Bx|2. The set Λ = {Bx|x ∈ Zn} is an n-dimensional (infinite) lattice in Rm. Let C0 be the squared radius of an m-dimensional sphere S(y, √ C0) centered at y. A lattice point Bx lies inside a hypersphere of radius √ C0 if and only if C0 ≥ |y −Bx|2. Now we wish to produce a list of all of these points of Λ ∩ S(y,√C0). In order to break the problem into the subproblems of finding the points successively in each dimension, it is useful to consider the QR factorization of the matrix B. By performing the Gram–Schmidt orthonormalization of the columns of B, we get B = [Q,Q′] [ R 0 ] where R is an n × n upper triangular matrix with positive diagonal elements, 0 is an (m− n)× n zero matrix, Q (resp. Q′) is an m× n (resp. m× (m− n)) matrix and [Q,Q′] is orthogonal. The condition Bx ∈ S(y,√C0) can be written as |y −Bx|2 ≤ C0∣∣∣∣[Q,Q′]Ty − [ R 0 ] x ∣∣∣∣2 ≤ C0∣∣QTy −Rx∣∣2 ≤ C0 − ∣∣(Q′)Ty)∣∣2 |y′ −Rx|2 ≤ C ′0 where y′ , QTy and C ′0 , C0−|(Q′)Ty)|2. Because R has an upper triangular form, the last inequality implies the set of conditions n∑ j=i ∣∣∣∣y′j − n∑ l=j rj,lxl ∣∣∣∣2 ≤ C ′0, i = 1, . . . , n. (3) By considering the above conditions in the descending order from n to 1 (in the same way as in back-substitution when solving a linear upper triangular system), we obtain the set of suitable values of each symbol xi for fixed values of previous 15 symbols xi+1, . . . , xn. More explicitly, let x n l , (xl, xl+1, . . . , xn) T denote the last n− l+1 components of the vector x. When xni+1 is fixed, the component xi can take on integer values in the range Ii(xni+1) = [Ai(xni+1), Bi(xni+1)] where Ai(x n i+1) = ⌈ 1 ri,i ( y′i − n∑ j=i+1 ri,jxj − √√√√C ′0 − n∑ j=i+1 ∣∣∣∣y′j − n∑ l=j rj,lxl ∣∣∣∣2 )⌉ and Bi(x n i+1) = ⌊ 1 ri,i ( y′i − n∑ j=i+1 ri,jxj + √√√√C ′0 − n∑ j=i+1 ∣∣∣∣y′j − n∑ l=j rj,lxl ∣∣∣∣2 )⌋ . If for some i n∑ j=i+1 ∣∣∣∣y′j − n∑ l=j rj,lxl ∣∣∣∣2 ≥ C ′0 or if Ai(x n i+1) > Bi(x n i+1), then Ii(xni+1) = ∅. In this case, there exists no such value of xi that would satisfy the inequalities (3) and the points which have the same values of xi+1, . . . , xn as x n i+1 do not lie inside the sphere S(y, √ C0). The Pohst enumeration starts from level i = n and proceeds climbing up to levels i = n − 1, n − 2, . . . , 1. At each level i, the variable values chosen earlier at lower levels determine the interval Ii(xni+1) for xi. If I1(xn2 ) is nonempty, the vectors x = (x1, (x n 2 ) T )T , for all x1 ∈ I1(xn2 ), correspond to lattice points Bx ∈ S(y,√C0). The squared Euclidean distances between such points and y are given by d2(y,Bx) = n∑ j=i ∣∣∣∣y′j − n∑ l=j rj,lxl ∣∣∣∣2. The output of the algorithm is the point xˆ for which this distance is minimum. If no point is found inside the sphere it is declared empty and the search fails. In this case the squared radius C0 must be increased and the search is performed again. 4.3 The Schnorr–Euchner Enumeration and the Babai Point In the article [1] Agrell et al. proposed the use of Schnorr–Euchner refinement of the Pohst enumeration in the closest lattice point search. They concluded, based on numerical results, that their sphere decoding algorithm with Schnorr– Euchner enumeration is more efficient than the Viterbo–Boutros implementation that uses Pohst enumeration. The Pohst enumeration is based on the so-called natural spanning of the intervals Ii(xni+1) at each level i. In other words, xi takes on values in the as- 16 cending order Ai(x n i+1), Ai(x n i+1) + 1, . . . , Bi(x n i+1). The Schnorr–Euchner enu- meration is a variation of the Pohst strategy where the intervals are spanned in a zig-zag order, starting from the midpoint Si(x n i+1) = ⌊ 1 ri,i ( y′i − n∑ j=i+1 ri,jxj )⌉ . (4) Hence, the sequence of values produced by the Schnorr–Euchner enumeration at each level i is xi ∈ {Si(xni+1), Si(xni+1)+1, Si(xni+1)−1, Si(xni+1)+2, Si(xni+1)−2, . . .}∩Ii(xni+1) if y′i − n∑ j=i+1 ri,jxj − ri,iSi(xni+1) ≥ 0 or the sequence of values xi ∈ {Si(xni+1), Si(xni+1)−1, Si(xni+1)+1, Si(xni+1)−2, Si(xni+1)+2, . . .}∩Ii(xni+1) if y′i − n∑ j=i+1 ri,jxj − ri,iSi(xni+1) < 0. Similar to Pohst enumeration, when a given value of xi results in a point segment xni outside the sphere, the next value of xi+1 (at level i+ 1) is produced. Every time a vector x′ ∈ Zn is found such that Bx′ ∈ S(y,√C0), the squared radius of the sphere can be dynamically updated to d2(y,Bx′), since this will obviously be an upper bound for the distance of the closest point. Note that with the Schnorr–Euchner enumeration one can set the squared radius to C0 = ∞, in which case the event of declaring an empty sphere never occurs. The first point found in this case is by definition the Babai point [1]. The Babai point is not necessarily the closest point, but the error can be bounded, i.e., it is a nearby point. This point is also known as the nulling and cancelling or zero-forcing decision-feedback equalization (ZF-DFE) point. Explicitly, it is given by xzf−dfei = Si(x zf−dfe i+1 , . . . , x zf−dfe n ) = ⌊ 1 ri,i ( y′i − n∑ j=i+1 ri,jx zf−dfe j )⌉ for i = m,m − 1, . . . , 1. The procedure begins by first finding the midpoint of the interval In and setting xzf−dfen at it. Its effect is then cancelled out by back-substitution, which is enabled by the upper triangular form of R. The same process is then repeated for xzf−dfen−1 and so on. 17 4.4 The Algorithm The following algorithm was proposed by Damen et al. [6] and was shown to be very efficient in terms of receiver complexity when compared to other known sphere decoding algorithms. It is a modification of the Schnorr–Euchner enumeration in order to take into account a finite signal set boundary. Here the lattice is not infinite, but x takes on values from the set {0, 1, . . . , Q − 1}, i.e., Λ = {Bx|x ∈ ZnQ}. Algorithm II, Smart Implementation (Input C ′0, y ′, R, Output xˆ): Step 1 (Initialization) Set i := n, Tn := 0, ξn := 0, and dc := C ′ 0 (current sphere squared radius). Step 2 (DFE on xi) Set xi := ⌊(y′i − ξi)/ri,i⌉ and ∆i := sign(y′i − ξi − ri,ixi). Step 3 (Main step) If dc < Ti + |y′i − ξi − ri,ixi|2, then go to Step 4 (i.e., we are outside the sphere). Else if xi /∈ [0, Q−1] go to Step 6 (i.e., we are inside the sphere but outside the signal set boundaries). Else (i.e., we are inside the sphere and signal set boundaries) if i > 1, then {let ξi−1 := ∑n j=1 ri−1,jxj , Ti−1 := Ti + |y′i − ξi − ri,ixi|2, i := i− 1, and go to Step 2}. Else (i = 1) go to Step 5. Step 4 If i = n, terminate, else set i := i+ 1 and go to Step 6. Step 5 (A valid point is found) Let dc := T1 + |y′1 − ξ1 − r1,1x1|2, save xˆ := x. Then, let i := i+ 1 and go to Step 6. Step 6 (Schnorr–Euchner enumeration of level i) Let xi := xi + ∆i, ∆i := −∆i − sign(∆i), and go to Step 3.  Here Ti−1 corresponds to the squared distance between the points (xi, . . . , xn) and (yi, . . . , yn). Note that only the coordinates i, . . . , n of the lattice points are considered. The variable ξi, for i = n, . . . , 1, is the decision feedback of a ZF-DFE when the decisions on the symbols from i + 1 to n are the current values of (xi+1, . . . , xn). The variable ∆i holds the information of which value of xi follows next according to the Schnorr–Euchner enumeration. Every time a new value for xi is chosen, it corresponds to a new node in the tree representation of the algorithm – see the Example 4.1 and Figure 6 below. The first step is initialization and we start from the root of the tree by 18 choosing a value for the coordinate xn in step 2. We come to step 2 also every time we move to a lower level i in the tree. Here the midpoint of the interval is chosen as the first value of xi, i.e., we choose the value of xi with which the distance from y is the smallest when the previous values of xj , j = i+ 1, . . . , n are known. In step 3 we calculate the distance of the point from y and compare it to the radius of the sphere. If we are outside the sphere, we continue to step 4. In this case we go up one level in the tree to the next value of xi+1 which is determined by ∆i+1. In case we are at the upper most level of the tree, the search is terminated. If in step 3 we are inside the sphere but outside the signal set boundaries, we go to step 6 and move to the next value of xi staying at the same level in the tree. If we are both inside the sphere and inside the signal set boundaries, a suitable value for xi was found and we move down one level in the tree and go to step 2 again. We come to step 5 if a lattice point is found inside the sphere. The radius is then updated and we continue by moving up one level to the node determined by ∆i+1. Step 6 takes care of the zig-zag order of the values of xi determined by the Schnorr–Euchner enumeration by adjusting the variable ∆i. Figure 5 on the next page presents a flow chart of Algorithm II. Let us next consider a numerical example: Example 4.1. Let x = (2, 7, 3, 2)T be the message to be transmitted (from the signal set xi ∈ Z8) and B =   3.7 1.6 −0.6 20.2 1.5 5.2 5.1 13.5 6.7 9.4 8.6 −21.9 −4.3 −20.7 −23.1 −10.7   and N =   −0.8 0.1 1.2 −0.9   be the channel and noise matrices. This corresponds to the case of 4 transmit and 4 receive antennas and block length l = 1. For simplicity we use real numbers in this example instead of complex numbers. The receiver obtains the signal y = Bx+N =   56.4 81.8 62.4 −245.1   . The problem is now to determine the transmitted signal x, when y and B are known. This corresponds to the problem of finding the closest lattice point to the point y in the lattice {Bx|x ∈ Z48} . For this purpose, the QR-decomposition 19 Initialization: i= n T = 0i xi = 0 d = CC Set xi at the midpoint of the interval and calculate : x = (y - )/r = sign(y - - r x ) D ù D x i i i i i,i i i i i,i i ë x Is the distance of (x ,...,x ) from (y ,...,y ) greater than the radius: d < T + |y - - r x | i n i n C i i i i,i ix 2 Is i = n? Yes Is x in [0,Q-1]?i No Is i > 1? Yes The point is outside the sphere, return to the previous level: i = i + 1 Select the next value for x : = + i x x = - -sign( ) i i i i i i D D D D No Yes A valid coordinate was found, continue to the next level: = r x , where j=1,...,n T = T + |y - i - r x | i = i -1 x S x i-1 i-1,j j i-1 i i i,i i 2 Yes A valid point was found, save x' = x, update the radius and move back to the previous level: d < T + |y - - r x | i = i + 1 C 1 1 1 1,1 1x 2 No Output x' if a valid point was found Input: vector upper diagonal matrix R signal set size Q squared radius C y Figure 5: Flow chart of Algorithm II 20 of B is first computed: B = QR =   0.42 −0.43 −0.68 0.43 0.17 0.15 −0.51 −0.83 0.75 −0.32 0.53 −0.23 −0.48 −0.83 0.06 −0.28     8.91 18.61 18.23 −0.64 0 14.15 17.34 9.19 0 0 0.98 −32.72 0 0 0 5.38   . The sphere decoding algorithm takes the upper triangular matrix R and the vector y′ = QTy   202.5 170.2 −61.4 10.3   as input. The squared radius of the sphere is set to be C0 = 15. It then proceeds as depicted in Figure 6 below. The nodes in the tree correspond to the different values of xi that the algorithm goes through when searching for the closest lattice point, and the dashed arrows show how the sphere decoder traverses through the tree. Figure 6: The search tree for the numerical example The first value for x4 is computed by x4 = ⌊y′4/r4,4⌉ = ⌊10.3/5.38⌉ = ⌊1.91⌉ = 2. This point is inside the sphere, which can be verified using the condition (3), and the algorithm moves to the next level. The same happens at the second and third levels after which a lattice point is found inside the sphere with coordinates x = (2, 6, 4, 2)T . This point is also referred to as the Babai point. Now the radius of the sphere is replaced by the distance between this 21 point and y′ in the lattice, since the closest point cannot be further away from y′ than the one that was already found. The new value of the squared radius C0 is 8.25. Next, the algorithm jumps to the previous level with x2 = 5. However, all of the points x = (x1, 5, 4, 2) T are outside the sphere and the algorithm moves up again to x3 = 5. This is inside, but at the next level the points x = (x1, 5, 5, 2) T are again outside the sphere. The next value for x3 is 3 and the algorithm proceeds all the way to the level x1, where another lattice point is found at x = (2, 7, 3, 2)T . The radius of the sphere is again updated to 2.9, and the algorithm moves up in the tree. All of the following points at each level are then outside the sphere and the algorithm moves up and finally stops. The point which was found to be the closest one to y′ is xˆ = (2, 7, 3, 2)T , which is exactly the transmitted point. 22 5 Complexity of Sphere Decoding In e.g. [1], [6], [9], [10], [13] and [14] several factors that may affect the complex- ity of the sphere decoder have been suggested. In this and the next chapter the complexity of sphere decoding is studied using results from computer simula- tions. The complexity comparisons are necessarily statistical in nature. In most cases the average complexity is relevant (where the complexity is averaged over several channel realizations and several choices of a transmitted lattice point). In some cases the high complexity tail is more informative. In the simulations the two block codes from Section 3.1 are used as code- books. The number of nodes in the search tree is used as a measure of complex- ity, so that the implementation details or the environment where the simulation runs do not affect it. 5.1 System Model In this section we describe the system model that was used in the simulations. The system takes the SNR, the numbers of transmit and receive antennasm and n, the initial squared radius C0 for the sphere decoder, and the size of the signal set q as input parameters. The basis matricesX1, ...,Xk, whereXi ∈Mm×l(C), of the code to be used for transmitting are read from a file that is also given as an input parameter. The transmission is simulated by first generating a random message a = (a1, ..., ak) to be sent, where the components are from the PAM signal set ai = {2ci − q + 1|ci ∈ Zq}. Then the channel matrix H ∈ Mn×m(C) and the noise matrix N ∈ Mn×l(C) are generated. The elements ηi,j of N are i.i.d. complex Gaussian random variables with zero mean and variance σ2 = 1 per dimension. The elements hi,j of H are also i.i.d. complex Gaussian random variables with zero mean and variance τ2 = SNR lPL per dimension, where PL is the average energy of a codeword from the used constellation. The received matrix Y ∈ Mn×l(C) is computed by Y = H k∑ i=1 aiXi +N. Next the system simulates the receiver by finding the closest point to the received one using knowledge of the channel state. A translation and scaling is applied to the received matrix so that instead of the coefficients ai we have the coefficients ci ∈ Zq in order to use the sphere decoder. The matrix Y is then turned into a real valued vector y′ ∈ R2ln by converting each row yi to real valued vector y′i and then combining these together: y′i = (ℜ(yi,1),ℑ(yi,1),ℜ(yi,2),ℑ(yi,2), . . . ,ℜ(yi,l),ℑ(yi,l)) for i = 1, ..., n 23 and y′ = [ y′1 y ′ 2 · · · y′n ] . The matrices HX1, ...,HXk ∈ Mn×l(C) are computed and turned into vectors bi ∈ R2ln in the same way as the received matrix. Then the QR- decomposition is computed for the matrix B = [ bT1 b T 2 · · · bTk ] and R, y′ and the initial squared radius C0 are given to the sphere decoding algorithm as input. The sphere decoder then returns the closest point x, from which the transmitted point a can be obtained by computing ai = 2xi−q+1 for i = 1, ..., k. The system was implemented in C++ and the simulations were run on a Windows XP Professional environment with a 1.83 GHz dual-core processor and 2048MB RAM. In each simulation at least 1 000 000 channel realizations were created, unless otherwise stated. In the cases where the performances of different algorithms were being compared, the same channel realizations were used for both algorithms. 5.2 Choosing the Radius The selection of the initial radius affects the performance of the sphere decoding algorithm. Too small an initial radius causes no lattice points to be found inside the sphere and the sphere decoder must start the search again with an increased radius. On the other hand, too large a radius may result in an unnecessarily large search tree. Algorithm II is in general not very sensitive to a large initial radius. This is because in most cases it finds the Babai point right away and updates the radius accordingly so that the large initial radius is not a problem. However, with finite lattices the Babai point may be outside the constellation, in which case the algorithm does not find it and may have to search for a long time before finding a valid point inside the sphere. Figures 7(a) and 7(b) show the average complexity of Algorithm II as a function of C0 in systems using the lattice codes LQNF and LNF . We observe that with LQNF after a certain point a larger initial radius does not have much of an effect on the complexity. A system using the code LNF on the contrary is much more sensitive to the selection of the initial radius, especially in the low SNR range. 24 2 4 6 8 10 12 14 16 15 20 25 30 35 40 45 Algorithm II with LQNF, q=8, n=1 Initial squared radius C0 Av er ag e co m pl ex ity (# vi sit ed po int s) SNR=19 dB SNR=25 dB (a) The code LQNF is used for transmission 2 4 6 8 10 12 14 16 15 20 25 30 35 40 45 Algorithm II with LNF, q=8, n=1 Initial squared radius C0 Av er ag e co m pl ex ity (# vi sit ed po int s) SNR=25 dB SNR=28 dB SNR=31 dB (b) The code LNF is used for transmission Figure 7: The average complexity of Algorithm II as a function of the initial squared radius C0 in a system with one receive antenna and q = 8. 25 5.3 SNR and Complexity A high SNR typically reduces the complexity of decoding. When the SNR in- creases, the influence of the noise gets smaller and the received point is expected to be nearer to a lattice point. It also means that the Babai point is more often the closest point. As the Babai point is the first lattice point that the sphere decoder finds, the algorithm finds the solution very quickly. Figure 8 shows the average complexity plotted against SNR for the code LNF . It can be seen that as the SNR increases, the average complexity tends to 15. With an 8-level search tree, 15 is the minimum number points the algorithm needs to visit. It means that the sphere decoder goes down the search tree, finds a lattice point and then comes straight up without any zigzagging. We also notice, that the sphere decoder performs better with the lattice code LQNF than with LNF . We study the reasons for this in Chapter 6. 22 24 26 28 30 32 34 10 15 20 25 30 35 40 45 50 55 60 Algorithm II, q=8, n=1, C0=6 SNR (dB) Av er ag e co m pl ex ity (# vi sit ed po int s) LQNF LNF Figure 8: The average complexity of Algorithm II as a function of the SNR, using the codes LQNF and LNF The complexity depends also on the data rate, i.e., the choice of q. A greater size of the signal set means that the codebook is larger and also the search tree 26 will be larger, since on each level of the tree there will be more valid options for the coefficients. This effect is visible in Figure 9, which presents the average complexity against SNR when different data rates are used. 5 10 15 20 25 10 15 20 25 30 35 40 Algorithm II with LQNF, q=8, n=1, C0=6 SNR (dB) Av er ag e co m pl ex ity (# vi sit ed po int s) q=8 q=4 q=2 Figure 9: The average complexity of Algorithm II as a function of the SNR, using the code LQNF and different data rates in the transmission 5.4 Preprocessing and Ordering Preprocessing and the ordering in which the components of x are considered has a significant effect on the performance of the sphere decoder. The standard preprocessing and ordering consists of the QR-decomposition and the natural back-substitution ordering xk, xk−1, . . . , x1. But this is not necessarily the op- timal ordering. One example of a preprocessing approach is the vertical Bell Labs layered space–time (V-BLAST) optimal decision ordering [7]. Its purpose is to maximize the minimum value of the R-matrix’s diagonal elements, since a large value of ri,i gives a short search interval on the corresponding level of the search tree. Ordering the diagonal elements so that rk,k is the largest and r1,1 the smallest 27 value also reduces the effect of error propagation. Since the sphere decoder starts from the level k, selecting wrong values in the beginning can have an adverse effect on the complexity. The lattice code LQNF does not need this kind of preprocessing, because as we will see in Section 7.2, the default ordering gives its R-matrix an advanta- geous structure, which allows for the use of a faster, modified algorithm. 5.5 Error Probability Since the sphere decoder finds the exact maximum-likelihood solution, the error rates show exactly how often the closest lattice point detected at the receiver has not been the point that was sent. The block error rate (BLER) and the bit error rate (BER) indicate how often there have been symbol errors and correspondingly bit errors in the detected message. In the simulations Gray encoding has been used for the bit encoding. The error rates naturally decrease as the SNR increases and the effect of noise becomes smaller. Figures 10(a) and 10(b) display a comparison of the block and bit error rates for the codes LQNF and LNF with different values of SNR. It can be seen that the code LQNF performs better than the code LNF in the sense of error probability. Other effects are more subtle. Some lattices are more adversely affected by certain anomalous channel realizations than others. In such cases the lattice is severely skewed and high complexity comes jointly with high error probability. This scenario will be explained in more detail in Chapter 6. 28 19 20 21 22 23 24 25 26 27 28 10−3 10−2 10−1 100 Block error rate, q=8, n=1 SNR (dB) BL ER LNF LQNF (a) Block error rates for LQNF and LNF 19 20 21 22 23 24 25 26 27 28 10−4 10−3 10−2 10−1 Bit error rate, q=8, n=1 SNR (dB) BE R LNF LQNF (b) Bit error rates for LQNF and LNF Figure 10: Error rates for LQNF and LNF 29 6 Collapsing Lattices In this section only the multiple-input single-output (MISO) channel setting with a single receive antenna is considered. In the MISO case the channel matrix H is a complex vector, which we here denote by h. 6.1 The Defect Several different 8-dimensional lattice codes of 4×4 complex matrices have been designed, but they all have the property that for some non-zero channel vector h the dimension of the lattice hL is smaller than the dimension of L. In the following we call this event the collapsing of the lattice. The closely related notion of a lattice’s defect was presented in the article [10]: Definition 6.1. Matrix lattice L has defect r, if its rank is m but the minimum positive real dimension of the span of hL is m − r. In other words, the lattice collapses by dimension r. A lattice hL collapses if and only if there exists a vector h ∈ Cn\{0} such that the set {hX1, . . . ,hXk} is linearly dependent over R, where X1, . . . ,Xk are the basis matrices of L. Let us consider next the lattice code LNF that was introduced in Section 3.1. The code matrices are of the form MNF (c1, c2, c3, c4) =   c1 ic4 ic3 ic2 c2 c1 ic4 ic3 c3 c2 c1 ic4 c4 c3 c2 c1   c1, c2, c3, c4 ∈ Z[i]. Let the basis matrices X1, . . . ,X8 be as defined in Section 3.1. Now LNF = ZX1 ⊕ ZX2 ⊕ . . .⊕ ZX8 and it has dimension 8. Let us define hj = (1, ζ j , ζ2j , ζ3j) for j ∈ {1, 5, 9, 13}, where ζ = e2pii/16. The vectors hj are the eigenvectors of all of the matrices of the lattice LNF , as shown below. Note that ζ4 = i and ζ4j = ij = i. 30 hjMNF (c1, c2, c3, c4) = (1, ζ j , ζ2j , ζ3j)   c1 ic4 ic3 ic2 c2 c1 ic4 ic3 c3 c2 c1 ic4 c4 c3 c2 c1   = (c1 + ζ jc2 + ζ 2jc3 + ζ 3jc4︸ ︷︷ ︸ =λj , . . . , ic2 + ζ jic3 + ζ 2jic4 + ζ 3jc1) = (λj , . . . , ζ 4jc2 + ζ 5jc3 + ζ 6jc4 + ζ 3jc1) = (λj , . . . , ζ 3j(c1 + ζ jc2 + ζ 2jc3 + ζ 3jc4)) = (λj , ζ jλj , ζ 2jλj , ζ 3jλj) = λj (1, ζ j , ζ2j , ζ3j) = λjhj The vectors {h1, h5, h9, h13} form a basis of C4. The channel vector h can be written in terms of the basis vectors: h = α1h1+α5h5+α9h9+α13h13, where αj ∈ C. Now, suppose that the channel vector h is a scalar multiple of only one of the hj . Then only two of the vectors hX1, . . . ,hX8 are linearly independent over R, and the lattice hLNF has real dimension 2. Thus, the lattice LNF has defect 6. Theorem 6.1. With the lattice code LNF the image lattice hLNF collapses ⇔ αj = 0 for some j ∈ {1, 5, 9, 13}. Proof. The basis vectors of the lattice hLNF are hXk = ∑ j αjhjXk = ∑ j αjλ(Xk, j)hj where λ(Xk, j) is the scalar in hjXk = λ(Xk, j)hj . If αj = 0 for some j, say α9 = 0, then ∀k = 1, . . . , 8 hXk = β1h1 + β5h5 + β13h13 ∈ Ch1 + Ch5 + Ch13 , V9ˆ where dimCV9ˆ= 3 and dimRV9ˆ= 6. Conversely, assume that αj 6= 0 ∀j ∈ {1, 5, 9, 13}. Since hX2k = ihX2k−1 for k = 1, . . . , 4, it suffices to check that the vectors hX2k−1, k = 1, . . . , 4 are linearly independent. For them λ(X2k−1, j) = ζ(k−1)j . hX2k−1 = ∑ j∈{1,5,9,13} αjζ (k−1)jhj , k = 1, . . . , 4. 31 The determinant∣∣∣∣∣∣∣∣∣∣ α1 α1ζ α1ζ 2 α1ζ 3 α5 α5ζ 5 α5ζ 10 α5ζ 15 α9 α9ζ 9 α9ζ 18 α9ζ 27 α13 α13ζ 13 α13ζ 26 α13ζ 39 ∣∣∣∣∣∣∣∣∣∣ = α1α5α9α13|VM | 6= 0 where |VM | is the determinant of a Van der Monde matrix, so the vectors hX2k−1, k = 1, . . . , 4, are linearly independent. Let us consider now the code LQNF , which was also introduced in Section 3.1. The code matrices are of the form MQNF (c1, c2, c3, c4) =   c1 ic2 −c∗3 −c∗4 c2 c1 ic ∗ 4 −c∗3 c3 ic4 c ∗ 1 c ∗ 2 c4 c3 −ic∗2 c∗1   c1, c2, c3, c4 ∈ Z[i]. The channel vector h can be written in terms of vectors vi, (i = 1, . . . , 4), which form a basis of C4: h = h+ + h− = h1v1 + h2v2︸ ︷︷ ︸ h+ +h3v3 + h4v4︸ ︷︷ ︸ h− where v1 = (1, ξ, 0, 0) , v2 = (0, 0, 1, ξ), v3 = (1,−ξ, 0, 0), v4 = (0, 0, 1,−ξ), hj ∈ C (j = 1, . . . 4) and ξ = e2pii/8. We denote by V+ = LC(v1, v2) the vector space generated by v1 and v2 and by V− = LC(v3, v4) the vector space generated by v3 and v4. Now v1MQNF (c1, c2, c3, c4) = (c1 + ξc2)v1 − (c3 + ξc4)∗v2, v2MQNF (c1, c2, c3, c4) = (c3 + ξc4)v1 + (c1 + ξc2) ∗v2, v3MQNF (c1, c2, c3, c4) = (c1 − ξc2)v3 − (c3 − ξc4)∗v4 and v4MQNF (c1, c2, c3, c4) = (c3 − ξc4)v3 + (c1 − ξc2)∗v4. Theorem 6.2. With the lattice code LQNF the image lattice hLQNF collapses ⇔ h+ = 0 or h− = 0 Proof. Consider the vector space V = {hMQNF (z1, z2, z3, z4) | zi ∈ C}. First we show that if h+ 6= 0 and h− 6= 0 then dimR V = 8. 32 Now if z1 = ξz2 and z3 = ξz4 then v3MQNF (z1, z2, z3, z4) = 0 and v4MQNF (z1, z2, z3, z4) = 0, and the dimension of V depends only on h+ and not on h−. Notice that ξ = e2pii/8 = 1+i√ 2 , ξ2 = i and ξ∗ = −iξ. i) If z1 = 1 2 , z2 = 1 2ξ ∗, z3 = z4 = 0 then v1MQNF = v1 and v2MQNF = v2 ii) If z1 = i 2 , z2 = 1 2ξ, z3 = z4 = 0 then v1MQNF = iv1 and v2MQNF = −iv2 iii) If z1 = z2 = 0, z3 = − 12 , z4 = − 12ξ∗ then v1MQNF = v2 and v2MQNF = −v1 iv) If z1 = z2 = 0, z3 = i 2 , z4 = 1 2ξ then v1MQNF = iv2 and v2MQNF = iv1 From these we see that the four h+MQNF vectors below are all in V . i) ⇒ h+MQNF = h1v1 + h2v2 ∈ V ii) ⇒ h+MQNF = ih1v1 − ih2v2 ∈ V iii) ⇒ h+MQNF = −h2v1 + h1v2 ∈ V iv) ⇒ h+MQNF = ih2v1 + ih1v2 ∈ V All of these belong to V+ and by showing that these four vectors are linearly independent over R we prove that the whole V+ is included in V . Let h1 = x1+ y1i and h2 = x2+ y2i where xj , yj ∈ R for j = 1, 2. In terms of the R-basis {v1, iv1,v2, iv2} of V+, these vectors get the form (x1, y1, x2, y2) ∈ V (−y1, x1, y2,−x2) ∈ V (−x2,−y2, x1, y1) ∈ V (−y2, x2,−y1, x1) ∈ V By looking at the determinant of the matrix A =   x1 y1 x2 y2 −y1 x1 y2 −x2 −x2 −y2 x1 y1 −y2 x2 −y1 x1   we see that if h1 6= 0 or h2 6= 0 the vectors are linearly independent. Since AAT = (x21 + y 2 1 + x 2 2 + y 2 2)I4 it follows that detA 6= 0 if h1 6= 0 or h2 6= 0. Hence, if h+ 6= 0 then V+ ⊆ V . 33 Similarly by considering the code matrices for which z1 = −ξz2 and z3 = −ξz4 we see that if h− 6= 0 then V− ⊆ V . Now V+ ⊕ V− ⊆ V ⊆ LC(v1,v2,v3,v4). Since V+ ⊕ V− = LC(v1,v2,v3,v4) it follows that V = LC(v1,v2,v3,v4) and hence dimR V = 8 if h+ 6= 0 and h− 6= 0. Conversely, if h+ = 0 or h− = 0 then the resulting vector space hMQNF only has terms of v1 and v2 or terms of v3 and v4 and thus has real dimension 4. This means that LQNF has defect 4. 6.2 Collapsing Lattices and the Decoding Complexity If the vectors {hX1, . . . ,hXk} are orthogonal then the closest lattice point is the Babai point, which the sphere decoder finds right away without any zigzagging. It is natural to assume that the decoding complexity would be significantly higher if the lattice collapses or comes near to collapsing. In this section we study the effect of the collapsing lattice on the decoding performance using computer simulations. When the lattice collapses or is near collapsing, it causes the diagonal el- ements of R to be very small. This can be verified from Figure 11(a), which shows the correlation between the minimum diagonal element of R and the length of the shortest component αjhj of h for the lattice code LNF . Figure 13(a) shows the same for the lattice code LQNF , where h has components h+ and h−. A small diagonal element of R means that the corresponding interval for the sphere decoder on that level is large, which makes the search tree also large. It can be seen from Figures 11(b) and 13(b), that a small diagonal element very often corresponds to a high complexity. Figures 12(a) and 12(b) show the complexity distributions of 10,000 trans- missions for the lattice code LNF with different SNRs. The horizontal axis indicates how close to collapsing the image lattice hL has been. Figures 14(a) and 14(b) show the same for the lattice code LQNF . For both LNF and LQNF , the figures show that the smaller the minimum component of h, the higher the complexity. We also notice that the lattice code LNF comes near to collapsing more often than LQNF . This can be explained by the higher defect of LNF . 34 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 LNF, SNR=24dB, q=8 min(|αjhj|2) m in (r i, i) (a) Minimum diagonal element of R vs. minimum |αjhj | 2 0 1 2 3 4 5 6 7 0 500 1000 1500 2000 2500 3000 3500 4000 LNF: complexity vs. defect, SNR=24dB, q=8 min(ri,i) Co m pl ex ity (# vi sit ed po int s) (b) Complexity vs. minimum diagonal element of R Figure 11: Distributions of 10,000 transmissions using the code LNF 35 0 1 2 3 4 5 6 7 0 500 1000 1500 2000 2500 3000 3500 4000 LNF: complexity vs. defect, SNR=24dB, q=8 min(|αj hj|2) Co m pl ex ity (# vi sit ed po int s) (a) Complexity vs. defect with LNF and an SNR of 24dB 0 5 10 15 20 25 30 35 0 200 400 600 800 1000 1200 1400 LNF: complexity vs. defect, SNR=30dB, q=8 min(|αjhj|2) Co m pl ex ity (# vi sit ed po int s) (b) Complexity vs. defect with LNF and an SNR of 30dB Figure 12: Complexity distributions of 10,000 transmissions using the code LNF and different SNRs 36 0 1 2 3 4 5 6 7 0 0.5 1 1.5 2 2.5 3 3.5 4 LQNF, SNR=19dB, q=8 min(|H+|2,|H−|2) m in (r i, i) (a) Minimum diagonal element of R vs. min{|h+|2,h−|2} with LQNF and an SNR of 19dB 0 0.5 1 1.5 2 2.5 3 3.5 4 0 200 400 600 800 1000 1200 LQNF: complexity vs. min(ri,i), SNR=19dB, q=8 min(ri,i) Co m pl ex ity (# vi sit ed po int s) (b) Complexity vs. minimum diagonal element of R with LQNF and an SNR of 19dB Figure 13: Distributions of 10,000 transmissions using the code LQNF 37 0 1 2 3 4 5 6 7 0 200 400 600 800 1000 1200 LQNF: complexity vs. defect, SNR=19dB, q=8 min(|H+|2,|H−|2) Co m pl ex ity (# vi sit ed po int s) (a) Complexity vs. defect with LQNF and an SNR of 19dB 0 5 10 15 20 25 0 50 100 150 200 250 LQNF: complexity vs. defect, SNR=25dB, q=8 min(|H+|2,|H−|2) Co m pl ex ity (# vi sit ed po int s) (b) Complexity vs. defect with LQNF and an SNR of 25dB Figure 14: Complexity distributions of 10,000 transmissions using the code LQNF and different SNRs 38 7 Improving the Performance of Sphere Decod- ing In this chapter we look for ways to reduce the complexity of sphere decoding by two methods: first by modifying the algorithm and second, by taking advantage of the properties of the lattice. 7.1 Adding Constraints The fact that we are using finite code sets gives us an opportunity to improve the algorithm. The idea is to avoid going through points that are outside the signal set. The original algorithm picks the midpoint of the interval on each level as the starting point and zigzags its way from there until it finds a valid point or is outside the sphere. If the midpoint is far below or above the signal set [0, Q− 1], the algorithm may have to go through a lot of points before reaching the signal set values. As we know that any points outside the signal set cannot be a part of the solution, we can improve the algorithm by jumping straight to the first possible value on each level. If the midpoint of the interval is smaller than 0, we can pick 0 as the starting point, as it is the first value belonging to the signal set that the algorithm would come across. Correspondingly, if the midpoint is greater than Q− 1, we can pick Q− 1 as the starting point. Another case where Algorithm II may go through unnecessarily many points in the search tree, is when it has already processed all of the values that belong to the signal set on that level, but still has not gone outside the radius. After that point all of the following values that are considered cannot belong to the solution, since they are outside the signal set. We can now add another check to the algorithm: if all of the values from 0 to Q − 1 have been processed, we can finish on that level and move up a level. Example 7.1. Let us consider only the first level of the search tree and assume that the corresponding diagonal element of the R matrix rn,n = 0.1, the received point y′n = 1.3, signal set size Q = 8 and the squared radius C ′ 0 = 5. The interval for Algorithm II on this level has a lower boundary An = ⌈(y′n − √ C ′0)/rn,n⌉ = −9 and an upper boundary Bn = ⌊(y′n + √ C ′0)/rn,n⌋ = 35. Algorithm II starts zigzagging from the middle of the interval xn := ⌊y′n/rn,n⌉ = 13 and continues until it has gone through all of the values in the interval. Since the signal set is [0, 7] the algorithm goes through points 13, 14, 12, 15, 11, 16, 10, 17, 9, 18, 8, 19 before reaching 7 as the first value belonging to the signal set. By adding the first check described above to the algorithm we could leave out the points in between and jump straight from 13 to 7 thus saving time. After 7 the next values assigned to xn are 20, 6, 21, 5, 22, 4, 23, 3, 24, 2, 25, 1, 26, 0 and 27. 39 At xn := 27, all of the values of the signal set have been processed, but xn is still not outside the radius. The first value outside the radius is 36. By adding the second check to the algorithm we could now stop processing on this level without going through the rest of the points in the interval. The amount of these unnecessary points in the search tree depends greatly on the R matrix and its diagonal elements. The smaller the diagonal element, the longer the interval that Algorithm II has to go through on that level. In practice the first modification is added to step 2 of the algorithm, right after the part where the value for xi is calculated. The second modification is added to step 3 after checking whether xi belongs to the signal set. There is also a flow chart of the modified algorithm on page 41. Modified algorithm (Input C ′0, y ′, R, Output xˆ): Step 1 (Initialization) Set i := n, Tn := 0, ξn := 0, and dc := C ′ 0 (current sphere squared radius). Step 2 (DFE on xi) Calculate xtemp := ⌊(y′i − ξi)/ri,i⌉. If xtemp < 0 set xi := 0 and ∆i := 1. Else if xtemp > Q− 1 set xi := Q− 1 and ∆i := −1. Else set xi := xtemp and ∆i := sign(y ′ i − ξi − ri,ixi). Step 3 (Main step) If dc < Ti + |y′i − ξi − ri,ixi|2, then go to Step 4 (i.e., we are outside the sphere). Else if xi /∈ [0, Q− 1] (i.e., we are inside the sphere but outside the signal set boundaries) then {Set xnext := xi+∆i. If (xi < 0 and xnext > Q− 1) or (xi > Q− 1 and xnext < 0) then go to Step 4, otherwise go to Step 6 }. Else (i.e., we are inside the sphere and signal set boundaries) if i > 1, then {let ξi−1 := ∑n j=1 ri−1,jxj , Ti−1 := Ti + |y′i − ξi − ri,ixi|2, i := i− 1, and go to Step 2}. Else (i = 1) go to Step 5. Step 4 If i = n, terminate, else set i := i+ 1 and go to Step 6. Step 5 (A valid point is found) Let dc := T1 + |y′1 − ξ1 − r1,1x1|2, save xˆ := x. Then, let i := i+ 1 and go to Step 6. Step 6 (Schnorr–Euchner enumeration of level i) Let xi := xi + ∆i, ∆i := −∆i − sign(∆i), and go to Step 3.  40 Initialization: i= n, T = 0, = 0, d = C i i Cx Set xi at the midpoint of the interval and calculate : x = (y - )/r = sign(y - - r x ) D ù D i i i i i,i i i i,i i ë x xi Is the distance of (x ,...,x ) from (y ,...,y ) greater than the radius? i n i n Is d smaller than T + C i |y - - r x | ?i i i,i ix 2 Is i = n? Yes Is x in [0,Q-1]?i No Is i > 1? Yes Return to the previous level: i = i + 1 Select the next value for x : = + i x x = - -sign( ) i i i i i i D D D D No Yes A valid coordinate was found, continue to the next level: = r x , where j=1,...,n T = T + |y - i - r x | i = i -1 x S x i-1 i-1,j j i-1 i i i,i i 2 Yes A valid point was found, save x' = x, update the radius and move back to the previous level: d < T + |y - - r x | i = i + 1 C 1 1 1 1,1 1x 2 No Output x' if a valid point was found Input: vector upper diagonal matrix R signal set size Q squared radius C y No Is x < 0?i Is x > Q-1?i No x = 0 = 1 i Di x = Q-1 = -1 i Di No Yes Yes Is <0 and Q-1 or i >Q-1 and ? x x + s x x + i i i i i i D > D <0 No Yes Figure 15: Flow chart of the modified sphere decoding algorithm 41 Figures 16(a) and 16(b) show a comparison of Algorithm II and the modified algorithm using the codes LQNF and LNF respectively. The complexity is measured in average number of points visited by the algorithm. It can be seen from the charts that the modifications indeed offer a slight reduction in complexity for both codes. The reduction is more significant in smaller SNRs. This is a natural result, because as the SNR gets smaller, the influence of the noise grows and both the probability of error and the complexity increase. Then also the size of the search tree grows and it is more common that there are points in the search tree that do not belong to the signal set. Figures 17(a) and 17(b) show the same comparisons measured in average CPU time used by the algorithms. It can be seen that the reduction in com- plexity is also visible in terms of CPU time. This means that even though the additional checks in the modified algorithm increase its complexity somewhat, the reduction they offer in the number of visited points more than makes up for the overhead. 42 17 18 19 20 21 22 23 24 25 26 27 28 15 20 25 30 35 40 LQNF, q=8, n=1, C0=6 SNR (dB) Av er ag e co m pl ex ity (# vi sit ed po int s) Algorithm II Modified algorithm (a) The code LQNF is used for transmission 20 22 24 26 28 30 32 20 30 40 50 60 70 80 LNF, q=8, n=1, C0=6 SNR (dB) Av er ag e co m pl ex ity (# vi sit ed po int s) Algorithm II Modified algorithm (b) The code LNF is used for transmission Figure 16: The average complexity of Algorithm II compared with the modified algorithm in a system with one receive antenna and q = 8. Complexity is measured in average number of points in the search tree. 43 16 18 20 22 24 26 28 4.95 5 5.05 5.1 5.15 5.2 5.25 LQNF, q=8, n=1, C0=6 SNR (dB) lo g 8 (A ve rag e C PU tim e) Algorithm II Modified algorithm (a) The code LQNF is used for transmission 20 22 24 26 28 30 32 1.65 1.7 1.75 1.8 1.85 1.9 1.95 2 LNF, q=8, n=1, C0=6 SNR (dB) lo g 8 (A ve rag e C PU tim e) Algorithm II Modified algorithm (b) The code LNF is used for transmission Figure 17: The average complexity of Algorithm II compared with the modified algorithm in a system with one receive antenna and q = 8. Complexity is measured in average CPU time used. 44 7.2 Distributed Search Recently, new sphere decoding algorithms with lower decoding complexity have been developed, that take advantage of the zeros in the R-matrix; see e.g. the articles [4] and [3]. In the following we introduce a new approach to sphere decoding with a similar idea to utilize the zeros in the R-matrix. It is enabled by the structure of the lattice code LQNF . At the receiver we compute the complex matrices HX1,HX2, . . . ,HX8 and turn them into a real matrix B as described in Section 5.1. When the code LQNF is used for encoding and there is only one receive antenna, we have H = h1 = (h1, h2, h3, h4) and the matrix B is of the form:  ℜ(h1) −ℑ(h1) ℜ(h2) −ℑ(h2) ℜ(h3) −ℑ(h3) ℜ(h4) −ℑ(h4) ℑ(h1) ℜ(h1) ℑ(h2) ℜ(h2) ℑ(h3) ℜ(h3) ℑ(h4) ℜ(h4) ℜ(h2) −ℑ(h2) −ℑ(h1) −ℜ(h1) ℜ(h4) −ℑ(h4) −ℑ(h3) −ℜ(h3) ℑ(h2) ℜ(h2) ℜ(h1) −ℑ(h1) ℑ(h4) ℜ(h4) ℜ(h3) −ℑ(h3) ℜ(h3) ℑ(h3) ℑ(h4) −ℜ(h4) −ℜ(h1) −ℑ(h1) −ℑ(h2) ℜ(h2) ℑ(h3) −ℜ(h3) −ℜ(h4) −ℑ(h4) −ℑ(h1) ℜ(h1) ℜ(h2) ℑ(h2) ℜ(h4) ℑ(h4) ℜ(h3) ℑ(h3) −ℜ(h2) −ℑ(h2) −ℜ(h1) −ℑ(h1) ℑ(h4) −ℜ(h4) ℑ(h3) −ℜ(h3) −ℑ(h2) ℜ(h2) −ℑ(h1) ℜ(h1)   Let us denote the above matrix relating to row vector h1 as Bh1 . If there are n receive antennas, then H =   h1 ... hn   and B is a 8n× 8 matrix of the form B =   Bh1 ... Bhn   . The QR-decomposition is then computed for B and the R-matrix is given as input to the sphere decoder. Let us now take a look at the properties of the R-matrix. 45 Theorem 7.1. For the code LQNF , the R-matrix is of the form: R =   r1,1 r1,2 r1,3 r1,4 0 0 0 0 0 r2,2 r2,3 r2,4 0 0 0 0 0 0 r3,3 r3,4 0 0 0 0 0 0 0 r4,4 0 0 0 0 0 0 0 0 r5,5 r5,6 r5,7 r5,8 0 0 0 0 0 r6,6 r6,7 r6,8 0 0 0 0 0 0 r7,7 r7,8 0 0 0 0 0 0 0 r8,8   . Proof. We prove this by showing that the Gram–Schmidt procedure introduced in Chapter 2.3 gives the R-matrix the described form, when applied to matrix B. Let RGS denote the R-matrix obtained by the Gram–Schmidt procedure from B. RGS =   ‖u1‖ 〈b2, e1〉 〈b3, e1〉 . . . 〈bn, e1〉 0 ‖u2‖ 〈b3, e2〉 . . . 〈bn, e2〉 0 0 ‖u3‖ ... ... ... . . . 〈bn, en−1〉 0 0 . . . 0 ‖un‖   Now we prove that the upper right corner of RGS is all zeros. Note that the matrix B has the properties that the first four column vectors are all orthogonal with the last four column vectors, i.e. 〈bi,bj〉 = 0 for all i ∈ [1, 4] and j ∈ [5, 8]. Since 〈bj , ei〉 = 〈bj , ui‖ui‖〉 = 1 ‖ui‖〈bj ,ui〉, it suffices to show that 〈bj ,ui〉 = 0 for all i ∈ [1, 4] and j ∈ [5, 8]: 〈bj ,u1〉 = 〈bj ,b1〉 = 0, 〈bj ,u2〉 = 〈bj ,b2 − φ2,1u1〉 = 〈bj ,b2〉︸ ︷︷ ︸ =0 −φ2,1 〈bj ,u1〉︸ ︷︷ ︸ =0 = 0, 〈bj ,u3〉 = 〈bj ,b3 − φ3,2u2 − φ3,1u1〉 = 〈bj ,b3〉︸ ︷︷ ︸ =0 −φ3,2 〈bj ,u2〉︸ ︷︷ ︸ =0 −φ3,1 〈bj ,u1〉︸ ︷︷ ︸ =0 = 0, 〈bj ,u4〉 = 〈bj ,b4 − φ4,3u3 − φ4,2u2 − φ4,1u1〉 = 〈bj ,b4〉︸ ︷︷ ︸ =0 −φ4,3 〈bj ,u3〉︸ ︷︷ ︸ =0 −φ4,2 〈bj ,u2〉︸ ︷︷ ︸ =0 −φ4,1 〈bj ,u1〉︸ ︷︷ ︸ =0 = 0. 46 Since the upper right corner is all zeros, the R-matrix actually has form R = [ R1 0 0 R2 ] where R1 and R2 are upper diagonal matrices. The zeros in the upper right corner mean that in the sphere decoder search the first four values of xi found by the sphere decoder, where i = 8, . . . , 5, do not have any effect on what the next four values of xi, where i = 4, . . . , 1, will be. We can use this fact to divide the search into two parts. We denote y′1 = (y ′ 1, y ′ 2, y ′ 3, y ′ 4) T , y′2 = (y ′ 5, y ′ 6, y ′ 7, y ′ 8) T , x1 = (x1, x2, x3, x4) T and x2 = (x5, x6, x7, x8) T . |y′ −Rx|2 ≤ C ′0∣∣∣∣∣ [ y′1 y′2 ] − [ R1 0 0 R2 ][ x1 x2 ]∣∣∣∣∣ 2 ≤ C ′0∣∣∣∣∣ [ y′1 y′2 ] − [ R1x1 R2x2 ]∣∣∣∣∣ 2 ≤ C ′0 |y′1 −R1x1|2 + |y′2 −R2x2|2 ≤ C ′0 We can now divide the processing into two searches for four symbols instead of one search for 8 symbols. Distributed algorithm (Input C ′0, y ′, R, Output xˆ): Step 1 Obtain R1 and R2 from R. Step 2 Get x1 by performing Algorithm II with input C ′ 0, y ′ 1 and R1. Step 3 Set radius for the second sphere decoder: dC := C ′ 0 − |y′1 −R1x1|2 Step 4 Get x2 by performing Algorithm II with input dC , y ′ 2 and R2. Step 5 Return xˆ := [x1,x2].  In the first step R is divided into R1 and R2 as defined above. In the second step we use the original sphere decoder to obtain the first four elements of x. In step 3 we calculate a smaller radius for the second sphere decoder by subtracting the squared distance between x1 and y ′ 1 from C ′ 0. In step 4 we use the sphere decoder again to find the last four elements of x. 47 We could also leave out step 3 altogether and use the same squared radius for both searches. In this case the two searches could be done in parallel. The average number of points in the search tree for the Distributed algorithm is calculated as the sum of the points visited by the sphere decoders in steps 2 and 3. As can be seen from the Figures 18(a) and 18(b), the reduction in complexity is remarkable, especially with lower SNRs. This is due to the fact that the search is divided into two distinct parts. The original sphere decoder has an 8-level search tree. This means that every time it finds suitable values for the first four xi’s, it continues the search to the lower levels of the search tree. With the Distributed algorithm the first search finds the optimal solution for the first four xi’s using a 4-level search tree and only after that will the next search look for the last four values of xi. Also with an 8-level search tree the minimum amount of points in the search tree is 15, as with two 4-level searches the minimum is 2 ∗ 7 = 14. Remark 7.1. The form of the R-matrix depends on the order of the basis matrices. Remark 7.2. The code LNF cannot be used together with Distributed algo- rithm, because its R-matrix does not have zeros in the right upper corner. 48 14 16 18 20 22 24 26 0 10 20 30 40 50 60 70 LQNF, q=8, n=1, C0=6 SNR (dB) Av er ag e co m pl ex ity (# vi sit ed po int s) Algorithm II Distributed algorithm (a) Complexity measured in average number of points in the search tree 14 16 18 20 22 24 26 1.55 1.6 1.65 1.7 1.75 1.8 1.85 1.9 1.95 2 LQNF, q=8, n=1, C0=6 SNR (dB) lo g 8 (A ve rag e C PU tim e) Algorithm II Distributed algorithm (b) Complexity measured in average CPU time Figure 18: The average complexity of Algorithm II compared with that of the Distributed algorithm in a system with one receive antenna and q = 8. 49 References [1] E. Agrell, T. Eriksson, A. Vardy and K. Zeger: Closest point search in lattices. IEEE Transactions on Information Theory, Vol. 48, pp.2201–2214, August 2002. [2] S. M. Alamouti: A Simple Transmit Diversity Scheme for Wireless Com- munications. IEEE Journal on Selected Areas in Communications, Vol. 16, pp. 1451–1458, October 1998. [3] L. Azzam and E. Ayanoglu: Reduced Complexity Sphere Decoding for Square QAM via a New Lattice Representation. arXiv:0705.2435v1 [cs.IT], May 2007. [4] E. Biglieri, Y. Hong and E. Viterbo: On Fast-Decodable Space–Time Block Codes. arXiv:0708.2804v1 [cs.IT], August 2007. [5] J.W.S. Cassels: An Introduction to the Geometry of Numbers. Springer Ver- lag, 1971. [6] M.O. Damen, H. El Gamal and G. Caire: On Maximum-Likelihood Detec- tion and the Search for the Closest Lattice Point. IEEE Transactions on Information Theory, Vol. 49, pp.2389–2402, October 2003. [7] G. Foschini, G. Golden, R. Valenzuela and P. Wolniansky: Simplified Pro- cessing for High Spectral Efficiency Wireless Communication Employing Multi-Element Arrays IEEE Journal on Selected Areas in Communications, Vol. 17, pp. 1841–1852, November 1999. [8] Gene H. Golub and Charles F. Van Loan: Matrix Computations, 3rd ed. Johns Hopkins, 1996. [9] B. Hassibi and H. Vikalo: Maximum-likelihood decoding and integer least- squares: The expected complexity. Multiantenna Channels: Capacity, Coding and Signal Processing, (editors J. Foschini and S. Verdu), AMS 2003. [10] C. Hollanti, J. Lahtonen and H.-f. Lu: Maximal Orders in the Design of Dense Space–Time Lattice Codes. Accepted for publication in IEEE Trans- actions on Information Theory (2008). [11] M. Pohst: On the Computation of Lattice Vectors of Minimal Length, Suc- cessive Minima and Reduced Basis with Applications. ACM SIGSAM, vol. 15, pp. 37–44, 1981. [12] J. Proakis: Digital Communications, 3rd ed. McGraw–Hill, 1995. 50 [13] V. Tarokh, H. Jafarkhani and A. R. Calderbank: Space–Time Block Codes from Orthogonal Designs. IEEE Transactions on Communications, Vol. 45, pp.1456–1467, July 2002. [14] V. Tarokh, N. Seshadri and A. R. Calderbank: Space–Time Codes for High Data Rate Wireless Communication: Performance Criterion and Code Con- struction. IEEE Transactions on Information Theory, Vol. 44, pp.744–765, July 1998. 51