Design and Analysis of an Area Efficient and Low Power NEW-R2MDC FFT for MIMO OFDM in wireless Communication

N. Kirubanandasarathy¹, G. P. Ramesh²
¹,²Professor, Department of Electronics and Communication Engineering, ¹Syed Ammal Engineering College, Ramanathapuram, India
²St. Peters University, Chennai, India
E-mail: drnksarathy@gmail.com, rameshgp@yahoo.co.in

Abstract - In this paper, an area-efficient low power fast fourier transform (FFT) processor is proposed for multi input multi output -orthogonal frequency division multiplexing (MIMO-OFDM) in wireless communication system. It consists of a modified architecture of radix-2 algorithm which is described as new radix-2 multipath delay commutation (New-R2MDC). Orthogonal frequency-division multiplexing is a popular method for high data rate wireless transmission. This paper describes the very large scale integration (VLSI) design of an area efficient new-r2mdc FFT for MIMO OFDM system targeted to future wireless communication systems. The very high speed integrated hardware description language (VHDL) simulation results have been tested practically by implementing in the Altera DE-2 field programmed gate array (FPGA) development board. Also the existing OFDM system has been tested with these FFT algorithms and their performances were analyzed with respect to occupation of area in FPGA and power consumption. A low-power and area efficient architecture enables the real-time operations of MIMO OFDM system.

Keywords: New-radix-2 multipath delay commutation, frequency division multiplexing, multi input multi output – orthogonal frequency division multiplexing, inverse fast fourier transform, fast fourier transform, discrete fourier transform

1. INTRODUCTION
MIMO-OFDM is the efficient solution for transmitting and receiving the data over the long distance. The sub-carrier frequency has been chosen in our proposed MIMO OFDM transceivers so that cross-talk between the sub-channels are eliminated, hence the inter carrier guard bands are not required [1]. This greatly simplifies the design of both the transmitter and the receiver; unlike conventional frequency division multiplexing (FDM), a separate filter for each sub-channel is not required [2]. The orthogonally allows for efficient modulator and demodulator implementation using the FFT algorithm [3]. OFDM transceiver is popular for wideband communications today by way of low-cost MIMO OFDM in wireless telecommunication system. It requires very accurate frequency synchronization between the receiver and they have reduced the complexity [4]. In transmitter; with frequency deviation, the sub-carriers shall no longer be orthogonal, causing inter-symbol interference (ISI) [5]. The 5/6 coding rate would be not effective for error correcting by a viterbi decoder [6]. This paper describes the VLSI implementation of the proposed new-R2MDC for MIMO OFDM systems, i.e., modified radix-2 multipath delay commutation pipeline FFT based MIMO OFDM system.
The radix-2 algorithm with multi delay commutation architecture is to support 4-channel 8, 16, 32, 64, 128, 512, 1024 and 2048-point FFT operations [7, 8]. We compare this proposed architecture with existing 8-point radix 2, radix 4 FFT and existing R2MDC FFT and also give the design and implementation results of the proposed FFT processor.

2. OVERVIEW OF MIMO OFDM
The general transceiver structure of MIMO OFDM is presented in Figure 1. The system consists of \( N \) transmitter antennas and \( M \) receiver antennas. According to [9] and [10], the cyclic prefix is assumed to be a longer than the channel delay spread. The OFDM signal for each antenna is obtained by using IFFT and can be detected by FFT. There are two methods widely used for transmitting MIMO data. If the channel has a negligible error rate, we can send several data simultaneously over multiple antennas. This is known as spatial multiplexing, which utilizes the spectrum very efficiently.

In contrast, if the environment has high error rate, we transmit the same data over multiple antennas. This is called as space-time coding. The purpose of this approach is to increase the diversity of MIMO to combat signal fading. The essential purpose of a MIMO system is to determine which antenna is corresponding to which data on the receiver side. As shown in Figure 1, Rx1 receives data from all the transmitter antennas, Tx1, Tx2, Tx3 and Tx4. Thus, we must have a special decoding algorithm to identify which antenna has transmitted which data to Rx1. \( N \times M \) MIMO-OFDM: \( N \) indicates the number of transmitter antennas and \( M \) indicates the number of receiver antennas, respectively. For example, \( 4 \times 4 \) MIMO-OFDM has 4 transmitter antennas and 4 receiver antennas as shown in Figure 1.

OFDM is a multi-carrier system where data bits are encoded to multiple sub-carriers. Unlike single carrier systems, all the frequencies are sent simultaneously in time. OFDM offers several advantages over single carrier system like better multipath effect immunity, simpler channel equalization and relaxed timing acquisition constraints. But it is more susceptible to local frequency offset and radio front-end non-linearity [11]. The frequencies used in OFDM system are orthogonal. Neighboring frequencies with overlapping spectrum can therefore be used [12].

3. IFFT/FFT ALGORITHM
In this section, a brief overview of IFFT and FFT algorithms is provided to be effectively used in OFDM applications. The N-point Discrete Fast Fourier Transform (DFT) is defined as:

\[
X[k] = \sum_{n=0}^{N-1} x[n] W_N^{nk}, \quad k = 0,1,...,N-1
\]  

Figure 1 MIMO-OFDM architecture
where $W^{nk}_N = e^{-j2\pi nk/N}$, $0 \leq k \leq N-1$ is the DFT coefficient.

$X(k)$ is the $k$th harmonic and $x(n)$ is the $n$th input sample. Direct DFT calculation requires a computational complexity of $O(N^2)$. By using The Cooley–Tukey FFT algorithm, the complexity can be reduced to $O(N \log N)$. The Cooley-Tukey FFT is the most universal of all FFT algorithms, because of any factorization of $N$ is possible.

The Cooley–Tukey algorithm is based on a divide-and-conquers approach in the frequency domain and therefore is referred to as decimation-in-frequency (DIF) FFT. The DFT formula is split into two summations:

$$X[k] = \sum_{n=0}^{N-1} x[n] W^{nk}_N + \sum_{n=N/2}^{N-1} x[n] W^{nk}_N$$

$$= \sum_{n=0}^{N/2-1} x[n] W^{nk}_N + \sum_{n=0}^{N/2-1} x[n + N/2] W^{nk}_N W^{Nk/2}_N$$

$$= \sum_{n=0}^{N/2-1} x[n] W^{nk}_N + \sum_{n=0}^{N/2-1} x[n + N/2] W^{nk}_N W^{Nk/2}_N$$

And $W^{Nk/2}_N = (-1)^k$

$$X[k] = \sum_{n=0}^{N/2-1} \left( x(n) + (-1)^k x(n + N/2) \right) W^{nk}_N$$  \hspace{1cm} (2)$$

$X[k]$ can be decimated into even-and odd indexed frequency samples: $X[2k] = \sum_{n=0}^{N/2-1} \left( x(n) + (n + N/2) \right) W^{2nk}_N$

$$= \sum_{n=0}^{N/2-1} \left( x(n) + (n + N/2) \right) W^{nk}_N$$  \hspace{1cm} (3)$$

$X[2k + 1] = \sum_{n=0}^{N/2-1} \left( x(n) - (n + N/2) \right) W^{2nk}_N$

$$= \sum_{n=0}^{N/2-1} \left( x(n) - (n + N/2) \right) W^{nk}_N$$  \hspace{1cm} (4)$$

The computational procedure can be repeated through decimation of the N/2-point DFTs $X(2k)$ and DFTs $X(2K+1)$. The entire algorithm involves log2N stages, where each stage involves N/2 operation units (butterflies). The computation of
the N point DFT via the decimation-in-frequency FFT, as in the decimation-in-time algorithm requires \((N/2)\log_2 N\) complex multiplication and \(N\log_2 N\) complex addition [13].

4. Proposed New-R2MDC Architecture
The radix-2 butterfly processor consists of a complex adder and complex subtraction. Besides that, an additional complex multiplier for the twiddle factors \(W_N\) is implemented. The complex multiplication with the twiddle factor requires four real multiplications and two add/subtract operations as shown in Figure 2.

The new radix-2 multipath delay commutation (NEW-R2MDC) is one of the commutated architectures of radix-2 FFT algorithm which is used to commutate the values as fast as possible in order to process the values and to commutate the FFT inputs, the architecture shown in the Figure 3 is consists of different blocks which must be used in the new R2MDC.

![Basic Butterfly computation](image)

One of the most straightforward approaches for pipeline implementation of radix-2 FFT algorithm is new radix-2 multipath delay commutation (NEW-R2MDC) architecture. It is the simplest way to rearrange data for the FFT/IFFT algorithm, the input data sequence are broken into two parallel data stream flowing forward, with correct distance between data elements entering the butterfly scheduled by proper delays [14]. At each stage of the 8-point FFT in new-r2mdc architecture, half of the data flow is delayed via the memory (Register) and processed with the second half data stream [15]. The A input comes from the previous component twiddle factor multipliers (TFM). The B output is fed to the next cycles, the multiplexors select the output of the adders/subtractors (position “1”), the butterfly computes a 2-point DFT with incoming data and the data stored in the feedback registers.

The architecture of butterfly I (BF I) and butterfly II (BF II) supporting two receive chains is shown in Figure 4 (a) and Figure 4 (b). In BF I structure the sample routing multiplexers and demultiplexers at the input and output of the butterfly random access memories (BF-RAMs) are controlled based on \(c_2\) and \(c_3\) control signals while the computation unit is controlled by \(c_1\) control signal. Depending on the programming of number of receive chains, the extra BF_RAMs are enabled. Based on the requirement extra buffers can be extended to the existing BF structure. Since the handling \(-1\), \(+j\) and \(-j\) multiplication is handled inside the BF II structure, two control signals \(c_1\) and \(c_2\) are used in the basic computation unit. The multiplexers and the demultiplexers are controlled by \(c_3\) and \(c_4\) control signals. The product with \(-j\) term is implemented by swapping the real and imaginary part considering the sign of the sample. The algorithm used here is to commutate the radix-2 algorithm in the IFFT architecture [2]. In order to optimize the processor, the proposed shift and add method that eliminates the non-trivial complex multiplication with the twiddle factors \((W_8^1, W_8^3)\) and implements the processor without complex multiplication. The proposed butterfly processor performs the multiplication with the trivial factor \(W_8^2 = -j\) by switching from real to imaginary part and imaginary to real part, with the factor \(W_8^3\) by a simple cable. With the non-trivial factors \(W_8^1 = e^{j\pi/4}\), \(W_8^3 = e^{j3\pi/4}\), the processor realize the multiplication by the factor \(1/\sqrt{2}\) using hardwired shift-and-add operation as shown in Figure 5.
Figure 3 Proposed Architecture block with New-R2MDC FFT

Figure 4 (a) BF I Structure and Figure 4 (b) BF II Structure

Figure 5 Butterfly processor with no complex multiplication
5. Results and Discussion

The prime objective is to construct a FFT in order to have low power consumption and lesser area. The parameters (i) power consumption (ii) area occupancy were given due consideration for comparing the proposed circuit with other FFTs. The experimental results analysis consists of six different types of architectures such as radix-2, radix-4, split radix, mixed radix4/2, r2mdc, and new r2mdc FFT that can be implemented in the Altera Cyclone II DE2 FPGA. We have designed all coding using hardware description language (HDL). To get power, and area report, we use XILINX ISE design suite 10.1 as synthesis tool and modelsim 6.3c for simulation. The purpose is to determine the resource usage of this proposed design attempts to eliminate the complex multiplication, hence avoid this expensive operation of multiplication and consumes less chip area. The proposed new-r2mdc FFT gives better result than radix-2 FFT and existing r2mdc FFT in terms of area and power consumption as shown in the Table 1.

The simulation results for various FFT algorithms have been tested practically by implementing in the Altera DE-2 FPGA development board. The Quartus - II tool is used to download the design into FPGA development board. In the FPGA board, the reset signal input is connected to the rightmost switch. For the set binary inputs at the remaining switches, after the process in the FPGA, the outputs are seen in light emitting diode (LED) displays in the board. Also these FPGA outputs can be verified with simulation results obtained using Modelsim 6.3c. The FPGA board has developed to verify their circuit behavior and implementation of MIMO OFDM in wireless telecommunication system.

Table 1: Comparison results of proposed new-r2mdc FFT with existing r2mdc and radix-2 architecture

<table>
<thead>
<tr>
<th>Methods</th>
<th>Slices</th>
<th>LUTs</th>
<th>Power(W)</th>
</tr>
</thead>
<tbody>
<tr>
<td>New-R2MDC</td>
<td>149</td>
<td>145</td>
<td>0.47</td>
</tr>
<tr>
<td>R2MDC</td>
<td>197</td>
<td>152</td>
<td>1.103</td>
</tr>
<tr>
<td>Radix-2</td>
<td>320</td>
<td>432</td>
<td>2.179</td>
</tr>
</tbody>
</table>

6. CONCLUSION

In this work, several FFT algorithms such as radix-2, r2mdc and the proposed new r2mdc FFT were designed using VLSI design process and their performances were analyzed. From the results, it was observed that the proposed new r2mdc uses least numbers of configurable logic block (CLB) slices and save the area of approximately 10% and it consumes less than 20% of power when compared to other FFT. It is seen that the new r2mdc FFT algorithm provides lesser area and low power consumption. The VHDL simulation results have been tested practically by implementing in the Altera DE-2 FPGA development board. Also the existing OFDM system has been tested with these FFT algorithms and their performance was analyzed with respect to occupation of area in FPGA and power consumption. We conclude that the proposed new-r2mdc architecture is taken a low area and less power than the existing radix-2 and r2mdc FFT algorithm architecture. The proposed architecture is shows that it can be used in for low power applications such as MIMO-OFDM in wireless communication system.
REFERENCES


