# A NEW LOW-POWER AND AREA EFFICIENT RAKE RECEIVER DESIGN WITHOUT INCURRING PERFORMANCE DEGRADATION

Hyung-Jin Lee and Dong Sam Ha

VTVT (Virginia Tech VLSI for Telecommunications) Lab, Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA 24061. Fax No. (540) 231-3362

*Abstract*: A new architecture for power- and area-efficient rake receivers for a third-generation wireless WCDMA system is presented. The proposed approach, based on parallel operation of the code generators, eliminates the de-skew blocks of rake receivers and shares several blocks, such as compensators, orthogonal variable spreading factor (OVSF) code generators, and scrambling code generators, among all fingers, which reduces both power and area. When the proposed architecture is applied to a rake receiver with four fingers, it reduces power dissipation by 55.2 % and area by 38.1 % without any degradation of system performance.

## I. INTRODUCTION

A rake receiver exploits the time diversity of multipaths [1,2]. A rake receiver has multiple fingers, and each finger processes an assigned multipath. Each finger consists of de-spreading blocks, correlators, code generators, compensators, de-skewers and a combiner. A rake receiver is complex in hardware and consumes a substantial amount of power. Several approaches have been proposed to reduce the power dissipation of rake receivers for handset modem chips, such as a strength reduction transformation [8] and look-ahead and relaxed look-ahead transformations [9]. Alternatively, an analog correlator was investigated to reduce the power dissipation of rake receivers [10]. The analog correlator over-samples data using a sigma-delta A/D converter.

We investigated a low-power design of rake receivers targeting a third-generation WCDMA (wideband code division multiple access) wireless system [11], based on parallel operation of two kinds of code generators: OVSF (orthogonal variable spreading factor) code generators and scrambling code generators. Parallel operation enables the code generators to run at a lower frequency, which reduces the power dissipation. It also eliminates the need for de-skew blocks and enables the fingers to share several blocks, such as compensators, OVSF code generators, and scrambling code generators. As a result, the total circuit complexity as well as the power dissipation is reduced. The proposed design reduces the total power by 55.2 % and the total gate count by 38.1 % for a four-finger rake receiver. A novel aspect of our design is that it reduces area as well as power dissipation without degrading performance, which is rarely achieved in low-power designs.

This paper is organized as follows. The architecture of conventional rake receivers for the WCDMA system is given in Section II. Section III describes the structure of the proposed rake receivers. Section IV presents experimental results of the proposed rake receiver design and compares its performance with a conventional rake receiver. Finally, Section V concludes this paper.

## II. PRELIMINARIES

In this section, we describe the architecture of conventional rake receivers for the WCDMA system. We refer to a conventional rake receiver as a serial rake receiver (SRR), in contrast to our parallel rake receiver (PRR).

### A. Review of conventional rake receivers

In WCDMA, data are spread with orthogonal variable spreading factor (OVSF) codes and are scrambled with a complex (rather than real) scrambling code. In-phase and quadrature-phase channel data are spread with in-phase and quadrature-phase OVSF codes, respectively. The transmitted data are expressed as in (1), where I and Q represent in-phase and quadrature-phase data, respectively. The in-phase and quadrature-phase OVSF codes are denoted as  $C_i$  and  $C_q$ , respectively, and the in-phase and quadrature-phase scrambling codes as  $S_i$  and  $S_q$ .

$$D = D_i + j \cdot D_q = (I \cdot C_i + j \cdot Q \cdot C_q) \cdot (S_i + j \cdot S_q)$$

$$= (I \cdot C_i \cdot S_i - Q \cdot C_q \cdot S_q) + j \cdot (I \cdot C_i \cdot S_q + Q \cdot C_q \cdot S_i)$$
(1)

Then, the received data could be expressed as in (2)

$$r = r_i + j \cdot r_q = D \cdot \alpha \cdot \exp(j\theta) \cdot \exp(j\phi) + \eta_0$$
(2)

In the equation,  $\alpha$  and  $\theta$  denote the gain and the phase rotation of the channel, respectively, and  $\phi$  is the frequency offset between the transmitter and the receiver.  $\eta_0$  denotes the AWGN (additive white Gaussian noise) of the channel. Ignoring the noise term, the in-phase and quadrature-phase channel data of received data are estimated as:

$$\hat{I} = \sum^{SF} (r_i \cdot C_i \cdot S_i + r_q \cdot C_i \cdot S_q) \cdot \cos(\theta + \phi)$$

$$+ \sum^{SF} (r_q \cdot C_i \cdot S_i - r_i \cdot C_i \cdot S_q) \cdot \sin(\theta + \phi)$$
(3)

$$\hat{Q} = -\sum^{SF} (r_i \cdot C_q \cdot S_i + r_q \cdot C_q \cdot S_q) \cdot \sin(\theta + \phi)$$

$$+ \sum^{SF} (r_q \cdot C_q \cdot S_i - r_i \cdot C_q \cdot S_q) \cdot \cos(\theta + \phi)$$
(4)

In the two equations, SF denotes a spreading factor. For details on the derivation of the equations, refer to [1]-[3].
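As an illustration only, the following Python sketch spreads one symbol pair according to (1), applies the noise-free channel of (2), and recovers the data by de-scrambling with the conjugate scrambling code, de-rotating by $\theta + \phi$, and correlating over SF chips. The function names, the choice of OVSF codes, and the scrambling chips are hypothetical and are not taken from the paper; the channel gain and phase are assumed to be known exactly.

```python
import cmath

def ovsf_codes(sf):
    """Return the sf OVSF codes of length sf (chips are +1/-1), built with the code-tree rule."""
    codes = [[1]]
    while len(codes[0]) < sf:
        codes = [child for c in codes for child in (c + c, c + [-x for x in c])]
    return codes

def spread(i_sym, q_sym, c_i, c_q, s):
    """Eq. (1): one complex chip D = (I*C_i + j*Q*C_q) * (S_i + j*S_q) per code chip."""
    return [(i_sym * ci + 1j * q_sym * cq) * sk for ci, cq, sk in zip(c_i, c_q, s)]

def despread(r, c_i, c_q, s, psi):
    """De-scramble with conj(S), de-rotate by psi = theta + phi, then correlate over SF chips."""
    z = [rk * sk.conjugate() * cmath.exp(-1j * psi) for rk, sk in zip(r, s)]
    i_hat = sum(ci * zk.real for ci, zk in zip(c_i, z))
    q_hat = sum(cq * zk.imag for cq, zk in zip(c_q, z))
    return i_hat, q_hat

SF = 8
codes = ovsf_codes(SF)
c_i, c_q = codes[2], codes[5]                    # arbitrary pair of OVSF codes (assumption)
s = [complex(a, b) for a, b in zip([1, -1, 1, 1, -1, -1, 1, -1],
                                   [-1, -1, 1, -1, 1, 1, -1, 1])]  # made-up scrambling chips
alpha, psi = 0.8, 0.7                            # assumed channel gain and total phase rotation

tx = spread(+1, -1, c_i, c_q, s)                 # I = +1, Q = -1
rx = [alpha * cmath.exp(1j * psi) * d for d in tx]   # Eq. (2) without the noise term
i_hat, q_hat = despread(rx, c_i, c_q, s, psi)

scale = 2 * SF * alpha                           # |S|^2 = 2 per chip, SF chips per symbol
print(i_hat / scale, q_hat / scale)
```

In this sketch the correlator outputs are scaled by $2 \cdot SF \cdot \alpha$, so dividing by that factor returns the transmitted I and Q values.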

The conventional structure of a finger for processing in-phase and quadrature-phase data is based on the above two equations and is shown in Figure 1.



Figure 1: Block Diagram of a Finger for In-Phase and Quadrature-Phase Channel Data

The de-spreading blocks and the correlators of the conventional architecture operate at the chipping rate of 3.84 MHz, while the compensators operate at a frequency lower by a factor of SF (the spreading factor). The OVSF and scrambling code generators (omitted from the figure) also operate at the chipping rate. The input data are over-sampled by a factor of eight, which requires the input registers to operate at 30.72 MHz (8 × 3.84 MHz).
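The clock rates quoted above follow directly from the chipping rate, the over-sampling factor, and the spreading factor. A small Python sketch of that arithmetic is given below; the function and key names are hypothetical.

```python
CHIP_RATE_HZ = 3.84e6   # WCDMA chipping rate
OVERSAMPLING = 8        # input samples per chip

def srr_clock_rates(sf):
    """Clock rates of the main blocks of a conventional (serial) finger for spreading factor sf."""
    return {
        "input_register_hz": OVERSAMPLING * CHIP_RATE_HZ,   # over-sampled input data
        "despreader_correlator_hz": CHIP_RATE_HZ,            # one chip per clock
        "code_generators_hz": CHIP_RATE_HZ,                  # one code bit per clock
        "compensator_hz": CHIP_RATE_HZ / sf,                 # one symbol per SF chips
    }

rates = srr_clock_rates(sf=8)
for block, hz in rates.items():
    print(f"{block}: {hz / 1e6:.2f} MHz")
```

For SF = 8 the compensator entry evaluates to 0.48 MHz, i.e., the symbol rate.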

A rake receiver typically has three to five fingers to exploit multipaths. As each multipath incurs a path delay during propagation, the output data of the fingers must be delayed accordingly by de-skew blocks before they are combined. The block diagram of a conventional SRR is shown in Figure 2. A de-skewer requires eight registers (to cover the case of SF = 4) to hold the data.



Figure 2: Structure of a Conventional SRR with Four Fingers
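The role of the de-skew blocks can be sketched in Python as one delay line per finger that holds early symbols until the latest path arrives. This is a behavioral illustration under stated assumptions, not the circuit of the paper: the class name, the path delays, and the symbol stream are hypothetical, and the depth of eight registers is taken to correspond to a 32-chip maximum delay at SF = 4 (32 / 4 = 8 symbol periods).

```python
from collections import deque

class DeSkewer:
    """Delay line of `depth` symbol registers; depth 0 passes symbols straight through."""
    def __init__(self, depth):
        self.regs = deque([0] * depth)

    def push(self, symbol):
        if not self.regs:
            return symbol
        self.regs.append(symbol)
        return self.regs.popleft()

MAX_DELAY_SYMBOLS = 8                      # eight registers, as for SF = 4
path_delays = [0, 2, 5, 8]                 # hypothetical path delays, in symbol periods
symbols = [1, -1, -1, 1, 1, -1, 1, 1, -1, 1, -1, -1]

# Each finger is delayed by the difference between the latest path and its own path.
deskewers = [DeSkewer(MAX_DELAY_SYMBOLS - d) for d in path_delays]

combined = []
for k in range(len(symbols) + MAX_DELAY_SYMBOLS):
    total = 0
    for d, deskewer in zip(path_delays, deskewers):
        idx = k - d                        # finger output lags the transmitter by its path delay
        finger_out = symbols[idx] if 0 <= idx < len(symbols) else 0
        total += deskewer.push(finger_out)
    combined.append(total)

print(combined[MAX_DELAY_SYMBOLS:])
```

After the initial latency of eight symbol periods, every combined value is the sum of the same symbol from all four fingers.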

Each de-spreading block has a dedicated OVSF code generator and a scrambling code generator. Four blocks (de-spreading block, correlator, scrambling code generator, and OVSF code generator) running at the chipping rate of 3.84 MHz consume over 50 % of the total power of a finger. This suggests that a substantial power saving is possible if we can reduce the operating frequency of those four blocks. We propose to adopt parallel operation of those blocks to reduce the operating clock frequency.

## III. PROPOSED METHOD

We propose a new architecture for rake receivers that saves power and area. We illustrate the architecture using a rake receiver with four fingers. We assume that the maximum delay of a multipath is 32 chips, which is equivalent to approximately  $8.33 \ \mu s$  at the chipping rate of 3.84 MHz. The key idea of the new architecture is parallel operation of major blocks, such as de-spreading blocks, correlators, scrambling code generators, and OVSF code generators, together with sharing of the de-spreading blocks among the four fingers. The new architecture is called the parallel rake receiver (PRR) in this paper.

### A. Block diagram of fingers

The block diagram of a finger for our PRR is shown in Figure 3. For simplicity, it shows only the in-phase channel. The OVSF and the scrambling code generators generate four bits on each clock (instead of one bit for the conventional SRR). Hence, the two generators operate at a speed four times lower than the chipping rate, i.e., at 960 kHz. Four on-time I and Q channel data (of a multipath) stored in the input buffers are de-spread in parallel. The input buffers operate at the chipping rate of 3.84 MHz, but the de-spreading block and the correlator operate at 960 kHz. The low operating frequency of those blocks reduces the power dissipation.



Figure 3: Block Diagram of In-Phase Channel for Basic PRR Fingers
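The idea of producing four code bits per clock can be illustrated with a generic linear feedback shift register (LFSR). The Python sketch below compares a serial generator, which emits one bit per clock, with a look-ahead form that computes four output bits and four feedback bits from the current state in a single clock. The paper does not describe the internals of its generators, so the register length, the tap positions, and the function names here are illustrative assumptions only.

```python
def lfsr_serial(state, taps, n_chips):
    """Serial generator: one output bit and one feedback bit per clock."""
    state = list(state)
    out = []
    for _ in range(n_chips):
        out.append(state[0])
        feedback = 0
        for t in taps:
            feedback ^= state[t]
        state = state[1:] + [feedback]
    return out

def lfsr_parallel4(state, taps, n_clocks):
    """Look-ahead generator: four output bits and four feedback bits per clock."""
    state = list(state)
    # All bits needed for the four feedback values must already be in the register.
    assert max(taps) + 3 < len(state)
    out = []
    for _ in range(n_clocks):
        out.extend(state[:4])
        new_bits = []
        for j in range(4):
            feedback = 0
            for t in taps:
                feedback ^= state[t + j]
            new_bits.append(feedback)
        state = state[4:] + new_bits
    return out

initial_state = [1] + [0] * 17     # 18-bit register (illustrative)
taps = (0, 7)                      # illustrative feedback taps

serial_bits = lfsr_serial(initial_state, taps, n_chips=64)       # 64 clocks at 3.84 MHz
parallel_bits = lfsr_parallel4(initial_state, taps, n_clocks=16)  # 16 clocks at 960 kHz
print(serial_bits == parallel_bits)
```

Both forms produce the same bit sequence; the parallel form needs one quarter of the clock cycles at the cost of extra feedback logic.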



Figure 4: Architecture of the Proposed PRR

### B. Architecture of the proposed PRR

The architecture of the proposed PRR with four fingers is shown in Figure 4. Each input register holds four on-time data of a multipath, and four registers running at the chipping rate of 3.84 MHz are necessary to handle four multipaths. There are eight copies of the de-spreading block to cover the maximum delay of 32 chips, and the de-spreading blocks run at 480 kHz (4 × 3.84 MHz / (4 × 8), where the leading factor of 4 accounts for the four fingers). A de-spreading block processes four on-time data (equivalently, 4 chip delays) of each multipath at a time, and goes through the data of all four multipaths that are to be de-spread with the same OVSF and scrambling codes. A correlator running at 960 kHz is responsible for processing the multipath data assigned to it. Therefore, the output of a de-spreading block is routed to the corresponding correlator using an 8x1 multiplexer.
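The operating frequencies of the PRR blocks can be reproduced from the chipping rate and the degree of parallelism. The Python sketch below restates that arithmetic; the constant and key names are hypothetical.

```python
CHIP_RATE_HZ = 3.84e6     # WCDMA chipping rate
CHIPS_PER_CLOCK = 4       # code bits / on-time data handled per clock
N_DESPREADERS = 8         # copies of the de-spreading block
N_FINGERS = 4             # multipaths served by each de-spreading block

prr = {
    # 8 blocks x 4 chips cover the assumed maximum multipath delay.
    "max_delay_chips": CHIPS_PER_CLOCK * N_DESPREADERS,
    "max_delay_us": CHIPS_PER_CLOCK * N_DESPREADERS / CHIP_RATE_HZ * 1e6,
    # Four bits per clock let the code generators and correlators run four times slower.
    "code_generator_hz": CHIP_RATE_HZ / CHIPS_PER_CLOCK,
    "correlator_hz": CHIP_RATE_HZ / CHIPS_PER_CLOCK,
    # Each de-spreading block serves all four fingers once per 32-chip window.
    "despreader_hz": N_FINGERS * CHIP_RATE_HZ / (CHIPS_PER_CLOCK * N_DESPREADERS),
}

for name, value in prr.items():
    print(f"{name}: {value:g}")
```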

The proposed PRR saves hardware in three areas. First, any two multipaths are apart by at least one chip period, so that the outputs of the correlators, i.e., symbols, are generated at different chip clocks. Therefore, the compensator can be time-shared to reduce the hardware, as shown in Figure 4. Note that the compensator operates at the symbol rate, so the speed of a compensator is not critical. Second, since the data of a multipath are transferred to the corresponding de-spreading block and correlator on time, the need for de-skew blocks is eliminated, which saves hardware. Finally, the PRR requires only one OVSF-code generator and one scrambling-code generator, which are shared by all four fingers.
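A behavioral Python sketch of a time-shared compensator is given below. It assumes that each correlator delivers its symbol on a distinct chip clock, so a single compensator can serve the fingers one after another before the results are combined. The weighting by the conjugate channel estimate, the channel values, and all names are assumptions made for illustration and are not taken from the paper.

```python
import cmath

def compensate(symbol, channel_estimate):
    """Shared compensator: undo the channel phase rotation (weighted by the path gain)."""
    return symbol * channel_estimate.conjugate()

def combine_time_shared(finger_outputs):
    """finger_outputs: list of (ready_chip, correlator_symbol, channel_estimate) tuples."""
    ready_chips = [chip for chip, _, _ in finger_outputs]
    # Time-sharing relies on no two symbols becoming ready on the same chip clock.
    assert len(set(ready_chips)) == len(ready_chips)
    total = 0j
    for _, symbol, estimate in sorted(finger_outputs, key=lambda e: e[0]):
        total += compensate(symbol, estimate)   # one finger per compensator slot
    return total

tx_symbol = complex(1, -1)
channels = [0.9 * cmath.exp(1j * 0.3), 0.6 * cmath.exp(-1j * 1.1),
            0.4 * cmath.exp(1j * 2.0), 0.2 * cmath.exp(-1j * 0.5)]  # hypothetical paths
ready_chips = [0, 3, 9, 14]                                           # hypothetical chip clocks

finger_outputs = [(chip, tx_symbol * h, h) for chip, h in zip(ready_chips, channels)]
combined = combine_time_shared(finger_outputs)
gain = sum(abs(h) ** 2 for h in channels)
estimate = combined / gain
print(estimate)
```

With noise-free inputs, the normalized combiner output equals the transmitted symbol, which indicates that serving the fingers sequentially does not change the combined result.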

## IV. EXPERIMENTAL RESULTS

We coded the conventional SRR and the proposed PRR in VHDL and synthesized the two rake receivers targeting TSMC 0.18  $\mu$ m CMOS technology with a supply voltage of 1.8 V. Both rake receivers have four fingers. The power estimation was performed at the gate level for a spreading factor of 8. The experimental results are shown in Table 1. The gate count in the table is the number of equivalent two-input NAND gates.

Table 1: Performance of the Two Rake Receivers

|        | Power<br>(µW) | Gate<br>Count |
|--------|---------------|---------------|
| SRR    | 1799.0        | 561 K         |
| PRR    | 805.1         | 347 K         |
| Saving | 55.2 %        | 38.1 %        |

Table 2: Contribution of Individual Blocks

| Block                     | SRR Gate<br>Count   | SRR Freq.<br>(MHz) | PRR Gate<br>Count   | PRR Freq.<br>(MHz)  |
|---------------------------|---------------------|--------------------|---------------------|---------------------|
| De-spreading              | 10.0 K<br>(1.8 %)   | 3.84               | 172.8 K<br>(49.76 %) | 0.48                |
| Correlator                | 72.1 K<br>(12.6 %)  | 3.84               | 82.8 K<br>(23.84 %)  | 0.96                |
| Compensator<br>(SF = 8)   | 195.5 K<br>(34.9 %) | 0.48               | 48.9 K<br>(14.08 %)  | 1.92                |
| Scrambling-code generator | 21.8 K<br>(3.88 %)  | 3.84               | 5.7 K<br>(1.64 %)    | 0.96                |
| OVSF-code generator       | 13.5 K<br>(2.4 %)   | 3.84               | 3.9 K<br>(1.12 %)    | 0.96                |
| De-skewer                 | 21.6 K<br>(41.7 %)  | 1.92<br>(SF = 8)   | -                   | -                   |
| Miscellaneous             | 32.0 K<br>(5.71 %)  | -                  | 33.2 K<br>(9.56 %)   | -                   |
| Total                     | 561.1 K<br>(100 %)  | -                  | 347.3 K<br>(100 %)   | 805.1 µW<br>(100 %) |

From Table 1, the proposed PRR reduces the power dissipation by 55.2 % compared with the SRR. The power saving is attributable mainly to the elimination of the de-skew blocks and the reduction of the operating frequencies of the correlators and code generators. In fact, both the circuit size and the operating frequency of the scrambling-code generator and the OVSF-code generator are reduced by about four times, owing to the sharing of those code generators among the four fingers. As expected, the proposed PRR reduces the equivalent NAND2 gate count by 38.1 %, from 561 K to 347 K. The experiment confirms the salient aspect of our design: it reduces the area as well as the power dissipation.
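The reported savings can be checked from the totals in Tables 1 and 2 with a few lines of Python. The helper name is hypothetical; the numbers are the ones reported in the tables.

```python
def saving_percent(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (1.0 - after / before)

power_saving = saving_percent(before=1799.0, after=805.1)   # power in microwatts (Table 1)
gate_saving = saving_percent(before=561.1, after=347.3)     # gate counts in thousands (Table 2)

# Gate-count and clock-frequency ratios of the shared code generators (Table 2).
scrambling_area_ratio = 21.8 / 5.7
ovsf_area_ratio = 13.5 / 3.9
code_gen_freq_ratio = 3.84 / 0.96

print(f"power saving: {power_saving:.1f} %, gate saving: {gate_saving:.1f} %")
print(f"code generator area ratios: {scrambling_area_ratio:.1f}x, {ovsf_area_ratio:.1f}x")
```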

Finally, we show the gate counts of each major block of the two rake receivers and the operating frequencies of those blocks in Table 2. As shown in the table, the proposed PRR reduces the areas of the compensator block, the two code generators, and the de-skew block compared with the conventional SRR, while the areas of the de-spreading block and the correlator block increase. The area of the de-spreading block for the PRR increases by about 17 times (from 10.0 K to 172.8 K gates in Table 2) due to the increased circuit complexity for parallelization. We believe that the overall power dissipation of the de-spreading blocks would increase for the PRR, even though the operating frequency of the de-spreading blocks is reduced by eight times.

## V. CONCLUSION

We presented a new architecture for power- and area-efficient rake receivers for a third-generation wireless WCDMA system. The proposed approach, based on parallel operation of the code generators, eliminates the de-skew blocks and shares several blocks, such as compensators, OVSF code generators, and scrambling code generators, among all fingers, which reduces both power and area. When the proposed architecture is applied to a rake receiver with four fingers, it reduces power dissipation by 55.2 % and area by 38.1 % without any degradation of system performance.

## REFERENCES

- [1] G. Povey, P. Grant and R.D. Pringle, "A decision-directed spread spectrum RAKE receiver for fast fading mobile channels," IEEE Trans. on Vehicular Technology, vol. 45, pp. 491-502, August 1996.
- [2] T.S. Rappaport, Wireless Communications Principles & Practice, Prentice Hall, 1999
- [3] J.G. Proakis, Digital Communications, McGraw-Hill, 1995.
- [4] R. Price and P. Green, "A communication technique for multipath channels," Proceedings of the IRE, pp. 555-570, 1958.
- [5] A. Chandrakasan, M. Potkonjak, R. Mehra, J. Rabaey and R. Brodersen, "Optimizing power using transformations," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 14, no. 1, pp. 12-31, January 1995.

- [6] K.-C. Liu, W.-C. Lin, and C.-K. Wang, "A Pipelined Digital Differential Matched filter FPGA Implementation and VLSI Design," Proceeding of IEEE Custom Integrated Circuits Conference, pp. 75-78, 1996.
- [7] J. Rabaey, "Exploring the power dimension," Proceeding of IEEE Custom Integrated Circuits Conference, San Diego, Cal., pp. 215-220, May 1996.
- [8] R. Baghaie and T. Laakso, "Implementation of low-power CDMA RAKE receivers using strength reduction transformation," Proceeding of IEEE Nordic Signal Processing Symposium, NORSIG'98, Vigsø, Denmark, pp. 169-172, June 1998.
- [9] R. Baghaie, "Application of transformation techniques in CDMA receivers," Proceeding of IEEE Midwest Symposium on Circuits and Systems, MWSCAS'99, Las Cruces, New Mexico, August 1999.
- [10] M. Neitola and T. Rahkonen, "An analog correlator for a WCDMA receiver", Proceeding of 17<sup>th</sup> NORCHIP conference, pp. 86-9, 1999.
- [11] Third Generation Partnership Project Technical Specification Group Radio Access Network, "Spreading and modulation (FDD)," TS 25.213 V3.5.0 (2001-03).