# AN AREA AND POWER EFFICIENT RAKE RECEIVER ARCHITECTURE FOR DSSS SYSTEMS

Hyung-Jin Lee and Dong Sam Ha

VTVT (Virginia Tech VLSI for Telecommunications) Lab, Department of Electrical and Computer Engineering, Virginia Tech, Blacksburg, VA 24061. E-mail: {hlee, ha}@vt.edu

*Abstract* – In this paper, an area and power efficient rake receiver architecture is proposed for base and mobile stations employing the direct sequence spread spectrum (DSSS) technique. In the proposed design, one common parallel de-spreader provides precomputed sub-symbols to the fingers, and, hence, each finger can operate at a lower clock speed to save power. Our simulation results indicate that the proposed rake architecture for a WCDMA system reduces power dissipation by 37 % and circuit complexity by 28 % compared with a conventional rake receiver.

## I. INTRODUCTION

Direct Sequence Spread Spectrum (DSSS) is one of the most pervasive communication schemes adopted for wireless communications. CDMA and WCDMA systems based on the DSSS technique support multiple users and high data rates. However, a high data rate often increases the circuit complexity and shortens the battery life. One of the key blocks of a DSSS system is the rake receiver, which is the most complex block and dissipates a large amount of power. Therefore, it is critical to reduce both the circuit complexity and the power dissipation of a rake receiver. Several researchers have proposed low-power designs of rake receivers [1]-[3], and we also proposed a rake receiver design with low power dissipation and low hardware complexity in [4]. In this paper, we present a new architecture that reduces the hardware complexity for systems with long multipath latency. Like our earlier design, the salient feature of our architecture is the reduction of power dissipation as well as hardware complexity, a combination that is often difficult to achieve in a low-power design.

## II. MOTIVATION

In a DSSS system, data at the transmitter side are spread by multiplying them by a fixed pseudo-random sequence, often called a spreading sequence. For example, the spreading sequences for 3G WCDMA systems are an OVSF (Orthogonal Variable Spreading Factor) code and a scrambling code. A RAKE de-spreads a received multipath signal by multiplying the received signal by the spreading sequence. A conventional RAKE structure with four fingers (which handles four multipaths) is shown in Figure 1. A code generator produces a spreading sequence, one bit per clock, and a de-spreading block performs de-spreading and correlation operations. A de-spread signal is compensated for the gain and the frequency offset and is de-skewed appropriately before it is combined with other multipath signals. Because all blocks operate at the chipping rate, a conventional rake receiver dissipates a large amount of power. Further, multiple copies (viz. four copies for the RAKE in Figure 1) of each type of block result in high circuit complexity. Our earlier work reduces both complexity and power by operating on multiple bits per clock for the code generators (instead of one bit as in a conventional RAKE receiver) [4]. One limitation of our previous design is that the hardware complexity increases substantially as the delay between the first and the last multipath signals increases. Our new architecture aims to address this problem.



Figure 1: Structure of a conventional RAKE with four fingers

## III. PROPOSED RAKE RECEIVER ARCHITECTURE

### A. Parallel Operation

We illustrate the concept of the proposed design for a single RAKE with three fingers with the following example. In Figure 2, consider the sequence of data  $d_i$  on two channels<sup>1</sup> with the de-spreading code sequences  $C1_i$  and  $C2_i$  for the first multipath P1, de-spreading code sequences  $D1_i$  and  $D2_i$  for the second multipath P2, and de-spreading code sequences  $E1_i$  and  $E2_i$  for the third multipath P3. De-spreading for a conventional RAKE is performed by multiplying one data item by one bit of the de-spreading code on each clock and accumulating the result, i.e.,  $\Sigma d_iC1_i$  for Channel 1 and  $\Sigma d_iC2_i$  for Channel 2. Note that the de-spreading occurs simultaneously for both channels, and the de-spreading block operates at the chipping rate.

<sup>1</sup> The WCDMA system supports up to seven data channels for each user [5].

A key feature of the proposed architecture is to process multiple bits on each clock. For example, a code generator generates four bits on each clock for our system, and the first four data items for Channel 1 are de-spread in one clock cycle. In other words, a de-spreading block, called a **parallel de-spreader** in this paper, computes  $d_0 \cdot C1_0 + d_1 \cdot C1_1 + d_2 \cdot C1_2 + d_3 \cdot C1_3 = 7 - 12 - 15 + 13 = -7$  for Channel 1 during the first clock cycle and performs a similar computation for Channel 2. So a parallel de-spreader operates at one quarter of the chipping rate, which saves power. Note that the hardware complexity increases rapidly with the number of bits processed in parallel. As a compromise, we employed four bits in parallel for our system.
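The equivalence between chip-by-chip and four-chip-at-a-time de-spreading can be checked with a minimal Python sketch. The data and codes are the first eight columns of Figure 2; the function names `serial_despread` and `parallel_despread` are illustrative only and do not come from the original design.

```python
# First eight data items and the P1 codes for Channel 1 and Channel 2 (Figure 2).
d = [7, 12, 15, 13, 4, 2, 1, 8]
C1 = [1, -1, -1, 1, 1, -1, 1, -1]
C2 = [1, 1, -1, -1, -1, 1, 1, -1]


def serial_despread(data, code):
    """Conventional de-spreading: one multiply-accumulate per chip clock."""
    acc = 0
    for x, c in zip(data, code):
        acc += x * c
    return acc


def parallel_despread(data, code, width=4):
    """Parallel de-spreading: one `width`-chip sub-symbol per (slower) clock."""
    acc = 0
    for start in range(0, len(data), width):
        window = data[start:start + width]
        chips = code[start:start + width]
        sub_symbol = sum(x * c for x, c in zip(window, chips))
        acc += sub_symbol
    return acc


first_ch1 = parallel_despread(d[:4], C1[:4])  # 7 - 12 - 15 + 13 = -7
first_ch2 = parallel_despread(d[:4], C2[:4])  # 7 + 12 - 15 - 13 = -9
print(first_ch1, first_ch2)
print(serial_despread(d, C1) == parallel_despread(d, C1))
```

Both routines accumulate the same value; the parallel version simply needs one accumulation step for every four chips.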

A parallel de-spreader receives four data items and generates 16 sub-symbols, one for each combination of a 4-bit spreading code. Note that the number of sub-symbols can be reduced to eight, as explained later. In our system, a de-spreader slides along the data sequence with a window size of four. When the de-spreader points to the four data items 0, 1, 2, and 3, the finger for Channel 1 picks up the sub-symbol corresponding to the code (1,-1,-1,1), and the finger for Channel 2 picks up the sub-symbol corresponding to the code (1,1,-1,-1). Then the de-spreader slides to the next four data items 4, 5, 6, and 7, and the two fingers pick up the two appropriate sub-symbols and accumulate the results.

So far, our discussion has been confined to one path, so let us extend it to the multipath case. Suppose that there are three multipaths whose starting clocks are at i=0 for multipath P1, i=4 for P2, and i=5 for P3, as shown in Figure 2. Note that the starting clock of each multipath signal is the same for both channels. The spreading codes D1<sub>i</sub> and D2<sub>i</sub> for multipath P2 in Figure 2 correspond to Channel 1 and Channel 2 and are delayed versions of the spreading codes C1<sub>i</sub> and C2<sub>i</sub>, respectively. A RAKE needs three fingers for each channel to process three multipaths.

Unlike in the single-path case, the de-spreader slides along the input data one item per clock. When the de-spreader points to the four items 4, 5, 6, and 7, two sub-symbols corresponding to C1 and C2 of multipath P1 and two sub-symbols corresponding to D1 and D2 of multipath P2 are picked up and accumulated by the corresponding fingers. At this point, the utilization of the de-spreader block increases to four sub-symbols. When the de-spreader points to the items 5, 6, 7, and 8, the two fingers for multipath P3 pick up the corresponding two sub-symbols and accumulate them. Note that the de-spreader operates at the chipping rate, as it slides one item per clock. However, the clock rate for each finger is still one quarter of the chipping rate to save power.
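The sliding operation can be illustrated with a short Python sketch that uses the data and codes of Figure 2. It is a behavioral model only, under the assumption that the codes of P2 and P3 are the P1 codes delayed by four and five items; the names `START`, `acc`, and `finger_clocks` are hypothetical.

```python
# Data and P1 codes from Figure 2; P2 and P3 reuse the codes with a delay.
d = [7, 12, 15, 13, 4, 2, 1, 8, 13, 13, 6, 6, 6, 6, 6, 6]
C1 = [1, -1, -1, 1, 1, -1, 1, -1, 1, -1, 1, 1, 1, 1, 1, 1]
C2 = [1, 1, -1, -1, -1, 1, 1, -1, 1, -1, -1, -1, -1, -1, -1, -1]
START = {"P1": 0, "P2": 4, "P3": 5}  # starting clock of each multipath
WIDTH = 4


def window_sub_symbol(start, chips):
    """Sub-symbol of the four data items at `start` for one 4-chip code."""
    return sum(x * c for x, c in zip(d[start:start + WIDTH], chips))


acc = {(path, ch): 0 for path in START for ch in ("CH1", "CH2")}
finger_clocks = {path: 0 for path in START}
despreader_clocks = 0

# The de-spreader slides one item per clock (chipping rate).
for start in range(len(d) - WIDTH + 1):
    despreader_clocks += 1
    for path, offset in START.items():
        k = start - offset  # position within this path's code
        if k < 0 or k % WIDTH != 0:
            continue  # this window is not aligned with the path
        # The fingers of an aligned path act once every four clocks.
        finger_clocks[path] += 1
        acc[(path, "CH1")] += window_sub_symbol(start, C1[k:k + WIDTH])
        acc[(path, "CH2")] += window_sub_symbol(start, C2[k:k + WIDTH])

print(despreader_clocks, finger_clocks)
print(acc)
```

In this model the de-spreader is evaluated at every window position, whereas each finger is active only at every fourth position once its multipath has started.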

### B. De-spreader

Figure 3 shows a 4-bit parallel de-spreader block, which processes four bits in parallel. Hereafter, we express spreading codes "1" and "–1" as logic values "0" and "1", respectively, for convenience. There are 16 possible combinations for a 4-bit code, and, hence, a straightforward design needs 16 outputs corresponding to the 16 combinations. However, because the sub-symbol for any code is the negation of the sub-symbol for its bitwise complement, it is sufficient to compute only the first eight combinations, ranging from 0000 to 0111. For example, if the resultant sub-symbol value for code 0110 is 57, the sub-symbol value for code 1001 is -57. Therefore, our 4-bit de-spreader computes only the first eight sub-symbol values for codes from 0000 to 0111.
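The reduction from 16 to eight outputs can be sketched in Python as follows. The mapping of the first chip to the most significant bit follows the footnote example, where code (1,-1,-1,1) corresponds to output 6; the helper names are illustrative and not part of the original design.

```python
def index_to_chips(index):
    """4-bit code index -> four chips; logic 0 means +1 and logic 1 means -1."""
    return [-1 if (index >> shift) & 1 else 1 for shift in (3, 2, 1, 0)]


def chips_to_index(chips):
    """Four chips -> 4-bit code index, with the first chip as the MSB."""
    index = 0
    for c in chips:
        index = (index << 1) | (1 if c == -1 else 0)
    return index


def despread_half(window):
    """Compute only the eight sub-symbols for codes 0000 to 0111."""
    return [sum(x * c for x, c in zip(window, index_to_chips(k)))
            for k in range(8)]


def lookup(sub_symbols, chips):
    """Return the sub-symbol for any 4-chip code from the eight stored values."""
    idx = chips_to_index(chips)
    if idx < 8:
        return sub_symbols[idx]
    # Codes 1000 to 1111 are bitwise complements of codes 0111 to 0000.
    return -sub_symbols[15 - idx]


window = [7, 12, 15, 13]  # data items 0 to 3 of Figure 2
subs = despread_half(window)
print(lookup(subs, [1, -1, -1, 1]))   # Channel 1, output 6
print(lookup(subs, [1, 1, -1, -1]))   # Channel 2, output 3
print(lookup(subs, [-1, 1, 1, -1]))   # complement of output 6
```

A finger therefore needs only a 3-bit selection and one sign bit to obtain any of the 16 possible sub-symbols.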

The previous subsection states that a 4-bit de-spreader operates at the chipping rate to cover multipaths. In fact, a received signal for a DSSS system is usually oversampled for fine time tracking (such as early, on-time, and late signals) and multipath resolution. In this paper, we illustrate the design of a 4-bit parallel de-spreader for the case of eight times oversampling of a received signal. The minimum input buffer size to cover four data items is 25, which is eight samples per item for three data items plus one additional sample. The parallel de-spreader receives four data items from the input buffer at locations 0, 8, 16, and 24, and it processes (i.e., multiplies and adds) the data with each of the eight codes to obtain the corresponding sub-symbol. A 4-bit parallel adder is shown in Figure 3. The adder is constructed in multiple levels to avoid large fanouts at the first level. Note that the de-spreader operates at the sampling rate of a received signal, which is eight times higher than the chipping rate. However, the clock rate for fingers is still one quarter of the chipping rate.
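The input buffer arithmetic can be sketched in Python as below. This is only a behavioral illustration of the 25-entry buffer with taps at locations 0, 8, 16, and 24; the name `push_sample` and the choice of a `deque` are assumptions, not a description of the actual hardware.

```python
from collections import deque

OVERSAMPLING = 8  # samples per chip
WIDTH = 4         # data items processed in parallel
BUFFER_LEN = OVERSAMPLING * (WIDTH - 1) + 1      # 8 * 3 + 1 = 25
TAPS = [k * OVERSAMPLING for k in range(WIDTH)]  # 0, 8, 16, 24

# Index 0 holds the oldest sample; the newest sample is at index 24.
sample_buffer = deque(maxlen=BUFFER_LEN)


def push_sample(sample):
    """Shift in one received sample; return the four tapped items when full."""
    sample_buffer.append(sample)
    if len(sample_buffer) < BUFFER_LEN:
        return None
    return [sample_buffer[t] for t in TAPS]


# Feed sample indices instead of real data to make the tap spacing visible.
outputs = [push_sample(n) for n in range(27)]
print(outputs[24])  # first full buffer
print(outputs[25])  # the window advances by one sample, not one chip
```

Because the window advances by one sample per clock, the same buffer serves the early, on-time, and late sampling phases.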

| i               | 0  | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10 | 11 | 12 | 13 | 14 | 15 |
|-----------------|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|
| d<sub>i</sub>   | 7  | 12 | 15 | 13 | 4  | 2  | 1  | 8  | 13 | 13 | 6  | 6  | 6  | 6  | 6  | 6  |
|                 | P1 |    |    |    | P1 |    |    | P1 |    |    | P1 |    |    |    |    |    |
| C1 <sub>i</sub> | 1  | -1 | -1 | 1  | 1  | -1 | 1  | -1 | 1  | -1 | 1  | 1  | 1  | 1  | 1  | 1  |
| C2 <sub>i</sub> | 1  | 1  | -1 | -1 | -1 | 1  | 1  | -1 | 1  | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
|                 |    |    |    |    |    | P2 |    |    | P2 |    |    |    | P2 |    |    |    |
| D1 <sub>i</sub> |    |    |    |    | 1  | -1 | -1 | 1  | 1  | -1 | 1  | -1 | 1  | -1 | 1  | 1  |
| D2 <sub>i</sub> |    |    |    |    | 1  | 1  | -1 | -1 | -1 | 1  | 1  | -1 | 1  | -1 | -1 | -1 |
|                 |    |    |    | P3 |    |    |    | P3 |    |    |    | P3 |    |    |    |    |
| E1 <sub>i</sub> |    |    |    |    |    | 1  | -1 | -1 | 1  | 1  | -1 | 1  | -1 | 1  | -1 | 1  |
| E2 <sub>i</sub> |    |    |    |    |    | 1  | 1  | -1 | -1 | -1 | 1  | 1  | -1 | 1  | -1 | -1 |

Figure 2: Example data and de-spreading codes



Figure 3: A 4-bit Parallel De-spreader

### C. Proposed Rake Receiver

A conventional rake receiver consists of multiple rakes to cover multiple data and control channels, and a conventional rake typically has three to eight fingers to process multipath signals. So a conventional rake receiver has a large number of fingers, and each finger performs its own de-spreading operation independently of the other fingers. Consequently, a conventional rake receiver is complex in hardware and dissipates a substantial amount of power.

The key idea of the proposed rake receiver is to share a parallel de-spreader among all the fingers. Because a rake receiver has a large number of fingers, a high degree of sharing reduces both the hardware complexity and the power dissipation. A straightforward implementation of our rake receiver would be to attach all the fingers to a parallel de-spreader, so that each finger picks up the appropriate sub-symbols from the de-spreader at certain clocks. However, due to the large number of fingers, such an implementation causes congestion of routing wires at the de-spreader outputs. Further, we noticed that it increases power dissipation due to large capacitive loads. We address the problem by cascading finger blocks as illustrated in Figure 4. The rake receiver in Figure 4 is for a simple system with three multipaths and two channels such as the one described in Figure 2.



Figure 4: Architecture of the Proposed Rake Receiver

The rake receiver has three finger banks, one for each multipath. Each finger bank consists of a register block, which stores all eight sub-symbols, and two fingers, one for each channel. A register block receives eight sub-symbols from its left neighbor at a certain clock. A finger of Finger Bank i accumulates the sub-symbols of multipath i for its channel. For example, the finger labeled CH2 of Finger Bank #1 collects the sub-symbols of multipath 1 for Channel 2.

We explain the operation of the proposed rake receiver using the example in Figure 2. At the end of clock *i*=3, the first sub-symbol of Channel 1 for multipath P1 is available at output 6 of the de-spreader<sup>2</sup> and the first sub-symbol of Channel 2 is available at output 3. The results are stored at Finger Bank #3 at the rising edge of clock *i*=4. At the end of clock *i*=7, the second sub-symbols of C1 and C2 for multipath P1 are available, and the first sub-symbols of D1 and D2 for multipath P2 are available. At the rising edge of clock *i*=8, the results are stored at Finger Bank #3, while the current content of Finger Bank #3 is shifted to Finger Bank #2. When the sub-symbols for multipath P3 are available at the end of clock *i*=8, a similar operation is performed at the rising edge of the following clock. Since the sub-symbols for all multipaths are now at the right finger bank stages, the sub-symbols (in either their non-complemented or complemented forms) are copied and accumulated at the corresponding fingers. Note that the copy operation of sub-symbols does not necessarily occur simultaneously at all finger banks. Since multipath data are partially de-skewed through the shift of data over the finger banks, the required de-skew buffer sizes are reduced. The finger outputs of each channel are combined after the de-skew and the channel compensation. Note that the de-skewer and the channel compensator are omitted from Figure 4 for the sake of simplicity.
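The shift-and-copy behavior can be modeled with a short Python sketch. It is one possible reading of the description above, under two assumptions that are not stated in the text: the register blocks are loaded only on clocks at which some multipath completes a four-item window, and a finger bank copies sub-symbols only when the window it holds is aligned with its own multipath. All names are hypothetical.

```python
# Data and P1 codes from Figure 2; P2 and P3 reuse the codes with a delay.
d = [7, 12, 15, 13, 4, 2, 1, 8, 13, 13, 6, 6, 6, 6, 6, 6]
C1 = [1, -1, -1, 1, 1, -1, 1, -1, 1, -1, 1, 1, 1, 1, 1, 1]
C2 = [1, 1, -1, -1, -1, 1, 1, -1, 1, -1, -1, -1, -1, -1, -1, -1]
CODES = {"CH1": C1, "CH2": C2}
PATH_START = [0, 4, 5]  # P1, P2, P3, served by Finger Banks #1, #2, #3
WIDTH = 4


def index_to_chips(index):
    """4-bit code index -> chips; logic 0 means +1 and logic 1 means -1."""
    return [-1 if (index >> s) & 1 else 1 for s in (3, 2, 1, 0)]


def chips_to_index(chips):
    index = 0
    for c in chips:
        index = (index << 1) | (1 if c == -1 else 0)
    return index


def despread_half(window):
    """The eight de-spreader outputs for codes 0000 to 0111."""
    return [sum(x * c for x, c in zip(window, index_to_chips(k)))
            for k in range(8)]


def pick(subs, chips):
    """Copy a sub-symbol in non-complemented or complemented form."""
    idx = chips_to_index(chips)
    return subs[idx] if idx < 8 else -subs[15 - idx]


banks = [None, None, None]  # register blocks of Finger Banks #1, #2, #3
acc = [{"CH1": 0, "CH2": 0} for _ in banks]
log = []  # (load clock, finger bank number, window start) of every copy

for start in range(len(d) - WIDTH + 1):
    # Assumption: load only when a window completes for some multipath.
    if not any(start >= s and (start - s) % WIDTH == 0 for s in PATH_START):
        continue
    # Shift toward Finger Bank #1 and store the new outputs in Finger Bank #3.
    banks = banks[1:] + [(start, despread_half(d[start:start + WIDTH]))]
    for b, held in enumerate(banks):
        if held is None:
            continue
        window_start, subs = held
        k = window_start - PATH_START[b]  # position in this path's code
        if k >= 0 and k % WIDTH == 0:
            for ch, code in CODES.items():
                acc[b][ch] += pick(subs, code[k:k + WIDTH])
            log.append((start + WIDTH, b + 1, window_start))

print(log[:3])
print(acc)
```

In this model the first copies of all three finger banks occur at the rising edge of clock 9, when the windows starting at items 0, 4, and 5 sit in Finger Banks #1, #2, and #3, respectively, which illustrates the partial de-skew obtained from the shift.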

## IV. EXPERIMENTAL RESULTS

In this section, we present simulation results for the performance of the proposed rake receiver and compare the performance with that of a conventional rake receiver. The simulation environment is given as follows.

- System model: WCDMA downlink receiver with a chipping rate of 3.84 Mcps and a sampling rate of 30.72 Msps
- Technology: TSMC 0.18 μm CMOS with a supply voltage of 1.8 V

We described a 4-bit parallel de-spreader, a finger bank with four fingers, and a conventional finger at the RTL (Register Transfer Level) and synthesized them using a commercial tool. The power dissipation for those blocks was estimated through simulation at the gate level, and the maximum power dissipation was obtained for the de-spreader and the finger bank by activating the entire circuits on every sampling clock. Table 1 shows the simulation results. The area in the table is expressed as the equivalent NAND2 gate count.

<sup>2</sup> Note that the binary value of the code (1,-1,-1,1) is (0,1,1,0), whose equivalent decimal value is 6.

The proposed finger bank has four fingers but dissipates only about twice as much power as one conventional finger and occupies only about three times as much area. This is because the fingers of a finger bank are simpler than the conventional one due to the absence of the de-spreader blocks.

|                  | Proposed<br>De-spreader | Proposed<br>Finger Bank | Conventional finger |
|------------------|-------------------------|-------------------------|---------------------|
| Power<br>(mW)    | 3.1                     | 0.51                    | 0.24                |
| Area<br>(#NAND2) | 9.83 k                  | 9.26 k                  | 3.31 k              |

Using the basic blocks, we implemented the proposed rake receiver and a conventional receiver with various numbers of fingers and obtained the power dissipation and the number of gates. The results are shown in Figures 5 and 6.



Figure 5: Power Dissipation of the Two Types of Rake Receivers



Figure 6: Area of the Two Types of Rake Receivers

As expected, the proposed receiver becomes more efficient in power and in area as the number of fingers increases. The crossover point is at about ten fingers for both power dissipation and area. Because the number of fingers in a rake receiver for a typical wireless system is often above 100, the proposed rake receiver saves power and area for practical systems. For example, a WCDMA system has eight data/control channels [5]. Assuming support of four multipaths and the processing of on-time, early-time, and late-time sampled pilot signals, the required number of fingers for the system is 128. The performance of the two types of rake receivers is summarized in Table 2. The proposed receiver saves 36.8 % of power and 27.7 % of area compared with the conventional receiver. This is a notable achievement, as reduction of both power and area is rarely reported in low-power designs.

|                               | Power   | Area<br>(# NAND2) |
|-------------------------------|---------|-------------------|
| Proposed<br>Rake Receiver     | 19.4 mW | 306.2 K           |
| Conventional<br>Rake Receiver | 30.7 mW | 423.7 K           |
| Savings                       | 36.8 %  | 27.7 %            |

Table 2: Comparison of the Performance for a WCDMA System
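The figures in Table 2 can be reproduced from the block-level results in Table 1 with a short Python sketch. It assumes that the 128 fingers of the proposed receiver are grouped into 32 finger banks of four fingers that share a single de-spreader, and that power and area simply add up over the blocks; this composition is an assumption that happens to match Table 2 and is not stated explicitly in the text.

```python
# Block-level results from Table 1 (power in mW, area in thousands of NAND2).
DESPREADER = {"power_mw": 3.1, "area_k": 9.83}
FINGER_BANK = {"power_mw": 0.51, "area_k": 9.26}  # four fingers per bank
CONVENTIONAL_FINGER = {"power_mw": 0.24, "area_k": 3.31}
FINGERS_PER_BANK = 4


def totals(fingers):
    """Return (proposed, conventional, savings in %) for a finger count."""
    banks = fingers // FINGERS_PER_BANK
    # Assumption: one shared de-spreader plus `banks` finger banks.
    proposed = {k: DESPREADER[k] + banks * FINGER_BANK[k] for k in DESPREADER}
    conventional = {k: fingers * CONVENTIONAL_FINGER[k]
                    for k in CONVENTIONAL_FINGER}
    savings = {k: 100.0 * (1.0 - proposed[k] / conventional[k])
               for k in proposed}
    return proposed, conventional, savings


proposed, conventional, savings = totals(128)
for name, values in (("proposed", proposed),
                     ("conventional", conventional),
                     ("savings %", savings)):
    print(name, {k: round(v, 1) for k, v in values.items()})
```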

## V. CONCLUSION

In this paper, an area and power efficient rake receiver architecture is proposed for base and mobile stations employing the direct sequence spread spectrum (DSSS) technique. In the proposed design, one common parallel de-spreader provides precomputed sub-symbols to the fingers, and, hence, each finger can operate at a lower clock speed to save power. Our simulation results indicate that the proposed rake architecture for a WCDMA system reduces power dissipation by 37 % and circuit complexity by 28 % compared with a conventional rake receiver.

## REFERENCES

- [1] R. Baghaie and T. Laakso, "Implementation of low-power CDMA RAKE receivers using strength reduction transformation," Proceedings of IEEE Nordic Signal Processing Symposium, NORSIG'98, Vigsø, Denmark, pp. 169-172, June 1998.
- [2] R. Baghaie, "Application of transformation techniques in CDMA receivers," Proceedings of IEEE Midwest Symposium on Circuits and Systems, MWSCAS'99, Las Cruces, New Mexico, August 1999.
- [3] M. Neitola and T. Rahkonen, "An analog correlator for a WCDMA receiver," Proceedings of 17<sup>th</sup> NORCHIP Conference, pp. 86-9, 1999.
- [4] H.J. Lee and D.S. Ha, "A New Low-Power and Area Efficient Rake Receiver Design Without Incurring Performance Degradation," 2002 IEEE Int. Conf. ASIC/SOC, pp. 251-255, September 2002.
- [5] Third Generation Partnership Project Technical Specification Group Radio Access Network, "Spreading and modulation (FDD)," TS 25.213 V3.5.0 (2001-03).