Optimizing Chien Search usage in the BCH Decoder

S. Murali Narasimham\textsuperscript{1}, Ganesha A\textsuperscript{2}

Associate Professor, Dept. of E&C, Bangalore Institute of Technology, Bangalore, Karnataka, India\textsuperscript{1}

M. Tech Student [VLSI & ES], Dept. of E&C, Bangalore Institute of Technology, Bangalore, Karnataka, India\textsuperscript{2}

\textbf{ABSTRACT:} In the decoding of the Bose Chaudhuri-Hocquenghem (BCH) codes, the most complex block is the Chien search block. In the decoding process of the BCH codes, error correction is performed bit by bit, hence they require parallel implementation. The area required to implement the Chien search parallel is more, hence a strength reduced parallel architecture for the Chien search is presented. In this paper, the syndrome computation is done using conventional method, the inversion-less Berlekamp Massey Algorithm is used for the solving the key equations.

\textbf{KEYWORDS:} BCH Decoder, parallel Chien search architecture

\textbf{I. INTRODUCTION}

The algebraic codes which are widely used and most powerful are the BCH and RS codes \cite{3}. The powerful error correcting codes which are used more often in modern communication systems like wireless communication, optical communication, computer networks, magnetic recording systems, various storage devices are the BCH codes. Error pattern of size \(t\) or less can be designed in the BCH codes.

The three stages in the decoding process of the BCH codes are as follows: i) at first, the syndrome polynomial \(S(x)\) is calculated using the syndrome computation block, the received code word is given as input. The \(S(x)\) is given as input to the second stage; ii) the second block, the error locator polynomial is calculated by using different decoding algorithms such as \textit{Berlekamp Massey}, \textit{Modified Euclidean} etc. are used; iii) the last block, the Chien search block, where the roots of the error locator polynomial are calculated using the Chien search algorithm. The Chien search block involves more computations and complexity compared to other blocks.

It is frequently convenient to define error correcting codes in terms of the generator polynomials \(G(x)\). For the BCH code capable of correcting \(t\) errors, generator polynomial is taken as the LCM of the minimal polynomials.

\[ G(x) = \text{LCM} (\Phi_1, \Phi_2, \Phi_3, \Phi_4, \ldots, \Phi_{2t-1}) \]  

The binary BCH code \((n,k,t)\) exists for integer values \(m \geq 3\), \(t \leq 2^m - 1\), with there properties as given below:

\begin{align*}
    n &= 2^m - 1 & \text{length of code word} \\
    k &\geq n - mt & \text{number of information bits} \\
    d_{\text{min}} &\geq 2t+1 & \text{minimum hamming distance.} \\
    t & & \text{error correcting capability.}
\end{align*}

On the implementation of the parallel Chien search in area efficient manner is focused in this paper. Primitive binary BCH codes are considered in this paper.
BCH DECODER ARCHITECTURE

In Fig.1, the BCH decoder block diagram is shown. For a (n,k,t) BCH code in which, c(x) represents the transmitted code polynomial, r(x) the code polynomial that is received at the decoder end and e(x) represents the code polynomial where the error has occurred.

A. Syndrome Computation

The first step in the decoding of the BCH codes is to calculate 2t syndromes Sj by evaluating the received code polynomial r(x) at. The code polynomial that is received can be represented by:

\[ r(x) = (c(x) + e(x)) \tag{2} \]

The calculation of the 2t syndromes is given by

\[ S_j = \sum_{i=0}^{2t-1} r_{i-j} \tag{3} \]

Where is the root of the primitive polynomial. We can also define syndrome polynomial

\[ S_j = \sum_{i=0}^{2t-1} S_i + 1 \cdot x^i \quad 1 \leq j \leq 2t \tag{4} \]

B. Key Equation Solver

The co-efficient of the error locator polynomial in the decoding of the BCH codes are calculated in this stage. The input to this stage are the syndromes that are generated in the first stage of the decoding of the BCH codes. The error locator polynomial is given as: \( \sigma(x) = \sigma_0 + \sigma_1 x + \cdots + \sigma_t x^t \). The error locator polynomial is related with the syndromes generated in the first stage by the following relationship:

\[ \sum_{i=0}^{t} S_{t+i-j} \sigma_j \tag{5} \]

The co-efficient of the error locator polynomial can be calculated by making use different decoding algorithms. In this paper the error locator polynomial co-efficient are found by the Inversion-less Berlekamp Massey Algorithm to solve the key equation presented in the [1] is used in this paper. The output of this block is the error locator polynomial \( \sigma(x) \).

C. Chien Search

In decoding process, the error locator polynomial \( \sigma(x) \) is acquired by performing the syndrome estimation and then solving the key equation. The Chien search block comprehensively analyzes whether a root of \( \Lambda(x) \) is for \( i=0, 1, \ldots, n-1 \); i.e., it checks whether yields zero or not for the following equation:
\[ \sigma(\alpha^l) = \sum_{j=0}^{n-1} \sigma_j \alpha^{jl} \] (6)

The above equation gives the straightforward execution of the Chien search block. The routine Chien search circuit is as shown in Fig. 2.

Fig. 2. Conventional Chien Search

It produces an error-vector \( e \) in manner that, if is the root, then the \((n-i)^{th}\) component \( e_{n-i} = 1 \); otherwise \( e_{n-i} = 0 \) for all \( 0 \leq i \leq n-1 \). In the conventional Chien search circuit only single error location is checked for on clock cycle, therefore it requires \( n \) clock cycles to complete the search process. The parallel architecture for the Chien search [5] can be used in order to replace this traditional Chien search circuit as shown in Fig. 3.

Moreover, without any hardware complexity, the long critical path in the parallel design can be adequately shortened as discussed in [6], [9]. The parallelization of the Chien circuit reduces the \( n \) clock cycles required for computation to \( \frac{n}{p} \) clock cycles with the parallel factor \( p \). The hardware requirement for the parallel process also increases linearly with \( p \). Thus the efficient decoder complexity basically relies upon the design of proficient Parallel Chien Search circuit.

Fig. 3. Parallel Architecture for Chien Search

III. PROPOSED ARCHITECTURE

The given \((n,k,t)\) BCH code is built on \( GF(2^m) \). The parallel Chien search circuit is more area consuming. The strength reduced parallel Chien search process is as given in Fig. 4.
The (15,7,2) BCH code is built on GF(2^4). The proposed Chien search circuit for the BCH decoder is as given in the block diagram shown in Fig. 4.

A. Chien Search Block

The Chien search circuit and error correction block is the biggest area and timing consumption. The signals Zs and Zb is the output from control unit to indicate zero syndrome. If Zs = 1 then no error occurs in the received polynomial code due to S(x)=0. If Zb =1 then only one error occurs in the received polynomial code. We should use multiplexer to select original bit or correction bit. To reduce the critical path, pipelined register must be added to the selector of the output. It can reduce the critical path significantly.

To obtain the optimal design, the architecture of the constant finite field multiplier has to be optimized by reducing the number of XOR gate. In table I, we can see that the coefficients of constant FFM have the same pattern between the equation and each other. By implementing this circuit, the count of XOR gates can be minimized, by reducing the same pattern coefficients. To replace FFM, block Ci and block bo is used as shown in Fig. 5.

To find the roots of polynomials, each element in GF(2^4) has to be evaluated. Since there are 15 elements, we need signals C0-C14 to form Ci block. The assignment is determined based on coefficient pattern of FFM. The input of bocan be transformed to perform an optimized computation by manipulating its bits. As we know, the error locator polynomial is defined as:

\[ b(x) = x^2 + b_1 x + b_0 \]  \hspace{1cm} (7)
so the value of b0+x2 can be computed in parallel computation by transforming the bits of b0. If x is the element of GF(2^4) and α = α₀, α¹, ..., α₁₄ then

\[ x^2 = \alpha^{21} = \alpha^6, \alpha^7, ..., \alpha^{14}, \alpha^1, \alpha^3, ..., \alpha^{13}. \]

### Table I

**Equation Of Power Constant FFM Over GF(2^4)**

<table>
<thead>
<tr>
<th>FFM Type</th>
<th>GF Polynomial Equation</th>
</tr>
</thead>
<tbody>
<tr>
<td>Pow(2)</td>
<td>a₃x³ + (a₁+a₂)x² + a₂x + (a₀+a2)</td>
</tr>
<tr>
<td>Pow(4)</td>
<td>a₃x⁴+(a₂+a₃)x³+(a₁+a₃)x+(a₀+a₁+a₂+a₃)</td>
</tr>
<tr>
<td>Const α₀</td>
<td>a₃x³+a₂x²+a₁x+a₀</td>
</tr>
<tr>
<td>Const α₁</td>
<td>a²x³+a₁x²+a₀x+a₃</td>
</tr>
<tr>
<td>Const α²</td>
<td>a₁x³+(a₀+a₃)x²+(a₂+a₃)x+a₂</td>
</tr>
<tr>
<td>Const α₃</td>
<td>(a₀+a₃)x³+(a₂+a₃)x²+(a₁+a₂)x+a₁</td>
</tr>
<tr>
<td>Const α⁴</td>
<td>(a₂+a₃)x³+(a₁+a₂)x²+(a₀+a₁+a₃)x+a₀+a₃</td>
</tr>
<tr>
<td>Const α⁵</td>
<td>(a₁+a₂)x³+(a₀+a₁+a₃)x²+(a₂+a₃)x+a₂</td>
</tr>
<tr>
<td>Const α⁶</td>
<td>(a₀+a₁+a₃)x³+(a₀+a₂)x²+(a₁+a₃)x+a₁+a₂</td>
</tr>
<tr>
<td>Const α⁷</td>
<td>(a₀+a₂)x³+(a₁+a₃)x²+(a₀+a₂+a₃)x+(a₀+a₁+a₃)</td>
</tr>
<tr>
<td>Const α⁸</td>
<td>(a₁+a₃)x³+(a₀+a₂+a₃)x²+(a₁+a₂+a₃)x+(a₀+a₂)</td>
</tr>
<tr>
<td>Const α⁹</td>
<td>(a₀+a₂+a₃)x³+(a₁+a₂+a₃)x²+(a₀+a₁+a₂+a₃)x+(a₁+a₃)</td>
</tr>
<tr>
<td>Const α₁₀</td>
<td>(a₁+a₂+a₃)x³+(a₀+a₁+a₂+a₃)x²+(a₀+a₁+a₂)x+(a₀+a₂+a₃)</td>
</tr>
<tr>
<td>Const α₁₁</td>
<td>(a₀+a₁+a₂+a₃)x³+(a₀+a₁+a₂)x²+(a₀+a₁+a₂+a₃)</td>
</tr>
<tr>
<td>Const α₁₂</td>
<td>(a₀+a₁+a₂+a₃)x³+(a₀+a₁+a₂)x²+(a₀+a₁+a₂+a₃)</td>
</tr>
<tr>
<td>Const α₁₃</td>
<td>a₀x³+a₃x²+a₂x+(a₀+a₁)</td>
</tr>
</tbody>
</table>
B. Control Unit
To perform our decoder system, we employ a simple control unit block that can support error correction. If there is no error in r(X), then the value of S1 and S3 is zero. If only one error occurs, then b0 of error locator polynomial is zero. The design of the control unit is as appeared if Fig. 6.

![Control Unit Block Diagram](image)

**Fig. 6. Control Unit Block Diagram**

IV. RESULTS

The design of the BCH decoder architectures described in the previous sections namely conventional, strength reduced architecture are expected to be simulated in Cadence NC Verilog simulator tool. In order to compare the complexity overhead and power consumptions, the two variants of the BCH decoder described in the previous in the paper are expected to be simulated using Cadence Encounter RTL compiler tool.

The power and area parameters of the BCH decoder design is given in the table II.

<table>
<thead>
<tr>
<th>Area(um²)</th>
<th>Power(nW)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Design in [8]</td>
<td>14051</td>
</tr>
<tr>
<td>Proposed</td>
<td>6961</td>
</tr>
</tbody>
</table>

V. CONCLUSION

This paper presents the strength reduced parallel Chien search architecture which uses different set of constant power FFM over GF(2^

REFERENCES