Thesis v1

An Efficient FPGA Implementation of IEEE
802.16e LDPC Encoder
Speaker: Chau-Yuan-Yu
Advisor: Mong-Kai Ku
Outline
Introduction
Low-Density Parity-Check Codes
Related work
General encoding for LDPC codes
Efficient encoding for Dual-Diagonal matrix
Better Encoding scheme
LDPC Encoder Architecture
Parallel Encoder
Serial Encoder
Result
Conclusion
Low-Density Parity-Check Code
Benefits of LDPC codes
Performance approaching the Shannon limit
Low error floor
LDPC codes are adopted by various standards (e.g. DVB-S2, 802.11n, 802.16e)
The parity-check matrix H is sparse
Very few 1s in each row and column
The null space of H is the codeword space
Valid Codeword
In an (n, k) block code, k information bits are encoded into an n-bit codeword.
In a systematic block code, the information bits appear unchanged in the codeword.
Codeword layout: [ systematic part | parity part ]

General encoding of systematic linear block codes
Find the generator matrix G from H.
c = sG = [s | p]
Issues with LDPC codes
The size of G is very large.
G is generally not sparse.
The encoding complexity is therefore very high.
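As a small illustration of c = sG = [s | p], the sketch below uses a toy systematic (7, 4) code. The matrix A, the helper names and the message are assumptions chosen for illustration; they are not taken from the 802.16e standard. For H = [A | I], a generator matrix is G = [I | A^T], and every codeword must give an all-zero syndrome H c^T.

```python
# Toy systematic (7, 4) code; A is an illustrative choice, not an 802.16e matrix.
A = [[1, 1, 0, 1],
     [1, 0, 1, 1],
     [0, 1, 1, 1]]          # (n-k) x k part of H = [A | I]
K, M = 4, 3


def generator_from_parity_check(a, k, m):
    """Build G = [I_k | A^T] for H = [A | I_m] over GF(2)."""
    return [[1 if c == r else 0 for c in range(k)] + [a[j][r] for j in range(m)]
            for r in range(k)]


def encode(s, g):
    """Compute c = s G over GF(2)."""
    n = len(g[0])
    return [sum(s[r] * g[r][c] for r in range(len(s))) % 2 for c in range(n)]


def syndrome(c, a, k, m):
    """Compute H c^T for H = [A | I_m]; all zeros for a valid codeword."""
    return [(sum(a[j][i] * c[i] for i in range(k)) + c[k + j]) % 2
            for j in range(m)]


G = generator_from_parity_check(A, K, M)
s = [1, 0, 1, 1]
c = encode(s, G)
print(c)                      # systematic part first, then parity part
print(syndrome(c, A, K, M))   # all zeros for a valid codeword
```

Even in this toy case G is denser than H, which hints at why encoding through G becomes expensive for long LDPC codes.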
Structured LDPC Codes
Quasi-Cyclic LDPC Codes
In QC-LDPC codes, H can be partitioned into square sub-blocks of size z x z.
Each sub-block is either a z x z zero matrix or a cyclically shifted (permuted) identity matrix.
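The block structure can be sketched by expanding a small base matrix into the full H. In this sketch an entry of -1 is assumed to mark a zero sub-block and an entry s >= 0 an identity matrix cyclically shifted by s; the base matrix BASE and the function names are hypothetical and only serve as an example.

```python
def circulant(z, shift):
    """Return the z x z identity cyclically shifted by `shift`; -1 gives zeros."""
    if shift < 0:
        return [[0] * z for _ in range(z)]
    return [[1 if c == (r + shift) % z else 0 for c in range(z)]
            for r in range(z)]


def expand(base, z):
    """Replace every base-matrix entry by its z x z sub-block."""
    h = []
    for base_row in base:
        blocks = [circulant(z, s) for s in base_row]
        for r in range(z):
            h.append([bit for blk in blocks for bit in blk[r]])
    return h


BASE = [[1, -1, 0],
        [2, 0, -1]]   # hypothetical toy base matrix
H = expand(BASE, 3)
for row in H:
    print(row)
```

Only the base matrix and z need to be stored, which is what makes the matrix memory of a QC-LDPC encoder small.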
Structured LDPC Codes
QC Codes With Dual-Diagonal Structure
In the IEEE standards, QC-LDPC codes have a dual-diagonal parity structure.
We take an 802.16e code rate matrix as an example.
An entry of 0 represents the identity matrix.
General Encoding for LDPC Codes
Richardson and Urbanke (RU) algorithm
Partition the H matrix into several sub-matrices.
Within H, the part T is a lower triangular matrix.
General Encoding for LDPC Codes
Richardson and Urbanke (RU) algorithm
Complexity of computing p0: O(n + g^2)
Complexity of computing p1: O(n + g^2)
Efficient Encoding for Dual-Diagonal LDPC Codes
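A valid codeword c = [s | p] must satisfy H c^T = 0.
Split H = [Hs | Hp] into the part Hs acting on the information bits and the part Hp acting on the parity bits, and replace Hp by the dual-diagonal matrix.
Define the lambda value as lambda = Hs s^T. From the equation, we obtain Hp p^T = lambda (all sums are over GF(2)).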
Related Work (1) Sequential Encoding
Encoding scheme (one-way derivation)
Step 1
Compute the lambda values with the matrix operation lambda = Hs s^T
Step 2
Determine the parity vector P0 by adding all the lambda values
Step 3
Obtain the rest of the parity vectors by exploiting the dual-diagonal matrix T
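The three steps above can be sketched on a toy code. Everything concrete here is an assumption made for illustration: the expansion factor Z, the information part HS, and a parity part whose first column has the same shift A_SHIFT in the first and last rows and an unshifted identity in row X, followed by a dual-diagonal of identities. The last row is not used during encoding; it is satisfied automatically, which the syndrome check confirms.

```python
Z = 4                      # toy expansion factor (assumption)
HS = [[1, -1],             # hypothetical information part of the base matrix
      [-1, 2],             # -1 marks a zero block, s >= 0 a cyclic shift
      [3, 0],
      [0, 1]]
A_SHIFT, X = 1, 2          # weight-3 column: A_SHIFT in rows 0 and 3, 0 in row X


def rot(v, s):
    """Multiply a block by the identity cyclically shifted by s."""
    return [v[(r + s) % len(v)] for r in range(len(v))]


def xor(*vs):
    """Bitwise GF(2) sum of equally long blocks."""
    return [sum(bits) % 2 for bits in zip(*vs)]


def lambdas(s_blocks):
    """Step 1: lambda_i = sum over j of (shifted information block j)."""
    zero = [0] * Z
    return [xor(zero, *[rot(s_blocks[j], sh) for j, sh in enumerate(row) if sh >= 0])
            for row in HS]


def encode_sequential(s_blocks):
    lam = lambdas(s_blocks)
    mb = len(lam)
    p = [xor(*lam)]                            # Step 2: P0 = sum of all lambda
    p.append(xor(lam[0], rot(p[0], A_SHIFT)))  # row 0 gives P1
    for i in range(1, mb - 1):                 # Step 3: walk down the diagonal
        nxt = xor(lam[i], p[i])
        if i == X:                             # row X also contains P0
            nxt = xor(nxt, p[0])
        p.append(nxt)
    return p


def syndromes(s_blocks, p):
    """Evaluate every block row of H c^T; all zeros for a valid codeword."""
    lam = lambdas(s_blocks)
    mb = len(lam)
    out = []
    for i in range(mb):
        row = lam[i]
        if i in (0, mb - 1):
            row = xor(row, rot(p[0], A_SHIFT))
        if i == X:
            row = xor(row, p[0])
        if i >= 1:
            row = xor(row, p[i])
        if i <= mb - 2:
            row = xor(row, p[i + 1])
        out.append(row)
    return out


s = [[1, 0, 1, 1], [0, 1, 1, 0]]
p = encode_sequential(s)
print(p)
print(syndromes(s, p))
```

Each parity block depends on the previous one, so Step 3 is inherently serial; this is the delay that a two-way derivation aims to shorten.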
Related Work (2) Arbitrary Bit-generation and Correction Encoding
In [1], an alternative encoding for the standard matrices was presented.
The matrix is modified in the parity portion of the weight-3 column set: the entries are replaced with zero cyclic shifts.
H can be divided into three sub-matrices:
The information bit region A
The parity bit region Q, for the bit-flipping operation
The parity bit region U, for non-bit-flipping
[1] C. Yoon, E. Choi, M. Cheong, and S.-K. Lee, "Arbitrary bit generation and correction technique for encoding QC-LDPC codes with dual-diagonal parity structure," IEEE Wireless Communications and Networking Conference (WCNC 2007), pp. 662-666, March 2007.
Encoding scheme (one-way derivation)
Step 1
Compute the lambda values with the matrix operation lambda = A s^T
Step 2
Set P0 to arbitrary binary values and solve for the unknown parity bits
Step 3
Compute the correction vector f from P0
Step 4
Add the correction vector to the parity bits in region Q to correct them
Advantage
Low-complexity encoding
The number of additions required is smaller than in the RU scheme
Drawback
Not directly applicable to the standard codes
Modifying the matrix degrades the code performance
Better encoding scheme
Advantages of the encoding scheme proposed in [2]
Low-complexity encoding
Directly applicable to the matrices defined in the IEEE standards, without any modification
Achieves a higher level of parallelism
[2] C.-Y. Lin, C.-C. Wei, and M.-K. Ku, "Efficient Encoding for Dual-Diagonal Structured LDPC Code Based on Parity bits Prediction and Correction," IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), pp. 1648-1651, Dec. 2008.
Better Encoding Scheme
Two-way derivation reduces the encoding delay.
Step 1
Set P0 to any binary vector.
Step 2
Compute the lambda values with the matrix operation lambda = Hs s^T.
Step 3
[Forward Derivation] of the prediction parity vectors
Step 4
[Backward Derivation] of the prediction parity vectors
Step 5
Compute P0 by adding the prediction parity vectors.
Step 6
Compute the correction vector f = (P0)^(d), that is, P0 cyclically shifted by d.
Step 7
Correct the prediction parity vectors by adding f.
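A minimal sketch of the seven steps on a toy code follows. The matrix HS, the expansion factor Z, the shift A_SHIFT (playing the role of d) and the middle row X are assumptions for illustration, and P0 is guessed as the all-zero vector in Step 1. The sketch folds lambda of the middle row into the P0 computation explicitly, which may differ in bookkeeping from the hardware description but yields the same P0.

```python
Z = 4                                   # toy expansion factor (assumption)
HS = [[1, -1], [-1, 2], [3, 0],         # hypothetical information part,
      [0, 1], [2, -1], [-1, 3]]         # -1 marks a zero block
A_SHIFT, X = 1, 3                       # weight-3 column: A_SHIFT in first/last row, 0 in row X


def rot(v, s):
    """Multiply a block by the identity cyclically shifted by s."""
    return [v[(r + s) % len(v)] for r in range(len(v))]


def xor(*vs):
    """Bitwise GF(2) sum of equally long blocks."""
    return [sum(bits) % 2 for bits in zip(*vs)]


def lambdas(s_blocks):
    zero = [0] * Z
    return [xor(zero, *[rot(s_blocks[j], sh) for j, sh in enumerate(row) if sh >= 0])
            for row in HS]


def encode_two_way(s_blocks):
    lam = lambdas(s_blocks)                     # Step 2
    mb = len(lam)
    pred = [None] * mb                          # pred[i] predicts P_i with P0 guessed as 0
    pred[1] = lam[0]                            # Step 3: forward, rows 0 .. X-1
    for i in range(1, X):
        pred[i + 1] = xor(lam[i], pred[i])
    pred[mb - 1] = lam[mb - 1]                  # Step 4: backward, rows mb-1 .. X+1
    for i in range(mb - 2, X, -1):
        pred[i] = xor(lam[i], pred[i + 1])
    p0 = xor(lam[X], pred[X], pred[X + 1])      # Step 5: middle row yields P0
    f = rot(p0, A_SHIFT)                        # Step 6: correction vector
    return [p0] + [xor(pred[i], f) for i in range(1, mb)]   # Step 7


def syndromes(s_blocks, p):
    """Evaluate every block row of H c^T; all zeros for a valid codeword."""
    lam = lambdas(s_blocks)
    mb = len(lam)
    out = []
    for i in range(mb):
        row = lam[i]
        if i in (0, mb - 1):
            row = xor(row, rot(p[0], A_SHIFT))
        if i == X:
            row = xor(row, p[0])
        if i >= 1:
            row = xor(row, p[i])
        if i <= mb - 2:
            row = xor(row, p[i + 1])
        out.append(row)
    return out


s = [[1, 0, 1, 1], [0, 1, 1, 0]]
p = encode_two_way(s)
print(syndromes(s, p))
```

The forward and backward loops touch disjoint rows, so in hardware they can run at the same time; the correction then costs a single extra XOR per parity block.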
Based on the encoding scheme proposed before, we design both a parallel and a serial architecture.
Parallel architecture
Achieves a higher level of parallelism
High speed
Serial architecture
Parallel architecture (Stage 1)
[Block diagram: input data register, matrix memory, divider, barrel shifters #1 to #6, lambda position, accumulator, prediction, parity correct]
In this stage, the matrix memory selects the shift values and multiplies them by a specific value according to the code length.
Benefits:
1. Processing starts as soon as input data arrive, without waiting for all the input data.
2. The number of barrel shifters is reduced.
Shifter Value Computation
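Equation for computing the shift value p(f, i, j) for expansion factor z_f from the base value p(i, j) defined for z_0 = 96 (values p(i, j) <= 0 are left unchanged):
Normal code rates: p(f, i, j) = floor(p(i, j) * z_f / z_0)
Code rate 2/3 A code: p(f, i, j) = mod(p(i, j), z_f)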
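The two scaling rules of IEEE 802.16e can be sketched as one small function. The function name, its parameter names and the sample base value 94 are illustrative assumptions; z_0 = 96 is the largest expansion factor (code length 2304), and non-positive entries are passed through unchanged.

```python
Z0 = 96  # largest expansion factor, corresponding to code length 2304


def scaled_shift(p, zf, rate_2_3_a=False):
    """Shift value for expansion factor zf, derived from the base value p."""
    if p <= 0:
        return p                      # zero block (-1) and identity (0) are unchanged
    if rate_2_3_a:
        return p % zf                 # modulo rule for the rate 2/3 A code
    return (p * zf) // Z0             # floor scaling for the other code rates


print(scaled_shift(94, 24))                   # 23
print(scaled_shift(94, 24, rate_2_3_a=True))  # 22
print(scaled_shift(-1, 24))                   # -1
```

Computing the shifts on the fly in this way is what lets a single stored matrix per rate serve every code length, at the cost of a multiplier and divider.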
Implementation results of the two matrix storage approaches (multiple rates and lengths)
Approach                                Slices   FFs      LUTs     CLK (MHz)   Total gate count
One matrix + calculation IP             14,179   4,071    26,846   141.391     227,076
Using matrices to save shifter values   41,409   12,078   76,977   165.591     635,691

Divider: divides the data read from the matrix memory.
Input data register: stores the input data, which are used by the barrel shifters.
Barrel shifters #1 to #6: cyclically shift the input data by the shifter values.
Lambda position: records the row position of each shifter value (in the example, lambda positions 3, 8 and 11).
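A barrel shifter for a variable block size z can be modeled on integers as below. This is a behavioral sketch, not the hardware design: the function name and the shift direction (output bit r takes input bit (r + shift) mod z) are assumptions.

```python
def barrel_shift(word, shift, z):
    """Cyclically shift a z-bit word: output bit r = input bit (r + shift) mod z."""
    shift %= z
    mask = (1 << z) - 1
    return ((word >> shift) | (word << (z - shift))) & mask


w = 0b0001
print(bin(barrel_shift(w, 1, 4)))   # 0b1000
print(bin(barrel_shift(w, 0, 4)))   # 0b1
```

Because z ranges from 24 to 96 in 802.16e, the wrap-around point depends on the code length, which is why z is an explicit argument here.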
Accumulator: computes the lambda values by accumulating the shifted data over Kb clock cycles.
According to the lambda positions, rows 1, 2, 5, 8, 9 and 11 need to be accumulated in this clock cycle.
Prediction: computes the prediction vectors Pi by the equations
// forward derivation: running XOR of the accumulator outputs
P_0 <= acc_out0;
P_1 <= acc_out0 ^ acc_out1;
P_2 <= acc_out0 ^ acc_out1 ^ acc_out2;
P_3 <= acc_out0 ^ acc_out1 ^ acc_out2 ^ acc_out3;
P_4 <= acc_out0 ^ acc_out1 ^ acc_out2 ^ acc_out3 ^ acc_out4;
P_5 <= acc_out0 ^ acc_out1 ^ acc_out2 ^ acc_out3 ^ acc_out4 ^ acc_out5;
// backward derivation starts from the last row
P_11 <= acc_out11;
To save hardware area, one architecture computes the prediction values for the four different code rates.
In code rate 1/2, P_0 ~ P_11 are the prediction vectors.
In code rate 2/3, P_0 ~ P_3 and P_8 ~ P_11 are the prediction vectors.
In code rate 3/4, P_0 ~ P_2 and P_9 ~ P_11 are the prediction vectors.
In code rate 5/6, P_0 ~ P_1 and P_10 ~ P_11 are the prediction vectors.
Parity correct:
Step 1: Compute P0. In code rate 1/2, P0 = P5 ^ P6.
Step 2: Correct the other Pi using the equation Pi = Pi ^ P0.
Serial architecture (Stage 1)
[Block diagram: input data register, input control, matrix memory and divider, barrel shifters #1 and #2, accumulator & predict, correct]
Matrix memory and divider: same as Stage 1 of the parallel architecture.
Input control: in the first Kb clock cycles, the encoder processes the matrix column by column, from top to middle and from bottom to middle.

In the last clock cycles, the encoder processes the matrix row by row, from left to right.
Reasons:
1. Prepare the input data.
2. Reduce the number of slices.

Divider: divides the data read from the matrix memory.
Input control: chooses the corresponding input value for each barrel shifter (clock cycle #2 is taken as the example).

Barrel shifters: shift the input data according to the shifter value chosen by the mux.

Accumulator & predict: this module has three tasks:
1. Compute lambda_i
2. Compute Pi
3. Compute P0
Normally, this module accumulates the shifted data to compute lambda_i. When the data is the last value in its row, the module also computes Pi.

When all Pi have been computed, P0 is computed by XORing Px and Px+1, the two middle prediction vectors in the matrix.

Correct: corrects the other Pi using the equation Pi = Pi ^ P0.
Implementation Results
The proposed encoder, based on the IEEE 802.16e LDPC codes, supports code rates 1/2, 2/3, 3/4 and 5/6 and code lengths ranging from 576 to 2304.
The design was implemented and verified on Xilinx Virtex-4 and Altera Stratix field-programmable gate array (FPGA) devices.
Resources (the same for every code length and code rate): 14,179 slices, 4,071 FFs, 26,846 LUTs, CLK 141.391 MHz
Information throughput (IT) in Gbps:
Z    N      Rate 1/2   Rate 2/3   Rate 3/4   Rate 5/6
24   576    2.262      2.468      2.545      2.61
40   960    3.77       4.113      4.241      4.35
60   1440   5.656      6.17       6.363      6.526
80   1920   7.541      8.226      8.483      8.701
96   2304   9.049      9.872      10.18      10.441
Information throughput ranges from 2.262 to 10.441 Gbps.
The encoder area is constant for any code rate or code length.
For a given code rate, an increase in the code length increases the throughput.
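The tabulated throughputs are consistent with a simple model: information bits per codeword times clock frequency, divided by Kb + 6 clock cycles per codeword. The 6-cycle overhead is inferred from the numbers in the table and is not stated in the slides; the names below are likewise illustrative. Kb is the number of information block columns of the 802.16e base matrix (24 columns in total).

```python
F_CLK_MHZ = 141.391                                  # clock from the results table
KB = {"1/2": 12, "2/3": 16, "3/4": 18, "5/6": 20}    # information block columns
OVERHEAD_CYCLES = 6                                  # inferred from the table, an assumption


def info_throughput_gbps(z, rate):
    """Information throughput for expansion factor z at the given code rate."""
    k = KB[rate] * z                      # information bits per codeword
    cycles = KB[rate] + OVERHEAD_CYCLES   # clock cycles per codeword (assumed model)
    return k * F_CLK_MHZ / cycles / 1000.0


for z in (24, 40, 60, 80, 96):
    print(z, 24 * z, [round(info_throughput_gbps(z, r), 3) for r in KB])
```

Since the cycle count does not depend on z, throughput grows linearly with the code length while the area stays fixed, which matches the two remarks above.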
Serial architecture
Information throughput ranges from 0.867 to 4.019 Gbps.
For a given code rate, an increase in the code length increases the throughput.
Parallel architecture using row by row
[Charts: area comparison, IT comparison, IT/area comparison]
Compare to Related Work
We compare our implementation with [3].
Table 4.5a The synthesis result of [22] at code rate 1/2
Work in [3], rate 1/2:
Code length   Area (LE)   Clk (MHz)   IT (Gbps)   IT/Total area (Mb per LE)
576           3391        192.23      2.129       0.0612
960           5100        159.57      2.253       0.0648
1440          7012        164.83      2.697       0.0776
1920          8924        148.72      2.644       0.0761
2304          10339       148.41      2.758       0.0793
Total area: 34766 LE
Proposed, rate 1/2 (area 20960 LE and Clk 97.58 MHz for all code lengths):
Code length   IT (Gbps)   IT/Total area (Mb per LE)
576           1.561       0.07447
960           2.602       0.12414
1440          3.903       0.18621
1920          5.204       0.24828
2304          6.245       0.29794
Better throughput for longer code lengths
Less area is used to implement multiple code lengths and code rates
The clock cycle is shorter than in [3].
[3] S. Kopparthi and D. M. Gruenbacher, "Implementation of a flexible encoder for structured low-density parity-check codes," IEEE Pacific Rim Conference on Communications, Computers and Signal Processing (PacRim 2007), pp. 438-441, Aug. 2007.
The comparison of throughput
The proposed encoder outperforms the work in [3] in terms of throughput when the code length is longer than 1200.
The proposed encoder architecture provides better throughput for longer code lengths, while the work in [3] does not show this kind of speed-up.
The comparison of throughput/area ratio
The proposed encoder outperforms the work in [3] in terms of throughput/area ratio by 1.216 to 3.757 times.
The proposed encoder utilizes hardware resources more efficiently.
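The quoted range can be reproduced from the IT/area figures in the comparison table. The sketch assumes that the column with a separate area per code length belongs to [3] and the column with a single 20960 LE area belongs to the proposed encoder; the variable names are illustrative.

```python
# IT/area values (Mb per LE) at rate 1/2, copied from the comparison table.
CODE_LENGTHS = [576, 960, 1440, 1920, 2304]
REF3 = [0.0612, 0.0648, 0.0776, 0.0761, 0.0793]            # assumed to be [3]
PROPOSED = [0.07447, 0.12414, 0.18621, 0.24828, 0.29794]   # proposed encoder

ratios = [p / r for p, r in zip(PROPOSED, REF3)]
for n, ratio in zip(CODE_LENGTHS, ratios):
    print(n, round(ratio, 3))
```

The smallest ratio occurs at code length 576 and the largest at 2304, so the advantage grows with the code length.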
The comparison of throughput
The throughput of the proposed encoder is higher than that of [2] at all code rates and code lengths, by 1.237 to 1.963 times.
The comparison of throughput/area ratio
The throughput/area ratio is higher by 2.427 to 5.256 times.
The result shows that the proposed encoder utilizes hardware resources efficiently.
Compare to Related Work (Serial)
           Slices   FFs     LUTs     Block RAMs   CLK (MHz)   IT (Gbps)
[4]        4,724    1,807   8,335    81           186         3.34
Proposed   12,567   3,885   22,050   0            123.502     4.626
The proposed encoder achieves a higher IT at a lower clock frequency.
In the proposed encoder, the matrix information is built in, so no additional block RAMs are needed.
The IT/Area of our serial encoder is 0.3681 Mbps per slice, and the IT/Area of [4] is 0.1768.
[4] Jeong Ki Kim, Hyunseuk Yoo, and Moon Ho Lee, "Efficient Encoding Architecture for IEEE 802.16e LDPC Codes," IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 2008.
Conclusion
An efficient encoding architecture for IEEE 802.16e LDPC codes with multiple code lengths and code rates is implemented.
In our design, switching to a different code rate or code length only requires changing the type of the information data.
This architecture is also suitable for the IEEE 802.11n standard.
Our encoder achieves higher throughput and a better throughput/area ratio than the conventional encoding scheme when the code length is longer than 1200.
Thank you!!
