Vous êtes sur la page 1sur 56

An Efficient FPGA Implementation of IEEE

802.16e LDPC Encoder

Speaker: Chau-Yuan-Yu
Advisor: Mong-Kai Ku
Outline
Introduction
Low-Density Parity-Check Codes
Related work
General encoding for LDPC codes
Efficient encoding for Dual-Diagonal matrix
Better Encoder scheme
LDPC Encoder Architecture
Parallel Encoder
Serial Encoder
Result
Conclusion
Outline
Introduction
Low-Density Parity-Check Codes
Related work
General encoding for LDPC codes
Efficient encoding for Dual-Diagonal matrix
Better Encoding scheme
LDPC Encoder Architecture
Parallel Encoder
Serial Encoder
Result
Conclusion
Low-Density Parity-Check Code
Benefit of LDPC Codes.
Approaching Shannon limit

Low error floor

LDPC code is adopted by various standards


(e.g. DVB-S2, 802.11n, 802.16e)
Low-Density Parity-Check Code
Parity check matrix H is sparse
Very few 1s in each row and column

Null space of H is the codeword space

Valid Codeword
Low-Density Parity-Check Code
In (n, k) block codes, k-bit information data can be
encoded as n-bit codeword.

In systematic block codes, the information bits directly


exist in the bits of codeword.

Systematic Part Parity Part


Low-Density Parity-Check Code
General encoding of systematic linear block codes
Finding generator matrix G via H.
C = sG = [s | p]

Issues with LDPC codes


The size of G is very large.
G is not generally sparse.
Encoding complexity will be very high.
Structured LDPC Codes
Quasi-Cyclic LDPC Codes
In QC-LDPC, H can be partitioned into square sub-blocks of size z x
z.

Each sub-blocks can be Z x Z zero sub-block or identity matrix with


permutation.
Structured LDPC Codes
QC Codes With Dual-Diagonal Structure
In IEEE standards QC-LDPC Codes have Dual-Diagonal
parity structure.
We take 802.16e code rate matrix for example.

0 represent
identity matrix.
Outline
Introduction
Low-Density Parity-Check Codes
Related work
General encoding for LDPC codes
Efficient encoding for Dual-Diagonal matrix
Better Encoding scheme
LDPC Encoder Architecture
Parallel Encoder
Serial Encoder
Result
Conclusion
General Encoding for LDPC Codes
Richardson and Urbanke (RU) algorithm
Partition the H matrix into several sub-matrix.
In H, the part T is a low triangle matrix.
General Encoding for LDPC Codes
Richardson and Urbanke (RU) algorithm

p0 O(n+g2)

p1 O(n+g2)
Efficient Encoding for Dual-Diagonal LDPC Codes
A valid codeword c = [s|p] must satisfy

Replace by dual-diagonal matrix

Information bits Parity bits


Define lambda value as From equation, we obtained
Related Work (1) Sequential Encoding

Encoding scheme
One-way derivation
Step 1
Compute lambda value by doing
matrix operation x = HsS

Step 2
Determines parity vector P0 by
adding all the lambda value

Step 3
Rest of parity vector is obtained
by exploiting dual-diagonal
matrix T
Related Work (2) Arbitrary Bit-generation and Correction Encoding
In [1], an alternative encoding for standard matrix was presented.

Matrix will be modify by


parity portion of weight-3 A Q U
column set.

H can be sectorized into


three sub matrices
The information bit region A
The parity bit region Q

for bit-flipping operation


The parity bit region U

for non bit-flipping.


Replace with zero
cyclic shift
[1] C. Yoon, E. Choi, M. Cheong, and S.-K. Lee, "Arbitrary bit generation and correction technique for encoding QC-LDPC codes
with dual-diagonal parity structure," IEEE Wireless Communications and Networking Conference, (WCNC 2007), pp. 662-666,
March 2007.
Related Work (2) Arbitrary Bit-generation and Correction Encoding
Encoding scheme
Step 1 One-way derivation
Compute lambda value by doing
matrix operation x = As

Step 2
Set P0 as arbitrary binary
values. solve unknown parity
bits

Step 3
Computed correction vector f
from P0

Step 4
Add correction vector to parity
bits in region Q to correct them
Related Work (2) Arbitrary Bit-generation and Correction Encoding

Advantage
Low-complexity encoding
The number of addition required is less than RU scheme
Drawback
Can not directly applicable to standard code
Modifying matrix will decrease code performance
Outline
Introduction
Low-Density Parity-Check Codes
Related work
General encoding for LDPC codes
Efficient encoding for Dual-Diagonal matrix
Better Encoding scheme
LDPC Encoder Architecture
Parallel Encoder
Serial Encoder
Result
Conclusion
Better encoding scheme
Advantages of the encoding scheme
proposed in [2]

Low-complexity encoding

Can directly applicable to matrices defined in


IEEE standards without any modification

Achieve higher level parallelism

[3] C.-Y. Lin, C.-C. Wei, and M.-K. Ku, "Efficient Encoding for Dual-Diagonal Structured LDPC Code Based on Parity bits Prediction and
Correction," IEEE Asia Pacific Conference on Circuits and Systems (APPCCAS), pp.1648-1651, Dec. 2008.
Better Encoding Scheme
Step 1
Set P0 as any binary vector
Correct prediction vector by f
Step 2
Compute lambda value by
doing matrix operation Hs

Step 3
[Forward Derivation]

Step 4
[Backward Derivation]

Step 5
Compute the P0 by adding
prediction parity vector

Step 6
Compute the correction vector f

Step 7
Compute P0 by adding prediction vector
Correct prediction parity by
adding f Compute correction vector f f = (P0)d
Better Encoding Scheme
Step 1
Set P0 as any binary vector. Reduce encoding delay !!
Step 2 Two-way derivation
Compute lambda value by
doing matrix operation Hs.

Step 3
[Forward Derivation]

Step 4
[Backward Derivation]

Step 5
Compute the P0 by adding
prediction parity vector.

Step 6
Compute the correction vector f.

Step 7
Correct prediction parity by
adding f.
Outline
Introduction
Low-Density Parity-Check Codes
Related work
General encoding for LDPC codes
Efficient encoding for Dual-Diagonal matrix
Better Encoding scheme
LDPC Encoder Architecture
Parallel Encoder
Serial Encoder
Result
Conclusion
LDPC Encoder Architecture
Based on the encoding scheme proposed bedore, we design both
parallel and serial architecture.

Parallel architecture
Achieve higher level parallelism
High-speed

Serial architecture
Parallel architecture

Barrel shifter#1

divider
Prediction
Matrix Parity
Accumulator Correct
memory

Barrel shifter#6

Input data register lambda position


Parallel architecture (Stage 1)

Barrel shifter#1

divider
Prediction
Matrix Parity
Accumulator Correct
memory

Barrel shifter#6

Input data register lambda position

Benefit:
In this stage, matrix 1.When the input data
select the shift values is coming, it can work
and multiply specific immediately without all
value according to the input data are
the code length. coming.
2.Reduce the numbers
of barrel shifter.
Shifter Value Computation
Equation for computing shift value

Normal code rate :

Code rate 2 3 A code :

Two type of matrix implement result with multiple rate and length

Slice FFs LUTs CLK Total gate

(MHz) count

One matrix + 14,179 4,071 26,846 141.391 227,076

calculate IP

Using matrices to 41,409 12,078 76,977 165.591 635,691

save shifter value


Parallel architecture (Stage 2)

Barrel shifter#1

divider
Prediction
Matrix Parity
Accumulator Correct
memory

Barrel shifter#6

Input data register lambda position

Divide the datas from This module used to save


matrix. the input data. These data
are used in barrel shifters.
Parallel architecture (Stage 3)

Barrel shifter#1

divider
Prediction
Matrix Parity
Accumulator Correct
memory

Barrel shifter#6

Input data register lambda position

Lambda position = 3

These module are This module records the


used to circulated row position of the
shift the input data shifter values
Lambda position = 8

Lambda position = 11

Shifter value
Parallel architecture (Stage 4)

Barrel shifter#1

divider
Prediction
Matrix Parity
Accumulator Correct
memory

Barrel shifter#6

Input data register lambda position

According to the Computed the


lambda position, in lambda value by
this clock cycle 1, 2, accumulating the
5, 8, 9, 11 need to be shifted data after Kb
accumulated. clock cycle

Kb
Parallel architecture (Stage 5)

Barrel shifter#1

divider
Prediction
Matrix Parity
Accumulator Correct
memory

Barrel shifter#6

Input data register lambda position

Computed the
prediction vector Pi
by equation
Parallel architecture (Stage 5)
P_0 <= acc_out0;
P_1 <= acc_out0 ^ acc_out1;
P_2 <= acc_out0 ^ acc_out1 ^ acc_out2;
P_3 <= acc_out0 ^ acc_out1 ^ acc_out2 ^ acc_out3;
P_4 <= acc_out0 ^ acc_out1 ^ acc_out2 ^ acc_out3 ^ acc_out4;
P_5 <= acc_out0 ^ acc_out1 ^ acc_out2 ^ acc_out3 ^ acc_out4 ^ acc_out5;
P_6 <= acc_out11 ^ acc_out10 ^ acc_out9 ^ acc_out8 ^ acc_out7 ^ acc_out6;
P_7 <= acc_out11 ^ acc_out10 ^ acc_out9 ^ acc_out8 ^ acc_out7;
P_8 <= acc_out11 ^ acc_out10 ^ acc_out9 ^ acc_out8;
P_9 <= acc_out11 ^ acc_out10 ^ acc_out9;
P_10 <= acc_out11 ^ acc_out10;
P_11 <= acc_out11;

For saving the hardware area, we use In code rate 21 / 3,


2, P_0 ~ P_3
P_11
one architecture to compute the are the prediction
P_8~P_11are the prediction
prediction values for four different
code rate.
Parallel architecture (Stage 5)
P_0 <= acc_out0;
P_1 <= acc_out0 ^ acc_out1;
P_2 <= acc_out0 ^ acc_out1 ^ acc_out2;
P_3 <= acc_out0 ^ acc_out1 ^ acc_out2 ^ acc_out3;
P_4 <= acc_out0 ^ acc_out1 ^ acc_out2 ^ acc_out3 ^ acc_out4;
P_5 <= acc_out0 ^ acc_out1 ^ acc_out2 ^ acc_out3 ^ acc_out4 ^ acc_out5;
P_6 <= acc_out11 ^ acc_out10 ^ acc_out9 ^ acc_out8 ^ acc_out7 ^ acc_out6;
P_7 <= acc_out11 ^ acc_out10 ^ acc_out9 ^ acc_out8 ^ acc_out7;
P_8 <= acc_out11 ^ acc_out10 ^ acc_out9 ^ acc_out8;
P_9 <= acc_out11 ^ acc_out10 ^ acc_out9;
P_10 <= acc_out11 ^ acc_out10;
P_11 <= acc_out11;

For saving the hardware area, we use In code rate 53 / 6,


4, P_0 ~ P_1
P_2
one architecture to compute the P_9~P_11 are the
P_10~P_11are theprediction
prediction
prediction values for four different vectors
code rate.
Parallel architecture (Stage 6)

Barrel shifter#1

divider
Prediction
Matrix Parity
Accumulator Correct
memory

Barrel shifter#6

Input data register lambda position

Step2: Step1:
Correct the other Pi. Compute the P0. In
Using the equation code rate = 1 / 2,
Pi= Pi^ P0 P0 = P5 ^ P6
Serial architecture (Stage 1)

Barrel shifter#1 Accumulator


&

divider
Matrix
Predict Correct
memory
Barrel shifter#2

Input data Input


register control

As the stage1 in 1 In the first Kb clock


parallel architecture. cycle, encoder order are
from top->middle and
3
3 down ->middle, column
2
by column
1
Serial architecture (Stage 1)

Barrel shifter#1 Accumulator


&

divider
Matrix
Predict Correct
memory
Barrel shifter#2

Input data Input


register control

1 2
3

Reason: In the last clock cycle,


1.Prepare the input encoder order are from
data left->right, row by row
2.Reduce the slice
3

1 2
Serial architecture (Stage 2)

Barrel shifter#1 Accumulator


&

divider
Matrix
Predict Correct
memory
Barrel shifter#2

Input data Input


register control

Divide the datas from Choose the corresponding


matrix. input value to barrel shifter
(Take clock cycle #2 for
example)
Serial architecture (Stage 3)

Barrel shifter#1 Accumulator


&

divider
Matrix
Predict Correct
memory
Barrel shifter#2

Input data Input


register control

Shift the input data


according to the shifter
value chosen form Mux
Serial architecture (Stage 4)

Barrel shifter#1 Accumulator


&

divider
Matrix
Predict Correct
memory
Barrel shifter#2

Input data Input


register control

In normal, this module In this module, there


accumulate the shifted are three works:
data to compute i . 1.Compute i
When the data is the 2.Compute Pi
last value in this row, 3.Compute P0
also compute Pi.
Serial architecture (Stage 4)

Barrel shifter#1 Accumulator


&

divider
Matrix
Predict Correct
memory
Barrel shifter#2

Input data Input


register control

When all Pi have been


computed, compute the
P0 by Xor Px and Px+1
which are the middle
prediction vector in the
matrix.
Serial architecture (Stage 5)

Barrel shifter#1 Accumulator


&

divider
Matrix
Predict Correct
memory
Barrel shifter#2

Input data Input


register control

Correct the other Pi.


Using the equation
Pi= Pi^ P0
Outline
Introduction
Low-Density Parity-Check Codes
Related work
General encoding for LDPC codes
Efficient encoding for Dual-Diagonal matrix
Better Encoding scheme
LDPC Encoder Architecture
Parallel Encoder
Serial Encoder
Result
Conclusion
Implementation Results

The proposed encoder based on IEEE 802.16e LDPC codes can


encode the code with code rate 1/2 2/3 3/4 5/6 and code length
ranging from 576 to 2304.

The hardware implementation was performed and verification on Xilinx


Virtex-4 and Altera Stratix Field Programmable Gate Array (FPGA)
device.
Implementation Results
Parallel architecture
Rate 1/2 Rate 2/3 Rate 3/4 Rate 5/6

Z N Slice FFs LUTs CLK (MHz) IT (Gbps) IT (Gbps) IT (Gbps) IT (Gbps)

24 576 2.262 2.468 2.545 2.61

40 960 3.77 4.113 4.241 4.35

60 1440 14,179 4,071 26,846 141.391 5.656 6.17 6.363 6.526

80 1920 7.541 8.226 8.483 8.701

96 2304 9.049 9.872 10.18 10.441

Information throughput ranging from 2.262 to 10.441 Gbps


The encoder area is constant in any code rate or code length.
For a given code rate, an increase in the code length will increase the throughput.
Implementation Results
Serial architecture

Information throughput ranging from 0.867 to 4.019 Gbps


For a given code rate, an increase in the code length will increase the
throughput.
Implementation Results
Parallel architecture using row by row

Area comparison
Implementation Results
IT comparison

IT/Area comparison
Table 4.5a The synthesis result of [22] at code rate 1/2

Compare to Related Work


We compare implementation with [3].
Code Length Area (LE) Clk (MHz) IT (Gbps) IT/Total Area Code Length Area (LE) Clk IT (Gbps) IT/Total Area

(Mb per Le) (MHz) Rate 1/2 (Mb per Le)

576 3391 192.23 2.129 0.0612 rate1/2

[2] 576 1.561 0.07447


960 5100 159.57 2.253 0.0648
Proposed 960 20960 97.58 2.602 0.12414
1440 7012 164.83 2.697 0.0776

1440 3.903 0.18621


1920 8924 148.72 2.644 0.0761

1920 5.204 0.24828


2304 10339 148.41 2.758 0.0793
2304 6.245 0.29794
34766

Better throughput for longer code length


Using less area to implement multiple code length and code rate
The clock cycle is shorter the [3].

[3] S. Kopparthi and D. M. Gruenbacher, "Implementation of a fiexible encoder for structured low-density parity-check codes," IEEE Pacic Rim
Conference on Communications, Computers and Signal Processing (PacRim 2007), pp.438-441, Aug. 2007.
Compare to Related Work
The comparison of throughput

The proposed encoder outperforms the work in [3] in terms of throughput


when the code length longer then 1200
The proposed encoder architecture provides better throughput for a longer
code length while the work in [3] does not have this kind of speed-up
Compare to Related Work
The comparison of throughput/area ratio

The proposed encoder outperforms the work in [3] in terms of


throughput/area ratio by 1.216 to 3.757 times
The proposed encoder utilizes hardware resources more efficiently
Compare to Related Work
We compare implementation with [2].
Compare to Related Work
The comparison of throughput

The throughput in our proposed encoder is higher then [2] in all code rate
and code length
The proposed encoder outperforms the work in [2] in terms of throughput
ratio by 1.237 to 1.963 times
Compare to Related Work
The comparison of throughput/area

The proposed encoder outperforms the work in [2] in terms of throughput


ratio by 2.427 to 5.256 times
The result shows that our proposed encoder utilizes hardware resources
efficiently
Compare to Related Work (Serial)
We compare implementation with [4].

Slices FFs LUTs Block rams CLK IT

[4] 4,724 1,807 8,335 81 186 3.34

Proposed 12,567 3,885 22,050 0 123.502 4.626

Our proposed encoder achieve higher IT in low clock.

In our proposed encoder, the matrix information are built in it without


additional blockrams.

The IT/Area of our serial encoder is 0.3681(Mbps) per slice and the
IT/Area of [4] is 0.1768.
[4] Jeong Ki KIM1, Hyunseuk YOO1 and Moon Ho LEE1, "Efficient Encoding Architecture for IEEE 802.16e LDPC Codes, " IEICE Transactions
on Fundamentals of Electronics, Communications and Computer Sciences 2008.
Outline
Introduction
Low-Density Parity-Check Codes
Related work
General encoding for LDPC codes
Efficient encoding for Dual-Diagonal matrix
Proposed Encoding scheme
LDPC Encoder Architecture
Parallel Encoder
Serial Encoder
Result
Conclusion
Conclusion

An efficient encoding architecture for IEEE 802.16e LDPC


codes with multiple code lengths and code rates are
implemented.

In our design, change between different code rate or code


length only to change the type in information data.

This architecture is also suitable the IEEE 802.11n standard.

Our encoder achieve higher throughput and better


throughput/area ratio than conventional encoding scheme
when code length longer than 1200.
Thank you!!

Vous aimerez peut-être aussi