16-Bit RISC Processor Design for Convolution Application Using Verilog HDL

CHAPTER 1
INTRODUCTION
THE RISC-16 PROCESSOR
The RiSC-16, for Ridiculously Simple Computer, was developed by Prof. Bruce
Jacob at the University of Maryland with an educational aim. There are two implementations of
this architecture, a sequential one and a pipelined one. In this paper, we give only a short
description of the sequential implementation. For more information about the RiSC-16, the reader
is invited to refer to three documents: [1] for the instruction set, [2] for the sequential
implementation and [3] for the pipelined implementation.
The RiSC-16 is a RISC processor based on the Harvard architecture. As its name indicates, it is a
16-bit processor.
All data and instructions are two bytes wide, so all registers and both memories use a 16-bit
word format. It is made up of:
- one bank of eight registers, addressable with three bits. Register 0 is read-only and always
  contains zero, which is quite common among RISC processors;
- separate instruction and data memories. Both are addressable with sixteen bits and hence have
  a capacity of 64 Kwords;
- one arithmetic logic unit (ALU) that can execute three operations: addition, bitwise NAND
  and test of equality;
- multiplexers to choose between buses;
- one control unit. Its functions are to decode the opcodes and to control the ALU, the
  multiplexers and the write function into the register bank and into data memory;
- a program counter (PC) and its incrementer;
- an instruction register containing the instruction that is being executed;
- an adder to compute jump addresses;
- two sign-extension logic blocks to convert 7-bit immediate values into the 16-bit format;
- one left-shift logic block to convert 10-bit immediate values into the 16-bit format;
- several buses to convey data between elements;
- control signals routed to the different blocks (for example, to choose the input bus of a
  multiplexer).
Refer to Figure 1 to see how these are connected.
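The two immediate-conversion blocks listed above can be illustrated with a short behavioural sketch. It is written in Python rather than Verilog so that it can be run directly; the function names are hypothetical, and the shift distance of 6 bits (placing the 10-bit immediate in the upper bits of the word) is an assumption based on the RiSC-16 load-upper-immediate instruction described in [1].

```python
def sign_extend_7(imm7: int) -> int:
    """Sign-extend a 7-bit immediate to a 16-bit word (returned as 0..0xFFFF)."""
    imm7 &= 0x7F                    # keep only the 7 immediate bits
    if imm7 & 0x40:                 # bit 6 is the sign bit
        imm7 |= 0xFF80              # replicate the sign into bits 15..7
    return imm7


def shift_left_10(imm10: int) -> int:
    """Place a 10-bit immediate in the upper 10 bits of a 16-bit word.

    Assumption: the lower 6 bits are filled with zeros.
    """
    return ((imm10 & 0x3FF) << 6) & 0xFFFF
```

For example, the 7-bit pattern 1000000 (decimal -64) becomes 0xFFC0, while a positive pattern such as 0111111 is passed through unchanged as 0x003F.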

The instruction set consists of 8 instructions. Table I shows their assembler format and describes
their operation.

This processor illustrates the RISC philosophy pushed to its maximum of simplicity. The
instructions are elementary, but they are powerful enough to solve complex problems, and no
instruction can be replaced by a combination of the others.
Students are rapidly able to master this reduced set of 8 instructions and to write small
programs. A second strong point of the RiSC-16 is the small number of internal elements, which
makes it possible to display all blocks clearly on the screen. Furthermore, both the sequential
and the pipelined versions have been implemented on an FPGA.

CHAPTER 2
RISC (Reduced Instruction Set Computer)
An Introduction
The Reduced Instruction Set Computer, or RISC, is a microprocessor CPU
design philosophy that favors a smaller and simpler set of instructions that all take about
the same amount of time to execute. The most common RISC microprocessors are ARM,
DEC Alpha, PA-RISC, SPARC, MIPS, and IBM's PowerPC.
The idea was inspired by the discovery that many of the features that were included in traditional
CPU designs to facilitate coding were being ignored by the programs that were running on them.
Also these more complex features took several processor cycles to be performed. Additionally,
the performance gap between the processor and main memory was increasing. This led to a
number of techniques to streamline processing within the CPU, while at the same time
attempting to reduce the total number of memory accesses.
When controller design became more complex in CISC machines and performance
still fell short of expectations, designers started looking for alternatives. It had
been found that speed suffers whenever the processor accesses memory. So one
improvement to CPI (cycles per instruction) was to keep the instruction set very simple:
simple not in what it can accomplish, but in its form. That is why a typical RISC
architecture has very few instructions, and the processor requests data from memory only
through Load and Store instructions; complex memory addressing modes are avoided. The
complexity of controller design is reduced by fixing the positions of the operand and
opcode bits in the instruction register. Finally, pipelining added a new dimension to
speed with the help of only a few additional registers. A pipeline increases throughput by
reducing the effective CPI, so that an instruction can effectively be completed in every
clock cycle. Pipelining, in any kind of architecture, grew out of the inherent parallelism
between instructions and the idle states of components.
The pipelined architecture can be further enhanced with the concept known as
superscalar execution, in which more than one execution unit is provided. While one unit is
busy with the current execution task, the fetch unit can fetch the next instruction,
which is then executed by another execution unit present in the system.
Features which are generally found in RISC designs are:
- uniform instruction encoding (for example, the opcode is always in the same bit
  position in each instruction, which is always one word long), which allows faster
  decoding;
- a homogeneous register set, allowing any register to be used in any context and
  simplifying compiler design;
- simple addressing modes (complex addressing modes are replaced by sequences
  of simple arithmetic instructions);
- few data types supported in hardware (for example, some CISC machines had
  instructions for dealing with byte strings, and others had support for polynomials and
  complex numbers; such instructions are unlikely to be found on a RISC machine).
Over many years, RISC instruction sets have tended to grow in size. Thus, some have
started using the term "load-store" to describe RISC processors, since this is the key
element of all such designs. Instead of the CPU itself handling many addressing modes,
load-store architecture uses a separate unit dedicated to handling very simple forms of
load and store operations. CISC processors are then termed "register-memory" or
"memory-memory".
Today RISC CPUs (and microcontrollers) represent the vast majority of all CPUs in
use. The RISC design technique offers power in even small sizes, and thus has come to
completely dominate the market for low-power "embedded" CPUs. Embedded CPUs are
by far the largest market for processors. RISC had also completely taken over the market
for larger workstations for much of the 90s. After the release of the Sun SPARCstation
the other vendors rushed to compete with RISC based solutions of their own. Even the
mainframe world is now completely RISC based.
3. RISC vs CISC
3.1 CISC Designs
An overriding characteristic of CISC machines is an approach to instruction set
architecture that emphasizes doing more with each instruction. As a result, CISC
machines have a wide variety of addressing modes. CISC machines take a "have it your
way" approach to the location and number of operands in various instructions. As a result,
instructions are of widely varying length and execution time.
3.2 The bridge toward RISC (Historical factors)
The capabilities of CISC allowed more operations to be packed into the same
program size. During that period, compact program and data storage was given great
importance, since the cost of memory was high.
An attempt was made to narrow the semantic gap, that is, the gap that existed
between machine instruction sets and high-level language constructs, with complicated
instructions and addressing modes intended to increase performance. Most of these
improvements were rejected by compiler writers on the grounds that they did not fit
well with the language requirements and were of only limited usefulness. At the same
time, research conducted by David Patterson and Donald Knuth showed that 85% of a
program's statements were assignments, conditionals or procedure calls. Nearly 80% of
the assignment statements were MOVE instructions with no arithmetic operations.
As more and more capabilities were added to processors, it became
increasingly difficult to support the higher clock speeds that would otherwise have been
possible. Complex instructions and addressing modes worked against higher clock
speeds because of the greater number of microscopic actions that had to be performed
per instruction. Moreover, RAM prices dropped sufficiently that system designers were
under less pressure to design instructions that did more than to design
systems that were faster. It was also becoming cost-effective to employ small amounts of
higher-speed cache memory to reduce memory latency, i.e. the waiting time between
when a memory request is made and when it has been satisfied.

3.3 Why RISC?
Various attempts have been made to increase the instruction execution rates by
overlapping the execution of more than one instruction since the earliest day of
computing. The most common ways of overlapping are pre-fetching, pipelining and
superscalar operation.
1) Pre-fetching: The process of fetching the next instruction or instructions into an
event queue before the current instruction is complete is called pre-fetching. One
of the earliest 16-bit microprocessors, the Intel 8086/8, pre-fetches into an on-board
queue up to six bytes following the byte currently being executed, thereby making
them immediately available for decoding and execution, without latency.
2) Pipelining: Pipelining instructions means starting or issuing an instruction prior
to the completion of the currently executing one. The current generation of
machines carries this to a considerable extent. The PowerPC 601 has 20 separate
pipeline stages in which various portions of various instructions are executing
simultaneously.
3) Superscalar operation: Superscalar operation refers to a processor that can issue
more than one instruction simultaneously. The PPC 601 has independent integer,
floating-point and branch units, each of which can be executing an instruction
simultaneously.
CISC machine designers incorporated pre-fetching, pipelining and superscalar operation
in their designs but with instructions that were long and complex and operand access
depending on complex address arithmetic, it was difficult to make efficient use of these
new speed-up techniques. Furthermore, complex instructions and addressing modes hold
down clock speed compared to simple instructions. RISC machines were designed to
efficiently exploit the caching, pre-fetching, pipelining and superscalar methods that were
invented in the days of CISC machines.
4. RISC: Top level Description and guidelines
We implemented a 16-bit RISC microprocessor based on a simplified version of
the MIPS architecture. The processor has 16-bit instruction words and 16 general purpose
registers. Every instruction is completed in four cycles. An external clock is used as the
timing mechanism for the control and datapath units. This section includes a summary of
the main features of the processor, a description of the pins, a high level diagram of the
external interface of the chip, and the instruction word formats.

Instruction completion in 4 clock cycles

Fig.4 High Level Block Diagram that describes the external interface of the chip
4.1 Instruction Set Architecture (ISA)
The ISA of this processor consists of 16 instructions with a 4-bit fixed size
operation code. The instruction words are 16-bits long. The following chart describes the
instruction formats.

The processor features five instruction classes:
1. Arithmetic (two's complement) ALU operations (2)
ADD: Rd = Rs + Rt
Operands A and B, stored in the registers specified by Rs and Rt, are added and the result
is written to the destination register specified by Rd.
SUB: Rd = Rs - Rt
Operand B (Rt) is subtracted from Operand A (Rs) and the result is written to Rd.
2. Logical ALU operations (6)
AND: Rd = Rs & Rt
Operand A (Rs) is bitwise ANDed with Operand B (Rt) and written into Rd.
OR: Rd = Rs | Rt
Operand A (Rs) is bitwise ORed with Operand B (Rt) and written into Rd.
XOR: Rd = Rs ^ Rt
Operand A (Rs) is bitwise XORed with Operand B (Rt) and written into Rd.
NOT: Rd = ~Rs
Operand A (Rs) is bitwise inverted and written into Rd.
SLA: Rd = Rs << 1
Operand A (Rs) is arithmetically shifted left by one bit and written into Rd.
SRA: Rd = Rs >> 1
Operand A (Rs) is arithmetically shifted right by one bit and written into Rd. The
MSB (sign bit) is preserved by this operation.
3. Memory operations (3)
LI: Rd = 8-bit sign-extended immediate
The 8-bit immediate in the instruction word is sign-extended to 16 bits and written into
the register specified by Rd.
LW: Rd = Mem[Rs]
The memory word specified by the address in register Rs is loaded into register Rd.
SW: Mem[Rs] = Rt
The data in register Rt is stored into the memory location specified by Rs.
4. Conditional branch operations (2)
BIZ: PC = PC + 1 + Offset if Rs = 0
If all the bits in register Rs are zero, then the current program count (PC + 1) is offset to
PC + 1 + Offset. The count is offset from PC + 1 because the PC is incremented and stored
during the Fetch cycle.
BNZ: PC = PC + 1 + Offset if Rs != 0
If any bit in register Rs is nonzero, then the current program count (PC + 1) is offset
to PC + 1 + Offset.
5. Program count jump operations (3)
JAL: Rd = PC + 1 and PC = PC + 1 + Offset
The Jump and Link instruction writes the current program count (PC + 1) into register Rd and
offsets the program count to PC + 1 + Offset.
JMP: PC = PC + 1 + Offset
The unconditional jump instruction offsets the program count to PC + 1 + Offset.
JR: PC = Rs
The Jump Return instruction sets the program count to the value in Rs, typically the return
address previously stored by JAL.
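The register-transfer descriptions above can be collected into a small behavioural model. The sketch below is in Python rather than Verilog so that it can be executed directly, and it rests on several assumptions that are not stated in the text: instructions arrive already decoded (the bit-level encoding is not modelled), memory is a word-addressed dictionary, the offset is passed as a signed integer, and SLA simply discards the bit shifted out of position 15. The names execute, sign_extend_8 and sra1 are hypothetical.

```python
MASK = 0xFFFF  # all values are 16-bit words


def sign_extend_8(imm8):
    """Sign-extend an 8-bit immediate to 16 bits."""
    imm8 &= 0xFF
    return imm8 | 0xFF00 if imm8 & 0x80 else imm8


def sra1(x):
    """Arithmetic shift right by one: the sign bit (bit 15) is preserved."""
    return (x >> 1) | (x & 0x8000)


def execute(op, regs, mem, pc, rd=0, rs=0, rt=0, imm=0):
    """Apply one decoded instruction and return the next PC.

    regs is a list of 16 words, mem a dict of address -> word, and imm is
    the raw immediate for LI or a signed offset for branches and jumps.
    """
    a, b = regs[rs], regs[rt]
    next_pc = (pc + 1) & MASK              # PC + 1 is formed during FETCH
    alu = {
        "ADD": lambda: a + b, "SUB": lambda: a - b,
        "AND": lambda: a & b, "OR": lambda: a | b,
        "XOR": lambda: a ^ b, "NOT": lambda: ~a,
        "SLA": lambda: a << 1, "SRA": lambda: sra1(a),
    }
    if op in alu:
        regs[rd] = alu[op]() & MASK        # wrap the result to 16 bits
    elif op == "LI":
        regs[rd] = sign_extend_8(imm)
    elif op == "LW":
        regs[rd] = mem.get(a, 0)
    elif op == "SW":
        mem[a] = b
    elif op == "BIZ":
        if a == 0:
            next_pc = (next_pc + imm) & MASK
    elif op == "BNZ":
        if a != 0:
            next_pc = (next_pc + imm) & MASK
    elif op == "JAL":
        regs[rd] = next_pc                 # link: save PC + 1
        next_pc = (next_pc + imm) & MASK
    elif op == "JMP":
        next_pc = (next_pc + imm) & MASK
    elif op == "JR":
        next_pc = a
    else:
        raise ValueError(f"unknown opcode {op}")
    return next_pc
```

A call such as execute("JAL", regs, mem, 10, rd=15, imm=-3) leaves 11 in register 15 and returns 8, and a later execute("JR", regs, mem, pc, rs=15) returns 11, which is the call-and-return pattern described for JAL and JR.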
FETCH INSTRUCTION: Part 1, Part 2
EXECUTE INSTRUCTION: Part 1, Part 2 (data into Register File)

4.2 MICRO-ARCHITECTURE
The micro-architecture refers to a view of the machine that exposes the registers,
buses and all other important functional units such as ALUs and counters. The principal
subsystems of a processor are the CPU, main memory and the input/output. The data path
and the control unit interact to do the actual processing task. The control unit receives
signals from the data path and sends control signals to the data path. These signals
control the data flow within the CPU and between the CPU and the main memory and
input/output.

Program Counter

Fig.4.2.1 Program Counter
Instruction Register and Register File

Fig.4.2.2 Instruction Register and RegFile

ALU and Operand Registers

Fig.4.2.3 ALU and Operand Registers
Control Unit Design
The control FSM has only three distinct states that determine the operation of the
processor: IDLE, FETCH and EXECUTE. The FETCH state is further divided into
two phases, a fetch-instruction phase and a fetch-operands phase. Similarly, the EXECUTE
state is also divided into two phases. When the reset signal (reset_s1) goes high in any
state, the FSM is placed in the IDLE state. While in the IDLE state, the control unit asserts
the PC write enable signal (pc_wrt_s2 = 1) and selects zero (pc_sel_s2 = 0) as the current
program count.

Fig.4.2.4 Control unit and Control signals
When the reset signal goes low, the FSM's next state will be the FETCH state and
the instruction from memory address 0 will be loaded into the Instruction Register (IR) to
begin program execution. The control unit sees that the next state is FETCH and generates the
IR write (ir_wrt_s1), Operand A Select (opA_sel_s1), Operand B Select (opB_sel_s1 =
0010) and ALU add operation (alu_op_s1 = 00000001) signals to load the IR with the next
instruction and increment the PC by 1. These events all occur on the first clock of the
FETCH state. One-hot signals are used for alu_op_s1, opB_sel_s1 and data_sel_s2 to
make decoding easier in the datapath units. The operation in the next phase of FETCH
is determined by the opcode (opcode_s2) from the IR, except for the incremented
PC, which is written in from the ALU output latch in all cases. The ALU operations
load Operands A and B from the Register File. The Load Word needs only Operand
A, while the Store Word needs both operands (one for the address and one for the data
word). The branch instructions use the offset in the instruction word and the PC + 1
count as operands to the ALU. JAL stores the incremented PC in the Register File,
while JR loads the return address into Operand A.
After phase two of the FETCH state, the FSM enters the EXECUTE state. During
the first phase for an ALU operation, the appropriate alu_op_s1 control signals are sent to
the ALU as decoded from the opcode. The operand mux (opA_sel_s1 & opB_sel_s1)
control signals are also generated to select the latch outputs. For the other operations
(except LI), an add operation is required from the ALU. The operands chosen for the add
are determined by the operation specified. The Load and Store words will access Memory
on this first phase as well. The second phase of EXECUTE writes data into the register
file or writes a new address into the PC. For the branch instruction, the control will look
at the check zero signal from operand A to determine if the branch should be taken and
the new PC should be written. The control returns the next state to FETCH to repeat the
process for the next instruction.
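The state sequence described above (IDLE, two FETCH phases, two EXECUTE phases, then back to FETCH) can be sketched as a transition table. This is a Python behavioural illustration rather than the Verilog control unit; the phase names FETCH_1, FETCH_2, EXECUTE_1 and EXECUTE_2 are hypothetical labels for the phases in the text, and the datapath control signals are deliberately left out.

```python
from enum import Enum


class State(Enum):
    IDLE = 0
    FETCH_1 = 1    # load IR, compute PC + 1
    FETCH_2 = 2    # write PC + 1, read operands
    EXECUTE_1 = 3  # ALU operation or memory access
    EXECUTE_2 = 4  # write register file or PC


# Next state when reset is low.
NEXT = {
    State.IDLE: State.FETCH_1,
    State.FETCH_1: State.FETCH_2,
    State.FETCH_2: State.EXECUTE_1,
    State.EXECUTE_1: State.EXECUTE_2,
    State.EXECUTE_2: State.FETCH_1,
}


def next_state(state, reset):
    """Return the state after one clock edge; reset overrides everything."""
    return State.IDLE if reset else NEXT[state]


def run(reset_trace, state=State.IDLE):
    """Step the FSM through a list of reset samples and record each state."""
    visited = []
    for reset in reset_trace:
        state = next_state(state, reset)
        visited.append(state)
    return visited
```

With reset held high for one clock and then released, the model visits FETCH_1, FETCH_2, EXECUTE_1 and EXECUTE_2 before returning to FETCH_1, which matches the four clock cycles per instruction stated earlier.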

Arithmetic Logic Unit
An arithmetic and logic unit (ALU) is contained within a central processor unit (CPU).
The ALU is a dedicated collection of high speed circuits that performs the arithmetic and logical
operations of a computer. The ALU can be physically located adjacent to, or underneath, the
processor register. The ALU can be formed in the shape of a square grid.
The arithmetic and logic unit works in concert with a control unit, internal memory, and
registers. All together, these functions comprise the CPU. Where the ALU performs
mathematical computation, logic decisions and processing of data taken from the registers, the
control unit itself will read program instructions, farm out tasks of processing to the ALU, and
ensure that the proper sequence is followed according to program instructions.
Block Diagram

Signals from ID
We receive and use the following signals from the ID stage:
ALUSrc: Determines whether or not the second operand of the ALU is the immediate value.
ALUOp+func: The combined signal of the ALUOp and func fields of each instruction.
BNE?: Indicates if an instruction is BNE.
BEQ?: Indicates if an instruction is BEQ.
Immediate: The sign-extended immediate field from ID.
Jump?: Indicates if an instruction is j-type.
JR?: Indicates if an instruction is jr.
Linked?: Indicates if an instruction needs a return address.
LUI?: Indicates if an instruction is LUI.
Op1: The value of the register in the first operand field.
Op2: The value of the register in the second operand field.
PC+4: The location of the next instruction when the instruction first comes from IF.
Shamt: The shift amount given to the ALU.
ALU Overview
Below is a table of instructions with their corresponding control bits:
Instruction  BEQ?  BNE?  JUMP?  LINK?  IMM?  JR?  LUI  ALUOp
ADDU         0     0     0      0      0     0    0    ADD
SUBU         0     0     0      0      0     0    0    SUB
AND          0     0     0      0      0     0    0    AND
OR           0     0     0      0      0     0    0    OR
NOR          0     0     0      0      0     0    0    NOR
SLT          0     0     0      0      0     0    0    SLT
SLTU         0     0     0      0      0     0    0    SLTU
SLL          0     0     0      0      0     0    0    SLL
SRL          0     0     0      0      0     0    0    SRL
JR           0     0     1      0      0     1    0    ~
JALR         0     0     1      1      0     0    0    ~
J            0     0     1      0      1     0    0    ~
JAL          0     0     1      1      0     0    0    ~
ADDIU        0     0     0      0      1     0    0    ADD
ANDI         0     0     0      0      1     0    0    AND
ORI          0     0     0      0      1     0    0    OR
SLTI         0     0     0      0      1     0    0    SLT
SLTIU        0     0     0      0      1     0    0    SLTU
LUI          0     0     0      0      1     0    1    LUI
LW           0     0     0      0      1     0    0    ADD
SW           0     0     0      0      1     0    0    ADD
BEQ          1     0     0      0      0     0    0    SUB
BNE          0     1     0      0      0     0    0    SUB
We designed our ALU to respond to the following OpCodes:
Op    Code       SIG0
ADD   000000001  0
SUB   000000001  1
OR    000000010  0
AND   000000100  0
NOR   000001000  0
SLT   000010000  1
SLTU  000100000  1
SLL   001000000  0
SRL   010000000  0
LUI   100000000  0
SIG0 is used to indicate sign in the adder.
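The table above can be read as a 9-bit one-hot operation select followed by the SIG0 bit. The Python sketch below copies the values from the table and shows how the two fields combine into a 10-bit control word; the names ALU_CODES, is_one_hot and control_word are hypothetical, and placing SIG0 in the least significant position is an assumption taken from the way the ADD and SUB codes are written out later in the text.

```python
# (one-hot operation select, SIG0) pairs copied from the table above.
ALU_CODES = {
    "ADD":  (0b000000001, 0),
    "SUB":  (0b000000001, 1),
    "OR":   (0b000000010, 0),
    "AND":  (0b000000100, 0),
    "NOR":  (0b000001000, 0),
    "SLT":  (0b000010000, 1),
    "SLTU": (0b000100000, 1),
    "SLL":  (0b001000000, 0),
    "SRL":  (0b010000000, 0),
    "LUI":  (0b100000000, 0),
}


def is_one_hot(select):
    """True when exactly one bit of the select word is set."""
    return select != 0 and (select & (select - 1)) == 0


def control_word(name):
    """Concatenate the 9-bit select with SIG0 into a 10-bit control word."""
    select, sig0 = ALU_CODES[name]
    return (select << 1) | sig0
```

ADD and SUB share the same select bit and differ only in SIG0, which is why a single adder can serve both operations.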
ALU Design
Arithmetic
Addition
The first operation we designed the ALU to do was addition. We decided to implement
this using a ripple-carry adder. The advantages of a ripple-carry adder are the simplicity of its
logic and the ease of extending it to more bits. Its disadvantage is that it is slow. Adders
like the carry-lookahead adder are much faster. For our design, though, we believe that the ID
and MEM stages will take up enough time that the speed difference between the two is not
a huge concern. Seeing as it is the first instruction, we gave it the OpCode
0000000010

Our adder diagram
The purpose of the last zero will be made apparent in the next section.
Subtraction
The next operation we designed was the awkward stepbrother of addition: subtraction. Rather
than implementing a separate adder dedicated solely to doing subtraction, we decided to modify
our existing adder to do both.
Subtraction is simply the addition of one number and the two's complement of another number.
That is, the second input is inverted and then a one is added to it. So, we have
A-B=A+(!B+1)
Because it is addition, we can rearrange it to be
A-B=(A+!B)+1
Now it is easy to see that the subtraction of B from A is the addition of A and the inversion of
B plus 1. So, we place a multiplexor in front of the second input of the adder: one choice is to
pass B unmodified; the other is to choose the inverted B. Then, to add one, we
set the carry-in bit of the adder high. So, in the interest of saving a little bit of logic, we tied
the carry-in signal of the ALU and the select input of the multiplexor to the same bit ... (insert
dramatic overture here) ... the last one. So, subtraction gets the OpCode:
0000000011

Addition and subtraction combination circuit.
From here on, we will refer to the last bit as SIG0 because it sounds ominous.
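The shared adder can be modelled bit by bit to confirm that one control bit is enough for both operations. The following Python sketch is a behavioural illustration, not the circuit itself; the names full_adder and add_sub and the width parameter are hypothetical.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def add_sub(a, b, sig0, width=32):
    """Ripple-carry A + B (sig0 = 0) or A - B (sig0 = 1).

    sig0 drives both the B-inverting multiplexer and the carry-in, so
    subtraction is computed as A + ~B + 1. Returns (result, carry_out).
    """
    carry = sig0
    result = 0
    for i in range(width):
        a_bit = (a >> i) & 1
        b_bit = ((b >> i) & 1) ^ sig0      # mux: B or inverted B
        s, carry = full_adder(a_bit, b_bit, carry)
        result |= s << i
    return result, carry
```

The XOR with sig0 plays the role of the multiplexer: it passes each bit of B unchanged when sig0 is 0 and inverts it when sig0 is 1.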

Set Less Than
SLT
The purpose of the set less than operation is to determine whether the first input of the ALU is
less than the second, that is, whether A<B. If the statement is true, A<B -> A-B<0. So, if A-B
is negative, then SLT is true. As it turns out, the last (most significant) sum bit of the adder
is the sign of the result. So, if we tie the least significant bit of the SLT output to the last
sum bit of the adder and the rest to ground (because SLT only outputs 1 or 0), we have given
our adder the ability to compare two values. We gave this operation the OpCode:
0000100001

The addition of SLT to our previous circuit

SLTU
Implementing unsigned set less than is slightly more difficult. Let us discuss
the issues of simply reusing the exact same logic used in the SLT operation for the SLTU
operation by looking at a 4-bit example. Suppose you have two binary numbers: 1100 and 0011.
In unsigned decimal, those numbers are 12 and 3. Obviously 12<3 is false. If we implement it
with the same logic as above, though, it becomes:
1100-0011 = 1100 + 1101 = (1)1001

The addition of SLTU to our previous circuit resulting in the complete adder component.
This would imply that 12 is less than 3. This is a result of the fact that 1100, read as a
signed number, is actually -4. So, -4<3 is true.
The solution to this problem needs only a handful more gates. If we extend our
adder to 33 bits, 1100 can be written explicitly as 01100, which is 12, thus getting rid of its
ambiguity with -4. Also, since this 33rd bit position is only used in SLTU instructions, we can
tie its other input to 1. So, now we have
01100-00011 = 01100+11101 = (1)01001
The extended sign bit is 0, which is the result we need. The savvy reader will note that this is
only an XOR between 0, 1 and the 32nd carry bit. The savvier reader will note that that is just
the inversion of the 32nd carry bit. We gave this operation the OpCode:
0001000001
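The difference between the two comparisons can be checked with the 4-bit example from the text. The Python sketch below is a behavioural illustration under the assumptions stated in its comments: SLT reads the sign bit of A - B exactly as described (so, like that circuit, it does not handle signed overflow), and SLTU reads the inverted carry out of the top adder bit. The names subtract, slt and sltu are hypothetical.

```python
def subtract(a, b, width):
    """Return (difference, carry_out) of A + ~B + 1 on a width-bit adder."""
    mask = (1 << width) - 1
    total = (a & mask) + (~b & mask) + 1
    return total & mask, total >> width


def slt(a, b, width=32):
    """Signed less-than as described: the sign bit of A - B.

    Like the circuit in the text, this ignores signed overflow.
    """
    diff, _ = subtract(a, b, width)
    return diff >> (width - 1)


def sltu(a, b, width=32):
    """Unsigned less-than: the inverted carry out of the top adder bit."""
    _, carry = subtract(a, b, width)
    return carry ^ 1
```

With width=4, slt(0b1100, 0b0011, 4) gives 1 because 1100 is read as -4, while sltu(0b1100, 0b0011, 4) gives 0 because 12 is not less than 3.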
Logic
OR
The thing to know about the OR function is that, just like the AND function, it is a
bitwise function that ORs each individual pair of bits. In order to have the ALU handle this, we
used 32 OR gates in parallel, one for each bit. The OR operation gets the OpCode:
0000000101

OR gate
AND
The thing to know about the AND function inside the ALU is that it is a bitwise AND, meaning
that it takes each bit and ANDs it with the same bit from the other input. Accomplishing this
inside a 32-bit ALU means using 32 AND gates, all in parallel with each other, as shown
below. The AND operation gets the OpCode:
0000001001

AND Gate
NOR
The thing to know about the NOR function inside the ALU is that it is a bitwise
function where each and every pair of bits is NORed. The ALU has 32 NOR gates that
handle this function. The NOR operation gets the OpCode:
0000010001

NOR Gate
Shifting
The MIPS processor implements only one type of shift: logical. After shifting bits in the necessary
direction, the output bits that did not receive a value from the input are given the value 0.
SLL
To shift left, each bit in the output determines its value by a multiplexor that chooses between one
of the 32 input bits and 0's. In order to simplify the logic, we have a block whose job it is to change
the input 5-bit shamt to a one-hot encoded shamt. This reduces the amount of time it takes to process
the shift command. Just as shifting left in base 10 is multiplying by ten, shifting left in base 2 is
multiplying by two. A word of caution, though: using this method to multiply large numbers may result
in incorrect answers, as bits important to the value of the number in the most significant portion of
the word may be cut off.
We gave this the OpCode:
0010000000

Shift left logical circuit using one hot encoding
SRL
Shifting right is the same as above, but with the bits chosen in reverse.
Just as dividing by ten in base 10 is simply a shift right of all the digits, a shift right in base 2
is a division by two. But a word of caution: this operation puts zeros in place of bits not defined by
the input. Thus, using this method to divide negative numbers will change the value, and proper steps
must be taken to fix that.
This has the OpCode:
0100000000

Shift right logical circuit using one hot encoding.
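The two logical shifts can be sketched as a per-bit selection driven by a one-hot shift amount. This Python model is only a behavioural illustration of the multiplexer array described above; the names one_hot and shift are hypothetical, and the input value is assumed to fit in the given width.

```python
def one_hot(shamt, width=32):
    """Decode a binary shift amount into a one-hot select word."""
    return 1 << (shamt % width)


def shift(value, shamt, left, width=32):
    """Logical shift built from per-bit selection, as in a mux array.

    Output bit i takes input bit i - k (left) or i + k (right), where k is
    the position of the single set bit in the one-hot select; positions
    that fall outside the word read as 0.
    """
    select = one_hot(shamt, width)
    k = select.bit_length() - 1            # index of the hot bit
    out = 0
    for i in range(width):
        src = i - k if left else i + k
        if 0 <= src < width:
            out |= ((value >> src) & 1) << i
    return out
```

Shifting 0xFFFFFFF8 (the two's complement pattern for -8) right by one yields 0x7FFFFFFC rather than the pattern for -4, which is the caution about negative numbers raised in the text.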

LUI
When loading an immediate into the upper 16 bits of a register, the sign-extended immediate needs to
be shifted left by 16 bits. So, that is exactly what we designed. We decided to implement an explicit
shift-left-16 module to reduce the amount of logic we needed. This has the OpCode:
1000000000

Our load upper immediate circuit

ALU Datapath

Immediates
Immediate Implementation
Immediates are handled outside of the ALU. Rather than creating separate functions for immediate
operations, we placed a MUX in front of the second operand of the ALU that would choose between
the immediate field and register value using the signal ALUSrc to choose between the two.

An ALU with a multiplexor on the input to allow working with immediate.

Immediate Datapath

Not Zero?
The real question for this section is, "Why not just have a line that is high when the result of the
ALU is 0?" The answer: to find out if the result is not zero, we need only a 32-bit-wide
OR gate. If any of the bits is not zero, its output will be high. On the other hand, if we wanted a
ZERO? line, we would have to invert the output of this gate. Our choice allows us to leave out this
extra gate.
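A short Python sketch shows how a NOT ZERO? line can drive the branch decisions. It is a behavioural illustration only: the names not_zero and branch_taken are hypothetical, and the way the line is combined with the BEQ? and BNE? signals is an assumption, since the text does not show that logic.

```python
def not_zero(result, width=32):
    """Wide OR of all result bits: 1 if any bit is set."""
    flag = 0
    for i in range(width):
        flag |= (result >> i) & 1
    return flag


def branch_taken(is_beq, is_bne, a, b, width=32):
    """Decide BEQ/BNE from the NOT ZERO? line of the subtraction A - B."""
    diff = (a - b) & ((1 << width) - 1)
    nz = not_zero(diff, width)
    return bool((is_bne and nz) or (is_beq and not nz))
```

Both branch types use the SUB operation listed in the control table, so BNE is taken when the difference is nonzero and BEQ when it is zero.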

CHAPTER 3
The trend in the recent past shows RISC processors clearly outperforming the earlier CISC
processor architectures. The reasons are advantages such as a simple, flexible,
fixed instruction format and hardwired control logic, which pave the way for higher clock speeds
by eliminating the need for microprogramming. The combined advantages of high speed, low
power, area efficiency and operation-specific design possibilities have made the RISC processor
ubiquitous.
The main feature of the RISC processor is its ability to support single-cycle operation, meaning
that instructions are fetched from the instruction memory at the maximum speed of the
memory. RISC processors in general are designed to achieve this by pipelining, where clock
cycles may be lost to stalls when a wrong instruction is fetched after a jump-type instruction.
This reduces the efficiency of the processor. This paper describes a RISC
architecture in which single-cycle operation is obtained without using a pipelined design, which
averts such stalls [1]-[3].
The development of CMOS technology provides very high-density, high-performance
integrated circuits. The performance provided by existing devices has created a never-ending
demand for increasingly better-performing devices.
This trend predicts the use of a whole RISC processor as a basic device by the year 2020. However,
as the density of ICs increases, power consumption becomes a major issue along
with the complexity of the circuits. Hence, it becomes necessary to implement less complex,
low-power processor designs. Energy recovery is proving to be a promising approach
for the design of low-power VLSI circuits. In recent years, studies on adiabatic computing have
grown for low-power systems and several adiabatic logic families have been proposed [4]-[6]. In
this regard, we have utilized 2N-2N2P quasi-adiabatic logic in the design of incrementer circuits
and carry select adder circuits, along with architectural changes, in order to compare the power
efficiency of the proposed structures with those found in the literature.
The program counter is one of the most complex building blocks of the processor design. It
performs mainly two operations, namely incrementing and loading. To reduce this complexity, the
present work establishes a novel design of an incrementer structure [7]. It is realized using the
quasi-adiabatic 2N-2N2P logic structure. The structure achieves a 17-fold reduction in power
compared with its conventional CMOS counterpart. A speed increase of about 25% and a
marginal area saving of 8% are also achieved.
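To make the two program counter operations concrete, the Python sketch below models incrementing with a chain of half adders and loading with a simple select. It is a plain logic-level illustration and not the quasi-adiabatic incrementer of [7], whose structure is not given here; the names increment and next_pc and the default width of 6 bits are hypothetical choices for the example.

```python
def increment(pc, width=6):
    """Add 1 with a chain of half adders; wraps at 2**width."""
    carry = 1                       # the constant +1 enters as carry-in
    out = 0
    for i in range(width):
        bit = (pc >> i) & 1
        out |= (bit ^ carry) << i   # half-adder sum
        carry &= bit                # half-adder carry
    return out


def next_pc(pc, load, target, width=6):
    """PC update: load a jump target, or increment."""
    return target & ((1 << width) - 1) if load else increment(pc, width)
```

Because one operand is the constant 1, each bit position needs only a half adder (one XOR and one AND) instead of a full adder, which is the usual reason a dedicated incrementer is cheaper than a general adder.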
The second part of this work concentrates on complexity reduction in the ALU by optimizing the
design of arithmetic circuits. Previous works in the literature focus on energy-efficient
arithmetic circuits. In order to increase the operating speed and power efficiency of the
processor, we have come up with a novel design of a carry select adder structure [8]. This has
also been compared with conventional adders to validate the design in terms of power-delay
product.
In order to employ the processor for signal processing applications, we have integrated a
modified Wallace tree multiplier that uses compressor circuits to achieve low-power,
high-speed operation [9] in the ALU. Reference [10] suggests that it is possible to achieve
high-speed, low-power and area-efficient operation by reducing the number of stronger operations
such as multiplication, at the cost of increasing the number of weaker operations such as addition.

Fig.1 16-bit Non-Pipelined RISC Processor
Convolution is an important signal processing operation which is used in filter design. Many
algorithms have been proposed in order to optimize the performance of filters by
optimizing the convolution design. The modified Winograd algorithm is notable among them. It
requires only 4 multiplication operations, compared with 6 for a normal 3*2 convolution,
at the cost of an increase in the number of additions from 4 to 9. In this work, we have
designed and developed a 16-bit single-cycle non-pipelined RISC processor. In order to
improve the performance, the incrementer circuit and the carry select adder circuit have
been modified, the modified structures have been integrated into the design, and the performance
has been validated. A multiplier structure has been developed and the modified Winograd algorithm
is executed in order to validate our claim.
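The trade of multiplications for additions can be illustrated with the standard Winograd minimal filtering form that produces two outputs of a 3-tap filter. This is an assumption about the family of algorithm the text refers to: the sketch below is not the modified variant used in this work, and its addition count differs from the figure of 9 quoted above. It is written in Python, with Fraction used so that the division by two stays exact; the names direct_f23 and winograd_f23 are hypothetical.

```python
from fractions import Fraction


def direct_f23(d, g):
    """Two outputs of a 3-tap filter: 6 multiplications, 4 additions."""
    return [d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
            d[1] * g[0] + d[2] * g[1] + d[3] * g[2]]


def winograd_f23(d, g):
    """The same two outputs with 4 multiplications (m1..m4)."""
    half = Fraction(1, 2)
    # Filter-side terms depend only on g and can be precomputed once.
    g_sum = (g[0] + g[1] + g[2]) * half
    g_alt = (g[0] - g[1] + g[2]) * half
    m1 = (d[0] - d[2]) * g[0]
    m2 = (d[1] + d[2]) * g_sum
    m3 = (d[2] - d[1]) * g_alt
    m4 = (d[1] - d[3]) * g[2]
    return [m1 + m2 + m3, m2 - m3 - m4]
```

For the inputs d = [1, 2, 3, 4] and g = [5, 6, 7], both functions return the values 38 and 56; the Winograd form reaches them with four products because the filter-side sums are shared between the two outputs.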

DESIGN OF 16-BIT RISC CPU
Architecture
The architecture of the proposed RISC CPU is a single-cycle non-pipelined processor with a uniform
16-bit instruction format. It has a load/store architecture, where operations are performed only
on registers, and not on memory locations. It follows the classical von Neumann
architecture with just one common memory bus for both instructions and data. A total of
27 instructions are designed as a first step in the development of the processor. The
instruction set consists of Logical, Immediate, Jump, Load, Store and HALT types of instructions.
The HALT instruction acts as a border line between the instruction and data memory. This offers
flexibility to programmers who use this processor core to define their own instruction
and data memory within the allotted 64 memory registers. Each register is 16 bits wide.
The bit widths of each unit are as follows.
Instruction Unit : 16 bits
Execution Unit : 8 bits
Memory Unit : 16 bits
Op-code Width : 5 bits.
Detail of Logical Blocks
Figure 1 illustrates the block diagram of the 16-bit RISC CPU. The proposed RISC CPU consists
of five blocks, namely, Arithmetic and Logical Unit (ALU), Program Counter (PC), Register file
(REG), Instruction Decoder Unit (IDU) and Clock Control Unit (CCU). The data-path of the
proposed CPU in Fig. 1 is explained as follows.
1) Program Counter: The Program Counter (PC) is a 16-bit latch that holds the memory address
of the location from which the processor will fetch the next machine language instruction.
The proposed PC is the largest sub-block and is second only to the control unit in complexity.
It controls the flow of instruction execution and ensures the logical operation flow of the
processor. It performs two operations, namely incrementing and loading. For most
instructions, the PC is simply incremented in preparation for the following instruction or the
following instruction nibbles. Normally, a conventional adder circuit would be used for the
incrementing action. However, this increases hardware use and power
dissipation. Hence, this work strives for a novel low-power incrementer circuit design.
This design employs a 6-bit pointer to address the instruction memory. It additionally uses
a 6-bit pointer to address the data memory, which is used only when a Load/Store
instruction is executed.
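The two PC operations, incrementing and loading, can be sketched as a behavioural model. This is a Python illustration, not the authors' circuit: the wrap-around from 63 to 0 and the names next_pc and increment_bits are assumptions. The second function shows a conventional half-adder chain, which already needs less logic than a general adder because one operand is the constant 1; the novel incrementer referred to in the text is not reproduced here.

```python
PC_MASK = 0x3F  # 6-bit instruction pointer, as stated in the text


def next_pc(pc, load=False, target=0):
    """Return the jump target on a load, otherwise pc + 1 (wrap-around assumed)."""
    if load:
        return target & PC_MASK
    return (pc + 1) & PC_MASK


def increment_bits(value, width=6):
    """Add 1 using a chain of half adders: sum = bit XOR carry, carry = bit AND carry."""
    carry = 1                          # the constant +1 enters as the initial carry
    result = 0
    for i in range(width):
        bit = (value >> i) & 1
        result |= (bit ^ carry) << i   # half-adder sum output
        carry = bit & carry            # half-adder carry output
    return result
```

For every 6-bit value, increment_bits gives the same result as next_pc without a load.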
2) Arithmetic and Logic Unit: The arithmetic and logic unit (ALU) performs arithmetic and logic
operations. It also performs bit operations such as rotate and shift by a defined number of bit
positions. The proposed ALU contains three sub-modules, viz. arithmetic, logic and shift
modules.
The arithmetic unit executes addition operations and generates the Sign flag and
Zero flag according to the result. In order to reduce the complexity
of the adder circuits used in the arithmetic unit of the RISC CPU, a very fast and low power
carry select adder circuit has been introduced. The ALU also contains a modified
Wallace tree multiplier, which uses compressor circuits to achieve low power and improved
speed of operation. The multiplier is designed to execute in a single cycle. Hence, it
satisfies the RISC design requirement of single cycle instruction execution.
The shift module executes rotate and shift instructions. The
shift module is essential for signal processing applications that need division by 2,
which is achieved by a single right shift operation. The logic unit performs logical
operations such as XOR, OR and AND. The data output of each ALU operation is written
back into the corresponding destination register, and the flags are updated. In order to
keep the design simple, the carry out of the ALU is not taken into consideration.
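The ALU behaviour described above can be summarised in a short behavioural model. This is a Python sketch under stated assumptions, not the authors' Verilog: the operation names are plain strings, the shift amount is taken from the low 4 bits of the second operand, the right shift is logical, and the multiplier result is truncated to 16 bits. As in the text, the carry out is discarded and only the Sign and Zero flags are produced.

```python
MASK16 = 0xFFFF


def alu(op, a, b):
    """Return (result, sign_flag, zero_flag) for 16-bit operands a and b."""
    a &= MASK16
    b &= MASK16
    if op == "ADD":
        r = a + b
    elif op == "SUB":
        r = a - b
    elif op == "AND":
        r = a & b
    elif op == "OR":
        r = a | b
    elif op == "XOR":
        r = a ^ b
    elif op == "SL":
        r = a << (b & 0xF)         # shift amount: assumed low 4 bits of b
    elif op == "SR":
        r = a >> (b & 0xF)         # logical right shift (assumed)
    elif op == "MUL":
        r = a * b                  # truncated to 16 bits below (assumed)
    else:
        raise ValueError(f"unknown operation: {op}")
    r &= MASK16                    # carry out and overflow bits are dropped
    sign = (r >> 15) & 1           # Sign flag: most significant bit of the result
    zero = int(r == 0)             # Zero flag: set when the result is all zeros
    return r, sign, zero
```

For example, alu("SR", 10, 1) halves the operand with one right shift, which is the division by 2 mentioned in the text.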
3) Register File: The register file consists of 8 general purpose registers of 16 bits each.
These registers are used during the execution of arithmetic and data-centric instructions. The
register file is fully visible to the programmer. Each register can be addressed as either source
or destination using a 3-bit identifier in the range 000 to 111. The load
instruction is used to load values from memory into the registers, and the store instruction is
used to write values back to memory so that the processed outputs can be obtained from the
processor. The Link register is used to hold the addresses of the corresponding memory locations.
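A minimal Python model of such a register file is given below as an illustration. The class name, the method names and the all-zero initial contents are assumptions; only the size (8 registers of 16 bits) and the 3-bit addressing come from the text.

```python
class RegisterFile:
    """Eight 16-bit general purpose registers addressed by a 3-bit identifier."""

    def __init__(self):
        self.regs = [0] * 8                      # initial contents assumed to be zero

    def read(self, addr):
        return self.regs[addr & 0b111]           # only the low 3 address bits are used

    def write(self, addr, value):
        self.regs[addr & 0b111] = value & 0xFFFF  # stored value is limited to 16 bits
```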
4) Instruction Decoder Unit: Our instruction set is limited yet comprehensive. Since the op-code
is only 5 bits wide, it was decided to keep the number of instructions supported within 32 for
easier implementation. At present, only 27 instructions have been implemented. The rest have
been reserved for porting digital processing applications to our processor. The decoder unit
decodes the instruction and gives out the 3-bit source and destination addresses,
depending on the op-code. It also decides whether the writeback circuit has to be
enabled.
In the case of Load/Store instructions, the IDU updates the Link register. In the case of Jump
instructions, if the conditions are satisfied, the IDU updates the PC register with the new
address from which the next instruction has to be fetched, instead of the normal incremented
value.
Figure 2(a) shows the instruction format followed by Logical instructions and Data transfer
instructions, such as MOV, AND, OR, XOR, ADD, SUB, SL (Shift Left), RL
(Rotate Left), SR (Shift Right), RR (Rotate Right), SWAP and Multiply instructions. Fig. 2(b)
shows the instruction format followed by Immediate instructions. This type has the
data embedded in the instruction, such as LHI (Load 8-bit value into the eight most significant
bits of the given register), LLI (Load 8-bit value into the eight least significant bits of the
given register), ANDI, ORI, XORI, ADDI and SUBI. Figure 2(c) depicts the format of the Load
instruction. The instruction format for the Store instruction is given in Fig. 2(d). Fig. 2(e)
depicts the instruction format for the JUMP, JZ (Jump if Zero), JNZ (Jump if not Zero), JP (Jump
if Positive) and JN (Jump if Negative) instructions. Fig. 2(f) shows the format for the HALT command.
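As an illustration of the Immediate format, the Python sketch below splits a 16-bit word into a 5-bit op-code, a 3-bit register identifier and an 8-bit immediate value, and models LHI and LLI. The field order (op-code in the top bits, immediate in the low byte) is an assumption, since the exact layout is given only in Fig. 2(b); it is also assumed that LHI and LLI leave the other byte of the register unchanged. The function names are hypothetical.

```python
def decode_immediate(word):
    """Split a 16-bit word into (opcode, register, immediate); field order is assumed."""
    opcode = (word >> 11) & 0x1F   # 5-bit op-code
    reg = (word >> 8) & 0x7        # 3-bit register identifier
    imm = word & 0xFF              # 8-bit immediate value
    return opcode, reg, imm


def lhi(reg_value, imm):
    """Load imm into the upper byte; the lower byte is kept (assumed)."""
    return ((imm & 0xFF) << 8) | (reg_value & 0x00FF)


def lli(reg_value, imm):
    """Load imm into the lower byte; the upper byte is kept (assumed)."""
    return (reg_value & 0xFF00) | (imm & 0xFF)
```

Under these assumptions, a full 16-bit constant is built with one LHI followed by one LLI on the same register.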
5) Clock Control Unit: Efficient phase scheduling is required to optimize the throughput and the
energy consumption of the processor. In this paper, we propose a clock control unit (CCU)
which is tasked with efficient phase scheduling, to select the various blocks of the processor.

Fig. 2 (a) to (f) 16-Bit Instruction Format
III. IMPLEMENTATION OF MODIFIED WINOGRAD CONVOLUTION ALGORITHM
To demonstrate the use of the processor for signal processing applications,
we have executed a 3*2 modified Winograd algorithm by implementing its mnemonic
code as shown below:
1. Load x0
2. Load x1
3. Load x2
4. Mov x2
5. Add x0, x2
6. Mov x1
7. Add (5), x1
8. Sub (5), x1
9. Mov Reg D to Reg E
10. Load H0
11. Load H1
12. Load H2
13. Load H3
14. Mul H0, X0
15. Mul H1, X1
16. Mul H2, X2
17. Mul H3, X3
18. SR (13) gives S0
20. Add (16), (17)
21. Sub (20), (15)
23. Add (15), (16)
24. Sub (14), (15)
26. HLT
The respective op-codes are initially stored in the instruction memory of the processor. The
inputs are stored in the corresponding data memory, which in this case should lie beyond
memory location 26. The inputs x0, x1 and x2 can each be at most 6 bits wide, while
h0 and h1 can each be at most 3 bits wide.
Executing the program gives the required results S0, S1, S2 and S3 after 26 clock cycles,
thus validating the single cycle instruction execution of the processor. It also demonstrates
the processor's use in a typical signal processing application. Repeated execution of
multiplication and addition instructions can be used to perform the MAC operation, which is
widely used in signal processing applications. This would eliminate the need for a dedicated MAC unit.
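The MAC operation built from separate multiply and add instructions can be sketched as follows. This is a Python illustration with a hypothetical function name; each loop iteration stands for one multiply instruction followed by one add instruction, with results limited to 16 bits as an assumption about the register width.

```python
def mac(samples, coeffs):
    """Accumulate sum(x * h) using one multiply and one add per term."""
    acc = 0
    for x, h in zip(samples, coeffs):
        product = (x * h) & 0xFFFF        # one multiply instruction
        acc = (acc + product) & 0xFFFF    # one add instruction, carry out ignored
    return acc
```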

INTRODUCTION TO VERILOG HDL
What is HDL
A typical Hardware Description Language (HDL) supports a mixed-level description in
which gate and netlist constructs are used with functional descriptions. This mixed-level
capability enables you to describe system architectures at a high level of abstraction, then
incrementally refine a design's detailed gate-level implementation.
HDL descriptions offer the following advantages:
You can verify design functionality early in the design process. A design written as an HDL
description can be simulated immediately. Design simulation at this high level, before
implementation at the gate level, allows you to evaluate architectural and design decisions.
An HDL description is more easily read and understood than a netlist or schematic description.
HDL descriptions provide technology-independent documentation of a design and its
functionality. Because the initial HDL design description is technology independent, you
can use it again to generate the design in a different technology, without having to
translate it from the original technology.
Large designs are easier to handle with HDL tools than schematic tools.
Verilog Overview :
Introduction
Verilog is a Hardware Description Language (HDL). A hardware description
language is a language used to describe a digital system, for example, a microprocessor,
a memory or a simple flip-flop. This means that, by using an HDL, one can describe any
digital hardware at any level.
Verilog provides both behavioral and structural language structures. These structures allow
expressing design objects at high and low levels of abstraction. Designing hardware with a
language such as Verilog allows using software concepts such as parallel processing and
object-oriented programming. Verilog has a syntax similar to C and Pascal.
Design Styles
Verilog, like any other hardware description language, permits designers to create a design
using either a bottom-up or a top-down methodology.
Bottom-Up Design
The traditional method of electronic design is bottom-up. Each design is performed at the
gate-level using the standard gates. With increasing complexity of new designs this
approach is nearly impossible to maintain. New systems consist of ASIC or
microprocessors with a complexity of thousands of transistors. These traditional bottom-up
designs have to give way to new structural, hierarchical design methods. Without these new
design practices it would be impossible to handle the new complexity.
Top-Down Design
The desired design-style of all designers is the top-down design. A real top-down design
allows early testing, easy change of different technologies, a structured system design and
offers many other advantages. But it is very difficult to follow a pure top-down design. Due
to this fact most designs are mix of both the methods, implementing some key elements of
both design styles.
Complex circuits are commonly designed using the top down methodology. Various
specification levels are required at each stage of the design process.
Abstraction Levels of Verilog
Verilog supports a design at many different levels of abstraction. Three of them are very
important:

Behavioral level
Register-Transfer Level
Gate Level
Behavioral level
This level describes a system by concurrent algorithms (Behavioral). Each algorithm itself is
sequential, that means it consists of a set of instructions that are executed one after the other.
Functions, Tasks and Always blocks are the main elements. There is no regard to the structural
realization of the design.
Register-Transfer Level
Designs using the Register-Transfer Level specify the characteristics of a circuit by operations
and the transfer of data between the registers. An explicit clock is used. An RTL design has
exact timing bounds: operations are scheduled to occur at certain times. The modern definition
of RTL code is "Any code that is synthesizable is called RTL code".
Gate Level
Within the logic level the characteristics of a system are described by logical links and
their timing properties. All signals are discrete signals. They can only have definite logical
values ('0', '1', 'X', 'Z'). The usable operations are predefined logic primitives (AND, OR,
NOT and similar gates). Using gate level modeling might not be a good idea for any level of logic
design. Gate level code is generated by tools such as synthesis tools, and this netlist is used
for gate level simulation and for backend.
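The four logical values can be illustrated with the truth table of a two-input AND primitive. The sketch below is a Python model with a hypothetical function name, not Verilog code; it follows the usual Verilog rule that a '0' on either input forces the output to '0', and that an 'X' or 'Z' input otherwise makes the output unknown.

```python
def and4(a, b):
    """Two-input AND over the values '0', '1', 'X' and 'Z'."""
    if a == "0" or b == "0":
        return "0"      # a controlling 0 decides the output regardless of the other input
    if a == "1" and b == "1":
        return "1"
    return "X"          # any remaining combination involves X or Z, so the output is unknown
```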
VLSI DESIGN FLOW
Introduction
Design is the most significant human endeavor: It is the channel through which creativity is
realized. Design determines our every activity as well as the results of those activities; thus it
includes planning, problem solving, and producing. Typically, the term "design" is applied
to the planning and production of artifacts such as jewelry, houses, cars, and cities. Design is
also found in problem-solving tasks such as mathematical proofs and games. Finally,
design is found in pure planning activities such as making a law or throwing a party.
More specific to the matter at hand is the design of manufacturable artifacts. This
activity uses all facets of design because, in addition to the specification of a producible
object, it requires the planning of that object's manufacture, and much problem solving
along the way. Design of objects usually begins with a rough sketch that is refined by
adding precise dimensions. The final plan must not only specify exact sizes, but also include a
scheme for ordering the steps of production. Additional considerations depend on the
production environment; for example, whether one or ten million will be made, and how
precisely the manufacturing environment can be controlled.
A semiconductor process technology is a method by which working circuits can be
manufactured from designed specifications. There are many such technologies, each of
which creates a different environment or style of design.

XILINX
Migrating Projects from Previous ISE Software Releases
When you open a project file from a previous release, the ISE software prompts you to migrate
your project. If you click Backup and Migrate or Migrate Only, the software automatically
converts your project file to the current release. If you click Cancel, the software does not
convert your project and, instead, opens Project Navigator with no project loaded.
Note: After you convert your project, you cannot open it in previous versions of the ISE
software, such as the ISE 11 software. However, you can optionally create a backup of the
original project as part of project migration, as described below.
To Migrate a Project
1. In the ISE 12 Project Navigator, select File > Open Project.
2. In the Open Project dialog box, select the .xise file to migrate.
Note You may need to change the extension in the Files of type field to display .npl
(ISE 5 and ISE 6 software) or .ise (ISE 7 through ISE 10 software) project files.
3. In the dialog box that appears, select Backup and Migrate or Migrate Only.
4. The ISE software automatically converts your project to an ISE 12 project.
Note If you chose to Backup and Migrate, a backup of the original project is created at
project_name_ise12migration.zip.
5. Implement the design using the new version of the software.
Note Implementation status is not maintained after migration.

Properties
For information on properties that have changed in the ISE 12 software, see ISE 11 to ISE 12
Properties Conversion.
IP Modules
If your design includes IP modules that were created using CORE Generator software or
Xilinx Platform Studio (XPS) and you need to modify these modules, you may be required to
update the core. However, if the core netlist is present and you do not need to modify the core,
updates are not required and the existing netlist is used during implementation.
Obsolete Source File Types
The ISE 12 software supports all of the source types that were supported in the ISE 11
software.
If you are working with projects from previous releases, state diagram source files (.dia), ABEL
source files (.abl), and test bench waveform source files (.tbw) are no longer supported. For state
diagram and ABEL source files, the software finds an associated HDL file and adds it to the
project, if possible. For test bench waveform files, the software automatically converts the TBW
file to an HDL test bench and adds it to the project. To convert a TBW file after project
migration, see Converting a TBW File to an HDL Test Bench.
Using ISE Example Projects
To help familiarize you with the ISE software and with FPGA and CPLD designs, a set of
example designs is provided with Project Navigator. The examples show different design
techniques and source types, such as VHDL, Verilog, schematic, or EDIF, and include different
constraints and IP.
To Open an Example
1. Select File > Open Example.
2. In the Open Example dialog box, select the Sample Project Name.
Note To help you choose an example project, the Project Description field describes
each project. In addition, you can scroll to the right to see additional fields, which
provide details about the project.
3. In the Destination Directory field, enter a directory name or browse to the
directory.
4. Click OK.
The example project is extracted to the directory you specified in the Destination Directory
field and is automatically opened in Project Navigator. You can then run processes on the
example project and save any changes.
Note If you modified an example project and want to overwrite it with the original example
project, select File > Open Example, select the Sample Project Name, and specify the same
Destination Directory you originally used. In the dialog box that appears, select Overwrite the
existing project and click OK.
Creating a Project
Project Navigator allows you to manage your FPGA and CPLD designs using an ISE project,
which contains all the source files and settings specific to your design. First, you must create a
project and then, add source files, and set process properties. After you create a project, you can
run processes to implement, constrain, and analyze your design. Project Navigator provides a
wizard to help you create a project as follows.
Note If you prefer, you can create a project using the New Project dialog box instead of the
New Project Wizard. To use the New Project dialog box, deselect the Use New Project wizard
option in the ISE General page of the Preferences dialog box.

To Create a Project
1. Select File > New Project to launch the New Project Wizard.
2. In the Create New Project page, set the name, location, and project type, and
click Next.
3. For EDIF or NGC/NGO projects only: In the Import EDIF/NGC Project page,
select the input and constraint file for the project, and click Next.
4. In the Project Settings page, set the device and project properties, and click
Next.
5. In the Project Summary page, review the information, and click Finish to
create the project.
Project Navigator creates the project file (project_name.xise) in the directory you specified.
After you add source files to the project, the files appear in the Hierarchy pane of the Design
panel. Project Navigator manages your project based on the design properties (top-level
module type, device type, synthesis tool, and language) you selected when you created the
project. It organizes all the parts of your design and keeps track of the processes necessary to
move the design from design entry through implementation to programming the targeted
Xilinx device.
Note For information on changing design properties, see Changing Design Properties.
You can now perform any of the following:
Create new source files for your project.
Add existing source files to your project.
Run processes on your source files.
Modify process properties.
Creating a Copy of a Project
You can create a copy of a project to experiment with different source options and
implementations. Depending on your needs, the design source files for the copied project and
their location can vary as follows:
Design source files are left in their existing location, and the copied project
points to these files.
Design source files, including generated files, are copied and placed in a
specified directory.
Design source files, excluding generated files, are copied and placed in a
specified directory.
Copied projects are the same as other projects in both form and function. For example, you can
do the following with copied projects:
Open the copied project using the File > Open Project menu command.
View, modify, and implement the copied project.
Use the Project Browser to view key summary data for the copied project and
then, open the copied project for further analysis and implementation, as described in
Using the Project Browser.
Note Alternatively, you can create an archive of your project, which puts all of the project
contents into a ZIP file. Archived projects must be unzipped before being opened in Project
Navigator. For information on archiving, see Creating a Project Archive.
To Create a Copy of a Project
1. Select File > Copy Project.
2. In the Copy Project dialog box, enter the Name for the copy.
Note The name for the copy can be the same as the name for the project, as long as you
specify a different location.
3. Enter a directory Location to store the copied project.
4. Optionally, enter a Working directory.
By default, this is blank, and the working directory is the same as the project directory.
However, you can specify a working directory if you want to keep your ISE project
file (.xise extension) separate from your working area.
5. Optionally, enter a Description for the copy.
The description can be useful in identifying key traits of the project for reference later.
6. In the Source options area, do the following:
o Select one of the following options:
o Keep sources in their current locations - to leave the design
source files in their existing location.
If you select this option, the copied project points to the files in their
existing location. If you edit the files in the copied project, the changes
also appear in the original project, because the source files are shared
between the two projects.
o Copy sources to the new location - to make a copy of all the
design source files and place them in the specified Location directory.
If you select this option, the copied project points to the files in the specified
directory. If you edit the files in the copied project, the changes do not appear in
the original project, because the source files are not shared between the two
projects.
o Optionally, select Copy files from Macro Search Path directories to
copy files from the directories you specify in the Macro Search Path property in
the Translate Properties dialog box. All files from the specified directories are
copied, not just the files used by the design.
Note If you added a netlist source file directly to the project as described in
Working with Netlist-Based IP, the file is automatically copied as part of Copy
Project because it is a project source file. Adding netlist source files to the project
is the preferred method for incorporating netlist modules into your design,
because the files are managed automatically by Project Navigator.
o Optionally, click Copy Additional Files to copy files that were not
included in the original project. In the Copy Additional Files dialog box, use the
Add Files and Remove Files buttons to update the list of additional files to copy.
Additional files are copied to the copied project location after all other files are
copied.
7. To exclude generated files from the copy, such as implementation results and
reports, select Exclude generated files from the copy.
When you select this option, the copied project opens in a state in which processes have
not yet been run.
8. To automatically open the copy after creating it, select Open the copied project.
Note By default, this option is disabled. If you leave this option disabled, the original
project remains open after the copy is made.
9. Click OK.
Creating a Project Archive
A project archive is a single, compressed ZIP file with a .zip extension. By default, it contains all
project files, source files, and generated files, including the following:
User-added sources and associated files
Remote sources
Verilog `include files
Files in the macro search path
Generated files
Non-project files
To Archive a Project
1. Select Project > Archive.
2. In the Project Archive dialog box, specify a file name and directory for the ZIP
file.
3. Optionally, select Exclude generated files from the archive to exclude
generated files and non-project files from the archive.
4. Click OK.
A ZIP file is created in the specified directory. To open the archived project, you must first
unzip the ZIP file, and then, you can open the project.
Note Sources that reside outside of the project directory are copied into a remote_sources
subdirectory in the project archive. When the archive is unzipped and opened, you must either
specify the location of these files in the remote_sources subdirectory for the unzipped project, or
manually copy the sources into their original location.

RESULTS

CONCLUSIONS
The design of a single cycle, non-pipelined 16-bit RISC processor for a convolution
application has been presented. Novel adder and multiplier structures have been
employed in the RISC architecture. The processor has been designed to execute an
instruction set comprising 27 instructions in total, which can be expanded up to 32
instructions based on user requirements. The processor design shows promise for use in
signal processing applications.

REFERENCES
[1] Robert S. Plachno, VP of Audio A True Single Cycle RISC Processor without Pipelining.
ESS Design White Paper RISC Embedded Controller.
[2] Youngjoon Shin, Chanho Lee, and Yong Moon, A Low Power 16-Bit RISC Microprocessor
Using ECRL Circuits, ETRI Journal, Volume 26, Number 6, December 2004.
[3] Yasuhiro Takahashi, Toshikazu Sekine, and Michio Yokoyama, Design of a 16-bit Non-
pipelined RISC CPU in a Two Phase Drive Adiabatic Dynamic CMOS Logic, International
Journal of Computer and Electrical Engineering, Vol. 1, No. 1, April 2009 1793-8198.
[4] V. B. Saambhavi and V. S. Kanchana Bhaaskaran, A 16-Bit RISC Microprocessor Using
DCPAL Circuits. International Journal of Advanced Engineering and Technology (IJAET), E-
ISSN-0976-3945, Vol.II, Issue I, January-March 2011, pp. 154-162
[5] J.S. Denker, A Review of Adiabatic Computing, IEEE Symp. Low Power Electronics,
1994, pp. 94-97.
[6] H. Mahmoodi-Meinnand, A. Afzali-Kusha, and M. Nourani, Adiabatic Carry Look-Ahead
Adder with Efficient Power Clock Generator, IEEE Proc., vol. 148, 2001, pp. 229-234.
[7] K. Nishimura, T. Kudo, and H. Amano, Educational 16-bit microprocessor PICO-16, Proc.
3rd Japanese FPGA/PLD design conference and exhibit (Japanese Edition), Tokyo, July 19-21,
1995, pp. 589-595.
[8] Samiappa Sakthikumaran et al., A Very Fast and Low Power Incrementer and Decrementer
Circuits, International Journal of Computer Communication and Information System (IJCCIS)
Vol2. No.1 2011, pp. 200-203.
[9] Samiappa Sakthikumaran et al., A Very Fast and Low Power Carry Select Adder Circuits,
3rd International Conference on Electronics Computer Technology - ICECT 2011.
[10] Samiappa Sakthikumaran et al., A Novel Low Power and High Speed Wallace Tree
Multiplier for RISC Processor, 3rd International Conference on Electronics Computer
Technology - ICECT 2011.
[11] Keshab K.Parhi, VLSI Digital Signal Processing Systems, Wiley India Edition,1999.
