IBERCHIP 2012
PROCEEDINGS
IBERCHIP 2012 Program Co-Chairs' Message

On behalf of the Organizing and Program Committees, the General and Program Chairs welcome all participants to the XVIII IBERCHIP Workshop. This year, the Workshop continues the tradition of being held in a beautiful Latin American venue: Playa del Carmen, Mexico.
This edition of the IBERCHIP Workshop will be held in parallel with the 3rd Latin American Symposium on Circuits and Systems (LASCAS 2012). Both events provide a forum for presenting and discussing advances in the analysis, modeling, characterization, design, implementation and application of microelectronic circuits and systems in Iberoamerica. The IBERCHIP Technical Program includes five sessions of oral presentations and an open forum for one-to-one discussion of contributions presented as posters: twenty-seven contributions are presented orally and thirteen as posters.
The topics in the IBERCHIP Technical Program cover different disciplines within electronics, ranging from FPGAs and Signal Processing to Semiconductor Materials, Circuits and Devices. We were pleased to receive excellent papers from several Latin American countries, Spain and Japan, which allows us to meet one of the key objectives of this workshop: bringing together researchers, students, and practicing engineers from industry, universities, and government laboratories from all over Iberoamerica to address current and future trends in microelectronics.
A further objective of the workshop is to encourage informal interaction among the participants between sessions and during the evening forum. In our experience, such interactions result in productive collaborations and lead to closer inter-institutional ties for the advancement of science and technology in our countries.
In addition, within the framework of IBERCHIP and LASCAS, six keynote speeches will be given by Prof. Pedro Julián (Universidad Nacional del Sur, Argentina), Dr. Maynard Falconer (Intel Labs, USA), Dr. Alvaro Maury (SILTERRA, Malaysia), Prof. Mounir Boukadoum (Université du Québec à Montréal, Canada), Prof. Maciej J. Ogorzalek (Jagiellonian University, Krakow, Poland), and Dr. Fernando Guarín (IBM Microelectronics Semiconductor Research, USA). We are glad to have confirmed the participation of these internationally renowned specialists in the field of Electronics.
Last, but not least, the Steering Committee of the XVIII IBERCHIP Workshop wishes to thank the invited speakers, the authors, and the members of the Technical Program Committee, whose contributions made this year's exciting technical program possible. The sponsorship of INAOE, IEEE and ISTEC is also gratefully acknowledged, as is the unrelenting support provided by INAOE's staff.
Once again, welcome to Mexico and enjoy the Workshop!
Roberto Murphy
INAOE, Mexico
Reydezel Torres
INAOE, Mexico
José Luis Huertas

Universidad de Sevilla, Spain
Table of Contents
Tatsuya Maruyama and Alberto Palacios Pawlovsky. A Study of Methods to Improve an Immune Algorithm for Searching for the Pair of Inputs that Cause the Maximum Number of Switching Gates in a Combinational Circuit
Laurentiu Acasandrei and Angel Barriga. Implementación sobre FPGA de un sistema de detección de caras basado en LEON3 [FPGA implementation of a face detection system based on LEON3]
Jesús Balosa, Francisco José Crespo and Angel Barriga. Sistema empotrado de reconocimiento de voz sobre FPGA [Embedded speech recognition system on FPGA]
Juan Núñez, María J. Avedillo and José M. Quintana. Bifurcation Diagrams in MOS-NDR Frequency Divider Circuits
Romulo Volpato, Tales Pimenta, Filipe Ramos, Michel Santana and Paulo Crepaldi. Prediction of Energy Transfer in Implantable Devices
David Cabral, Leonardo Zoccal, Paulo Crepaldi and Tales Pimenta. Schottky Barrier Diodes SBD in Standard CMOS Process
Joel Molina, Rafael Ortega, Wilfrido Calleja, Pedro Rosales, Carlos Zuniga and Alfonso Torres. Performance of a MOHOS-type Memory by Using Different Tunneling Oxide Thickness
Elisa Calvo Gallego, Piedad Brox Jiménez and Santiago Sánchez-Solano. Un algoritmo en tiempo real para etiquetado de componentes conectados en imágenes [A real-time algorithm for connected-component labeling in images]
Alexander Zemliak, Antonio Michua and Tatiana Markina. Behavior of Lyapunov Function for Different Strategies of the Circuit Optimization Problem
Adriana Aparecida Dos Santos Izidoro, Eduardo Souza Dias, Fernando A Cardoso and Tales Cleber Pimenta. Digital Multiplexer of an EEG Signal Acquisition System
Diego Brengi, Salvador Tropea and Christian Huy. S3Proto-mini: Tarjeta de Hardware Libre con FPGA de encapsulado BGA [S3Proto-mini: an open hardware board with a BGA-packaged FPGA]
Fernando Urbano, Vladimir Trujillo and Jaime Velasco. Implementación Hardware de un Multiplicador Serial Basado en Bases Normales sobre GF(2^163) [Hardware implementation of a serial multiplier based on normal bases over GF(2^163)]
Carlos Torres-Torres. Electromagnetic blooming by vectorial laser irradiation in semiconductive nanostructures
Andres Marquez, Wilfrido Moreno and Ramiro Jordan. Experiences Teaching a Rapid System Prototyping Class
Mauricio Huixtlaca-Quintana, Jose Miguel Rocha-Perez, Alejandro Diaz-Sanchez and Carlos Muñiz-Montero. Circuito para la Extracción de Raíz Fraccional Usando Transistores en Inversión Débil [Fractional-root extraction circuit using transistors in weak inversion]
Adalbery Rodrigues Castro, Lilian Coelho De Freitas, Claudomir Cardoso and Aldebaro Klautau. Classificação de Modulação em Rádio Cognitivo: Uma Implementação em FPGA [Modulation classification in cognitive radio: an FPGA implementation]
Duarte Oliveira, Luiz Ferreira, Lester Faria and Noé Alles. Asynchronous Control in CMOS Technology for Synchronous FSMs Partitioned with One-Hot Encoding
Duarte Oliveira, Noé Alles, Lester Faria, Diego Bompean, Thiago Curtinhas and Luiz Ferreira. Synthesis of Asynchronous Digital Systems of High Performance using Simpler Approach
Gracieli Posser, Guilherme Flach, Gustavo Wilke and Ricardo Reis. Dimensionamento de Portas Lógicas e de Transistores Minimizando Atraso e Área [Logic gate and transistor sizing minimizing delay and area]
Walter Enrique Calienes Bartra, Fernanda Lima Kastensmidt and Ricardo Reis. Simulación de Fallas SET en un Oscilador Controlado por Voltaje [Simulation of SET faults in a voltage-controlled oscillator]
Ma. Del Rocío De Jesús, Isaac Esaú Jiménez, Luis Hernández and Mónico Linares. Implementación De Una Celda CNN Digital en FPGA para la Aplicación en Detección de Bordes [FPGA implementation of a digital CNN cell for edge detection]
Jones Yudi Mori, Camilo Sánchez-Ferreira and Carlos Humberto Llanos. Real-Time Image Processing based on Neighborhood Operations using FPGA
Hernando González Acevedo. Diseño de controladores difusos utilizando algoritmos genéticos [Design of fuzzy controllers using genetic algorithms]
Anelise Kologeski, Caroline Concatto, Fernanda Kastensmidt and Luigi Carro. Combinação de Estratégias para Lidar com Falhas Permanentes nas Interconexões de uma Rede Intra-Chip [Combining strategies to deal with permanent faults in the interconnections of a network-on-chip]
Thiago Brito Bezerra, Antonio Petraglia and Sebastian Yuri Cavalcante Catunda. Transistor Level Design of a Switched Capacitor Integrating ADC with Programmable Input Range and Resolution
Lucas L. Gambarra, José A. G. Lima, Hamilton S. Silva, Leonardo V. Batista and Daniel S. Marques. Otimização do algoritmo Non-local means utilizando uma implementação em FPGA [Optimization of the non-local means algorithm using an FPGA implementation]
Marcio Bender Machado, Marcio Cherem Schneider and Alfredo Arnaud. A Battery Charge Monitor Topology for Implantable Medical Devices
Gabriela A Rodríguez, Arturo Sarmiento and Edmundo Gutiérrez. A Schrödinger-Poisson CAD tool for calculation of tunneling through different gate-oxide potential profiles
Tania Mara Ferla, Guilherme Flach and Ricardo Reis. Uma Ferramenta Educacional para o Ensino de Simulated Annealing e Posicionamento [An educational tool for teaching simulated annealing and placement]
Georgina Rosas, Roberto Murphy and Wilfrido Moreno. Novel Fabrication Techniques for RF-MEMS Devices
Fabio Pereira, Edward David and Fernando Yokota. Algoritmo SHA-3 Keccak em Smart Card: Implementação e Desempenho [SHA-3 Keccak algorithm on a smart card: implementation and performance]
Germán Álvarez-Botero, Roberto Murphy and Reydezel Torres. Accurate Modeling to Characterize the Distributed Substrate Effects in SiGe HBTs
Fabián Zárate-Rincón, Germán Álvarez-Botero, Reydezel Torres-Torres and Roberto Murphy-Arteaga. Extraction Methodology of the Substrate Parasitic Network of an RF-MOSFET with Separate Substrate DC Connection
Roberto Castañeda-Sheissa, Héctor Vázquez-Leal, Ahmet Yildirim, Uriel Filobello-Niño, Arturo Sarmiento-Reyes and Luis Hernández-Martínez. Método modificado de hiperesferas aplicado a homotopías biparamétricas: simulación de circuitos con transistores bipolares [Modified hypersphere method applied to biparametric homotopies: simulation of circuits with bipolar transistors]
Gerson Scartezzini and Ricardo Reis. Dissipação de potência em Redes de Transistores versus Células Padrão [Power dissipation in transistor networks versus standard cells]
Adriana Carolina Sanabria Borbon, Jaime Vitola Oyaga, Cesar Pedraza Bonilla and Martha Johanna Sepulveda. Algoritmo rápido para la búsqueda de audio por contenido [Fast algorithm for content-based audio search]
Svetlana Sejas and Reydezel Torres-Torres. Deembedding On-Wafer S-parameters: Is a One-Step Procedure Enough?
Salvador Antonio Arroyo Díaz, Alejandro Diaz Sanchez and Apolo Zeus Escudero Uribe. Architecture for myoelectric features extraction by H.O.S. of four sMES channels
Salvador Antonio Arroyo Díaz and Karina Rosas Paleta. Control de Robot Móvil Basado en FPGA [FPGA-based mobile robot control]
Victor H. Vega G. and Edmundo Gutierrez. Multiport Analysis of Two-Dimensional Nanosystems in a Magnetic Field Based on the NEGF Formalism
Proceedings of the XVIII International IBERCHIP Workshop
Playa del Carmen, Mexico, February 29-March 2, 2012
ISSN 977-2177-128009
FPGA Implementation of a Face Detection System Based on LEON3
Laurentiu Acasandrei, Angel Barriga
IMSE-CNM-CSIC/Univ. Sevilla
Sevilla, Spain
{laurentiu,barriga}@imse-cnm.csic.es

Abstract: This paper presents an embedded face detection system on FPGA. To accelerate the face detection process, a system based on hardware-software codesign techniques is proposed. The acceleration mechanism for face detection is detailed. The implementation of an IP module that provides hardware acceleration is also described, together with the results obtained.

I. INTRODUCTION

Face detection is an important task in biometric and human-machine interaction applications. Because face detection algorithms are complex, they require large amounts of computing and memory resources. Software implementations of these algorithms are therefore inefficient when they must run on SoC (System on Chip) platforms that demand high processing speed, few resources and low power consumption. In such cases, hardware-software codesign techniques can be applied to develop hardware accelerators for the most computationally demanding parts of the detection algorithms.

The key to success for a face detection algorithm is an adequate balance between detection accuracy (robustness) and efficient operation (computational cost). Heuristic or knowledge-based detectors, such as skin-color-based and template-based techniques, use direct knowledge about faces and sometimes achieve better performance. However, these techniques are not robust to variations in the faces or to environmental interference (such as changes in lighting). Statistical or learning-based methods, such as neural networks or SVM detectors, on the other hand, use classification algorithms and are better at discriminating between face and non-face patterns. However, these learning-based algorithms involve high processing complexity, which makes them very costly for embedded applications.

Several hardware implementations of face detection systems have recently been proposed. In [1], a dedicated FPGA system is presented; it receives the image from a camera and stores it in the internal memory of the FPGA. Other implementations are based on GPUs: [2] proposes accelerating face detection by distributing the computation among 4 GPUs. This paper presents the design of an embedded face detection system based on the LEON3 processor. The detection system implements the Viola-Jones object detection algorithm.

II. VIOLA-JONES FACE DETECTION ALGORITHM

The Viola-Jones algorithm [3] processes images very quickly and achieves a high detection rate. Its processing speed is due to three key elements. First, the image is transformed into an integral image, which allows the sum of any rectangle to be computed in constant time, as shown in Fig. 1. Each element of the integral image accumulates the values of all the preceding pixels (above and to the left):

$$ii(x, y) = \sum_{x' \le x,\; y' \le y} i(x', y') \qquad (1)$$

Fig. 1. The sum of the pixels in rectangle D is computed on the integral image as ii4 + ii1 - (ii2 + ii3), where ii1 to ii4 are the integral-image values at the corners of D (the figure shows rectangles A, B, C and D).

Second, a simple and efficient classifier built with the AdaBoost learning algorithm [4] is used. This allows a small number of visual features to be selected from a very large set of potential features. Examples of the Haar-like features used by the classifier are shown in Fig. 2. They consist of rectangular areas whose processing requires only simple arithmetic operations: a threshold is applied to the sums and differences of the rectangular regions of the image (the dark regions are subtracted from the white regions).

Third, the classifier is built by combining simple classifiers in a cascade (Fig. 3). The Viola-Jones technique scans the image with a window in search of features, and this window is scaled in order to locate faces of different sizes. The first stages are simple, very fast, low-cost detectors. They discard the windows that contain no faces and pass on those that are candidates to contain faces. The following detectors increase in complexity in order to perform a more detailed analysis on a smaller set of regions of interest. Faces are detected at the end of the cascade.
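The integral image of Eq. (1) and the four-corner rectangle sum of Fig. 1 can be sketched as follows. This is a minimal illustration, assuming the image is a list of rows of non-negative integers indexed as img[y][x]; the function names are hypothetical and are not taken from OpenCV or from the authors' code. The integral image is padded with one zero row and column so that rectangles touching the image border need no special case.

```python
# Minimal sketch of the integral image (Eq. 1) and the constant-time rectangle sum (Fig. 1).
# Assumption: img is a list of rows of non-negative ints, indexed img[y][x].
# Function names are hypothetical, not OpenCV or the authors' implementation.

def integral_image(img):
    """Return ii padded with a zero row/column: ii[y][x] = sum of img[0..y-1][0..x-1]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]                  # running sum of the current row
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the w x h rectangle whose top-left pixel is (x, y), using the
    four corners of Fig. 1: ii4 + ii1 - (ii2 + ii3)."""
    ii1 = ii[y][x]            # corner above and to the left of the rectangle
    ii2 = ii[y][x + w]        # corner above and to the right
    ii3 = ii[y + h][x]        # corner below and to the left
    ii4 = ii[y + h][x + w]    # corner below and to the right
    return ii4 + ii1 - (ii2 + ii3)
```

For img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]], rect_sum(integral_image(img), 1, 1, 2, 2) returns 28 (5 + 6 + 8 + 9). Whatever the rectangle size, the sum costs four reads and three additions, which is what makes the Haar-like features cheap to evaluate.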
A. Acceleration of the Viola-Jones algorithm

The Viola-Jones face detection technique is implemented in the OpenCV (Open Source Computer Vision) library [5], which contains functions for real-time computer vision. Since the library includes a face detection application for video, its source files were chosen as the starting point for developing the video detection system for LEON3-based embedded systems.

Fig. 2. Examples of Haar-like features.

The Haar-like features in the OpenCV distribution have been trained for rectangular windows of 20x20 pixels; for other window dimensions the Haar-like features must be scaled. The face detection system consists of 22 cascaded detectors containing 2135 Haar-like features.
Fig. 3. Cascade detector architecture (stages D1, D2, D3, ..., DN).
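The early-rejection behavior of the cascade in Fig. 3 can be sketched as below. This is an illustrative model, assuming each stage is a list of weak classifiers plus a stage threshold and that each weak classifier is a function returning the weight it contributes for a given window; the type aliases and function name are hypothetical. The point it shows is that a window stops being processed at the first stage whose summed votes fall below the stage threshold, so most windows only pay for the cheap first stages.

```python
# Sketch of cascade evaluation with early rejection (Fig. 3).
# Assumptions: a stage is (weak_classifiers, stage_threshold); a weak classifier is a
# callable that returns its contribution for a window. Names are hypothetical.

from typing import Callable, List, Tuple

Window = Tuple[int, int, int]          # (x, y, side) of the search window
Weak = Callable[[Window], float]       # contribution of one feature to the stage sum
Stage = Tuple[List[Weak], float]       # weak classifiers and the stage threshold

def evaluate_cascade(stages: List[Stage], window: Window) -> Tuple[bool, int]:
    """Return (is_face, stages_evaluated). The window is rejected by the first stage
    whose summed votes are below its threshold; only windows that pass every stage
    are reported as faces."""
    for index, (weak_classifiers, threshold) in enumerate(stages, start=1):
        stage_sum = sum(weak(window) for weak in weak_classifiers)
        if stage_sum < threshold:
            return False, index        # early rejection: later stages are skipped
    return True, len(stages)
```

The second element of the returned tuple makes the cost visible: a window rejected by the first stage reports 1 evaluated stage, while a face reports all of them.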
III. HARDWARE-SOFTWARE CODESIGN

The Viola-Jones object detection algorithm requires a very large amount of computing resources and a high memory bandwidth, which is an obstacle to building real-time object detection systems. To accelerate the face detection process, a system based on hardware-software codesign techniques is proposed; Fig. 4 shows a diagram of this system. The parts of the detection algorithm that require flexibility are implemented in software, while the critical parts are implemented as a hardware accelerator.

The OpenCV face detection application is implemented in two different operating modes (Fig. 5):
Mode 1: scaling the image. In this mode the image is scaled by interpolation until a predefined minimum size is reached. At each scaling step, two integral images (normal, of x, and squared, of x²) are needed to compute the variance. The search window has a fixed size throughout the detection process.

Mode 2: scaling the classifiers. In this mode the integral images (normal, of x, and squared, of x²) needed to compute the variance normalization

$$\sigma^2 = \frac{\sum x^2}{W H} - \left(\frac{\sum x}{W H}\right)^2$$

are obtained only once, from the original image. Instead, the Haar-like features of the classifiers are progressively scaled until their dimensions are similar to those of the original image. The search window therefore has a variable size during the detection process.
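To see how the minimum window size S and the scale step used later in the evaluation drive the amount of work in Mode 2, the sequence of search-window sizes can be enumerated. This is a simplified sketch, not OpenCV's exact loop: it assumes square windows, starts at the minimum side, multiplies by the step each iteration, truncates to an integer side and stops as soon as the window no longer fits in the image. The function name is hypothetical.

```python
# Simplified sketch of the window-size sequence in Mode 2 (classifier scaling).
# Assumptions: square windows, integer sides obtained by truncation, and scaling
# stops when the window no longer fits in the image. Not OpenCV's exact loop.

def window_sizes(image_w: int, image_h: int, min_side: int, step: float) -> list:
    """Return the search-window sides evaluated, from min_side upward."""
    sizes = []
    scale = 1.0
    while True:
        side = int(min_side * scale)          # scaled window side in pixels
        if side > image_w or side > image_h:  # window no longer fits: stop
            break
        sizes.append(side)
        scale *= step                         # next, larger scale
    return sizes
```

Under these assumptions, a 640x480 VGA image with S = 30 and step = 1.2 yields 16 window sizes (30, 36, 43, ...), while step = 1.1 yields 30. Every extra scale adds a full sweep of search windows, which is why the smaller step is markedly more expensive.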
In both modes, the components of the Haar-like features (weights and dimensions) are scaled progressively with the dimensions of the search window. This means that, for a search window of size W×H, the weight of each rectangle of a Haar-like feature is scaled by the value W×H. Within the search window, each Haar-like feature is computed as:

$$\mathrm{HaarFeatureSum} = \sum_{I=1}^{3} \mathrm{Area}_I \cdot \mathrm{Weight}_I^{\mathrm{scaled}} \qquad (2)$$

where $\mathrm{Area}_I$ is the sum of all the pixels inside rectangle $I$, and $I = 1, 2, 3$ indexes the rectangles of the Haar-like feature. To determine which weight is added to the stage sum, each HaarFeatureSum is compared with the normalized threshold of the corresponding Haar-like feature:

$$\mathrm{StageSum} = \mathrm{StageSum} + \begin{cases} \mathrm{StageWeight1}_J, & \text{if } \mathrm{HaarFeatureSum}_J < \mathrm{Thres}_J^{\mathrm{norm}} \\ \mathrm{StageWeight2}_J, & \text{if } \mathrm{HaarFeatureSum}_J \ge \mathrm{Thres}_J^{\mathrm{norm}} \end{cases} \qquad (3)$$

where $J = 1, \ldots, 2135$ is the index of the Haar-like feature and $\mathrm{Thres}_J^{\mathrm{norm}} = \mathrm{Thres}_J^{\mathrm{HaarFeature}} \cdot \sigma$, with $\sigma$ the standard deviation of the search window.

Fig. 4. Hardware-software system for face detection.
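Equations (2) and (3) can be sketched in a few lines. This is an illustration under stated assumptions: the rectangle areas (pixel sums) are taken as already computed, for instance from the integral image; the HaarFeature record and its field names are hypothetical, loosely modeled on the two leaf values each OpenCV weak classifier carries; and the assignment of StageWeight1 to the "below threshold" case follows the reconstruction of Eq. (3) given here.

```python
# Sketch of Eq. (2) (weighted rectangle sums) and Eq. (3) (stage-sum update against
# the variance-normalized threshold). Assumptions: rectangle areas are precomputed;
# the record layout and names are hypothetical.

from dataclasses import dataclass
from typing import List

@dataclass
class HaarFeature:
    weights: List[float]      # Weight_I^scaled for each rectangle (2 or 3 of them)
    threshold: float          # Thres_J^HaarFeature, before normalization
    stage_weight1: float      # added when HaarFeatureSum_J < Thres_J^norm
    stage_weight2: float      # added otherwise

def haar_feature_sum(areas: List[float], feature: HaarFeature) -> float:
    """Eq. (2): sum over the rectangles of Area_I * Weight_I^scaled."""
    return sum(area * weight for area, weight in zip(areas, feature.weights))

def stage_sum(features: List[HaarFeature], areas_per_feature: List[List[float]],
              sigma: float) -> float:
    """Eq. (3): for every feature J, add StageWeight1 or StageWeight2 depending on
    whether HaarFeatureSum_J is below Thres_J^norm = Thres_J * sigma."""
    total = 0.0
    for feature, areas in zip(features, areas_per_feature):
        thres_norm = feature.threshold * sigma
        if haar_feature_sum(areas, feature) < thres_norm:
            total += feature.stage_weight1
        else:
            total += feature.stage_weight2
    return total
```

Because the threshold is multiplied by the window's standard deviation, the same trained feature behaves consistently on dark, low-contrast windows and on bright, high-contrast ones.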
If the weights of the Haar-like features are not scaled and the variance is computed with the expression

$$\sigma^2_{\mathrm{adjusted}} = W H \sum x^2 - \left(\sum x\right)^2$$

then both the number of arithmetic operations (divisions, multiplications) and the number of memory accesses are reduced, which makes the algorithm faster [6].
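The reason the adjusted variance works can be checked numerically. Since $W H \sum x^2 - (\sum x)^2 = (W H)^2 \sigma^2$, the adjusted standard deviation equals $W H \sigma$. Assuming the scaled weights are $\mathrm{Weight}/(W H)$ (normalization by the window area, which is an assumption about what "scaled by W×H" means here, modeled on OpenCV), multiplying both sides of the threshold test by $W H$ removes every division. The sketch below compares the two decisions; function names are hypothetical, and Python's integer square root stands in for a hardware integer square-root unit.

```python
# Sketch: the threshold test with scaled weights and the true sigma, versus unscaled
# weights and sigma_adjusted = sqrt(W*H*sum(x^2) - (sum x)^2), which needs no division.
# Assumptions: Weight_scaled = Weight / (W*H); integer pixels; names are hypothetical.

import math

def decide_scaled(area_sums, weights, thres, pixels):
    """Reference version: normalize by the window area in floating point."""
    n = len(pixels)                                    # n = W*H
    mean = sum(pixels) / n
    sigma = math.sqrt(sum(p * p for p in pixels) / n - mean * mean)
    feature = sum(a * w / n for a, w in zip(area_sums, weights))
    return feature < thres * sigma

def decide_adjusted(area_sums, weights, thres, pixels):
    """Division-free version: sigma_adjusted = W*H*sigma, weights left unscaled.
    math.isqrt floors the root, so the two versions can only disagree when the
    comparison falls within one unit of sigma_adjusted."""
    n = len(pixels)
    s, sq = sum(pixels), sum(p * p for p in pixels)
    sigma_adj = math.isqrt(n * sq - s * s)             # integer square root
    feature = sum(a * w for a, w in zip(area_sums, weights))
    return feature < thres * sigma_adj
```

For the 4-pixel window [0, 2, 0, 2], σ = 1 and σ_adjusted = 4 = W·H·σ, so both functions reach the same decision for any threshold that is not on the rounding boundary.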
B. IP module for hardware acceleration

An analysis of the face detection application showed that the software bottleneck lies in the large number of memory read accesses and in the multiplication and square-root operations needed to evaluate the search windows. Detecting a face in an image requires evaluating hundreds of thousands of search windows, which accounts for most of the application's execution time. It was therefore decided to accelerate the evaluation of the search windows with a hardware IP module [7].

The IP module has a shared memory based on a dual-port RAM with an AHB interface. This memory stores the Haar-like features; LEON3 can use it as additional memory in the free operating mode.

To keep a high degree of flexibility and to share hardware resources with the rest of the LEON3-based system, the IMSE_OBJECT_DETECTION IP module was given two operating modes [9]: the free mode, in which the IP resources (multiplier, memory, etc.) can be used by LEON3 to implement other functions, and the face detection mode. The IP module runs on the system clock (80 MHz).

The proposed face detection system works with images (color or grayscale) with a resolution below 1024x1024 pixels. It uses OpenCV's complete cascade of frontal-face classifiers and can store approximately 2730 Haar-like classifiers in its internal memory. It could work with more classifiers, but these would have to be stored in program memory and loaded into the internal memory when required.

As mentioned above, the IP module implements the search-window algorithm. The software application stores the Haar-like features in the module's memory and sets the configuration registers (scale, x-y coordinates, image size, etc.). When the software application issues the start command, the face detection procedure is controlled by the Imse_stage_evaluator_unit component (Fig. 6), which is the engine of the face detection acceleration system. When detection finishes, the component reports whether faces were found by updating the status register with the result, and an interrupt is generated.

This unit contains several control units in order to support the different access latencies of the system memory (external to the IP module). Besides the state machines that make up the control units, the component also contains specialized modules: a module that computes the area of rectangles in the integral image using only the corner data (as explained in Fig. 1), a pipelined module for scaling the Haar-like features and computing their addresses, a circuit that computes the integer square root of 64-bit numbers (with a latency of 16 clock cycles), and a 41x33-bit signed multiplier.

Fig. 5. Proposed algorithm for accelerating face detection.

Fig. 6. Block diagram of the IMSE_OBJECT_DETECTION IP module.
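The paper gives only the width (64 bits) and latency (16 clock cycles) of the integer square-root circuit, not its structure. One common way to build such a unit is the restoring digit-by-digit method, which produces one result bit per step; retiring two result bits per clock cycle would give 32 / 2 = 16 cycles for a 32-bit root. The sketch below models that idea in software. It is an assumption for illustration, not the authors' circuit, and the function name is hypothetical.

```python
# Sketch of a restoring (digit-by-digit) integer square root for 64-bit inputs.
# Assumption: two result bits are retired per clock cycle, which is one way to reach
# the 16-cycle latency quoted in the paper; the actual circuit is not described.

def isqrt64(n: int) -> tuple:
    """Return (floor(sqrt(n)), cycles) for 0 <= n < 2**64."""
    if not 0 <= n < 1 << 64:
        raise ValueError("input must fit in 64 unsigned bits")
    root, remainder, cycles = 0, 0, 0
    for pair in range(31, -1, -1):             # 32 bit pairs, most significant first
        # Bring the next two bits of n down into the partial remainder.
        remainder = (remainder << 2) | ((n >> (2 * pair)) & 0b11)
        trial = (root << 2) | 1                # candidate value 4*root + 1
        root <<= 1
        if remainder >= trial:                 # next result bit is 1
            remainder -= trial
            root |= 1
        if pair % 2 == 0:                      # two result bits per clock cycle
            cycles += 1
    return root, cycles
```

Each step needs only a shift, a compare and a subtract, which suits hardware; the 32 steps for a 64-bit input fold into 16 cycles if two of them are chained per clock.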
IV. RESULTS

The system was implemented on a Xilinx XC5VLX50 FPGA. The complete LEON3-based detection system occupies 6,435 slices (89% of the device) and contains 10,962 flip-flops (38% of the device). The estimated power consumption (obtained with Xilinx XPower Analyzer) is 603 mW. The largest consumers are the DDR2 memory controller (216 mW), the DVI interface (136 mW) and the clock generators. The LEON3 processor consumes 32.39 mW, whereas the IMSE_OBJECT_DETECTION IP module, which requires more flip-flops and has approximately the same amount of logic as LEON3, consumes 6 times less (5.39 mW) than the processor.
To measure the performance of the proposed embedded face detection system, three implementations were compared:
1) The OpenCV software ported to the embedded system.
2) An accelerated software version of the ported software.
3) The accelerated hardware-software version based on the IP module.

The performance metrics were the execution time and the number of search windows applied. For the first two (software) implementations, both detection modes (Mode 1 and Mode 2) were considered; the hardware-software implementation operates only in Mode 2. In each mode, four parameter setups (setup 1 to 4) were considered for the minimum search-window size (S) and the scale step (step): 1) S=30x30, step=1.2; 2) S=30x30, step=1.1; 3) S=20x20, step=1.2; 4) S=20x20, step=1.1.

Fig. 7 shows the results obtained. The accelerated software application is 3-4 times faster than the ported OpenCV application in both modes, and the hardware acceleration IP module achieves speeds 7-9 times higher.

Fig. 7. Detection times for VGA images: a) image scaling mode; b) Haar-like feature scaling mode.

V. CONCLUSIONS

The design and implementation of an FPGA face detection system based on a hardware-software strategy have been described. The proposed acceleration mechanism and the design of an IP module for the LEON3 processor that performs the hardware acceleration were presented. Test results show good response times, which makes the system suitable for real-time biometric applications.

ACKNOWLEDGMENTS

This work was partially supported by the European Union project MOBY-DIC (FP7-IST-248858), by the Ministerio de Ciencia y Tecnología under projects TEC2008-04920 and TEC2011-24319 with FEDER co-funding, and by the Junta de Andalucía under project P08-TIC-03674.

REFERENCES

[1] J. Cho, S. Mirzaei, J. Oberg, and R. Kastner, "FPGA-based face detection system using Haar classifiers," in FPGA '09: Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, New York, NY, USA: ACM, 2009, pp. 103-112.
[2] D. Hefenbrock, J. Oberg, N. T. N. Thanh, R. Kastner, and S. B. Baden, "Accelerating Viola-Jones face detection to FPGA-level using GPUs," in Proc. IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2010.
[3] P. Viola and M. J. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137-154, May 2004.
[4] R. E. Schapire, Y. Freund, P. Bartlett, and W. S. Lee, "Boosting the margin: A new explanation for the effectiveness of voting methods," The Annals of Statistics, pp. 1651-1686, 1998.
[5] OpenCV: http://sourceforge.net/projects/opencvlibrary/
[6] L. Acasandrei and A. Barriga, "Accelerating Viola-Jones face detection for embedded and SoC environments," in Fifth ACM/IEEE International Conference on Distributed Smart Cameras (ICDSC 2011), Ghent, Belgium, Aug. 2011.
[7] L. Acasandrei, "Embedded face detection system implemented on LEON3 microprocessor," Master Thesis, Univ. Seville, 2011.
Embedded Speech Recognition System on FPGA
Jesús Balosa, Francisco J. Crespo, Angel Barriga
IMSE-CNM-CSIC/Univ. Sevilla
Sevilla, Spain
barriga@imse-cnm.csic.es

Abstract: This paper presents an embedded FPGA speech recognition system that applies the LPC (Linear Predictive Coding) algorithm. The system is based on the Xilinx MicroBlaze processor. The development of the system is described, from the implementation of the audio codec controller (both the hardware and the drivers) to the adaptation of the LPC algorithm to the requirements of the hardware architecture.

I. INTRODUCTION

Automatic speech recognition (ASR) systems have advanced remarkably in recent decades. However, most ASR applications are software implementations running on computers, and developing applications for embedded systems requires adapting ASR systems. The goal of this paper is to present the set-up of a development environment for embedded ASR applications. This involves specifying the development platform and adapting ASR techniques to that platform. This paper therefore presents, on the one hand, the adaptation of a reconfigurable FPGA-based hardware platform on which different embedded system architectures can be configured and, on the other hand, the development of a speech recognition algorithm on that hardware platform.

A Xilinx ML505 FPGA board was selected as the development platform; among other components, it provides audio inputs and outputs, an audio codec, communication devices and memories (SRAM, DDR2, CompactFlash). An audio controller was developed as a peripheral of the MicroBlaze processor.

Developing the speech recognition algorithm required adapting it to the requirements of the embedded system (limited memory resources, a processor without a memory management unit, etc.). The result is a development platform for ASR applications that facilitates both the exploration of new hardware architectures and the development of other embedded speech recognition algorithms.

II. SPEECH RECOGNITION

One of the most widely used techniques for coding the speech model is LPC (Linear Predictive Coding). This algorithm can represent a speech signal of 160 samples with only 13 values, which makes it suitable for speech compression, digital transmission (voice over IP, PCS, GSM) and, in our case, recognition.

The elements that best characterize the human voice are the formant frequencies of the vocal tract and the properties of the excitation signal generated by the vocal cords. LPC represents speech with a small number of parameters instead of storing the waveform [1]. This is possible because a signal can be predicted with the following expression:

$$y(n) = \sum_{i=1}^{N} a_i\, x(n-i) \qquad (1)$$

where $y(n)$ is the predicted sample at time $n$ and $a_i$ are the LPC coefficients. The difference between the predicted sample and the signal is called the prediction error:

$$e(n) = x(n) - y(n) = x(n) - \sum_{i=1}^{N} a_i\, x(n-i) \qquad (2)$$

The goal of the LPC algorithm is to minimize this error. To obtain the LPC coefficients that minimize it, the sum of the squared prediction errors of Eq. (2) is differentiated with respect to each coefficient and set to zero. This yields a system of N linear equations [2]:

$$\mathbf{R}\,\mathbf{a} = \mathbf{r} \qquad (3)$$

where $\mathbf{a} = [a_1\; a_2\; \cdots\; a_N]^{T}$,

$$\mathbf{R} = \begin{bmatrix} R(0) & R(1) & \cdots & R(N-1) \\ R(1) & R(0) & \cdots & R(N-2) \\ \vdots & \vdots & \ddots & \vdots \\ R(N-1) & R(N-2) & \cdots & R(0) \end{bmatrix}, \qquad \mathbf{r} = \begin{bmatrix} R(1) \\ R(2) \\ \vdots \\ R(N) \end{bmatrix}$$

and $R(k)$ is the autocorrelation of the signal at lag $k$.
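The LPC analysis of Eqs. (1) to (3) can be sketched for a single frame: compute the autocorrelation values $R(0) \ldots R(N)$, solve the Toeplitz system $\mathbf{R}\mathbf{a} = \mathbf{r}$ with the Levinson-Durbin recursion [2], and form the prediction error. This is a floating-point illustration assuming one frame of samples with no windowing or pre-emphasis; the function names are hypothetical, the parameter `order` plays the role of N, and an embedded implementation on MicroBlaze might well use fixed-point arithmetic instead.

```python
# Sketch of LPC analysis for one frame: autocorrelation, Levinson-Durbin recursion for
# the Toeplitz system R a = r (Eq. 3), and the prediction error of Eq. (2).
# Assumptions: float samples, no windowing or pre-emphasis, hypothetical names.

from typing import List, Tuple

def autocorrelation(x: List[float], max_lag: int) -> List[float]:
    """R(k) = sum_n x(n) x(n-k) for k = 0..max_lag (autocorrelation method)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]

def levinson_durbin(R: List[float], order: int) -> Tuple[List[float], float]:
    """Solve R a = r for a_1..a_order; return (coefficients, final prediction error)."""
    a = [0.0] * (order + 1)        # a[0] unused; a[i] multiplies x(n - i) in Eq. (1)
    error = R[0]
    for i in range(1, order + 1):
        # Reflection coefficient k_i from the current predictor of order i - 1.
        acc = R[i] - sum(a[j] * R[i - j] for j in range(1, i))
        k = acc / error
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= (1.0 - k * k)     # prediction error energy shrinks at each order
    return a[1:], error

def prediction_error(x: List[float], coeffs: List[float]) -> List[float]:
    """e(n) = x(n) - sum_i a_i x(n - i), Eq. (2), for n >= len(coeffs)."""
    p = len(coeffs)
    return [x[n] - sum(coeffs[i - 1] * x[n - i] for i in range(1, p + 1))
            for n in range(p, len(x))]
```

For example, levinson_durbin([1.0, 0.5, 0.25], 2) returns ([0.5, 0.0], 0.75): the autocorrelation of a first-order process needs only one nonzero coefficient. The recursion costs on the order of N² operations instead of the N³ of general Gaussian elimination, which matters on a small embedded processor.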
alineacin de las lecturas de datos en serie, que son

desplazados, lo que provoca que el controlador se encuentre sin
el estado vlido en el registro de entrada o en los datos de
sonido.
R (1)
R (2)
R( N )
The controller receives data from, and transmits data to, the AC97 codec using two unidirectional serial data streams, SData_In and SData_Out. The data are organized into simultaneous frames, each divided into 12 data slots. The codec also generates a Bit_Clk clock and sends it to the controller, while the controller sends a Sync signal to mark the start of each frame.
The Levinson-Durbin algorithm is an iterative method for obtaining the LPC coefficients [2].
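The recursion can be sketched as follows. This sketch assumes the sign convention of eq. (1), with positive a_i, and assumes that the autocorrelation values R(0)..R(p) have already been computed. It is the plain textbook form, not the authors' embedded code. `toeplitz_residual` is a hypothetical helper that checks the result against eq. (3).

```python
def levinson_durbin(R, p):
    """Solve the Toeplitz system R a = r of eq. (3) for a_1..a_p.

    R[0..p] are autocorrelation values. Returns (a, err), where a[i-1] = a_i in
    the convention y(n) = sum a_i x(n - i) and err is the final error energy.
    """
    a = []
    err = R[0]
    for i in range(1, p + 1):
        # Reflection coefficient k_i from the current order-(i-1) solution.
        acc = R[i] - sum(a[j - 1] * R[i - j] for j in range(1, i))
        k = acc / err
        # Order update: a_j <- a_j - k * a_{i-j} for j < i, then a_i = k.
        a = [a[j - 1] - k * a[i - j - 1] for j in range(1, i)] + [k]
        err *= 1.0 - k * k
    return a, err


def toeplitz_residual(R, a):
    """Largest |(R a - r)_i| over i = 1..p, to verify the solution of eq. (3)."""
    p = len(a)
    return max(abs(sum(a[j - 1] * R[abs(i - j)] for j in range(1, p + 1)) - R[i])
               for i in range(1, p + 1))


a, err = levinson_durbin([1.0, 0.5, 0.25], 2)
print(a, err)  # [0.5, 0.0] 0.75
```

The recursion needs only O(p²) operations and O(p) storage, which suits a processor with little memory better than a general linear solver.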
III. DEVELOPMENT PLATFORM
A. Audio codec
The ML505 development board carries a Xilinx Virtex-5 FPGA. Its audio inputs and outputs are handled by the Analog Devices AD1981B AC97 codec [3]. Notable features of this codec include:
The input data stream is read by sequentially shifting in the SData_In stream. The data are written into the appropriate register at the end of each slot, using the Slot_End signal generated in the controller. This signal is asserted one clock cycle before the end of the current slot, because the output stream must be loaded before the next slot begins. The misalignment of the serial data reads was solved by adding a one-clock-cycle delay before reading the data shift register.
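The misalignment and its fix can be modeled in software, as in the hedged sketch below. It assumes the standard AC'97 frame layout: a 16-bit tag slot followed by twelve 20-bit data slots, for 256 Bit_Clk cycles per frame, MSB first. It models the fault as the data arriving one Bit_Clk later than the slot counter expects. The function names are hypothetical, and this is not the Xilinx IP code.

```python
# Assumed AC'97 frame layout: 16-bit tag slot + 12 data slots of 20 bits each.
SLOT_WIDTHS = [16] + [20] * 12
FRAME_BITS = sum(SLOT_WIDTHS)  # 256 Bit_Clk cycles between Sync pulses


def serialize_frame(values):
    """Turn 13 slot values (tag + 12 data slots) into a bit list, MSB first."""
    bits = []
    for value, width in zip(values, SLOT_WIDTHS):
        bits.extend((value >> (width - 1 - b)) & 1 for b in range(width))
    return bits


def deserialize_frame(stream, delay=0):
    """Shift SData_In bits into a register and latch it at each slot end.

    `delay` models the extra Bit_Clk cycle waited before reading the register.
    """
    frame = stream[delay:delay + FRAME_BITS]
    slots, pos = [], 0
    for width in SLOT_WIDTHS:
        shift_reg = 0
        for bit in frame[pos:pos + width]:
            shift_reg = (shift_reg << 1) | bit
        slots.append(shift_reg)  # value latched when Slot_End fires
        pos += width
    return slots


sent = [0xABCD] + [i * 0x1111 for i in range(12)]
stream = [0] + serialize_frame(sent)              # data arrive one cycle late
print(deserialize_frame(stream) == sent)           # False: every slot is shifted
print(deserialize_frame(stream, delay=1) == sent)  # True
```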
- 20-bit S/PDIF output for data formats with 48 kHz and 44.1 kHz sample rates.
- Variable sample rate for audio output and input.
- Full-duplex stereo codec.
- Variable sample rate from 7040 Hz to 48 kHz with 1 Hz resolution.
- Stereo microphone input with preamplifier.
The analog-to-digital conversion module receives the audio input data and converts the analog signal into a digital one. This module consists of two 16-bit Σ-Δ converters.
[Figure 1 block diagram labels: FPGA, MicroBlaze, PLB, plbv46_opb bridge, OPB, AC97 controller, AC97 codec; signals Bit_Clk, Sync, SData_Out, SData_In, AC97Reset_n.]
Figure 1. Architecture of the sound capture system.
The audio inputs and outputs are connected to four jack ports: two inputs (microphone and line in) and two audio outputs (headphones and line out).
C. Driver development
The drivers are software routines that configure and control the AC97 controller and ease the development of user applications. Specifically, the drivers contain the declarations of the memory addresses of the AC97 controller's internal registers, the macros for handling those registers, and the most significant constant values. Table 1 describes the main functions.
B. AC97 controller IP module
The Xilinx AC97 controller IP module used is version 1.00a. This controller was designed to configure the AC97 codec, record (capture) sound over the OPB bus and play it back over the FSL bus. Because the OPB bus is obsolete in current MicroBlaze versions, the controller was adapted through a PLB-OPB bridge. Errors in the component's description were also corrected, since they prevented both the sound recording function and the reading of the AC97 status register from working.
These functions provide the means to initialize and interact with the AC97 codec. The codec is initialized with the XAC97_SoftReset and XAC97_HardReset functions, which set the codec registers to values suitable for sound capture. WriteAC97Reg and ReadAC97Reg are the main functions for recording and playing back sound.
The problems with recording and with reading the AC97 status register are related: both are caused by an error in the controller's hardware description file. This error is due to a misalignment
The codec contains analog-to-digital (ADC) and digital-to-analog (DAC) conversion modules, both based on Σ-Δ converters. The digital-to-analog conversion module generates the audio output, that is, it plays sounds back. It consists of four Σ-Δ converters: two 16-bit and two 20-bit.
Along with these functions, a set of constants and macros eases the use and configuration of the codec.
In addition to the low-level functions shown in Table 1, high-level functions were developed that the user can call to handle the codec from the software application. These functions are:
- init_sound: initializes the device. It applies a hardware reset, clears the FIFO and performs a software reset. Initialization requires executing a sequence of operations in a specific order. After initialization, the codec reports to the controller that it is ready. The ADC and DAC converters are then enabled and the sampling frequency is set. Finally, the volume and microphone controls are configured.
- rec_sound: captures the data from the audio input and stores it in memory.
- play_sound: enables the sound controls for playback on the corresponding output and sends the playback data stored in memory to the codec.
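The ordering that init_sound must respect can be sketched with a mock device, as below. `FakeAC97`, the register names and the written values are hypothetical placeholders, not the driver API or the AD1981B register map. Only the sequence and the 7040 Hz to 48 kHz rate range come from the text.

```python
class FakeAC97:
    """Hypothetical stand-in for the AC97 controller and codec; it only logs calls."""

    def __init__(self, ready_after=3):
        self.log = []
        self._polls_left = ready_after

    def hard_reset(self):
        self.log.append("hard_reset")

    def clear_fifos(self):
        self.log.append("clear_fifos")

    def soft_reset(self):
        self.log.append("soft_reset")

    def codec_ready(self):
        # Pretend the codec reports "ready" after a few status polls.
        self._polls_left -= 1
        return self._polls_left <= 0

    def write_reg(self, name, value):
        self.log.append((name, value))


def init_sound(dev, sample_rate_hz=8000, max_polls=100):
    """Ordered initialization as described for init_sound (names are hypothetical)."""
    if not 7040 <= sample_rate_hz <= 48000:  # codec's variable-rate range
        raise ValueError("sample rate outside 7040 Hz to 48 kHz")
    dev.hard_reset()
    dev.clear_fifos()
    dev.soft_reset()
    for _ in range(max_polls):  # wait until the codec reports it is operational
        if dev.codec_ready():
            break
    else:
        raise TimeoutError("AC97 codec did not report ready")
    dev.write_reg("POWER_CTRL", 0)                 # enable ADC and DAC (placeholder)
    dev.write_reg("PCM_ADC_RATE", sample_rate_hz)  # sampling frequency
    dev.write_reg("MASTER_VOLUME", 0)              # volume control (placeholder)
    dev.write_reg("MIC_VOLUME", 0)                 # microphone control (placeholder)
```

Bounding the ready poll with a timeout keeps a missing codec from hanging the application.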
Figure 2. Identification process flow (audio signal, digitization, pre-emphasis, Hamming window, autocorrelation, LPC coefficient computation, decision).
TABLE 1. AC97 CODEC CONTROLLER DRIVERS
Function            Description
WriteAC97Reg        Writes a value to the specified address of the AC97 codec
ReadAC97Reg         Reads a value from the specified address of the AC97 codec
XAC97_ClearFifos    Clears the FIFO memory of the AC97 controller
XAC97_SoftReset     Performs a software reset of the AC97 codec
XAC97_HardReset     Performs a hardware reset of the AC97 codec

The pre-emphasis stage applies a filter that makes the high frequencies of the speech signal more significant. The pre-emphasis filter follows the expression:
$$s(n) = v(n) - a\,s(n-1) \qquad (4)$$
where v(n) is the input speech signal and s(n) is the filtered signal.
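Eq. (4) is implemented exactly as printed in the minimal sketch below, with s(-1) = 0. The paper does not give a value for a; a = 0.95 is an assumed default. For comparison, the FIR form found in most speech-processing texts, which feeds back the previous input sample v(n-1) instead of the previous output, is also included. The text does not state which form the authors' code uses.

```python
def pre_emphasis(v, a=0.95):
    """Eq. (4) as printed: s(n) = v(n) - a * s(n - 1), with s(-1) = 0."""
    s, prev = [], 0.0
    for sample in v:
        prev = sample - a * prev
        s.append(prev)
    return s


def pre_emphasis_fir(v, a=0.95):
    """Common FIR variant: s(n) = v(n) - a * v(n - 1), with v(-1) = 0."""
    return [v[n] - a * (v[n - 1] if n > 0 else 0.0) for n in range(len(v))]


# A signal alternating at the Nyquist rate is amplified (gain 1 + a), DC is attenuated.
print(pre_emphasis_fir([1.0, -1.0, 1.0, -1.0], a=0.5))  # [1.0, -1.5, 1.5, -1.5]
```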
The next stage segments the signal into 20 ms Hamming windows; within that interval the signal is considered quasi-stationary. The signal is then multiplied by the Hamming window to smooth the edges of each window:
$$W(n) = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \qquad 0 \le n \le N-1 \qquad (5)$$
with W(n) = 0 otherwise.
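The framing and windowing step can be sketched as follows. The sketch assumes non-overlapping frames and a caller-chosen sample rate; the text specifies neither the overlap nor the rate, only that the codec supports 7040 Hz to 48 kHz. At 8 kHz, for example, a 20 ms frame holds N = 160 samples.

```python
import math


def hamming(N):
    """Eq. (5): W(n) = 0.54 - 0.46 cos(2*pi*n / (N - 1)), 0 <= n <= N - 1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def windowed_frames(signal, fs_hz, frame_ms=20):
    """Split the signal into consecutive, non-overlapping frames and window each one."""
    N = int(fs_hz * frame_ms / 1000)
    w = hamming(N)
    return [[x * wn for x, wn in zip(signal[start:start + N], w)]
            for start in range(0, len(signal) - N + 1, N)]
```

The window tapers both ends of each frame to 0.08, which reduces the spectral leakage caused by cutting the signal abruptly.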

Next, the autocorrelation of the signal is computed:
$$r_l(m) = \sum_{n=0}^{N-1-m} x_l(n)\,x_l(n+m), \qquad m = 0, 1, \ldots, p \qquad (6)$$
where x_l(n) is the l-th windowed frame and p is the order of the LPC analysis [4]. Typical values of p range from 8 to 16; the work described here uses p = 13.
The Levinson-Durbin algorithm [2] recursively solves an equation that involves a Toeplitz matrix (equation 3).
IV. LPC ALGORITHM IMPLEMENTATION
Many software implementations of the LPC algorithm exist, both in specific applications and as part of audio signal-processing libraries. These implementations are unsuitable for an embedded system for several reasons related to how the algorithm is implemented. For example, they typically use dynamic memory for the input audio signal, for the intermediate vectors and for the database against which the speaker to be identified is compared. Another aspect to consider is the limited memory size of the embedded system.
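In keeping with the memory constraints described above, eq. (6) can be computed into a buffer that is allocated once and reused for every frame, as in the sketch below. Python cannot express static allocation as such; the optional `out` parameter only mimics a preallocated C array of p + 1 values. The names are illustrative.

```python
P = 13  # LPC order used in this work


def autocorrelation(frame, p=P, out=None):
    """Eq. (6): r(m) = sum_{n=0}^{N-1-m} x(n) x(n + m), for m = 0..p.

    Passing a preallocated list of length p + 1 as `out` avoids creating new
    storage for each frame; the same list is filled in and returned.
    """
    N = len(frame)
    if out is None:
        out = [0.0] * (p + 1)
    for m in range(p + 1):
        acc = 0.0
        for n in range(N - m):  # n = 0 .. N-1-m
            acc += frame[n] * frame[n + m]
        out[m] = acc
    return out
```

The resulting values R(0)..R(p) are exactly the inputs the Levinson-Durbin recursion needs to solve eq. (3).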
The audio signal processed by the algorithm, both for training and for recognition, is acquired through a microphone connected to the FPGA-based development board. The AC97 codec converts it into a digital signal, which is captured by the AC97 controller and stored in memory. The data are then processed according to the scheme shown in Figure 2.
V. SYSTEM OPERATION
The identification system has two operating phases: training and identification.
The training phase builds a database of the LPC coefficients of the individuals that the system must identify. Training requires a set of iterations in which the coefficients are adjusted by taking the arithmetic mean with the associated coefficients. Figure 3 shows an example of how identification accuracy increases with the number of training iterations.
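Both operating phases can be sketched together under stated assumptions. Training keeps a running arithmetic mean of each speaker's LPC vectors. Recognition compares the incoming vector with every stored template using the Euclidean distance, with the thresholds Ta = 0.02 and Ts = 0.03 given in this section. Choosing the closest template and the names used here are assumptions for illustration, not the authors' code.

```python
import math

T_MATCH = 0.02    # Ta: a distance below this counts as a match
T_SIMILAR = 0.03  # upper bound of the partial-match (similarity) band


def update_template(template, count, new_coeffs):
    """Running arithmetic mean of LPC vectors after `count` earlier iterations."""
    if count == 0:
        return list(new_coeffs), 1
    k = count + 1
    return [t + (c - t) / k for t, c in zip(template, new_coeffs)], k


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def identify(coeffs, database):
    """Return (label, 'match' or 'similar') for the closest template, else (None, 'unknown')."""
    best_label, best_dist = None, math.inf
    for label, template in database.items():
        d = euclidean(coeffs, template)
        if d < best_dist:
            best_label, best_dist = label, d
    if best_dist < T_MATCH:
        return best_label, "match"
    if best_dist < T_SIMILAR:
        return best_label, "similar"
    return None, "unknown"
```

The incremental mean avoids storing every training vector, which matters under the memory limits discussed in Section IV.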
Figure 3. Recognition rate as a function of the number of training iterations.
The recognition phase determines how similar the speaker's LPC coefficients are to those of the trained individuals stored in the database. The Euclidean distance is used, and two reference thresholds decide whether the individual has been identified in the database: one for a match (Ta) and one for similarity (Ts). The selected thresholds are:
- Ta, match threshold: Ta < 0.02
- Ts, partial match (similarity): 0.02 < Ts < 0.03
- otherwise, no registered individual matching the speaker to be identified has been found.
VI. CONCLUSIONS
A speech identification system based on the Xilinx MicroBlaze processor has been described. The system has been implemented on the ML505 development board, which has a Virtex-5 FPGA and audio inputs and outputs controlled by the Analog Devices AD1981B AC97 codec. Bringing up the system required adapting an audio codec controller and developing the drivers that control audio capture from the software application. The software application that implements the LPC algorithm was adapted to the processor architecture. The result is an embedded system for developing biometric speech recognition applications.
ACKNOWLEDGMENTS
This work has been partially supported by the European Union-funded MOBY-DIC project (FP7-IST-248858), by the Ministerio de Ciencia y Tecnología under projects TEC2008-04920 and TEC2011-24319 with FEDER co-funding, and by the Junta de Andalucía under project P08-TIC-03674.
REFERENCES
[1] P. Sinha, Speech Processing in Embedded Systems. Springer, 2010.
[2] L. Rabiner and B. H. Juang, Fundamentals of Speech Recognition. Englewood Cliffs, NJ: Prentice-Hall, 1993.
[3] Analog Devices Inc., AD1981B: AC97 SoundMAX Codec Datasheet, Rev. C, 2005.
[4] M. G. Mehta, Speech Recognition System, Master's thesis, Texas Tech University, 1996.
Bifurcation Diagrams in MOS-NDR Frequency Divider Circuits
Juan Núñez, María J. Avedillo, and José M. Quintana
Instituto de Microelectrónica de Sevilla-Centro Nacional de Microelectrónica
Consejo Superior de Investigaciones Científicas (IMSE-CNM-CSIC), Univ. de Sevilla (US)
{jnunez, avedillo, josem}@imse-cnm.csic.es
Abstract: This paper studies the behavior of a circuit able to implement frequency division. The circuit is composed of a block built from MOS transistors whose I-V characteristic exhibits Negative Differential Resistance (NDR), plus an inductor and a capacitor. Frequency division is obtained from the period-adding sequences that appear in its bifurcation diagram. The analyzed circuit is an all-MOS version of a previously reported one that uses Resonant Tunneling Diodes (RTDs). The results show that the dividing ratio can be selected by modulating the input signal frequency, as in the RTD-based circuit.
I. INTRODUCTION
Very simple non-autonomous frequency divider circuits have been reported in the past few years [1], [2], [3]. They are based on the period-adding bifurcation sequences that appear in an RTD chaos circuit, and experimental results have shown that their phase noise is comparable to that of conventional dividers. These circuits exploit the NDR region of the RTD I-V characteristic to obtain autonomous nonlinear oscillators and, in general, extremely complex behaviors with applications in diverse fields. When an external periodic excitation signal is used, such circuits exhibit an even greater variety of bifurcation sequences.
The basic block of these circuits is the RTD, which consists of an emitter region, a collector region and a double tunnel barrier structure. This structure contains a narrow low-bandgap quantum well that lets electrons travel through only at the resonant energy level. The thickness of the barrier layers and the width of the well are in the nanometer range. The RTD I-V characteristic presents a typical N-shape with two positive differential resistance (PDR) zones and one NDR zone. Most of the reported working circuits have been fabricated in III/V materials. These require a technological process that is usually expensive and difficult to make compatible with MOS processes. In this paper we show an NDR device made from MOS transistors that can be used to design a frequency divider circuit whose operation is based on the same principle as the RTD-based one.
II. THE MOS-NDR STRUCTURE
Figure 1a shows the structure of the MOS-NDR device we have used, which is based on the circuits described in [4]. It consists of one CMOS inverter, made up of NMOS1 and PMOS1 and biased by VINV, and one NMOS transistor (NMOS2). The gate-to-source voltage of NMOS2 is modulated by the output voltage of the CMOS inverter. Figure 1b shows the simulated current-voltage characteristic of the MOS-NDR device for transistor widths W_PMOS1 = 0.16 µm, W_NMOS1 = 2 µm and W_NMOS2 = 5.4 µm; the channel length of all of them is 0.12 µm. In this case, the peak current (Ip) is 2.11 mA at a peak voltage (Vp) of 0.25 V. Both the PDR and the NDR zones of the I-V characteristic are produced by the current of the NMOS2 transistor.
Figure 1: (a) MOS-NDR device (NMOS1, PMOS1, NMOS2, bias VINV), (b) simulated I-V characteristic (iNDR in mA versus vNDR in V, with peak Ip at Vp), (c) measured I-V characteristics.
This work has been partially supported by the Spanish Government under projects TEC2007-67245/MIC and P07-TIC-02961.
The peak voltage and current of the I-V characteristic in Figure 1b can be modified by sizing the transistors appropriately. Ip increases with the width of NMOS2. Assuming that all transistors have the same gate length, the position of Vp is controlled by the ratio between the widths of NMOS1 and PMOS1: higher values of Vp are obtained by decreasing the ratio W_NMOS1/W_PMOS1. Higher peak currents can also be obtained by increasing VINV [5]. Compared with the RTD, the I-V characteristic of the MOS-NDR device lacks the second PDR zone and presents a typical Λ-shape. Figure 1c depicts the I-V characteristic, measured with an HP-4145A parameter analyzer, of a MOS-NDR device that we designed and fabricated in a standard commercial 0.13 µm CMOS process.
III. ANALYSIS OF THE FREQUENCY DIVIDER
The circuit topology used for the frequency divider is shown in Figure 2. It is composed of an inductor L in series with the parallel combination of the MOS-NDR device (the Λ-device) and a capacitor C. It is driven by an external periodic excitation signal E with a DC bias E_DC, an amplitude E_A and a frequency f:
$$E = E_{DC} + E_A \sin(2\pi f t)$$
By applying Kirchhoff's laws to this circuit, the voltage v_C across the capacitor C and the current i_L through the inductor L are described by the following set of two first-order, coupled, non-autonomous differential equations:
$$\frac{dv_C}{dt} = \frac{1}{C}\,i_C = \frac{1}{C}\left[i_L - G(v_C)\right], \qquad \frac{di_L}{dt} = \frac{1}{L}\,v_L = \frac{1}{L}\left[E - v_C\right] \qquad (1)$$
where G is the mathematical representation of the driving-point characteristic of the MOS-NDR device.
The system is non-autonomous because of the explicit dependence on time t in the expression for the excitation signal E, and it is periodic with period T = 1/f. We choose E_M and I_M as scale parameters with physical dimensions of voltage and current, respectively, and rescale
$$v_C = x E_M, \quad i_L = y I_M, \quad t = \tau\sqrt{LC}, \quad E_{DC} = \beta E_M, \quad E_A = \gamma E_M, \quad f = \frac{\omega}{\sqrt{LC}}$$
Then the variables x, y, τ and ω are dimensionless. Redefining τ as t gives the following set of normalized equations:
$$\dot{x} = \frac{1}{\varepsilon}\left[y - g(x)\right], \qquad \dot{y} = \varepsilon\left[\beta + \gamma \sin(2\pi\omega t) - x\right] \qquad (2)$$
where ε = (E_M/I_M)√(C/L) and g comes from the normalization of G. The dynamics of Eq. (2) now depends on the parameters ε, β and γ and on the normalized frequency ω. Because the circuit is periodically driven, the planes t = 0 and t = 1/ω can be identified. The 3D Euclidean phase space (x, y, t) can then be transformed into the cylindrical space (x, y, θ) ∈ R² × S¹ [6] by defining the new variable θ = 2πωt. Adding the equation θ̇ = 2πω turns the non-autonomous system of Eq. (2) into an autonomous one with three equations. In this space, time turns around the unit circle S¹, x is represented on the horizontal axis of the cylinder and y on the vertical one.
The dynamics of Eq. (2) has been studied extensively for different parameter values. With ε, β and γ fixed, the frequency of the external periodic signal was used as the control parameter. One-parameter bifurcation diagrams in the ω-x plane were built by numerical integration with an adaptive-step Runge-Kutta algorithm. These diagrams plot the normalized output voltage x, sampled at a fixed phase of the normalized input signal, for each normalized frequency ω. The solutions during the first 60 periods of the input signal were discarded to avoid transient behavior. Figure 3 shows a typical bifurcation diagram for the circuit. It was computed by sweeping the normalized frequency ω over the range (0.01, 1.5) with ε = 2, β = 0.3 and γ = 0.4. Among other possible sets of circuit values, this could correspond to C = 4 pF, L = 1 µH, E_M = 1 V and I_M = 1 mA. The external periodic signal would then have E_DC = 0.3 V, E_A = 0.4 V and a frequency between 5 MHz and 750 MHz (f = ω/√(LC), with √(LC) = 2 ns). The MOS-NDR device is biased in the negative-resistance region and a swing is applied. Two kinds of regions are identified in this bifurcation diagram. In the first, a continuum of points appears for a given value of ω, and the behavior is quasi-periodic or chaotic. In the second, a finite number of points appears, and the solution is periodic with a period that is a multiple of that of the driving signal.
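The diagram construction described above can be sketched in a few lines. The paper does not give g(x) analytically, so the sketch uses a HYPOTHETICAL piecewise-linear Λ-shaped curve. The curve has its peak at the simulated (Vp, Ip) = (0.25, 2.11) in normalized units, an assumed zero crossing at x = 0.5, and zero current beyond, since there is no second PDR zone. For simplicity, the sketch also swaps the authors' adaptive-step Runge-Kutta for fixed-step RK4. The periods it finds therefore need not match Figure 3. `omega_to_hz` applies f = ω/√(LC) to the circuit values quoted above.

```python
import math

EPS, BETA, GAMMA = 2.0, 0.3, 0.4   # normalized parameters of Figure 3
C_F, L_H = 4e-12, 1e-6             # circuit values quoted in the text


def omega_to_hz(omega, L=L_H, C=C_F):
    """Physical input frequency for a normalized one: f = omega / sqrt(LC)."""
    return omega / math.sqrt(L * C)


def g(x):
    """HYPOTHETICAL Lambda-shaped normalized I-V curve (E_M = 1 V, I_M = 1 mA)."""
    if x <= 0.25:
        return 2.11 * x / 0.25          # rising PDR branch up to the peak
    if x <= 0.5:
        return 2.11 * (0.5 - x) / 0.25  # NDR branch, assumed to end at x = 0.5
    return 0.0


def rhs(t, x, y, omega):
    """Eq. (2): normalized state equations."""
    dx = (y - g(x)) / EPS
    dy = EPS * (BETA + GAMMA * math.sin(2 * math.pi * omega * t) - x)
    return dx, dy


def stroboscopic_samples(omega, n_transient=60, n_samples=40, steps=400, x=0.3, y=0.0):
    """Fixed-step RK4; returns x at the start of each input period after the transient."""
    h = 1.0 / (omega * steps)  # one normalized input period lasts 1/omega
    samples = []
    for period in range(n_transient + n_samples):
        for step in range(steps):
            t = (period * steps + step) * h
            k1 = rhs(t, x, y, omega)
            k2 = rhs(t + h / 2, x + h / 2 * k1[0], y + h / 2 * k1[1], omega)
            k3 = rhs(t + h / 2, x + h / 2 * k2[0], y + h / 2 * k2[1], omega)
            k4 = rhs(t + h, x + h * k3[0], y + h * k3[1], omega)
            x += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
            y += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        if period >= n_transient:
            samples.append(x)
    return samples


def detected_period(samples, tol=1e-3, max_period=20):
    """Smallest k with samples[i] close to samples[i + k]; None if quasi-periodic or chaotic."""
    for k in range(1, max_period + 1):
        if len(samples) > k and all(abs(samples[i] - samples[i + k]) < tol
                                    for i in range(len(samples) - k)):
            return k
    return None
```

Each column of a diagram like Figure 3 is `stroboscopic_samples(w)` for one ω. `detected_period` then reads the division ratio where the samples settle on a finite set of points.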
Figure 2: Frequency divider circuit using a MOS-NDR device (Λ-device): inductor L (current i_L, voltage v_L) in series with the parallel combination of capacitor C (current i_C, voltage v_C) and the MOS-NDR device (current i_NDR).
Figure 3 shows that the solution is a periodic orbit with the same period as the driving signal for ω up to about 0.12. From there until ω ≈ 0.21, the period of the solution is twice the input period, so the frequency is divided by 2. Then, until ω ≈ 0.28, the division is by 3. Beyond ω ≈ 0.30, the diagram shows frequency-locking states of variable period separated by quasi-periodic regions. The number of branches in the periodic windows also increases with frequency; this behavior is known as period-adding bifurcation [7]. In these frequency-locking regions, the driving signal is divided in frequency by a factor equal to the number of points in the region. Thus, between ω ≈ 0.30 and 0.35 the frequency-locking state has period 4, for 0.37 ≤ ω ≤ 0.42 it has period 5, for 0.44 ≤ ω ≤ 0.50 period 6, and so on. Above ω ≈ 1, the bifurcation diagram is mainly formed by regions of quasi-periodicity. Figure 4 and Figure 5 show numerical results for ω = 0.32 (period 4) and ω = 0.64 (quasi-periodic behavior), respectively. Figure 4a shows a period-4 trajectory in the cylindrical space, Figure 4b plots this trajectory in the x-y plane, and Figure 4c shows the corresponding Poincaré map with four points. For ω = 0.64, the quasi-periodic trajectory lies on the surface of the torus shown in Figure 5a (cylindrical space). Figure 5b shows this trajectory in the x-y plane. Figure 5c shows the corresponding Poincaré section: an infinite set of points on an invariant closed curve, which is typical of quasi-periodic behavior.
Besides the period-adding sequences, Figure 3 also shows that the periods of some of the windows satisfy the Farey sequence [8]: between the period-m and period-n windows there is a period-(m+n) window. Figure 6 shows some examples of such Farey sequences in the bifurcation diagram of Figure 3. As the normalized input frequency increases over 0.45 ≤ ω ≤ 0.53 (period-6 to period-7), a frequency-locked window of period 13 (= 6 + 7) appears in the region 0.508 ≤ ω ≤ 0.514. Farey sequences also appear in finer regions, as Figure 6 also shows: a period as high as 19 (= 6 + 13) appears in the region 0.503 ≤ ω ≤ 0.505. Finally, we performed circuit simulations to confirm the theoretical study. Figure 7 shows the results for the circuit parameters of Section III (C = 4 pF, L = 1 µH, E_M = 1 V, I_M = 1 mA, E_DC = 0.3 V and E_A = 0.4 V). Figure 7a shows a division by 3 of an external periodic signal of frequency 125 MHz (ω = 0.25). Figure 7b shows the division by 5 of a signal of frequency 190 MHz (ω = 0.38).
Figure 3: Bifurcation diagram for the circuit in Figure 2 (ε = 2, β = 0.3 and γ = 0.4).
IV. CONCLUSIONS
We have shown an all-MOS device with an I-V characteristic exhibiting NDR that can be used to build a frequency divider. Unlike previously reported RTD-based circuits, it can be implemented in a simple MOS process. As in those circuits, frequency division is obtained from the period-adding sequences that appear in the bifurcation diagram. We have also shown the Farey sequences that appear in the dynamic behavior of the circuit.
References
[1] Y. Kawano, Y. Ohno, S. Kishimoto, K. Maezawa, and T. Mizutani, "50 GHz frequency divider using resonant tunnelling chaos circuit," IEE Electronics Letters, vol. 38, no. 7, pp. 305-306, 2002.
[2] Y. Kawano, Y. Ohno, S. Kishimoto, K. Maezawa, T. Mizutani, and K. Sano, "88 GHz dynamic 2:1 frequency divider using resonant tunnelling chaos circuit," IEE Electronics Letters, vol. 39, no. 21, pp. 1546-1548, 2003.
[3] J. M. Quintana and M. J. Avedillo, "Analysis of frequency divider RTD circuits," IEEE Trans. on Circuits and Systems I: Regular Papers, vol. 52, no. 10, pp. 2234-2247, 2005.
[4] C. Wu and K.-N. Lai, "Integrated Λ-type differential negative resistance MOSFET device," IEEE J. Solid-State Circuits, vol. SC-14, pp. 1094-1101, Dec. 1979.
[5] W.-L. Guo, "CMOS-NDR transistor," 9th International Conference on Solid-State and Integrated-Circuit Technology (ICSICT), pp. 92-95, Oct. 2008.
[6] T. S. Parker and L. O. Chua, Practical Numerical Algorithms for Chaotic Systems. New York: Springer-Verlag, 1989.
[7] L.-Q. Pei, F. Guo, S.-X. Wu, and L. Chua, "Experimental confirmation of the period-adding route to chaos in a nonlinear circuit," IEEE Trans. on Circuits and Systems, vol. 33, no. 4, pp. 438-442, April 1986.
[8] K. Kaneko, Collapse of Tori and Genesis of Chaos in Dissipative Systems. Singapore: World Scientific, 1986.
Figure 4: Trajectories from the numerical results for ω = 0.32 (ε = 2, β = 0.3 and γ = 0.4): (a) in the cylindrical space, (b) in the x-y plane, and (c) Poincaré map.
Figure 5: Trajectories from the numerical results for ω = 0.64 (ε = 2, β = 0.3 and γ = 0.4): (a) in the cylindrical space, (b) in the x-y plane, and (c) Poincaré map.
Figure 6: Bifurcation diagrams illustrating Farey sequences for the circuit in Figure 2 (windows of period 13 and 19 marked).
Figure 7: Simulation results (v_C in V versus t in s) giving a frequency division of (a) 3 and (b) 5.
Prediction of Energy Transfer in Implantable Devices

Tales Pimenta, Filipe Ramos, Michel Santana and Paulo Crepaldi
UNIFEI
Itajuba, Brazil
Romulo Volpato
INATEL
Santa Rita do Sapucai, Brazil
Abstract: This work presents an approach for evaluating energy transfer to implantable devices for medical applications. Inductive coupling makes it possible to predict the voltage in the passive (battery-less) tag inside the human body. The measurement results indicate that the influence of meat on the magnetic field between the reader and the tag can be calculated.
Keywords: RFID, inductive coupling, RF, implantable device
I. INTRODUCTION
The demand for Radio-Frequency Identification (RFID) technology has been increasing steadily in many areas of knowledge and applications, including human health. Biomedical data, such as blood levels of cholesterol, urea, oxygen and sugar (glucose), among others, could be measured by tag sensors inside the human body. The measurements are sent out through RFID technology, and power is delivered to the tag over the same RF link. This process uses the magnetic-field link between two inductors, one in the reader and the other in the implantable device (tag). The main challenge here is to find the magnetic field intensity the reader must produce for correct tag operation. According to [1], [2], variations in distance and misalignment between the inductors change the coupling factor. This in turn changes the voltage at the tag, which could cause improper operation or none at all.
This work presents an approach to find the voltage in an implantable tag using classic magnetic theory and measurements from actual experiments. Figure 1 presents the simplified coupling between reader and tag [2].
II. ENERGY TRANSFER MODELING USING THE CLASSIC MAGNETIC THEORY
The energy is transferred from the reader to the tag by magnetic coupling. In free space, the magnetic field generated by inductor L1 induces a current in inductor L2. From classic magnetic theory, the equivalent circuit is shown in Figure 2.
Fig. 2. Equivalent circuit of the inductive reader-tag coupling.
The analysis can be greatly simplified by using the series equivalent circuit for the tag load R_tag. The quality factor of capacitor C2 can then be expressed in terms of capacitive and resistive parameters, as indicated in Figure 3.
Fig. 3. Capacitive equivalent circuits.
For the series and parallel representations to be equivalent, their real parts and their imaginary parts must be equal. Therefore, from basic circuit theory:
$$y_p = G_p + j\omega C_p \qquad (1)$$
$$R_s = \frac{G_p}{G_p^2 + (\omega C_p)^2} \qquad (2)$$
$$C_s = \frac{G_p^2 + (\omega C_p)^2}{\omega^2 C_p} \qquad (3)$$
where
$$G_p = \frac{1}{R_p} \qquad (4)$$
This approach allows the tag load to be taken as R_p, as shown by expression (2), because the quality factor of the capacitor is very high. Note also that the capacitor value in equation (3) is frequency dependent. Based on this approach, the equivalent circuit is presented in Figure 4.
Fig. 1. Simplified coupling between reader and tag.
Fig. 4. Modified equivalent circuit of the inductive reader-tag coupling.
The circuit presented in Fig. 4 can be described by:
$$R_t = R_2 + R_p \qquad (5)$$
Thus:
$$V_1 = I_1\left(R_{s1} + j\omega L_1 + \frac{1}{j\omega C_1}\right) - I_2\,j\omega M \qquad (6)$$
where M is the mutual inductance between L1 and L2, and
$$0 = -I_1\,j\omega M + I_2\left(R_{s2} + j\omega L_2 + \frac{1}{j\omega C_2}\right) \qquad (7)$$
Equations (6) and (7) yield:
$$V_2 = \frac{V_1 M}{\left[(\omega M)^2 + \left(R_{s2} + j\omega L_2 + \frac{1}{j\omega C_2}\right)\left(R_{s1} + j\omega L_1 + \frac{1}{j\omega C_1}\right)\right] C_2} \qquad (8)$$
Equation (8) gives the tag voltage under free-space conditions. In implantable applications, however, the tag is placed inside the human body. A device that measures blood glucose is usually placed in the abdomen, whose tissues behave electrically like pig tissue. We therefore used pork meat in our experiments.
III. COMPARISON BETWEEN SIMULATION AND MEASUREMENTS UNDER FREE SPACE CONDITIONS
We used the measurement setup shown in Figure 5, with L1 = 1.77 µH and L2 = 5.4 µH. Note that the inductors are axially aligned. This orientation was chosen because it yields the highest induced magnetic flux, given the length-to-diameter ratio of the inductors. The series resistances obtained from the network analyzer are R1 = 1.14 Ω and R2 = 2.2 Ω at 13.56 MHz.
Fig. 5. Free-space measurement setup.
The measurements and simulations used a generator voltage of 1 V, a parallel tag load of 1 kΩ and a generator resistance of 50 Ω.
The voltage at the tag is approximately 1 V, but the resonance frequency varies with the mutual inductance. Therefore, capacitors C1 and C2 can be adjusted for each mutual inductance to maximize the voltage at the tag. Table 1 compares simulated and measured results. Adjusting C1 and C2 is very difficult, so the voltage at the tag varies from one measurement to another. The power transmitted from the reader to the tag is easily obtained, since it is the squared voltage divided by the tag load (1 kΩ).
TABLE 1. COMPARISON OF SIMULATION AND MEASUREMENT RESULTS
Distance   M          Measurement   Simulation
25 mm      0.287 µH   769 mV        1001 mV
20 mm      0.370 µH   943 mV        1061 mV
15 mm      0.532 µH   1038 mV       1063 mV
10 mm      0.692 µH   1026 mV       1055 mV
5 mm       0.917 µH   1007 mV       1052 mV
The simulated and measured values are very close when the reader and the tag are 5 to 15 mm apart.
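The free-space model of Section II can be checked numerically, as in the sketch below. It converts a parallel R_p-C_p load into its series form with eqs. (2)-(4). It also evaluates the tag voltage with the closed form of eq. (8) and, independently, by solving eqs. (6)-(7) as a 2x2 linear system, so the two results can be compared. The function names are hypothetical. The sketch does not attempt to reproduce the simulated column of Table 1, since the paper does not give the tuned C1 and C2 values.

```python
import math

F_HZ = 13.56e6
W = 2 * math.pi * F_HZ  # angular frequency of the RFID carrier


def parallel_to_series(Rp, Cp, w=W):
    """Eqs. (2)-(4): series (R_s, C_s) equivalent of R_p in parallel with C_p."""
    Gp = 1.0 / Rp                          # eq. (4)
    den = Gp ** 2 + (w * Cp) ** 2
    return Gp / den, den / (w ** 2 * Cp)   # eq. (2), eq. (3)


def loop_impedance(Rs, L, C, w=W):
    """R_s + j w L + 1/(j w C): the bracketed loop terms of eqs. (6)-(8)."""
    return Rs + 1j * w * L + 1 / (1j * w * C)


def tag_voltage(V1, M, Z1, Z2, C2, w=W):
    """Eq. (8): V2 = V1 M / {[(w M)^2 + Z2 Z1] C2}."""
    return V1 * M / (((w * M) ** 2 + Z2 * Z1) * C2)


def tag_voltage_solved(V1, M, Z1, Z2, C2, w=W):
    """Same V2 obtained by solving eqs. (6)-(7), then V2 = I2 / (j w C2)."""
    zm = 1j * w * M
    det = Z1 * Z2 - zm * zm   # determinant of [[Z1, -zm], [-zm, Z2]]
    I2 = zm * V1 / det        # Cramer's rule for the secondary current
    return I2 / (1j * w * C2)
```

To approximate the loaded tag, R_s2 can include the series equivalent of the 1 kΩ load returned by `parallel_to_series`. The delivered power then follows as |V2|² divided by the load, as stated in Section III.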
IV. MEASUREMENTS COMPARISON IN FREE SPACE AND WITH TISSUE
Under free-space conditions, the measured and simulated tag voltages were observed to differ. The results are now compared with new measurements taken with pig tissue between the tag and the reader. Figures 6 and 7 show the measurement setup.
As can be observed in Table 2, the presence of pig tissue causes a small difference in the tag voltage. Table 3 presents the tag voltage error between free space and pig tissue. The error is usually below 5% and reaches 7.3% only at 20 mm.
TABLE 3. MAXIMUM ERROR
Distance   Error
10 mm      3.26%
15 mm      4.39%
20 mm      7.3%
25 mm      3.61%
This result suggests that the tissue changes the magnetic permeability relative to free-space conditions.
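Table 3 appears to give the relative drop in tag voltage from free space to tissue, (V_free - V_tissue)/V_free. Recomputing it from the Table 2 data, as below, reproduces all four entries. Assuming that power scales with V² for the fixed 1 kΩ load, the same data give the worst-case power reduction quoted in the conclusions. The names are illustrative.

```python
# Tag voltages from Table 2 (mV): distance in mm -> (free space, pork tissue)
TABLE2 = {10: (1932, 1869), 15: (1869, 1787), 20: (1671, 1549), 25: (1247, 1202)}


def relative_drop_pct(free, tissue):
    """Voltage reduction caused by the tissue, in percent of the free-space value."""
    return 100.0 * (free - tissue) / free


def power_drop_pct(free, tissue):
    """With a fixed load, P is proportional to V^2, so the drop is 1 - (Vt/Vf)^2."""
    return 100.0 * (1.0 - (tissue / free) ** 2)


errors = {d: relative_drop_pct(*v) for d, v in TABLE2.items()}
worst = max(TABLE2, key=lambda d: errors[d])
print({d: round(e, 2) for d, e in errors.items()})  # {10: 3.26, 15: 4.39, 20: 7.3, 25: 3.61}
print(worst, round(power_drop_pct(*TABLE2[worst]), 1))  # 20 14.1
```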
Fig. 6. Measurement setup between tag and reader in free space.
Fig. 7. Measurement setup with tissue between tag and reader.
Table 2 shows the tag voltage values in free space and with pig tissue. The tissue is about 10 mm thick and long enough to cover the tag completely, as shown in Figure 6.
TABLE 2. COMPARISON OF FREE-SPACE AND TISSUE CONDITIONS
Distance   Free space   Pork tissue
10 mm      1932 mV      1869 mV
15 mm      1869 mV      1787 mV
20 mm      1671 mV      1549 mV
25 mm      1247 mV      1202 mV
V. CONCLUSIONS
Pig tissue between the reader and the tag reduces the tag voltage and, consequently, the delivered power. In the worst case, this power reduction is 14%. Because the voltage (or power) drop is known, it should be taken into account when designing RFID systems.
The results also indicate that the tissue has a small influence at 13.56 MHz. Knowing the maximum voltage at the tag is very important, especially for implantable devices, so that the designer can take the proper precautions.
ACKNOWLEDGMENT
The authors acknowledge CAPES, CNPq and FAPEMIG for their financial support.
REFERENCES
[1] M. Kiani and M. Ghovanloo, "An RFID-based closed-loop wireless power transmission system for biomedical applications," IEEE Trans. on Circuits and Systems II: Express Briefs, vol. 57, no. 4, April 2010.
[2] K. Finkenzeller, RFID Handbook: Fundamentals and Applications in Contactless Smart Cards and Identification. John Wiley & Sons, 2010.
[3] F. Ramos, M. Santana, R. M. Volpato, R. L. Moreno and T. C. Pimenta, "Front end of an implantable medical device," Wireless Systems International Meeting, May 26-28, 2010, Campina Grande, Brazil.
[4] F. E. Terman and J. M. Pettit, Electronic Measurements. McGraw-Hill, 1952.
[5] K. Kenneth and T. Donald, Communication Circuits: Analysis and Design, 1971.
[6] W. E. Everitt and G. E. Anner, Communication Engineering. McGraw-Hill, 1956.
[7] S. Haykin and B. Van Veen, Signals and Systems. John Wiley & Sons, 1999.
[8] J. I. Agbinya, N. Selvaraj, A. Ollett, S. Ibos, Y. Ooi-Sanchez, M. Brennan and Z. Chaczko, "Size and characteristics of the cone of silence in near field magnetic communications," MILCIS, Canberra, 10-12 November 2009.
[9] A. Kurs, A. Karalis, R. Moffatt, J. D. Joannopoulos, P. Fisher and M. Soljacic, "Wireless power transfer via strongly coupled magnetic resonances," Science, vol. 317, July 2007.
Schottky Barrier Diodes (SBD) in Standard CMOS Process
David Cabral, Leonardo Zoccal, Paulo Crepaldi and Tales Pimenta
UNIFEI
Itajuba, Brazil
Abstract - This paper discusses the implementation of a Schottky Barrier Diode (SBD) in a standard CMOS process as a way to optimize the overall performance of a passive Radio-Frequency Identification (RFID) based biomedical implant. For this kind of system, it is essential that the transmitted power be kept within acceptable levels so that human tissues are not damaged.
Keywords - Schottky Barrier Diode, RFID, implantable device
I. INTRODUCTION
Cost, size, lifetime and safety are important parameters when designing an RFID-based biomedical system, especially if the receiver is an implanted device.
Besides reducing cost and size, microelectronics also allows the transponder circuits to be implemented on a single die. The lowest prices can be achieved by using low-cost standard CMOS technology.
For size reduction and extended lifetime, the receiver can be implemented without batteries, which characterizes a passive tag. An RF link is therefore necessary to provide the communication path between the base unit and the transponder. Through this link and a proper protocol, information can be exchanged and energy can be delivered to the implant for its activation. Fig. 1 shows a typical RFID topology for biomedical applications [1].
Figure 1. Typical RFID system (base unit and implanted device separated by the skin; antennas carrying energy + information; transponder (tag) with RF-to-DC rectification, Cs, DC supply, sensors, acquisition, signal conditioning and processing).
The energy is transmitted by a pair of coupled coils. For patient safety, it is important to keep the induced electromagnetic fields at low levels, to avoid tissue damage caused by a rise in local temperature. The Specific Absorption Rate (SAR) represents direct measurements of the electric field (indirect measurement of the magnetic field) and of the induced current density in the human tissue at the implant location. The temperature variation over time indicates the local heating factor. Both relations are given by [2]:
$$\mathrm{SAR} = \frac{\sigma E^{2}}{\rho} \quad [\mathrm{W/kg}] \qquad (1)$$
$$\frac{dT}{dt} = \frac{\mathrm{SAR}}{c} \qquad (2)$$
where σ, ρ and c represent the conductivity, the mass density and the specific heat capacity of the human tissue at the implant location, respectively, and E is the incident electric field intensity (RMS). Based on equations (1) and (2), a safe value for the power transferred by the RF link is 10 mW/cm² [2].
Transponder front-end circuits include a rectifier that performs the AC-DC conversion to provide an unregulated power supply to the tag. In CMOS technology, NMOS and PMOS transistors (often more than one of each type) are used in different topologies to implement the rectifier circuit. These devices, however, have the disadvantage of a turn-on threshold voltage (Vth) that may require a larger induced voltage in the receiver coil. Although CMOS technology keeps shrinking transistor geometries, Vth does not scale down at the same rate. Using a Schottky Barrier Diode (SBD) is an alternative way to design the rectifier circuit and improve its efficiency. A more efficient rectifier reduces the voltage drop between the tag input and the rest of the circuitry, thus reducing the power demanded from the transmitter. For the low current levels typically found in this kind of application, the SBD voltage drop can be lower than the Vth of a CMOS transistor.
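As a rough numerical illustration of eqs. (1) and (2), the Python sketch below evaluates the SAR and the resulting rate of temperature rise. The tissue parameters and field value used in the example are hypothetical placeholders, not data from this work, and eq. (2) is applied exactly as written, i.e. neglecting heat removal by conduction or blood flow.

```python
def sar(sigma, e_rms, rho):
    """Eq. (1): specific absorption rate in W/kg.

    sigma: tissue conductivity [S/m], e_rms: incident RMS electric field [V/m],
    rho: tissue mass density [kg/m^3].
    """
    return sigma * e_rms ** 2 / rho


def heating_rate(sar_w_per_kg, c):
    """Eq. (2): temperature rise rate dT/dt in K/s, ignoring any heat removal.

    c: tissue specific heat capacity [J/(kg K)].
    """
    return sar_w_per_kg / c


if __name__ == "__main__":
    # Hypothetical tissue parameters and field, for illustration only.
    s = sar(sigma=0.5, e_rms=10.0, rho=1000.0)
    print(f"SAR = {s:.3f} W/kg, dT/dt = {heating_rate(s, 3500.0):.2e} K/s")
```

Because the SAR grows with the square of the field, doubling E quadruples both the SAR and the heating rate, which is why the induced fields must be kept low.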
The SBD is not readily available in a standard CMOS technology, but it can be implemented after a few adjustments to the masking process. In this work, a mask sequence is
presented to implement SBDs in CMOS processes, and a simple model for this device is discussed.
II. SCHOTTKY BARRIER DIODE
The metal-semiconductor contact has some features that allow its use in high-frequency applications and in systems that must operate at low voltage levels. These features are, basically, the low minority-charge accumulation during switching, which leads to high switching speeds, and the low voltage drop between its terminals [3].
Low turn-on voltage, fast recovery time and low junction capacitance are advantages that SBDs offer over PN diodes. These are the main reasons why SBDs are so popular in RF applications. Owing to their high switching speed, their ability to operate at high frequencies and their low turn-on voltage, SBDs are used as RF mixer and detector diodes [4].
SBDs are also used in high- and low-voltage rectifiers because of their high current density and low voltage drop. In solar cell applications, any voltage drop results in reduced efficiency, and therefore a low-voltage-drop diode is essential [5].
Fig. 2 shows the energy band diagram for this kind of contact at thermal equilibrium. In this work, since the TSMC process was used, the metal is aluminium and the semiconductor is moderately doped N-type silicon obtained by N-well diffusion.
Figure 2. Energy band diagram of the metal-semiconductor contact at thermal equilibrium (labels: qφBN, qφM, qVbi, EC, EF, EV; metal and semiconductor sides).
As can be seen, there is a built-in contact potential Vbi, expressed by:
$$V_{bi} = \phi_{BN} - \frac{kT}{q}\ln\frac{N_C}{N_D} \qquad (3)$$
where kT/q is the thermal voltage (25.9 mV at T = 300 K), NC is the effective density of states in the conduction band (constant for a given temperature), ND is the donor doping level and φBN is the maximum height of the potential barrier [6].
Eq. (3) shows that the built-in voltage can be reduced if a moderately to lightly doped semiconductor is used. Considering ideal values for the metal work function (φM ≈ 4.28 V) and the semiconductor electron affinity (χ ≈ 4 V), the barrier φBN = φM - χ can be as low as 280 mV. According to eq. (3), the built-in voltage Vbi can be further reduced by controlling the doping level ND.
In practice, however, imperfections at the metal-semiconductor boundary (such as surface states) contribute to raising that voltage.
It is therefore necessary to evaluate the doping concentration ND in order to use eq. (3) to obtain Vbi. For this procedure, the C-V curve of a Metal-Insulator-Semiconductor (MIS capacitor) structure is used. Fig. 3 and Fig. 4 show a typical MIS capacitor (with an N-type semiconductor) and its C-V curve, respectively.
Figure 3. MIS capacitor (metal, insulator, N-type semiconductor).
Figure 4. MIS capacitor C-V curve (accumulation, depletion and inversion regions; low- and high-frequency curves).
The C-V curve can be obtained by simulating a PMOS transistor with TSMC model parameters, as indicated in Fig. 5.
Fig. 5. PMOS transistor used to obtain the C-V behavior (M1 with W = 100 µm, L = 100 µm; resistors R1 and R2).
In this simulation, resistors R1 and R2 are very large in order to restrict the action of the PMOS to the channel area. The DC source is swept from negative to positive voltages to cover all regions of operation of the MOS capacitor formed by the gate, insulator and channel. The AC source can be set to low or high frequency; therefore, it is possible to obtain the C-V curve behavior illustrated in Fig. 4.
The capacitor area (A) is determined by the geometric dimensions W and L.
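Figure 7. SBD cross-section view (P+, N+ and P+ regions in an N well on a P substrate).
The following equations are used to extract the doping concentration ND iteratively:
$$W_{Dmax} = \varepsilon_S A\left(\frac{1}{C_{min}} - \frac{1}{C_{max}}\right) \qquad (4)$$
$$N_D = \frac{4\,\varepsilon_S\,\phi_F}{q\,W_{Dmax}^{2}} \qquad (5)$$
$$\phi_F = \frac{kT}{q}\ln\frac{N_D}{n_i} \qquad (6)$$
where Cmin and Cmax are the minimum and maximum capacitances of the C-V curve, WDmax is the maximum depletion width, εS is the silicon permittivity, φF is the Fermi potential and ni is the intrinsic carrier concentration.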
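Because φF in eq. (6) depends on ND, eqs. (5) and (6) have to be solved together, and a simple fixed-point iteration is enough. The Python sketch below is a hedged illustration rather than the authors' procedure: it obtains WDmax from eq. (4), iterates eqs. (5) and (6) to find ND, and then evaluates Vbi with eq. (3). The intrinsic concentration ni, the conduction-band density of states NC and the capacitance values are assumptions chosen only for illustration; φBN uses the ideal 4.28 V and 4 V values quoted in the text.

```python
import math

Q = 1.602e-19               # elementary charge [C]
EPS_S = 11.7 * 8.854e-12    # silicon permittivity, 11.7 eps0 [F/m]
VT = 0.0259                 # thermal voltage kT/q at 300 K [V]
NI = 1.0e16                 # ASSUMED intrinsic carrier density of Si [m^-3] (1e10 cm^-3)
NC = 2.8e25                 # ASSUMED conduction-band effective density of states [m^-3]
PHI_BN = 4.28 - 4.0         # barrier phi_M - chi from the ideal values in the text [V]


def depletion_width(c_min, c_max, area):
    """Eq. (4): W_Dmax = eps_S * A * (1/C_min - 1/C_max) [m]."""
    return EPS_S * area * (1.0 / c_min - 1.0 / c_max)


def extract_nd(c_min, c_max, area, nd_start=1e21, rel_tol=1e-9, max_iter=200):
    """Fixed-point iteration of eqs. (5) and (6) for the donor density N_D [m^-3]."""
    w = depletion_width(c_min, c_max, area)
    nd = nd_start
    for _ in range(max_iter):
        phi_f = VT * math.log(nd / NI)               # eq. (6)
        nd_next = 4.0 * EPS_S * phi_f / (Q * w * w)  # eq. (5)
        if abs(nd_next - nd) <= rel_tol * nd_next:
            return nd_next
        nd = nd_next
    raise RuntimeError("N_D iteration did not converge")


def built_in_voltage(nd):
    """Eq. (3): V_bi = phi_BN - (kT/q) ln(N_C / N_D) [V]."""
    return PHI_BN - VT * math.log(NC / nd)


if __name__ == "__main__":
    # Hypothetical C-V extremes for a 100 um x 100 um capacitor (not measured data).
    area = 100e-6 * 100e-6          # A = W * L [m^2]
    c_max, c_min = 25e-12, 7e-12    # [F]
    nd = extract_nd(c_min, c_max, area)
    print(f"N_D = {nd * 1e-6:.2e} cm^-3, V_bi = {built_in_voltage(nd):.3f} V")
```

Near the solution, each pass shrinks the error by roughly a factor ln(ND/ni), about 16 for ND close to 10^17 cm^-3, so the iteration converges in a few steps. The sketch also reproduces the trend stated for eq. (3): a lower ND gives a lower Vbi.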
III. MASK FLOW
Ideally, the SBD would be implemented by depositing a metal layer over a lightly doped N- or P-type semiconductor well. In order to reduce the series resistance and improve efficiency, the SBD is actually an arrangement of fingers, as can be seen in Fig. 6 and Fig. 7.
In this design, the 0.5 µm TSMC CMOS process was used through the MOSIS educational program.
The whole SBD structure is surrounded by a guard ring, which is used basically to avoid latch-up and to separate the SBD from the other tag circuits that perform different analog and/or digital functions.
Figure 6. SBD finger structure.
The metal-semiconductor junction is implemented with the following mask sequence. First, an NWELL layer (layer #42, TSMC) and an ACTIVE layer (#43) are used to delimit the area that will contain the multi-finger SBD and the guard ring. Then an NPLUS layer (#45) is used to define the N+ regions that will form the SBD ohmic contacts (cathode). The next step is the CONTACT layer (#48), which will be filled with METAL 1.
The Schottky and guard-ring contacts must be defined so that they reach the N well and the active regions directly. The other contacts must coincide with the previous N+ diffusions to form the ohmic contacts. With the METAL1 layer (#49), the SBD is complete. Additionally, a VIA layer (#50) is used to provide interconnections between the metal levels. Finally, the METAL2 layer (#51) provides access to the guard ring. Fig. 8 shows a 3D view of the layers, and Fig. 9 a 2D view of the basic SBD structure without pads and interconnections (masks #42 to #49).
Figure 8. 3D layers view for the complete SBD structure.
Figure 9. 2D layers view for the basic SBD structure (P-type wafer; N-type silicon, mask #42; P+ diffusion, mask #43; N+ diffusion, mask #45; via opening, mask #48; metal deposition, mask #49; silicon dioxide).
It is important to note that it is not necessary to modify the CMOS process in terms of doping levels, metal type or other process parameters.
IV. SBD SMALL-SIGNAL MODEL
The most important difference between Schottky and PN structures is the lack of minority-carrier charge storage (diffusion capacitance), which eliminates the reverse recovery time.
From the previous discussion, it is now possible to build the equivalent small-signal circuit model of the SBD, shown in Fig. 10.
Figure 10. Small-signal SBD equivalent circuit (CGEOM, CT, Rd, Rs, Ls).
In this equivalent circuit, there is a capacitance (CGEOM) that arises from the device geometry. It can be calculated from expression (7):
$$C_{GEOM} = \frac{\varepsilon_s A}{L} \qquad (7)$$
where L is the length of the device of cross-sectional area A, and εs is the silicon permittivity (11.7 ε0).
The depletion region contributes a resistance Rd and a capacitance CT, given respectively by:
$$R_d = \frac{dV}{dI} \qquad (8)$$
and
$$C_T = A\left[\frac{q N_D \varepsilon_s}{2\,(V_{bi} - V)}\right]^{1/2} \qquad (9)$$
where q is the elementary charge and V is the voltage applied across the SBD.
Finally, the resistance Rs, in series with the depletion-region components, corresponds to the contact resistance and the neutral region of the semiconductor. A parasitic inductance (Ls) affects device operation at high frequencies [7].
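To show how the small-signal elements could be evaluated numerically, the Python sketch below implements the CGEOM expression (7), the depletion capacitance CT and Rd = dV/dI, the last estimated by a central difference on sampled I-V data. It is a hedged illustration: the ideal-diode samples stand in for measured data, and the saturation current, ideality factor, geometry and doping are hypothetical values, not parameters of the fabricated devices.

```python
import math

Q = 1.602e-19               # elementary charge [C]
EPS_S = 11.7 * 8.854e-12    # silicon permittivity, 11.7 eps0 [F/m]


def c_geom(area, length):
    """Eq. (7): geometric capacitance C_GEOM = eps_s * A / L [F]."""
    return EPS_S * area / length


def c_t(area, nd, vbi, v):
    """Depletion capacitance C_T = A * sqrt(q N_D eps_s / (2 (V_bi - V))) [F]."""
    if v >= vbi:
        raise ValueError("expression only valid for V < V_bi")
    return area * math.sqrt(Q * nd * EPS_S / (2.0 * (vbi - v)))


def r_d(voltages, currents, k):
    """R_d = dV/dI at sample k, estimated by a central difference on I-V samples."""
    return (voltages[k + 1] - voltages[k - 1]) / (currents[k + 1] - currents[k - 1])


if __name__ == "__main__":
    # Synthetic I-V samples from an ideal-diode law; I_S and n are hypothetical.
    i_s, n, vt = 1e-9, 1.1, 0.0259
    vs = [0.001 * i for i in range(301)]                    # 0 to 0.3 V in 1 mV steps
    cur = [i_s * (math.exp(v / (n * vt)) - 1.0) for v in vs]
    print(f"R_d at {vs[200]:.3f} V: {r_d(vs, cur, 200):.0f} ohm")
    # Hypothetical geometry and doping (SI units): 100 um x 100 um, N_D = 1e23 m^-3.
    area = 1e-8
    print(f"C_T(0 V) = {c_t(area, 1e23, 0.2, 0.0) * 1e12:.1f} pF")
```

Because CT varies as (Vbi - V)^(-1/2), moving from 0 V to a reverse bias of 0.6 V with Vbi = 0.2 V quadruples Vbi - V and therefore halves CT.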
V. MEASURED RESULTS
The SBD was fabricated in three different versions, with 5, 10 and 15 fingers. Table I shows the small-signal parameters calculated from a set of DC measurements. The values presented are averages over 40 samples. These parameters will be used in future work to optimize the final geometry of the SBD.
The main goal at this point is to validate the fabrication process. As can be seen, it is possible to fabricate this kind of device in a standard CMOS process, and it can be optimized through its geometric aspects.
Table I. SBD small-signal parameters.
              CGEOM [pF]   Rd [Ω]   CT [pF]
5 fingers     160          2275     0.4
10 fingers    100          1039     0.8
15 fingers    69           79       1.2
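A quick check on Table I shows how the parameters scale with the number of fingers. The Python sketch below (names are illustrative) divides each column by the finger count: CT comes out at about 0.08 pF per finger in all three versions, consistent with a capacitance proportional to the total finger area, whereas CGEOM and Rd do not scale in such a simple way.

```python
# Averaged values from Table I: fingers -> (C_GEOM [pF], R_d [ohm], C_T [pF])
TABLE_I = {
    5: (160.0, 2275.0, 0.4),
    10: (100.0, 1039.0, 0.8),
    15: (69.0, 79.0, 1.2),
}
COLUMNS = ("C_GEOM [pF]", "R_d [ohm]", "C_T [pF]")


def per_finger(table, column):
    """Return {fingers: value / fingers} for one column of the table."""
    return {fingers: row[column] / fingers for fingers, row in table.items()}


if __name__ == "__main__":
    for col, name in enumerate(COLUMNS):
        values = per_finger(TABLE_I, col)
        print(name, {f: round(v, 3) for f, v in values.items()})
```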
Figure 11 shows the I-V characteristic of the three fabricated SBDs. The curves represent average values over 40 samples.
Figure 11. SBD I-V characteristic (current in A vs. bias in V for the 5-, 10- and 15-finger averages; full range from -1 V to 4 V, with a detail of the 0.1 V to 0.5 V region).
Fig. 11 also shows that the SBD barrier is approximately 250 mV, which is in agreement with eq. (3).
VI. CONCLUSIONS
In this work, the fabrication of a Schottky Barrier Diode in a standard CMOS process is discussed. This component is very attractive for applications such as biomedical tags, where cost, efficiency and power consumption are important constraints, both to avoid damage to the patient's tissues and to keep the transmitter power to a minimum.
ACKNOWLEDGMENT
The authors acknowledge CAPES, CNPq, FAPEMIG and MOSIS for their financial support.
REFERENCES
[1] M. Brandl et al., Low-Cost Wireless Transponder System for Industrial and Biomedical Applications, Fifth International Conference on Information, Communications and Signal Processing, 6-9 Dec. 2005, pp. 1444-1447.
[2] A. Pradier et al., Rigorous Evaluation of Specific Absorption Rate (SAR) Induced in a Multilayer Biological Structure, The European Conference on Wireless Technology, 3-4 Oct. 2005, pp. 197-200.
[3] Janam Ku and Seonghearn Lee, Novel SPICE Macro Modeling for an Integrated Si Schottky Barrier Diode, EGAAS 2005.
[4] Rainee N. Simons and Philip G. Neudeck, Intermodulation-Distortion Performance of Silicon-Carbide Schottky-Barrier RF Mixer Diodes, IEEE Transactions on Microwave Theory and Techniques, vol. 51, no. 2, February 2003, pp. 669-672.
[5] S. V. Averin, Fast-Response Photo-detectors with a Large Active Area, Based on Schottky-Barrier Semiconductor Structure, Kvantovaya Electronika, vol. 23, no. 3, p. 284, 1996.
[6] Streetman, Solid State Electronic Devices, vol. 2, pp. 185-190, 1980.
[7] Pascal Philippe, Walid El-Kamali and Vlad Pauker, Physical Equivalent Circuit Model for Planar Schottky Varactor Diode, IEEE Transactions on Microwave Theory and Techniques, vol. 36, no. 2, August 2002, pp. 250-255.
Performance of a MOHOS-type Memory Using Different Tunneling Oxide Thicknesses
Joel Molina, Rafael Ortega, Wilfrido Calleja, Pedro Rosales, Carlos Zuniga and Alfonso Torres.
National Institute of Astrophysics, Optics and Electronics (INAOE), Electronics Department.
Luis Enrique Erro #1, C.P. 72000, Tonantzintla, Puebla, México.
jmolina@inaoep.mx
Abstract- In this work, we present the use of HfO2 nanoparticles (np-HfO2) embedded in a spin-on glass oxide matrix as an active charge-trapping layer for a metal/oxide/high-k oxide/tunneling oxide/silicon (MOHOS) type memory structure. The active trapping layer of high-k oxide is deposited on a thin layer of chemical silicon oxide (SiOx) less than 5 nm thick. Depending on the chemical oxide thickness, we observed differences in memory figures of merit such as writing/erase time and retention time.
I. Introduction
Technological progress and the scaling down of electronic devices have driven new research in the nonvolatile memory industry. Charge-trapping nonvolatile memories based on the typical silicon-oxide-nitride-oxide-silicon (SONOS) structure [1] have been widely studied in past years. According to the International Technology Roadmap for Semiconductors, the main problem of SONOS devices is that the tunneling oxide thickness cannot be scaled down because of high leakage currents [2]. There is a wide variety of films with higher dielectric constant (k) values than SiO2 that seem to solve the leakage current problem. However, many of these films are not thermodynamically stable on silicon or lack other properties such as high breakdown voltage, low deposition temperature and compatibility with MOS processes. Current interest is centered on films such as HfO2, with a dielectric constant (k) of 25 [3], which seems to be a promising candidate to replace Si3N4 films as the charge-trapping layer of SONOS-type memory devices. These high-k materials lead to a new type of memory structure, the silicon/oxide/high-k/oxide/silicon (SOHOS)-type memory [4].
The active trapping layer, based on HfO2 nanoparticles (np-HfO2) embedded in a SiO2 matrix deposited by spin-on glass, is deposited on a thin chemical oxide. After some thermal annealing treatments, a final oxide is deposited by a sol-gel method in order to block the generated charge from reaching the gate. This layer is called the blocking oxide. The high-k oxide acts as a charge storage layer that can develop memory effects by modulating the density of trapped charge according to the applied gate voltage. The thickness of the active oxide layer is limited by the grain size of the np-HfO2, which is about 100 nm in diameter. It is expected that, by using a high-quality blocking oxide, the trapped charge in the active high-k layer can be retained and a wider hysteresis window obtained, owing to its charge retention ability. Capacitance-voltage and current-voltage measurements are performed in order to extract the performance of these devices. Figures of merit such as program/erase time and retention time are also obtained.
II. Experimental procedure
The fabrication of the MOHOS-type memory devices started with chemical oxide growth on n-type (100) silicon substrates with 7-10 Ω·cm resistivity. This thin chemical oxide is obtained by exposing the substrates to H2O2 heated at 85 °C. Different chemical oxide thicknesses are obtained by removing the samples from the H2O2 after 4 min, 16 min and 32 min. The thickness of the chemical oxide is measured by ellipsometry. The charge-trapping layer is prepared using the sol-gel spin-coating method. HfO2 nanopowder from American Elements (PN HF: OX-03-NP) with 99.9% purity is used as the precursor of the high-k dielectric nanoparticles (np-HfO2). Initially, np-HfO2 and DI water are mixed with reactive acetic acid (CH3COOH) in order to dissolve the nanoparticles. This first solution is heated at 40 °C for one hour. Then Filmtronics spin-on glass (SOG) 15A, of the silicate family, is added to the first solution, which is heated at 80 °C for one more hour. The final solution is deposited at 7000 rpm for 20 s at ambient temperature. The samples are then annealed at 200 °C for 10 min in N2 ambient. A second thermal annealing is performed at 600 °C for 30 min, also in N2 ambient. The final blocking oxide is deposited using only the Filmtronics spin-on glass solution at 5000 rpm for 20 s. The whole structure is then annealed at 1000 °C. Finally, aluminum deposition, gate patterning, back contact and passivation in forming gas at 460 °C are used to complete the MOHOS-type memory devices. Figure 1 shows the structure of the fabricated devices.
Fig. 1. Schematic of the MOHOS-type memory fabricated at INAOE (from top: metal gate (M), blocking oxide (O), charge-trapping layer (H), tunneling oxide (O), n-type (100) Si (S), back contact).
III. Results and Discussion
The thickness and refractive index of the three chemical oxides were obtained by ellipsometry. Figure 2 shows these oxide thicknesses. The thickest oxide, 2 nm, was found in samples exposed to H2O2 for 32 min. These thin chemical oxides act as tunneling oxides when charge is injected from the substrate into the high-k layer. After charge injection, the oxide also acts as a barrier that prevents the injected charge from tunneling back to the silicon substrate, thus improving the retention time.
Fig. 2. Chemical oxide thicknesses (SiOx thickness measured by ellipsometry vs. H2O2 exposure time; Nf = 1.46).
Fourier Transform Infrared Spectroscopy (FTIR) measurements were performed on the charge-trapping layer in order to observe its composition and chemical bonding. Figure 3 shows the spectrum of the samples and the related absorbance peaks. As can be seen, the characteristic absorption peaks of thermal SiO2, associated with the stretching LO (1252 cm-1), stretching TO (1075 cm-1) and rocking (458 cm-1) vibration modes, are observed [5,6]. In addition, two absorbance peaks belonging to Hf-O bonding (752 cm-1 and 515 cm-1) and some peaks of organic impurities related to the solution preparation, from C=C and C=O bonding (1300 cm-1 and 1800 cm-1), are also present [6]. Hence, np-HfO2 embedded in a SiO2 matrix is present in the charge-trapping layer.
Fig. 3. Chemical bonding of the charge-trapping layer from the FTIR spectrum (absorbance vs. wavenumber, 400-2000 cm-1; layer deposited at 7000 rpm and cured at 600 °C; Si-O rocking, Si-O stretching TO and LO, Hf-O, C=C and C=O peaks).
Figure 4 shows the current-voltage (I-V) characteristics of the MOHOS-type memory structures. First, a positive voltage was applied to the gate of fresh devices. As the voltage increases, the gate current also increases until the breakdown voltage is reached. When the voltage decreases, the current decreases but remains at a higher level, which creates a difference between the two sweep directions and forms a resistivity window. Then, a negative voltage is applied in order to return to a low current level; however, this state is no longer reached because of oxide breakdown. The sweep direction is indicated by arrows in the graph. The breakdown voltage of 3 V indicates that the writing voltage must be lower than this value. According to the curves, Fowler-Nordheim (FN) tunneling [7] is present at high electric fields. For this reason, 2 V is used as the writing voltage, which does not induce breakdown.
Fig. 4. I-V characteristics of the MOHOS-type memory (gate current IG vs. gate voltage VG; breakdown at VBD = 3 V). At 2 V, charge is injected by FN tunneling.
As a result of charge trapping in the high-k layer, a shift is observed in the capacitance-voltage (C-V) curves. The flatband voltage shift (ΔVFB) is calculated in order to measure the trapped charge [8]. From the shift direction, we can see that negative charge is trapped under a positive gate voltage, as shown by the arrow.
Fig. 5. C-V characteristics of MOHOS-type memories (Cox = 165 pF; gate area = 13.44 × 10^-4 cm²; fresh condition, writing at VG = 2 V and erase at VG = -2 V; ΔVFB = 0.25 V and -0.4 V). The flatband voltage shift is indicated by the arrow.
The writing process is then applied to the devices using a positive gate voltage of 2 V. Then a negative gate voltage of -2 V is applied to remove the injected charge (erasing process). After the charge was removed, the flatband voltage was expected to return to its initial value. However, as can be seen in Fig. 5, that is not the case because of deep trap levels in the np-HfO2 [9]. The injected charge causes a shift of 0.25 V, which is helpful to differentiate between the write and erase states. The high-k charge-trapping layer also increases the oxide capacitance.
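The 0.25 V shift can be converted into an areal trapped-charge density with the usual first-order relation |ΔQ| = Cox |ΔVFB| / A. This relation assumes the charge sits at the tunneling-oxide/silicon interface; charge located higher in the stack would require a larger density for the same shift, so the result is best read as a lower estimate. The Python sketch below (the function name is illustrative) applies it to the Cox, gate area and shift labeled in Fig. 5.

```python
Q = 1.602e-19  # elementary charge [C]


def trapped_charge_density(c_ox, delta_vfb, area_cm2):
    """First-order estimate |dQ| = C_ox * |dV_FB| / A.

    c_ox in F, delta_vfb in V, area_cm2 in cm^2.
    Returns (charge density in C/cm^2, electrons per cm^2).
    """
    q_area = c_ox * abs(delta_vfb) / area_cm2
    return q_area, q_area / Q


if __name__ == "__main__":
    # Values labeled in Fig. 5: C_ox = 165 pF, gate area 13.44e-4 cm^2, shift 0.25 V.
    q_a, n_e = trapped_charge_density(165e-12, 0.25, 13.44e-4)
    print(f"{q_a:.2e} C/cm^2, about {n_e:.2e} electrons/cm^2")
```

With these values, the estimate is about 3.1 × 10^-8 C/cm², i.e. roughly 1.9 × 10^11 electrons/cm².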
The write speed characteristic of the MOHOS-type memories is shown in Fig. 6. FN tunneling induced by the 2 V gate voltage is used for writing. The ΔVFB is due to electron trapping in the high-k layer. During writing, Fig. 6(a), electrons in the substrate gain enough energy from the applied voltage to cross the tunneling oxide and reach the charge-trapping layer. When erasing, Fig. 6(b), a negative gate voltage of -2 V is applied. This voltage has the opposite effect on the charge-trapping layer: electrons from the high-k layer are injected into the substrate by crossing the tunneling oxide. The erase curves show that the VFB of the fresh condition is not reached, because some electrons remain trapped in the deep trap levels of np-HfO2, as mentioned before. The tunneling oxide thickness causes a variation in programming: with a thinner tunneling oxide, faster writing/erasing and a higher charge density are observed. In addition, a VFB shift of 0.54 V is reached for structures with a 1.2 nm tunneling oxide.
Fig. 6. (a) Writing time characteristics (VG = 2 V) and (b) erase time characteristics (VG = -2 V) of MOHOS-type memories with different tunneling oxide thicknesses (2 nm, 1.45 nm and 1.2 nm); VFB (V) vs. time (s).
The charge retention characteristics of the MOHOS-type memories, measured at room temperature, are shown in Fig. 7. The normalized VFB shift is obtained as the ratio of the VFB shift at the time of interest to that at the beginning [10]. The curves were obtained after writing at 2 V for 300 s. For these devices, a thin tunneling oxide causes a fast loss of charge. The best retention characteristics were found in devices with a 2 nm tunneling oxide: in measurements done 10000 s after writing, only 20% of the charge was lost. Hence, the electrons trapped by the np-HfO2 do not escape easily. Figure 7 indicates that as the tunneling oxide becomes thinner, faster back-tunneling of charge to the substrate develops.
Figure 8(a) and 8(b) show the flatband voltage shifts as a function of charge density for writing and erase conditions. The charge density is obtained by integrating the gate current density over the programming time [11]. A high injected charge density causes a high flatband voltage shift. Also, the injected charge density increases when the tunneling oxide is thinner. On the other hand, a thin tunneling oxide causes a fast discharge of the devices. It can be seen that a high loss of charge increases the erase time but decreases the charge retention time.
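The integration of the gate current density over the programming time can be sketched with the trapezoidal rule on sampled data. In the Python example below, the decaying current density is purely hypothetical (it is not measured data); only the 300 s programming time is taken from the retention measurements described above, and the function name is illustrative.

```python
import math


def injected_charge_density(times_s, j_a_per_cm2):
    """Cumulative trapezoidal integral of the gate current density J_G(t).

    Returns the injected charge density Q_inj(t_k) in C/cm^2 at every sample time.
    """
    if len(times_s) != len(j_a_per_cm2):
        raise ValueError("times and current densities must have the same length")
    q = [0.0]
    for k in range(1, len(times_s)):
        dt = times_s[k] - times_s[k - 1]
        q.append(q[-1] + 0.5 * (j_a_per_cm2[k] + j_a_per_cm2[k - 1]) * dt)
    return q


if __name__ == "__main__":
    # Hypothetical J_G(t) that decays as charge builds up; not measured data.
    t = [0.1 * k for k in range(3001)]                # 0 to 300 s
    j = [1e-6 * math.exp(-tk / 50.0) for tk in t]     # A/cm^2
    print(f"Q_inj(300 s) = {injected_charge_density(t, j)[-1]:.3e} C/cm^2")
```

For this synthetic input, the numerical result agrees with the analytic integral 5 × 10^-5 (1 - e^-6) C/cm² to better than one part in 10^6; with measured data, the same cumulative array gives the charge-density axis against which ΔVFB can be plotted.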
Fig. 7. Charge retention characteristics for different tunneling oxide thicknesses (normalized VFB (%) vs. time (s) from 1 s to 10000 s; tunneling oxides of 2 nm, 1.45 nm and 1.2 nm).
Fig. 8. (a) Writing and (b) erase characteristics of MOHOS-type memories with different tunneling oxide thicknesses (2 nm, 1.45 nm and 1.2 nm): VFB (V) vs. charge density (C/cm²), obtained by integrating the gate current density over the programming time.
IV. Conclusions
A MOHOS-type Flash memory was proposed, using np-HfO2 embedded in a SiO2 matrix as the charge-trapping layer. Electron trapping in the np-HfO2 causes a flatband voltage shift, which is measured in order to extract the most important performance characteristics of the memory devices. FN tunneling is used as an efficient mechanism of electron injection into the charge-trapping layer. It was also found that after the erase operation not all charge is released from the charge-trapping layer, owing to deep trap levels in np-HfO2. The tunneling oxide thickness is an important parameter for device performance. A thinner tunneling oxide speeds up writing/erasing but decreases the charge retention time. Therefore, considering the high retention of about 80% obtained with the 2 nm tunneling oxide, we think that tunneling oxides of at least 2 nm are needed in order to improve the long-term reliability of these memories.
Acknowledgements
This work was supported by the Mexican Council of Science and Technology (CONACYT) through master's scholarship No. 235292, and by the Council of Science and Technology of Puebla (CONCYTEP) through a scholarship for thesis preparation.
References
[1] Suzuki, E., Hiraishi, H., Ishi, K., and Hayashi, Y. A low voltage alterable EEPROM with silicon-oxide-nitride-oxide-semiconductor (SONOS) structure. IEEE Transactions on Electron Devices, ED-30, 122. 1983.
[2] ITRS 2010 Update Overview. 2010.
[3] Kingon, A. I., Maria, J. P., Streiffer, S. K. Alternative dielectrics to silicon dioxide for memory and logic devices. Nature, vol. 406, no. 6799, pp. 1032-1038, 2000.
[4] Sugizaki, T., Kobayashi, M., Ishidao, M., Minakata, H., Yamaguchi, M., Tamura, Y., Sugiyama, Y., Nakanishi, T., and Tanaka, H. Novel multi-bit SONOS type flash memory using a high-k charge trapping layer. VLSI Symp. Tech. Dig., 27-28. 2003.
[5] V. P. Tolstoy, I. V. Chernyshova, V. A. Skryshevsky. Handbook of infrared spectroscopy of ultrathin films. Wiley & Sons. 1985.
[6] N. B. Colthup, L. H. Daly. Introduction to infrared and Raman spectroscopy. Academic Press. 1982.
[7] R. Perera, A. Ikeda, R. Hattori, Y. Kuroki. Trap assisted leakage current conduction in thin silicon oxynitride films grown by rapid thermal oxidation combined microwave excited plasma nitridation. Elsevier. 2003.
[8] D. K. Schroder. Semiconductor material and device characterization, 3rd ed. Wiley-Interscience. 2006.
[9] Y. N. Tan, W. K. Chim, B. J. Cho, W. K. Choi. Over-erase phenomenon in SONOS-type flash memory and its minimization using a hafnium oxide charge storage layer. IEEE Trans. Electron Dev., vol. 51, no. 7. 2004.
[10] D. S. Golubovic, E. Vianello, A. Arreghini, F. Driussi, M. J. van Duuren, N. Akil, L. Selmi and D. Esseni. Programme and retention characteristics of SONOS memory arrays with layered tunnel barrier. Semicond. Sci. Technol. 23. 2008.
[11] S. M. Sze and K. K. Ng. Physics of semiconductor devices, 3rd ed. Wiley-Interscience. 2007.
ISSN 977-2177-128009
Un algoritmo en tiempo real para etiquetado de

componentes conectados en imgenes
Elisa Calvo
Departamento de Electrnica y Electromagnetismo
Universidad de Sevilla
Sevilla, Espaa
calvo@imse-cnm.csic.es
Piedad Brox, Santiago Snchez-Solano

Instituto de Microelectrnica de Sevilla (IMSE-CNM)
Consejo Superior de Investigaciones Cientficas
Sevilla, Espaa
brox, santiago@imse-cnm.csic.es
ResumenEsta comunicacin presenta un algoritmo de dos

pasadas para el etiquetado en tiempo real de los componentes
conexos en una imagen. El algoritmo propuesto es una buena
opcin frente a otras alternativas de dos y mltiples pasadas ya
que ha sido diseado considerando que su implementacin en
FPGAs ofrezca un buen compromiso entre recursos ocupados y
velocidad de operacin. Se describen dos implementaciones
hardware de este algoritmo, cuyo desarrollo se ha llevado a cabo
following a design flow based on the Xilinx System Generator tool.
I. INTRODUCTION
Computer vision systems increasingly need to be integrated into specific devices and embedded systems (mobile phones, PDAs, sensor networks, etc.). These devices are limited in power consumption and in computing and storage capacity. The applications that use them also require real-time operation. Together, these constraints make it necessary to revise existing algorithms so that their implementation fits the characteristics of the emerging platforms.
This adaptation process is especially important for connected component labeling (CCL) algorithms. These procedures are an intermediate step between low-level and high-level image processing tasks, as illustrated in the generic vision system shown in Fig. 1.
Figure 1. Generic vision system
This work describes the development of a new CCL algorithm for labeling correctly preprocessed images, together with its efficient implementation on FPGAs (in terms of resources and/or speed). The design process followed a methodology based on System Generator, a Xilinx tool integrated in the Matlab/Simulink environment. Because it allows all design stages to be carried out within a single framework, it shortened the development time of the different implementations.
The paper is organized as follows. Section II reviews some basic concepts and the main types of CCL algorithms in the literature. Section III analyzes a specific type of algorithm, the so-called two-pass CCL algorithms, and the problems posed by their implementation. Section IV proposes a new algorithm that reduces the main problem of the previous methods. Section V presents the simulation results obtained with 250 images commonly used in vision applications and compares them with those of a classical algorithm. Section VI describes some implementation details and the design methodology followed. The conclusions are summarized in Section VII.
II. STATE OF THE ART OF CCL ALGORITHMS
CCL algorithms assign a unique identifier to each connected set of pixels that share the same properties within the image. This work focuses on algorithms that start from a binary image (B) and analyze the connectivity between pixels within a neighborhood of four (N4) or eight (N8) neighbors. In the end, two pixels p and q belong to the same component when both are considered part of the foreground or both part of the background, and a path of pixels of the same type exists between them. That is, they belong to the same component when (1) holds, where S is a subset of pixels of the binary image B.
This work has been funded by the following projects: MOBY-DIC FP7-IST248858 (www.mobydic-project.eu) of the European Economic Community; TEC2008-04920 of the Spanish Ministry of Science and Innovation; and P08TIC-03674 of the Junta de Andalucía (with FEDER support).
ISSN 977-2177-128009
$$p \sim q \iff \exists\,\{s_0=p,\ s_1,\dots,s_n=q\}\subseteq S:\ s_k\in N(s_{k-1}),\ k=1,\dots,n \qquad (1)$$
where N(s) is the N4 or N8 neighborhood of pixel s.
Fig. 2 shows an example of image labeling in which the number of components found varies with the connectivity considered.
Because CCL algorithms are so important, considerable effort has gone into developing them since the 1960s. The procedures can be classified by several criteria, such as the regularity of memory accesses or the way the image is represented. A common approach is to represent the image as a two-dimensional array and to classify the algorithms by the number of scans performed over it. Here a scan means visiting all the pixels, regardless of the order followed. Accordingly, the following types of algorithms can be distinguished:
One-pass (one-scan): algorithms that traverse the image only once. Their main drawbacks are irregular, random accesses to the data structures that store the image or the assigned labels, and the difficulty of predicting the execution time of the methods. This group includes the contour-tracing algorithm presented by Chang in 2003 [1] and the raster-scan algorithm presented by Bailey in 2007 [2].
Multiple-pass (multi-scan): these perform several scans of the image, normally alternating between top-to-bottom, left-to-right scans and bottom-to-top, right-to-left scans, and they access memory regularly. The execution time depends on the arrangement of the pixels in each image, so the duration of the method cannot be established a priori. Their software and hardware implementation is simpler than that of the other groups. Notable examples are the works presented by Haralick in 1981 [3] and Suzuki in 2000-2003 [4], [5].
Two-pass (double-scan): methods that perform two scans of the image. Normally the first scan assigns temporary labels, while the second assigns the definitive label to each pixel. They usually access memory regularly and use one or more tables to store the equivalences between different temporary labels assigned to the same component. The alternatives proposed in the literature are characterized by three factors. The first is the data structure used to store these label equivalences, which has evolved from adjacency matrices or n-tuple structures [6] to vector structures [7] and associative tables [8]. The second is the algorithm used to resolve the equivalences, and the third is the moment at which this resolution takes place.
Many works published along this line also seek an optimal solution to the neighbor-exploration problem, in order to minimize the number of memory accesses [8]-[10].
These algorithms are, in general, more difficult to implement in hardware than multi-scan ones. Their execution time is also high and depends on the complexity of the image being processed. With certain memory structures, they may also suffer from label overlaps and losses. However, compared with multi-scan algorithms, they have the clear advantage that the exact duration of the method can be established.
It is important to mention that many of the algorithms published in recent years introduce some kind of parallelism in the data structures and/or in the processing elements. These techniques build on research from the 1970s and 1980s in which algorithms were designed and implemented on massively parallel architectures (e.g. [11]-[16]). They speed up execution at the cost of a larger occupied device area. As a result, larger image sizes cannot be handled, and external memories are sometimes required (e.g. [17]-[20]).
Figure 2. Concept of connectivity when a neighborhood of 4 pixels (a) or 8 pixels (b) is considered. The array of pixels considered in each case (also called mask) is shown in the upper right corner of each example.
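Definition (1) and the effect illustrated in Fig. 2 can be made concrete with a small Python sketch. It counts the foreground components of a binary image by breadth-first flood fill under N4 and N8 connectivity. This is an illustration only and not one of the CCL algorithms reviewed above: flood fill needs random access to the whole image, which is exactly what raster-scan hardware algorithms avoid. Only foreground components are counted, and the test pattern is hypothetical.

```python
from collections import deque

# Neighbour offsets (dy, dx) for the N4 and N8 neighbourhoods.
N4 = ((-1, 0), (1, 0), (0, -1), (0, 1))
N8 = N4 + ((-1, -1), (-1, 1), (1, -1), (1, 1))


def count_components(img, offsets):
    """Count connected foreground components of a binary image (lists of 0/1)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if not img[y][x] or seen[y][x]:
                continue
            count += 1  # unvisited foreground pixel: a new component starts here
            seen[y][x] = True
            queue = deque([(y, x)])
            while queue:  # visit every pixel reachable through a path inside S
                cy, cx = queue.popleft()
                for dy, dx in offsets:
                    ny, nx = cy + dy, cx + dx
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


X_PATTERN = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]
print(count_components(X_PATTERN, N4))  # 5
print(count_components(X_PATTERN, N8))  # 1
```

With 4-connectivity the five pixels are isolated. With 8-connectivity the diagonal contacts join them into a single component, which is the same kind of difference shown in Fig. 2.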
III. TWO-PASS CCL ALGORITHMS
Two-pass raster-scan algorithms are best suited to a hardware implementation in which the image arrives as a continuous stream of pixels. Their memory access pattern is usually regular, and the duration of the procedure is finite and known in advance.
A. Generic algorithm
Algorithms of this type start with a first scan of the image. In this scan, temporary labels are assigned to the pixels and the possible equivalences are identified. The temporary label assigned to pixel (x,y) during the first scan can be expressed as:
$$L_t(x,y)=\begin{cases}0, & B(x,y)\in F_B\\ m, & B(x,y)\in F_O\ \text{and}\ \{L_t(i,j)\mid (i,j)\in M(x,y),\ L_t(i,j)\neq 0\}=\emptyset\\ L, & \text{otherwise}\end{cases}\qquad(2)$$
where:
F_B are the pixels that make up the background of the binary image (black pixels with value '0') and F_O are those of the foreground (white pixels with value '1'),
M is the window that establishes the connectivity criterion between pixels,
m is incremented (m = m+1) each time the condition $\{L_t(i,j)\mid (i,j)\in M(x,y),\ L_t(i,j)\neq 0\}=\emptyset$ holds,
L is the temporary label finally assigned to pixel (x,y). This label has already been assigned to one of the neighboring pixels analyzed. The way its value is determined differs among the proposed algorithms.
The equivalence table is resolved once the first scan is completed, or in total or partial overlap with it. In the second pass, the information in the equivalence table is used to replace the temporary labels with permanent ones.
In terms of resources, unlike multi-scan algorithms, the hardware implementation of an algorithm of this type requires only one memory, used to store the equivalences between labels. The temporary labels assigned to each pixel during the first scan are needed again in the substitution phase, but they can be recomputed during that phase with the same circuitry used in the previous phase [18].
As mentioned when classifying CCL algorithms, there are different proposals for implementing the equivalence memory. Among them, vector structures achieve the best trade-off between memory consumption and processing cost. However, they have one drawback: equivalences may be lost.
The double-scan algorithm described next is a classical implementation that uses this type of memory structure. Its analysis explains the equivalence-loss problem, and this work uses it as the baseline for comparison with the proposed solution. This algorithm, which we will call 'Algorithm 1', has the following characteristics:
Label assigned in the first pass, if the pixel is white and is not a new label:
$$L=\min\{L_t(i,j)\mid (i,j)\in M(x,y),\ L_t(i,j)\neq 0\}\qquad(3)$$
Update of the equivalence table with each equivalence found:
$$T\big(\max\{L_t(i,j)\mid (i,j)\in M(x,y),\ L_t(i,j)\neq 0\}\big)=L\qquad(4)$$
Table resolution mode: after the first scan, the equivalence table is traversed from the first position to the last. For each entry i whose address differs from its equivalence T(i), the entry is updated according to:
$$T(i)=T\big(T(i)\big)\qquad(5)$$
B. Equivalence loss problem and its evaluation
Equivalence losses occur when memory positions of the equivalence table are overwritten. In this method, considering 4-connectivity, two types of losses can be distinguished:
Simple: these occur when two equivalent pairs share one element and the shared element is the highest of the three labels. They can occur within one row or between different rows. Several equivalences between labels can be linked into a chain, so these losses cannot be avoided in a simple way.
Multiple: these occur when more than two equivalent pairs share one element and the shared element is the highest of the labels involved.
Fig. 3 illustrates cases of simple losses (within one row (a) or between rows (b)) and multiple losses.
To analyze in more depth how often equivalence losses occur, batteries of pattern images were created. These contain successive concavities, convexities and staircases, with different orders of appearance of the assigned labels. The simulations carried out with these images show that the percentage of cases with losses is very high. Specifically, losses occur in 66.7% of the cases with patterns of concavities in the same row (Fig. 4 (a)) and in 66.7% of the cases with increasing staircase patterns (Fig. 4 (b)), and with patterns
with convexities sharing a column (Fig. 4 (c)), losses occur in 50% of the cases.
Figure 3. Examples of simple and multiple losses. (a) Image. (b) Temporary labeling. (c) Evolution of the equivalence table during the initial scan (left: initial, right: final). In red, the changes in each cycle. In blue, the cases in which equivalences between labels are overwritten. (d) Final labeling
Figure 4. Examples of patterns analyzed in the study of equivalence losses. (a) Successive concavities in a row. (b) Increasing staircase. (c) Successive convexities in a column
IV. PROPOSED ALGORITHM
A modification of Algorithm 1 is proposed to reduce equivalence losses while keeping the advantages of this type of two-pass algorithm. The label assigned in the first pass (if the pixel is white and is not a new label) and the update of the equivalence table are computed as:
$$L=\min\{T(L_t(i,j))\mid (i,j)\in M(x,y),\ L_t(i,j)\neq 0\}\qquad(6)$$
$$T(L_t(i,j))=L\quad \forall\,(i,j)\in M(x,y):\ L_t(i,j)\neq 0\qquad(7)$$
That is, instead of the minimum label among the neighbors, the pixel receives the minimum of the equivalences of the neighboring labels, as proposed in [5]. The table entries of the labels in the neighborhood are also updated with this same value. The equivalence table is resolved in the same way as in Algorithm 1.
The output of this algorithm was analyzed with the different images of the pattern batteries. This method eliminates multiple losses and simple losses within the same row, and it also reduces simple losses between rows (the error percentage drops from 66.7% to 33%). The reason is that it avoids the losses that occur when, at the time of overwriting, the equivalence of the shared element is smaller than that of the other element of the second equivalent pair. This holds both when that value has been modified directly and when it has been modified through a chain with other pairs.
V. SIMULATION RESULTS WITH REAL IMAGES
A. Images
To evaluate the quality and applicability of the proposed method, a series of simulations was carried out in the Matlab environment with several groups of real images. The images were taken from the USC-SIPI [21] and Berkeley Computer Vision Group [22] databases. They were selected to cover the widest possible range of subjects, in order to check the applicability of the method in different fields: environment (animals, landscapes), security (people), medicine (cells), etc. They were also selected to cover a range of difficulty, in order to verify the quality of the labeling; for this purpose, texture images and aerial images were included.
Several preprocessing operations were needed before the algorithms could be applied to these images: color (RGB model) to grayscale conversion, thresholding with Otsu's method, and edge dilation (Fig. 5).
Figure 5. Example of the preprocessing operations applied to the images used as input
B. Measurements
The quality of the algorithm was measured by computing the following values:
Absolute error (ErrorA): difference in the number of connected components found with respect to the results provided by Matlab's 'bwlabel' function.
Relative error (ErrorR): absolute error in each image divided by the total number of components labeled in each case.
C. Results
The results are summarized in Table I. The percentage of images with labeling errors dropped from 55-75% with 'Algorithm 1' to 6-10% with the proposed algorithm. In the worst case, the mean error is 14 labels for the proposed algorithm versus 120 labels for 'Algorithm 1'. The maximum relative error of the proposed algorithm is 1.07%, in texture images. Together, these figures show that the quality of the proposed algorithm is good and better than that of a typical algorithm of this type. Moreover, the images with the most errors were deliberately selected for their difficulty, and labeling algorithms are not normally applied to such images. The images that usually serve as input to these algorithms have simpler arrangements, and with them the proposed algorithm makes no errors.
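The two table-update rules can be compared directly in a minimal Python sketch of the whole two-pass scheme. It covers the first pass, the resolution rule (5) and the second pass, and a switch selects between Algorithm 1 and the proposed rule. Algorithm 1 assigns the minimum neighboring label and writes it into the entry of the maximum one. The proposed rule assigns the minimum of the neighbors' equivalences and writes it into every neighbor entry.

The following points are assumptions, not the authors' Matlab or System Generator code:
- only 4-connectivity is used, with a mask formed by the left and upper neighbors;
- table entries start as T(m) = m;
- Algorithm 1 updates the table only when the two neighbors carry different labels;
- a flood fill stands in for Matlab's bwlabel.

The 3x4 test pattern is hypothetical. Label 3 is first merged with label 1 and, one row later, with label 2. At the second overwrite, the equivalence of the shared label is smaller than that of the other element, which is the case described above as avoided by the proposed rule.

```python
from collections import deque


def two_pass_labels(img, proposed=False):
    """Two-pass CCL with a vector equivalence table and 4-connectivity.

    The mask holds the left and upper neighbours. proposed=False mimics Algorithm 1;
    proposed=True uses the minimum of the neighbours' equivalences.
    """
    h, w = len(img), len(img[0])
    lt = [[0] * w for _ in range(h)]  # temporary labels L_t
    table = [0]                       # table[l] = T(l); entry 0 is the background
    for y in range(h):                # first pass (raster scan)
        for x in range(w):
            if not img[y][x]:
                continue
            left = lt[y][x - 1] if x > 0 else 0
            up = lt[y - 1][x] if y > 0 else 0
            neigh = [l for l in (left, up) if l != 0]
            if not neigh:
                table.append(len(table))  # new label m, stored with T(m) = m
                lt[y][x] = len(table) - 1
            elif proposed:
                label = min(table[l] for l in neigh)  # minimum of the equivalences
                for l in neigh:
                    table[l] = label                  # written into every neighbour entry
                lt[y][x] = label
            else:
                lt[y][x] = min(neigh)                 # minimum neighbouring label
                if len(set(neigh)) > 1:               # an equivalence was found
                    table[max(neigh)] = min(neigh)    # may overwrite an older entry
    for i in range(1, len(table)):    # resolution (5): T(i) <= i, so T(T(i)) is final
        if table[i] != i:
            table[i] = table[table[i]]
    return [[table[l] for l in row] for row in lt]  # second pass


def reference_count(img):
    """4-connected foreground components by flood fill (stand-in for bwlabel)."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                count += 1
                seen[y][x] = True
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count


def error_measures(img, proposed):
    """ErrorA and ErrorR (in %) for one image."""
    labels = two_pass_labels(img, proposed)
    found = len({l for row in labels for l in row if l})
    error_a = abs(found - reference_count(img))
    error_r = 100.0 * error_a / found if found else 0.0
    return error_a, error_r


# Hypothetical pattern: a single 4-connected component in which label 3 is first
# merged with label 1 and, one row later, with label 2.
IMG = [
    [0, 0, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 1, 0],
]
print(error_measures(IMG, proposed=False))  # (1, 50.0)
print(error_measures(IMG, proposed=True))   # (0, 0.0)
```

In the Algorithm 1 run, the entry T(3) = 1 is overwritten by T(3) = 2. The link between labels 1 and 2 disappears, so one component is reported twice. The proposed rule instead writes the minimum equivalence (1) into both neighbor entries, and the resolution pass collapses everything to a single label.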
TABLE I.
ANALYSIS OF THE QUALITY OF THE PROPOSED ALGORITHM: MEAN ERROR VALUES (ErrorR in %)
N(p)
Images
Textures
Aerial
Miscellaneous
Miscellaneous with edge-detection preprocessing
Textures
Aerial
Miscellaneous
Miscellaneous with edge-detection preprocessing
Algorithm 1
Proposed algorithm
ErrorA
ErrorR
ErrorA
ErrorR
119.22
76.11
19.33
8.89
3.46
2.83
13.88
11.55
3.6
1.07
0.5
0.3
0.9
7.4
85.22
88
25.8
13.84
7.99
6.34
1.55
1.88
0.73
0.18
0.14
0.2
6.74
47.83
Figure 6. Tools used in the design flow
VI. IMPLEMENTATION OF THE PROPOSED ALGORITHM
A. Design tools and methodology
The process followed and the tools used in it are shown in Fig. 6. The starting point was a software implementation in Matlab (.m file), which was translated into a Simulink model (.mdl). This model combines blocks from the basic Simulink library and from the Xilinx Blockset. It was verified at the logic and functional levels from Matlab and then compiled, which generated the files of an ISE project. Area and timing estimates and other test operations were carried out from the ISE environment. Finally, a HW/SW co-simulation was performed to check that the actual operation of the system is as intended, closing the loop of the design flow.
B. Implementation
Two FPGA implementations of the proposed algorithm were developed, both using connectivity in a 4-neighbor neighborhood. One of them reduces the resources used in the device at the cost of using two cycles to process each pixel in each phase. The other minimizes the number of cycles spent on each pixel by using more memory.
In both designs, the initial phase of the method uses a dual-port memory as a buffer, which stores the temporary labels assigned to the pixels of two consecutive rows. When each pixel is labeled, the labels of the mask pixels in the previous row are therefore available, since they were stored earlier (in Fig. 2 (b), pixels p, q and r when labeling x). The buffer also has enough space to store the label that has just been assigned. The two operations use different memory ports: one port reads r, which provides the value for pixel q in each cycle, and the other writes x. The labels of the pixels in the neighborhood of pixel x in turn serve as address inputs of a memory, the equivalence table. This table provides, at each instant, the values from which the minimum to be stored is computed. In addition, each pixel requires a check for an equivalence between the neighboring labels. If one exists, it is stored through a second port of the equivalence memory.
The designs optimize the occupied area by replacing a memory of temporary labels with this buffer. As a consequence, the image must be read twice, and the label buffer must be initialized between the initial phase and the substitution phase so that no labeling errors occur when the hardware blocks are reused. The substitution phase works in the same way as the initial phase, except that the equivalence table is not modified, and the blocks used in one phase are available in the other.
The implementation of the resolution phase differs between the two designs. In the design that spends two cycles on each table entry, one port always reads the equivalent of a label, that is, the value of that table entry. The other port alternates between two accesses. In one cycle it writes the new value of the entry preceding the one being considered. In the other cycle it reads the equivalent of the equivalent obtained from the first port. The design that spends only one cycle on each pixel needs two identical equivalence memories. With them, the same cycle can access both the equivalent of a label, $T(Label_i)$, and the equivalent of the equivalent of the previous label, $T(T(Label_{i-1}))$. The latter is stored through the other port of the memories in the entry corresponding to $Label_{i-1}$.
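A small software model helps to reason about the cost of the resolution phase and about the real-time claims of the two designs. The sketch below applies the resolution rule (5) in one ascending pass and then gives a rough frames-per-second estimate. The cycle model is an assumption and does not describe the actual hardware schedule. It assumes that the initial and substitution phases each visit every pixel once and that the resolution phase visits n_labels table entries. Each visit costs one cycle in the one-cycle design and two in the two-cycle design, and buffer initialization and other overheads are ignored. The 32 MHz clock used in the example is the maximum frequency reported in the implementation results. The function names are illustrative.

```python
def resolve_table(table):
    """Resolution rule (5): one ascending pass; since T(i) <= i, T(T(i)) is already final."""
    for i in range(1, len(table)):
        if table[i] != i:
            table[i] = table[table[i]]
    return table


def frames_per_second(width, height, cycles_per_item, f_clk_hz, n_labels=0):
    """Rough throughput under the assumed model: two pixel passes plus one table pass,
    each item costing cycles_per_item clock cycles; other overheads are ignored."""
    cycles = cycles_per_item * (2 * width * height + n_labels)
    return f_clk_hz / cycles


print(resolve_table([0, 1, 1, 2, 3]))                  # [0, 1, 1, 1, 1]
print(round(frames_per_second(640, 480, 2, 32e6), 2))  # 26.04
print(round(frames_per_second(330, 240, 1, 32e6), 1))  # 202.0
```

Under these assumptions, the two-cycle design processes VGA frames at about 26 frames per second before the table pass is counted. This is consistent with its being speed-limited around VGA. The one-cycle design is far from speed-limited at 330x240, where available block RAM, not clock cycles, sets the bound. Every table entry visited in the resolution phase lowers these figures.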
Since the memories are resources shared by the different phases of the procedure, they must be accessed through multiplexer banks controlled by phase signals.
C. Implementation results
The designs were synthesized for a SPARTAN 3A DSP (XC3SD1800A) board. Without external memory and with a maximum-size equivalence table, the largest image that can be synthesized is 330x240 for the one-cycle design and 400x370 for the two-cycle design. However, in most cases the number of labels assigned in the first pass does not exceed 20% of the labels that can be assigned. With an equivalence table of 30% of the maximum size, images with resolutions up to 500x460 and 670x600, respectively, could be synthesized. In both cases, the resource that limits the maximum image size is the dual-port block RAM with 16 Kb of data. The utilization of the other resources is very low; for example, only 3% of the available "Slice Flip Flops" and "4 input LUTs" of the device are used. The maximum clock frequency obtained is approximately 32 MHz. With these figures, the first design is estimated to work in real time at the maximum synthesizable image size. In terms of existing video standards, this means it can handle the WCIF standard, with the available RAM preventing larger dimensions. The second design, in contrast, is limited by the achievable operating speed. With it, real-time operation is possible at resolutions of up to 640x480 (VGA).
VII. CONCLUSIONS
The requirements of the new platforms used by vision systems make it necessary to redesign the image processing algorithms that have traditionally run in software on general-purpose processors. This paper proposes a new CCL algorithm that was designed with its hardware implementation in mind. It is therefore a good option for integration in embedded devices, achieving an adequate trade-off between resource consumption, operating speed and labeling quality. In addition, its design followed a novel methodology that shortened development times and made it easier to carry out integration tests of the algorithm in complex systems.
REFERENCES
[1] F. Chang, C. Chen, "A component-labeling algorithm using contour tracing technique", Proc. Int. Conf. Document Anal. Recog., pp. 741-745, 2003.
[2] D. G. Bailey, C. T. Johnston, "Single Pass Connected Components Analysis", Proceedings of Image and Vision Computing New Zealand, pp. 282-287, 2007.
[3] R. M. Haralick, "Some neighborhood operations", in M. Onoe, K. Preston Jr., A. Rosenfeld (eds.), "Real Time Parallel Computer Image Analysis", New York: Plenum Press, pp. 11-35, 1981.
[4] K. Suzuki, I. Horiba, N. Sugie, "Fast connected-component labeling based on sequential local operations in the course of forward raster scan followed by backward raster scan", Proc. 15th International Conference on Pattern Recognition, Barcelona, Spain, pp. 434-437, Sept. 3-7, 2000.
[5] K. Suzuki, I. Horiba, N. Sugie, "Linear-time connected-component labeling based on sequential local operations", Computer Vision and Image Understanding 89, pp. 1-23, 2003.
[6] A. Rosenfeld, J. L. Pfaltz, "Sequential operations in digital picture processing", Journal of the ACM, vol. 13, no. 4, pp. 471-494, 1966.
[7] R. Lumia, L. Shapiro, O. Zuniga, "A new connected components algorithm for virtual memory computers", Comput. Vision, Graphics, and Image Process. 22 (2), pp. 287-300, 1983.
[8] L. He, Y. Chao, K. Suzuki, "A run-based two-scan labeling algorithm", IEEE Trans. Image Process., vol. 17, no. 5, pp. 749-756, 2008.
[9] K. Wu, E. Otoo, A. Shoshani, "Optimizing connected component labeling algorithms", Proc. SPIE Conf. Med. Imag., vol. 5747, pp. 1965-1976, 2005.
[10] K. Wu, E. Otoo, K. Suzuki, "Optimizing two-pass connected-component labeling algorithms", Pattern Anal. Applic. 12, pp. 117-135, 2009.
[11] R. Miller, Q. F. Stout, "Varying diameter and problem size on mesh-connected computer", Proc. Int. Conf. Parallel Processing, pp. 679-699, 1985.
[12] D. Nassimi, S. Sahni, "Finding connected components and connected ones on a mesh-connected parallel computer", SIAM J. Comput., vol. 9, no. 4, pp. 744-757, 1980.
[13] A. Agrawal, L. Nekludova, W. Lim, "A parallel O(log(N)) algorithm for finding connected components in planar images", Proc. Int. Conf. Parallel Processing, pp. 783-786, 1987.
[14] R. Cypher, J. L. C. Sanz, L. Snyder, "Algorithms for image component labeling on SIMD mesh connected computers", IEEE Trans. Comput., vol. 39, no. 2, pp. 276-281, 1990.
[15] H. M. Alnuweiri, V. K. Prasanna, "Fast image labeling using local operators on mesh-connected computers", IEEE Trans. Pattern Anal. Machine Intell., vol. 13, no. 2, pp. 202-207, 1991.
[16] H. Shi, G. X. Ritter, "O(n)-Time and O(log n)-Space Image Component Labeling with Local Operators on SIMD Mesh Connected Computers", International Conference on Parallel Processing, pp. 98-101, 1993.
[17] S.-W. Yang et al., "VLSI architecture design for a fast parallel label assignment in binary image", IEEE International Symposium on Circuits and Systems (ISCAS 2005), vol. 3, pp. 2393-2396, 2005.
[18] H. Flatt et al., "A Parallel Hardware Architecture for Connected Component Labeling Based on Fast Label Merging", International Conference on Application-Specific Systems, Architectures and Processors, pp. 144-149, 2008.
[19] S.-W. Yang et al., "Parallel 3-Pixel Labeling Method and its Hardware Architecture Design", Fifth International Conference on Information Assurance and Security, 2009.
[20] D. K. Kim et al., "Real-Time Component Labeling and Boundary Tracing System Based on FPGA", Proceedings of the 2007 IEEE International Conference on Robotics and Biomimetics, Sanya, China, pp. 15-18, 2007.
[21] USC-SIPI (University of Southern California, Signal and Image Processing Institute). [Online] http://sipi.usc.edu/database/.
[22] Berkeley Computer Vision Group (University of California). [Online] http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/resources.html.
Behavior of Lyapunov Function for Different Strategies of the Circuit Optimization Problem
Alexander Zemliak, Antonio Michua
Tatiana Markina
Department of Physics and Mathematics
Autonomous University of Puebla
Puebla, Mexico
Institute of Physics and Technology
National Technical University of Ukraine
Kiev, Ukraine
Abstract: The design problem for analog networks has been formulated as a functional minimization problem of optimal control theory. The optimal sequence of switch points of the control vector was identified as a principal characteristic of the minimal-time system design algorithm. The concept of the Lyapunov function was proposed to analyze the behavior of the design process. A special function combining the Lyapunov function and its time derivative was proposed to predict the design time of any strategy from an analysis of the initial time interval of the network optimization process. Parallel computing is used to compare the different optimization strategies in real time and to select the best one, that is, the strategy with the minimal computer time. This approach makes it possible to select a quasi-optimal network optimization strategy by analyzing only the initial part of the total design process.
I. INTRODUCTION
Reducing the computer time needed to design large systems is one of the essential problems in improving overall design quality. Besides the traditional sparse-matrix and decomposition techniques [1-2], other ways of reducing the total computer design time have been proposed [3-4]. These ideas can be called the traditional approach or the traditional strategy, because their method of analysis is based on Kirchhoff's laws.
A different formulation of the circuit optimization problem was developed at a heuristic level some decades ago [5]. It was based on ignoring Kirchhoff's laws for the whole circuit or for part of it. A special cost function is minimized instead of solving the circuit equations. This idea was developed in practice for microwave circuit optimization [6] and for the synthesis of high-performance analog circuits [7], in the extreme case in which the total system model was eliminated.
The generalized approach to analog system design based on a control theory formulation was elaborated in previous works [8-9]. This approach serves to define the minimal-time design algorithm. It also makes it possible to analyze the design process very clearly as it moves along its trajectory in the design space. The main concept of this theory is the introduction of special control functions. On the one hand, these functions generalize the design process. On the other hand, they make it possible to steer the design process so that the optimum of the design cost function is reached in the minimal computer time. This is possible because the theory admits a practically infinite number of different design strategies. The different design strategies have different numbers of operations and different computer times. Within this concept, the traditional design strategy is only one representative of the enormous set of design strategies. As shown in [8], the potential computer time gain obtainable with the new problem formulation increases with the size and complexity of the system. However, this gain can be realized only by an optimal or quasi-optimal algorithm.
Defining the main properties of the quasi-optimal design strategy is therefore one of the first problems that must be solved to construct the optimal algorithm.
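The idea of several design strategies reaching the same optimum can be illustrated with a toy gradient-method example written in Python. Everything in it is hypothetical: the circuit has one independent variable x1 and one dependent variable x2, the model equation is g(X) = x2 - x1/2 = 0, and the cost function is C(X) = (x2 - 1)^2. With the control function u = 0, the model equation is solved at every step, as in the traditional strategy. With u = 1, x2 is updated like an independent variable and g^2 is added to the cost as a penalty. The sketch records V = F^2 along the trajectory. This is one admissible Lyapunov-type function for a non-negative cost that vanishes at the optimum, chosen here as an assumption rather than taken from the text.

```python
def design(u, steps=300, t=0.2):
    """Gradient-method design of a hypothetical toy circuit with one independent
    variable x1 and one dependent variable x2.

    Toy model equation g(X) = x2 - x1/2 = 0 and cost C(X) = (x2 - 1)**2.
    u = 0: the model equation is solved at every step (traditional strategy).
    u = 1: x2 is updated like an independent variable and g**2 is added to F.
    Returns the final point and the history of V = F**2.
    """
    x1, x2 = 0.0, 0.0
    history = []
    for _ in range(steps):
        if u == 0:
            x2 = x1 / 2                  # enforce g(X) = 0
            f = (x2 - 1) ** 2            # F = C when u = 0
            x1 -= t * (x2 - 1)           # complete derivative dF/dx1 = 2 (x2 - 1) * 1/2
        else:
            g = x2 - x1 / 2
            f = (x2 - 1) ** 2 + g ** 2   # F = C + penalty when u = 1
            # Step against the partial derivatives dF/dx1 = -g, dF/dx2 = 2 (x2 - 1) + 2 g.
            x1, x2 = x1 + t * g, x2 - t * (2 * (x2 - 1) + 2 * g)
        history.append(f ** 2)
    if u == 0:
        x2 = x1 / 2
    return (x1, x2), history


for u in (0, 1):
    (x1, x2), hist = design(u)
    print(u, round(x1, 4), round(x2, 4))  # both strategies end near x1 = 2, x2 = 1
```

Both strategies converge to the same point but follow different trajectories, and their V histories decay at different rates. Comparing how many steps each needs to push V below a tolerance is only a crude stand-in for the computer-time comparison between strategies discussed in the text.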
This work was supported by the Autonomous University of Puebla under project VIEP ZEEA-EXC12-G.
II. PROBLEM FORMULATION
The design process for any analog system can be defined in discrete form [8] as the problem of minimizing the generalized cost function $F(X,U)$ by means of equation (1) with the constraints (2):
$$X^{s+1}=X^{s}+t_s H^{s}\qquad(1)$$
$$(1-u_j)\,g_j(X)=0,\qquad j=1,2,\dots,M\qquad(2)$$
where $X\in R^N$, $X=(X',X'')$, $X'\in R^K$ is the vector of independent variables and $X''\in R^M$ is the vector of dependent variables ($N=K+M$), $g_j(X)$ for all $j$ represents the system model, $s$ is the iteration number, $t_s\in R^1$ is the iteration parameter, $H\equiv H(X,U)$ is the direction
of decrease of the generalized cost function $F(X,U)$, and $U$ is the vector of the special control functions $U=(u_1,u_2,\dots,u_m)$, where $u_j\in\Omega$, $\Omega=\{0;1\}$. The generalized cost function $F(X,U)$ is defined as:
$$F(X,U)=C(X)+\psi(X,U)\qquad(3)$$
where $C(X)$ is the non-negative cost function of the design process, and $\psi(X,U)$ is the additional penalty function:
$$\psi(X,U)=\sum_{j=1}^{M}u_j\,g_j^2(X)\qquad(4)$$
This formulation allows the computer time to be redistributed between the solution of problem (2) and the optimization procedure (1) for the function $F(X,U)$. The control vector $U$ is the main tool for this redistribution. Because the vector $U$ depends on the current step of the optimization procedure, a practically infinite number of different design strategies is produced. The search for the optimal design strategy is now formulated as a typical functional minimization problem of control theory. The functional to be minimized is the total CPU time $T$ of the design process. This functional depends directly on the number of operations and on the design strategy that has been realized. The main difficulty of this definition is that the optimal dependencies of all control functions $u_j$ are unknown.
The continuous form of the problem definition is better suited to the application of control theory. This form replaces Eq. (1) and can be written as:
$$\frac{dx_i}{dt}=f_i(X,U),\qquad i=1,2,\dots,N\qquad(5)$$
This system, together with equations (2), (3) and (4), constitutes the continuous form of the design process. The structural basis of the different design strategies that correspond to a fixed control vector $U$ includes $2^M$ design strategies. For the gradient method, for example, the functions on the right-hand side of system (5) can be determined as:
$$f_i(X,U)=-\frac{\delta}{\delta x_i}F(X,U),\qquad i=1,2,\dots,K$$
where the operator $\delta/\delta x$ denotes the complete derivative of the corresponding function.
Formulating the system design problem as a functional minimization problem of control theory does not depend on the optimization method, and it can be embedded into any optimization procedure.
The analog system design process is now formulated as a controllable dynamic system. In this case, the time-optimal design process can be defined as the dynamic system with the minimal transition time.
III. LYAPUNOV FUNCTION
The main problem in constructing the time-optimal algorithm is that the optimal sequence of switch points during the design process is unknown. A special criterion is needed that makes it possible to realize the optimal or quasi-optimal algorithm by searching for the optimal switch points. On the other hand, the Lyapunov function of a dynamic system is a very informative object for any system analysis in control theory. We suppose that the Lyapunov function can be used to reveal the structure of the optimal algorithm. First of all, the behavior of different design strategies can be compared by analyzing their Lyapunov functions.
The Lyapunov function is not unique, so there is freedom in choosing it. Let us define the Lyapunov function of the design process (2)-(6) by the following expressions:
$$V(X,U)=[F(X,U)]^{2}\qquad(7)$$
$$V(X,U)=\sum_i\left(\frac{\delta F(X,U)}{\delta x_i}\right)^{2}\qquad(8)$$
where $F(X,U)$ is the generalized cost function of the design process. Formula (7) can be used when the generalized cost function is non-negative and has zero value at the stationary point. Formula (8) can always be used.
The problem of constructing the time-optimal design algorithm can now be formulated as the search for the transition process with the minimal transition time. There is a well-known idea [10-11] of minimizing the time of the transition process by a special choice of the right-hand side of the principal system of equations; in our case, these are the functions $f_i(X,U)$. It is necessary to
fi (X ,U ) =
)]r
change the functions f i ( X ,U ) by means of the control

vector U selection to obtain the maximum speed of the
Lyapunov function decreasing (the maximum absolute value
(6)
(1 uiK ) { xs + ( X )} (6')
f i ( X ,U ) = uiK
F( X ,U ) +
i
i
ts
xi
of the time derivative of Lyapunov function V = dV / dt ).

Normally the time derivative of Lyapunov function is nonpositive for the stable processes. However we define more
informative function as a relatively time derivative of the
i = K + 1, K + 2 , ... , N ,
Lyapunov function: W = V/ V .
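The switching mechanism of (6) and (6') can be illustrated with a minimal Python sketch. It assumes a hypothetical toy problem that is not taken from the paper: one independent parameter x1 and one dependent variable x2 tied by the model g(X) = x2 - 2*x1 = 0, a cost C(X) = (x2 - m)^2 with a made-up target m = 3, and central finite differences standing in for the complete derivative. With u = 0 the dependent variable is recomputed from the model at every step, as (6') prescribes; with u = 1 the model equation moves into the penalty (4) and x2 is updated by the gradient like any other parameter. The Lyapunov function is taken as (7) with r = 1, and W is approximated by the relative change of V per step.

```python
# Toy sketch of the discrete design process (1) with f_i from (6) and (6').
# Hypothetical problem: one independent parameter x1 (K = 1) and one dependent
# variable x2 (N = 2, M = 1) linked by the model g(X) = x2 - 2*x1 = 0.
TARGET = 3.0  # hypothetical predefined value m for x2


def eta(x):
    """Model (2) solved for the dependent variable: x2 = 2*x1."""
    return 2.0 * x[0]


def F(x, u):
    """Generalized cost (3): C(X) + psi(X,U), with psi = u * g(X)**2 as in (4)."""
    g = x[1] - eta(x)
    return (x[1] - TARGET) ** 2 + u * g ** 2


def dF(x, u, i, h=1e-6):
    """Central-difference dF/dx_i; for u = 0 the dependent variable is re-solved
    from the model, so the estimate is the complete derivative."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    if u == 0 and i == 0:
        xp[1], xm[1] = eta(xp), eta(xm)
    return (F(xp, u) - F(xm, u)) / (2 * h)


def step(x, u, t):
    """One step x^{s+1} = x^s + t * f_i(X, U)."""
    x1 = x[0] - t * dF(x, u, 0)                    # (6), i <= K
    if u == 1:
        x2 = x[1] - t * dF(x, u, 1)                # (6') with u = 1
    else:
        x2 = x[1] + t * (-x[1] + eta(x)) / t       # (6') with u = 0, i.e. x2 = eta(X)
    return [x1, x2]


def run(u, t=0.05, tol=1e-10, max_iter=10_000):
    """Iterate until F < tol; return final X and the history of V = F (r = 1 in (7))."""
    x = [1.0, 2.0]                                 # consistent start: x2 = eta(x)
    history = [F(x, u)]
    while history[-1] >= tol and len(history) <= max_iter:
        x = step(x, u, t)
        history.append(F(x, u))
    return x, history


def relative_derivative(history, t):
    """Discrete W = (dV/dt)/V, with the step t playing the role of dt."""
    return [(b - a) / (t * a) for a, b in zip(history, history[1:])]


for u in (0, 1):
    x, v = run(u)
    w = relative_derivative(v, 0.05)
    print(f"u={u}: x={[round(c, 4) for c in x]}, steps={len(v) - 1}, min W={min(w):.2f}")
```

Both strategies reach the same solution. On this trivial model the per-step cost of solving the model is negligible, so the sketch shows only the mechanics of switching, not the CPU-time trade-off studied in the paper.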
IV. ANALYSIS OF DIFFERENT STRATEGIES

All examples have been analyzed using the continuous form (5) of the optimization procedure. The Lyapunov function V(t) and several functions derived from V(t) were the main objects of the analysis. The behavior of all these functions has been analyzed for all strategies that compose the structural basis of the general design methodology. Some special functions must be analyzed to establish a rigorous correlation between the CPU time and the properties of the Lyapunov function. The cost function $C(X)$ has been determined as the sum of the squared differences between the predefined values and the current values of the nodal voltages at some nodes. All results were obtained by parallel computing of the different design strategies.

The three-node network is shown in Fig. 1.

Figure 1. Three-node nonlinear passive network.

The nonlinear elements have been defined by the following dependencies: $y_{n1} = a_{n1} + b_{n1}(V_1 - V_2)^2$, $y_{n2} = a_{n2} + b_{n2}(V_2 - V_3)^2$.

The vector $X$ includes seven components: $x_1^2 = y_1$, $x_2^2 = y_2$, $x_3^2 = y_3$, $x_4^2 = y_4$, $x_5 = V_1$, $x_6 = V_2$, $x_7 = V_3$. The model of this network (2) includes three equations (M = 3), and the optimization procedure (1) includes seven equations. This network is characterized by three dependent parameters, and the control vector includes three control functions: $U = (u_1, u_2, u_3)$. The cost function has been determined as $C(X) = (x_7 - m_1)^2 + (x_6 - m_2)^2$, where $m_1$ and $m_2$ are the predefined voltages of the circuit. The initial approximation is $x_1^0 = 1$ and $x_i^0 = 2$ for $i = 2, 3, \ldots, 7$. The results of the design process for the complete structural basis of design strategies are presented in Table I. The behavior of the function W(t) for all strategies of this table is presented in Fig. 2.

TABLE I. DATA OF THE COMPLETE STRUCTURAL BASIS OF DESIGN STRATEGIES FOR THE NETWORK IN FIG. 1.

N | Control vector | Iterations number | Total design time (sec)
1 | (000) | 198989 | 10.61
2 | (001) | 586750 | 10.71
3 | (010) | 272611 | 5.87
4 | (011) | 541099 | 6.11
5 | (100) | 118901 | 2.64
6 | (101) | 278663 | 4.72
7 | (110) | 198162 | 3.35
8 | (111) | 274751 | 2.14

The function W(t) is a normalized derivative, and for this reason it is very sensitive. Its behavior for the various strategies is non-monotonic, and the curves belonging to different strategies intersect, as can be seen in Fig. 2. This complicates the identification of the best and the worst strategies: one strategy can be the best on one time interval and another strategy the best on a different interval. We can assume that the area under the curve -W(t) may be the best predictor of the CPU time, since what matters is the behavior of this function over a time range rather than at a specific point. It therefore makes sense to introduce a new function, defined by the integral of W(t), which serves as a criterion for analyzing the dynamic properties of the Lyapunov function:

$$S(t) = \int_0^t W(\tau)\,d\tau = \int_0^t \frac{1}{V}\frac{dV}{d\tau}\,d\tau = \int_{V(0)}^{V(t)} \frac{dV}{V} = \ln\frac{V(t)}{V(0)} \qquad (9)$$

The behavior of the function S(t) during the design process is also shown in Fig. 2. The curves W(t) intersect, but the curves S(t) do not. All curves of S(t) are well ordered, both in design time and in the absolute value of the function. There is a correlation between S(t) and the computer time: the strategy that requires less computer time also has a greater absolute value of S(t) at any moment. We can state that there is a strong correlation between the behavior of S(t) during the design process and the total CPU time of the design.

Figure 2. Behavior of the functions W(t) and S(t) for all strategies of the structural basis during the design process.
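Equation (9) means that S(t) does not have to be obtained by numerically integrating the noisy W(t): it follows directly from the ratio V(t)/V(0). The Python sketch below, an illustration under assumed inputs rather than the author's code, computes W(t) by finite differences and S(t) from (9) for a sampled V(t), lets one check that the trapezoidal integral of W(t) agrees with (9), and orders strategies by how far S(t) has fallen at a given instant. The two exponential Lyapunov curves in the usage lines are synthetic.

```python
import math


def w_and_s(times, v):
    """From samples V(t_k) > 0, return W(t_k) ~= (dV/dt)/V and S(t_k) from (9).

    W uses central differences (one-sided at the two ends);
    S(t_k) = ln(V(t_k) / V(0)) needs no numerical integration.
    """
    n = len(v)
    w = []
    for k in range(n):
        lo, hi = max(k - 1, 0), min(k + 1, n - 1)
        dv_dt = (v[hi] - v[lo]) / (times[hi] - times[lo])
        w.append(dv_dt / v[k])
    s = [math.log(vk / v[0]) for vk in v]
    return w, s


def cumulative_trapezoid(times, w):
    """Running integral of W from 0 to t_k; approximates S(t_k)."""
    acc = [0.0]
    for k in range(1, len(times)):
        acc.append(acc[-1] + 0.5 * (w[k] + w[k - 1]) * (times[k] - times[k - 1]))
    return acc


def rank_by_s(s_curves, k):
    """Order strategies by S(t_k), most negative first (largest drop of V)."""
    return sorted(s_curves, key=lambda name: s_curves[name][k])


# Usage with two synthetic Lyapunov curves, V_a = exp(-2t) and V_b = exp(-t).
times = [0.001 * k for k in range(1001)]
w_a, s_a = w_and_s(times, [math.exp(-2 * t) for t in times])
w_b, s_b = w_and_s(times, [math.exp(-t) for t in times])
print(round(s_a[-1], 6), round(cumulative_trapezoid(times, w_a)[-1], 4))
print(rank_by_s({"a": s_a, "b": s_b}, -1))   # ['a', 'b']
```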
The next example is the three-stage transistor amplifier shown in Fig. 3.

Figure 3. Three-stage transistor amplifier.

The vector $X$ includes fourteen components: $x_1^2 = y_1$, $x_2^2 = y_2$, $x_3^2 = y_3$, $x_4^2 = y_4$, $x_5^2 = y_5$, $x_6^2 = y_6$, $x_7^2 = y_7$, $x_8 = V_1$, $x_9 = V_2$, $x_{10} = V_3$, $x_{11} = V_4$, $x_{12} = V_5$, $x_{13} = V_6$, $x_{14} = V_7$. The model of this network (2) includes seven equations (M = 7), and the optimization procedure (5) includes fourteen equations. The total structural basis contains 128 different design strategies. The control vector includes seven control functions: $U = (u_1, u_2, u_3, u_4, u_5, u_6, u_7)$.

The results of the design process for some strategies from the total structural basis are shown in Table II. The corresponding dependences of the function S(t) during the design process are presented in Fig. 4.

Figure 4. Behavior of the function S(t) for different design strategies of the structural basis for the three-stage transistor amplifier.

This example, like all the previous ones, shows an unambiguous correlation between the behavior of the function S(t) and the total CPU time required to optimize the circuit. Parallel computing makes it possible to compare all the different strategies in real time and to select the best ones.

Summarizing the results of the analysis, it can be argued that the behavior of the Lyapunov function of the design process, expressed through its logarithm S(t), is related to the total CPU time required to optimize the circuit.

V. CONCLUSION

The problem of constructing a minimal-time design algorithm can be solved adequately on the basis of control theory. In this case the design process is formulated as a controllable dynamic system. The Lyapunov function and its time derivative contain sufficient information to select the more promising design strategies. The special function S(t) has been proposed to predict the better design strategies with a minimal design time. These functions can be used as the principal tool for predicting the time-optimal design algorithm.
REFERENCES
[1] J. R. Bunch and D. J. Rose (Eds.), Sparse Matrix Computations, Academic Press, New York, 1976.
[2] N. Rabbat, A. E. Ruehli, G. W. Mahoney, and J. J. Coleman, "A survey of macromodeling," Proc. of the IEEE Int. Symp. Circuits Systems, pp. 139-143, April 1985.
[3] A. George, "On block elimination for sparse linear systems," SIAM J. Numer. Anal., Vol. 11, No. 3, 1974, pp. 585-603.
[4] A. E. Ruehli, A. Sangiovanni-Vincentelli, and G. Rabbat, "Time analysis of large-scale circuits containing one-way macromodels," IEEE Trans. Circuits Syst., Vol. CAS-29, No. 3, 1982, pp. 185-191.
[5] I. S. Kashirskiy, "General optimization methods," Izvest. VUZ Radioelectronica, Vol. 19, No. 6, 1976, pp. 21-25.
[6] V. Rizzoli, A. Costanzo, and C. Cecchetti, "Numerical optimization of broadband nonlinear microwave circuits," IEEE MTT-S Int. Symp., Vol. 1, 1990, pp. 335-338.
[7] E. S. Ochotta, R. A. Rutenbar, and L. R. Carley, "Synthesis of high-performance analog circuits in ASTRX/OBLX," IEEE Trans. on CAD, Vol. 15, No. 3, 1996, pp. 273-294.
[8] A. M. Zemliak, "Analog system design problem formulation by optimum control theory," IEICE Trans. on Fundam., Vol. E84-A, No. 8, 2001, pp. 2029-2041.
[9] A. Zemliak, "Novel approach to the time-optimal system design methodology," WSEAS Transactions on Systems, Vol. 1, No. 2, April 2002, pp. 177-184.
[10] R. Pytlak, Numerical Methods for Optimal Control Problems with State Constraints, Berlin: Springer, 1999.
[11] N. Rouche, P. Habets, and M. Laloy, Stability Theory by Liapunov's Direct Method, Springer-Verlag, New York, 1977.
TABLE II. DATA OF SOME DESIGN STRATEGIES FROM THE TOTAL STRUCTURAL BASIS FOR THE THREE-STAGE TRANSISTOR AMPLIFIER.

N | Control vector | Iterations number | Total design time (sec)
1 | (000 000 0) | 2354289 | 420.181
2 | (001 010 1) | 110889 | 117.150
3 | (011 100 0) | 1075433 | 272.014
4 | (101 010 1) | 102510 | 50.211
5 | (101 110 1) | 107541 | 43.440
6 | (101 111 1) | 38751 | 12.753
7 | (111 011 1) | 43387 | 13.891
8 | (111 110 0) | 185085 | 110.624
9 | (111 111 0) | 147094 | 66.131
10 | (111 111 1) | 52651 | 4.782
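The ranges in Tables I and II are easier to compare as speed-ups. The short Python sketch below uses only the values from the two tables: it enumerates the structural basis of 2^M control vectors (8 for M = 3 and 128 for M = 7) and computes the speed-up of the fastest listed strategy over the all-zero control vector, for which (6') reduces to solving the circuit model at every step. Table II lists only 10 of the 128 strategies, so its "fastest" is the fastest among those shown.

```python
from itertools import product

# Total design times (sec) copied from Tables I and II, keyed by control vector.
TABLE_I = {"000": 10.61, "001": 10.71, "010": 5.87, "011": 6.11,
           "100": 2.64, "101": 4.72, "110": 3.35, "111": 2.14}
TABLE_II = {"0000000": 420.181, "0010101": 117.150, "0111000": 272.014,
            "1010101": 50.211, "1011101": 43.440, "1011111": 12.753,
            "1110111": 13.891, "1111100": 110.624, "1111110": 66.131,
            "1111111": 4.782}


def structural_basis(m):
    """All 2**m control vectors U = (u_1, ..., u_m), u_j in {0, 1}, as strings."""
    return ["".join(str(u) for u in vec) for vec in product((0, 1), repeat=m)]


def best_speedup(times, reference):
    """Fastest listed strategy and its speed-up over the reference strategy."""
    best = min(times, key=times.get)
    return best, times[reference] / times[best]


basis_3, basis_7 = structural_basis(3), structural_basis(7)
best_i, speedup_i = best_speedup(TABLE_I, "000")
best_ii, speedup_ii = best_speedup(TABLE_II, "0000000")
print(len(basis_3), len(basis_7))                      # 8 128
print(best_i, round(speedup_i, 2))                     # 111 4.96
print(best_ii, round(speedup_ii, 2))                   # 1111111 87.87
```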
Digital Multiplexer of an EEG Signal Acquisition System
Adriana A. Izidoro, Eduardo Dias, Fernando A. Cardoso and Tales C. Pimenta
Universidade Federal de Itajubá
Itajubá, Brazil

Abstract: This paper describes the implementation of a digital multiplexer/buffer unit for an electroencephalogram (EEG) signal acquisition system. The module performs two-way communication between the electrode amplifiers and an external computer. In one direction, the unit adjusts the filters and the amplification gain, as well as the power supply; in the other direction, the processed data are sent to the computer. The multiplexer/buffer adds identification and forward error correction overhead before sending the data to the external computer. The digital system is implemented in VHDL.

I. INTRODUCTION

In order to obtain a reliable electroencephalogram (EEG), one approach is to amplify the brain signals right on top of the electrodes, which avoids the effect of noise on the cables of regular EEG devices. In addition to being amplified, the signal can also be filtered and converted to digital form before it is transmitted. The digital signals from all electrodes are combined in a single chip and transmitted out.

The EEG multiplexer (MUX) was developed to provide communication between the electrodes and the computer, where the data are processed and displayed. The brain signals need amplification, but depending on age, race, gender and genetic factors, the amplification gain may need adjustment, either to avoid small signals or to avoid amplifier saturation. Therefore, besides the data sent to the computer, the computer can also send data back to the amplifiers to adjust the gain.

This article describes the implementation of the MUX and its bidirectional operation, as indicated in Fig. 1. The project was developed in VHDL for quick and easy verification [1]. Simulation results are also presented.

II. THE MULTIPLEXER - MUX

The Universal Asynchronous Receiver/Transmitter (UART) controller is the key component of the serial communications subsystem of a computer. The UART takes bytes of data and transmits the individual bits sequentially. At the destination, a second UART reassembles the bits into complete bytes. Serial transmission is commonly used in modems and for non-networked communication between computers, terminals and other devices [2].

We have used a UART to perform the communication between the computer and the MUX, as indicated in Fig. 1. The D_in line is used to receive data from the computer, such as the amplification adjustment. The D_out line is used to send data from the electrodes to the computer.

The MUX communicates with the electrodes through Sync structures over the D line. Each electrode has a Sync structure that communicates with a Sync in the MUX. This is a simple structure, since it uses only a single wire between the MUX and each electrode. Since there are 23 electrodes, a synchronous system communicating through clk, D_in and D_out lines would considerably increase the number of wires. The inner interface between the Syncs and the PC is implemented in hardware in order to obtain a simple and error-free implementation.

Figure 1. EEG Multiplexer (MUX) structure.

III. THE SYNC UNIT

If the Sync provides asynchronous communication through the D line, it is transparent to the MUX system. This means that the MUX cannot provide synchronous data to the PC independently of the electrodes. To evaluate the operation of the Sync, consider one Sync in the MUX and another in the electrode, as indicated in Fig. 2. Because of a pull-up resistor, D normally remains high, since this line can be used as input (in) or output (out) during communication. If the MUX is going to send data, it activates its En line, the signal present on the inner D line is placed on the outer D line, and the electrode receives the data. The same procedure applies when the electrode sends data to the MUX.
A much better approach is to use synchronous communication, as indicated in Fig. 3. The Sync units are configured as master in the MUX and as slaves in the electrodes (master-slave) to avoid conflicts, such as the case in which both the MUX and an electrode want to send data. The master Sync sends data to and requests data from the slave Syncs, thus synchronizing the electrode sampling period.

Figure 2. MUX Sync and electrode Sync.

The transmission principle between Sync units is based on synchronous serial communication. The Sync unit synchronizes the data communication line D by sampling it with the clock clk_16X. Every 16 clk_16X clock cycles, it generates the signals D_sync and clk_sync. The D_sync signal is center-aligned with the positive edge of clk_sync. Based on the circuit presented in Fig. 3, the circuit signals are described as follows:

Vector_In: keeps the data to be sent to another Sync.
Vector_Out: receives and keeps the data from another Sync.
Master/Slave: sets the unit to Master or Slave mode.
Request_Data: output signal used (in the master Sync) to request data from the other unit, which places the data in its Vector_Out.
Send_Data: input signal received from the master Sync.
Clk_16x: base clock, also used to generate clk_sync.
Reset: output signal in a slave Sync, used to acquire a new sample; input signal in a master Sync, used to generate, at the end of the frame_byte, a reset signal to the electrode.

Figure 3. Sync block diagram.

IV. SYNCHRONIZATION

The line D synchronizer uses Clk_16x, which is 16 times faster than Clk_sync. Every 16 Clk_16x periods, Clk_sync produces a data bit in D_Sync, as indicated in Fig. 4.

As can be observed, line D is kept high by the pull-up resistor, and the Sync units are connected in input mode. When the MUX sends data, it drives line D low (the Start_bit) for 16 clk_16x clock cycles. When the electrode detects a low D, it triggers a counter that counts the numbers of 1s and 0s on the line. At the end of the 16 clk_16x clock cycles, it checks whether there are 0s on the D line and, if so, identifies the first bit in D_sync as the start_bit. Figure 5 depicts the communication format.

Figure 4. Synchronizing line D.
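The 16x oversampling described above can be modeled in a few lines of Python. This is a behavioral sketch, not the authors' VHDL: it assumes one sample of line D per clk_16x cycle, accepts a start bit when more than half of the 16 samples in its slot are 0 (the paper only states that the counter checks for 0s), and then reads each following bit at the middle of its 16-cycle slot, where the centered D_sync would be taken. The number of bits to read (n_bits) is a parameter of the sketch; interpreting the read/write, data and stop/reset fields of the frame_byte is left to the caller.

```python
OVERSAMPLE = 16  # clk_16x cycles per clk_sync period


def decode_bits(samples, n_bits):
    """Recover n_bits bits that follow a start bit on line D.

    samples: list of 0/1 levels of line D, one per clk_16x cycle; the line
    idles high because of the pull-up resistor.
    Returns the decoded bits, or None if no valid start bit is found.
    """
    samples = list(samples)
    if 0 not in samples:
        return None                       # line stayed idle (high)
    start = samples.index(0)              # falling edge: candidate start bit
    window = samples[start:start + OVERSAMPLE]
    if len(window) < OVERSAMPLE:
        return None
    # Start-bit verification over 16 cycles; a simple majority rule is assumed.
    if window.count(0) <= OVERSAMPLE // 2:
        return None                       # treated as a glitch
    bits = []
    for k in range(1, n_bits + 1):
        # Sample the middle of each 16-cycle slot, where a D_sync bit
        # centered on the clk_sync edge would be taken.
        idx = start + k * OVERSAMPLE + OVERSAMPLE // 2
        if idx >= len(samples):
            return None                   # frame truncated
        bits.append(samples[idx])
    return bits


# Example: idle line, start bit, then the bits 1, 0, 1, 1 and an idle/stop level.
line_d = [1] * 5 + [0] * 16 + [1] * 16 + [0] * 16 + [1] * 32 + [1] * 16
print(decode_bits(line_d, 4))  # [1, 0, 1, 1]
```

A real receiver would resume searching after rejecting a glitch; the sketch simply reports it by returning None.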
Figure 5. Timing diagram of a frame_byte.

The second bit works the same way, but it is used to identify read or write mode. A high level (1) means read, and the electrode Sync sends the digital data that has been converted. A low level (0) indicates an amplifier gain adjustment, which is written into the electrode Sync register that controls the amplification gain.

Next come the data bits, and the last bit indicates either a stop bit or a reset signal to the electrode, meaning that the electrode must sample new data. The transmitted bits form a frame_byte.

V. FRAME_BYTE STATE MACHINE

The frame_byte state machine uses the clk_sync and D_sync control signals and consists of four states, as shown in Fig. 6. In the electrode Sync (configured as slave), if the first bit is low (start), the communication proceeds to the next bit (read/write) and then to the inout port D. A low level (0) on D is identified as write, and the MUX Sync sends the amplifier gain. Otherwise, a high level is identified as read, and the electrode Sync sets line D to output mode, since it sends out the data acquired by the electrode.

Figure 6. Frame_byte state machine (states: Wait Start, Read_Write, Data, Stop/Reset).

VI. SYNCHRONIZING SAMPLES

The MUX unit has a synchronous system to request the acquired data from the electrodes. When an electrode receives a send_reset, it acquires a new sample, converts it to digital and places it in Vector_In. Immediately after the send_reset command, and within the sampling period, which in this case is 50 ms (200 Hz), the MUX sends a request_data or send_data command to the electrode unit.

This sequence allows the request_data command to capture the sample created by send_reset in the previous period, thus providing enough time for the analog-to-digital conversion. It also synchronizes all electrodes. The conversion time is critical for the electrode sampler/converter, since the MUX communication system is much faster, as indicated by the timing diagram of Fig. 7.

Figure 7. Sample synchronization scheme.

VII. SENDING/RECEIVING DATA

The MUX unit also has the Interface_PC, used to communicate with the external computer. Each Sync unit has its corresponding memory for read or write operations. The computer writes the amplifier gains into the memory and, once the memory is full, the MUX sends those data to the electrodes. Thus, after sending the send_reset command, the MUX sends data (amplifier gains) to the electrodes.

On the other hand, all samples received from the electrodes are saved into the Interface_PC memory, which then signals the computer that the data can be read. The computer continuously checks for data. Sending data to and fetching data from the Interface_PC and the electrode Syncs are carried out in parallel.

VIII. CONCLUSIONS

This paper described the implementation of a digital multiplexer/buffer unit for an EEG signal acquisition system. It consists of a module that performs two-way communication between the electrode amplifiers and an external computer. It is used to adjust the amplification gain of the electrodes or to receive data from them. The digital system is implemented in VHDL, and partial simulations indicate that the system performs the desired operation.

ACKNOWLEDGMENT

This work was supported by CAPES, FAPEMIG and CNPq.

REFERENCES

[1] Jiang Ning, Fan Duo-Wang, "The Design of UART (Universal Asynchronous Receiver Transmitter) based on FPGA/CPLD" (200601).
[2] F. Durda, "Serial and UART Tutorial," $FreeBSD: doc/en_US.ISO8859-1/articles/serial-uart/article.sgml, v 1.14 2010/07/31.
[3] He Huizhu, Qin Li, Zhang Huixin, "Design of a UART IP Core Based on FPGA," National Key Laboratory for Electronic Measurement Technology, North University of China, Taiyuan, 030051, P.R. China (2008).
S3Proto-mini: Free Hardware Board with a BGA-Packaged FPGA
Diego Brengi, Salvador Tropea, Matías Parra Visentin, Christian Huy, Rodrigo Melo
Instituto Nacional de Tecnología Industrial
Centro de Electrónica e Informática
Laboratorio de Desarrollo Electrónico con Software Libre
Buenos Aires, Argentina. Email: {brengi,salvador}@inti.gob.ar
Abstract: This work presents the development of an FPGA board, released as free hardware, for use in education and prototype development. The planning, design, manufacturing and testing stages are discussed. The board consists of a four-layer printed circuit board and a BGA-packaged FPGA. The BGA soldering and subsequent inspection were carried out in the laboratory, and 3 prototypes have been built successfully. Work is currently under way on the design of an optional input/output module for educational practice and on a second, more complex version capable of running an embedded GNU/Linux system.
I. INTRODUCTION

The FPGALibre project [1][2], promoted by INTI, aims to develop and provide free software tools and open, free hardware designs [3][4] for working with FPGA¹ technologies.

One of its lines of work has as its final goal the creation of an FPGA platform able to host a design with a LEON processor [5][6] and an embedded [8] GNU/Linux [7] system. This work presents the development of an initial, intermediate version that does not fully meet these goals but makes it possible to tackle the technologies needed to achieve them.

II. PROJECT OBJECTIVES

To understand the criteria adopted and the technologies selected, it is first necessary to know the objectives of the project:

- A high degree of technological independence is sought regarding the tools and components used (hardware and software), so free software and free hardware are used whenever possible.
- One of the objectives is to disseminate the technologies used.
- In keeping with the previous objectives, the aim is to carry out as many tasks as possible within the country, to try to reduce the final manufacturing costs, and to conduct the experience in such a way that it can be documented and replicated by other groups.
- The designs will be released as Free Hardware once finished, using the same methodology employed with the s2proto board [9].

III. SELECTION CRITERIA

III-A. FPGA device brand and family

The design was approached with Xilinx devices because of the group's previous experience with this company's synthesis and place & route tools and with the available development boards using its FPGAs. The Spartan 3 and Spartan 3E families were considered because, at the time of the selection, they were the lowest-cost families that could meet the requirements.

III-B. Selected FPGA device

Estimates were made by synthesizing some designs from Grlib (the SPARC LEON processor and associated peripherals such as Ethernet, PCI, VGA, etc.). These tests showed that, to implement a system with enough room for experimentation and development, it is advisable to have at least 17600 LUT4s (or 8800 Xilinx slices). Of the families considered, only the XC3S1600E and XC3S1500 (or larger) devices met this amount of logic resources.

Choosing the lowest-cost alternative, a Spartan 3E FPGA, model XC3S1600E-FGG320, was selected as the central component.

III-C. FPGA device package

The capabilities required of the FPGA largely determine the package to be used, since manufacturers maintain a relationship between logic resources and I/O pins in the models they offer. In our case, working with BGA² packages becomes almost mandatory, because the simplest package in which devices with this amount of resources are offered is the FG/FGG 320.

This BGA package has 320 lead-free terminals. It is a fine-pitch package, with 1 mm between ball centers and ball diameters between 0.5 mm and 0.7 mm, arranged in an 18 x 18 matrix with the four central positions removed. The chip occupies an area of 19 x 19 mm.

Using this package also requires a multilayer printed circuit board, since otherwise it is impossible to properly interconnect the BGA area, where space is scarce and the number of pads and connections to be routed is high.

¹ Field Programmable Gate Array.
² Ball Grid Array.
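The resource and package figures of Sections III-B and III-C can be cross-checked with simple arithmetic. The Python sketch below assumes two LUT4s per Spartan-3/3E slice and takes the four removed central positions of the 18 x 18 matrix to be the 2 x 2 block at its center (the exact positions are an assumption of the sketch). With a 1 mm pitch, the outermost ball centers span 17 mm, which fits within the 19 x 19 mm body.

```python
# Cross-check of the figures quoted in Sections III-B and III-C.
LUTS_PER_SLICE = 2        # a Spartan-3/3E slice contains two 4-input LUTs
REQUIRED_LUT4 = 17600     # estimate from the Grlib synthesis tests
XC3S1600E_LUT4 = 29504    # LUT4 count quoted for the selected device
GRID, PITCH_MM, BODY_MM = 18, 1.0, 19.0


def ball_positions(grid, removed):
    """(row, col) positions of a full grid x grid ball matrix minus `removed`."""
    return [(r, c) for r in range(grid) for c in range(grid) if (r, c) not in removed]


# Assumption: the four removed positions are the 2 x 2 block at the grid center.
mid = GRID // 2
center = {(mid - 1, mid - 1), (mid - 1, mid), (mid, mid - 1), (mid, mid)}
balls = ball_positions(GRID, center)
span_mm = (GRID - 1) * PITCH_MM          # outermost ball center to center
required_slices = REQUIRED_LUT4 // LUTS_PER_SLICE

print(len(balls), span_mm, span_mm < BODY_MM)             # 320 17.0 True
print(required_slices, XC3S1600E_LUT4 >= REQUIRED_LUT4)   # 8800 True
```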
IV. DEVELOPED CIRCUIT

To tackle the challenge posed by BGA technology, as an intermediate step, a board was designed with fewer capabilities than those needed to implement a system with embedded GNU/Linux, but using the FPGA chip of the final system on a multilayer printed circuit board. This first prototype is called S3Proto-mini (see Fig. 1).

Figure 1. Top view of the S3Proto-mini board.

IV-A. Printed circuit board

The printed circuit board was designed with KICAD [10], a free software [11] tool licensed under the GPL. Four layers were used to interconnect the power distribution network and the signals needed for this application. The minimum clearance used was 4 mil (thousandths of an inch), at a few points under the BGA.

The tracks were routed manually, and the resulting circuit has a total area of 7 x 7 cm.

IV-B. General features

In summary, the board has the following resources (see the diagram in Fig. 2):

- Xilinx Spartan 3E FPGA device (XC3S1600E) with 29504 LUT4s.
- 2 XCF04S configuration memories (4+4 Mbit).
- TUSB1106 USB transceiver, 12 Mb/s (full speed), with a type B connector.
- 2 RS232 serial ports of up to 300 Kbps (ST3232), one with a DB-9 connector.
- 4 push buttons.
- 5 DIP switches.
- 4 LEDs.
- 26 I/O pins and a JTAG port.
- Socketed oscillator.
- Single 5 V power supply.
- Dimensions of 7 x 7 cm.

Figure 2. Block diagram of the resources of the S3Proto-mini board.

Figure 3. S3Power power supply module.
IV-C. Power supply module

The power supply is implemented as a separate module called S3-Power (see Fig. 3), designed specifically for this project [12], which can also be used for any other design with the Spartan 3 and Spartan 3E families. The module is based on the TPS75003 chip and follows the same criteria as the S3Proto-mini board regarding the tools used and ease of construction. It is a 3 x 5 cm double-sided board that attaches to the back of the S3Proto-mini and provides it with the following maximum voltages and currents:

- 1.25 V / 2.5 A (Vcore)
- 3.3 V / 2.5 A (Vcco)
- 2.4 V / 200 mA (Vaux)

V. PROTOTYPE MANUFACTURING

V-A. Printed circuit board manufacturing

In line with the project criteria mentioned above, the manufacturing of the printed circuit board was commissioned from a domestic company that carried out the entire manufacturing process in the country, reaching the limit of current capabilities in terms of tolerances and minimum distances, especially for the distance between vias and pads in the BGA area.

An electrolytic nickel-gold surface finish was also requested, to obtain pads as flat as possible that do not interfere with the positioning and soldering of the BGA, since the traditional HASL³ process is not recommended in these cases.
V-B. BGA soldering

The BGA device was soldered in the laboratory [13], following the recommendations of the JEDEC J-STD-020D.1 standard, using a Jovy RE-7500 infrared soldering station with an XY table for holding the printed circuit board. In addition, a temperature logger was used to record the applied profile for later analysis. After the BGA was soldered, the rest of the components were mounted in the traditional way.

V-C. Verification, testing and start-up

Several checks were performed before start-up. First, the temperature profile applied during the soldering process was analyzed. X-ray images were also taken to inspect the BGA solder joints (see Fig. 4), and the chip was observed from the side with a general-purpose digital microscope to check that the outer row of the BGA was correctly soldered (by observing the shine and shape of the balls).

Start-up (powering the circuit) was then performed, using GNU JTAG [14] to drive pins and read the FPGA inputs.

Finally, a demonstration test configuration was loaded to verify the correct operation of the prototype.
VI. FUTURE WORK

VI-A. Practice module

A module for educational practice is being designed (see Fig. 5) that attaches to the inputs and outputs of the S3Proto-mini. This module will have the following features:

- 3 seven-segment displays.
- Connector for PS/2 devices.
- Incremental rotary encoder (for manual control).
- VGA output using an R2R D/A converter with two bits per color signal.

VI-B. S3Proto-full board

In addition, work is under way on the full version of the S3Proto, able to host a LEON3 processor running a GNU/Linux operating system. For this, the board must have flash memory, RAM, a RAM DAC and Ethernet, among other things. The schematic designs are currently being finalized.

Once the final platform has been developed, reference applications can be built and courses on the LEON3 processor can be taught.

³ Hot Air Solder Leveling.

Figure 4. X-ray image of the BGA package area.

Figure 5. Practice module currently under development.
VII. RESULTS AND CONCLUSIONS

During this development, we learned about the use of a BGA device, starting with the printed circuit board design stage, continuing with its soldering and ending with verification and start-up. Going through this process provides a broader view of working with BGA packages and multilayer circuits, while helping to improve the design criteria. Three prototypes for internal use have already been manufactured successfully.

All the design files and the technical information needed to replicate the experience have been published on the FPGALibre project website under a GPL license [15], following the free hardware concept, which allows its use in any application and for any purpose. The estimated cost per assembled board is between 150 and 200 USD, including the power supply module. This cost varies widely, since it depends on the number of components that must be imported and on the batch size to be manufactured.

Several groups have expressed interest in obtaining the board, and two students from the Universidad Tecnológica Nacional de Córdoba have started building it for use in their respective final theses.

This board is of interest to anyone who wants to approach FPGA technology, hardware design and manufacturing, the handling of BGA packages, and multilayer printed circuit board design.
VIII. ACKNOWLEDGMENTS
The authors thank J. P. Laurino of Inarci S.A. for his kind assistance, and S. Guberman of Electrocomponentes S.A. for his initial help with BGA soldering. Special thanks go to William Crease of INTI-Mecánica for the X-ray images.
REFERENCES
[1] INTI Electrónica e Informática et al., "Proyecto FPGA Libre," http://fpgalibre.sourceforge.net/.
[2] S. E. Tropea, D. J. Brengi, and J. P. D. Borgna, "FPGAlibre: Herramientas de software libre para diseño con FPGAs," in FPGA Based Systems. Mar del Plata: Surlabs Project, II SPL, 2006, pp. 173-180.
[3] I. González, J. González, and F. Gómez-Arribas, "Hardware libre: clasificación y desarrollo de hardware reconfigurable en entornos GNU/Linux," http://www.iearobotics.com/personal/juan/publicaciones/art4/hardware-libre.pdf.
[4] Opencollector.org, "Writings on open source hardware," http://www.opencollector.org/Whyfree/.
[5] Aeroflex Gaisler, "LEON processor & Grlib IP-core library." [Online]. Available: http://www.gaisler.com/
[6] J. Gaisler, "An open-source VHDL IP library with plug&play configuration," in IFIP Congress Topical Sessions, R. Jacquart, Ed. Kluwer, 2004, pp. 711-718.
[7] GNU project, http://www.gnu.org/, Jun. 2010.
[8] A. Muñoz, E. Ostua, M. J. Bellido, A. Millan, J. Juan, and D. Guerrero, "Building a SoC for industrial applications based on LEON microprocessor and a GNU/Linux distribution," in 2008 IEEE International Symposium on Industrial Electronics (ISIE). IEEE, 2008, pp. 1727-1732.
[9] D. J. Brengi, S. E. Tropea, and J. P. D. Borgna, "Tarjeta de diseño abierto para desarrollo y educación," in 2007 3rd Southern Conference on Programmable Logic Designer Forum Proceedings, Mar del Plata, 2007, pp. 57-60.
[10] J.-P. Charras, "Kicad: GPL PCB Suite," http://www.lis.inpg.fr/realise_au_lis/kicad.
[11] Free Software Foundation, Inc., "The Free Software Definition," http://www.gnu.org/philosophy/free-sw.html.
[12] C. Huy and D. Brengi, "Módulo de alimentación para placas con dispositivos FPGA," in Congreso de Microelectrónica Aplicada, uEA2010. San Justo, Buenos Aires: Universidad Nacional de La Matanza, 2010, p. 21. [Online]. Available: http://utic.inti.gob.ar/publicaciones/uEA2010/uea2010_submission_49.pdf
[13] D. Brengi, S. Tropea, M. P. Visentin, and C. Huy, "Soldadura, inspección y verificación, en laboratorio, de un prototipo con chip BGA," in II Congreso de Microelectrónica Aplicada, uEA2011: Libro de memorias. La Plata, Buenos Aires: Universidad Nacional de La Plata, 2011, pp. 95-100. [Online]. Available: http://utic.inti.gob.ar/publicaciones/uEA2011/bgaS3.pdf
[14] (2003, Oct.) Openwince GNU JTAG Tools. [Online]. Available: http://openwince.sourceforge.net/jtag/
[15] Free Software Foundation, Inc., "GNU General Public License," http://www.gnu.org/copyleft/gpl.html.
Hardware Implementation of a Serial Multiplier Based on Normal Bases over $GF(2^{163})$
Fernando Aparicio Urbano-Molano
Grupo de Ingeniería Telemática (GIT)
Departamento de Telemática, FIET, Universidad del Cauca
Popayán, Colombia
faurbano@unicauca.edu.co

Vladimir Trujillo-Olaya and Jaime Velasco-Medina
Grupo de Bionanoelectrónica
Escuela EIEE, Universidad del Valle
Cali, Colombia
{vlatruo, jvelasco}@univalle.edu.co
Abstract: This paper presents the design of a serial multiplier based on a normal basis for elliptic curve cryptosystems over $GF(2^{163})$. The design is described in structural/behavioral VHDL and synthesized on the EP2S60F1020C3 and EP3S150F1152C2 FPGAs using Quartus II v11.0 SP1. Simulation and in-system hardware verification results show that the multiplier offers a good area-throughput trade-off and is suitable for embedding in a SoC cryptosystem.
Keywords: elliptic curve cryptosystem, FPGA, Galois field multipliers, normal basis, hardware implementation.

I. INTRODUCTION
The implementation of arithmetic operations over finite fields determines the performance of many applications, such as coding theory and public-key cryptosystems. This is especially true for the elliptic curve cryptosystems (ECC) proposed by Koblitz [1]. All finite fields of the same cardinality are isomorphic, but the efficiency of the arithmetic depends on the basis chosen to represent the field elements. ECC schemes are defined in standards such as ANSI. Their main advantage over other public-key systems such as RSA is that they use smaller parameters for the same level of computational security [2]. Because of this advantage, ECC is used in applications with limited computing resources, such as smart cards and cellular phones.
With this in mind, this paper presents the design and implementation of a serial multiplier for normal bases over $GF(2^{163})$, based on the algorithm proposed in [9]. The multiplier is described in structural/generic VHDL and synthesized on the EP2S60F1020C3 and EP3S150F1152C2 FPGAs.
The bases most commonly used to represent finite field arithmetic are normal bases (NB) and polynomial bases (PB). Normal bases are better suited than polynomial bases to hardware implementation, because arithmetic in the normal basis representation relies mainly on rotation, shifting and addition. Addition is built from exclusive-OR gates, which are efficiently implemented in hardware [3]. Another advantage of normal bases is that squaring is implemented as a right rotation of the binary representation. Multiplication, however, is a drawback because it requires more computation time. Several multiplication algorithms exist in the literature, but most of them target software implementations, which are slower than hardware implementations.
Several hardware architectures for multipliers over binary finite fields have been presented:
- In [4], three serial multipliers (Berlekamp, Massey-Omura, and a polynomial basis multiplier) are implemented in VLSI for the small finite field $GF(2^8)$ and compared.
- In [5], VLSI implementations of parallel multipliers over $GF(2^m)$ are considered for the non-prime extension degrees m = 8, 16, 24 and 32.
- In [6], a modification of the Massey-Omura algorithm is presented.

However, these multipliers are complex to implement, because normal basis multiplication is determined by the cross products of the terms of $GF(2^m)$. As m grows, the space complexity grows in proportion to $m^2$. In [7], digit-level parallel multipliers over $GF(2^{163})$ using Gaussian normal bases are implemented. In [8], polynomial basis multipliers over $GF(2^{233})$ are designed.
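The Introduction states that squaring in a normal basis is a cyclic rotation of the coefficient vector. The minimal Python sketch below illustrates this. It assumes (the paper does not state the bit ordering) that coefficient $a_i$ is stored in bit $i$ of an integer. Under that assumption, the right rotation of $(a_0, \ldots, a_{m-1})$ becomes a one-bit left rotation of the integer. The helper name `square_nb` is hypothetical. With this convention, the $A$, $A^2$ and $A^4$ values reported later in Table II are consistent with squaring by rotation.

```python
# Hypothetical helper: normal-basis squaring as a cyclic rotation.
# Assumption: coefficient a_i of A = sum a_i * beta^(2^i) is stored in bit i.
M = 163  # extension degree used in Table II


def square_nb(a: int, m: int = M) -> int:
    """Return A^2 for A < 2^m: (a_0, ..., a_{m-1}) -> (a_{m-1}, a_0, ..., a_{m-2})."""
    mask = (1 << m) - 1
    # Bit i moves to bit i+1; bit m-1 wraps around to bit 0.
    return ((a << 1) | (a >> (m - 1))) & mask


# Operands copied from Table II (hexadecimal).
A = 0x1A02580C3AE101408D8A1829E72D1721968282A8B
A2 = 0x3404B01875C202811B143053CE5A2E432D0505516
A4 = 0x68096030EB840502362860A79CB45C865A0A0AA2C

assert square_nb(A) == A2   # A * A = A^2
assert square_nb(A2) == A4  # A^2 * A^2 = A^4
```

Because the rotation is pure wiring, a hardware squarer of this kind needs no logic gates. That is the advantage of normal bases that the Introduction points out.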
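This project was sponsored by the Altera University Program.

II. MULTIPLICATION ALGORITHM USING NORMAL BASES
This section describes in detail the multiplication algorithm presented in [9].
A. Serial Multiplier
Let $\{\beta_0, \beta_1, \ldots, \beta_{m-1}\}$, with $\beta_i = \beta^{2^i}$, be a normal basis of $GF(2^m)$. Let $A = \sum_{i=0}^{m-1} a_i\beta_i$, $B = \sum_{i=0}^{m-1} b_i\beta_i$ and $C = \sum_{i=0}^{m-1} c_i\beta_i = AB$, with all subscripts reduced modulo $m$. The sequential multiplication is then described by equation (1):

$$c_s = \sum_{i,j} a_{i+s}\, b_{j+s}\, \lambda_{ij}^{(0)} = \sum_{i,j} a_{i+s}\, b_{j+s}\, \lambda_{ij} = \sum_{j=0}^{m-1}\left(\sum_{i=0}^{m-1} a_{i+s}\lambda_{ij}\right) b_{j+s}, \tag{1}$$

where $\lambda_{ij}^{(0)} \in GF(2)$ is the coefficient of $\beta_0$ in the product $\beta_i\beta_j$, and $\lambda_{ij} = \lambda_{ij}^{(0)}$ are the entries of the multiplication matrix $(\lambda_{ij})$ of the normal basis.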
For $0 \le s, t \le m-1$, define the element $x_{st} \in GF(2)$ as

$$x_{st} = \sum_{i=0}^{m-1} a_{i+s}\lambda_{it}\, b_{t+s}, \tag{2}$$

so that, by (1), $c_s = \sum_{t=0}^{m-1} x_{st}$. Let $X = (x_{st})$ be the $m \times m$ matrix of these elements. The $t$-th column vector of $X$ is

$$X_t = (x_{0t}, x_{1t}, \ldots, x_{m-1,t})^T, \tag{3}$$

that is, the transpose of the row vector $(x_{0t}, x_{1t}, \ldots, x_{m-1,t})$. Since $\sum_{t=0}^{m-1} x_{st} = c_s$ for every $s$, the sum of all the column vectors of $X$ is

$$\sum_{t=0}^{m-1} X_t = (c_0, c_1, \ldots, c_{m-1})^T. \tag{4}$$

To reduce the gate complexity of the multiplier, the column vectors $X_t$ are reorganized so that partial sums can be reused. Let $m - 1 = 2\nu$, and let $Y = (y_{st})$ be the $m \times m$ matrix obtained by permuting the column vectors of $X$ as follows. When $\nu$ is odd, $Y$ is defined as

$$Y = (X_\nu, \ldots, X_3, X_1, X_{m-1}, X_{m-3}, \ldots, X_{m-\nu}, X_{\nu-1}, \ldots, X_2, X_0, X_{m-2}, \ldots, X_{m-\nu+1}), \tag{5}$$

and when $\nu$ is even, $Y$ is defined as

$$Y = (X_\nu, \ldots, X_2, X_0, X_{m-2}, \ldots, X_{m-\nu}, X_{\nu-1}, \ldots, X_3, X_1, X_{m-1}, X_{m-3}, \ldots, X_{m-\nu+1}). \tag{6}$$

Because $Y$ only permutes the columns of $X$, the sum of all its column vectors $Y_t = (y_{0t}, y_{1t}, \ldots, y_{m-1,t})^T$, $0 \le t \le m-1$, equals the sum of all the column vectors $X_t$ of $X$. That sum is $(c_0, c_1, \ldots, c_{m-1})^T$.

Moreover, the $s$-th entry of $X_t$ and the $(s+t)$-th entry of $X_{m-t}$ contain the same term $\sum_{i=0}^{m-1} a_{i+s}\lambda_{it}$ in their summands. From (2) we obtain

$$x_{s+t,m-t} = \sum_{i=0}^{m-1} a_{i+s+t}\lambda_{i,-t}\, b_s = \sum_{i=0}^{m-1} a_{i+s}\lambda_{i-t,-t}\, b_s = \sum_{i=0}^{m-1} a_{i+s}\lambda_{it}\, b_s, \tag{7}$$

where the third expression comes from reindexing the summation over $i$, and the last one follows from $\lambda_{ij} = \lambda_{i-j,-j}$. Hence $x_{st}$ and $x_{s+t,m-t}$ share the term $\sum_{i} a_{i+s}\lambda_{it}$, which saves XOR gates during the computation of $AB$.

To derive a hardware architecture for the algorithm in Table I, the sums of the shifted diagonal vectors of $Y$ are computed instead of the sums of its column vectors. This is useful because there are exactly $t-1$ columns between the vectors $X_t$ and $X_{m-t}$ in $Y$. As a result, the entries $x_{st}$ and $x_{s+t,m-t}$, which share a term, lie on the same shifted diagonal of $Y$.

TABLE I. MULTIPLICATION ALGORITHM
INPUTS: $A = \sum_{i=0}^{m-1} a_i\beta_i$, $B = \sum_{i=0}^{m-1} b_i\beta_i \in GF(2^m)$
OUTPUT: $C = AB = \sum_{i=0}^{m-1} c_i\beta_i \in GF(2^m)$
1. Load $A$ and $B$ into $m$-bit registers, and set $D_0, D_1, \ldots, D_{m-1} \leftarrow 0$.
2. For $t = 0$ to $m-1$:
     $D_{s+t+1} \leftarrow D_{s+t} + y_{s,s+t}$, for all $0 \le s \le m-1$.   (8)
   End For.
3. $c_s = D_s$, for all $0 \le s \le m-1$.

After the $m$-th iteration, $D_i = c_i$ for all $0 \le i \le m-1$, where $AB = \sum_{i=0}^{m-1} c_i\beta_i$.
In the first clock cycle ($t = 0$), the values $D_{s+1} = D_s + y_{ss}$ are computed simultaneously for all $0 \le s \le m-1$. For example, $D_1 = y_{00}$, $D_2 = y_{11}$, ..., $D_0 = y_{m-1,m-1}$.
When $t = 1$, the values $D_{s+2} = D_{s+1} + y_{s,s+1}$ are computed simultaneously for all $0 \le s \le m-1$. For example, $D_2 = D_1 + y_{01} = y_{00} + y_{01}$, $D_3 = D_2 + y_{12} = y_{11} + y_{12}$, ..., $D_1 = D_0 + y_{m-1,0} = y_{m-1,m-1} + y_{m-1,0}$.
Finally, in the $m$-th cycle ($t = m-1$), the values $D_s = D_{s-1} + y_{s,s-1}$ are computed simultaneously. That is,

$$\begin{aligned}
D_0 &= D_{m-1} + y_{0,m-1} = y_{00} + y_{01} + \cdots + y_{0,m-1} = c_0,\\
D_1 &= D_0 + y_{10} = y_{11} + y_{12} + \cdots + y_{10} = c_1,\\
&\;\;\vdots\\
D_{m-1} &= D_{m-2} + y_{m-1,m-2} = y_{m-1,m-1} + y_{m-1,0} + \cdots + y_{m-1,m-2} = c_{m-1}.
\end{aligned} \tag{9}$$
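The column permutation of (5)-(6) and the accumulation schedule of (8) can be checked with a short Python sketch. The function names are hypothetical:
- `y_column_order` lists, for odd $m$, which column $X_t$ of $X$ lands in each column of $Y$.
- `serial_accumulate` runs step 2 of Table I over $GF(2)$, where addition is XOR.
- `spacing_ok` confirms that $X_{m-t}$ sits exactly $t$ positions after $X_t$ (that is, $t-1$ columns in between) for $1 \le t \le \nu$. This is the property that enables gate sharing.

Because $Y$ only permutes the columns of $X$, the accumulator returns the row sums of $X$, i.e. the $c_s$, for any permutation.

```python
import random


def y_column_order(m):
    """Index t of the column X_t placed at each column of Y, per Eqs. (5)-(6)."""
    if m < 3 or m % 2 == 0:
        raise ValueError("m must be odd and at least 3, so that m - 1 = 2*nu")
    nu = (m - 1) // 2
    if nu % 2 == 1:  # Eq. (5), nu odd
        parts = (range(nu, 0, -2),              # X_nu, ..., X_3, X_1
                 range(m - 1, m - nu - 1, -2),  # X_{m-1}, X_{m-3}, ..., X_{m-nu}
                 range(nu - 1, -1, -2),         # X_{nu-1}, ..., X_2, X_0
                 range(m - 2, m - nu, -2))      # X_{m-2}, ..., X_{m-nu+1}
    else:            # Eq. (6), nu even
        parts = (range(nu, -1, -2),             # X_nu, ..., X_2, X_0
                 range(m - 2, m - nu - 1, -2),  # X_{m-2}, ..., X_{m-nu}
                 range(nu - 1, 0, -2),          # X_{nu-1}, ..., X_3, X_1
                 range(m - 1, m - nu, -2))      # X_{m-1}, X_{m-3}, ..., X_{m-nu+1}
    return [t for part in parts for t in part]


def serial_accumulate(Y):
    """Step 2 of Table I over GF(2): D_{s+t+1} <- D_{s+t} XOR y_{s,s+t}, all s at once."""
    m = len(Y)
    D = [0] * m
    for t in range(m):
        new_D = [0] * m
        for s in range(m):
            new_D[(s + t + 1) % m] = D[(s + t) % m] ^ Y[s][(s + t) % m]
        D = new_D
    return D


def spacing_ok(order):
    """True if X_{m-t} sits exactly t positions after X_t in Y, for 1 <= t <= nu."""
    m = len(order)
    pos = {t: p for p, t in enumerate(order)}
    return all(pos[m - t] - pos[t] == t for t in range(1, (m - 1) // 2 + 1))


# Usage: a random binary X for m = 7 (nu = 3, odd case).
rng = random.Random(1)
m = 7
order = y_column_order(m)  # [3, 1, 6, 4, 2, 0, 5]
X = [[rng.randint(0, 1) for _ in range(m)] for _ in range(m)]
Y = [[X[s][order[col]] for col in range(m)] for s in range(m)]
assert serial_accumulate(Y) == [sum(row) % 2 for row in X]  # D_s = c_s
assert spacing_ok(order)
```

The specific order of (5)-(6) does not affect the result, only which XOR terms can be shared. Any other column order would still produce $c_s$, but it would lose the $t-1$ spacing that aligns $x_{st}$ and $x_{s+t,m-t}$ on one diagonal.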
In other words, for a given $s$, the final value $D_s$ is computed sequentially in the following order:

$$D_s = \underbrace{\underbrace{y_{ss}}_{D_{s+1}} + y_{s,s+1}}_{D_{s+2}} + y_{s,s+2} + \cdots + y_{s,s+m-1} = \sum_{i=0}^{m-1} y_{s,s+i} = c_s. \tag{10}$$

Equation (9) shows that the entries $y_{s-1,s}$ and $y_{ss}$, $0 \le s \le m-1$, belong to the same column $Y_s$ of $Y$. Since $Y$ is obtained by permuting the columns of $X$, it follows that $y_{s-1,s} = x_{s-1,s'}$ and $y_{ss} = x_{ss'}$ for some $s'$ that depends on $s$. From (2),

$$x_{ss'} = \sum_{i=0}^{m-1} a_{i+s}\lambda_{is'}\, b_{s'+s}, \qquad x_{s-1,s'} = \sum_{i=0}^{m-1} a_{i+s-1}\lambda_{is'}\, b_{s'+s-1}. \tag{11}$$

So $x_{s-1,s'}$ ($= y_{s-1,s}$) is obtained from the expression of $x_{ss'}$ ($= y_{ss}$) by a one-position cyclic rotation of the $a_i$ and $b_i$ vectors. This rotation has no hardware cost. Consequently, the gates needed for the first clock cycle ($t = 0$) of step 2 of Table I are sufficient to implement the whole circuit. That first cycle computes

$$D_{s+1} = D_s + y_{ss}, \quad 0 \le s \le m-1. \tag{12}$$

Recall that, for each $s$, there is a corresponding $s'$ (given by the permutation) such that

$$y_{ss} = x_{ss'} = \sum_{i=0}^{m-1} a_{i+s}\lambda_{is'}\, b_{s'+s}.$$

If $s' \neq 0$ (so that $x_{ss'}$ is not in column 0 of $X$), then by (2) and (7) the XOR gates needed to compute $x_{ss'}$ and $x_{s+s',m-s'}$ (both entries of $Y$) can be shared. Each $x_{ss'}$ can be computed with one AND gate and at most $k-1$ XOR gates, because the multiplication matrix $(\lambda_{ij})$ of a type-$k$ normal basis has at most $k$ nonzero entries in each column (row). Therefore, computing all $y_{ss} = x_{ss'}$ with $s' \neq 0$ takes $m-1$ ANDs and at most $\frac{m-1}{2}(k-1)$ XORs.
When $s' = 0$, exactly one entry $\lambda_{i0}$, $0 \le i \le m-1$, is nonzero, because $\beta\beta_0 = \beta^2 = \beta_1$. Consequently, one AND and no XOR are needed to compute $x_{ss'}$ with $s' = 0$.
Finally, the sum $D_{s+1} = D_s + y_{ss}$ in (12) needs one XOR for each $0 \le s \le m-1$. The total gate complexity of the multiplier is therefore $m$ ANDs and at most $m + \frac{m-1}{2}(k-1)$ XORs [9].

III. DESIGN OF THE SEQUENTIAL MULTIPLIER USING NORMAL BASES
A. Multiplier Array
Let $\gamma$ be a primitive $p$-th root of unity in $GF(2^{10})$, with $p = 2m + 1 = 11$. Let $\beta = \gamma + \gamma^{-1}$ be an optimal normal element of type II in $GF(2^5)$. For $A = \sum_{i=0}^{4} a_i\beta_i$ and $B = \sum_{i=0}^{4} b_i\beta_i$, the first coefficients of $C = \sum_{i=0}^{4} c_i\beta_i$ are

$$\begin{aligned}
c_0 &= (a_3 + a_4)b_2 + a_1 b_0 + (a_1 + a_2)b_3 + (a_0 + a_3)b_1 + (a_2 + a_4)b_4,\\
c_1 &= (a_4 + a_0)b_3 + a_2 b_1 + (a_2 + a_3)b_4 + (a_1 + a_4)b_2 + (a_3 + a_0)b_0.
\end{aligned} \tag{13}$$

The partial products computed first are the diagonal entries $y_{ss}$: $(a_3 + a_4)b_2$ for $c_0$ and $a_2 b_1$ for $c_1$. The entries along each (shifted) diagonal also share common terms. In this case, two shift registers are required to compute $C = AB$ using an optimal normal basis (ONB) of type II in $GF(2^m)$ for $m = 5$. Figures 1 and 2 show the array and the cell $R_i$ of the designed multiplier.
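The $m = 5$ example can be reproduced with a short Python sketch. It relies on one assumption that the paper does not spell out: $\beta_i = \beta^{2^i} = \gamma^{2^i} + \gamma^{-2^i}$. Under that assumption, every product $\beta_i\beta_j$ expands into terms $\gamma^{e} + \gamma^{-e}$, each equal to some $\beta_k$, or to 0 when $e \equiv 0 \pmod{11}$.

The sketch:
- builds $\lambda_{ij}^{(0)}$ from that rule;
- multiplies with Eq. (1);
- checks both lines of Eq. (13) and squaring-as-rotation;
- checks the serial schedule of Table I using the column order of Eq. (6) for $\nu = 2$, i.e. $Y = (X_2, X_0, X_3, X_1, X_4)$.

All function names are hypothetical.

```python
from itertools import product

M = 5          # GF(2^5), the example of Section III.A
P = 2 * M + 1  # p = 2m + 1 = 11

# gamma^e + gamma^(-e) equals beta_k when e = +/-2^k (mod p).
EXP_TO_INDEX = {}
for k in range(M):
    EXP_TO_INDEX[pow(2, k, P)] = k
    EXP_TO_INDEX[P - pow(2, k, P)] = k


def basis_product(i, j):
    """Coefficients of beta_i * beta_j over (beta_0, ..., beta_{m-1})."""
    vi, vj = pow(2, i, P), pow(2, j, P)
    coeffs = [0] * M
    for e in ((vi + vj) % P, (vi - vj) % P):
        if e:  # gamma^0 + gamma^0 = 0 in characteristic 2
            coeffs[EXP_TO_INDEX[e]] ^= 1
    return coeffs


# lambda_ij = lambda_ij^(0): coefficient of beta_0 in beta_i * beta_j.
LAM = [[basis_product(i, j)[0] for j in range(M)] for i in range(M)]


def multiply(a, b):
    """Eq. (1): c_s = sum_{i,j} a_{i+s} b_{j+s} lambda_ij, indices mod m."""
    return [sum(a[(i + s) % M] & b[(j + s) % M] & LAM[i][j]
                for i in range(M) for j in range(M)) % 2
            for s in range(M)]


def x_entry(a, b, s, t):
    """Eq. (2): x_st = (sum_i a_{i+s} lambda_it) * b_{t+s} over GF(2)."""
    inner = sum(a[(i + s) % M] & LAM[i][t] for i in range(M)) % 2
    return inner & b[(t + s) % M]


def serial_multiply(a, b, order=(2, 0, 3, 1, 4)):
    """Table I with Y = (X_2, X_0, X_3, X_1, X_4), the order of Eq. (6) for m = 5."""
    D = [0] * M
    for t in range(M):
        new_D = [0] * M
        for s in range(M):
            col = (s + t) % M
            new_D[(col + 1) % M] = D[col] ^ x_entry(a, b, s, order[col])
        D = new_D
    return D


def c0_eq13(a, b):
    return (((a[3] ^ a[4]) & b[2]) ^ (a[1] & b[0]) ^ ((a[1] ^ a[2]) & b[3])
            ^ ((a[0] ^ a[3]) & b[1]) ^ ((a[2] ^ a[4]) & b[4]))


def c1_eq13(a, b):
    return (((a[4] ^ a[0]) & b[3]) ^ (a[2] & b[1]) ^ ((a[2] ^ a[3]) & b[4])
            ^ ((a[1] ^ a[4]) & b[2]) ^ ((a[3] ^ a[0]) & b[0]))


# Exhaustive check over all 32 x 32 operand pairs.
for a in product((0, 1), repeat=M):
    for b in product((0, 1), repeat=M):
        c = multiply(a, b)
        assert c[0] == c0_eq13(a, b) and c[1] == c1_eq13(a, b)
        assert serial_multiply(a, b) == c
    assert multiply(a, a) == [a[-1]] + list(a[:-1])  # squaring = rotation
```

In this model, column 0 of `LAM` has a single nonzero entry and no row has more than $k = 2$. This matches the gate-count argument above: the $s' = 0$ term needs no XOR, and each other term needs at most one.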
Figure 1. Multiplier array using a type II ONB in $GF(2^5)$, for m = 5.
Figure 2. Block diagram of cell $R_i$.

B. Multiplier Design
The hardware architecture of the multiplier consists of an array and a counter, as shown in Figure 3. The counter generates the control signal Fin, which ends the multiplication after 164 clock cycles. The multiplier array computes the partial products. The multiplier has five inputs and two outputs.
The inputs are:
- En: enable signal for all the multiplier registers;
- clk: clock signal;
- load: signal that loads the operands;
- reset: signal that clears all the multiplier registers;
- A and B: the 163-bit operands.

The outputs are:
- C: the result;
- Fin: signal indicating that the multiplication is complete.

Figure 3. Block diagram of the multiplier.

IV. SIMULATION AND SYNTHESIS RESULTS
To verify the operation of the multiplier, several simulations were carried out over the finite field $GF(2^{163})$ using ModelSim Altera Starter Edition. Figure 4 presents the simulation results. These are compared with the results obtained functionally in Matlab, as shown in Table II. In this case, the input vector is A.

TABLE II. SIMULATION RESULTS FOR THE MULTIPLIER OVER $GF(2^{163})$
Operand   Value (hexadecimal)
A         1A02580C3AE101408D8A1829E72D1721968282A8B
A^2       3404B01875C202811B143053CE5A2E432D0505516
A^3       398C19165805858E6EA9764A669EC0A1B598E8370
A^4       68096030EB840502362860A79CB45C865A0A0AA2C

Figure 4. Simulation and verification results of Kwon's multiplier: (a) simulation result for A*A = A^2; (b) simulation result for A*A^2 = A^3; (c) simulation result for A*A^3 = A^4; (d) simulation result for A^2*A^2 = A^4.

To determine the relationship between the field size m and the area of the multiplier, Figure 5 shows the resources used for each binary field m, i.e., the number of ALUTs and registers used by the multiplier.
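Section III.B states that the counter ends each multiplication after 164 clock cycles. The paper does not give a formula for the operation times reported in Tables III and IV. Assuming they are simply 164 cycles at the reported $f_{max}$, a few lines of Python reproduce them to within rounding. `operation_time_ns` is a hypothetical helper.

```python
import math

CYCLES = 164  # latency of one multiplication, from Section III.B


def operation_time_ns(cycles, fmax_mhz):
    """Latency of a multi-cycle operation: cycles * clock period at fmax."""
    return cycles * 1e3 / fmax_mhz  # 1e3 / f[MHz] is the clock period in ns


# (fmax in MHz, reported operation time in ns) for this design.
reported = {
    "EP2S60F1020C3 (Table III)": (385.80, 425.088),
    "EP3S150F1152C2 (Table IV)": (509.42, 321.932),
}
for device, (fmax, t_ns) in reported.items():
    estimate = operation_time_ns(CYCLES, fmax)
    assert math.isclose(estimate, t_ns, rel_tol=1e-4), device
    print(f"{device}: {estimate:.3f} ns (reported {t_ns} ns)")
```

The small residual differences are consistent with the clock period having been rounded to a few decimals before being multiplied by 164.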
Figure 5. Relationship between the binary field m and the area of the multiplier (ALUTs and registers versus the binary field m of $GF(2^m)$).

Table III presents the synthesis results on the EP2S60F1020C3 FPGA. This FPGA was chosen so that the results could be compared with those of [7] and [10], because designs implemented with different tools and FPGAs cannot be compared directly. Our design is about twice as fast as [7] and about four times as fast as the designs of [10]. It also uses far fewer ALUTs (175, versus 903 to 1292), and so maintains a good area-speed trade-off.

TABLE III. SYNTHESIS RESULTS FOR THE MULTIPLIER OVER $GF(2^{163})$ USING THE EP2S60F1020C3 FPGA
Design          ALUTs   Total registers   Fmax (MHz)     Operation time
This paper      175     497               385.80         425.088 ns
[7]             903     652               Not reported   842.4 ns
[10] Design 1   974     Not reported      128.34         1.704 µs
[10] Design 2   1249    Not reported      178.13         1.67 µs
[10] Design 3   1292    Not reported      276.55         1.72 µs
[10] Design 4   1269    Not reported      386.12         1.74 µs

Table IV presents the synthesis results on the EP3S150F1152C2 FPGA. With this higher-performance FPGA, the operating speed improves considerably while the area stays the same: $f_{max}$ rises from 385.80 MHz to 509.42 MHz, and the operation time drops from 425.088 ns to 321.932 ns.

TABLE IV. SYNTHESIS RESULTS FOR THE MULTIPLIER OVER $GF(2^{163})$ USING THE EP3S150F1152C2 FPGA
ALUTs   Total registers   Fmax (MHz)   Operation time
175     497               509.42       321.932 ns

V. CONCLUSIONS
This paper presents the hardware implementation of the multiplication algorithm of [9] using normal bases over several binary fields m. The multiplier offers a good area-speed trade-off. It can be used in elliptic curve cryptosystems to support applications such as smart cards and mobile phones. The multipliers were synthesized on the EP2S60F1020C3 and EP3S150F1152C2 FPGAs using Altera Quartus II 11.0 SP1. The implementation was verified extensively with different test vectors, and the results were compared with those of Matlab scripts. The hardware and software implementations produced the same results.
Future work will focus on implementing new multiplication algorithms, in particular digit-level ones, and an elliptic curve cryptoprocessor over $GF(2^{163})$ and $GF(2^{233})$.

ACKNOWLEDGMENTS
This work was sponsored by Altera Corporation through its University Program.
F. A. Urbano-Molano thanks the Department of Telematics and the Telematics Engineering Group (GIT) of the Universidad del Cauca for providing the time needed to complete this project.

REFERENCES
[1] N. Koblitz, "Elliptic curve cryptosystems," Mathematics of Computation, vol. 48, no. 177, pp. 203-209, Jan. 1987.
[2] V. Trujillo, J. Velasco, and J. López, "Multiplicador en el cuerpo finito GF(2^163) usando bases normales gaussianas," Grupo de Bionanoelectrónica, EIEE, Universidad del Valle, in X Workshop Iberchip, vol. 1, fasc. 1, 2004, pp. 1002-1012.
[3] T. F. Al-Somani and A. Amin, "Hardware implementations of GF(2^m) arithmetic using normal basis," Journal of Applied Sciences, vol. 6, issue 6, 3rd ed., vol. 2. Asian Network for Scientific Information, 2006, pp. 1362-1372.
[4] I. S. Hsu, T. K. Truong, L. J. Deutsch, and I. S. Reed, "A comparison of VLSI architecture of finite field multipliers using dual, normal, or standard basis," IEEE Transactions on Computers, vol. 37, no. 6, pp. 735-739, June 1988.
[5] C. Paar and N. Lange, "A comparative VLSI synthesis of finite field multipliers," in Proceedings of the 3rd International Symposium on Communication Theory & Application, Lake District, UK, July 1995.
[6] A. Reyhani-Masoleh and M. A. Hasan, "A new construction of Massey-Omura parallel multiplier over GF(2^m)," IEEE Transactions on Computers, vol. 51, no. 5, pp. 511-520, May 2002.
[7] P. C. Realpe, V. Trujillo, and J. Medina, "Implementación de un multiplicador paralelo a nivel de dígito sobre GF(2^163) usando bases normales gaussianas," in XIII Workshop Iberchip, Perú, 2007, Editorial Hozlo Srl, vol. 1, fasc. 1, pp. 253-256.
[8] V. Trujillo, J. Medina, and J. López, "Design of polynomial basis multipliers over GF(2^233)," in XIII Workshop Iberchip, Perú, 2007, Editorial Hozlo Srl, vol. 1, fasc. 1, pp. 257-260.
[9] S. Kwon, K. Gaj, C. H. Kim, and C. P. Hong, "Efficient linear array for multiplication in GF(2^m) using normal basis for elliptic curve cryptography," in M. Joye and J.-J. Quisquater (Eds.): CHES 2004, LNCS 3156, pp. 76-91, 2004.
[10] V. Trujillo, J. Medina, and J. López, "Design of Gaussian normal and polynomial basis multipliers over GF(2^163)," in XII Workshop IBERCHIP, Costa Rica, 2006, vol. 1, fasc. 1, pp. 174-176.
Electromagnetic blooming by vectorial laser irradiation in semiconductive nanostructures
C. Torres-Torres
Sección de Estudios de Posgrado e Investigación, ESIME-Z
Instituto Politécnico Nacional
Mexico City, Mexico
crstorres@yahoo.com.mx
Abstract: This paper describes the modification of the irradiance profile of a photonic signal propagating through a typical low-dimensional semiconductor structure. A strong blooming effect is predicted. The effect is associated with the change of the absorptive and refractive nonlinearities when interference or consecutive high-intensity two-wave mixing interact in the sample.
I. INTRODUCTION
The outstanding features of low-dimensional semiconductor structures have motivated numerous studies aimed at improving their electrical, magnetic and optical properties. Various doped silicon materials are well known to be suitable platforms for implementing diverse electronic circuits [1]. Different schemes using advanced materials have been proposed for designing passive and active systems [2]. The development of optoelectronic configurations capable of controlling high-speed signals is currently making promising progress [3].
Microelectronic circuits based on nanostructures can handle more data in a shorter time than electronic circuits based on bulk materials. With the benefits of plasmonic effects, nanoelectronic systems also appear suitable for special ultrafast functions [4]. Silicon nanocrystals have been shown to exhibit an important nonlinear optical response [5]. There is also great interest in enhancing their optoelectronic properties for waveguiding [6], signal instrumentation [7], luminescence [8] and mechanical features [9]. Furthermore, the close relation between collective phenomena that involve the optical and electronic characteristics of silicon nanocomposites is a motivating subject of study.
In this work, we investigate how the irradiance profile of a beam is modified as it propagates through typical semiconductor nanoclusters embedded in a dielectric film with strong third-order optical nonlinearities. We present an interferometric technique, based on vectorial two-wave mixing, for modifying the beam profile.

II. MATHEMATICAL FORMULATION
The electric field of an optical wave can be represented as

$$E(x) = E_0(x)\exp(ikx), \tag{1}$$

where $E_0$ is the amplitude of the electric field, $k$ represents a component of the wave vector, $x$ indicates the position, and $\lambda$ is the optical wavelength of the wave. For a Gaussian beam, $E_0(x)$ is given by Eq. (2), whose expression is not legible in the source. In it, $\omega_x$ represents the full width at half maximum (FWHM) of the Gaussian profile, and $m_x$ is a constant that displaces the function.
Two coherent waves with the same frequency that interfere in a medium can be represented by their linear polarization components as

$$E_1 = E_{1x} + iE_{1y}, \tag{3}$$

$$E_2 = E_{2x} + iE_{2y}, \tag{4}$$

where $i = \sqrt{-1}$ accounts for the phase difference between the components, and $E_{jx}$ and $E_{jy}$ are the electric field components of wave $j$ along the x and y axes, respectively.
The total intensity that results from the interaction of the waves in the medium can be written as

$$I = 2n_0\left(\frac{\varepsilon_0}{\mu_0}\right)^{1/2}\left|E_1 + E_2\right|^2, \tag{6}$$

where $n_0$ is the linear refractive index of the medium, and $\varepsilon_0$ and $\mu_0$ are the vacuum permittivity and permeability. The radiation pressure $P$ resulting from the energy absorbed over a defined area $A$ is

$$P = \frac{I}{A}. \tag{7}$$
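Equation (6) can be evaluated directly for the configuration used later in Section III, where $E_2$ stays polarized along $x$ and the polarization of $E_1$ is rotated. The Python sketch below writes each wave as in (3)-(4), $E_j = E_{jx} + iE_{jy}$. It assumes real component amplitudes (no relative phase) and takes $n_0 = 1$ unless another value is given. The function names are hypothetical. With equal amplitudes, $|E_1 + E_2|^2 = 2 + 2\cos\theta$. The total intensity therefore drops from its maximum at $\theta = 0$ to half of that value at $\theta = 90°$.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity [F/m]
MU0 = 1.25663706212e-6   # vacuum permeability [H/m]


def wave(amplitude, theta_rad):
    """Eqs. (3)-(4): E = E_x + i E_y for a wave linearly polarized at angle theta."""
    return complex(amplitude * math.cos(theta_rad), amplitude * math.sin(theta_rad))


def total_intensity(e1, e2, n0=1.0):
    """Eq. (6): I = 2 n0 (eps0/mu0)^(1/2) |E1 + E2|^2 (fields in V/m, I in W/m^2)."""
    return 2 * n0 * math.sqrt(EPS0 / MU0) * abs(e1 + e2) ** 2


# E2 fixed along x; E1 rotated from 0 to 90 degrees (equal amplitudes, 1:1 ratio).
e2 = wave(1.0, 0.0)
i_max = total_intensity(wave(1.0, 0.0), e2)
for deg in range(0, 91, 10):
    ratio = total_intensity(wave(1.0, math.radians(deg)), e2) / i_max
    assert math.isclose(ratio, (1 + math.cos(math.radians(deg))) / 2)
```

This only covers the interference step. The nonlinear propagation of (8)-(16), which the paper uses for Figs. 2 and 3, is not modeled here.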
The circular components of the incident electric field $E$ can be expressed as

$$E = E_{+} + E_{-}. \tag{8}$$

Following the analysis of reference [10], this work investigates nonlinear optical absorption and refraction in a thin medium of thickness $D$, where two incident waves and their self-diffracted waves interact. Two induced gratings, $\Delta\alpha$ and $\Delta n$, generated by the incident waves $E_1$ and $E_2$, are considered. One grating results from nonlinear optical absorption. The other results from the birefringence induced by the optical Kerr effect as the waves interact in the medium.
Both gratings are expressed in Eqs. (9) and (10) in terms of the thickness $D$ and of the incident and self-diffracted waves $E_1$, $E_2$, $E_3$ and $E_4$; these expressions are not legible in the source. In them, $\beta$ is the nonlinear absorption coefficient and $\lambda$ is the wavelength. $A = \mathrm{Re}[6\chi_{1122}]$ and $B = \mathrm{Re}[6\chi_{1221}]$ are the components of the third-order susceptibility tensor $\chi^{(3)}$ for an isotropic material [11].
In media with high refractive and absorptive nonlinearities, the waves $E_3$ and $E_4$ contribute significantly to the interaction. In that case, the amplitude transmittance function $T(x,z)$ is given by Eq. (11), with the phase $\phi(x,z)$ given by Eq. (12); these expressions are not legible in the source. In our case, $\alpha(I) = \alpha_0 + \beta I$, where:
- $\alpha_0$ and $\beta$ are the linear and nonlinear absorption coefficients, respectively;
- $z$ is the propagation length of the wave through the nonlinear medium;
- $I$ is the total optical irradiance.

The electric fields of the transmitted and self-diffracted waves can be calculated from the Fourier transform of the product of the amplitude transmittance function $T(x)$ and the incident field $E$. The electric fields thus calculated are

$$E_1(z) = \left[E_{10}J_0(\phi^{(1)}) + i\left(E_{20} + E_{30}\right)J_1(\phi^{(1)}) - E_{40}J_2(\phi^{(1)})\right]\exp\left[i\phi^{(0)} - \alpha(I)z\right], \tag{13}$$

$$E_2(z) = \left[E_{20}J_0(\phi^{(1)}) + i\left(E_{40} + E_{10}\right)J_1(\phi^{(1)}) - E_{30}J_2(\phi^{(1)})\right]\exp\left[i\phi^{(0)} - \alpha(I)z\right], \tag{14}$$

where:
- $E_1(z)$ and $E_2(z)$ are the complex amplitudes of the circular components of the transmitted beams;
- $E_3(z)$ and $E_4(z)$ are the amplitudes of the self-diffracted waves;
- $E_{10}$, $E_{20}$, $E_{30}$ and $E_{40}$ are the amplitudes of the initially incident and self-diffracted waves at the surface of the sample;
- $J_m(\phi^{(1)})$ is the Bessel function of order $m$.

The phase increments $\phi^{(0)}$ and $\phi^{(1)}$ are given by Eqs. (15) and (16) in terms of $A$, $B$, $n_0$, $\lambda$, $z$ and the wave amplitudes $E_j$; these expressions are not legible in the source.

III. RESULTS AND DISCUSSION
Consider the interference of two Gaussian beams with linear polarization and a total intensity of $I = 2$ GW/cm². The intensity ratio of the two beams is 1:1, and $A = 1$ cm². The polarization of $E_2$ is fixed along the x axis, i.e., $E_{2y} = 0$. The polarization of $E_1$ is rotated with a half-wave plate for each numerical simulation. Fig. 1 shows 10 different interactions of these beams calculated with (1)-(7); each curve corresponds to a different polarization angle of $E_1$.

Fig. 1. Radiation pressure vs. angle of polarization vs. length of the sample (axes: radiation pressure [Pa]; angle of polarization [deg], from 0 to 90; length of the sample [cm]).
Calculations with (8)-(16) were performed using the numerical results of Fig. 1. They model a two-wave mixing produced by the beams propagating through silicon quantum dots with a typical nonlinear refractive index close to $n_2 = 1\times10^{-15}$ m²/W and $\beta = 1\times10^{-10}$ m/W. The change in the beam profile appears as a strong blooming, caused by the self-focusing effect and by the scattering associated with the absorptive response of the sample.

Fig. 2. Irradiance profile of the beams, before (above) and after (below) the nonlinear interaction (vertical axis: intensity [MW/m²]).

Figure 2 shows that the resulting optical blooming exceeds 10% of the irradiance obtained after propagation and is caused by self-focusing in the silicon quantum dots. This can be explained by the absorptive grating generated by the two-wave mixing, which induces birefringence, diffraction and scattering. These effects may be useful for the development of ultrafast signal processing. Fig. 3 shows the change in optical gain that can be expected for a third optical beam with 1 W of power interacting with the proposed two-wave mixing.

Fig. 3. Gain vs. power (axes: gain; beam power [W]).

IV. CONCLUSION
This work calculates the optical blooming produced by vectorial multi-wave mixing. Strong modifications in the propagation of optical waves were estimated. The results suggest that the nonlinear optical response of the sample allows vectorial two-wave mixing to manipulate the irradiance profile easily, and thereby to control the sample's optical and electronic response. The effects can be explained by the vectorial theory of multi-wave mixing, in which a multi-beam interference can modulate the nonlinear absorption and the nonlinear refractive index. Modifying the beam irradiance profile in this way has potential applications in the modulation, switching and filtering of optical signals by means of nonlinear optical effects.

ACKNOWLEDGMENT
This work was partially supported by IPN through grant SIP20120691, by COFAA-IPN, by ICyT-DF through grant PIUTE10-129, and by CONACyT through grant 82708.

REFERENCES
[1] R. Soref, R. E. Peale, and W. Buchwald, "Longwave plasmonics on doped silicon and silicides," Opt. Express 16, 6507-6514, 2008.
[2] T. Y. Liow, K. W. Ang, Q. Fang, J. F. Song, Y. Z. Xiong, M. B. Yu, G. Q. Lo, and D. L. Kwong, "Silicon modulators and germanium photodetectors on SOI: Monolithic integration, compatibility, and performance optimization," IEEE J. Sel. Top. Quantum Electron. 16(1), 307-315, 2010.
[3] S. Assefa, F. Xia, and Y. A. Vlasov, "Reinventing germanium avalanche photodetector for nanophotonic on-chip optical interconnects," Nature 464, 80-84, 2010.
[4] D. Li and C. Z. Ning, "All-semiconductor active plasmonic system in mid-infrared wavelengths," Opt. Express 19, 14594-14603, 2011.
[5] C. Torres-Torres, A. López-Suárez, L. Tamayo-Rivera, R. Rangel-Rojo, A. Crespo-Sosa, J. C. Alonso, and A. Oliver, "Thermo-optic effect and optical third order nonlinearity in nc-Si embedded in a silicon-nitride film," Opt. Express 16, 18390-18398, 2008.
[6] Gong-Ru Lin, Cheng-Wei Lian, Chung-Lun Wu, and Yung-Hsiang Lin, "Gain analysis of optically-pumped Si nanocrystal waveguide amplifiers on silicon substrate," Opt. Express 18, 9213-9219, 2010.
[7] A. Martínez, J. Blasco, P. Sanchis, J. V. Galán, J. García-Rupérez, E. Jordana, P. Gautier, Y. Lebour, S. Hernández, R. Spano, R. Guider, N. Daldosso, B. Garrido, J. Marc Fedeli, L. Pavesi, and J. Martí, "Ultrafast all-optical switching in a silicon-nanocrystal-based silicon slot waveguide at telecom wavelengths," Nano Lett. 10, 1506-1511, 2010.
[8] D. Timmerman, J. Valenta, K. Dohnalová, W. D. A. M. de Boer, and T. Gregorkiewicz, "Step-like enhancement of luminescence quantum yield of silicon nanocrystals," Nature Nanotechnology 6, 710-713, 2011.
[9] C. Torres-Torres, A. López-Suárez, R. Torres-Martínez, A. Rodriguez, J. A. Reyes-Esqueda, L. Castañeda, J. C. Alonso, and A. Oliver, "Modulation of the propagation speed of mechanical waves in silicon quantum dots embedded in a silicon-nitride film," Opt. Express, in press, 2012.
[10] Y. Zhu, F. Zhang, J. Yang, H. Zheng, and F. Yang, "Stability of mechanical properties for submicrometer single-crystal silicon cantilever under cyclic load," J. Microelectromech. Syst. 20, 178-183, 2011.
[11] R. W. Boyd, Nonlinear Optics, Academic Press, San Diego, 1992.
Experiences Teaching a Rapid System Prototyping Class
Andres F. Marquez, Ramiro Jordan, and Wilfrido A. Moreno
Abstract: This paper describes the lessons learned from
teaching the rapid system prototyping class at the University
of South Florida and the computer logic design class at the
University of New Mexico. In these classes, students are required to
implement several assignments on a Xilinx Spartan-3E FPGA
using VHDL and the vendor's ISE design tools. The main
challenge encountered in the class is getting the students to think
in terms of hardware and circuit design as opposed to sequential
programming. This document describes the structure of the class,
the hardware and software development tools, the kinds of
assignments given to the students, several functional final projects,
and ideas for future improvements.
I. INTRODUCTION
Rapid system prototyping has evolved into a complex multidisciplinary area involving many layers of system design. Not
only hardware and software skills, but also appropriate system
specification and modeling, are required to implement a working prototype that satisfies the original design requirements.
Embedded systems are currently used in a wide variety of
industries, which confirms their importance. As system complexity increases, design tools and debugging techniques must
evolve to keep up with strict time-to-market and performance
needs [1]. This paper describes the rapid system prototyping
course taught at the University of South Florida and the
computer logic design class taught at the University of New Mexico.
It is designed to provide the student
with theoretical concepts as well as practical system design
and implementation. From beginning to end, the student is
encouraged to understand each topic in terms of its place
within the whole system, which helps clarify its specific function
and interactions. Many evaluation boards are available from
different vendors with different sets of peripherals, some
of them even tailored to specific markets (what Xilinx refers
to as targeted platforms). We selected Xilinx's Spartan-3E
Starter Kit because of the wide variety of peripherals it includes,
such as ROM, RAM, an LCD, LEDs, VGA, and a UART, which give
the student and the professor great flexibility for project
assignments and examples. FPGA-based design on Xilinx's
Spartan-3E Starter Kit, shown in Fig. 1, using the Xilinx ISE
design tools makes it possible to present the student
with many capabilities in terms of design entry, simulation,
synthesis, and hardware programming (Fig. 2). The paper is
organized as follows. Section II describes the class structure and
assignments in detail. Hardware and software
development tools are presented in Section III. Section IV
describes some of the projects that students have implemented
in the past. Finally, conclusions and future work
are given in Section V.
Fig. 1. Xilinx Spartan-3E Starter Kit [2].
II. CLASS STRUCTURE

This course takes a system approach to rapid system
prototyping using VHDL and field programmable gate arrays (FPGAs). A primary objective of the class is to move from
knowledge of VHDL in the software domain to the practical implementation and testing of VHDL-based designs in hardware.
Two textbooks by Chu are used extensively and recommended [3],
[4]. They are very detailed references for FPGA synthesis
using VHDL, include a considerable amount of example code
as well as interesting suggested problems, and
cover most of the peripherals on the Spartan-3E Starter Kit. The
three chapters on PicoBlaze are extremely useful as an
introduction to microprocessor architecture and embedded systems design. Students have found this part of the class exciting
because, by the time these topics are covered, they have a much
broader understanding of logic design and synthesis. Additionally, many resources are provided during the course, including
design tool tutorials (Xilinx Synthesis Tool, ISim simulator,
floorplanning), VHDL guides, component datasheets, and vendor
websites, to further reinforce the material
covered in class. Rapid prototyping using VHDL, logic
synthesis, and in-circuit programmable FPGAs allows the implementation of complex system architectures that previously
required an extended design cycle. The course familiarizes the
students with a broad range of topics critical to rapid system
prototyping, including system design, design approach, clear
coding techniques, model architecture selection, code reuse,
code documentation, synthesis and debug, timing analysis
(Fig. 3), simulation, and downloading the design to a prototype
board for debug and functional verification [5]. Students are
required to read the starter kit user guide and the ISE quick
start tutorial in full and to follow their instructions, as their
first steps with the Xilinx design tools and the board. There
is no substitute for reading the manuals: there are simply
many tools and configurations that must be understood [6].
Fig. 2. FPGA design flow [7].
Fig. 3. VHDL structure [9].
The main objectives of the class are:
• System prototyping using VHDL and FPGAs.
• PLD, CPLD, and FPGA architecture overview.
• System design, simulation, and implementation using the
Xilinx software and a Spartan-3E development kit.
• Gate-level combinational design.
• Register transfer level design.
• Regular sequential circuit design.
• Industry trends in FPGAs and prototyping.
The course is designed to introduce all aspects of system
design so that students gain an understanding of the
development process as a whole and not just of a programming
language (a systems engineering approach). First, students are
presented with an introduction to VLSI and digital logic design,
which provides the foundations on which system prototyping
rests. Programmable logic architectures are described to
show the evolution path that brought FPGAs to wide
market penetration across all kinds of industries. Gate-level,
behavioral, and structural design in VHDL are presented as
techniques for describing the hardware. Schematic
and code design entry are presented and contrasted to
show their advantages and disadvantages depending on system
complexity. VHDL language structure, simulation, and synthesis provide the means for system prototyping, simulation, and
verification. Finite state machine design is explained along
with sequential and synchronous systems (Fig. 4). Finally,
hardware/software codesign is explained through the use of
programmable logic and the PicoBlaze soft microcontroller,
which can be implemented in the Spartan-3E FPGA. At this
point, the student has a deep understanding of FPGA capabilities,
which helps when starting assembly programming for PicoBlaze
(Fig. 5) and learning how to interface it to programmable logic
in the FPGA. This section is considered very important since
it provides the foundations for making smart decisions as to
which parts of the system will be implemented in software
(executing on a microcontroller) and which will be implemented
in hardware (digital circuit design). The PicoBlaze section is
the students' favorite. It introduces computer
architecture and microprocessor-based system design concepts. Depending on the students' areas of interest, useful
examples related to different kinds of applications are provided
to further clarify the concepts and their practical implementation. In the past, we have been able to arrange visits from
engineers working in the hardware industry in system design
and tool development to give talks to the class. Student
participation and interaction have been outstanding, indicating
strong interest in these technologies. The Xilinx University Program
was included in the curriculum [8]. It provides high-quality
teaching materials such as concise presentations and
well-organized source code. Xilinx also offers many
online training videos on FPGA design covering topics
such as ASIC conversion, FPGA architecture, FPGA design,
FPGA power, PlanAhead, embedded design, DSP design, and
HDL coding.
At the University of New Mexico, the computer
logic design class is the introductory course on Boolean
algebra and digital logic. It has a laboratory component in which
students use VHDL to implement, simulate, and analyze basic
asynchronous and sequential circuits. The laboratory is based
on Xilinx's Spartan-3E Starter Kit. Currently, we are
working with the Albuquerque Public Schools
system to implement similar labs in selected high schools,
where students can take this class as well as Calculus I, Physics
I, programming fundamentals, intermediate programming, and
engineering problem solving. The intent is that these
classes be completed over 32 weeks, as opposed to the 16
weeks taken at the University, while still earning the credits. Thus,
the goals are to shorten ECE students' time to degree by one
semester, raise the number of ECE students,
increase retention, and save money for the State of New Mexico.
III. DEVELOPMENT TOOLS

The Xilinx ISE design tool is used for FPGA-based design with
VHDL. Xilinx ISE supports the synthesis and analysis of HDL
designs, enabling the developer to synthesize a design, analyze its timing
performance, examine RTL diagrams, simulate the system, and
configure the target device. The student is presented with all the
design phases described below [10]:
• Design creation: manage source files, design modules,
netlists, schematics, intellectual property (IP), embedded
processors, and DSP modules.
• Synthesis: transform HDL sources into an architecture-specific design netlist.
• Simulation: verify the functionality of the design based on
modifiable stimuli.
• Constraints entry: specify timing, placement, and I/O pin
planning.
• Implementation: convert the logical design into a physical file
format to be downloaded to the target device.
• Implementation analysis: analyze performance against
constraints, device resource utilization, timing performance, power utilization, and in-system debugging.
• Device configuration: configure the FPGA with the generated programming file.
Fig. 4. Synchronous circuit design [3].
Hardware and software tools from Altium have been acquired to further extend the coverage of electronic systems design (Fig. 6). Altium combines FPGA design, PCB design (in a 3D
environment), and embedded software development within a
single application (Fig. 7). Altium also offers reconfigurable
hardware platforms that allow interactive, vendor-independent
implementations. The tool integrates the following stages of embedded systems design [11]:
• Schematic entry.
• PCB-level design (FPGA-PCB codesign).
• Mixed-signal simulation.
• Signal integrity analysis.
• Release to manufacture.
• Software system design.
• Graphical design capture.
• Embedded design.
• C to hardware.
• Unified data model.
• Component management.
Altium Designer makes it possible to develop both the
hardware and the software for an FPGA-based embedded system
using supplied software and hardware IP components. Compilers and debuggers are included for popular processors such
as Xilinx MicroBlaze, Altera Nios II, ARM, PowerPC, and
Altium's TSK3000 soft processor. One of the advantages of
FPGA technology is the performance gain obtained by porting
embedded C code executed on a CPU to fast programmable
hardware. The tool has a C-to-hardware compiler that generates the required hardware implementation. Altium's hardware
prototyping platforms (NanoBoards) were also obtained to
evaluate the tool. Hands-on teaching materials are currently
being developed to give the student a more interactive
experience.
Fig. 5. PicoBlaze microprocessor datapath [12].
Fig. 6. Altium hardware prototyping [11].
Concepts are explained based on the professor's own experience and the literature. Specific examples guide the student
in how to think about and organize ideas to solve a problem.
Assignments are designed to give the
students the opportunity to tackle problems by themselves
and go through the whole design process to find a working
solution (problem description, system requirements, functional
requirements, architecture definition, system implementation).
Some of the assignments prepared in the past are listed
below:
• Stack design.
• FIFO queue.
• UART transceiver.
• Calculator combining PicoBlaze and programmable logic
to control input and output ports.
• Central processing unit.
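The FIFO queue assignment illustrates the hardware mindset the class aims for: instead of a software queue object, the design is a register file plus read and write pointers that update on a clock edge. Below is a minimal cycle-level sketch in Python (not the VHDL the students write). The extra wrap bit on each pointer, the policy of dropping writes when the FIFO is full, and all names are illustrative assumptions, not the course's reference design.

```python
class SyncFifo:
    """Cycle-level model of a synchronous FIFO backed by a small register file.

    Read and write pointers are one bit wider than the address, so the
    extra (wrap) bit tells "full" apart from "empty" without a counter.
    """

    def __init__(self, addr_bits: int):
        self.depth = 1 << addr_bits                 # number of storage words
        self.mem = [0] * self.depth                 # register file contents
        self.ptr_mask = (1 << (addr_bits + 1)) - 1  # pointers are addr_bits + 1 wide
        self.wr_ptr = 0
        self.rd_ptr = 0

    @property
    def empty(self) -> bool:
        # Same address and same wrap bit.
        return self.wr_ptr == self.rd_ptr

    @property
    def full(self) -> bool:
        # Same address but opposite wrap bits.
        return (self.wr_ptr ^ self.rd_ptr) == self.depth

    def clock(self, wr: bool, rd: bool, w_data: int = 0):
        """Apply one rising clock edge; return the word read, or None."""
        # Status flags are sampled before the edge, like registered outputs.
        do_rd = rd and not self.empty
        do_wr = wr and not self.full   # assumed policy: writes to a full FIFO are dropped
        r_data = None
        if do_rd:
            r_data = self.mem[self.rd_ptr & (self.depth - 1)]
            self.rd_ptr = (self.rd_ptr + 1) & self.ptr_mask
        if do_wr:
            self.mem[self.wr_ptr & (self.depth - 1)] = w_data
            self.wr_ptr = (self.wr_ptr + 1) & self.ptr_mask
        return r_data


if __name__ == "__main__":
    fifo = SyncFifo(addr_bits=2)        # 4-word FIFO
    for word in (10, 20, 30, 40, 50):   # the fifth write is dropped (FIFO full)
        fifo.clock(wr=True, rd=False, w_data=word)
    print([fifo.clock(wr=False, rd=True) for _ in range(5)])  # prints [10, 20, 30, 40, None]
```

Calling `clock` once per edge, with the flags evaluated before any pointer moves, mirrors how registered status outputs behave in hardware. A model like this can serve as a reference when checking ISim waveforms.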
It is gratifying to see students assimilate concepts more clearly
once they have a working system and have gone through
the analysis of how to design the circuit. Implementing a concept seems to make it easier to remember and understand
than just reading its explanation in a book. Students
find this combination of theory and practice very useful.
Fig. 7. Altium 3D board visualization [11].
IV. FINAL AND RESEARCH PROJECTS
During the semester, students are encouraged to pick a
research project in an area of specific interest to them, in which
they must find and describe the current state of the art of FPGA
technology. Their research must be presented to the class at
the end of the term. Ideally, topics address current industry
and academic needs. Similarly, each student is required to complete
a final project, which must be implemented on the board. The
purpose of the research project is to motivate students to
find information from academia and industry specific to the
chosen topic. Students are strongly encouraged to visit the websites
of manufacturers, tool designers, and system integrators
to obtain information in the form of webinars, white papers,
and blogs. The following research projects have been proposed by
students in the past:
• FPGA testing and diagnosis.
• SystemC as an alternative to VHDL and Verilog.
• FPGA partial reconfiguration.
• Hardware-software codesign.
• Stepper motor control.
Some of the final projects implemented by students on
the prototyping board in previous years are listed below:
• Viterbi decoder (Fig. 8).
• Traffic lights system simulation using VGA (Fig. 9).
• Single-axis positioner.
• Three-phase variable voltage variable frequency (VVVF) motor
controller (Fig. 10).
Fig. 8. Viterbi decoder datapath.
Fig. 9. Traffic lights system states.
Fig. 10. PWM signal generation for three-phase VVVF motor controller.
It has been found that students with some
industry experience propose interesting projects that are likely to be
finished within the available time frame of the class. On the
other hand, students who have only been in academia during
their studies find it difficult to define the scope and
application of their designs. The authors strongly believe that
classes with practical experience such as the one described in
this paper are very beneficial for engineering studies, since they
provide a meaningful way of understanding and remembering
the concepts.
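The traffic lights project (Fig. 9) is a classic Moore finite state machine: a state register, next-state logic, and outputs that depend only on the current state. The Python sketch below models that structure with a dwell-time counter. The four states, their durations, and all names are hypothetical, since the actual states in Fig. 9 are not reproduced in the text.

```python
from enum import Enum


class State(Enum):
    NS_GREEN = 0    # north-south green, east-west red
    NS_YELLOW = 1
    EW_GREEN = 2
    EW_YELLOW = 3


# Hypothetical dwell times, counted in ticks of a slow clock enable (e.g. 1 Hz).
DWELL = {State.NS_GREEN: 5, State.NS_YELLOW: 2,
         State.EW_GREEN: 5, State.EW_YELLOW: 2}

# Next-state function: a fixed cyclic sequence.
NEXT = {State.NS_GREEN: State.NS_YELLOW, State.NS_YELLOW: State.EW_GREEN,
        State.EW_GREEN: State.EW_YELLOW, State.EW_YELLOW: State.NS_GREEN}

# Moore outputs depend only on the current state: (north-south, east-west).
LAMPS = {State.NS_GREEN: ("green", "red"), State.NS_YELLOW: ("yellow", "red"),
         State.EW_GREEN: ("red", "green"), State.EW_YELLOW: ("red", "yellow")}


class TrafficLight:
    def __init__(self):
        self.state = State.NS_GREEN   # reset state
        self.count = 0                # ticks already spent in the current state

    def tick(self):
        """One clock-enable tick: advance the timer or move to the next state."""
        if self.count == DWELL[self.state] - 1:
            self.state = NEXT[self.state]
            self.count = 0
        else:
            self.count += 1

    def lamps(self):
        return LAMPS[self.state]


if __name__ == "__main__":
    light = TrafficLight()
    for t in range(15):
        print(t, light.state.name, light.lamps())
        light.tick()
```

Keeping the dwell times and the transition table as data makes it easy to change the timing without touching the control logic. In VHDL, the same split typically becomes a clocked process for the state and counter registers plus combinational next-state and output logic.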
V. CONCLUSIONS
The rapid system prototyping class has been very successful at giving students an understanding of FPGA-based
system design. Based on open discussions with the students, the class
structure and resources have been adapted over the years to
improve interaction and the learning experience. Former students'
recommendations of the class to their peers confirm their satisfaction with, and interest in, the course.
A mix of theoretical knowledge of concepts and practical
implementation is fundamental to building a complete and clear
system perspective.
The most difficult part of the course is the evaluation. So many
concepts from cross-disciplinary areas are involved that
it is difficult to judge fairly how well each student has assimilated them.
There is a great deal of documentation available for every layer
of system design (language tutorials, component datasheets,
design tool and instrument documentation), which makes it hard to
cover all the topics fully within the limited time frame.
Current work is focused on creating more laboratories in
which students are presented with phases of system prototyping such as schematic entry for board design, routing, layout,
and manufacturing.
At the University of New Mexico, work is in progress
with the Albuquerque Public Schools system to implement
laboratories based on Xilinx's Spartan-3E Starter Kit in selected
high schools.
REFERENCES
[1] D. Kim, S. Kim, S. Kim, and S. Park, "Software engineering education
toolkit for embedded software architecture design methodology using
robotic systems," 2008 15th Asia-Pacific Software Engineering Conference, pp. 317-324, 2008.
[2] Xilinx, "Spartan-3E starter kit," http://www.xilinx.com/products/boards-and-kits/HW-SPAR3E-SK-US-G.htm, July 2011.
[3] P. P. Chu, FPGA Prototyping by VHDL Examples. Wiley-Interscience,
first ed., 2008.
[4] P. P. Chu, RTL Hardware Design Using VHDL. Wiley-Interscience,
first ed., 2006.
[5] R. C. Cofer and B. F. Harding, Rapid System Prototyping with FPGAs:
Accelerating the Design Process. Newnes, first ed., 2006.
[6] S. A. Edwards, "Experiences teaching an FPGA-based embedded
systems class," http://www.cs.columbia.edu/~sedwards/papers/edwards2005experiences.pdf, July 2011.
[7] Xilinx, "FPGA design flow overview,"
http://www.xilinx.com/itp/xilinx8/help/iseguide/html/ise_fpga_design_flow_overview.htm, July 2011.
[8] Xilinx, "Xilinx university program," http://www.xilinx.com/university/,
July 2011.
[9] DARPA, "Rapid prototyping of application specific signal processors
(RASSP)," http://www.eda.org/rassp/, July 2011.
[10] Xilinx, "ISE design flow overview,"
http://www.xilinx.com/support/documentation/sw_manuals/xilinx11/ise_c_fpga_design_flow_overview.htm, July 2011.
[11] Altium, "Altium Designer," http://products.live.altium.com/, July 2011.
[12] Xilinx, "PicoBlaze 8-bit embedded microcontroller user guide,"
http://www.xilinx.com/support/documentation/ip_documentation/ug129.pdf, July 2011.
Circuit for Fractional-Root Extraction Using Transistors in Weak Inversion
Mauricio Huixtlaca-Quintana1,2, José Miguel Rocha-Pérez1,
Alejandro Díaz-Sánchez1, Carlos Muñiz-Montero2
1 Departamento de Electrónica, Instituto Nacional de
Astrofísica, Óptica y Electrónica, Luis Enrique Erro #1,
Tonantzintla, Puebla, México
2 Laboratorio de Sistemas Embebidos, Centro de
Investigación en Computación, Instituto Politécnico Nacional
Email: carlosmm2k@gmail.com
1 Departamento de Electrónica, Instituto Nacional de
Astrofísica, Óptica y Electrónica, Luis Enrique Erro #1
2 Facultad de Electrónica, Benemérita Universidad Autónoma
de Puebla, Av. San Claudio y 18 Sur, Puebla, México
Abstract: This work presents the analysis, design, and measurement of a circuit that computes a fractional root by exploiting the low power consumption and the exponential characteristic of MOSFET transistors in the subthreshold region. The Tanner tools were used for simulation, circuit design, and layout generation, with the parameters of the ON Semi 0.5 µm technology.
Keywords: Low power, analog processing, floating gates, weak inversion.
I. INTRODUCTION
Integrated circuits for analog computation are building blocks with many applications in fields as diverse as analog signal processing, fuzzy systems, and neural networks. Their most important characteristics are simplicity, high speed, area efficiency, and low power consumption. The basic analog operations are addition/subtraction, multiplication, division, squaring, root extraction, and exponentiation. They can be performed in voltage mode [1, 2] and/or current mode [3, 4], using techniques such as floating gates and the translinear principle, in the various operating regions of the MOS transistor.
This work presents a generalized analog circuit for fractional-root extraction that uses the weak-inversion region of the MOS transistor. The article is organized as follows. First, the transistor equations in the region of interest are briefly discussed and the operation of the proposed circuit is described. Section III shows the simulation results for different fractional roots and the dimensions proposed for the layout presented in Section IV. Section V presents the measurement results for each of the designed blocks. Finally, Section VI outlines the conclusions.
II. SUBTHRESHOLD CURRENT
Consider the model of a transistor in the weak-inversion region. The drain current is
$$I_D = I_0 \exp\left(\frac{V_{GS}}{\eta V_T}\right) \qquad (1)$$
where $I_0$ is the weak-inversion current factor, $\eta$ is the non-ideality factor, $V_{GS}$ is the gate-to-source voltage, and $V_T$ is the thermal voltage. This current flows when $V_{GS}$ is below the threshold voltage.
Solving the previous equation for $V_{GS}$ gives
$$V_{GS} = \eta V_T \ln\left(\frac{I_D}{I_0}\right) \qquad (2)$$
Figure 1 shows a floating-gate transistor, whose gate voltage is given by [5]:
$$V_{FG} = \frac{\sum_{i=1}^{N} C_i V_i + C_{gd} V_D + C_{gs} V_S + Q_{fg}}{C_T}, \qquad C_T = \sum_{i=1}^{N} C_i + C_{gd} + C_{gs} \qquad (3)$$
Figure 1. Floating-gate transistor.
Here $C_i$ are the input capacitances, $C_{gd}$ and $C_{gs}$ are the parasitic gate-drain and gate-source capacitances, respectively, and $Q_{fg}$ is the charge trapped on the gate. If the parasitic capacitances are neglected by making the $C_i$ much larger than them, and the trapped charge is assumed to be zero, the voltage at node x can be written as follows for N = 2 inputs:
$$V_x = \frac{C_1 V_1 + C_2 V_2}{C_T} \qquad (4)$$
where
$$C_T = C_1 + C_2 \qquad (4a)$$
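Equation (3) is a capacitive charge-sharing relation, and (4) is its special case without parasitics or trapped charge. The short Python sketch below evaluates both forms to show how small parasitic capacitances shift the floating-gate voltage. It assumes the standard charge-sharing form of (3), and the capacitance and voltage values are hypothetical, not taken from the fabricated circuit.

```python
def floating_gate_voltage(caps, volts, c_gd=0.0, v_d=0.0, c_gs=0.0, v_s=0.0, q_fg=0.0):
    """Charge-sharing form of Eq. (3): capacitance-weighted average of the coupled voltages."""
    if len(caps) != len(volts):
        raise ValueError("need one voltage per input capacitor")
    c_t = sum(caps) + c_gd + c_gs                      # total capacitance C_T
    charge = sum(c * v for c, v in zip(caps, volts))   # sum of C_i * V_i
    charge += c_gd * v_d + c_gs * v_s + q_fg           # parasitic coupling and trapped charge
    return charge / c_t


if __name__ == "__main__":
    caps = [3e-12, 6e-12]   # hypothetical C1 = 3 pF, C2 = 6 pF
    volts = [0.60, 0.30]    # hypothetical input voltages V1, V2 (volts)
    ideal = floating_gate_voltage(caps, volts)   # Eq. (4): (C1 V1 + C2 V2) / (C1 + C2)
    with_parasitics = floating_gate_voltage(
        caps, volts, c_gd=20e-15, v_d=1.65, c_gs=20e-15, v_s=0.0)  # hypothetical 20 fF parasitics
    print(ideal, with_parasitics)
```

With input capacitors in the picofarad range, tens of femtofarads of parasitic coupling move the floating-gate voltage by only a few millivolts. This is the reasoning behind choosing the $C_i$ much larger than $C_{gd}$ and $C_{gs}$.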
On the other hand, consider a diode-connected transistor in weak inversion (Figure 2). The voltage generated at its gate is
$$V_G = \eta V_T \ln\left(\frac{I_D}{I_0}\right) \qquad (5)$$
Figure 2. Diode-connected transistor.
The proposed fractional-root circuit can be expressed in terms of the two preceding blocks, assuming that all transistors operate in weak inversion (Figure 3) [6].
Figure 3. Circuit for nth-root extraction.
For this circuit, let $I_1$ and $I_2$ be the currents of the diode-connected transistors whose gates drive $C_1$ and $C_2$. By (5), we can write
$$V_1 = \eta V_T \ln\left(\frac{I_1}{I_0}\right), \qquad V_2 = \eta V_T \ln\left(\frac{I_2}{I_0}\right) \qquad (6)$$
From (4), the voltage at the floating gate (node x) of the output transistor is
$$V_x = \frac{C_1}{C_T} V_1 + \frac{C_2}{C_T} V_2 \qquad (7)$$
Assuming that the output transistor is also in weak inversion, (2) gives
$$V_x = \eta V_T \ln\left(\frac{I_{out}}{I_0}\right) \qquad (8)$$
Equating (7) and (8) and substituting $V_1$ and $V_2$ from (6) gives
$$\eta V_T \ln\left(\frac{I_{out}}{I_0}\right) = \frac{C_1}{C_T}\,\eta V_T \ln\left(\frac{I_1}{I_0}\right) + \frac{C_2}{C_T}\,\eta V_T \ln\left(\frac{I_2}{I_0}\right) \qquad (9)$$
Simplifying:
$$\ln\left(\frac{I_{out}}{I_0}\right) = \frac{C_1}{C_T} \ln\left(\frac{I_1}{I_0}\right) + \frac{C_2}{C_T} \ln\left(\frac{I_2}{I_0}\right) \qquad (10)$$
Applying the properties of logarithms:
$$\ln\left(\frac{I_{out}}{I_0}\right) = \ln\left[\left(\frac{I_1}{I_0}\right)^{C_1/C_T} \left(\frac{I_2}{I_0}\right)^{C_2/C_T}\right] \qquad (11)$$
Solving for $I_{out}$, and noting from (4a) that $C_1/C_T + C_2/C_T = 1$, the $I_0$ terms cancel:
$$I_{out} = I_1^{C_1/C_T}\, I_2^{C_2/C_T} \qquad (12)$$
Equation (12) can be discussed in terms of the following cases.
Case 1: Let $C$ be a unit capacitor and suppose that $C_1$ is n times larger than $C_2$, that is, $C_2 = C$ and $C_1 = nC$. Then
$$C_T = C_1 + C_2 = C(n+1) \qquad (13)$$
$$\frac{C_1}{C_T} = \frac{n}{n+1} \qquad (14)$$
$$\frac{C_2}{C_T} = \frac{1}{n+1} \qquad (15)$$
From the above, we obtain
$$I_{out} = I_1^{n/(n+1)}\, I_2^{1/(n+1)} \qquad (16)$$
Case 2: Consider the opposite case, $C_1 = C$ and $C_2 = nC$:
$$\frac{C_1}{C_T} = \frac{1}{n+1}, \qquad \frac{C_2}{C_T} = \frac{n}{n+1} \qquad (17)$$
so that
$$I_{out} = I_1^{1/(n+1)}\, I_2^{n/(n+1)} \qquad (18)$$
From (16) and (18), the drain current of the output transistor is a fractional root of the input current multiplied by a constant, where one of $I_1$, $I_2$ is the input current $I_{in}$ and the other is a constant bias current. In both cases, $C_1$ and $C_2$ were taken as multiples of a unit capacitor. If this restriction is removed, however, the exponent of $I_{in}$, which is set by the capacitor ratio, can take any value. This gives a large number of possibilities for the exponent.
III. SIMULATION
Using the schematic of Figure 3, with the transistors biased in weak inversion, the following curves were obtained for different values of $C_1$ and $C_2$.
Figure 4. Simulation results for different values of the exponent.
Figure 4 shows the curves for the square, cube, and fourth roots, as well as for fractional exponents obtained by removing the unit-capacitance restriction, namely $I_{in}^{1/2.5}$ and $I_{in}^{1/3.5}$. As can be seen, the circuit is able to obtain the fractional root of the input current.
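The ideal behavior behind these curves can be checked with a behavioral model that chains (2), (4), and (1). Two diode-connected transistors convert currents into gate voltages, the floating gate averages those voltages with weights $C_1/C_T$ and $C_2/C_T$, and the output transistor converts the result back into a current. The Python sketch below makes several assumptions: ideal devices with no parasitics or trapped charge, hypothetical values for $\eta$ and $I_0$, and the input current flowing in the branch coupled through $C_1$ while a constant current $I_B$ flows in the $C_2$ branch. It is not the Tanner simulation used by the authors.

```python
import math

V_T = 0.0259   # thermal voltage near 300 K (volts)
ETA = 1.5      # hypothetical non-ideality factor (eta)
I_0 = 1e-15    # hypothetical weak-inversion current factor (amperes)


def gate_voltage(i_d):
    """Eq. (2)/(5): gate voltage of a diode-connected transistor carrying i_d."""
    return ETA * V_T * math.log(i_d / I_0)


def drain_current(v_gs):
    """Eq. (1): drain current of a transistor in weak inversion."""
    return I_0 * math.exp(v_gs / (ETA * V_T))


def root_circuit(i_1, i_2, c_1, c_2):
    """Ideal output current of the two-input floating-gate circuit.

    The floating gate averages the two diode voltages with weights C1/CT and
    C2/CT (Eq. 4); the output transistor turns V_x back into a current.
    """
    c_t = c_1 + c_2
    v_x = (c_1 * gate_voltage(i_1) + c_2 * gate_voltage(i_2)) / c_t
    return drain_current(v_x)


if __name__ == "__main__":
    C = 3e-12       # unit capacitor (3 pF)
    I_B = 500e-9    # constant current on the C2 branch (assumed)
    for n in (1, 2, 3):                     # C2 = nC: square, cube, fourth root
        for i_in in (50e-9, 200e-9, 450e-9):
            i_out = root_circuit(i_in, I_B, C, n * C)
            closed_form = i_in ** (1 / (n + 1)) * I_B ** (n / (n + 1))
            print(f"n={n} I_in={i_in:.1e} A  I_out={i_out:.4e} A  closed form={closed_form:.4e} A")
```

Because $\eta$, $V_T$, and $I_0$ cancel in the final expression, the model matches the closed form $I_{in}^{1/(n+1)} I_B^{n/(n+1)}$ for any choice of these constants. In this ideal picture, the exponent depends only on the capacitor ratio.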
IV. LAYOUT
The layout of the circuit in Figure 5 was drawn for Case 2 of Section II and for three values of C2 = 3 pF, 6 pF, and 9 pF. Figure 6 shows a partial layout of a single block; for space reasons, the other two blocks are not shown.
Figure 5. Details of the circuit to be fabricated.
Table 1 lists the dimensions of the transistors used in the simulation of Figure 4, with the parameters of the ON Semi 0.5 µm technology.
Table 1
Transistor    M1       M2       M3       M4
              1.2 µm   1.2 µm   1.2 µm   2.1 µm
              15 µm    15 µm    15 µm    15 µm
Figure 5 shows additional details of the circuit to be fabricated. Transistors M1 and M2 operate in weak inversion; M1 is biased with a current Ib of 500 nA, while M2 is driven by a sawtooth current source with a frequency of 200 Hz and a peak amplitude of 500 nA. The capacitor nC takes values of 3 pF, 6 pF, and 9 pF, while the unit capacitor is 3 pF. Because of the charge trapped on the floating gate, an off P-type transistor was added to set an operating point on the gate [7], with Vs = 0.58 V. Finally, to obtain a voltage-mode response, a 1 MΩ resistor connected to Vdd = 1.65 V was added. This makes it possible to measure both the drain voltage of transistor M3 and the output current, which is the fractional-root function, so that these signals can be compared with the measurements after fabrication. Both the bias currents and the bias voltages depended on the predefined transistor dimensions, chosen so that layout techniques for reducing mismatch error [9], such as interdigitation, could be applied.
Figure 6. Circuit layout.
The dimensions of the active area are 90 µm × 38 µm.
Figure 6 shows three sections, corresponding to the interdigitated N-type transistors, the off P-type transistor, and (the largest) the capacitors C and nC.
Figure 7 shows the image of the chip, in which the three blocks are highlighted in yellow. The areas corresponding to the square, cube, and fourth roots, with capacitors of 3 pF, 6 pF, and 9 pF, respectively, can be seen in the lower corners.
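With the input coupled through the unit capacitor $C_1 = C$ and $C_2 = nC$ (Case 2), the root index is $C_T/C_1 = n + 1$. The Python arithmetic below reproduces the square, cube, and fourth roots for the fabricated 3, 6, and 9 pF capacitors with a 3 pF unit capacitor. It also computes the $C_2$ that the same ideal relation would require for the non-integer roots $1/2.5$ and $1/3.5$ shown in Figure 4. Those two capacitor values are derived here, not reported in the paper, and assume the input current is on the $C_1$ branch.

```python
def root_index(c_unit_pf: float, c2_pf: float) -> float:
    """Case 2 (assumed input on C1 = unit C, C2 = nC): root index CT / C1 = (C + C2) / C."""
    return (c_unit_pf + c2_pf) / c_unit_pf


def c2_for_root(c_unit_pf: float, k: float) -> float:
    """Capacitor C2 needed for the k-th root: (C + C2) / C = k, so C2 = (k - 1) * C."""
    return (k - 1) * c_unit_pf


if __name__ == "__main__":
    for c2 in (3.0, 6.0, 9.0):   # fabricated nC values with a 3 pF unit capacitor
        print(f"C2 = {c2} pF -> root index {root_index(3.0, c2)}")
    for k in (2.5, 3.5):         # non-integer roots shown only in simulation
        print(f"exponent 1/{k} -> C2 = {c2_for_root(3.0, k)} pF")
```

Because only the ratio matters, the same topology and unit capacitor can be reused while $C_2$ alone sets the root. This is the modularity argument made in the conclusions.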
Figure 7. Microphotograph of the fabricated chip. The circuit areas are shown in yellow.
V. MEASUREMENTS
Figure 8 shows the equipment used for the measurements, which took into account the bias values obtained in the design process.
Figure 8. Experimental setup for the measurements.
Measuring floating-gate circuits involves a discharge process in which light must be shone directly on the chip. Otherwise the measurements would be incorrect, because the charge stored on the gate during the measurement is unknown. For the same charge-storage reason, measurements must be taken only after the signal has settled.
Samples of the three blocks were taken simultaneously to verify that the behavior indeed depends on the ratio between the gate input capacitors. The samples were compared with the simulated curves and showed a similarity between the expected results and the measurements. Figure 9 shows the measurement results. Note that the measured quantity is the voltage at the drain of the output transistor, so the curves appear inverted with respect to Figure 4. In addition, for voltages close to 1.5 V the current is very small, on the order of nanoamperes, and is therefore difficult to measure.
Figure 9. Measurement results.
VI. CONCLUSIONS
This work proposes an analog circuit capable of obtaining the fractional root of an input current. The analysis results show that, by controlling the ratio between the capacitors, any fractional root can be obtained with the same circuit topology and without adding extra blocks. The active area required by the circuit therefore remains constant, which makes the layout modular, and even the unit capacitor could be the same. The noise generated by the circuits and by the measurement process itself makes it difficult to characterize the circuit response at very low currents. Nevertheless, the measurements agree with the simulations when the currents are in the nanoampere range.
VII. REFERENCES
[1] S. R. Zarabadi, M. Ismail, and C.-C. Hung, "High performance analog VLSI computational circuits," IEEE Journal of Solid-State Circuits, vol. 33, no. 4, April 1998.
[2] S. Vlassis and S. Siskos, "Design of voltage-mode and current-mode computational circuits using floating-gate MOS transistors," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 51, no. 2, February 2004.
[3] W. Liu, W.-L. Mao, and J. Sheen, "A low-power and low-voltage cube-law circuit," IEEE Conference on Electron Devices and Solid-State Circuits (EDSSC 2007), Dec. 2007, pp. 829-832.
[4] L. Song, M. I. Elmasry, and A. Vannelli, "Analog neural network building blocks based on current mode subthreshold operation," IEEE International Symposium on Circuits and Systems (ISCAS '93), 1993, vol. 4, pp. 2462-2465.
[5] B. A. Minch, C. Diorio, P. Hasler, and C. A. Mead, "Translinear circuits using subthreshold floating-gate MOS transistors," Analog Integrated Circuits and Signal Processing, no. 2, 1996, pp. 167-179.
[6] L. González-Carabarín, F. Gómez-Castañeda, and J. A. Moreno-Cadenas, "Generalized nth-power-law and nth-root circuits," 2009 6th International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE 2009) (formerly known as ICEEE).
[7] V. F. Koosh and R. Goodman, "Dynamic charge restoration of floating gate subthreshold MOS translinear circuits," Proceedings of the 2001 Conference on Advanced Research in VLSI (ARVLSI 2001), pp. 163-171.
[8] E. Rodriguez-Villegas, Low Power and Low Voltage Analogue Design with the Floating Gate MOS, IET, 2006.
[9] A. Hastings, The Art of Analog Layout, Prentice-Hall, 2001.
Modulation Classification in Cognitive Radio: An FPGA Implementation
Adalbery R. Castro, Lilian C. Freitas, Claudomir Cardoso and Aldebaro Klautau
Laboratório de Sensores e Sistemas Embarcados (LASSE) - Universidade Federal do Pará (UFPA)
Rua Augusto Corrêa 01 - CEP 66075-110 - Belém - Pará - Brasil.
Web: www.lasse.ufpa.br. E-mails: {adalbery, liliancf, claudomir, aldebaro}@ufpa.br
Abstract: This paper presents the development of an IP
(intellectual property) core for modulation classification in
cognitive radio. The core consists of an SVM (support vector machine)
classifier, which can be easily programmed and optimized for the
feature extractor in use. The SVM IP core supports multiple classes and
can be used in any pattern recognition application. This paper
also evaluates the performance of the SVM classifier with the CSS
(concatenated sorted symbols) feature extraction technique, which
uses the sorted symbols of a linear digital modulation to represent
the modulation. The CSS-SVM classifier has been implemented in VHDL.
The results show the feasibility of the developed classification system,
which can be used not only in cognitive radio but also in military applications.
Keywords: Cognitive radio, modulation classification, support vector machine, FPGA.
I. INTRODUCTION
Cognitive radio (CR) techniques have been widely researched because of their capacity to increase the effectiveness of communication systems. Modulation classification [1] is one of the most computationally demanding modules of the spectrum sensing stage in CR.
In the modulation classification task, the goal is to determine which types of signals are occupying the spectrum. To this end, N classes are defined in a prior step, and a conventional classifier always decides for one of these N classes. The definition of the classes can take into account the modulation type, waveform, bandwidth and carrier frequency, among other aspects [2].
The modulation classification process consists of two blocks: a parameter extraction block and a classification block. The first block extracts the information relevant to the decision, transforming the received signals (after they pass through the channel) into a set of so-called parameters, or features. The classifier uses these parameters to choose the class that best describes the modulation of the signal.
The publications on automatic modulation classification (AMC) available in the literature address different parameter extraction and classification techniques, as shown in [3]. However, there are no published architecture designs for the practical (hardware) implementation of AMC. One reason may be the clear connection of AMC with military systems and the funding of much of this research by institutions linked to security and intelligence.
Most published algorithms are developed and evaluated in simulation environments that abstract away constraints which would prevent real-time implementation, such as computational cost. Detecting and classifying signals in real time, or within a short time, is an essential task for the operation of CR systems, and several techniques have been studied, such as those based on cyclostationarity [4], cyclic cumulants [5], time-frequency spectrograms [6] and the wavelet transform [7]. However, most of the algorithms proposed in the literature have a relatively high computational cost.
It is therefore of great interest to implement and test algorithms that can feasibly be embedded in devices such as USRP boards [8] and FPGA modules, which can analyze given frequency bands in real time, detect spectrum available for use and classify the modulations of users' signals.
In this context, the objective of this work is to develop an efficient architecture for modulation classification in CR, with special emphasis on hardware implementation and real-time operation. To this end, an IP (Intellectual Property) core was developed based on an SVM (support vector machine) classifier supporting multiple classes. It works together with the CSS (concatenated sorted symbols) feature extractor, which uses the sorted symbols of a linear digital modulation to represent the modulation.
The remainder of this paper is organized as follows. Section II presents the CSS feature extractor and the signal processing that precedes its use. Section III presents the SVM classifier. Section IV presents the architecture of the CSS-SVM IP core for modulation classification in CR. Section V presents and discusses the simulation and FPGA synthesis results of the CSS-SVM IP core. Section VI presents the concluding remarks.
II. CSS FEATURE EXTRACTOR
CSS (concatenated sorted symbols) is a recently proposed feature extractor for modulation classification [9]. It uses the symbols of the digital modulation as the input parameters of the classifier.
The CSS feature extractor stores the magnitude and the phase of the received symbols in s (after normalization) and sorts them separately. The two sorted vectors (magnitude and phase) are then concatenated into a new vector x. Its dimension D is twice the number N of symbols (D = 2N), and it is intended to capture an individual signature of the corresponding modulation.
[Fig. 1 plot: magnitude / phase (rad) versus sample index for 16-QAM and 8-PSK, without noise.]
Fig. 1 shows an example of two possible vectors x obtained from received symbol samples (magnitude and phase) of 16-QAM and 8-PSK after sorting. Each vector corresponds to N = 250 symbols, giving a total length of D = 2N = 500 elements in x: 250 magnitudes followed by 250 phases. Note that 8-PSK has only one magnitude value and eight phase values. The signatures that the CSS extractor produces for the two modulations are clearly distinguishable.
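The Python sketch below illustrates the CSS construction by building x from a list of complex symbols. It is not the authors' VHDL. The function name css_features is hypothetical. Scaling the symbols to unit average power is an assumed normalization, because the text does not say which one is used. The usage builds noise-free 8-PSK symbols and checks the property noted for Fig. 1: one distinct magnitude and eight distinct phases.

```python
import cmath
import math


def css_features(symbols):
    """Return the CSS vector: N sorted magnitudes followed by N sorted phases (D = 2N)."""
    n = len(symbols)
    # Assumption: normalize to unit average power before extracting the features.
    power = sum(abs(s) ** 2 for s in symbols) / n
    scale = 1.0 / math.sqrt(power)
    normalized = [s * scale for s in symbols]
    magnitudes = sorted(abs(s) for s in normalized)
    phases = sorted(cmath.phase(s) for s in normalized)  # radians in [-pi, pi]
    return magnitudes + phases


# Noise-free 8-PSK: N = 250 symbols cycling through the 8 constellation points.
symbols = [cmath.exp(2j * math.pi * (k % 8) / 8) for k in range(250)]
x = css_features(symbols)
print(len(x))                                  # 500
print(sorted({round(m, 6) for m in x[:250]}))  # [1.0]
print(len({round(p, 6) for p in x[250:]}))     # 8
```

Sorting discards the time order of the symbols. The vector therefore depends only on the distribution of magnitudes and phases, and that distribution is what acts as the modulation signature.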
In Fig. 2, white Gaussian noise with SNR = 15 dB was added to the two modulation types illustrated above. In this case, the distinction between the two modulations is no longer as obvious as in the ideal, noise-free case. Nevertheless, [9] shows that the modulations can still be distinguished in the presence of noise by using, for example, an SVM classifier.
[Fig. 2 plot: magnitude / phase (rad) versus sample index for 16-QAM and 8-PSK, SNR = 15 dB.]
Fig. 2. Comparison between 16-QAM and 8-PSK with D = 2N = 500 samples (250 magnitudes and 250 phases), SNR = 15 dB.
III. SVM CLASSIFIER
The support vector machine (SVM) is a pattern recognition technique based on statistical learning theory. Vapnik [10] developed it to solve pattern classification problems. It was designed for binary classification: given a training set X with examples from two classes, an SVM separates them with a hyperplane. That hyperplane is determined by a few examples of the classes in X, called support vectors. For linearly separable patterns, the hyperplane is a decision surface that separates the examples of the two classes with the maximum margin between them. For patterns that are not linearly separable, both the hyperplane and the support vectors are obtained after the data are transformed by an appropriate mapping function that makes them separable [11].
The foundations of an SVM lie in statistical learning theory (SLT). According to [12], SLT provides the mathematical tools for finding a good classifier from the training data. The performance of a classifier f is measured by its error rate on the training set, i.e., by the number of incorrect predictions of f. This quantity is called the empirical risk $R_{emp}(f)$ [13] and is given by Equation 1:
$$R_{emp}(f) = \frac{1}{n}\sum_{i=1}^{n} c(f(\mathbf{x}_i), y_i) \qquad (1)$$
Fig. 1. Comparison between 16-QAM and 8-PSK with D = 2N = 500 samples (250 magnitudes and 250 phases), without noise.
In Equation 1, n is the number of examples in the input set X, and $c(f(\mathbf{x}_i), y_i)$ is a cost function relating the prediction of f to the desired output $y_i$. A cost function frequently used in classification problems is the 0/1 cost, defined by $c(f(\mathbf{x}_i), y_i) = \frac{1}{2}|y_i - f(\mathbf{x}_i)|$. With labels and predictions in {-1, +1}, it returns 0 if $\mathbf{x}_i$ is classified correctly and 1 otherwise.
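Equation 1 with the 0/1 cost can be checked numerically. The sketch assumes that labels and predictions are coded as -1/+1. The function names and the sample predictions are hypothetical.

```python
def zero_one_cost(prediction, label):
    """0/1 cost c(f(x_i), y_i) = |y_i - f(x_i)| / 2 for values in {-1, +1}."""
    return abs(label - prediction) / 2


def empirical_risk(predictions, labels):
    """Equation 1: average cost over the n training examples."""
    n = len(labels)
    return sum(zero_one_cost(p, y) for p, y in zip(predictions, labels)) / n


# Hypothetical predictions of a classifier f on n = 4 training examples.
print(empirical_risk([+1, -1, +1, +1], [+1, -1, -1, +1]))  # 0.25
```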
IV. PROPOSED ARCHITECTURE: A PROGRAMMABLE SVM IP CORE
This section presents the development of an IP core consisting of an SVM. The core works as a universal classifier, performs the classification task within a defined period of time (real time), and can be easily programmed and optimized based on the CSS feature extractor. For simulation and FPGA implementation, the VHDL language was used together with the ModelSim and Altera Quartus II tools [14].
First, the architecture of the SVM classifier alone was designed, without the CSS feature extractor. The SVM classifier had to perform the computation required for classification quickly and in real time. For that reason, the architecture was devised so that test sets can be supplied continuously and the classifier coefficients can be changed at any time. This programmable architecture is proposed for multiclass classification using binary classifiers. The classifiers must be trained outside this architecture. Equation 2 gives the decision function for a two-class problem using a binary SVM classifier, where w and b are the classifier coefficients and x is the test vector. The sign of f(x) gives the classifier output, and f(x) = 0 is the decision threshold between the two classes for which the classifier was trained.
$$f(\mathbf{x}) = \sum_{i=1}^{D} w_i x_i + b \qquad (2)$$
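Below is a minimal Python sketch of Equation 2 for a single binary classifier. The integer coefficients are hypothetical and only mimic values held in fixed-point registers. Mapping f(x) = 0 to class +1 is an assumed tie convention, because the text only states that f(x) = 0 is the decision threshold.

```python
def binary_decision(w, x, b):
    """Evaluate Equation 2, f(x) = sum_i w_i * x_i + b, and return (f(x), label)."""
    if len(w) != len(x):
        raise ValueError("w and x must both have D elements")
    f = sum(wi * xi for wi, xi in zip(w, x)) + b
    # Assumption: f(x) >= 0 selects class +1; the paper only fixes f(x) = 0 as the threshold.
    return f, (+1 if f >= 0 else -1)


# Hypothetical integer coefficients (as fixed-point registers would hold them), D = 3.
print(binary_decision([3, -2, 1], [10, 4, -5], -12))  # (5, 1)
```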
Fig. 3 shows how the classifier coefficients and the test data are input.
The signals from input i are stored in shift registers. Several registers are arranged in sequential arrays, and the data are shifted through the circuit until all registers are filled. D is the number of parameters, or features. For y classes, B denotes the number of binary classifiers used. B is obtained from the pairwise combination of the y classes, and Equation 3 gives the number of binary classifiers required for multiclass classification with y classes. Registers w and b store the classifier coefficients, and x stores the test data. During the simulations, coefficient b was of a higher order of magnitude than the other coefficients. It therefore occupies two register positions, i.e., coefficient b uses twice as many bits as the other coefficients.
$$B = \frac{y(y-1)}{2} \qquad (3)$$
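Equation 3 counts the unordered class pairs, one per binary classifier. The sketch below enumerates the pairs with Python's itertools.combinations and checks the count against the formula. The helper names are hypothetical.

```python
from itertools import combinations


def binary_classifier_pairs(num_classes):
    """One binary classifier per unordered pair of classes (one-vs-one)."""
    return list(combinations(range(num_classes), 2))


def num_binary_classifiers(num_classes):
    """Equation 3: B = y(y - 1) / 2."""
    return num_classes * (num_classes - 1) // 2


# y = 4 classes, e.g. the four modulations evaluated in Section V.
pairs = binary_classifier_pairs(4)
print(len(pairs), num_binary_classifiers(4))  # 6 6
print(pairs[:3])                              # [(0, 1), (0, 2), (0, 3)]
```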
Once the registers that store the coefficients are filled, only the test values need to be supplied to start a classification. After the coefficients have been loaded, classification can proceed continuously for each test set that is input. If the classifier coefficients need to be updated, input i is directed to the set of registers of the classifiers to be updated.
Fig. 3. Input of the classifier coefficients and test data for classification.
The time required to update the classifier coefficients is the clock period multiplied by the number of registers to be filled.
For the FPGA implementation of an SVM classifier, an architecture with four well-defined steps is proposed, as shown in Fig. 4. It is implemented as a state machine.
Step 1: multiplies the elements of the vectors w and x of Equation 2 and stores the results in other sets with the same number of registers. Because this is a multiplication, the registers that receive the results have twice as many bits as the registers that store the vectors w and x, so no information is lost.
Step 2: for each classifier, adds all the values produced in Step 1, plus the coefficient b.
Step 3: checks the result of each classifier. Since the classifiers are binary, the result of a classifier is a vote for one of the two classes for which it was trained. At the end of Step 3, the total number of votes for each class is available.
Step 4: determines which class received the most votes counted in Step 3. That class is the output of the classification.
Note that this architecture exploits the parallelism of hardware description in an FPGA: each of the steps described for computing the SVM takes one clock period. In Step 1, for example, all the multiplications needed to compute the inner product between the test vector and the coefficients of the binary classifiers occur in a single clock cycle.
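The four steps can be modeled in software as shown below. This is a sequential sketch under stated assumptions, not the state machine itself. In the FPGA each step takes one clock cycle. The classifier pairs and coefficients are hypothetical. A non-negative score is assumed to vote for the first class of the pair, and ties in Step 4 are resolved toward the lowest class index.

```python
def classify_one_vs_one(x, classifiers, num_classes):
    """Software model of Steps 1-4 for a set of binary (one-vs-one) SVMs.

    classifiers maps a class pair (a, b) to (w, bias); a score >= 0 is taken
    as a vote for a, otherwise for b (assumed convention).
    """
    # Step 1: element-wise products w_i * x_i for every binary classifier.
    products = {pair: [wi * xi for wi, xi in zip(w, x)]
                for pair, (w, _) in classifiers.items()}
    # Step 2: sum the products of each classifier and add its coefficient b.
    scores = {pair: sum(products[pair]) + bias
              for pair, (_, bias) in classifiers.items()}
    # Step 3: each binary classifier casts one vote.
    votes = [0] * num_classes
    for (a, b), score in scores.items():
        votes[a if score >= 0 else b] += 1
    # Step 4: the class with the most votes wins (ties go to the lowest index here).
    winner = max(range(num_classes), key=lambda c: votes[c])
    return winner, votes


# Hypothetical 3-class example with D = 2 features.
classifiers = {(0, 1): ([1, 0], 0), (0, 2): ([1, 1], -1), (1, 2): ([0, 1], 0)}
print(classify_one_vs_one([2, -1], classifiers, 3))  # (0, [2, 0, 1])
print(classify_one_vs_one([-1, 3], classifiers, 3))  # (1, [1, 2, 0])
```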
Fig. 4. Representation of the block that performs the classification.
A. Implementation of the CSS-SVM modulation classifier
1) Main block: Fig. 5 shows the design of the classifier block and the signals required at its input and output.
The clk signal is the clock, the time base of the system. The ckd signal indicates that new data are available for the block on inputs i1 and i2. These data can be either classifier coefficients or test symbols. The symbol arrival rate must differ from the clock rate. Signals i1 and i2 are the signal inputs and use two's complement notation, with 16 bits for the magnitude and 16 bits for the phase of the symbols to be classified. The dok signal tells the block that a complete set of test symbols has been supplied and that a new classification can start. The control signal ctrl allows input i1 to also be used to supply the block with the coefficients of the various classifiers.
Fig. 5. Input and output signals of the classifier.
B. Memory Access Logic
One memory block was created for each binary SVM classifier, plus further memory blocks to receive the test sets. A memory access logic was required to manage all these memory blocks. Fig. 6 illustrates the memory blocks needed to implement the CSS-SVM classifier.
The upper blocks in Fig. 6, labeled w, store the coefficients of the binary SVM classifiers. The blocks labeled x store the test symbols. Note that there are two memory blocks for the tests: x1 and x2. The coefficients b of Equation 2 need twice as many bits and are stored in registers rather than in RAM, a design decision taken to obtain better performance. The outputs in Fig. 6 always carry data from the various memory blocks, which was necessary to optimize the proposed architecture.
Fig. 6. Representation of the memory blocks.
C. Operation
To understand this architecture, recall what multiclass classification requires. The coefficients of the B binary SVM classifiers must be loaded, together with a test set, which must be processed before being applied to the binary classifiers. All this information reaches the main block, shown in Fig. 5, through signals i1 and i2. These signals arrive at the block serially, at a rate defined by ckd.
The ctrl signal determines the destination of the data on i1 and i2. When ctrl is zero, inputs i1 and i2 carry a new test symbol, which must be directed to one of the memory blocks labeled x1 or x2. Input i1 carries the phase and i2 the magnitude. Only one value can be written per clock cycle in each memory block. For this reason, x1 and x2 are each split into two blocks (one for the magnitude and one for the phase), so that both values can be written in a single clock cycle.
To supply the value of a coefficient, input i1 must carry this value. The ctrl signal indicates which classifier the coefficient belongs to, and input i2 indicates the position within the memory block where the value of i1 is to be stored. In this way, a single coefficient can be changed, which shortens the update time when only some of the coefficient values need to be updated.
After the coefficients of the binary classifiers have been stored, the test sets must be supplied for classification. The two memory blocks x1 and x2 exist so that the flow of test symbols is not interrupted. Block x1 is filled first, and the dok signal then indicates the start of the classification process. From that moment on, the new symbols for the next classification are directed to x2, while the values stored in x1 are processed.
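The ctrl/i1/i2 protocol and the x1/x2 double buffering described above can be summarized as a small behavioural model. It is a hypothetical sketch, not the VHDL. The class name, the 1-based mapping from ctrl to a classifier index, and the clearing of a buffer when it is reused (assuming its previous set was already processed) are all assumptions.

```python
class ClassifierInputPort:
    """Hypothetical behavioural model of the ctrl/i1/i2 input protocol.

    ctrl == 0: (i1, i2) = (phase, magnitude) of a test symbol.
    ctrl == k >= 1: i1 is a coefficient for binary classifier k and i2 its
    address (the 1-based numbering of classifiers is an assumption).
    """

    def __init__(self, num_classifiers, n_symbols):
        self.n = n_symbols
        self.w = [[0] * (2 * n_symbols) for _ in range(num_classifiers)]
        # Two test buffers (x1, x2), each split into a magnitude and a phase block.
        self.buffers = [([], []), ([], [])]
        self.filling = 0     # index of the buffer currently being written
        self.completed = []  # test sets handed to the classification logic

    def write(self, ctrl, i1, i2):
        if ctrl != 0:
            self.w[ctrl - 1][i2] = i1  # update a single coefficient
            return
        magnitudes, phases = self.buffers[self.filling]
        magnitudes.append(i2)
        phases.append(i1)
        if len(magnitudes) == self.n:
            # Set complete (the role of dok): hand it over and switch buffers;
            # the other buffer is assumed to have been processed already.
            self.completed.append((list(magnitudes), list(phases)))
            self.filling ^= 1
            self.buffers[self.filling] = ([], [])


port = ClassifierInputPort(num_classifiers=6, n_symbols=2)
port.write(3, 123, 1)  # coefficient 123 -> classifier 3, address 1
for phase, mag in [(10, 20), (11, 21), (12, 22)]:
    port.write(0, phase, mag)  # test symbols
print(port.w[2][1], port.completed, port.filling)  # 123 [([20, 21], [10, 11])] 1
```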
Consider storing N = 250 test symbols and the coefficients of B = 6 binary classifiers, with D = 2N = 500. This requires 500 × 6 + 500 × 2 = 4,000 memory positions. If each position requires 16 bits, 64,000 bits of RAM are needed for this implementation. Before the first classification, all w coefficients must be supplied, as well as at least one test set with N symbols.
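The hypothetical helper below reproduces the RAM estimate above. It counts D words per binary classifier for w and D words for each of the two test buffers. The b coefficients are excluded because they are kept in registers.

```python
def ram_bits(n_symbols, num_classifiers, bits_per_word=16):
    """RAM needed for the w coefficients plus the two test buffers x1 and x2.

    The b coefficients are kept in registers, so they are not counted here.
    """
    d = 2 * n_symbols                        # D = 2N
    coefficient_words = d * num_classifiers  # one length-D vector w per classifier
    test_words = d * 2                       # x1 and x2
    return (coefficient_words + test_words) * bits_per_word


print(ram_bits(250, 6))  # 64000
```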
For the CSS-SVM classifier, the first stage is sorting the magnitudes and the phases with a sorting algorithm.
The algorithm scans all the elements in memory several times, looking for the largest value. The search proceeds from right to left, and the largest values are stored in the memory itself, also from right to left. The first search yields the largest value stored in memory, the second search yields the second largest value, and so on until all the numbers are sorted.
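The search pattern above behaves like a selection sort that moves the largest remaining value to the rightmost free position. The Python model below sketches that behaviour; it is not the cycle-level state machine, and the function name is hypothetical. Its read counter reproduces the arithmetic progression N + (N - 1) + ... + 2 behind Equation 4.

```python
def sort_by_repeated_max(values):
    """Sort by repeatedly moving the largest remaining value to the right.

    Returns the sorted list and the number of memory reads performed.
    """
    mem = list(values)
    reads = 0
    for end in range(len(mem), 1, -1):
        # Scan the unsorted region mem[0:end] from right to left.
        best = end - 1
        for i in range(end - 1, -1, -1):
            reads += 1
            if mem[i] > mem[best]:
                best = i
        # Store the maximum at the rightmost unsorted position.
        mem[best], mem[end - 1] = mem[end - 1], mem[best]
    return mem, reads


print(sort_by_repeated_max([5, 3, 9, 1, 7]))  # ([1, 3, 5, 7, 9], 14)
```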
The number of steps needed to complete the sort can be derived as follows. The values to be sorted must be in memory blocks x1 or x2, which have N = D/2 positions, as shown in Fig. 6. The first time the algorithm searches for the largest value, it reads all N memory positions. The second time it reads N - 1 positions, and so on until it reads only the first two positions. The number of memory writes during the sort equals the number of reads. Most of the time a read and a write occur in the same step, except at the beginning and end of the algorithm. The total number of steps needed to sort all the values is therefore the sum of the terms of an arithmetic progression, N + (N - 1) + ... + 2. A constant is added to this sum for the initialization and finalization steps of the algorithm.
Equation 4 gives the total number of steps $n_p$ required to sort N memory positions. The constant 7 was obtained empirically. It accounts for the extra cycles needed to initialize and finalize the memory read and write process in the FPGA.
$$n_p = \frac{(2+N)(N-1)}{2} + 7 \qquad (4)$$
For N = 250, sorting 250 memory positions takes $n_p$ = 31,381 cycles. Each cycle takes one period of the clock (signal clk), 20 ns (f = 50 MHz), so sorting the magnitudes and the phases takes about 627 µs in total.
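Equation 4 and the resulting sorting time can be evaluated directly. The sketch assumes, as the text does, one step per clock period at 50 MHz; the function names are hypothetical.

```python
def sort_cycles(n):
    """Equation 4: n_p = (2 + N)(N - 1) / 2 + 7 (7 = empirical init/finish cycles)."""
    return (2 + n) * (n - 1) // 2 + 7


def sort_time_us(n, clock_hz=50e6):
    """Sorting time in microseconds, one step per clock period."""
    return sort_cycles(n) / clock_hz * 1e6


print(sort_cycles(250))             # 31381
print(round(sort_time_us(250), 2))  # 627.62
```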
During the sort, as soon as the algorithm finds the current largest value, it multiplies this value by the coefficients of the binary classifiers. This improves the implementation in both execution time and FPGA resource usage. The partial results of the multiplications are accumulated in B registers until the sort is complete. Before sorting, these registers must be initialized with the b coefficient of each binary classifier.
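The fused sort-and-multiply idea can be sketched as follows. The sketch makes four assumptions. The magnitudes occupy features 0 to N - 1 of x and the phases features N to 2N - 1, as in the CSS vector. The loop runs down to the last remaining element so that every value is multiplied. The weights and inputs in the usage are hypothetical. The function name is also hypothetical. Each maximum lands at its final sorted position when it is found, so its coefficient index is known at that moment, and the accumulators end up holding f(x) of Equation 2.

```python
def sort_and_accumulate(values, offset, weights, biases):
    """Sort `values` by repeated maximum search while accumulating w_i * x_i.

    values: N magnitudes (offset 0) or N phases (offset N) of the CSS vector.
    weights: one length-D coefficient list per binary classifier.
    biases: initial accumulator values (the b coefficients, or earlier partial sums).
    Returns (sorted values, accumulators).
    """
    mem = list(values)
    acc = list(biases)  # B accumulators, initialized before sorting
    for end in range(len(mem), 0, -1):
        best = max(range(end), key=lambda i: mem[i])  # largest remaining value
        mem[best], mem[end - 1] = mem[end - 1], mem[best]
        feature = offset + end - 1  # final position of this value in x
        for k, w in enumerate(weights):
            acc[k] += w[feature] * mem[end - 1]
    return mem, acc


weights = [[1, 2, 3, 4, 5, 6], [1, 0, 0, 0, 0, -1]]  # hypothetical, B = 2, D = 6
mags, acc = sort_and_accumulate([3, 1, 2], 0, weights, [10, -5])
phases, acc = sort_and_accumulate([0, -1, 1], 3, weights, acc)
print(mags + phases, acc)  # [1, 2, 3, -1, 0, 1] [26, -5]
```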
Steps 3 and 4 of Fig. 4 are then executed to obtain the final result of the CSS-SVM classifier.
The relationship between the clock rate and the rate at which symbols arrive at the classifier can be estimated as follows: divide the total number of cycles required for one classification by the number of test symbols. A classification should complete before a new set of test symbols has fully arrived at the classifier, so the frequency of clk must be higher than that of ckd. The symbol rate must be continuous and is given by ckd. The largest computational cost in this architecture is sorting the values under test, and these operations run at the frequency of clk.
If the clock period (clk) is 20 ns (f = 50 MHz), the minimum symbol input period (signal ckd) must be approximately 2.51 µs (2,510 ns).
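The rate bound above divides the cycles of one classification by the number of test symbols. The sketch uses the 31,381 cycles of Equation 4. As an assumption, it ignores the few extra cycles of Steps 3 and 4. The function name is hypothetical.

```python
def min_symbol_period_ns(cycles_per_classification, n_symbols, clock_period_ns=20):
    """Smallest ckd period that lets one classification finish per test set."""
    return cycles_per_classification * clock_period_ns / n_symbols


# 31,381 sorting cycles from Equation 4; Steps 3 and 4 are ignored here (assumption).
period = min_symbol_period_ns(31_381, 250)
print(round(period, 2))  # 2510.48
print(2_600 >= period)   # True: T_ckd = 2.6 us, as used in Section V, meets the bound
```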
V. RESULTS
The proposed architecture was first implemented and tested in Matlab. The WEKA (Waikato Environment for Knowledge Analysis) data mining software [15] was used to support the development and training of the classifier. WEKA is a free tool that gathers a series of classifiers and is widely used in academia to compare performance on classification problems.
The modulations used were BPSK, 4-PAM, 16-QAM and 8-PSK under AWGN (additive white Gaussian noise), in line with the modulation classification work available in other publications such as [16] and [17].
The following settings were used for the results below:
300 training sets for each modulation.
50 test sets for each modulation.
N = D/2 = 250 symbols for each training set.
SNR = 2 dB for both training and testing.
B = 6 binary SVM classifiers.
f_clk = 50 MHz for the clk signal (T_clk = 20 ns).
T_ckd = 2.6 µs (2,600 ns).
The first check compared the classifications obtained with WEKA against those of the FPGA implementation. Both implementations achieved the same accuracy: 176/200 = 0.88 (88%).
Another relevant result is the time required to classify 200 test sets (50 examples of each modulation). The CSS-SVM classifier requires approximately 138 ms, which also includes the time needed to load the classifier coefficients.
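The 138 ms figure can be roughly reconstructed. The estimate below rests on three assumptions the text does not state. First, the coefficients (6 × 500 values of w plus two words for each of the six b) arrive one word per ckd period. Second, test symbols stream continuously at T_ckd = 2.6 µs. Third, only the final sort adds time after the last symbol. The function name is hypothetical.

```python
def total_time_ms(num_sets=200, n_symbols=250, t_ckd_ns=2_600,
                  coefficient_words=6 * 500 + 6 * 2, last_sort_cycles=31_381,
                  t_clk_ns=20):
    """Rough estimate of the time to classify num_sets test sets.

    Assumptions (not stated in the paper): coefficients arrive one word per ckd
    period, each b occupies two words, and the final classification adds one
    sorting pass after its last symbol.
    """
    streaming_ns = num_sets * n_symbols * t_ckd_ns
    loading_ns = coefficient_words * t_ckd_ns
    tail_ns = last_sort_cycles * t_clk_ns
    return (streaming_ns + loading_ns + tail_ns) / 1e6


print(round(total_time_ms(), 1))  # 138.5
```

Under these assumptions the symbol stream alone accounts for 130 ms. Coefficient loading and the final sort account for most of the remainder.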
The Altera EP2C20F484C FPGA of the Cyclone II family was chosen for synthesis. Compiling the design in Altera Quartus II gave the amount of FPGA resources used by the CSS-SVM classifier implementation.
Table I summarizes the resource usage and performance of the CSS-SVM classifier.
TABLE I
RESOURCE USAGE OF THE ALTERA EP2C20F484C FPGA AFTER SYNTHESIS OF THE CSS-SVM CLASSIFIER WITH QUARTUS II.
Resource                          Usage
Logic elements (total)            1,824 (10%)
Combinational functions           1,586 (8%)
Dedicated logic registers         704 (4%)
Total registers                   704
Total memory bits                 64,000 (27%)
Embedded multipliers              24 (46%)
Time for 200 classifications      138 ms
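The percentages in Table I can be cross-checked against the capacity of the device. The capacities used below for the EP2C20 are 18,752 logic elements, 239,616 RAM bits and 52 embedded 9-bit multipliers. They come from Altera's Cyclone II documentation as we recall it and should be verified. Expressing combinational functions and dedicated registers as fractions of the logic-element count is also an assumption.

```python
# Assumed Cyclone II EP2C20 capacities (verify against the device handbook).
LOGIC_ELEMENTS = 18_752
MEMORY_BITS = 239_616
MULTIPLIERS_9BIT = 52

ROWS = [  # (resource, used value from Table I, assumed capacity)
    ("Logic elements (total)", 1_824, LOGIC_ELEMENTS),
    ("Combinational functions", 1_586, LOGIC_ELEMENTS),
    ("Dedicated logic registers", 704, LOGIC_ELEMENTS),
    ("Total memory bits", 64_000, MEMORY_BITS),
    ("Embedded multipliers", 24, MULTIPLIERS_9BIT),
]


def utilization(used, capacity):
    """Percentage rounded to the nearest integer, as reported by Quartus II."""
    return round(100 * used / capacity)


for name, used, capacity in ROWS:
    print(f"{name}: {used} ({utilization(used, capacity)}%)")
```

With these capacities, every percentage printed matches the corresponding entry in Table I.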
The proposed architecture is relatively efficient in its use of FPGA resources. Its design prioritizes the memory and multipliers available in the FPGA, leaving registers and logic elements free for other CR functions besides AMC, such as demodulation.
VI. CONCLUSIONS
This work addressed the modulation classification process applied to spectrum sensing in CR. An efficient implementation of modulation classification was presented, designed to minimize FPGA resource usage, provide real-time processing capability and maximize the probability of correct classification. To this end, an IP core consisting of an SVM was developed to work as a universal classifier, i.e., one that is independent of the adopted feature extractor and can be customized for the application at hand.
The results showed that real-time modulation classification is possible with an SVM classifier, which was implemented together with the CSS feature extractor.
This work demonstrated that classifiers that recur in the literature can be implemented efficiently and in real time. The publications that propose them rarely point to solutions for real-time execution, or for execution with a relatively short delay.
As future work, we intend to extend the proposed architecture with other feature extraction techniques, in order to investigate new ways of reducing FPGA resource usage or the time required for classification.
REFERENCES
[1] Simon Haykin, David Thomson, and Jeffrey Reed, "Spectrum sensing for cognitive radio," Proceedings of the IEEE, vol. 97, pp. 849-877, May 2009.
[2] T. Yucek and H. Arslan, "A survey of spectrum sensing algorithms for cognitive radio applications," IEEE Communications Surveys & Tutorials, vol. 11, no. 1, pp. 116-130, first quarter 2009.
[3] O. A. Dobre, A. Abdi, Y. Bar-Ness, and W. Su, "Survey of automatic modulation classification techniques: classical approaches and new trends," IET Communications, vol. 1, no. 2, pp. 137-156, April 2007.
[4] Simon Haykin, "Cognitive radio: Brain-empowered wireless communications," IEEE Journal on Selected Areas in Communications, vol. 23, no. 2, pp. 201-220, Feb. 2005.
[5] Jing Xiaorong and He Tao, "Modulation classification using ARBF networks," 7th International Conference on Signal Processing, vol. 3, pp. 1809-1812, Aug. 2004.
[6] T. J. Lynn and A. Z. Sha'ameri, "Automatic analysis and classification of digital modulation signals using spectrogram time frequency analysis," International Symposium on Communications and Information Technologies, pp. 916-920, Oct. 2007.
[7] Xiu-Jie Meng and Ling-Ling Si, "An improved algorithm of modulation classification for digital communication signals based on wavelet transform," IEEE Transactions on Aerospace and Electronic Systems, vol. 3, pp. 1226-1231, Nov. 2007.
[8] Ettus Research LLC, 2011, http://www.ettus.com/.
[9] F. C. B. F. Muller, C. Cardoso, and A. Klautau, "A front end for discriminative learning in automatic modulation classification," IEEE Communications Letters, vol. 15, no. 4, pp. 443-445, April 2011.
[10] V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, 1995.
[11] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Cambridge University Press, 1st edition, March 2000.
[12] A. Lorena, "Investigação de estratégias para a geração de máquinas de vetores de suporte multiclasses" (Investigation of strategies for generating multiclass support vector machines), Ph.D. thesis, Instituto de Ciências Matemáticas e de Computação - USP, 2006.
[13] K.-R. Müller, S. Mika, G. Rätsch, K. Tsuda, and B. Schölkopf, "An introduction to kernel-based learning algorithms," IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 181-201, Mar. 2001.
[14] Altera Corporation, 2011, http://www.altera.com/.
[15] WEKA, http://www.cs.waikato.ac.nz/ml/weka, visited in 2011.
[16] W. Su, J. L. Xu, and M. Zhou, "Real-time modulation classification based on maximum likelihood," IEEE Communications Letters, vol. 12, pp. 801-803, Nov. 2008.
[17] D. Shimbo, I. Oka, and S. Ata, "An improved algorithm of modulation classification for digital communication signals based on wavelet transform," IEEE Radio and Wireless Symposium, vol. 3, pp. 567-570, Jan. 2007.
Asynchronous Control in CMOS Technology for Synchronous FSMs Partitioned with One-Hot Encoding
Duarte L. Oliveira, Luiz S. Ferreira, Lester A. Faria, No Alles
Divisão de Engenharia Eletrônica - IEEA
Instituto Tecnológico de Aeronáutica - ITA
São José dos Campos - São Paulo - Brasil
Abstract: Many digital systems are currently described by an architecture composed of networks of synchronous finite state machines (FSMs) and data paths. These systems are generally battery powered and implemented in VLSI (Very Large Scale Integration) technology, which creates a demand for batteries with a long useful life. FSM partitioning is an interesting approach to designing low-power devices. A very promising partitioning style uses shared state memory (SSM) and one-hot encoding of the sub-FSMs, with communication between the sub-FSMs performed by an asynchronous control (AC). The drawback of this style is the latency of the AC, which significantly increases the cycle time of the sub-FSMs. In this paper we propose a new AC for this partitioning style. The AC is implemented in the full-custom pseudo-static gC (CMOS complex gate) architecture. The new AC has a significantly reduced latency and therefore does not affect the final cycle time of the sub-FSMs.
I. INTRODUCTION
Advances in microelectronics have led to the design of increasingly complex digital systems. Most of these systems are battery powered and target applications such as wireless devices, laptops, aerospace (satellites and missiles), aviation, automotive and medical systems. Because they are battery powered, these devices should have a long useful life, which makes power dissipation an important parameter in their design [1,2]. In this context, the synthesis of synchronous finite state machines (FSMs) plays an important role in the design of battery-powered digital circuits.
Many digital circuits are described by an architecture composed of a network of controllers with data paths and/or processors [2]. The synchronous controllers of these devices are often described as FSMs, which can always be specified by a State Transition Graph (STG). These circuits are generally implemented in VLSI (Very Large Scale Integration) technology using CMOS components. In this technology, most of the power dissipation occurs during switching (dynamic power). Dynamic power has two main parts: the combinational part, related to the excitation and output equations, and the sequential part, related to the flip-flops (FFs).
In digital circuits, the sequential part is the largest contributor to power consumption. Recent studies indicate that the clock of these circuits consumes a large percentage of the total power (15% to 45%) [1]. This consumption comes from the buffers, the clock distribution network and the registers.
Techniques for reducing dynamic power can be applied at different levels of digital design [1]. Current dynamic power reduction methods for FSM synthesis focus mainly on clock control (clock gating) [3][4], double-edge-triggered flip-flops [5], FSM partitioning [6-10], multi-criteria state assignment [11][12], and logic minimization and technology mapping [13,14].
FSM partitioning has shown interesting results in power reduction. A very promising partitioning style uses shared state memory (SSM), with communication between the sub-FSMs performed by an asynchronous control [15-18]. Oliveira et al. [19] proposed an efficient partitioning design model for Moore FSMs (Fig. 1). In this model the sub-FSMs are one-hot encoded. The communication control of the sub-FSMs is implemented as a Huffman machine (HM) with output feedback and is specified in FXBM (Flexible Extended Burst-Mode) [20]. Compared with the methods of [15-18], the method proposed in [19] avoids the inclusion of extra states.
This style has the drawback in the partitioning that is an
increase in cycle time of FSM as shown in the equation 1
below, where Tp is the time of propagation and TLATENCY is the
time of latency of asynchronous control.
$$T_{CYCLE} > T_{LATENCY} + T_{P\text{-}L\text{-}EXCITATION} + T_{SET\text{-}UP} + T_{P\text{-}FF} \qquad (1)$$
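Eq. (1) is a timing budget: the clock period of a sub-FSM must exceed the asynchronous-control latency plus the usual synchronous path delays. The sketch below evaluates that bound and the resulting maximum clock frequency. The delay values and the two controller latencies are hypothetical, chosen only to show how $T_{LATENCY}$ enters the budget.

```python
def min_cycle_time_ns(t_latency: float, t_p_l_excitation: float,
                      t_setup: float, t_p_ff: float) -> float:
    """Right-hand side of Eq. (1): T_CYCLE must be strictly greater than this sum (ns)."""
    return t_latency + t_p_l_excitation + t_setup + t_p_ff

def f_max_mhz(t_cycle_ns: float) -> float:
    """Upper bound on the clock frequency for a given cycle-time bound."""
    return 1e3 / t_cycle_ns

# Hypothetical delays in ns (illustrative only).
path = dict(t_p_l_excitation=1.2, t_setup=0.1, t_p_ff=0.3)
slow_ctrl = min_cycle_time_ns(t_latency=0.8, **path)   # a slower asynchronous control
fast_ctrl = min_cycle_time_ns(t_latency=0.3, **path)   # a faster asynchronous control

for name, t in [("slow control", slow_ctrl), ("fast control", fast_ctrl)]:
    print(f"{name}: T_CYCLE > {t:.2f} ns, f < {f_max_mhz(t):.0f} MHz")
```

Because the control latency adds directly to every cycle, any reduction in $T_{LATENCY}$ translates one-to-one into a shorter minimum cycle time.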
In this paper we propose a new asynchronous control for
the partitioning design model proposed in [19]. Our control
is also specified in FXBM, and it is implemented in the
full-custom pseudo-static gC (generalized C-element)
architecture (see Fig. 2). Compared with the standard-cell
controls proposed in [15-20], the pseudo-static gC architecture
gives the asynchronous control better performance (lower
latency) and therefore significantly reduces the FSM cycle time.
Yun [21] proposed the 3D method to synthesize XBM FSMs in the
pseudo-static gC architecture, but his method does not support
FXBM. In this paper we propose a logic-hazard-free
implementation of the FXBM control in the pseudo-static gC
architecture.
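Behaviorally, a generalized C-element drives its output high when its set function is true, low when its reset function is true, and otherwise holds its previous value (in the pseudo-static version, through a weak feedback keeper). The Python sketch below models only this behavior, not the transistor-level gC circuit or the synthesis method of [21]. The set/reset functions in the usage line are those of the textbook two-input Muller C-element, chosen as a hypothetical example.

```python
from typing import Callable, Mapping

Inputs = Mapping[str, int]

class GeneralizedCElement:
    """Behavioral gC model: a set function, a reset function and a state-holding keeper."""

    def __init__(self, set_fn: Callable[[Inputs], bool],
                 reset_fn: Callable[[Inputs], bool], init: int = 0):
        self.set_fn = set_fn
        self.reset_fn = reset_fn
        self.out = init

    def evaluate(self, inputs: Inputs) -> int:
        s, r = self.set_fn(inputs), self.reset_fn(inputs)
        if s and r:
            # A correct gC design never enables both stacks at the same time.
            raise ValueError("set and reset enabled together")
        if s:
            self.out = 1
        elif r:
            self.out = 0
        # Neither function true: the keeper holds the previous output.
        return self.out

# Two-input Muller C-element as a special case of a gC.
c = GeneralizedCElement(lambda i: i["a"] == 1 and i["b"] == 1,
                        lambda i: i["a"] == 0 and i["b"] == 0)
trace = [c.evaluate({"a": a, "b": b}) for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]]
print(trace)
```

The trace shows the state-holding behavior that distinguishes a gC from a purely combinational gate: the output stays at 0 for inputs (1, 0) and stays at 1 for inputs (0, 1).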
Figure 1. Target architecture with two sub-FSMs [19].
ISSN 977-2177-128009
Figure 2. Target architecture for proposed control: pseudo-static gC.
II. THEORETICAL CONCEPTS

A. Partitioning of FSM
Partitioning has been shown to be a very effective
technique to reduce power in FSMs [1,2]. Partitioning methods
always start from a so-called monolithic FSM derived from an
STG and generate two or more sub-FSMs [6-10,15-20].
Different methods have been proposed in the literature to
implement the partitioned sub-FSMs; in all of them, only one
sub-FSM is active at a time. These methods can be classified in
two ways, by structure and by the communication mechanism
between sub-FSMs, and each classification has two classes. By
structure: 1) each sub-FSM has its own state memory [7-10].
This class has the drawback of a costly synchronous
communication between the sub-FSMs, because it introduces
additional signals and states and adds machine cycles. Another
drawback is the increase in total memory (the sum of the
memories of the sub-FSMs); 2) all sub-FSMs share a single state
memory, also called the Local State Memory [6][15-19]. By
communication mechanism: 1) the communication between
sub-FSMs is synchronous [6][8-10]; 2) the communication
between sub-FSMs is asynchronous, that is, an asynchronous
control activates and deactivates the sub-FSMs [7][15-19] (see
Fig. 5 and 6). Cao et al. [16-18] show that designs using a shared
state memory and an asynchronous control achieve a higher
power reduction than the other kinds of designs.
B. One-Hot Code for FSMs
The one-hot code uses one state variable for each state of
the FSM: in every state, exactly one variable is at the high logic
value (1) and all the others are at the low logic value (0).
One-hot encoded FSMs have interesting features [22]. For large
FSMs, the one-hot code tends to produce smaller excitation and
output Boolean equations than binary codes [22].
Other important advantages of the one-hot code are fewer
glitches in the Boolean equations and increased reliability and
robustness to radiation, as shown by Cassel et al. [23], since
Single Event Upsets (SEU) and Single Event Transients (SET)
can be diagnosed more easily. The major drawback of the code is
the larger number of FFs, since each state requires one FF. This
drawback is greatly reduced by partitioning the FSM, as
proposed in this work.
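The SEU argument can be made concrete: a valid one-hot code has exactly one bit set, so any single bit flip yields a word with zero or two ones, which a simple check detects. The sketch below assigns one-hot codes to a list of states and verifies this property exhaustively. The state names are hypothetical, and the flip-flop count of a minimum-length binary code is included to illustrate the drawback mentioned above.

```python
import math

def one_hot_codes(states):
    """Give state k a code with a single 1 in bit position k (one FF per state)."""
    n = len(states)
    return {s: "".join("1" if j == k else "0" for j in range(n))
            for k, s in enumerate(states)}

def is_one_hot(code: str) -> bool:
    return code.count("1") == 1 and set(code) <= {"0", "1"}

def flip(code: str, bit: int) -> str:
    """Model a single event upset on one state flip-flop."""
    return code[:bit] + ("0" if code[bit] == "1" else "1") + code[bit + 1:]

states = ["S0", "S1", "S2", "S3", "S4"]          # hypothetical state names
codes = one_hot_codes(states)
all_upsets_detected = all(not is_one_hot(flip(c, b))
                          for c in codes.values() for b in range(len(c)))
ffs_one_hot = len(states)
ffs_binary = math.ceil(math.log2(len(states)))   # minimum-length binary code
print(codes["S0"], all_upsets_detected, ffs_one_hot, ffs_binary)
```

For five states the one-hot code needs five FFs instead of three, which is the overhead that partitioning into smaller sub-FSMs helps to reduce.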
C. Asynchronous control
An asynchronous control (AC) is a Mealy asynchronous
FSM that activates and deactivates the N existing sub-FSMs and
operates in generalized fundamental mode [24]. It is described
in the flexible extended burst-mode (FXBM) specification, which
allows multiple input changes [19]. AC_FXBM is derived from
the N encoded STGs, in which the state transitions that cross
between sub-FSMs (divided arcs) are declared; the AC
specification is obtained from these transitions only. When the
division is made on an unconditional state transition, the FXBM
specification uses only the state variable involved. In FXBM,
this variable is treated as a transition-sensitive signal with
monotonic behavior.
The example of Fig. 3 shows the division of an
unconditional state transition B → G, and Fig. 3a shows its
FXBM description. When the division is made on a conditional
cross-state transition, the FXBM specification uses both the
state variables and the input signals. The input signals do not
have a monotonic behavior (they allow glitches) and can be
level-sensitive (symbol <signal>) or transition-sensitive [23].
The example of Fig. 7 shows the conditional state E, with a
cross-state transition E → C, and Fig. 3b shows its FXBM
description. In some conditional states the transitions cross to
another sub-FSM, for example state D (D → E and D → G) in
Fig. 7; in the FXBM specification such a state is treated as
unconditional.
Figure 3. Part of a FXBM specification: a) unconditional; b) conditional.
D. Synthesis procedure
The synthesis method starts from the STG specification
that describes the monolithic FSM, and the result is
implemented in the architecture presented in Fig. 1. The steps
concern the partitioning and the synthesis of the sub-FSMs [18]:
a) state minimization of the STG; b) partitioning into N
sub-FSMs; c) one-hot state assignment of the N sub-FSMs;
d) logic minimization of the N sub-FSMs. The last step concerns
exclusively the synthesis of the communication control: the
control (signals Z) is specified and synthesized through the
operation table from the minimized and encoded STGs of the N
sub-FSMs.
III. CASE STUDY
To show the efficiency of the proposed method, we
present a case study in which a detector of two sequences (0101
and 1010) is analyzed. Although the sequences are short, this is
an excellent example of the application of the method, because
the same procedures can be used for longer sequences.
Fig. 4 shows the minimized Moore STG of the detector (step
1). Step 2 performs the partitioning.
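Before partitioning, it helps to have an executable reference against which the partitioned implementation can be checked. The sketch below is a behavioral model of a detector for 0101 and 1010. It assumes that overlapping occurrences are detected and that each sequence has its own flag; both are assumptions, since the Moore STG of Fig. 4 (states A to I) is not reproduced in the text.

```python
from collections import deque

def detect(bits, patterns=("0101", "1010")):
    """After each input bit, report which patterns end at that bit (overlaps allowed)."""
    width = max(len(p) for p in patterns)
    window = deque(maxlen=width)        # the last `width` input bits
    hits = []
    for b in bits:
        window.append(b)
        tail = "".join(window)
        hits.append(tuple(tail.endswith(p) for p in patterns))
    return hits

out = detect("0101010")
for bit, flags in zip("0101010", out):
    print(bit, flags)
```

Comparing this trace, cycle by cycle, with the outputs of the partitioned sub-FSMs is one simple way to confirm that partitioning did not change the observable behavior.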
The division was made on the state transitions D → E and
I → A of the STG, generating partitions P1 (A, B, C, D) and P2
(E, F, G, H, I) (see Fig. 4). The third step performs the one-hot
encoding. Sub-FSM-2, from P2, requires five state variables. The
source states D and I of the cross transitions of the two sub-FSMs
are encoded with the same code (Q1Q2Q3Q4Q5 = 00010). The
other states of both sub-FSMs may have equal one-hot codes, but
different from those used for D and I (see Fig. 5 and 6). Step 4
performs the logic minimization of sub-FSM-1 and sub-FSM-2
with the signal Z, which is responsible for activating and
deactivating the sub-FSMs. The logic circuits of both sub-FSMs
can be seen in Fig. 7 to 9. The excitation Boolean equations are
given below:
Equations of excitation of sub-FSM-1:
DA=(XQ4 + XQ3 + XQ2 + XQ1)Z
DB = XQ1 Z
DC = X Q2 Z
DD = X Q3 Z
Equations of excitation of sub-FSM-2:
DE = (X Q4 + X Q2 + X Q3 + XQ5 + XQ1)Z
DF = X Q1Z
DG = X Q2Z
DH = X Q3Z
DI = XQ5Z
Figure 7. Encoded STG specification: sub-FSM-1
A. Synthesis: asynchronous control

The FXBM specification of the asynchronous control is
obtained from the encoded STGs of the respective sub-FSMs
(see Fig. 5 and 6) and from the order of activation and
deactivation of the two sub-FSMs with the signal Z. The first
cut occurs in the state transition D → E, which is a conditional
cut. The second cut occurs in the state transition I → A, which is
an unconditional cut. Since state D is conditional, the input
signal X must be added to the asynchronous control.
The states D and I are encoded with the signal Q4. The
asynchronous control is initialized (state 0) with sub-FSM-1
active in state A. Fig. 10 shows the FXBM specification, Fig. 11
shows the corresponding FXBM flow table, with the state signal
Y added to eliminate conflicts, and Fig. 12 shows the synthesized
full-custom gC logic circuit of the control.
Figure 4. STG specification: division on the D → E and I → A transitions.

Figure 10. FXBM specification of the asynchronous control.
Figure 6. Encoded STG specification: sub-FSM-2.
Figure 11. Flow table of the FXBM asynchronous control.
Figure 12. Logic circuit gC asynchronous control: a) Y; b) Z.
V. DISCUSSION & RESULTS

Two main advantages of our method can be seen
immediately. First, it does not need the extra state used by the
other established methods, which saves one cycle. Second, the
one-hot code leads to smaller excitation and output Boolean
equations, fewer glitches in the Boolean equations, and increased
reliability and robustness to radiation. We also compared the
area (number of transistors, see Fig. 13) of asynchronous
controls for the two-sequence detector: two standard-cell
implementations and two full-custom implementations (static
CMOS and pseudo-static gC). Our proposal (the gC machine)
reduced the number of transistors by 34%, 47% and 45%,
respectively, relative to the other implementations.
Figure 13. Results: asynchronous controls.
VI. CONCLUDING REMARKS

We have proposed a new asynchronous control for the FSM
partitioning method that uses a shared state memory,
asynchronous communication and one-hot encoding. The use of
the one-hot code in this implementation allows smaller excitation
and output Boolean equations than binary codes. It also reduces
glitches in the Boolean equations, increasing the reliability and
radiation robustness of the system. The major drawback is the
cycle time of the sub-FSMs; the full-custom gC implementation
of the asynchronous control helps to minimize this cycle-time
problem.
REFERENCES
[1] L. Benini and G. De Micheli, Systems-Level Power Optimization: Techniques and Tools, ACM Trans. on Design Automation of Electronic Systems, 5(2), pp. 115-192, April 2000.
[2] Li-Chuan Weng, X. J. Wang, and Bin Liu, A Survey of Dynamic Power Optimization Techniques, Proc. of the 3rd IEEE Int. Workshop on System-on-Chip for Real-Time Applications, pp. 48-52, 2003.
[3] L. Benini and G. De Micheli, Automatic Synthesis of Low-Power Gated-Clock Finite-State Machines, IEEE Trans. on CAD of Integrated Circuits and Systems, Vol. 15, No. 6, pp. 630-643, June 1996.
[4] Q. Wu, M. Pedram, and X. Wu, Clock-Gating and Its Application to Low Power Design of Sequential Circuits, IEEE Trans. on Circuits and Systems-I: Fundamental Theory and Applications, Vol. 47, No. 103, pp. 415-420, March 2001.
[5] P. Zhao, J. McNeely, et al., Low-Power Clock Branch Sharing Double-Edge Triggered Flip-Flops, IEEE Trans. on VLSI Systems, Vol. 15, No. 3, pp. 338-345, March 2007.
[6] S. H. Chow, et al., Low Power Realization of Finite State Machines: A Decomposition Approach, ACM Trans. on Design Automation of Electronic Systems, 1(3), pp. 315-340, July 1996.
[7] B. Oelmann, et al., Asynchronous Control for Low-Power Gated-Clock Finite State Machines, Proc. IEEE Int. Conf. on Electronics, Circuits and Systems, pp. 915-918, 1999.
[8] L. Benini and G. De Micheli, Synthesis of Low-Power Selectively-Clocked Systems from High-Level Specification, ACM Trans. on Design Automation of Electronic Systems, 5(3), pp. 311-321, July 2000.
[9] J. C. Monteiro and A. L. Oliveira, Implicit FSM Decomposition Applied to Low-Power Design, IEEE Trans. on VLSI Systems, Vol. 10, No. 5, pp. 560-565, October 2002.
[10] B. Liu, et al., FSM Decomposition for Power Gating Design Automation in Sequential Circuits, 6th Int. Conf. on ASIC, ASICON, pp. 944-947, 2005.
[11] M. Koegst et al., Multi-Criterial State Assignment for Low Power FSM Design, Proc. 24th EUROMICRO Conference, pp. 261-268, 1998.
[12] P. Baccheletta et al., Low-Power State Assignment Techniques for Finite State Machines, IEEE Int. Symposium on Circuits and Systems, Geneva, Switzerland, pp. 641-644, May 2000.
[13] S. Iman and M. Pedram, Two-Level Logic Minimization for Low Power, IEEE/ACM Int. Conf. on CAD, Digest of Technical Papers, pp. 433-438, 1995.
[14] J.-Mou Tseng and J.-Yang Jou, A Power-Driven Two-Level Logic Optimizer, Proc. of the ASP-DAC, pp. 113-116, 1997.
[15] B. Oelmann, et al., Automatic FSM Synthesis for Low-Power Mixed Synchronous/Asynchronous Implementation, Journal of VLSI Design, Special Issue on Low-Power Design, Vol. 12, No. 2, pp. 167-186, 2001.
[16] C. Cao and B. Oelmann, Mixed Synchronous/Asynchronous State Memory for Low Power FSM Design, Proc. of the EUROMICRO Systems on Digital System Design, Rennes, France, pp. 363-370, 2004.
[17] C. Cao, et al., Synthesis Tool for Low-Power Finite-State Machines with Mixed Synchronous/Asynchronous State Memory, IEE Proc. Comput. Digit. Tech., Vol. 153, No. 4, pp. 243-248, July 2006.
[18] C. Cao and B. Oelmann, Low-Power State Encoding for Partitioned FSMs with Mixed Synchronous/Asynchronous State Memory, Integration, the VLSI Journal, Vol. 41, pp. 123-134, 2008.
[19] D. L. Oliveira, et al., Asynchronous Control for Low Power FSM with One-Hot Encoding, Proc. XIII SIGE, São José dos Campos, Brazil, 2011.
[20] O. Kraus and M. Pedeffke, XBM2PLA: A Flexible Synthesis Tool for Extended Burst Mode Machines, Proc. of the Design Automation and Test in Europe Conference and Exhibition, pp. 1301-1303, 2003.
[21] K. Y. Yun, Automatic Synthesis of Extended Burst-Mode Circuits Using Generalized C-elements, Proc. European Design Automation Conference, pp. 290-295, September 1996.
[22] G. Sutter, et al., Low-Power FSMs in FPGA: Encoding Alternatives, Lect. Notes Comput. Sci., Vol. 2451, pp. 459-467, 2002.
[23] M. Cassel and F. L. Kastensmidt, Evaluating One-Hot Encoding Finite State Machine for SEU Reliability in SRAM-based FPGAs, Proc. 12th IEEE Int. On-Line Testing Symposium, 2006.
[24] C. J. Myers, Asynchronous Circuit Design, Wiley & Sons, Inc., 2004, 2nd edition.
Synthesis of Asynchronous Digital Systems of High Performance using Simpler Approach
Duarte L. Oliveira, No Alles, Lester A. Faria, Diego Bompean, Thiago Curtinhas
Divisão de Engenharia Eletrônica
Instituto Tecnológico de Aeronáutica
São José dos Campos, São Paulo, Brazil
Asynchronous circuits have several advantages over their
synchronous counterparts: they have no clock skew and they
consume less power. Asynchronous circuits are also more robust
to noise, temperature variations and technology migration, and
they have low electromagnetic emission [6]. The main
disadvantage of asynchronous circuits is the difficulty of
designing them free of hazards and critical races [4-6].
There have been several attempts to demonstrate the potential
advantages of asynchronous circuits over their synchronous
counterparts [7-11].
Abstract: Synchronous digital systems have been presenting
serious implementation problems in DSM (deep sub-micron)
VLSI technology, and the asynchronous paradigm has become an
interesting alternative. In this paper we propose a practical
methodology for the synthesis of asynchronous digital systems.
It starts from the RTL (Register Transfer Level) description of a
synchronous system, composed of a synchronous finite state
machine (SFSM) and a datapath. The proposed method
synthesizes the asynchronous system in the decomposition style,
defined by an XBM_AFSM (extended burst-mode asynchronous
finite state machine) and a synchronous datapath. It uses a
simpler procedure to convert the SFSM specification into the
XBM specification that describes the XBM_AFSM. Through a
case study we show the simplicity of the methodology and that it
results in high-performance circuits.
I.
Asynchronous design methodologies [4,6] can naturally
eliminate such challenges by removing the clock signal from the
design. Different classes of asynchronous circuits may be used to
implement, for example, Systems on Chip (SoCs) built from
completely asynchronous modules, but such circuits are not yet a
widely accepted solution. The reasons are: a) a lack of reliable
tools for asynchronous design; b) the difficulty of hazard-free
design and of testing; c) a limited culture of asynchronous
design; d) a lack of asynchronous IPs [12]. However, some
asynchronous digital system design methodologies have been
developed successfully in recent years [13-15]. The
decomposition style (datapath + controller, the controller being
an asynchronous finite state machine, AFSM) is very promising.
INTRODUCTION
VLSI (Very Large Scale Integration) circuits have been
growing quickly. This growth comes from the reduction of
transistor dimensions (deep sub-micron MOS, MOS-DSM), the
increase in the number of transistors, and the increase in the
operating frequency. DSM MOS technology must operate with
low noise; the difference between the maximum and minimum
delays in wires and gates is larger than in earlier MOS
technologies, and the delay of a wire may exceed the delay of a
gate [1].
Synchronous digital systems (see Fig. 1) use a global clock signal
to synchronize their operations and are quite popular due to their
simplicity of design. There is also an abundant offer of
commercial CAD tools for automatic synthesis. A serious
problem in MOS-DSM technology is the global clock signal
itself: it is a major source of noise and of high electromagnetic
emission, it consumes a significant portion of the power, and its
distribution is an increasingly complex task (for example,
because of clock skew).
Timing analysis of highly integrated (MOS-DSM) synchronous
digital circuits is extremely difficult. In a digital system, the
sequential part is the main contributor to dynamic power
dissipation [1,2]. Recent studies have shown that in such
systems the clock consumes a large percentage (15% to 45%) of
the system power [3]. The asynchronous paradigm is an
interesting alternative for digital design because it eliminates the
problems caused by the clock signal.
In this paper we propose a simple methodology for the
synthesis of asynchronous digital systems that is very familiar to
designers of synchronous digital systems. The proposed
methodology starts from the Register Transfer Level (RTL)
description of the synchronous system (see Fig. 1): a synchronous
finite state machine (SFSM) plus a datapath. It synthesizes the
asynchronous system in the decomposition style, which is similar
to the synchronous paradigm: an asynchronous controller (an
extended burst-mode asynchronous finite state machine,
XBM_AFSM [16-18]) plus a synchronous datapath (see Fig. 2).
Our methodology provides a simple procedure that converts the
specification of an SFSM into the XBM specification that
describes the XBM_AFSM, and it implements the system in the
architecture of Fig. 2. The XBM_AFSM is synthesized in the
GRS architecture [19]. We also propose a protocol converter so
that the resulting asynchronous systems communicate with the
external environment using a two-phase handshaking protocol,
which allows better performance (see Fig. 2).
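The difference between the two handshaking protocols can be sketched as event traces. In the hypothetical model below, every level change on the external request Ri (rising or falling) is one two-phase request; for each one, the converter runs a complete internal four-phase cycle on Start/Fine and then answers with a single transition on Ro. Pairing Ri with Ro, and assuming that Fine simply follows Start, are simplifications for illustration, not the burst-mode specification of the actual converter.

```python
def convert_two_to_four_phase(ri_levels):
    """Hypothetical protocol-converter trace.

    ri_levels: sampled levels of the external request Ri. Every change of
    level (rising or falling) is one two-phase request. For each request the
    converter performs a full four-phase cycle on the internal Start/Fine
    pair (Fine is assumed to follow Start) and then toggles Ro once.
    """
    events = []
    ri_prev, ro = 0, 0
    for ri in ri_levels:
        if ri != ri_prev:                                     # two-phase: any edge is an event
            events += ["Start+", "Fine+", "Start-", "Fine-"]  # four-phase: return to zero
            ro ^= 1                                           # two-phase answer: one edge on Ro
            events.append("Ro+" if ro else "Ro-")
            ri_prev = ri
    return events

trace = convert_two_to_four_phase([0, 1, 1, 0])
print(trace)
```

Each operation costs four internal transitions but only two external ones (one on Ri and one on Ro). This reduction in external transitions is where a two-phase interface can save time, especially on long external wires.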
the environment is realized either in fundamental mode (FM) (for
example, burst-mode circuits and their extensions) or in
input/output mode (M_I/O) (for example, DI, SI and QDI
circuits). In FM, the circuit must stabilize before a new set of
inputs can change. In M_I/O, a change of an output signal can
immediately enable a change of an input signal.
The rest of the paper is organized as follows. Section II gives
an overview of asynchronous logic. Section III summarizes the
specifications for FSMs. Section IV presents our methodology.
Section V illustrates our method with an example. Section VI
presents simulation results, and Section VII presents our
conclusions.
The asynchronous decomposition design style has two
variants: a) designs with an asynchronous datapath, without any
delay insertion. The signal transition graph (Petri net)
specification is more appropriate to describe the controller that
interfaces with the asynchronous datapath [24,25]. The
asynchronous datapath uses dual-rail components, which have a
very high cost; b) designs with a synchronous datapath
(single-rail components), in which a delay is inserted that defines
the cycle time of the state transition (see Fig. 2). The extended
burst-mode (XBM) specification is more appropriate to describe
the controller that interacts with the synchronous datapath
[17,18]. Compared with variant (a), variant (b) results in circuits
with lower power consumption and smaller area, and it simplifies
the design. In recent years, some real-life asynchronous circuits
have been designed successfully and efficiently based on
extended burst-mode asynchronous controllers [10,11].
Figure 1. Synchronous architecture: general structure.
Figure 2. Proposed target asynchronous architecture.
II. OVERVIEW: ASYNCHRONOUS LOGIC
Asynchronous digital systems are event-driven and have no
global signal that synchronizes operations. Synchronization is
performed by a handshaking protocol, and the system is seen
as a functional block that interacts with the environment through
this type of protocol (see Fig. 3) [4,6]. Asynchronous digital
systems can be designed in three different styles [4,6,21,22]:
a) decomposition into datapath + controller; b) micropipeline;
c) composition with macro-modules. The three styles can be
designed in different classes of asynchronous circuits. The class
defines under which delay model the circuit operates correctly
and in which operation mode the circuit communicates with the
environment [6]. Asynchronous circuits can be classified into
two classes: a) bounded gate and wire delays; b) unbounded gate
and wire delays. Two important kinds of asynchronous circuits
that follow delay model (a) are burst-mode circuits and their
extensions [14-16] and timed circuits [6]. Delay-insensitive (DI)
circuits follow delay model (b) [22,23]. This model is the most
robust and free of any timing analysis, but its application is very
limited. Two less restrictive variants of this delay model are:
a) speed-independent (SI) circuits, which obey the model in
which gate delays are unbounded (indefinite) but finite and wire
delays are zero [6,24]; b) quasi-delay-insensitive (QDI) circuits,
which obey model (b) with the isochronic fork restriction: the
branches of a wire with fan-out > 1 (a fork) have equal delays
[6,22]. The communication of the circuit with
Figure 3. Functional block: asynchronous communication
III. SPECIFICATION: FINITE STATE MACHINES
The State Transition Graph (STG) specification is a popular
and formal way of describing Finite State Machines (FSMs), in
which the vertices represent states and the edges represent state
transitions. An STG contains a finite number of states and state
transitions, and it describes an FSM in the Moore or Mealy
model. FSMs are described and synthesized in both the
synchronous and the asynchronous paradigms, and in each
paradigm the description of FSMs through the STG has its own
syntax.
Definition 1. (STG) Let $G = \{V, A, R\}$ be an STG, where
$V = \{A_1, A_2, \ldots, A_N\}$ is a set of vertices; $A = \{E_1, E_2, \ldots, E_M\}$ is a
set of edges; and $R = \{W_1, W_2, \ldots, W_M\}$ is a set of labels, such that
$W_j$, $1 \le j \le M$, is the label of edge $E_j$; $W_j$ can be
empty.
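Definition 1 maps directly onto a small data structure. The sketch below is one possible Python encoding, in which an empty string stands for an empty label. The class and field names are illustrative only and are not part of any tool used in this work.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    label: str = ""            # W_j; "" represents an empty label

@dataclass
class STG:
    vertices: frozenset        # V = {A_1, ..., A_N}
    edges: tuple               # A = {E_1, ..., E_M}, each edge carrying its label W_j

    def __post_init__(self):
        # Every edge must connect two declared vertices.
        for e in self.edges:
            if e.src not in self.vertices or e.dst not in self.vertices:
                raise ValueError(f"edge {e} refers to an undeclared vertex")

    def out_edges(self, v: str):
        """Edges leaving vertex v (the state transitions out of state v)."""
        return [e for e in self.edges if e.src == v]

g = STG(frozenset({"A1", "A2"}), (Edge("A1", "A2", "x"), Edge("A2", "A1")))
print(g.out_edges("A1"))
```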
A. Asynchronous paradigm
In this paradigm, Nowick [16] proposed an STG called
Burst-Mode (BM). Transitions may occur when one or more
inputs/outputs (bursts) change their logic level, 0 → 1 or 1 → 0
(transition-sensitive signals, TSS). When there is no input
exclusion). Fig. 5 shows the STG of a Mealy-model SFSM, all of
whose states satisfy the mutual exclusion property.
change, the machine remains in its stable state. The input/output
bursts must be monotonic, i.e., each signal can change only once
during each transition. An initial state must exist. In a BM_STG,
each state transition is labeled with input burst/output burst,
where the output burst may be empty and only the signals
labeled in the state transition can change.
The BM_STG must satisfy three properties to be
implementable [16]: 1) signal polarity, i.e., every signal
transition has a direction {+, -}; 2) unique entry point;
3) maximal set. The polarity property defines the stable-state
flow condition: it eliminates state oscillation, creates the concept
of stable and unstable states, and facilitates critical-race-free
state encoding. The unique entry point property allows a
hazard-free logic cover to be obtained in the logic minimization
step. For an asynchronous FSM to behave deterministically, the
branches (decisions) of the BM_STG must satisfy the maximal set
property. Yun [17,18] proposed an STG called the extended
burst-mode (XBM) specification, which adds two features:
directed don't-care signals (which allow an input signal to
change concurrently with output signals) and conditionals (which
depend on level-sensitive signals (LSS) with non-monotonic
behavior). The restrictions mentioned above were generalized to
allow the extensions proposed by Yun [17,18].
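The maximal set property can be checked mechanically: among the input bursts that leave the same state, no burst may be a subset of another, otherwise the machine could not decide which transition to take once the smaller burst has arrived. The sketch below represents a burst as a set of signal edges such as "a+". It covers plain BM bursts only, not the XBM conditionals or directed don't-cares.

```python
from itertools import permutations

def maximal_set_violations(bursts):
    """Return index pairs (i, j) where burst i is contained in burst j.

    `bursts` lists the input bursts on the edges leaving one state; each burst
    is a set of signal transitions such as {"a+", "b-"}. An empty result means
    the state satisfies the maximal set property.
    """
    return [(i, j) for i, j in permutations(range(len(bursts)), 2)
            if bursts[i] <= bursts[j]]

good = maximal_set_violations([{"a+"}, {"c+"}])
bad = maximal_set_violations([{"a+"}, {"a+", "c+"}])
print(good, bad)   # [] [(0, 1)]
```

Two identical bursts are reported in both directions, which is correct: identical bursts are equally ambiguous.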
Figure 5. Mealy model STG
C. Conversion of the STG into the XBM_STG specification

The conversion procedure is carried out in three steps: 1)
convert the Mealy STG into a Mealy homogeneous STG;
2) convert the Mealy homogeneous STG into an XBM_STG;
3) obtain the optimized XBM_STG.
Proposition 1. (Mealy homogeneous STG) Let $G = \{V, A, R_1, R_2\}$
be an STG, where $V = \{A_1, A_2, \ldots, A_N\}$ is a set of vertices;
$A = \{E_1, E_2, \ldots, E_M\}$ is a set of edges; $R_1 = \{W_1, W_2, \ldots, W_M\}$ is
a set of input-signal labels, such that $W_j \in R_1$ is the input label of
edge $E_j$; and $R_2 = \{Z_1, Z_2, \ldots, Z_M\}$ is a set of output-signal labels,
such that $Z_j \in R_2$ is the output label of edge $E_j$. The STG is
Mealy homogeneous if and only if, for every $A_i \in V$, all edges
$E_k \in A$ entering $A_i$ are labeled with the same value of $Z$.
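The first conversion step (Proposition 1 together with the state splitting described later in this section) can be sketched as follows. Edges are hypothetical (source, destination, input label, output label) tuples. A state whose incoming edges carry k different outputs is split into k copies named with primes, and every outgoing edge of the original state is duplicated for each copy. This is an illustrative reading of the procedure, not the authors' tool.

```python
from collections import defaultdict

def incoming_outputs(edges):
    """Map each state to the set of output labels on the edges entering it."""
    outs = defaultdict(set)
    for _src, dst, _inp, out in edges:
        outs[dst].add(out)
    return outs

def is_mealy_homogeneous(edges):
    return all(len(labels) == 1 for labels in incoming_outputs(edges).values())

def split_heterogeneous(edges):
    """Split every heterogeneous state into one copy per distinct incoming output."""
    copy_name, copies = {}, defaultdict(list)
    for state, labels in incoming_outputs(edges).items():
        if len(labels) > 1:
            for k, z in enumerate(sorted(labels)):
                name = state + "'" * (k + 1)          # C', C'', C''' ...
                copy_name[(state, z)] = name
                copies[state].append(name)
    new_edges = []
    for src, dst, inp, out in edges:
        new_dst = copy_name.get((dst, out), dst)      # an entering edge picks its copy
        for new_src in copies.get(src, [src]):        # every copy keeps all exits
            new_edges.append((new_src, new_dst, inp, out))
    return new_edges

# Hypothetical Mealy STG: B is entered with two different outputs.
stg = [("A", "B", "x=1", "z=0"), ("C", "B", "x=0", "z=1"),
       ("B", "C", "x=1", "z=0"), ("B", "A", "x=0", "z=0")]
homog = split_heterogeneous(stg)
print(is_mealy_homogeneous(stg), is_mealy_homogeneous(homog), len(homog))
```

After splitting, B' is entered only with z=0 and B'' only with z=1, so every state again has a single incoming output value, as Proposition 1 requires.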
We illustrate this paradigm with the SCSI benchmark. Fig. 4
shows an XBM specification of SCSI with 4 inputs (Cntgt1, Frin,
Ok, Rin), 2 outputs (Aout, Frout) and initial state 0. The label
Rin+ Frin- / Aout+ on transition 5 → 3 means that the output
(Aout: 0 → 1) follows the input burst (Rin: 0 → 1 and
Frin: 1 → 0). The level-sensitive signal Cntgt1 is used to describe
the mutual exclusion between transitions 3 → 6 and 3 → 4. The
directed don't-care signal Rin* on transition 4 → 5 means that Rin
may either change its value or keep its old value. Every state
transition must have at least one compulsory signal, i.e., an input
signal that was not a directed don't-care in the previous state
transition.
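The firing rule described for Fig. 4 can be sketched as a predicate: a transition is enabled once every compulsory edge of its input burst has occurred and every level-sensitive condition holds at that moment. Directed don't-care signals are simply left out of the burst here, so they may change or keep their value. The signal names come from the SCSI example; the function itself is a hypothetical illustration, not part of any XBM tool.

```python
def xbm_enabled(entry, now, burst, conditions=None):
    """Decide whether an XBM transition may fire.

    entry:      input levels when the current state was entered
    now:        current input levels
    burst:      compulsory transition-sensitive edges as {signal: final level},
                e.g. Rin+ Frin- is {"Rin": 1, "Frin": 0}
    conditions: level-sensitive guards as {signal: required level}
    """
    for sig, final in burst.items():
        if entry[sig] == final or now[sig] != final:
            return False          # this edge has not happened (yet)
    return all(now[sig] == lvl for sig, lvl in (conditions or {}).items())

entry = {"Rin": 0, "Frin": 1, "Ok": 0, "Cntgt1": 0}
half = {**entry, "Rin": 1}                       # only Rin has risen
full = {**entry, "Rin": 1, "Frin": 0}            # whole burst Rin+ Frin- has arrived
print(xbm_enabled(entry, half, {"Rin": 1, "Frin": 0}),
      xbm_enabled(entry, full, {"Rin": 1, "Frin": 0}))   # False True
```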
In Fig. 5, states A and D are homogeneous and states B and C
are heterogeneous. The conversion algorithm visits all states of
the Mealy STG to check whether each state is homogeneous or
heterogeneous. A heterogeneous state is decomposed into as many
states as there are different outputs on its incoming edges. State C
in Fig. 5 has incoming edges with three different outputs, so it is
decomposed into three states. Fig. 6 shows the homogeneous STG,
in which states B and C have been converted. The second step
converts the Mealy homogeneous STG into an XBM_STG and is
itself composed of two sub-steps. The first sub-step decomposes
each state with a self-transition labeled by LSS signals into two
states; two edges are inserted and labeled with the handshaking
protocol. The protocol uses the signals Bt (Begin transition) and
Ct (Completion transition) of the proposed architecture to
implement the state transition (see Fig. 2). The optimization of
the XBM_STG seeks to eliminate all state-transition paths in
which the polarity of the protocol is not violated and the state is
not conditional (see Section V). To increase performance, the
proposed architecture uses a protocol converter that
communicates with the external environment using a two-phase
handshaking protocol. Externally, the architecture uses the
signals Ri, Ai, Ro and Ao, but internally the communication uses
only the signals Start and Fine (see Fig. 2).
Figure 4. SCSI: Extended Burst-Mode (XBM) specification
B. STG: synchronous paradigm

In the synchronous paradigm, the STG manipulates only
level-sensitive signals (inputs or outputs) [17,18]. In this
paradigm, the STG need only satisfy the maximal set property (mutual
Fig. 7a shows the conversion of state C'' of the STG in
Fig. 6. The second sub-step replaces each state transition of the
STG by two state transitions and an intermediate state, with the
labels
in XBM plus the handshaking protocol. The input signals of the
STG are described as LSS signals, and the output signals as TSS
signals, in the XBM_STG. Fig. 7b shows the conversion of the
state transition C → C'' of the STG into the XBM_STG. Fig. 8
shows the general structure of the specification, in which the
processing vertices are XBM and two other vertices handle the
communication with the environment (see Fig. 2 and Fig. 4).
the datapath generated in the RTL description must be converted
to the asynchronous paradigm. The STG is converted into an
XBM_STG as described in the previous section. Fig. 10 shows the
BM specification of the proposed protocol converter, and Fig. 11
shows the logic circuit of the converter, implemented with the
Miri tool [19].
Figure 9. Registers: a,b) synchronous paradigm; c) asynchronous paradigm.

Figure 6. Mealy homogeneous STG
Figure 10. Burst-mode specification: protocol converter

Figure 7. Conversion of STG in XBM_STG: a) auto-transition; b) state
transition
Figure 8. General structure: specification

Figure 11. Logic circuit: protocol converter
IV. METHODOLOGY: ASYNCHRONOUS SYSTEMS
The proposed methodology aims to synthesize asynchronous
systems in the decomposition style: asynchronous controller +
synchronous datapath. It follows the traditional steps of
behavioral and logic synthesis [18,26] and is divided into three
steps: a) synchronous behavioral synthesis (RTL description);
b) conversion of the controller STG into an XBM_STG;
c) synthesis of the XBM controller by direct mapping. Behavioral
synthesis starts from a description of the algorithm to be
implemented and generates an RTL description; it can be done
manually [26] or with tools of the synchronous paradigm. The
RTL description is composed of the STG specification of the
synchronous controller and a datapath built from synchronous
components. Fig. 9 shows the structure of the registers in the two
paradigms. The registers of
V. CASE STUDY
We illustrate our methodology with the well-known GCD
(Greatest Common Divisor) benchmark. The first step generates
the RTL description from the VHDL description, as shown in
Fig. 12 and 13. The registers R1 and R2 are already in the
asynchronous paradigm. The description was obtained using the
concepts of synchronous behavioral synthesis [26]. Fig. 14 and
15 show the structure of the asynchronous GCD. The second step
generates the XBM_STG. Using the procedure shown in
Section III, Fig. 16 and 17 show the initial XBM_STG and the
optimized XBM_STG of the GCD. The signals Start, Fine, Bt and
Ct of the architecture (see Fig. 15) are placed in the XBM_STG
(see Fig. 8 and 17). Fig. 18 shows the XBM_STG with state
signal Z0 added to resolve the conflicts. Fig. 19 shows the logic
circuit of the XBM controller, synthesized with the Miri tool [19].
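A software reference model is handy for checking simulation results such as Res = 9 for X1 = 180 and Y1 = 63 (Section VI). The sketch below assumes the usual subtraction-based GCD algorithm, which repeatedly subtracts the smaller register from the larger until both are equal and is a common choice for a two-register datapath. The actual RTL of Fig. 12 and 13 is not reproduced in the text, so the register mapping in the comments is hypothetical and the step count is only indicative.

```python
import math

def gcd_subtractive(x: int, y: int) -> tuple[int, int]:
    """Subtraction-based GCD; returns (gcd, number of subtraction steps)."""
    if x <= 0 or y <= 0:
        raise ValueError("operands must be positive integers")
    steps = 0
    while x != y:
        if x > y:
            x -= y      # hypothetical mapping: R1 <- R1 - R2
        else:
            y -= x      # hypothetical mapping: R2 <- R2 - R1
        steps += 1
    return x, steps

res, steps = gcd_subtractive(180, 63)
assert res == math.gcd(180, 63)
print(res, steps)   # 9 8
```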
Figure 12. Synchronous datapath: GCD
Figure 13. STG of GCD
Figure 14. Functional block of GCD (8-bit inputs X1 and Y1, 8-bit output Res, handshake signals Ri, Ro, Ai and Ao)
Figure 15. Generated architecture: GCD
Figure 17. Optimized XBM_STG of GCD controller
Figure 18. Optimized XBM_STG controller: free of conflicts
Figure 19. XBM controller: logic circuit (a) and (b)
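As a software reference for the benchmark, the following minimal Python sketch computes the GCD by repeated subtraction (Euclid's original formulation). It rests on an assumption: the sketch does not reproduce the exact operations of the RTL datapath in Figs. 12 and 13, and the function name `gcd_subtractive` is hypothetical. It also counts loop iterations, which helps when reasoning about the data-dependent latency of an asynchronous implementation.

```python
import math


def gcd_subtractive(x: int, y: int) -> tuple[int, int]:
    """Return (gcd, iterations) using Euclid's subtraction-based algorithm.

    In a hypothetical datapath, each loop iteration corresponds to one
    compare-and-subtract step on two registers (e.g., R1 and R2).
    """
    if x <= 0 or y <= 0:
        raise ValueError("operands must be positive integers")
    iterations = 0
    while x != y:
        # Subtract the smaller operand from the larger one.
        if x > y:
            x -= y
        else:
            y -= x
        iterations += 1
    return x, iterations


if __name__ == "__main__":
    # Operand pairs used in the simulation section of this paper.
    for a, b in [(180, 63), (140, 162)]:
        result, steps = gcd_subtractive(a, b)
        assert result == math.gcd(a, b)  # cross-check against the standard library
        print(f"GCD({a}, {b}) = {result} after {steps} subtractions")
```

For these operand pairs the sketch returns 9 and 2, which match the Res values reported in the simulations. Under the subtraction assumption, the second pair needs more iterations (12 versus 8). That is consistent with its longer measured processing time, but it does not prove the relationship.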
Figure 16. XBM_STG of GCD controller
VI. DISCUSSION & SIMULATION
To demonstrate the feasibility of the method, the GCD benchmark (Section V) was simulated in Altera Quartus II version 9.1, targeting the Cyclone III family (device EP3C25F324C6). Figs. 20 and 21 show the simulations of the asynchronous GCD and of the XBM controller. In the first case, the GCD of X1 = 180 and Y1 = 63 was computed, yielding Res = 9 with a processing time of 247.5 ns (between tRi+ and tRo+). In the second calculation, X1 = 140 and Y1 = 162, yielding Res = 2 with a processing time of 355.4 ns (between tRi- and tRo-). The asynchronous GCD required 61 LUTs and 23 FFs: 14 LUTs for the controller, 29 LUTs and 16 FFs for the datapath, 15 LUTs and 7 FFs for the delay element, and 3 LUTs for the protocol converter.
Figure 20. Simulation: asynchronous GCD
Figure 21. Simulation: XBM controller
VII. CONCLUSIONS
In this paper we show that the synchronous and asynchronous paradigms can be used together in the design of an asynchronous system. The decomposition style is familiar from the synchronous paradigm and can be used in high-performance asynchronous design. We proposed an architecture that communicates effectively with the external environment through a two-phase protocol. We also proposed an algorithm to convert the STG of a synchronous FSM into an XBM_STG. As future work, we plan to develop a tool for the proposed methodology.
REFERENCES
[1] D. Goldhaber-Gordon, et al., "Overview of Nanoelectronic Devices," Proc. of the IEEE, vol. 85, no. 4, pp. 521-540, April 1997.
[2] C. Constantinescu, "Trends and Challenges in VLSI Circuit Reliability," IEEE Micro, vol. 23, no. 4, 2003.
[3] I. E. Sutherland and J. Ebergen, "Computers without Clocks," Scientific American, pp. 62-69, August 2002.
[4] S. Hauck, "Asynchronous Design Methodologies: An Overview," Proc. of the IEEE, vol. 83, no. 1, pp. 69-93, January 1995.
[5] S. H. Unger, "Hazards, Critical Races, and Metastability," IEEE Transactions on Computers, vol. 44, no. 6, pp. 754-768, June 1995.
[6] C. J. Myers, Asynchronous Circuit Design, 2nd edition, Wiley & Sons, 2004.
[7] S. M. Nowick, et al., "The Design of a High-Performance Cache Controller: A Case Study in Asynchronous Synthesis," Integration, the VLSI Journal, vol. 15, no. 3, pp. 241-262, October 1993.
[8] S. B. Furber, et al., "AMULET2e: An Asynchronous Embedded Controller," Proc. of the IEEE, vol. 87, pp. 243-256, Feb. 1999.
[9] H. van Gageldonk, et al., "An Asynchronous Low-Power 80C51 Microcontroller," Proc. Int. Symp. on Advanced Research in Asynchronous Circuits and Systems, pp. 96-107, 1998.
[10] K. Y. Yun, et al., "The Design and Verification of a High-Performance Low-Control-Overhead Asynchronous Differential Equation Solver," IEEE Transactions on VLSI Systems, vol. 6, no. 4, pp. 643-655, Dec. 1998.
[11] S. Rotem, et al., "RAPPID: An Asynchronous Instruction Length Decoder," Proc. Int. Symp. on Advanced Research in Asynchronous Circuits and Systems, pp. 60-70, April 1999.
[12] W. Hardt, et al., "Architecture Level Optimization for Asynchronous IPs," Proc. 13th Annual IEEE Int. Conf. ASIC/SOC, pp. 158-162, 2000.
[13] L. A. Plana and S. M. Nowick, "Architectural Optimization for Low-Power Nonpipelined Asynchronous Systems," IEEE Transactions on VLSI Systems, vol. 6, no. 1, pp. 56-65, March 1998.
[14] S. F. Nielsen, J. Sparsø, and J. Madsen, "Behavioral Synthesis of Asynchronous Circuits Using Syntax Directed Translation as Backend," IEEE Transactions on VLSI Systems, vol. 17, no. 2, February 2009.
[15] T. Chelcea, et al., "Burst-Mode Oriented Back-End for the Balsa Synthesis System," Proc. of the DATE Conference and Exhibition, 2002.
[16] S. M. Nowick, Automatic Synthesis of Burst-Mode Asynchronous Controllers, Ph.D. thesis, Stanford University, 1993.
[17] K. Y. Yun, Synthesis of Asynchronous Controllers for Heterogeneous Systems, Ph.D. thesis, Stanford University, 1994.
[18] K. Y. Yun and D. L. Dill, "Automatic Synthesis of Extended Burst-Mode Circuits: Part I (Specification and Hazard-Free Implementation) and Part II (Automatic Synthesis)," IEEE Transactions on CAD of Integrated Circuits and Systems, vol. 18, no. 2, pp. 101-132, Feb. 1999.
[19] D. L. Oliveira, et al., "Miri: a CAD tool to synthesize multi-burst controllers for heterogeneous systems," Microelectronics Reliability, vol. 43, pp. 209-213, 2003.
[20] T. Konishi, N. Hamada, and H. Saito, "A Control Circuit Synthesis Method for Asynchronous Circuits in Bundled-Data Implementation," 7th Int. Conf. on Computer and Information Technology, pp. 847-852, 2007.
[21] H. Saito, et al., "Control Signal Sharing Using Data-Path Delay Information at Control Data Flow Graph Description," Proc. of the 9th Int. Symposium on Asynchronous Circuits and Systems, 2003.
[22] A. J. Martin, "Compiling Communicating Processes into Delay-Insensitive VLSI Circuits," Distributed Computing, vol. 1, no. 3, pp. 226-234, 1986.
[23] E. Brunvand and R. F. Sproull, "Translating Concurrent Programs into Delay-Insensitive Circuits," Proc. ICCD, pp. 262-265, 1989.
[24] T.-A. Chu, Synthesis of Self-Timed VLSI Circuits from Graph-Theoretic Specifications, Ph.D. thesis, Dept. of EECS, MIT, June 1987.
[25] J. Cortadella, et al., "Petrify: A Tool for Manipulating Concurrent Specifications and Synthesis of Asynchronous Controllers," IEICE Trans. Inf. Syst., vol. E80-D, no. 3, pp. 315-325, March 1997.
[26] D. D. Gajski, Principles of Digital Design, Prentice Hall, 1997.
Gate and Transistor Sizing Minimizing Delay and Area

Gracieli Posser, Guilherme Flach, Gustavo Wilke, Ricardo Reis
Universidade Federal do Rio Grande do Sul (UFRGS)
Instituto de Informática - PPGC/PGMicro
Av. Bento Gonçalves 9500, Porto Alegre, RS - Brazil
Email: {gposser,gaflach,wilke,reis}@inf.ufrgs.br

Abstract: This work presents a gate and transistor sizing tool based on Geometric Programming (GP), in which delay is computed with the Elmore model. The optimization can minimize delay and/or area (power). A comparison between gate sizing and transistor sizing is presented to analyze the trade-off between runtime and the minimum delay achieved. To evaluate our approach, the ISCAS85 benchmark circuits were mapped to a 45nm technology. Gate sizing and transistor sizing were then performed to minimize delay. Compared with the cell sizes found in a commercial library, gate sizing reduced the delay by 21% on average, for the same area and power as the sizing provided by the cell library. Transistor sizing then reduced the delay by 40.4% and the power by 2.9% on average, compared with gate sizing. However, transistor sizing has a longer runtime because it uses twice as many variables as gate sizing. Gate sizing for area optimization was run using as the delay constraint the value found in delay minimization. This yielded average reductions of 28.2% in area and 27.3% in power consumption.

I. INTRODUCTION
Circuit performance can be improved through transistor sizing, aimed at reducing the worst-case delay. A larger transistor has a greater capacity to charge (discharge) a load, reducing the time required to switch a signal from 0 (1) to 1 (0). However, a larger transistor presents a larger capacitance to be charged (discharged) by the gate driving it. Therefore, choosing the right transistor size is not a trivial problem.

Gate sizing is a special case of transistor sizing in which all transistors of the same gate are subject to the same scale factor. It is more restrictive than transistor sizing, but the reduced number of variables leads to a shorter runtime.

Each gate has an optimal scale factor. Increasing the size of a gate, and consequently of its transistors, increases its ability to charge a load, reducing the time the gate needs to switch its output. However, increasing the size of the gate:
• reduces its output resistance, and
• increases its input capacitance, placing a larger capacitive load on the gate that drives it.
Therefore, gate sizing and transistor sizing are optimization problems whose goal is to minimize delay subject to an area constraint, such as a maximum transistor size and a maximum circuit area. Transistor and gate sizing can also be formulated to minimize power or area subject to a delay constraint.

An efficient way to model the sizing problem is Geometric Programming (GP) [1]. GP is a mathematical optimization method capable of finding the global optimum, if one exists, in polynomial time. To model a GP, the delay and area equations must be expressed as posynomial functions. Delay is easily modeled as a posynomial function with the Elmore delay model [2]. Circuit area is a linear function of the transistor widths with positive coefficients, and is therefore also a posynomial. The variables are the scale factors associated with each transistor.

The gate sizing tool developed in this work targets CMOS circuits and can be configured for several CMOS fabrication technologies. In addition, the optimization can be done in two ways: (1) delay minimization subject to an area constraint and (2) area minimization subject to a delay constraint.

In this work, both gate sizing and transistor sizing are performed. Each gate is modeled with the RC (Resistance and Capacitance) model. The inputs are the circuit topology, the maximum input capacitance, the maximum area and the maximum logic gate size.
The contributions of this work are:
• a more accurate gate and transistor sizing tool that combines works [3] and [4], both based on Geometric Programming. The tool can be used together with an automatic layout generation tool, exploiting continuous gate sizes and avoiding the rounding problems of traditional gate sizing approaches;
• a gate and transistor sizing tool that can be configured for several fabrication technologies by changing the technology parameters;
• a comparison between gate sizing and transistor sizing that considers the trade-off between the number of variables (runtime) and the reduction in total circuit delay;
• a sizing tool that targets both delay and area.

The paper is organized as follows. Section II presents related work on gate and transistor sizing. Section III defines posynomial functions and Geometric Programming (GP). Section IV presents the problem formulation. Section V describes the development of transistor and gate sizing using Geometric Programming. Section VI presents the results of the comparison between gate sizing and transistor sizing. Section VII shows the results of delay minimization followed by area minimization in gate sizing. Finally, Section VIII presents the conclusions.
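A single-stage toy model makes the trade-off described in the Introduction concrete: a larger gate drives its load faster, but it also loads its own driver more. The following Python sketch is an illustration under assumptions that do not come from the paper. All units are normalized. The model uses a fixed driver resistance `r_drv`, and a gate of scale factor `x` has input capacitance `c_in*x`, output resistance `r/x` and parasitic capacitance `c_par*x`. All function names are hypothetical. The resulting delay, `r_drv*c_in*x + r*c_load/x + r*c_par`, is a posynomial in `x` with a unique minimum.

```python
import math


def stage_delay(x, r_drv, c_in, r, c_load, c_par):
    """Elmore-style delay of a fixed driver plus one gate of scale factor x.

    Driver term: r_drv charges the gate input capacitance c_in * x.
    Gate term:   r / x charges the load c_load plus its own parasitic c_par * x.
    """
    return r_drv * c_in * x + (r / x) * (c_load + c_par * x)


def optimal_scale(r_drv, c_in, r, c_load):
    """Closed-form minimizer: dD/dx = r_drv*c_in - r*c_load/x**2 = 0."""
    return math.sqrt(r * c_load / (r_drv * c_in))


if __name__ == "__main__":
    params = dict(r_drv=2.0, c_in=1.0, r=1.0, c_load=32.0)  # illustrative values only
    x_opt = optimal_scale(**params)
    d_opt = stage_delay(x_opt, c_par=0.5, **params)
    # A coarse scan confirms that no other size on the grid does better.
    grid = [0.25 * k for k in range(1, 65)]  # x from 0.25 to 16
    best = min(grid, key=lambda x: stage_delay(x, c_par=0.5, **params))
    print(f"x* = {x_opt:.2f}, D(x*) = {d_opt:.2f}, best grid point = {best:.2f}")
```

With these example values the closed form gives x* = 4 and D(x*) = 16.5, and the grid scan agrees. In this simplified model the parasitic term only adds the constant `r*c_par`, so it does not move the optimum. Real circuits couple many such stages, which is why a global formulation such as GP is used instead of per-gate closed forms.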
II. RELATED WORK
This section presents related work on gate and transistor sizing, some of whose ideas were used to develop this work.

A. Transistor Sizing
Transistor sizing is a classic design problem that demands automation through EDA (Electronic Design Automation) tools, and it has received much attention in the literature.

TILOS [5] was the first algorithm to perform transistor sizing using the Elmore delay model [2]. It identifies a critical path and uses a heuristic method to reduce the delay along that path. The iterative process stops when the critical path meets the delay constraint. TILOS starts with all transistors at minimum size and only increases the size of the transistors on the critical path. It tries to meet the delay constraint by upsizing as few transistors as possible.

The work by Sapatnekar et al. [6] is a classic work on transistor sizing. Its goal is to minimize area subject to the constraint that the delay stays below a given specification. Delay can be controlled by varying the transistor sizes in the circuit, at the cost of additional chip area. Delay is modeled as posynomial functions of the transistor sizes. This yields a geometric programming problem that is transformed into a convex problem, which guarantees that the exact solution is found.

The transistor sizing presented in [7] sizes the transistors of CMOS circuits to minimize power consumption while meeting the delay constraint, without changing the circuit structure or the number of logic gates. It uses linear programming, which is also convex, to compute the sizing.

Reference [3] presents a transistor sizing method in which the sizing problem is modeled and solved using Geometric Programming. Logic gates are modeled with the switch-level RC gate model. In this model, a logic gate is seen as a set of RC trees, one for each possible input vector. The gate delay is the maximum delay produced by its RC trees. Each RC tree is built by replacing every ON transistor with an equivalent resistance that connects its source and drain nodes. Both ON and OFF transistors contribute to the node capacitances, which consist of the source-to-substrate and drain-to-substrate capacitances. A load capacitance is connected to the output node of the gate. The delay is computed with the Elmore model [2], which yields posynomial functions and allows the problem to be formulated as a Geometric Program (GP).

B. Gate Sizing
The gate sizing problem has been addressed in many papers with different approaches. The most widely known is the logical effort method [8], which provides fast heuristics, or design guidelines, to solve the gate sizing problem approximately. Linear Programming is used in [9] and [10], and Nonlinear Programming in [11] and [12]. Joshi and Boyd [13] address scalability, i.e., the runtime needed to size large circuits. References [14] and [15] use analytical models of power, delay and area.

Traditional gate sizing methodologies [6], [16], [3], [4] use the Elmore delay model to express delay as posynomial functions, allowing gate sizing to be formulated as a Geometric Program (GP).

Reference [4] presents a gate sizing method in which a scale factor Xi is associated with each logic gate. These scale factors are the optimization variables of the GP. Gate delays are estimated as a linear function of the scale factors: the delay of each gate type is computed at size 1, and the delay of other gates of the same type is linear in this value. The circuit area is the sum of the areas of the gates that compose the circuit. Path delay is given by the RC product, and the circuit delay is the maximum delay over all paths of the circuit.

Zhou et al. [17] present an area-efficient physical synthesis flow. A larger area causes higher power consumption, since area is the first-order estimate of power. The main sources of area increase in a typical physical flow are buffer insertion and gate sizing. Therefore, they present buffer insertion and gate sizing techniques that increase area less than traditional flows aimed at minimum delay. In their gate sizing, gates are resized to improve slack with the smallest possible increase in area. Reducing the area increase improves average wire length, congestion and power consumption.

Aizik and Kolodny [18] size the gates of a path considering the initial delay of the path and the power consumption at that delay. The objective function of the minimization problem depends linearly on the leakage and dynamic capacitances. The optimization problem is solved with geometric programming using the GGPLAB solver. In addition, their experiments have shown that:
• minimum delay is costly in power;
• a fixed size-reduction factor for all logic gates of the circuit can lead to an energy-efficient design, although the optimal scale-down factor of the logic gates is not uniform;
• reducing the size of logic gates reduces dynamic and leakage energy dissipation, since both depend linearly on gate size.

To develop the gate and transistor sizing of this work, the ideas of works [3] and [4] were combined in order to compute delay more accurately.
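Section II.A describes the switch-level RC model of [3]: one RC tree per input vector, each evaluated with the Elmore delay, with the gate delay taken as the maximum over its trees. The following minimal Python sketch illustrates that computation. The helper names and the NAND2 values are hypothetical, normalized numbers chosen for illustration and are not taken from the paper. If each resistance has the form r/w and each capacitance the form c*w in the transistor widths w, every product term is a monomial, so the Elmore delay is a posynomial.

```python
def path_to_root(node, parent):
    """Nodes on the path from `node` up to, but excluding, the root."""
    path = []
    while parent[node] is not None:
        path.append(node)
        node = parent[node]
    return path


def elmore_delay(parent, res, cap, target):
    """Elmore delay from the root (driving supply) to `target` in an RC tree.

    parent[k]: parent node of k (None for the root)
    res[k]:    resistance of the edge parent[k] -> k (an ON transistor)
    cap[k]:    capacitance from node k to ground
    Delay = sum over nodes k of cap[k] times the resistance shared by
    the root->k and root->target paths.
    """
    target_path = set(path_to_root(target, parent))
    delay = 0.0
    for k in cap:
        shared = set(path_to_root(k, parent)) & target_path
        delay += cap[k] * sum(res[e] for e in shared)
    return delay


if __name__ == "__main__":
    # Hypothetical NAND2 in normalized units: R_nmos = 0.5, R_pmos = 1.0,
    # internal node capacitance 4.0, output capacitance 10.0 (load + drains).
    # Falling output (A = B = 1): GND -> internal node (1) -> output (2).
    fall = elmore_delay({0: None, 1: 0, 2: 1}, {1: 0.5, 2: 0.5}, {1: 4.0, 2: 10.0}, target=2)
    # Rising output, worst vector (bottom NMOS off, top NMOS on, one PMOS on):
    # Vdd -> output (2) -> internal node (1), so the internal node is also charged.
    rise = elmore_delay({0: None, 2: 0, 1: 2}, {2: 1.0, 1: 0.5}, {2: 10.0, 1: 4.0}, target=2)
    # Gate delay in the switch-level model: maximum over its RC trees.
    print(fall, rise, max(fall, rise))  # 12.0 14.0 14.0
```

The rising tree is the worst case here because the single ON PMOS must also charge the internal node through the top NMOS. This is the kind of input-vector dependence that the per-vector RC trees of [3] capture.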
III. POSYNOMIAL FUNCTIONS AND GEOMETRIC PROGRAMMING (GP)
The term Geometric Program was introduced in 1967 [1]. It defines a type of mathematical optimization problem in which the objective function must be a posynomial function.

To explain what a posynomial function is, we first define a monomial, shown in expression (1). The coefficient c can be any positive number, the exponents a_i can be any real numbers, and the variables satisfy x_i > 0.

$$f(x) = c\, x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n} \qquad (1)$$

A sum of one or more monomials, that is, a function of the form

$$f(x) = \sum_{k=1}^{K} c_k\, x_1^{a_{1k}} x_2^{a_{2k}} \cdots x_n^{a_{nk}} \qquad (2)$$

where c_k > 0, is called a posynomial function, or simply a posynomial, with K terms and variables x_1, ..., x_n. A monomial is also a posynomial. Dividing a posynomial by a monomial yields another posynomial. Likewise, multiplying several posynomials, or raising a posynomial to a positive integer power, also yields a posynomial.

A geometric program is an optimization problem of the form:

$$\begin{aligned} \text{minimize} \quad & f_0(x) \\ \text{subject to} \quad & f_i(x) \le 1, \quad i = 1, \dots, m, \\ & g_i(x) = 1, \quad i = 1, \dots, p \end{aligned} \qquad (3)$$

where the f_i are posynomial functions, the g_i are monomials, and the x_i are the optimization variables.

IV. PROBLEM FORMULATION
This work supports two different formulations: minimizing delay under an area constraint, and minimizing area under a delay constraint. Both formulations are presented below.

A. Delay Minimization for a Maximum Area (Power)
Equation (4) shows the formulation of the optimization problem that minimizes delay, where the values D are the delays of the circuit paths. Xmin and Xmax are the minimum and maximum gate sizes (in gate sizing) or the minimum and maximum allowed transistor widths (in transistor sizing). The maximum transistor width was set to the largest transistor width found in a commercial cell library for the same fabrication technology. C_in^max is the maximum acceptable input capacitance of the circuit, which avoids placing a high load on the circuit that drives it. Amax is the maximum circuit area.

$$\begin{aligned} \text{minimize} \quad & D = \max(D_1, \dots, D_n) \\ \text{subject to} \quad & X_{min} \le X_i \le X_{max} \\ & C_{in} \le C_{in}^{max} \\ & A \le A_{max} \end{aligned} \qquad (4)$$

Because max(·) is not itself a posynomial, this is a generalized geometric program. It is converted to the standard form (3) with auxiliary variables, as discussed in Section VI.

B. Minimizing Area (Power) Subject to a Maximum Delay
For this optimization, the roles are swapped: the area becomes the objective function and the delay becomes a constraint. The formulation is:

$$\begin{aligned} \text{minimize} \quad & A \\ \text{subject to} \quad & X_{min} \le X_i \le X_{max} \\ & C_{in} \le C_{in}^{max} \\ & D \le D_{max} \end{aligned} \qquad (5)$$

V. TOOL DEVELOPMENT
The gate sizing problem can be stated as choosing the scale factors so as to find either the minimum delay subject to limits on total area and other constraints, or the minimum area subject to a limit on the maximum delay.

The gate and transistor sizing tool using GP was developed as follows:
1) Logic gates are modeled with the switch-level RC model [3].
2) In transistor sizing, each transistor gets one variable that represents its width. In gate sizing, each logic gate gets one scale factor that multiplies the widths of the transistors in that gate. These are the optimization variables of the problem, and they affect the area, the power consumption and the speed of the circuit.
3) The capacitance and resistance values are obtained by SPICE simulation of PMOS and NMOS transistors. Transistor capacitances are proportional to the transistor width, and the on-resistance is approximately inversely proportional to the transistor width.
4) The delay is computed with the Elmore model, as mentioned above. The circuit delay is the maximum delay over all paths of the circuit.
5) The area is the sum of the widths W_i of all transistors i in the circuit, where n is the number of transistors in the circuit:
$$A_{total} = \sum_{i=1}^{n} W_i \qquad (6)$$
6) The switching power is computed from all circuit capacitances: the load capacitance (C_load) and the input capacitances (C_in). Vdd is the supply voltage. α is the switching probability, which was set to 20% in our tests. f is the clock frequency, which we set to 500 MHz in our tests.
$$P = \alpha \left( C_{load} + \sum_{i} C_{in_i} \right) V_{dd}^{2} f \qquad (7)$$

VI. TRANSISTOR SIZING versus GATE SIZING
This section compares transistor sizing and gate sizing to analyze the trade-off between runtime and the minimum delay achieved.

A set of ISCAS85 benchmark circuits was mapped with Cadence's RTL Compiler [19] to a 45nm library, using only dual CMOS cells. These circuits were fed into our sizing tool, which computed their area, delay and power. The area of each circuit built from standard cells is used as the area constraint when sizing that circuit in our GP-based tool. From this description, the logic gates and transistors of the mapped circuits are sized with formulation (4), targeting minimum delay, i.e., maximum speed. GGPLAB [20] was used as the Geometric Programming solver.

Table I compares our sizing tool based on Geometric Programming (GP) with the sizing available in a typical standard cell (SC) library. It reports the delay, area and power values and their reductions (R). Delay is computed with the Elmore delay model, and area and power are computed with equations (6) and (7), respectively.

The circuits sized with our methodology achieved an average delay reduction of 21%, while keeping practically the same area and power as the sizing provided by the standard cell library.

Next, transistor sizing was performed on the same circuit description. Table II shows the values obtained with gate sizing (DP), which are the same values presented in Table I, and with transistor sizing (DT). The reductions (R) in power, delay and area are shown as percentages. Positive values indicate that transistor sizing (DT) has better delay, area or power than gate sizing (DP).

Table III presents the total number of variables used to solve gate sizing (DP) and transistor sizing (DT), and the percentage difference (Diff.) between these numbers. Auxiliary variables are needed to transform the geometric program from the generalized form into the standard form. For a given circuit, gate sizing and transistor sizing use the same number of auxiliary variables. Therefore, the total number of variables for gate sizing (DP) is the number of gates plus the number of auxiliary variables. The total number of variables for transistor sizing (DT) is the number of transistors in the circuit plus the number of auxiliary variables.

Gate sizing reduced the delay by 21% relative to the standard cell library. Transistor sizing reduces the delay by a further 40.4% on average relative to gate sizing. Power consumption and area remained almost the same as with gate sizing. The only drawback of transistor sizing is the number of variables, which is more than double that of gate sizing (102.3% larger on average), as shown in Table III. This leads to a considerably longer runtime, since geometric programming solvers scale cubically [21].
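The variable totals in Table III follow directly from the rule described above: shared auxiliary variables, plus one variable per gate (DP) or per transistor (DT). The following Python sketch recomputes them from the per-circuit counts in the table. The function name is hypothetical. The runtime ratio assumes, as stated with [21], that solver cost grows roughly cubically with the number of variables, so it is only a rough indication and not a measured speed-up.

```python
# Per-circuit counts copied from Table III: (gates, transistors, auxiliary variables).
TABLE_III = {
    "C432": (184, 666, 344),
    "C499": (403, 1608, 755),
    "C1908": (259, 1008, 455),
    "C880": (232, 900, 399),
    "apex1": (1728, 6842, 3351),
    "apex3": (1939, 7476, 3771),
    "apex5": (1942, 8244, 3663),
}


def variable_counts(gates: int, transistors: int, aux: int) -> tuple[int, int, float]:
    """Total GP variables for gate sizing (DP) and transistor sizing (DT).

    Both formulations share the same auxiliary variables; DP adds one scale
    factor per gate and DT adds one width per transistor.
    """
    dp = gates + aux
    dt = transistors + aux
    diff_pct = 100.0 * (dt - dp) / dp
    return dp, dt, diff_pct


if __name__ == "__main__":
    diffs = []
    for name, counts in TABLE_III.items():
        dp, dt, diff = variable_counts(*counts)
        diffs.append(diff)
        # Rough runtime ratio, assuming solver cost grows cubically with variable count.
        cubic = (dt / dp) ** 3
        print(f"{name:6s} DP={dp:6d} DT={dt:6d} diff={diff:6.1f}%  cubic ratio ~{cubic:4.1f}x")
    print(f"mean of per-circuit differences: {sum(diffs) / len(diffs):.1f}%")
```

The mean printed by the sketch (102.3%) matches the mean row of Table III. That value appears to be the average of the per-circuit differences, not the difference between the mean totals: 2775 versus 5640 would give about 103.2%. With roughly twice as many variables, a cubic cost model suggests a runtime about 7 to 9 times longer for transistor sizing.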
VII. DELAY MINIMIZATION FOLLOWED BY AREA MINIMIZATION
Tennakoon and Sechen [22] pointed out that, once the minimum delay is reached, a second optimization can be run to minimize area using the minimum delay as a constraint. This reduces the area further, since delay minimization alone does not try to reduce area. This approach is used in this work, so two optimization problems are solved for each test circuit. First, the delay is minimized. Then area minimization is performed, which reduces the area further without any delay penalty.

The logic gates of the circuit are sized with formulation (4) and then with formulation (5). Table I shows the values obtained by delay minimization, where the delay is computed with the Elmore model.

Table IV shows the values for gate sizing that minimizes area under a delay constraint. The delay constraint is the minimum delay found in delay minimization. The table presents the area and power values obtained by delay minimization and by area minimization, together with the area and power reductions achieved by area minimization.

Area minimization yielded average reductions of 28.2% in area and 27.3% in power, at the delay value given by delay minimization. This improvement is possible when several optimal points minimize the delay for a given area constraint. In that case, the area minimization problem can find the minimum area at the minimum circuit delay.
E T RABALHOS F UTUROS
VIII. C ONCLUS OES
Neste trabalho, o problema de dimensionamento de portas e
de transistores usando Programaca o Geometrica foi resolvido
83
ISSN 977-2177-128009

TABELA I
ENTRE O
R ESULTADOS DA COMPARAC AO
DIMENSIONAMENTO BASEADO NOS TAMANHOS ENCONTRADOS EM UMA BIBLIOTECA STANDARD CELL
O DIMENSIONAMENTO DE PORTAS USANDO
C432
C499
C1908
C880
apex1
apex2
apex3
apex5
Media
G EOM E TRICA (PG)

P ROGRAMAC AO
Potencia (W )
Dimens.
Dimens.
SC
PG
22,2
22,4
58,3
58,4
33,6
33,7
31,4
31,1
239,8
239,5
527,1
523,6
254,3
251,9
264,6
258,3
178,9
177,3
R
(%)
-0,9
-0,2
-0,3
1,1
0,1
0,7
0,9
2,4
0,5
Atraso (ps)
Dimens.
Dimens.
SC
PG
718
666
750
651
472
425
451
330
673
504
863
650
687
507
662
431
660
521
PARA
(SC)
45nm MINIMIZANDO ATRASO SUJEITO A` AREA
Area
(m2 )
Dimens.
Dimens.
SC
PG
210,4
210,4
536,4
536,4
304,3
304,3
281,0
277,4
2304
2296
5180
5145
2441
2413
2512
2446
1721
1704
R
(%)
7,3
13,1
10,0
26,8
25,2
24,7
26,3
34,9
21,0
R
(%)
0,0
0,0
0,0
1,3
0,4
0,7
1,2
2,6
0,8
TABELA II
ENTRE O D IMENSIONAMENTO DE P ORTAS L OGICAS
R ESULTADOS DE COMPARAC AO
(DP) E O D IMENSIONAMENTO DE T RANSISTORES (DT)
DE AREA
NESTE TRABALHO PARA 45nm MINIMIZANDO O ATRASO SUJEITO A UMA RESTRIC

AO
C432
C499
C1908
C880
apex1
apex3
apex5
Media
Potencia (W )
DP
DT
R (%)
22,4
21,8
2,7
58,4
56,2
3,8
33,7
32,3
4,7
31,1
30,2
3,9
239,5 231,3
4,9
251,9 245,1
4,8
258,3 255,7
1,0
127,9 124,7
2,9
DP
666,1
651,5
425,0
330,2
503,7
506,5
431,0
502
Atraso (ps)
DT
R (%)
400,6
39,9
421,6
35,3
253,1
40,4
188,3
43,0
294,2
41,6
293,9
42,0
255,9
40,6
301,1
40,4
DP
210,4
536,4
304,3
277,4
2296
2413
2446
1212
PROPOSTO
Area
(m2 )
DT
R (%)
210,4
0,0
536,4
0,0
304,3
0,0
277,4
-1,3
2296
-0,35
2441
-1,2
2512
-2,7
1227
-0,8
TABELA III
N UMERO
TOTAL DE VARI AVEIS
USADAS PARA RESOLVER
D IMENSIONAMENTO DE P ORTAS (DP)
DIFERENC
A (D IF.) ENTRE ESSES N UMEROS
C432
C499
C1908
C880
apex1
apex3
apex5
Media
#
Portas
184
403
259
232
1728
1939
1942
955
#
Transistores
666
1608
1008
900
6842
7476
8244
3821
# Variaveis
Auxiliares
344
755
455
399
3351
3771
3663
1820
EO
D IMENSIONAMENTO DE T RANSISTORES (DT) E A
# Total de variables
DP
DT
Diff. (%)
528
1010
91,3
1158
2363
104,0
714
1463
104,9
631
1299
105,9
5079
10193
100,7
5710
11247
97,0
5605
11907
112,4
2775
5640
102,3
TABELA IV
R ESULTADOS PARA O DIMENSIONAMENTO DE PORTAS MINIMIZANDO AREA

SUJEITO AO ATRASO (M IN . AREA
)
DE ATRASO (M IN . ATRASO ) PARA 45nm
MINIMIZAC
AO
C432
C499
C1908
C880
apex1
apex2
apex3
apex5
Media
Potencia (W )
Min.
Min.
Reduca o
Atraso
Area
(%)
22,4
22,4
0,00
58,4
58,4
0,00
33,7
33,7
0,00
31,1
20,2
34,9
239,5
137,3
42,7
523,6
270,3
48,4
251,9
135,8
46,1
258,3
138,5
46,4
177,3
102,1
27,3
84
Min.
Atraso
210,4
536,4
304,3
277,4
2295,6
5144,8
2413
2446,4
1704
COMPARADO AOS VALORES DE
Area
(m2 )
Min.
Reduca o
Area
(%)
210,4
0,00
536,4
0,00
304,3
0,00
171
38,4
1293,5
43,7
2647,3
48,5
1274,1
47,2
1269
48,1
963,3
28,2
ISSN 977-2177-128009
combinando os trabalhos [3] e [4]. O problema de dimensionamento de portas e de transistores e formulado de duas
maneiras:
1) Objetivando minimizar o atraso do circuito para um
valor maximo de a rea, ou
2) Minimizando a a rea sob uma restrica o de atraso. As
potencias dinamica e de fuga dependem linearmente da
a rea. Por isso, minimizando a a rea se esta diretamente
minimizando o consumo de potencia.
Foi realizada uma comparaca o entre o dimensionamento
de portas e o dimensionamento de transistores desenvolvidos
nesse trabalho que utilizam PG. Para os testes, foram utilizados
os circuitos sintetizados pela ferramenta RLT Compiler da
Cadence para uma biblioteca de celulas em 45nm. O dimensionamento de portas para a minimizaca o de atraso obteve
uma reduca o em 21% no atraso, em media, para os mesmos
valores de a rea e potencia do dimensionamento fornecido pelas
standard cells. O dimensionamento de transistores reduziu o
atraso em 40,4%, em media, comparado com os resultados
do dimensionamento de portas. Embora o dimensionamento
de transistores tenha alcancado melhores resultados comparado ao dimensionamento de portas, ele possui um tempo
de execuca o muito maior pois o numero de variaveis de
otimizaca o e cerca de 2 vezes maior, envolvendo, dessa forma,
uma relaca o de custo-benefcio entre a reduca o no atraso e o
tempo de execuca o.
In a second analysis, the circuits were sized to minimize area, with the delay constrained to the value obtained in the delay minimization. This second optimization yields average reductions of 28.2% in area and 27.3% in power, relative to the values given by the delay minimization. Applying the two optimization approaches one after the other therefore yields the minimum delay for the circuit and the minimum area that achieves it.
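The averages quoted in this paragraph, and in the last row of the table above, appear to be the arithmetic mean of the per-circuit reductions rather than the reduction of the averaged values. A short check using the table's numbers reproduces 27.3% and 28.2% that way. The names of the first three circuits do not appear in this part of the table, so "c1" to "c3" below are placeholders.

```python
# Per-circuit values from the table: power (µW) at minimum delay and at
# minimum area, then area (µm²) at minimum delay and at minimum area.
ROWS = {
    "c1":    (22.4, 22.4, 210.4, 210.4),   # placeholder name
    "c2":    (58.4, 58.4, 536.4, 536.4),   # placeholder name
    "c3":    (33.7, 33.7, 304.3, 304.3),   # placeholder name
    "C880":  (31.1, 20.2, 277.4, 171.0),
    "apex1": (239.5, 137.3, 2295.6, 1293.5),
    "apex2": (523.6, 270.3, 5144.8, 2647.3),
    "apex3": (251.9, 135.8, 2413.0, 1274.1),
    "apex5": (258.3, 138.5, 2446.4, 1269.0),
}

def reduction(before, after):
    """Percentage reduction from `before` to `after`."""
    return 100.0 * (1.0 - after / before)

def mean(values):
    return sum(values) / len(values)

# Mean of the per-circuit reductions
power_red = mean([reduction(p0, p1) for p0, p1, _, _ in ROWS.values()])
area_red = mean([reduction(a0, a1) for _, _, a0, a1 in ROWS.values()])

# Reduction of the averaged values: a different, larger figure
power_red_of_mean = reduction(mean([r[0] for r in ROWS.values()]),
                              mean([r[1] for r in ROWS.values()]))
area_red_of_mean = reduction(mean([r[2] for r in ROWS.values()]),
                             mean([r[3] for r in ROWS.values()]))

print(round(power_red, 1), round(area_red, 1))  # 27.3 28.2
print(round(power_red_of_mean, 1), round(area_red_of_mean, 1))
```

The circuits whose sizes do not change (0% reduction) pull the per-circuit mean well below the reduction of the totals. This is worth keeping in mind when comparing with results reported in the literature.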
With an automatic cell-generation tool, such as the one presented in [23], cells can be generated at the desired sizes. This allows the designer to exploit the better delay, area and power results, which are critical in recent technologies.
As future work, we intend to extend these analyses to compare our method with other methods in the literature. Another task is to include rise and fall times and wire capacitances in the model. We also intend to run experiments that analyze delay with a more accurate tool, since the Elmore delay model gives only approximate values.
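The Elmore model mentioned above estimates the delay to a node as the first moment of an RC network's impulse response. For an RC tree, it reduces to a sum over the branches on the path to that node: each branch resistance is multiplied by all the capacitance downstream of it. The minimal sketch below computes this for every node. Encoding the tree as a parent array is an assumption of this example, not the paper's data structure.

```python
def elmore_delays(parent, r, c):
    """Elmore delay at every node of an RC tree.

    parent[i] -- index of the parent of node i, or -1 if node i hangs off the driver
    r[i]      -- resistance of the branch from parent[i] to node i (ohms)
    c[i]      -- grounded capacitance at node i (farads)
    Nodes must be numbered so that parent[i] < i.
    """
    n = len(parent)
    downstream = list(c)
    # Bottom-up pass: accumulate the total capacitance below each node.
    for i in range(n - 1, -1, -1):
        if parent[i] >= 0:
            downstream[parent[i]] += downstream[i]
    # Top-down pass: tau_i = tau_parent + R_i * C_downstream(i).
    tau = [0.0] * n
    for i in range(n):
        upstream = tau[parent[i]] if parent[i] >= 0 else 0.0
        tau[i] = upstream + r[i] * downstream[i]
    return tau

# Three-segment RC ladder, 1 kOhm and 1 fF per segment (hypothetical values):
# approximately 3, 5 and 6 ps at the three nodes.
print(elmore_delays([-1, 0, 1], [1e3] * 3, [1e-15] * 3))
```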
ACKNOWLEDGMENTS
This work is partially supported by the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Brazil, and by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).
REFERENCES
[1] R. J. Duffin, E. L. Peterson, and C. Zener, Geometric Programming: Theory and Application. John Wiley & Sons, 1967.
[2] W. Elmore, "The transient analysis of damped linear networks with particular regard to wideband amplifiers," J. Applied Physics, vol. 19, 1948.
[3] S. Boyd, S.-J. Kim, D. D. Patil, and M. A. Horowitz, "Digital circuit optimization via geometric programming," Operations Research, vol. 53, no. 6, pp. 899-932, Nov.-Dec. 2005.
[4] S. Boyd, S.-J. Kim, L. Vandenberghe, and A. Hassibi, "A tutorial on geometric programming," Springer Science+Business Media, LLC, pp. 67-127, 2007.
[5] J. P. Fishburn and A. E. Dunlop, "TILOS: A posynomial programming approach to transistor sizing," in Int. Conference on Computer Aided Design, Las Vegas, Nevada, USA, 1985, pp. 326-328.
[6] S. S. Sapatnekar, V. B. Rao, P. M. Vaidya, and S.-M. Kang, "An exact solution to the transistor sizing problem for CMOS circuits using convex optimization," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 12, no. 11, pp. 1621-1634, 1993.
[7] M. Borah, R. Owens, and M. Irwin, "Transistor sizing for low power CMOS circuits," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 15, no. 6, pp. 665-671, 1996.
[8] I. Sutherland, B. Sproull, and D. Harris, Logical Effort: Designing Fast CMOS Circuits. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1999.
[9] M. R. C. M. Berkelaar and J. A. G. Jess, "Gate sizing in MOS digital circuits with linear programming," in EDAC'90: Conference on European Design Automation, Glasgow, Scotland, 1990, pp. 217-221.
[10] K. Bhattacharya and N. Ranganathan, "A linear programming formulation for security-aware gate sizing," in 18th ACM Great Lakes Symposium on VLSI, Orlando, Florida, USA, 2008, pp. 273-278.
[11] S. S. Sapatnekar and W. Chuang, "Power-delay optimization in gate sizing," ACM Transactions on Design Automation of Electronic Systems (TODAES), vol. 5, no. 1, pp. 98-114, 2000.
[12] V. Mahalingam and N. Ranganathan, "A nonlinear programming based power optimization methodology for gate sizing and voltage selection," in ISVLSI 2005: IEEE Computer Society Annual Symposium on VLSI, Tampa, Florida, USA, 2005, pp. 180-185.
[13] S. Joshi and S. Boyd, "An efficient method for large-scale gate sizing," IEEE Transactions on Circuits and Systems, vol. 55, no. 9, pp. 2760-2773, Oct. 2008.
[14] B. Hoppe, G. Neuendorf, D. Schmitt-Landsiedel, and W. Specks, "Optimization of high-speed CMOS logic circuits with analytical models for signal delay, chip area and dynamic power dissipation," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 9, no. 3, pp. 236-246, 1990.
[15] M. Borah, R. M. Owens, and M. J. Irwin, "Transistor sizing for minimizing power consumption of CMOS circuits under delay constraint," in 1995 International Symposium on Low Power Design, Dana Point, California, USA, 1995, pp. 167-172.
[16] J. Singh, V. Nookala, Z.-Q. Luo, and S. Sapatnekar, "Robust gate sizing by geometric programming," in 42nd IEEE/ACM Design Automation Conference (DAC), Anaheim, California, USA, 2005, pp. 315-320.
[17] Y. Zhou, C. J. Alpert, Z. Li, C. Sze, and L. H. Trevillyan, "Shedding physical synthesis area bloat," VLSI Design, Special Issue on CAD for Gigascale SoC Design and Verification Solutions, vol. 2011, 2011.
[18] Y. Aizik and A. Kolodny, "Finding the energy efficient curve: Gate sizing for minimum power under delay constraints," VLSI Design, vol. 2011, 2011.
[19] Cadence, "RTL Compiler," 2009. Available: http://www.cadence.com. Cited Aug. 2009.
[20] "GGPLAB: a MATLAB toolbox for GP," 2010. Available: http://www.stanford.edu/~boyd/ggplab/. Cited May 2010.
[21] M. K. K. Kasamsetty and S. S. Sapatnekar, "A new class of convex functions for delay modeling and their application to the transistor sizing problem," IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems, vol. 19, no. 7, pp. 779-788, 2000.
[22] H. Tennakoon and C. Sechen, "Efficient and accurate gate sizing with piecewise convex delay models," in 42nd Annual ACM/IEEE Design Automation Conference, Anaheim, California, USA, 2005, pp. 807-812.
[23] A. Ziesemer, C. Lazzari, and R. Reis, "Transistor level automatic layout generator for non-complementary CMOS cells," in IFIP International Conference on Very Large Scale Integration and System-on-Chip, VLSI-SoC 2007, Atlanta, GA, USA, Oct. 2007, pp. 116-121.
Simulation of SET Faults in a Voltage-Controlled Oscillator
Walter E. Calienes Bartra, Fernanda L. Kastensmidt and Ricardo Reis
PGMicro / PPGC - Universidade Federal do Rio Grande do Sul
Porto Alegre - RS - Brazil
Email: {wecbartra, fglima, reis}@inf.ufrgs.br
Abstract: A fault is defined as a defect that can turn into an error. In integrated circuits (ICs), faults can be permanent, transient or intermittent. Transient faults last a very short time, but they can cause unexpected changes at a circuit's output or even a permanent fault. One type of transient fault is the Single-Event Transient (SET), which occurs in combinational logic and in analog circuits. Studying how a circuit behaves under faults is important for choosing protection techniques and for measuring its susceptibility to the injected fault. Simulation is now an important step in IC design. Predicting how an IC behaves under faults is essential to ensure that it has been implemented correctly, and many problems can be detected and corrected during simulation. This paper presents a toolkit for simulating the effects of a SET on a 250 nm CMOS Voltage-Controlled Oscillator (VCO), using the LabVIEW platform from National Instruments. The simulation results were compared with the experimental results obtained by W. Chen et al. in 2003.
I. INTRODUCTION
The effects of radiation on electronic devices became known after the Telstar satellite was affected by a high-altitude nuclear explosion in 1962. As technology advances and transistors shrink, ICs become more susceptible to radiation effects. Radiation can come from space (solar prominences, the Van Allen belts, solar wind or cosmic rays) [1], or from radioactive or electromagnetic sources on Earth. A circuit exposed to a radiation source may have its output values altered, which changes its characteristics, or it may be permanently disabled, depending on the amount of radiation it received. These are the main reasons why it is important to predict how ICs behave under radiation-induced faults. Transient faults are one of the many types of radiation-induced faults, and they last a short time. They range from low-magnitude noise to pulses large enough to damage an electronic device permanently. They can also have cumulative effects that make the equipment fail gradually until it is completely lost. Among these faults are SETs, which are caused by particles striking reverse-biased PN junctions. The strike causes an abnormal flow of charge, modeled by the following equations:
$$I_p(t) = I_0\left(e^{-t/\tau_F} - e^{-t/\tau_R}\right) \qquad (1a)$$
$$Q_p = \int_0^{\infty} I_p(t)\,dt \qquad (1b)$$
$$I_0 = \frac{Q_p}{\tau_F - \tau_R} \qquad (1c)$$
In these equations:
- I_p(t) is the SET transient current;
- I_0 is the current generated by the charges;
- τ_R is the time constant for establishing the ion track;
- τ_F is the charge-collection time constant of the junction;
- Q_p is the total charge collected by the transient pulse.
Figure 1 shows I_p(t) for I_0 = 350 µA, τ_R = 10 ps and τ_F = 100 ps.
Figure 1. Simulated transient current. The total equivalent charge is approximately 31.5 fC.
The term I_0 depends on the secant of the impact angle and on the average carrier mobility, and τ_F is inversely proportional to the electric-field gradient in the depletion region of the junction [2].
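Integrating (1a) gives (1c) directly. The two parameter sets quoted in this paper, for Figure 1 and for the laser pulse of Section III, can be checked against it:

$$
\begin{aligned}
Q_p &= \int_0^{\infty} I_0\left(e^{-t/\tau_F} - e^{-t/\tau_R}\right)dt = I_0(\tau_F - \tau_R)
\;\Longrightarrow\; I_0 = \frac{Q_p}{\tau_F - \tau_R},\\
\text{Fig. 1: } Q_p &= 350\,\mu\mathrm{A} \times (100 - 10)\,\mathrm{ps} = 31.5\,\mathrm{fC},\\
\text{Sec. III: } Q_p &= 25\,\mathrm{mA} \times (1.5\,\mathrm{ns} - 15\,\mathrm{ps}) = 25\,\mathrm{mA} \times 1.485\,\mathrm{ns} = 37.125\,\mathrm{pC}.
\end{aligned}
$$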
Analog circuits are also affected by SETs, although a visible effect requires injecting a long, high-charge pulse. The VCO is an important circuit in communications and radio-frequency systems: it produces an oscillating signal whose frequency is controlled by a variable voltage. Its operating principle is the Barkhausen criterion. It uses a pair of transistors in a cross-coupled configuration that generates a negative resistance, which keeps the tank circuits oscillating [3]. One of the most sensitive parts of this circuit is the cross-coupled pair, whose transistors are identical and have transconductance g_m. If a particle strikes one of them and causes a transient current pulse, its g_m changes and the Barkhausen criterion is no longer met, so the VCO does not oscillate properly. This phenomenon is reproduced experimentally with a laser that emulates the impact of a heavy ion. The laser causes a transient of more than 30 pC, which is enough to produce a SET in this circuit [4].
LabVIEW is a graphical programming environment used in industry as an interface between users and instruments. It is a parallel language that is easy to learn and debug, which makes it well suited to engineers and scientists [5], [6]. Programming in LabVIEW resembles building electric circuits, which makes the data flow of a program easy to follow. LabVIEW programs are called Virtual Instruments (VIs). Each VI has a user interface, the Front Panel (FP), and a Block Diagram (BD) in which the application is programmed.
The goal of this work is to build a toolkit in LabVIEW 8.20 that can simulate SETs caused by ions or laser pulses on a 250 nm CMOS VCO. The results are compared with the laboratory experiments and simulations reported in [4]. No LabVIEW toolkit that simulates SET faults in analog circuits exists in the academic or industrial literature [7].
The rest of the paper is organized as follows. Section II describes the developed toolkit, Section III presents the simulation results, Section IV gives the conclusions, and Section V presents future work.
Figure 2. SET.vi.
B. MOS Transistor Simulation
The MOS transistor simulation is based on the square-law (quadratic) model. In this model, the drain current I_D is given by a different equation in each operating region of the MOS transistor [3], [8]:
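$$I_{Dsth} = I_M\,\frac{W}{L}\,e^{(V_{GS}-V_{th})/(n\phi_T)}\left(1 - e^{-V_{DS}/\phi_T}\right) \qquad (2a)$$
$$I_{Dres} = \frac{\mu_n\varepsilon_{ox}}{t_{ox}}\,\frac{W}{L}\left[(V_{GS}-V_{th})V_{DS} - \frac{V_{DS}^2}{2}\right] \qquad (2b)$$
$$I_{Dsat} = \frac{\mu_n\varepsilon_{ox}}{2t_{ox}}\,\frac{W}{L}\,(V_{GS}-V_{th})^2\,(1 + \lambda V_{DS}) \qquad (2c)$$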
In these equations:
- I_Dsth is the subthreshold current;
- I_Dres is the resistive (linear-region) current;
- I_Dsat is the saturation current;
- I_M is the specific subthreshold current;
- n is the subthreshold slope factor;
- φ_T = k_B T/q is the thermal voltage (T = 300 K);
- µ_n is the carrier mobility [9];
- W and L are the channel width and length;
- ε_ox is the permittivity of silicon oxide;
- t_ox is the oxide thickness;
- λ is the channel-length modulation coefficient;
- V_th is the threshold voltage;
- V_GS is the gate-source voltage and V_DS the drain-source voltage.
The threshold voltage V_th is given by [8]:
$$V_{th} = V_{th0} + \gamma\left(\sqrt{\left|-2\phi_F + V_{SB}\right|} - \sqrt{\left|-2\phi_F\right|}\right) \qquad (3)$$
where γ is the body-effect coefficient, φ_F the Fermi potential and V_SB the source-bulk voltage (Table I).
II. DEVELOPED TOOLS
The VIs developed for this work simulate complete circuit stages made of transistors, capacitors, varicap diodes, etc., which simplifies programming in LabVIEW. Simulations made with these tools can be regarded as logic-gate-level simulations [1], [7].
The basic MOS transistor modeling VI, Transistor MOS PtByPt.vi, is shown in Figure 3, and it is based on (2) and (3). The inputs Vdrain, Vgate, Vbulk and Vsource are the drain, gate, bulk and source voltages, respectively. The Transistor Type input indicates whether the transistor is NMOS or PMOS. Configuring this VI requires two sets of parameters, geometric (Geom Parameters) and technological (Tech Parameters), which are represented as cluster inputs of the VI [5]. Table I lists the components of the Tech Parameters cluster for the 250 nm technology [8], in the exact order in which they must be declared in the VI. The geometric parameters are detailed in [7].
The voltages V_th, V_DS, V_GS, the saturation voltage V_Dsat and the current I_D appear at the VI outputs Vth, Vds, Vgs, Vdsat and Id, respectively. The Req output is the equivalent resistance R_eq. R_eq depends on the current I_DSAT = I_Dres(V_DS = V_Dsat) when the transistor is on [8]:
A. SET Generator
The VI SET.vi is based on (1); Figure 2 shows the VI block that represents it. The data inputs Io (A), tF (s) and tR (s) are I_0, τ_F and τ_R, respectively. The Eval. Time (s) input is the maximum SET generation time. The Shifting input shifts the SET in time. Suppose the output array Ip(t) (A) has N = T_s/Δt elements and the SET must be placed at time index l_t. Then l_t ∈ [0, N - 1], where T_s is the value of the Eval. Time (s) input and Δt is the value of the dt (s) input; T_s ≥ Δt must hold. The Charge (C) output gives the equivalent charge of this current transient. The FP of this VI is shown in Figure 1.
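The following sketch is a Python analogue of the behavior described for SET.vi. It samples (1a) on a grid of step Δt over the window T_s, delays the pulse by `shift` samples, and estimates the collected charge (1b) with the trapezoidal rule. The function names and the choice of trapezoidal integration are assumptions of this example, not the VI's implementation.

```python
import math

def set_pulse(i0, tau_f, tau_r, t_eval, dt, shift=0):
    """Sampled double-exponential SET current of eq. (1a).

    Arguments mirror the SET.vi inputs described in the text (Io, tF, tR,
    Eval. Time, dt, Shifting); the function itself is hypothetical.
    Returns N = round(t_eval / dt) samples, zero before index `shift`.
    """
    if t_eval < dt:
        raise ValueError("Eval. Time must be at least dt")
    n = round(t_eval / dt)
    if not 0 <= shift <= n - 1:
        raise ValueError("shift must lie in [0, N - 1]")
    pulse = []
    for k in range(n):
        t = (k - shift) * dt
        pulse.append(i0 * (math.exp(-t / tau_f) - math.exp(-t / tau_r)) if t >= 0 else 0.0)
    return pulse

def collected_charge(pulse, dt):
    """Trapezoidal estimate of eq. (1b) over the sampled window."""
    return dt * (sum(pulse) - 0.5 * (pulse[0] + pulse[-1]))

# Figure 1 parameters: the result should be close to I0 * (tF - tR) = 31.5 fC
fig1 = set_pulse(350e-6, 100e-12, 10e-12, t_eval=2e-9, dt=0.1e-12)
print(collected_charge(fig1, 0.1e-12))

# Section III laser-pulse parameters, shifted by 5 ns: close to 37.125 pC
laser = set_pulse(25e-3, 1.5e-9, 15e-12, t_eval=30e-9, dt=1e-12, shift=5000)
print(collected_charge(laser, 1e-12))
```

The window should span many multiples of τ_F after the shift, so that the truncated tail of the exponential does not bias the charge estimate.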
$$R_{eq} = \frac{3}{4}\,\frac{V_{DS}}{I_{DSAT}}\left(1 - \frac{5}{6}\lambda V_{DS}\right) \qquad (4)$$
(a) Virtual Instrument
Figure 3. Transistor MOS PtByPt.vi.
The gm output is the transconductance g_m of the MOS transistor, and the rds output is the drain-source resistance r_ds. They are defined as follows [3]:
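$$g_m = \frac{\partial I_D}{\partial V_{GS}} = \frac{\mu_n\varepsilon_{ox}}{t_{ox}}\,\frac{W}{L}\,V_{GT}\,(1 + \lambda V_{DS}) \qquad (5)$$
$$r_{ds} = \left(\frac{\partial I_D}{\partial V_{DS}}\right)^{-1} = \left(\frac{\mu_n\varepsilon_{ox}}{t_{ox}}\,\frac{W}{2L}\,\lambda V_{GT}^2\right)^{-1} \qquad (6)$$
where V_GT = V_GS - V_th; both expressions follow from differentiating (2c).
(b) Schematic
Figure 4.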
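The sketch below evaluates the square-law expressions (2b), (2c), (5) and (6) for an NMOS device, using the Table I values of V_th0 and λ. It is not the Transistor MOS PtByPt.vi implementation. The process transconductance k' = µ_n ε_ox/t_ox is an assumed value, because Table I does not list µ_n, ε_ox or t_ox. The body effect is ignored (V_SB = 0, so (3) gives V_th = V_th0), and the subthreshold branch (2a) is omitted.

```python
# NMOS values from Table I (250 nm); V_SB = 0 is assumed, so Vth = Vth0.
VTH0 = 0.43       # V
LAMBDA = 0.06     # 1/V, channel-length modulation
# ASSUMED process transconductance k' = mu_n * eps_ox / t_ox (not in Table I)
K_PRIME = 115e-6  # A/V^2

def drain_current(vgs, vds, w_over_l):
    """Square-law drain current: eq. (2b) for V_DS < V_GT, eq. (2c) otherwise.

    Cut-off returns 0; the subthreshold branch (2a) is left out of this sketch.
    """
    vgt = vgs - VTH0
    if vgt <= 0.0:
        return 0.0
    if vds < vgt:  # resistive region, eq. (2b)
        return K_PRIME * w_over_l * (vgt * vds - vds ** 2 / 2.0)
    return 0.5 * K_PRIME * w_over_l * vgt ** 2 * (1.0 + LAMBDA * vds)  # eq. (2c)

def gm(vgs, vds, w_over_l):
    """Saturation transconductance, eq. (5)."""
    return K_PRIME * w_over_l * (vgs - VTH0) * (1.0 + LAMBDA * vds)

def rds(vgs, w_over_l):
    """Saturation output resistance, eq. (6)."""
    return 1.0 / (0.5 * K_PRIME * w_over_l * LAMBDA * (vgs - VTH0) ** 2)

if __name__ == "__main__":
    wl = 1.5  # hypothetical W/L
    print(drain_current(1.5, 2.0, wl), gm(1.5, 2.0, wl), rds(1.5, wl))
```

A quick consistency check between (2c) and (5)-(6): in saturation, a central-difference derivative of `drain_current` reproduces `gm` and `1 / rds`.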
TABLE I
TECHNOLOGY PARAMETERS FOR 250 nm MOS TRANSISTORS [8].
Parameter                             Symbol    PMOS          NMOS
Threshold voltage at V_SB = 0 (V)     V_th0     -0.40         0.43
Body effect (V^0.5)                   γ         0.40          0.40
Fermi potential (V)                   φ_F       -0.30         0.30
Donor density (cm^-3)                 N_D       1 × 10^16     1 × 10^16
Acceptor density (cm^-3)              N_A       1 × 10^14     1 × 10^14
Channel-length modulation (V^-1)      λ         -0.10         0.06
Saturation velocity (m/s)             v_sat     100000        100000
Intrinsic carrier density in Si (cm^-3) n_i     1.5 × 10^10   1.5 × 10^10
Subthreshold slope factor             n         1.49          1.49

$$C = 1123.75 - 623.75\,V_{tune} \quad (\mathrm{pF}) \qquad (7)$$
where V_tune ∈ [1, 1.8] V is the tank tuning voltage. If V_tune lies outside this interval, it is clamped to the nearer limit.
C. Current Mirror
Figure 5.
The NMOS Current Mirror PtByPt.vi tool simulates a current mirror built with NMOS transistors and a PMOS pull-up. Its VI and schematic are shown in Figure 4. Setting the current Iout requires the current Iset, the voltage Vout and a suitable choice of transistor sizes. The Vref output gives the gate-source voltage of the mirror transistors, and the Req output is the r_ds resistance of the mirror transistors. The geometric and technological parameter sets for the mirror and pull-up transistors are defined in the corresponding clusters. These clusters are the same as those described in Section II-B.
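For a square-law mirror in saturation, (2c) gives I_out as I_set scaled by the size ratio and corrected by channel-length modulation at the two drain voltages. The sketch is a hypothetical stand-alone analogue of NMOS Current Mirror PtByPt.vi. It ignores the PMOS pull-up, reuses the Table I NMOS values, and assumes a value for k' = µ_n ε_ox/t_ox, which the text does not give.

```python
import math

VTH0 = 0.43       # V, NMOS threshold from Table I (V_SB = 0 assumed)
LAMBDA = 0.06     # 1/V, NMOS channel-length modulation from Table I
K_PRIME = 115e-6  # A/V^2, ASSUMED mu_n * eps_ox / t_ox

def mirror_vref(i_set, wl_ref, iterations=50):
    """Gate-source voltage of the diode-connected reference transistor.

    Solves I_set = (k'/2)(W/L)(Vref - Vth0)^2 (1 + lambda*Vref), i.e. eq. (2c)
    with V_DS = V_GS, by fixed-point iteration (converges fast: lambda is small).
    """
    v = VTH0
    for _ in range(iterations):
        v = VTH0 + math.sqrt(2.0 * i_set / (K_PRIME * wl_ref * (1.0 + LAMBDA * v)))
    return v

def mirror_iout(i_set, wl_ref, wl_out, v_out):
    """Output current of a square-law mirror; the output transistor is assumed
    saturated (v_out >= Vref - Vth0)."""
    v_ref = mirror_vref(i_set, wl_ref)
    return i_set * (wl_out / wl_ref) * (1.0 + LAMBDA * v_out) / (1.0 + LAMBDA * v_ref)

# Hypothetical 1:2 mirror biased at 100 µA, output node held at 1.2 V
print(mirror_vref(100e-6, 4.0), mirror_iout(100e-6, 4.0, 8.0, 1.2))
```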
Tank PtByPt.vi
The tank frequency f, in hertz, is given by:
$$f = \frac{1}{2\pi\sqrt{LC}} \qquad (8)$$
where L is the inductance given at the L input of the VI. The value f is available at the Tank frequency output. It is used to generate a sinusoidal pattern at the Vtank output, with amplitude and offset equal to V_dd; the time input is needed to create this pattern. The Rs and Rp outputs give the series resistance R_s of the inductance L and its parallel equivalent R_p [3]. These resistances depend on the tank quality factor Q, set at the Q input of the VI:
D. Tank Circuit
The Tank PtByPt.vi tool is a VI that performs a functional simulation of a tank circuit, based on a capacitance and an inductance in parallel. Its VI is shown in Figure 5. The tank is tuned through the Vtune input, a real number between 1 and 1.8 V. This input sets the tank capacitance C between 500 and 1 pF, according to:
NMOS Current Mirror PtByPt.vi
$$R_s = \frac{2\pi f L}{Q} \qquad (9a)$$
$$R_p = R_s\left(1 + Q^2\right) \qquad (9b)$$
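Equations (7)-(9) form a chain. V_tune sets C, C and L set f, and f, L and Q set R_s and R_p. The following sketch implements that chain as a hypothetical analogue of Tank PtByPt.vi. The clamping follows the description of (7), and the Q used in the example is an arbitrary assumption.

```python
import math

def tank_capacitance(v_tune):
    """Eq. (7), returned in farads; Vtune is clamped to [1, 1.8] V."""
    v = min(max(v_tune, 1.0), 1.8)
    return (1123.75 - 623.75 * v) * 1e-12

def tank_frequency(l_henry, c_farad):
    """Eq. (8): resonance frequency in hertz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))

def tank_losses(f_hz, l_henry, q):
    """Eqs. (9a)-(9b): series resistance of the inductor and its parallel equivalent."""
    rs = 2.0 * math.pi * f_hz * l_henry / q
    return rs, rs * (1.0 + q * q)

# 10 nH tank inductance (the value used in Section III); Q = 10 is hypothetical
c = tank_capacitance(1.0)  # 500 pF at the low end of the tuning range
f = tank_frequency(10e-9, c)
print(c, f, tank_losses(f, 10e-9, q=10.0))
```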
of carriers in silicon, (11) can be rewritten as follows:
E. MOS Cross-Coupled Pair
The VI NMOS Cross Pair.vi simulates a cross-coupled transistor pair that generates a negative resistance [3]. This configuration is the basis of oscillators. The VI can inject a SET fault, so that the behavior of circuits built on this transistor arrangement, such as VCOs, can be observed. Figure 6 shows the VI of this tool and the schematic of the transistor arrangement, which is used together with two Tank PtByPt.vi VIs. The operation of the cross-coupled pair is based on the Barkhausen criterion. For oscillation to occur, the following must hold:
$$R_{p1} + R_{p2} - \frac{1}{g_{m1}} - \frac{1}{g_{m2}} = 0 \qquad (10)$$
$$g_m = \sqrt{\frac{2\mu_n(N_{var})\,\varepsilon_{ox}}{t_{ox}}\,\frac{W}{L}\,\bigl(I + I_p(t)\bigr)} \qquad (12)$$
where µ_n(N_var) is the carrier mobility, which depends on N_var = N I/(I + I_p(t)), and N = N_A + N_D is the total number of carriers in silicon. If the transient current I_p(t) is zero, (12) reduces to (11) and the cross-coupled pair works normally.
In this VI, only one transistor of the cross-coupled pair is affected by the SET: the one with transconductance g_m2. Besides the drain current and the mobility, the SET therefore also affects the drain voltages of the cross-coupled transistors. This effect can be approximated as follows:
where R_p1 and R_p2 are the parallel resistances of the tanks connected to the pair, and g_m1 and g_m2 are the transconductances of the cross-coupled transistors. The circuit is originally designed to satisfy criterion (10).
$$V_{df,i} = \frac{g_{m1} + g_{m2}}{g_{m1}\,g_{m2}\,(R_{p1} + R_{p2})}\,V_{d,i} \qquad (13)$$
where V_d,i is the drain voltage of transistor i without an injected fault and V_df,i is its drain voltage with the fault. The drain voltage changes only if equality (10) does not hold.
In Figure 6a, Rp1 and Rp2 are the parallel resistances associated with each tank circuit, and Vdrain1 and Vdrain2 are the output voltages of each tank. The I input is the drain current of each transistor, and Inoise is the input for the SET transient current I_p(t). The Vbulk and Vsource inputs set the bulk and source voltages of the transistors, respectively. The Gain input defines the gain factor of the cross-coupled pair (10 by default). The Vds2 and Vds1 outputs give the drain-source voltages of each transistor. The Vout2 and Vout1 outputs give the same voltages, Vds2 and Vds1, shifted by the offset specified in Vsource.
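The sketch below shows how a SET current entering only M2 shifts g_m2 through (12) and unbalances (10). It is an illustration under assumptions, not NMOS Cross Pair.vi. The text does not specify how µ_n depends on N_var, so the mobility is held constant here. The device and bias values are hypothetical.

```python
import math

def cross_pair_gm(k, w_over_l, i_bias, i_p=0.0):
    """Eq. (12) with the mobility held constant (ASSUMPTION: mu_n(N_var) = mu_n),
    so k = mu_n*eps_ox/t_ox is fixed; with i_p = 0 this is eq. (11)."""
    return math.sqrt(2.0 * k * w_over_l * (i_bias + i_p))

def barkhausen_residual(rp1, rp2, gm1, gm2):
    """Left-hand side of eq. (10); zero when the oscillation condition holds."""
    return rp1 + rp2 - 1.0 / gm1 - 1.0 / gm2

if __name__ == "__main__":
    k, wl, i_bias = 115e-6, 20.0, 1e-3  # hypothetical device and bias values
    gm0 = cross_pair_gm(k, wl, i_bias)
    rp = 1.0 / gm0                      # tanks chosen so that (10) holds
    print(barkhausen_residual(rp, rp, gm0, gm0))      # 0.0: balanced pair
    gm2_hit = cross_pair_gm(k, wl, i_bias, i_p=5e-3)  # SET current on M2 only
    print(barkhausen_residual(rp, rp, gm0, gm2_hit))  # nonzero: criterion broken
```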
(a) Virtual Instrument
III. SIMULATION AND RESULTS
Here the developed tools are used to simulate a 250 nm VCO affected by a SET pulse with a charge of 37.125 pC. The pulse was generated with the transient-current and collected-charge equations (1), with I_0 = 25 mA, τ_R = 15 ps and τ_F = 1.5 ns. Alpha particles striking analog circuits contribute little charge (about 50 fC), which is interpreted as short-duration pink noise. Observing the effects of these faults on such circuits therefore requires simulating long pulses with charges above 10 pC.
A VCO is an analog circuit that generates a periodic signal, usually sinusoidal. Its frequency is controlled by a tuning voltage, which is typically lower than the supply voltage. The circuit is shown in Figure 7. It uses a cross-coupled transistor pair (transistors M1 and M2) and tank circuits, as well as a transistor in active-resistance mode (M3) and a current mirror (transistors M4, M5 and M6).
(b) Schematic
Figure 6. NMOS Cross Pair PtByPt.vi.
In this particular case, the transconductance of each transistor, g_m = g_m1 = g_m2, is:
$$g_m = \sqrt{2k\,\frac{W}{L}\,I} \qquad (11)$$
where k = µ_n ε_ox/t_ox and I is the drain current of the transistors. If equality (10) does not hold, the circuit will not work as intended. This happens when the transconductance of one of the cross-coupled transistors changes. The SET is represented by a transient current that affects the drain current and the total number

Figure 9. Simulation of the VCO circuit affected by a laser pulse (VCO output (V), 0 to 2.5 V, versus time (s), 0 to 30 ns).
Figure 7. VCO schematic. The lightning bolt marks the transistor affected by the SET fault.
This simulation used a 10 nH inductance for each tank and a supply voltage of 2.5 V. The tuning control voltage was set to 1.79 V, which produces a 1.15 GHz sinusoidal signal, as in the original experiment [4]. The asynchronous laser pulse was simulated with a 37.125 pC SET pulse applied to transistor M2 in Figure 7 [4]. The BD of this simulation is shown in Figure 8. Figure 9 shows the simulation results, which are compared with the results obtained in [4], shown in Figure 10.
(a) Real laser experiment (10)
(b) Spectre simulation
Figure 10.
Figure 8. BD of the simulation of the VCO circuit affected by a laser pulse.
Figure 9 shows the simulation performed with the tools developed in LabVIEW. It closely resembles the laboratory experiment by W. Chen et al. at the Microelectronics Laboratory of the University of Bordeaux [4], shown in Figure 10a. The comparison is purely qualitative, because the transistor model does not include parasitic capacitances.
According to the reports of the original experiment, the fault disturbed the VCO output oscillation for 12 periods; in this simulation it disturbed 14 periods. The simulations also showed that t_ox influences the amplitude of the simulated signal. A 19% decrease in t_ox produces a 5% decrease in the VCO signal amplitude, which is closely related to the design of the transistors for given operating conditions (corners) [10]. The Fourier spectrum of the signal was also studied with and without an injected SET pulse, in order to observe its fading signature.
Results obtained by Chen et al. [4]
LabVIEW proved to be an excellent tool for analyzing and simulating physical and electrical phenomena. The tools of this programming environment are very helpful for data analysis, 3D plots and Fourier spectrum analysis. Care must be taken, however, with the precision of the data and of the equations used in the simulations.
V. FUTURE WORK
Figure 11.
Studying transient faults in devices smaller than 250 nm requires a different MOS transistor model, such as the ACM model [11], [12]. The ACM model includes the parasitic capacitances needed for a more realistic simulation.
More VIs will be developed to simulate the effects of SETs on circuits such as differential pairs and Miller and cascode amplifiers. The aim is to provide enough tools to cover as many transient-pulse fault cases as possible.
VIs will also be developed for faults caused by the Total Ionizing Dose. They will be integrated with the transient-fault tools already developed to produce a unified fault model.
Custom VIs will be created for the analysis functions, so that these tools can run on any LabVIEW distribution from the 8.20 student version onward.
Fading signature of the simulated 250 nm CMOS VCO.
This simulation showed that the broadening of the frequency spectrum corresponds to a temporary disruption of the oscillating output of the VCO [4]. This broadening is also correlated with the duration of the signal disruption in the time domain. The fading signature reflects the total charge deposited in the device and gives an idea of the magnitude of the Linear Energy Transfer (LET) of the incident particle. Figure 11 shows the spectrum of the simulated VCO output signal under transient pulses with charges from 0 to 100 pC. As the charge increases, the spectrum around the oscillation frequency broadens, so the broadening depends on the charge injected into the VCO. For a SET with τ_R = 15 ps and τ_F = 1.5 ns, the fault becomes significant from 90 fC of injected charge.
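The link between a time-domain disruption and spectral broadening can be illustrated numerically. The sketch is a toy model, not the LabVIEW spectral analysis used in the paper. It treats the SET effect as a hypothetical rectangular amplitude dip on an otherwise ideal sinusoid, and it measures the share of spectral power that leaves the carrier bin.

```python
import cmath
import math

N = 512           # samples in the observation window
CARRIER_BIN = 32  # the undisturbed sinusoid completes exactly 32 cycles in N samples

def vco_output(dip_start=0, dip_len=0, depth=0.8):
    """Unit-amplitude sinusoid whose amplitude drops by `depth` during samples
    [dip_start, dip_start + dip_len): a toy stand-in for a SET disruption."""
    samples = []
    for k in range(N):
        env = 1.0 - depth if dip_start <= k < dip_start + dip_len else 1.0
        samples.append(env * math.sin(2.0 * math.pi * CARRIER_BIN * k / N))
    return samples

def power_spectrum(x):
    """|X[m]|^2 for bins m = 0 .. N/2 - 1, by a direct O(N^2) DFT."""
    n = len(x)
    return [abs(sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))) ** 2
            for m in range(n // 2)]

def off_carrier_fraction(x):
    """Share of spectral power outside the carrier bin and its two neighbours."""
    p = power_spectrum(x)
    near = sum(p[CARRIER_BIN - 1:CARRIER_BIN + 2])
    return 1.0 - near / sum(p)

if __name__ == "__main__":
    for length in (0, 32, 96):
        print(length, off_carrier_fraction(vco_output(dip_start=100, dip_len=length)))
```

With these settings, a 96-sample dip spreads more power away from the carrier than a 32-sample dip. This matches the correlation between disruption duration and broadening described above.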
REFERENCES
[1] R. Velazco, P. Fouillat, and R. Reis, Radiation Effects on Embedded Systems. Dordrecht, the Netherlands: Springer, 2007.
[2] G. C. Messenger, "Collection of charge on junction nodes from ion tracks," IEEE Transactions on Nuclear Science, vol. NS-26, no. 6, Dec. 1982.
[3] R. Caverly, CMOS RFIC Design Principles. Norwood, Massachusetts 02062: Artech House, Inc., 2007.
[4] W. Chen, V. Pouget, H. Barnaby, J. Cressler, G. Niu, P. Fouillat, Y. Deval, and D. Lewis, "Investigation of single-event transients in voltage-controlled oscillators," IEEE Transactions on Nuclear Science, vol. 50, no. 6, pp. 2081-2087, Dec. 2003.
[5] J. Travis and J. Kring, LabVIEW for Everyone: Graphical Programming Made Easy and Fun, 3rd ed. Prentice-Hall PTR, 2006.
[6] G. W. Johnson and R. Jennings, LabVIEW Graphical Programming, 4th ed. New York, USA: McGraw-Hill, 2006.
[7] W. E. Calienes Bartra, "Ferramentas para a simulação de falhas transientes," Master's thesis, Universidade Federal do Rio Grande do Sul, Dec. 2011.
[8] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed. Upper Saddle River, New Jersey 07458: Pearson Education, Inc., 2003.
[9] K. Kano, Semiconductor Devices. Upper Saddle River, New Jersey 07458: Prentice Hall, 1998.
[10] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, "SPICE model level 49 for 0.25 micron CMOS process," Berkeley Wireless Research Center, University of California, Berkeley, 2003. Available: http://bwrc.eecs.berkeley.edu/icbook/models.htm.
[11] C. G. Montoro and M. C. Schneider, MOSFET Modeling for Circuit Analysis and Design. London, Great Britain: World Scientific, 2007.
[12] A. Cunha, M. Schneider, and C. Galup-Montoro, "An MOS transistor model for analog circuit design," IEEE Journal of Solid-State Circuits, vol. 33, no. 10, pp. 1510-1519, Oct. 1998.
IV. CONCLUSIONS
A 250 nm CMOS VCO was simulated with an injected SET, to observe how this type of circuit behaves under transient pulses. The results were compared with those obtained in [4] and show strong qualitative similarities with the experiment.
The fading signature is a useful tool for predicting what will happen to the oscillator circuit when a transient pulse hits it. Analyzing this characteristic also predicts the level of harmonics affecting the circuit. It also predicts the critical charge at which a SET must be considered a fault that affects the device.
Implementation of a Digital CNN Cell on an FPGA for Edge-Detection Applications
Ma. del Rocío De Jesús Ventura, Luis Hernández Martínez, Mónico Linares Aranda
Isaac Esaú Jiménez Benítez
Departamento de Electrónica
Instituto Nacional de Astrofísica, Óptica y Electrónica
Sta. Ma. Tonantzintla, Puebla, México
rventura@inaoep.mx
Coordinación de Ingeniería en Electrónica
Instituto Tecnológico de Puebla
Puebla, Puebla, México
Abstract: This paper proposes the design of a cell for a Simplicial Cellular Nonlinear Network (S-CNN) [3]. The S-CNN has advantages over the standard CNN, such as the ability to solve functions that are not linearly separable and the absence of templates in its programming. The cell was designed with digital logic and implemented on an FPGA, where its performance was verified. The cell was programmed to perform edge detection.
I. INTRODUCTION
Image processing has a wide range of applications in fields such as medicine, biology, remote sensing, archaeology, physics, astronomy, biometrics, industry, the military and robotics. These applications handle large amounts of data, which leads to longer processing times. In the traditional approach, commonly used in computers and microprocessors, data processing is subject to the corresponding limitations. These limitations have been largely overcome by pipelining, parallel arrays and neural networks, among other techniques. This work focuses on the last of these.
One type of neural network is the standard Cellular Nonlinear Network (CNN), first introduced by Chua and Yang [1, 2]. A CNN is formed by spatial arrays of locally connected cells. The cells have a homogeneous structure and the same dynamics, with an identical configuration of inputs, outputs and interconnections. All cells evolve in parallel, and the state of each cell is independent except for the interaction fixed by the network configuration. Another characteristic of the CNN is the neighborhood, or sphere of influence, of a cell, denoted S_ij. It is the set of cells that interact with that cell, as shown in Fig. 1.
Figure 1. Sphere of influence of a cell and its eight nearest neighbors.
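The sphere of influence in Figure 1 can be described as:
$$S_{i,j} = \left\{ C_{k,l} : \max\left(|k-i|,\,|l-j|\right) \le 1,\; 1 \le k \le N,\; 1 \le l \le M \right\} \qquad (1)$$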
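Equation (1) selects the 3 × 3 block around a cell. The sketch below extracts that neighborhood from a binary image and applies a simple local edge rule. The rule (a black pixel with at least one white neighbor is an edge pixel) and the clipped treatment of the image border are illustrative assumptions. They are not the S-CNN programming described in this paper.

```python
def neighborhood(img, i, j, r=1):
    """Values of the cells C(k, c) with max(|k - i|, |c - j|) <= r, excluding C(i, j).

    Cells outside the array are simply skipped (an assumed boundary treatment).
    """
    rows, cols = len(img), len(img[0])
    return [img[k][c]
            for k in range(max(0, i - r), min(rows, i + r + 1))
            for c in range(max(0, j - r), min(cols, j + r + 1))
            if (k, c) != (i, j)]

def edge_map(img):
    """Hypothetical binary edge rule: a black pixel (1) is an edge pixel
    if at least one of its neighbors is white (0)."""
    return [[1 if img[i][j] == 1 and 0 in neighborhood(img, i, j) else 0
             for j in range(len(img[0]))]
            for i in range(len(img))]

square = [[0, 0, 0, 0, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 1, 1, 1, 0],
          [0, 0, 0, 0, 0]]
# Prints the 8-pixel outline of the square; the interior pixel becomes 0.
for row in edge_map(square):
    print(row)
```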
Expression (1) also applies to the Simplicial CNN (S-CNN), on which this paper is based. The performance of a network depends largely on a good design of the basic cell and on keeping it small, because the cell is replicated n times. The more cells, the more image pixels can be processed.
To this end, two simplicial cells were implemented on a Spartan-3E FPGA.
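An S-CNN cell evaluates a piecewise-linear (PWL) function defined over a simplicial partition of its joint input-output space. Inside each simplex, the function is the convex combination of its values at the simplex vertices. The sketch evaluates such a function on the unit hypercube [0,1]^m using the Kuhn (Freudenthal) partition. This is a standard simplicial partition, chosen here for illustration; the partition actually used by the S-CNN is described in [5] and may differ. The XOR example shows a function that is not linearly separable but is still represented exactly at the vertices.

```python
def simplicial_pwl(vertex_values, z):
    """Evaluate a PWL function on [0,1]^m over the Kuhn (Freudenthal) partition.

    vertex_values maps each 0/1 vertex tuple to the function value there;
    z is a point with coordinates in [0, 1]. The simplex containing z is found
    by sorting the coordinates in decreasing order; the result is the convex
    combination of the m + 1 vertex values with barycentric weights.
    """
    m = len(z)
    order = sorted(range(m), key=lambda i: z[i], reverse=True)
    vertex = [0] * m
    total = (1.0 - z[order[0]]) * vertex_values[tuple(vertex)]  # weight of the origin
    for step, i in enumerate(order):
        vertex[i] = 1  # walk to the next vertex of the simplex
        nxt = z[order[step + 1]] if step + 1 < m else 0.0
        total += (z[i] - nxt) * vertex_values[tuple(vertex)]
    return total

# XOR on two inputs: not linearly separable, but exact at the vertices
xor = {(0, 0): 0.0, (1, 0): 1.0, (0, 1): 1.0, (1, 1): 0.0}
print(simplicial_pwl(xor, (1.0, 0.0)), simplicial_pwl(xor, (0.75, 0.25)))  # 1.0 0.5
```

Because the weights are barycentric coordinates, any affine function is reproduced exactly. This makes a convenient sanity check for an implementation.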
The article is organized as follows. Section II briefly reviews the theoretical framework of the standard CNN and the S-CNN. Section III describes the S-CNN solution algorithm. Section IV presents the Matlab simulations of the S-CNN cell programmed for edge detection, followed by the details of the
ISSN 977-2177-128009
simulation and implementation of the S-CNN cells on the FPGA. Finally, Section V presents the conclusions, followed by the references.

II. NONLINEAR CELLULAR NETWORKS (CNN)
A. Standard CNN expression
The standard CNN structure is the most widely used in the literature. The state equation that governs the dynamics of a cell, including its local links with other elements of the network, is the following [4]:
$$ \dot{x}_{i,j}(t) = -x_{i,j}(t) + A\big(\mathbf{y}_{i,j}(t)\big) + B\big(\mathbf{u}_{i,j}(t)\big) + z_{i,j} \qquad (2) $$
The operators $A: \mathbb{R}^n \to \mathbb{R}$ and $B: \mathbb{R}^n \to \mathbb{R}$ represent the feedback and feedforward synapses, respectively. They relate the outputs $\mathbf{y}_{i,j}$ and the inputs $\mathbf{u}_{i,j}$ of the cells in the sphere of influence to the state of cell $C_{i,j}$. In the standard CNN, $A(\mathbf{y}_{i,j}) = \mathbf{a}_{i,j}^{T}\mathbf{y}_{i,j}$ and $B(\mathbf{u}_{i,j}) = \mathbf{b}_{i,j}^{T}\mathbf{u}_{i,j}$, where $\mathbf{a}_{i,j}, \mathbf{b}_{i,j} \in \mathbb{R}^n$ are weight coefficients. The output is given by
$$ y_{i,j}(t) = f\big(x_{i,j}(t)\big) = \tfrac{1}{2}\big(|x_{i,j}(t) + 1| - |x_{i,j}(t) - 1|\big) \qquad (3) $$
which is a saturation-type function applied to the state of the cell. Finally, $z_{i,j}$ is a reference value.
The standard CNN has limited functionality because its state equation is linear. Analyzing its steady-state input-output relation shows that it can only represent linearly separable functions.

B. Simplicial CNN (S-CNN) expression
The S-CNN is a particular case of CNN presented in [4]. Its dynamics are defined by a piecewise-linear (PWL) relation in the synapse of the cell. The equation that governs the behavior of the S-CNN is:
$$ \dot{x}_{i,j}(t) = F\big(\mathbf{y}_{i,j}(t),\ \mathbf{u}_{i,j}(t)\big), \qquad i = 1,\dots,N;\ j = 1,\dots,M \qquad (4) $$
The symbols in (4) are defined as follows:
- $F: \mathbb{R}^m \to \mathbb{R}$ is a piecewise-linear function.
- $\mathbf{y}_{i,j}(t) \in [0,1]^n$ is the vector of outputs of the cells belonging to the sphere of influence $S_{i,j}$.
- $\mathbf{u}_{i,j}(t) \in [0,1]^n$ is the vector of inputs in $S_{i,j}$.
- $x_{i,j}(t) \in [0,1]$ is the state of cell $C_{i,j}$.
- N and M are the total numbers of rows and columns of the S-CNN.
- n is the number of cells in the sphere of influence, so that $m = 2n$ is the dimension of the joint input-output space.

F is a PWL function defined over a simplicial partition of the domain formed by the joint input-output space $[0,1]^m$ of the CNN. A simplicial domain, as its name indicates, is divided into simplices. A simplex is the generalization, to a space of arbitrary dimension, of a triangle in $\mathbb{R}^2$ or a pyramid in $\mathbb{R}^3$. On each simplex the PWL function is an affine function that can be represented by a hyperplane valid only over that simplex. The hyperplane is determined by the convex combination of the m + 1 function values at the vertices of the simplex. The vertices are the points of the domain defined by the zero-dimensional intersections of the partition. Further details of this approach are given in [5].
The function F that describes the S-CNN is expressed as the linear combination
$$ F(\mathbf{w}) = \mathbf{c}^{T}\Lambda(\mathbf{w}) = \sum_{k=1}^{q} c_k\,\Lambda_k(\mathbf{w}) \qquad (5) $$
where $\mathbf{w} = (\mathbf{y}_{i,j}, \mathbf{u}_{i,j}) \in [0,1]^m$ is a point of the joint input-output space. $\mathbf{c}$ is the vector of coefficients of F in the basis $\Lambda$, and $q = 2^m$ is the number of vertices of the simplicial partition. In this expression, the parameter vector coincides with the values of the function at each vertex. This information can therefore be stored in a memory table: each memory entry is associated with a vertex, and its content is the value of the function evaluated at that vertex. Computing the function at an arbitrary point belonging to a simplex only requires evaluating m + 1 basis functions, together with their associated coefficients read from memory. In turn, each basis function has an expression that is simple to compute.
The S-CNN has a much wider field of application than the standard CNN. It does not share the limitation of representing only linearly separable functions in its steady-state input-output relation.

III. S-CNN ALGORITHM
The S-CNN is computed with the dynamic system (4). F can therefore be solved for each cell by means of convex decomposition [6]. This procedure, together with the associated coefficients, yields an interpolation that gives the final value. The following example details this methodology, in which the simplex is used to evaluate a function at a given point.

Figure 2. Example of the evaluation of a point in the simplex: vertices V0 = (0,0), V1 = (1,0), V2 = (1,1) and V3 = (0,1), with function values c0 = f(V0) = 0, c1 = f(V1) = 1, c2 = f(V2) = 0.8 and c3 = f(V3) = 0.9, and the evaluation point p = (0.8, 0.3).
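The vertex-table evaluation of F can be illustrated in software. The following is a minimal Python sketch under stated assumptions, not the authors' Matlab or VHDL code. The vertex memory is modeled as a dictionary, and the names `convex_decomposition`, `evaluate_pwl` and `FIG2_TABLE` are hypothetical. The decomposition repeatedly takes the smallest nonzero component of the remaining vector, as in the paper's worked procedure. With the values of Figure 2 it reproduces the result F(p) = 0.74 of the worked example.

```python
from typing import Dict, List, Sequence, Tuple

Vertex = Tuple[int, ...]


def convex_decomposition(p: Sequence[float], eps: float = 1e-12) -> List[Tuple[float, Vertex]]:
    """Write a point of the unit hypercube [0,1]^m as a convex combination of
    hypercube vertices by repeatedly removing the smallest nonzero component."""
    if any(x < -eps or x > 1 + eps for x in p):
        raise ValueError("point must lie in [0, 1]^m")
    remaining = [float(x) for x in p]
    terms: List[Tuple[float, Vertex]] = []
    while any(x > eps for x in remaining):
        # Vertex = indicator of the components that are still nonzero.
        vertex = tuple(1 if x > eps else 0 for x in remaining)
        weight = min(x for x in remaining if x > eps)
        terms.append((weight, vertex))
        remaining = [x - weight * v for x, v in zip(remaining, vertex)]
    # The last weight, on the origin, is one minus the sum of the others.
    terms.append((1.0 - sum(w for w, _ in terms), tuple(0 for _ in p)))
    return terms


def evaluate_pwl(p: Sequence[float], table: Dict[Vertex, float]) -> float:
    """F(p) = sum_k lambda_k * c_k, with each c_k read from the vertex table."""
    return sum(weight * table[vertex] for weight, vertex in convex_decomposition(p))


# Vertex values of Figure 2 (the cell "memory" in this sketch).
FIG2_TABLE: Dict[Vertex, float] = {(0, 0): 0.0, (1, 0): 1.0, (1, 1): 0.8, (0, 1): 0.9}

if __name__ == "__main__":
    print(convex_decomposition((0.8, 0.3)))
    print(evaluate_pwl((0.8, 0.3), FIG2_TABLE))
```

Only m + 1 table entries are read for each evaluated point, however large the table is. This is the property that makes a memory-based cell attractive.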
With the values c_k, the result at point p (Figure 2) is obtained by interpolation, through the convex decomposition of p, in the following steps:
1. Take the smallest nonzero component of the point p = (0.8, 0.3), that is, 0.3, and decompose p as follows:
$$ p = 0.3\begin{bmatrix}1\\1\end{bmatrix} + \begin{bmatrix}0.5\\0\end{bmatrix} $$
2. Again take the smallest nonzero component of the vector being added, and repeat the procedure until the null vector is reached:
$$ \begin{bmatrix}0.5\\0\end{bmatrix} = 0.5\begin{bmatrix}1\\0\end{bmatrix} + \begin{bmatrix}0\\0\end{bmatrix} $$
3. Collecting the terms gives:
$$ p = 0.3\begin{bmatrix}1\\1\end{bmatrix} + 0.5\begin{bmatrix}1\\0\end{bmatrix} + 0.2\begin{bmatrix}0\\0\end{bmatrix} $$
4. The last weight is one minus the sum of the previous weights, $1 - (0.3 + 0.5) = 0.2$, because the simplex used is unitary.
5. This is a convex decomposition of the form
$$ F(p) = \sum_{k} \lambda_k\, c_k \qquad (6) $$
satisfying
$$ p = \sum_{k} \lambda_k V_k, \qquad \sum_{k} \lambda_k = 1, \qquad \lambda_k \ge 0, $$
where p is a given vector, $\lambda_k$ are the weights found, and $c_k$ are the function values at each vertex $V_k$ found.
6. The vertices of the simplex that were found, and their associated values, are f([0,0]) = c0, f([1,0]) = c1 and f([1,1]) = c2.
The function is linear within a simplex, so applying equation (6) gives:
$$ F(p) = 0.2\,c_0 + 0.5\,c_1 + 0.3\,c_2 = 0.2(0) + 0.5(1) + 0.3(0.8) = 0.74 $$

IV. SIMULATIONS AND IMPLEMENTATION RESULTS
A. Matlab
Edge detection is a basic operation in image analysis and is widely used in vision systems. It was the processing task applied to verify the algorithm described in the previous section. The Boolean function that determines whether a given pixel is an edge is the following [7]:
$$ G(U) = u_5\left(\bar{u}_1 + \bar{u}_2 + \bar{u}_3 + \bar{u}_4 + \bar{u}_6 + \bar{u}_7 + \bar{u}_8 + \bar{u}_9\right) \qquad (7) $$
The evaluations of (7) are stored in a table, from which the values associated with the vertices are taken. Together with the coefficients (weights) obtained by the decomposition, these values solve (6) for each cell. The results of the simulations of the program written in Matlab are shown in Figure 3.

Figure 3. Edge detection of an image: a) original and b) gray-scale edge detection.

The results show that the algorithm behaves satisfactorily.
A more complete implementation, covering more of the possible image processing operations, would require more libraries and tables. These would contain the Boolean functions, with their respective evaluations, for each operation, as in [7].

B. Xilinx
Two simplicial cell designs were written in VHDL, based on the methodology used in the Matlab simulations. The S-CNN cells were simulated, synthesized and implemented with the Xilinx tools. Figures 4(a) and 4(b) show the RTL diagrams of the first and second designs, respectively.
The resources used by both designs are shown in Tables I and II. The first design uses 51% of the LUTs (Table I), whereas the second design uses 8% (Table II).
Simulations were carried out for different input values. Fig. 5 shows the simulation results on the second design for the input vector p1 = [u1 = 182, u2 = 178, u3 = 171, u4 = 184, u5 = 183, u6 = 179, u7 = 192, u8 = 197, u9 = 195]. The result obtained is 241, which is the expected value.
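The vertex table used with (6) can be produced by evaluating (7) at every binary vertex. The following is a minimal Python sketch. It assumes that the value 1 denotes a black (object) pixel, that u1 to u9 are read row by row with u5 at the center, and that the table is indexed only by the nine neighborhood inputs. The paper does not state its encoding, so these choices, and the names `edge_function` and `build_edge_table`, are hypothetical.

```python
from itertools import product
from typing import Dict, Tuple

# A binary 3x3 neighborhood (u1, ..., u9) in row-major order; u5 is the center.
Vertex = Tuple[int, ...]


def edge_function(u: Vertex) -> int:
    """Equation (7): the output is 1 when the center pixel u5 is 1 and at
    least one of its eight neighbors is 0."""
    if len(u) != 9:
        raise ValueError("expected 9 binary inputs")
    center = u[4]
    neighbors = u[:4] + u[5:]
    return int(center == 1 and any(n == 0 for n in neighbors))


def build_edge_table() -> Dict[Vertex, int]:
    """Evaluate (7) once per vertex of the unit hypercube [0,1]^9, giving
    2**9 = 512 entries that play the role of the vertex values c_k."""
    return {v: edge_function(v) for v in product((0, 1), repeat=9)}


if __name__ == "__main__":
    table = build_edge_table()
    print(len(table), sum(table.values()))  # 512 255
```

Only 255 of the 512 vertices are edges: the center must be 1 (256 vertices), and the all-ones neighborhood is excluded. For gray-level inputs, the table would be combined with a convex decomposition such as the one in Section III to interpolate G between vertices.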
Figure 4. a) RTL diagram of the first design, b) RTL diagram of the second design.
TABLE II. SECOND S-CNN CELL DESIGN
Figure 5. Simulation of the FPGA implementation for the input vector p1.
C. Implementation results
Fig. 6 shows the Spartan-3E XC3S500E board running the implementation of the second cell design. It is processing the same input vector p1 as in Fig. 5, and the result obtained is the value 241 in binary (11110001).
Figure 6. Board with the Spartan-3E FPGA.
The second design uses considerably fewer resources than the first. The cell was optimized by reusing blocks and by adding control blocks. Table III lists the blocks of each implemented design.
TABLE I. FIRST S-CNN CELL DESIGN

TABLE III. SUMMARY OF BLOCKS OF THE OPTIMIZED S-CNN CELL
Blocks: state machines; multiplexers; counters 0 to 8; counters 0 to 10; counters 0 to 110; decomposition modules; ROM memories
Design 1: 0, 0, 0, 1, 1, 9
Design 2: 3, 1, 1, 1, 1, 1

V. CONCLUSIONS
This work reviewed the characteristics that make the S-CNN worth studying:
- its solution is simple;
- it is a relatively recently published type of CNN;
- it can represent a wider range of functions;
- it does not require templates.
The S-CNN cell designs presented in this work demonstrate the effectiveness of the S-CNN solution algorithm. They are a starting point for a deeper study of the behavior of this type of network.
The optimized second design saves considerable resources with respect to the first design, but its processing time increases: 2.020s for the first design and 2.43s for the second design.
As future work, the design of the complete network is being finished. It will have an interface with Matlab for image input and output.

REFERENCES
[1] L. O. Chua and L. Yang, Cellular neural networks: Applications, IEEE Transactions on Circuits and Systems, vol. 35, no. 10, pp. 1273-1290, Oct. 1988.
[2] L. O. Chua and L. Yang, Cellular neural networks: Theory, IEEE Transactions on Circuits and Systems, vol. 35, no. 10, pp. 1257-1272, Oct. 1988.
[3] P. Julián, R. Dogaru, and L. O. Chua, A piecewise-linear simplicial coupling cell for CNN gray-level image processing, IEEE Trans. Circuits Syst. I, vol. 49, pp. 904-913, July 2002.
[4] P. S. Mandolesi, Cámara CMOS programable con procesamiento paralelo sobre el plano focal, Doctoral Thesis, Universidad Nacional del Sur, Bahía Blanca, Argentina, June 2007.
[5] P. S. Mandolesi, P. Julián, and A. G. Andreou, A simplicial CNN architecture for on-chip image processing, ISCAS 2004.
[6] M. Chien and E. Kuh, Solving nonlinear resistive networks using piecewise-linear analysis and simplicial subdivision, IEEE Trans. Circuits Syst., vol. CAS-24, pp. 305-317, June 1977.
[7] P. S. Mandolesi, P. Julián, and A. G. Andreou, A scalable and programmable simplicial CNN digital pixel processor architecture, IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 51, no. 5, pp. 988-996, May 2004.
Real-Time Image Processing based on Neighborhood Operations using FPGA
J.Y.Mori
C.Sanchez-Ferreira, C.H.Llanos
Automation and Control Group

Department of Mechanical Engineering
Faculty of Technology-University of Brasilia
Brasilia, Brazil
jonesyudi@unb.br
Graduate Program in Mechatronics Systems

Department of Mechanical Engineering
Faculty of Technology-University of Brasilia
Brasilia, Brazil
sanchezfer@unb.br, llanos@unb.br
Abstract: This paper describes a real-time image processing system implemented on a low-cost commercial FPGA platform. The platform is composed of a CMOS camera, a development kit with an FPGA, and an LCD. The camera is configured to send the captured pixels one by one through a three-channel parallel bus (red, green and blue), and it also provides a pixel clock for synchronization. The FPGA receives both the pixel stream and the clock signal and performs the image processing tasks. It generates a new stream of processed pixels and sends it through another bus to the LCD, which displays the processed image. Some of the most widely used spatial-domain image processing algorithms were selected for implementation in the system. This work focuses on several algorithms based on neighborhood operations. The selected algorithms were analyzed to identify their basic operations. A complete processing chain was tested and reached a speedup factor of 55 compared with the same algorithms running in software on a PC.

Keywords: Image Processing; FPGA; Real-Time; Computer Vision

I. INTRODUCTION
The market for image, video and sound processing tasks (known as multimedia) has grown rapidly in the last decade. This growth has demanded the fast development of new technological solutions. These solutions must satisfy several constraints, such as high performance, resolution, bandwidth, low power consumption, portability and low cost, among others. Meeting these requirements calls for new techniques and design methodologies [4]. New products are launched on the market every day and the competition for customers is high, so the flexibility and low development cost of a solution make the difference. To achieve a suitable tradeoff, many companies are investing resources in hardware/software architectures capable of meeting these requirements [11].
In this context, image processing requires expensive and complex hardware platforms and frequently has to be deployed in embedded systems. Embedded systems, however, have strong restrictions on size and power consumption, which require more sophisticated hardware designs. Some specific applications add a further restriction: real-time processing. Examples include mobile robotics, UAVs, smart sensors, biomedical applications and consumer mobile devices. In these applications, the processing time is critical for both the reliability and the usability of the system.
General Purpose Processors (GPPs), based on Princeton or Harvard architectures, are not suitable for such applications (e.g., image processing implemented in embedded systems) [3]. These architectures require a high operating frequency to meet a given throughput, which results in high power consumption. The best-performing approach is to implement such systems as an ASIC, although cost and time to prototype are its main drawbacks. FPGAs offer a good processing alternative because they allow the inherent parallelism of the algorithms to be exploited [1], [9].
Several image processing algorithms use neighborhood operations, which are carried out over very specific regions of the image. These operations sweep the image by means of a process called windowing. Such algorithms usually appear at the beginning of the image processing chain. Although these techniques are straightforward, they operate over the whole image and therefore take a long time to execute [13]. Because they are useful in many different applications and must run with high performance, they are good candidates for direct hardware implementation [17].
In this paper we selected some of the most common image processing algorithms that use neighborhood operations, in order to analyze their structure in an integrated FPGA implementation. To do so, we used a low-cost FPGA platform and exploited both the intrinsic parallelism of the algorithms and the possibility of a pipelined implementation. Tests and comparisons against a software implementation on a standard PC showed a speedup factor of 55. The performance results also demonstrate the flexibility and feasibility of using FPGAs as an alternative to common processor architectures in embedded systems.
This paper is organized as follows. Section II presents related work. Section III provides background on image pre-processing. Section IV shows the architectures derived from the algorithms presented in Section III. Section V presents the synthesis results and compares the proposed system with a software implementation of the same processing chain. Section VI gives the final comments and conclusions of this work.
II. RELATED WORK
The use of FPGAs to accelerate critical tasks has been studied since the early 1990s. One of the first FPGA implementations was the Splash 2 reconfigurable computer, which used a systolic array to compute the convolution. With this approach, Splash 2 achieved a throughput of 26 MPixel/s [10]. In [7], a platform for real-time image capture and processing suitable for control applications is proposed; it implements a correlation operation for object tracking with a performance of 14 MPixel/s. In [4], an architecture for spatial convolution with a throughput of 200 MPixel/s (3x3 masks) is described. Reference [15] reports a speedup in stereo image processing using edge detection and SAD (Sum of Absolute Differences) for the disparity map. In [8], several algorithms were implemented in FPGA, running between 6 and 149 times faster than their software versions. Reference [5] describes a system for calculating distances using omnidirectional vision. In [14], an evolutionary hardware approach with an FPGA implementation of feature-extraction techniques is proposed. Finally, [6] focuses on the implementation details of convolution.
In summary, several studies on accelerating image processing in FPGAs have been published. Most of them do not address image acquisition and display, and few focus on real-time processing. The main contribution of this work is therefore a complete pipelined system covering image capture, processing and visualization, with high performance and a low implementation cost.

III. BACKGROUND
This section presents the algorithms selected for implementation in our architecture. It also discusses several characteristics related to their parallelism and pipelined structure.

A. Color Conversion
Many image processing algorithms work on gray-scale images, and several scientific and industrial cameras capture images directly in this format [2]. The camera used in this work captures color (RGB) images, so the first algorithm converts the color image to gray scale. This conversion can be done by averaging the three color channels, as shown in (1):
$$ I = \frac{R + G + B}{3} \qquad (1) $$
Converting a color image to gray scale is a point operation. The process therefore has a complexity of O(NM), where N and M are the image dimensions.

B. Thresholding
Thresholding is another point operation. The algorithm compares the pixel value (already in gray scale) with a predefined threshold value T. If the pixel value is greater than or equal to the threshold, the pixel is converted to white; otherwise, it is converted to black, as shown in (2):
$$ I_T(x,y) = \begin{cases} \text{white}, & I(x,y) \ge T \\ \text{black}, & I(x,y) < T \end{cases} \qquad (2) $$

C. Rank Order Filtering
Rank order filters are based on neighborhood operations. The basic idea is to sort the pixel values of a defined neighborhood; depending on the chosen algorithm, different results can be obtained, as shown in Table I.
The algorithms shown in Table I are also known as morphological operations. They are used in image processing for noise suppression, feature enhancement and contour extraction, among other tasks.

TABLE I. RANK ORDER FILTERING ALGORITHMS
Algorithm | Operations
Low-Pass Filter | The output pixel is the median value of the neighborhood.
Dilation | The output pixel is the maximum value of the neighborhood.
Erosion | The output pixel is the minimum value of the neighborhood.
Opening | Erosion followed by Dilation.
Closing | Dilation followed by Erosion.
Morphological Smoothing | Opening followed by Closing.
Morphological Gradient | Closing followed by Opening.

D. Convolution/Correlation
Convolution and correlation are well-known signal processing operations; here, images are treated as two-dimensional signals. Both are widely used for image filtering. They are described by (3) and (4), where i(x, y) and k(x, y) are the image and the filter, respectively, and the mask spans (2a+1)x(2b+1) pixels:
$$ (i * k)(x,y) = \sum_{s=-a}^{a}\sum_{t=-b}^{b} k(s,t)\, i(x-s,\ y-t) \qquad (3) $$
$$ (i \star k)(x,y) = \sum_{s=-a}^{a}\sum_{t=-b}^{b} k(s,t)\, i(x+s,\ y+t) \qquad (4) $$

E. Binary Morphology
The binary morphology operations are similar to the rank order filters described in Section III-C. The main difference is that the rank order filters operate over gray-scale images, whereas binary morphology operates over binary images (one bit per pixel). Such images are obtained with the thresholding operation (see Section III-B). The most basic morphological operations are erosion and dilation, and both operate on a region of the
image by means of a mask called the structuring element. This element is a binary matrix that is slid over the desired region, generating a new pixel by means of logical operations. Equations (5), (6), (7) and (8) show the logical operations that take place (assuming 3x3 masks).
In (5) to (8), the index j indicates an element of the mask and the corresponding pixel of the image after the thresholding process. The output is the value of a new pixel, which is also binary. Performing dilation and erosion in several sequences (combinations) yields new operations such as opening, closing, morphological smoothing and morphological sharpening (see Table I).

F. Edge Detection
An edge is the boundary between two regions with different properties. In calculus, the derivative of a function gives its rate of change. An edge generally corresponds to an abrupt change in pixel intensity, i.e., a high rate of change. The derivative of the image therefore has maxima at points that are probably edges of objects. Images are two-dimensional functions, so the horizontal and vertical partial derivatives can be used independently. The gradient operator shown in (9) and (10) is a suitable tool for this:
$$ |\nabla f| = \sqrt{G_x^2 + G_y^2}, \qquad G_x = \frac{\partial f}{\partial x},\quad G_y = \frac{\partial f}{\partial y} \qquad (9) $$
$$ \theta = \arctan\!\left(\frac{G_y}{G_x}\right) \qquad (10) $$
Equation (9) gives the magnitude of the gradient and (10) its direction. The approximations (11) and (12) are commonly used in place of (9):
$$ |\nabla f| \approx |G_x| + |G_y| \qquad (11) $$
$$ |\nabla f| \approx \max\left(|G_x|,\ |G_y|\right) \qquad (12) $$
With convolution masks that perform a derivative, the gradient values can easily be calculated using (11) and (12).

IV. THE PROPOSED ARCHITECTURES
The camera used in this work sends the images pixel by pixel as a serial stream over three channels (R, G, B). It also sends a synchronization clock indicating that a new pixel is available on the bus. The architectures were therefore conceived as systolic arrays synchronized by the pixel clock.

A. Color Conversion Architecture
Implementing (1) requires a division, which was approximated as shown in (13) and (14):
$$ I \approx \frac{(R + G + B)\,k}{2^{n}} \qquad (13) $$
$$ k \approx \frac{2^{n}}{3}, \qquad k \in \mathbb{Z} \qquad (14) $$
First, a fixed value is assigned to n (10 in the example). Then the integer value k is calculated. Higher values of n give a better approximation of the division. The division by a power of 2 is performed with a right shift. This architecture was implemented as a three-stage pipeline (see Fig. 1).

Figure 1. Pipeline architecture for color conversion.

Each stage executes its operation in one clock cycle. The conversion from color to gray scale is therefore accomplished pixel by pixel, synchronously with the camera clock.

B. Thresholding Architecture
To implement (2), a synchronous comparator is used, whose output is "0" or "1", as shown in Fig. 2.

Figure 2. The synchronous comparator for the threshold operation.

This block performs the comparison and enables the output in a single clock cycle, maintaining system synchronization.

C. Neighborhood Loader Architecture
The neighborhood size can be chosen according to the application; for simplicity, a 3x3 pixel neighborhood was used. Fig. 3 shows the general structure that provides the neighborhood. The blocks of the architecture are shift registers one pixel wide. A new pixel is received from the camera, or from a previous processing block, at every clock cycle. A new neighborhood is therefore available for processing at each clock cycle.
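The front end of the pipeline (Sections IV-A to IV-C) can be modeled behaviorally in software. The following is a minimal Python sketch, not the authors' VHDL/Verilog, and all names are hypothetical. It uses the exponent n = 10 from the example above and assumes round-to-nearest for k. The neighborhood loader is modeled as one chain of 2W + 3 pixel registers (W = image width), which is one common way to build a shift-register loader. This layout is an assumption about Fig. 3, not a description of it.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Optional, Tuple

N_EXP = 10                   # fixed exponent n of (13)-(14), as in the text's example
K = round(2 ** N_EXP / 3)    # integer factor k (assumed round-to-nearest): 341 for n = 10


def gray_approx(r: int, g: int, b: int) -> int:
    """(13): multiply the channel sum by k, then divide by 2**n with a right shift."""
    return ((r + g + b) * K) >> N_EXP


def threshold(pixel: int, t: int) -> int:
    """(2): 1 (white) when pixel >= t, otherwise 0 (black)."""
    return 1 if pixel >= t else 0


Window = Tuple[Tuple[int, int, int], ...]


def neighborhood_loader(stream: Iterable[int], width: int) -> Iterator[Optional[Window]]:
    """Behavioral model of a shift-register loader producing 3x3 windows.

    A chain of 2*width + 3 one-pixel registers holds two image rows plus three
    pixels. Taps at offsets 0..2, width..width+2 and 2*width..2*width+2 form
    the window. One output is produced per input pixel; None is produced while
    the chain is still filling."""
    length = 2 * width + 3
    regs: Deque[int] = deque(maxlen=length)
    for pixel in stream:
        regs.append(pixel)                 # one shift per pixel clock
        if len(regs) < length:
            yield None                     # pipeline fill (initial delay)
            continue
        yield tuple(
            tuple(regs[start + c] for c in range(3))
            for start in (0, width, 2 * width)
        )
```

For example, `list(neighborhood_loader(range(12), width=4))` yields ten `None` values while the chain fills, then the windows `((0, 1, 2), (4, 5, 6), (8, 9, 10))` and `((1, 2, 3), (5, 6, 7), (9, 10, 11))`. A plain shift-register chain also produces windows that wrap around the end of a row, and a real design would have to discard or mask them. With n = 10 the shift-based conversion never differs from integer division by 3 by more than one gray level for 8-bit channels.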
Figure 3. (a) Input image, showing the desired neighborhood, the buffered pixels and the pixels not yet loaded by the camera. (b) Neighborhood loader architecture: the input pixel from the camera enters the shift registers, the oldest pixel is discarded, and the 3x3 neighborhood becomes available.

D. Rank Order Architecture
This architecture is a systolic array of comparators, shown in Fig. 4. Its inputs are the outputs of the neighborhood loader block, and each comparator block is synchronized by the camera clock. A 3x3 neighborhood is therefore sorted after 9 clock cycles. This arrangement fits a 9-stage pipeline, which keeps the overall implementation synchronized. Some optimized arrangements for determining the median are proposed in [8]. This work uses the arrangement of Fig. 4 because it provides both the maximum and the minimum values.

Figure 4. Systolic array for sorting 9 numbers.

E. Convolution/Correlation Architecture
For the same neighborhood, the only difference between the convolution and correlation operations is the mask. Equations (3) and (4) show that the convolution mask is the correlation mask rotated by 180 degrees, and vice versa. The same architecture can therefore calculate both operations. Figure 5 shows the implemented architecture, whose inputs are the outputs of the neighborhood loader block; a similar architecture for convolution is described in [16].
Fig. 5(b) shows the two basic operations, multiplication and addition, combined in a pipelined structure. The multiplication block performs 9 multiplications in parallel in one clock cycle. The addition block operates over the multiplication results and also produces its output in a single clock cycle. The architecture scales with the predefined neighborhood size and with the number of internal multipliers in the FPGA.

Figure 5. Architectures for (a) convolution/correlation: each pixel of the 3x3 neighborhood is multiplied by its mask coefficient K1 to K9, and the products are added to give the output pixel; (b) the convolution/correlation module, with multiplication and addition stages driven by the clock.

F. Binary Morphology Architectures
Fig. 6 shows the implementation of the erosion operation. The circuit of Fig. 6(a) represents (5), and Fig. 6(c) implements (6). The processing blocks are synchronized by the system clock (the camera clock).

Figure 6. Erosion architecture.

Fig. 7 depicts the implementation of the dilation operation: Fig. 7(a) implements (7) and Fig. 7(c) implements (8). The dilation architecture is also synchronized with the system clock, and each block executes in a single clock cycle.

Figure 7. Dilation architecture.

Erosion and dilation each take two clock cycles, and both are implemented as pipelines.

G. Edge Detection Architectures
This section shows two architectures that implement equations (11) and (12). These equations take as parameters the image gradient values in the x and y directions. Gx and Gy are independent of each other, so they can be calculated in parallel. Both architectures (see Fig. 8) are two-operation pipelines, and each operation is performed in a single clock cycle.

Figure 8. Edge detection architectures based on the gradient, panels (a) and (b). Both take Gx and Gy as inputs and are built from Absolute, Magnitude and Comparer blocks synchronized by the clock.
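The convolution/correlation datapath and the two magnitude approximations can be checked on a single neighborhood. The following is a minimal Python sketch under stated assumptions, not the authors' HDL. The Sobel masks used to obtain Gx and Gy are an assumption, since the text does not say which derivative masks feed the edge detector. The function names are hypothetical.

```python
from typing import Sequence, Tuple

Mask = Tuple[Tuple[int, int, int], ...]

# Assumed derivative masks (Sobel); the paper does not name the masks used.
SOBEL_X: Mask = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y: Mask = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def correlate3x3(window: Sequence[Sequence[int]], mask: Mask) -> int:
    """(4) on one neighborhood: 9 products (parallel multipliers) and one sum."""
    return sum(mask[r][c] * window[r][c] for r in range(3) for c in range(3))


def rotate180(mask: Mask) -> Mask:
    return tuple(tuple(row[::-1]) for row in mask[::-1])


def convolve3x3(window: Sequence[Sequence[int]], mask: Mask) -> int:
    """(3): the same multiply-add datapath, fed with the mask rotated 180 degrees."""
    return correlate3x3(window, rotate180(mask))


def magnitude_sum(gx: int, gy: int) -> int:
    """(11): |Gx| + |Gy|."""
    return abs(gx) + abs(gy)


def magnitude_max(gx: int, gy: int) -> int:
    """(12): max(|Gx|, |Gy|), which needs only a comparator after the absolute values."""
    return max(abs(gx), abs(gy))


if __name__ == "__main__":
    edge = ((0, 0, 255), (0, 0, 255), (0, 0, 255))   # vertical step edge
    gx, gy = correlate3x3(edge, SOBEL_X), correlate3x3(edge, SOBEL_Y)
    print(gx, gy, magnitude_sum(gx, gy), magnitude_max(gx, gy))
```

For any gradient, (12) never exceeds the exact magnitude (9) and (11) never falls below it. For Gx = 3 and Gy = -4, for instance, they give 4 and 7 around the exact value 5. This is why the comparator version and the adder-free absolute-value sum are both cheap but different approximations.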
V. RESULTS
All processing blocks were implemented and tested on the platform shown in Fig. 9. It comprises (a) a DE2 development kit, (b) a D5M CMOS camera and (c) an LTM LCD. The DE2 kit has a low-cost FPGA (Altera Cyclone II EP2C35F672C6).
The camera can be configured to acquire up to 5 megapixels (real resolution), and the LCD has a resolution of 800x480 RGB pixels (24 bits). The overall system was developed in VHDL and Verilog using the Altera Quartus II software. Table II shows the synthesis results for each processing architecture.

TABLE II. SYNTHESIS RESULTS
Architecture | LC | Memory Bits | DSP 9x9 | Freq. (MHz) | Initial delay (cycles)
Entire FPGA | 33,216 | 483,840 | 70 | 440 | N/A
Color Conversion | 94 | | | 420.17 |
Thresholding | 11 | | | 154.34 |
Neighborhood Loader | 1201 | 12256 | | 224.32 | 1603
Rank Order | 3710 | | | 165.48 |
Convolution/Correlation | 469 | | 36 | 283.13 |
Erosion | 22 | | | 420.17 |
Dilation | 24 | | | 420.17 |
Edge Detection (Fig. 8a) | 26 | | | 93.65 |
Edge Detection (Fig. 8b) | 34 | | | 420.17 |
Chain for comparison | 7471 | 138680 | 32 | 49.14 | 8045
LC: logic cells. DSP 9x9: embedded 9x9 multipliers.

Figure 9. Development platform used in this work.

To compare the results, a complete image processing chain composed of all the algorithms described in this paper was implemented, as shown in Fig. 10. The Robinson operators are a set of 8 convolution masks for calculating the gradient. The eight masks operate in parallel over the same neighborhood, giving eight simultaneous convolution operations (see Fig. 11).

Figure 10. Complete chain for comparison. The camera and the image acquisition architecture feed the blocks inside the FPGA: color conversion, neighborhood loader, median filter (rank order), neighborhood loader, Robinson operators (convolutions), maximum value (rank order), threshold, neighborhood loader, erosion, neighborhood loader, dilation, neighborhood loader, erosion, and the image display architecture, which drives the LCD.

After the 8 convolutions are calculated, the highest value is selected and the others are discarded. This is done with the Rank Order module, which can sort up to 10 entries. Fig. 12 shows a real test of capture, processing and visualization with the chain of Fig. 10, implemented on the platform of Fig. 9.
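The Robinson stage of the comparison chain (eight parallel convolutions followed by a rank order selection of the maximum) can be sketched as follows. This is a minimal Python model under stated assumptions. The paper does not print its masks, so the base kernel below is one commonly used Robinson compass kernel, and the other seven are obtained by 45-degree rotations. The systolic sorter of Fig. 4 is modeled as an odd-even transposition sort. That model matches the statement that 9 values are sorted in 9 stages, but it is an assumption about the actual arrangement. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

Mask = Tuple[Tuple[int, int, int], ...]

# Assumed base kernel; the paper does not list its Robinson masks.
ROBINSON_BASE: Mask = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))

# Border positions of a 3x3 mask, listed clockwise from the top-left corner.
RING = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]


def rotate45(mask: Mask) -> Mask:
    """Rotate a 3x3 compass mask by 45 degrees: every border value moves one
    position clockwise, and the center value stays in place."""
    out = [list(row) for row in mask]
    values = [mask[r][c] for r, c in RING]
    shifted = values[-1:] + values[:-1]
    for (r, c), v in zip(RING, shifted):
        out[r][c] = v
    return tuple(tuple(row) for row in out)


def robinson_masks() -> List[Mask]:
    """The eight compass masks, obtained by successive 45-degree rotations."""
    masks = [ROBINSON_BASE]
    for _ in range(7):
        masks.append(rotate45(masks[-1]))
    return masks


def correlate3x3(window: Sequence[Sequence[int]], mask: Mask) -> int:
    return sum(mask[r][c] * window[r][c] for r in range(3) for c in range(3))


def odd_even_transposition_sort(values: Sequence[int]) -> List[int]:
    """Software model of a systolic sorter: n stages of compare-exchange cells,
    alternating between even and odd pairs, sort n values in ascending order."""
    v = list(values)
    n = len(v)
    for stage in range(n):
        for i in range(stage % 2, n - 1, 2):
            if v[i] > v[i + 1]:
                v[i], v[i + 1] = v[i + 1], v[i]
    return v


def robinson_gradient(window: Sequence[Sequence[int]]) -> int:
    """Eight parallel mask responses, then keep the largest (rank order maximum)."""
    responses = [correlate3x3(window, m) for m in robinson_masks()]
    return odd_even_transposition_sort(responses)[-1]
```

Four 45-degree rotations equal a 180-degree rotation, so the set of eight masks contains each mask together with its rotated version. Taking the maximum over correlations or over convolutions therefore gives the same result. On a vertical step edge `((0, 0, 255),) * 3` the sketch returns 1020, and on a flat window it returns 0.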
Figure 11. Robinson operators for gradient determination.

To evaluate the performance of the system, the same processing chain was implemented in C and run on a standard PC (Pentium IV, 2.2 GHz, 2 GB RAM) with a real-time operating system (xPC Target OS from MathWorks). The throughput of this software implementation was 1.129 µs/pixel (885.74 KPixel/s).
The hardware processing chain requires an initial delay to fill the pipeline. Considering the initial delays (see Table II) and the processing chain (see Fig. 10), the total initial delay is 8045 clock cycles. The operating frequency of the complete architecture was estimated at 49.14 MHz. At one pixel per clock cycle, this implies a throughput of 49.14 MPixel/s. The hardware architecture is therefore about 55 times faster than the same implementation in software (49.14 MPixel/s / 885.74 KPixel/s ≈ 55.5).
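The figures reported above can be cross-checked with a few lines of arithmetic. The following Python sketch uses only the numbers given in the text. The wall-clock pipeline fill time is a derived quantity that the paper does not report, and the function names are hypothetical.

```python
def software_throughput_mpixel_s(us_per_pixel: float) -> float:
    """Pixels per microsecond equal megapixels per second."""
    return 1.0 / us_per_pixel


def speedup(hw_mpixel_s: float, sw_us_per_pixel: float) -> float:
    """Ratio of hardware to software throughput."""
    return hw_mpixel_s / software_throughput_mpixel_s(sw_us_per_pixel)


def pipeline_fill_us(delay_cycles: int, clock_mhz: float) -> float:
    """Initial delay in microseconds: cycles divided by cycles per microsecond."""
    return delay_cycles / clock_mhz


if __name__ == "__main__":
    # Values from Section V: 1.129 us/pixel in software, 49.14 MPixel/s in
    # hardware (one pixel per clock at 49.14 MHz), 8045 cycles of initial delay.
    print(round(software_throughput_mpixel_s(1.129) * 1000, 2))  # 885.74 (KPixel/s)
    print(round(speedup(49.14, 1.129), 1))                       # 55.5
    print(round(pipeline_fill_us(8045, 49.14), 1))               # 163.7 (µs)
```

At 49.14 MHz, the 8045-cycle initial delay amounts to roughly 164 µs. This is paid once when the pipeline starts, after which one processed pixel is produced per clock cycle.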
Figure 12. Sequence of images processed with the implemented system (Fig. 9) and the processing chain of Fig. 10: (a) original, (b) grayscale, (c) median, (d) Kirsch, (e) threshold, (f) erosion, (g) dilation, (h) erosion.
VI. CONCLUSIONS
This work presented an integrated system for real-time image processing embedded on an FPGA. The implemented image processing algorithms are based on point and neighborhood operations. Such algorithms appear in many applications because they are basic to image pre-processing. The use of a low-cost platform made it possible to process and display images in real time. By exploiting both the intrinsic parallelism of the algorithms and a pipelined implementation, the developed architectures proved suitable for embedded systems with real-time requirements. Tests and comparisons against a common hardware platform (a standard PC) showed a speed-up factor of 55 with respect to the software implementation. Additionally, the performance results showed the flexibility and feasibility of using FPGAs as an alternative to common processor architectures in embedded systems. Finally, the developed architectures are independent of the camera capture rate and can be easily adapted to other FPGA devices.
As future work, a power analysis is proposed to verify the energy consumption of the architectures, and a library of parameterizable modules will be developed based on this work. This work also serves as a basis for implementing more complex algorithms, such as feature extraction and pattern recognition.
REFERENCES
[1] G. Saldaña-González, M. Arias-Estrada, FPGA based acceleration for image processing applications, Image Processing, InTech, 2009.
[2] Helio Pedrini, W.R. Schwarz, Análise de Imagens Digitais: Princípios, Algoritmos e Aplicações. Thomson, 2008.
[3] J. Becker, R. Hartenstein, Configware and morphware going mainstream, Journal of Systems Architecture, 2003.
[4] J. Hongtu, Design issues in VLSI implementation of image processing hardware accelerators, Doctoral Thesis, Lund University, Sweden, 2007.
[5] J.Y. Mori, D. Muñoz, J. Arias-Garcia, C.H. Llanos, J.M.S.T. Motta, FPGA-based image processing for omnidirectional vision on mobile robots, 24th Symposium on Integrated Circuits and Systems Design, in press.
[6] J.Y. Mori, C. Sanchez-Ferreira, D. Muñoz, C.H. Llanos, P. Berger, An unified approach for convolution-based image filtering on reconfigurable systems, Proceedings of the VII Southern Conference on Programmable Logic, pp. 63-68, 13-15 April 2011.
[7] K. Shimizu, S. Hirai, CMOS+FPGA vision system for visual feedback of mechanical systems, Proceedings of the IEEE International Conference on Robotics and Automation, May 2006.
[8] M. A. Vega-Rodríguez, A. Gómez-Iglesias, J.A. Gómez-Pulido, J.M. Sánchez-Pérez, Reconfigurable computing system for image processing via the internet, Microprocessors and Microsystems, vol. 31, pp. 498-515, Elsevier, 2006.
[9] Maya B. Gokhale, P.S. Graham, Reconfigurable Computing: Accelerating Computation with Field-Programmable Gate Arrays. Springer, 2005.
[10] N.K. Ratha, A.K. Jain, D.T. Rover, Convolution on Splash 2, Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines, 1995.
[11] N. Kehtarnavaz, M. Gamadia, Real-Time Image and Video Processing: From Research to Reality, Morgan & Claypool, 2006.
[12] R.C. Gonzalez, R.E. Woods, Processamento de Imagens Digitais. Edgard Blücher, 2000.
[13] R. Porter, Evolution of FPGAs for Feature Extraction, PhD Thesis, Queensland University of Technology, 2001.
[14] S. Hadjitheophanous, C. Ttofis, A.S. Georghiades, T. Theocarides, Towards hardware stereoscopic 3D reconstruction: a real-time FPGA computation of disparity map, Design, Automation & Test in Europe, pp. 1743-1748, 8-12 March 2010.
[15] S. Wong, M. Jasiunas, D. Kearney, Fast 2D convolution using reconfigurable computing, The Eighth International Symposium on Signal Processing and its Applications, 2005.
[16] Thomas Bräunl, Parallel Image Processing. Springer, 2001.
[17] U. Meyer-Baese, Digital Signal Processing with Field-Programmable Gate Arrays. Springer, 2007.
ACKNOWLEDGMENT
The authors would like to thank the CAPES Foundation for the financial support of this work. Special thanks go to Altera Corp. for providing Quartus II licenses and to DHW Engenharia e Representação Ltda. for the partnership.
Design of Fuzzy Controllers Using Genetic Algorithms
M.Sc. Hernando González Acevedo
Grupo de Investigación de Control & Mecatrónica (Control & Mechatronics Research Group) - UNAB
hgonzalez7@unab.edu.co
Abstract: This article focuses on developing a methodology for automatically defining the membership functions of a fuzzy controller using genetic algorithms, given a set of rules established for the controller. The methodology was validated on two processes with different dynamics: a flow regulation system and a speed control system for a hydrostatic transmission. The mathematical model of each process was first identified using Eyeball methods and parametric methods, respectively. A SCADA system was developed in LabVIEW to monitor and control the variables.
Index Terms: Fuzzy controllers, genetic algorithms, Eyeball methods, parametric methods.
I. INTRODUCTION
Generally, a controller is designed to meet a set of specifications that often conflict. For example, making a controller robust to model variations limits the performance that can be achieved, and limiting the control actions also limits the achievable dynamic performance. Controller design can therefore be seen as the search for the best compromise among all the required specifications. For this reason, research has been devoted to computational tools that solve these problems as simply and efficiently as possible. These software tools must tackle problems of greater or lesser complexity: multivariable processes with actuator saturation, time-varying processes, and processes subject to various faults, among others.
The term CACSD (Computer Aided Control System Design) refers in general to a technology that includes a wide variety of computational tools and platforms for the design of control systems. This technology is closely linked to disciplines such as control theory, mathematical modeling, numerical methods, symbolic computation, optimization, data processing, software engineering, and human-machine interfaces. In 1984 the IEEE (Institute of Electrical and Electronics Engineers) published a special issue on CACSD that stressed its great potential for evolution [1]. In particular, that publication discussed aspects related to architecture, integration with various platforms, languages, and optimization algorithms, many of which remain relevant today. Every year, this community has organized academic events to present scientific developments in this area.
Genetic algorithms are frequently used in CACSD software because they can search hypothesis spaces containing complex interactions among their parts, where the impact of each part on the evaluation function is difficult to specify. Although they do not guarantee finding the strictly optimal solution, they generally find good solutions. Examples of projects that tune fuzzy controllers with genetic algorithms can be found in [2], [3], [4], and [5].
II. FLOW REGULATION SYSTEM
Among its laboratories, the Universidad Autónoma de Bucaramanga has a pilot plant in which the dynamic behavior of different variables can be analyzed. One of them is a flow regulation system made up of a proportional valve, reference DM2453, driven by a 4 to 20 mA signal, and a flow sensor, reference DRG-1L35 (encoder), that delivers a pulse signal. This signal is converted to gallons per minute using Eq. 1, where N is the number of pulses in a one-second sampling period and Q is the flow. Fig. 1 shows the location of the instrumentation that is part of the system.
Q = 8750 * N 35   (Eq. 1)
H. González holds a Master's degree (Magíster) in Electronic Engineering from the Universidad Industrial de Santander. He is an Associate Professor at the Universidad Autónoma de Bucaramanga and a member of the Control & Mechatronics research group, attached to the Mechatronics Engineering program.
Fig. 1. Flow control system.
A. System identification
Identification deals with the problem of building dynamic mathematical models from data obtained from the process itself. Eyeball methods obtain models from measurements, which are generally graphical; from these curves, some parameters that determine the process dynamics can be found [6], for example the dead time, the time constant, the damping coefficient, the resonance frequency, and the cutoff frequency. These models are not very accurate, but they give an approximation of the process characteristics for systems with long settling times.
To identify the system, a step signal u(t) with an amplitude of 10 mA was applied to the process input and the flow signal y(t) was recorded (Fig. 2). The transient response is approximated by a second-order function (Eq. 2), which is discretized with a sampling period of one second (Eq. 3).
$$G(s) = \frac{0.05266\, e^{-13s}}{s^{2} + 0.3568\,s + 0.05917} \qquad (2)$$
$$G(z) = \frac{0.02335\,z + 0.02073}{z^{13}\left(z^{2} - 1.65\,z + 0.6999\right)} \qquad (3)$$
Fig. 2. Eyeball method: step response.
III. SPEED REGULATION SYSTEM OF A HYDROSTATIC TRANSMISSION (HT)
The hydrostatic transmission consists of a Rexroth A4VG TN-28 variable-displacement axial piston hydraulic pump with EP-2 electric control, and a Rexroth A2TM TN-10 fixed-displacement bent-axis axial piston hydraulic motor. To change the transmission speed from a 0 to 5 V signal, a power stage was developed that proportionally varies the current in a coil, which in turn changes the tilt angle of a plate, displacing the pistons and increasing or decreasing the pump output flow. The motor rotation speed is measured with an OMRON E2E2-X5MB inductive proximity sensor and displayed on an OMRON K3MA-F unit. Fig. 3 shows a photograph of the HT with all of its components.
Fig. 3. Hydrostatic transmission.
A. System identification
For a linear time-invariant system, a complete model is given by Eq. 4, where u(k) is the signal applied to the process, y(k) the dynamic response of the variable, and e(k) the vector of residuals.
$$y(k) = G(q^{-1})\,u(k) + H(q^{-1})\,e(k) \qquad (4)$$
The simplest way to parameterize G(q^{-1}) and H(q^{-1}) is to use rational functions, in which the numerator and denominator are polynomials whose coefficients are the parameters [6].
To obtain the mathematical model of the hydrostatic transmission, a PRBS signal with an amplitude between zero and four volts was applied, and the speed was recorded (Fig. 4) with a sampling period of 125 ms. Using the Matlab Ident toolbox, four parametric models were evaluated: ARX, BJ, ARMAX, and OE. For each one, the step response, the autocorrelation of the residuals, and the cross-correlation between the residuals and the input signal were analyzed. The model that best fits the system dynamics is the BJ structure, given by Eq. 5.
$$G(z) = \frac{89.88\,z + 82.14}{z^{2} - 1.111\,z + 0.4077} \qquad (5)$$
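As a consistency check, not part of the original procedure, the sketch below maps the continuous poles of Eq. 2 through z = e^{sT} with T = 1 s, which should reproduce the denominator of Eq. 3, and simulates the unit-step responses of Eqs. 3 and 5 as difference equations. The function names are illustrative only.

```python
import cmath


def pole_map(a1, a0, T):
    """Map the roots of s^2 + a1*s + a0 through z = exp(s*T).

    Returns (c1, c0) such that the discrete denominator is z^2 + c1*z + c0.
    """
    root = cmath.sqrt(a1 * a1 - 4 * a0)
    z1 = cmath.exp((-a1 + root) / 2 * T)
    z2 = cmath.exp((-a1 - root) / 2 * T)
    return (-(z1 + z2)).real, (z1 * z2).real


def step_response(b0, b1, a1, a2, delay, n_steps):
    """Unit-step response of (b0*z + b1) * z**-delay / (z**2 + a1*z + a2), i.e.
    y[n] = -a1*y[n-1] - a2*y[n-2] + b0*u[n-1-delay] + b1*u[n-2-delay]."""
    def u(k):
        return 1.0 if k >= 0 else 0.0

    y = []
    for n in range(n_steps):
        y1 = y[n - 1] if n >= 1 else 0.0
        y2 = y[n - 2] if n >= 2 else 0.0
        y.append(-a1 * y1 - a2 * y2 + b0 * u(n - 1 - delay) + b1 * u(n - 2 - delay))
    return y


def dc_gain(b0, b1, a1, a2):
    """Steady-state gain G(z=1) of (b0*z + b1) / (z^2 + a1*z + a2)."""
    return (b0 + b1) / (1 + a1 + a2)


# Flow model: Eq. 2 -> Eq. 3 (T = 1 s, extra 13-sample dead time)
c1, c0 = pole_map(0.3568, 0.05917, 1.0)
flow = step_response(0.02335, 0.02073, -1.65, 0.6999, delay=13, n_steps=300)

# Hydrostatic transmission: Eq. 5 (T = 125 ms, no extra dead time)
ht = step_response(89.88, 82.14, -1.111, 0.4077, delay=0, n_steps=100)

print(f"mapped denominator: z^2 + ({c1:.4f}) z + {c0:.4f}")
print(f"flow: final {flow[-1]:.4f} vs DC gain {dc_gain(0.02335, 0.02073, -1.65, 0.6999):.4f}")
print(f"HT: final {ht[-1]:.2f} vs DC gain {dc_gain(89.88, 82.14, -1.111, 0.4077):.2f}")
```

The flow response stays at zero until n = 14 (13 samples of dead time plus the one-sample lag of the discretized rational part) and settles near 0.883; the HT model settles near 580.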
IV. FUZZY CONTROLLER
A fuzzy logic controller (FLC) is a device capable of interpreting field signals and taking a consequent control action according to the information stored in its fuzzy rule base. Fig. 5 shows a schematic diagram of a fuzzy-logic-based controller. The core of an FLC is a processing unit in charge of making the decisions needed to correct the value of the variable of interest.
The fuzzy inference engine has two interfaces with the environment: the fuzzification module and the defuzzification module. Sensor-transmitters convert measurable physical quantities into electrical signals, which are sent to the fuzzification module, which converts them into fuzzy values. These fuzzy values then go to the fuzzy inference engine, which applies the corresponding fuzzy implication rules and thus generates an output fuzzy set that gathers the decisions corresponding to the antecedents delivered by the fuzzification module. The fuzzy decision is then sent to the defuzzification module, which converts it into a crisp decision that acts on the final control element as a control signal, manipulating the process variable so as to correct the present error.
A. Control scheme
Once the mathematical model of the system has been determined, using Eyeball methods (flow system) or parametric methods (hydrostatic transmission), the block diagram of Fig. 6 is built to evaluate the dynamic response to a step input. The mathematical models described above have no integrator; therefore, for each process, the output of the fuzzy controller is the rate of change of the control action, which is then integrated so that the loop has the integral action needed to remove the steady-state error. The limiter corresponds to the maximum control action that each actuator can reach. The input variables of the fuzzy controller are the error signal and the rate of change of the error.
Fig. 6. Fuzzy controller.
The rules of the fuzzy controller for the flow regulation system and for the hydrostatic transmission speed are given in Table I, which shows that the control action increases or decreases depending on whether the controlled variable approaches or moves away from the reference level.
TABLE I
RULE BASE. FLOW CONTROL SYSTEM
Rate of change \ Error | NEGATIVE | ZERO | POSITIVE
NEGATIVE | NEGATIVE | ZERO | POSITIVE
ZERO | NEGATIVE | ZERO | POSITIVE
POSITIVE | NEGATIVE | ZERO | POSITIVE
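The scheme of Fig. 6 can be sketched in Python as an incremental fuzzy controller followed by an integrator and a limiter. This is a hypothetical illustration, not the author's Matlab code: the outer feet of the NEGATIVE and POSITIVE sets, min as the AND operator, height (weighted-average) defuzzification, and the PI-like rule base in RULES are all assumptions, and RULES is not necessarily identical to Table I. With height defuzzification only the output peaks matter, so the output's ZERO-set width does not appear among the parameters.

```python
def tri(x, a, b, c):
    """Triangular membership: 0 at feet a and c, 1 at peak b (a < b < c)."""
    return max(0.0, min((x - a) / (b - a), (c - x) / (c - b)))


def fuzzify(x, p, q):
    """Degrees of NEGATIVE/ZERO/POSITIVE. p: |peak| of N and P, q: half-width of Z.
    The input is clamped to [-p, p] so that N and P act as saturating shoulders."""
    x = max(-p, min(p, x))
    return {"N": tri(x, -2 * p, -p, 0.0),
            "Z": tri(x, -q, 0.0, q),
            "P": tri(x, 0.0, p, 2 * p)}


# (error, rate of change of error) -> change of control action; illustrative only
RULES = {
    ("N", "N"): "N", ("N", "Z"): "N", ("N", "P"): "Z",
    ("Z", "N"): "N", ("Z", "Z"): "Z", ("Z", "P"): "P",
    ("P", "N"): "Z", ("P", "Z"): "P", ("P", "P"): "P",
}


def delta_u(e, de, params):
    """Fuzzy output = rate of change of the control action.
    params = (p_e, q_e, p_de, q_de, p_u); the output peaks are -p_u, 0 and +p_u."""
    p_e, q_e, p_de, q_de, p_u = params
    mu_e, mu_de = fuzzify(e, p_e, q_e), fuzzify(de, p_de, q_de)
    peak = {"N": -p_u, "Z": 0.0, "P": p_u}
    num = den = 0.0
    for (label_e, label_de), out in RULES.items():
        w = min(mu_e[label_e], mu_de[label_de])   # firing strength (AND = min)
        num += w * peak[out]
        den += w
    return num / den if den > 0 else 0.0


def control_step(u_prev, e, de, params, u_min, u_max):
    """Integrate the fuzzy output and apply the actuator limiter of Fig. 6."""
    return max(u_min, min(u_max, u_prev + delta_u(e, de, params)))
```

For example, with params = (1.0, 1.0, 1.0, 1.0, 0.5), a large positive error and a zero rate of change fire only the rule (P, Z) and the controller asks for the full increment of +0.5; the limiter then caps the integrated action at u_max.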
V. GENETIC ALGORITHMS
Genetic algorithms are inspired by biological evolution and its genetic-molecular basis. These algorithms evolve a population of individuals by subjecting it to random actions similar to those that occur in biological evolution (mutations and genetic recombination), as well as to selection according to some criterion. This criterion decides which individuals are the fittest and survive, and which are the least fit and are discarded. Nowadays, one application of genetic algorithms is the design of industrial controllers: the parameters of each controller are evaluated in a simulated environment where, depending on their fitness, they become extinct or survive to the next generation.
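The evolutionary loop just described can be sketched as a generic real-valued genetic algorithm whose defaults mirror the parameter names of Table II (Npop, Niter, Xrate, mutrate). The paper lists only these parameters, so the selection rule (keep the best Xrate fraction), the blending crossover, the uniform-reset mutation, and the elitism used here are assumptions, and real_coded_ga is a hypothetical name.

```python
import random


def real_coded_ga(cost, n_var, lo, hi, n_pop=150, n_iter=80, x_rate=0.7, mut_rate=0.1, seed=0):
    """Minimize cost(x) over x in [lo, hi]^n_var with a simple real-valued GA.

    x_rate:   fraction of the population that survives each generation (Xrate).
    mut_rate: fraction of genes replaced at random each generation (mutrate).
    """
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(n_var)] for _ in range(n_pop)]
    n_keep = max(2, int(x_rate * n_pop))
    for _ in range(n_iter):
        pop.sort(key=cost)                       # lower cost = fitter individual
        survivors = pop[:n_keep]                 # selection: the least fit are discarded
        children = []
        while n_keep + len(children) < n_pop:
            mom, dad = rng.sample(survivors, 2)
            beta = rng.random()                  # blending crossover
            children.append([beta * m + (1 - beta) * d for m, d in zip(mom, dad)])
        pop = survivors + children
        n_mut = int(mut_rate * (n_pop - 1) * n_var)
        for _ in range(n_mut):                   # mutation; index 0 (the elite) is spared
            i, j = rng.randrange(1, n_pop), rng.randrange(n_var)
            pop[i][j] = rng.uniform(lo, hi)
    return min(pop, key=cost)


if __name__ == "__main__":
    def sphere(x):
        return sum(v * v for v in x)

    print(real_coded_ga(sphere, n_var=6, lo=-5.0, hi=5.0))
```

In the controller-design setting, cost would simulate the closed loop for a candidate set of membership-function parameters and return one of the error indices.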
Fig. 4. Parametric method: response to a PRBS input.
A. Error indices
The behavior of the dynamic error in a control system is commonly used as a design criterion when tuning controllers. Because this error varies over time, it is evaluated through a performance criterion or index.
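In a simulation the error is only available at the sampling instants, so the integrals of Eqs. 6 to 9 are usually approximated by sums. A minimal sketch using the rectangle rule, with t = kT for sample k; the function name and the choice of quadrature are assumptions.

```python
def error_indices(e, T):
    """Rectangle-rule approximations of ISE, IAE, ITAE and ITSE for error samples e[k]
    taken every T seconds (sample k is taken at time t = k*T)."""
    ise = T * sum(ek ** 2 for ek in e)
    iae = T * sum(abs(ek) for ek in e)
    itae = T * sum((k * T) * abs(ek) for k, ek in enumerate(e))
    itse = T * sum((k * T) * ek ** 2 for k, ek in enumerate(e))
    return {"ISE": ise, "IAE": iae, "ITAE": itae, "ITSE": itse}
```

An error of the same size counts equally in IAE whether it occurs early or late, whereas ITAE and ITSE weight it by the time at which it occurs, which is why they penalize slowly decaying errors.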
Fig. 5. Schematic diagram of a fuzzy-logic-based controller.
ISE (Integral Square Error). This criterion gives more weight to large errors and little weight to small errors. It is easily computed both analytically and experimentally, but it is not very selective, since parameter variations do not significantly affect the final value of the integral.
$$ISE = \int e(t)^{2}\,dt \qquad (6)$$
IAE (Integral of the Absolute value of the Error). This criterion provides acceptable damping and response at the output of the control loop; however, it cannot optimize highly overdamped or highly underdamped systems.
$$IAE = \int \lvert e(t)\rvert\,dt \qquad (7)$$
ITAE (Integral of Time multiplied by Absolute Error). This criterion yields transient responses with small overshoot and well-damped oscillations. Because the absolute error is multiplied by time, the large errors that occur early are multiplied by small times, so the large errors at the beginning of the overshoot receive little weight and the small errors that persist later receive large weight.
$$ITAE = \int t\,\lvert e(t)\rvert\,dt \qquad (8)$$
ITSE (Integral of Time multiplied by Square Error). Compared with ITAE, this criterion gives little weight to the initial errors but penalizes errors that persist some time after the input is applied to the system. As with ISE, quickly reducing large errors may produce a system with poor relative stability, due to increased oscillations.
$$ITSE = \int t\,e(t)^{2}\,dt \qquad (9)$$
B. Design of fuzzy controllers
To design a fuzzy controller using genetic algorithms, a Matlab program was developed that works with real numbers instead of binary numbers [7]. The program includes the following aspects:
Mathematical model of the system. The dynamics of each process are represented as a difference equation (Eqs. 10 and 11), where y[n] is the controlled variable and u[n] the control action.
Flow control system:
$$y[n] = 1.65\,y[n-1] - 0.6999\,y[n-2] + 0.02335\,u[n-14] + 0.02073\,u[n-15] \qquad (10)$$
Hydrostatic transmission:
$$y[n] = 1.111\,y[n-1] - 0.4077\,y[n-2] + 89.88\,u[n-1] + 82.14\,u[n-2] \qquad (11)$$
Fuzzy controller. The rule base of the fuzzy controller is established (Table I), together with the shape of the membership functions for the input variables (error and rate of change of the error) and the output variable (rate of change of the control action). In this article, triangular functions were used; each is represented by three points a, b, and c, where a and c are the points at which the function equals zero and at point b the function takes the value 1.
Genetic algorithm. The genetic algorithm is configured with the parameters of Table II, where Nvar is the number of parameters to be identified by the genetic algorithm, Npop is the number of chromosomes created at the start of the algorithm, Niter is the maximum number of iterations, Xrate is the fraction of chromosomes that survive each iteration, and mutrate is the fraction of mutations performed by the algorithm.
TABLE II
GENETIC ALGORITHM CONFIGURATION PARAMETERS
PARAMETER | VALUE
Nvar | 6
Npop | 150
Niter | 80
Xrate | 0.7
mutrate | 0.1
The variables to be optimized by the genetic algorithm are point b of the membership function defined as negative and point a of the zero function. Symmetry about the vertical axis is assumed for each variable; therefore, each variable has two unknowns, and since three variables were defined, there are six elements to optimize in total.
The algorithm generates the individuals; each one contains the values associated with the membership functions of the linguistic variables of the process, from which a fuzzy controller is built. Next, the transient response of the process variable is evaluated for different reference levels (sp[n]), using Eq. 10 or 11 as appropriate and following the block diagram of Fig. 6. Any of the error indices IAE, ISE, ITAE, or ITSE can be used as the optimization function, computed from Eq. 6, 7, 8, or 9 according to the selected index. This procedure is repeated for each individual, producing a vector with the error index associated with each fuzzy controller structure. The genetic algorithm is then applied, and the evaluation of the transient response restarts with the new population.
C. Fuzzy control for the flow regulation system
The transient response of the control system was evaluated for each error index, and the best transient behavior (lowest overshoot and settling time) was achieved with the IAE index. Fig. 7 shows the simulated response of the variable for different reference levels and the time-domain behavior of the error signal, with a settling time of 150 seconds and an overshoot of 5%. The sets defined by the GA are shown in Fig. 8.
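The six-gene encoding described in Section V-B (point b of the NEGATIVE set and point a of the ZERO set for each of the three variables, with symmetry about zero) can be decoded into triangles as sketched below. The text does not give the outer feet of the NEGATIVE and POSITIVE sets, so placing them at -2p/0 and 0/2p is an assumption, as are the gene order and the function name.

```python
def decode(genes):
    """genes: [p_e, q_e, p_de, q_de, p_u, q_u], all positive.

    For each variable, p = |b| of the NEGATIVE set and q = |a| of the ZERO set.
    Returns {variable: {label: (a, b, c)}} with POSITIVE mirroring NEGATIVE about zero.
    """
    names = ("error", "rate_of_change", "output")
    sets = {}
    for i, name in enumerate(names):
        p, q = genes[2 * i], genes[2 * i + 1]
        sets[name] = {
            "N": (-2 * p, -p, 0.0),   # assumption: feet at -2p and 0
            "Z": (-q, 0.0, q),        # symmetric about zero
            "P": (0.0, p, 2 * p),     # mirror image of N
        }
    return sets
```

Encoding only two numbers per variable and deriving the rest by symmetry keeps the search space at six dimensions (Nvar = 6), which is what makes a population of 150 and 80 iterations workable.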
Fig. 7. Dynamic response. a) Flow signal. b) Error signal.
Fig. 10. Fuzzy sets for the HT. a) Error signal. b) Rate of change of the error. c) Controller output.
VI. SCADA SYSTEM
To evaluate each control system, a SCADA system was developed in LabVIEW following the block diagram of Fig. 6. Fig. 11 shows the transient response of the flow signal for different reference levels: there is no overshoot at any operating point, and the average settling time is 120 seconds. The control action generated by the fuzzy controller, which is subsequently integrated, behaves smoothly, which is appropriate because the valve responds slowly. If the control action were not smooth, the flow signal would oscillate because of the system's inherent 13-second delay.
Fig. 8. Fuzzy sets for the flow system. a) Error signal. b) Rate of change of the error. c) Controller output.
D. Fuzzy control for the speed regulation system
As in the previous case, the transient response was evaluated for each index, and the best response was obtained with the ITSE index. Fig. 9 shows the transient response of the process variable and the error signal, and Fig. 10 shows the fuzzy sets defined by the GA. The settling time was 3 seconds, with no overshoot in the transient response.
Fig. 11. Transient response of the flow signal.
Fig. 12 shows the transient response of the hydrostatic transmission speed. The oscillation in the signal is due to the encoder reading error: a difference of one pulse corresponds to a speed change of 120 rpm. The settling time is 3 seconds, which agrees with the simulation results.
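The 120 rpm step follows from counting pulses over a fixed window: one extra pulse in a window of T seconds corresponds to 60 / (PPR * T) rpm, where PPR is the number of pulses per revolution. The text does not state the counting window or the PPR; if the speed is counted over the 125 ms sampling period used for identification (an assumption), a 120 rpm step implies 4 pulses per revolution. The function names are hypothetical.

```python
def rpm_per_pulse(pulses_per_rev, window_s):
    """Speed change represented by one pulse more or less in a counting window."""
    return 60.0 / (pulses_per_rev * window_s)


def implied_pulses_per_rev(resolution_rpm, window_s):
    """Pulses per revolution consistent with a given rpm-per-pulse resolution."""
    return 60.0 / (resolution_rpm * window_s)


print(implied_pulses_per_rev(120.0, 0.125))   # PPR implied by the reported 120 rpm
print(rpm_per_pulse(4, 0.5))                  # same sensor counted over a 4x longer window
```

A longer counting window or more targets per revolution would reduce the quantization ripple, at the cost of measurement delay or hardware changes.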
Fig. 9. Dynamic response. a) Speed signal. b) Error signal.
Fig. 12. Transient response of the speed signal for the HT.
VII. CONCLUSIONS
System characterization methods (nonparametric and parametric) make it possible to identify simple mathematical models that are feasible to implement, enabling the offline design of controllers through genetic algorithms tuned to the process dynamics.
The error indices ISE, IAE, ITAE, and ITSE are good indicators for defining the optimization function in the design of fuzzy compensators, but none of them can be singled out as the best; it is therefore advisable to evaluate the response obtained with each one. In addition, because the fuzzy controller is nonlinear, it is advisable to apply several reference levels during its design: if tuning is performed with a single step input, there is no guarantee that the system will perform well at other reference levels.
BIBLIOGRAPHY
[1] Denham, M.J. Design issues for CACSD systems. Proceedings of the IEEE. 1984.
[2] Won-Seok Oh; Young-Tae Kim; Chang-Sun Kim; Tae-Seok Kwon; Hee-Jun Kim. Speed control of induction motor using genetic algorithm based fuzzy controller. Industrial Electronics Society, 1999.
[3] Bousserhane, I.K.; Hazzab, A.; Rahli, M.; Kamli, M.; Mazari, B. Adaptive PI controller using fuzzy system optimized by genetic algorithm for induction motor control. 10th IEEE International Power Electronics Congress. 2006.
[4] Hui Fang Wang; Chao Ying Liu; Xue Ling Song; Zhe Ying Song; Kai Li. Parameters self-adaptive fuzzy controller based on genetic algorithm. IEEE International Conference on Grey Systems and Intelligent Services (GSIS 2007). 2007.
[5] Samsudin, K.; Ahmad, F.A.; Mashohor, S.; Latif, N.M. Comparison of direct and incremental genetic algorithm for optimization of ordina Fuzzy Controllers. Ninth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing (SNPD '08). 2008.
[6] Gutierrez Colmenares, Ana Isabel. Identificación. Universidad Autónoma de Bucaramanga, Especialización en Automatización Industrial. 1999.
[7] Haupt, R.; Haupt, S. Practical Genetic Algorithms. 2nd ed. New Jersey: John Wiley & Sons, Inc.; 2004. 253 p.
Combining Strategies to Handle Permanent Faults in the Interconnections of a Network-on-Chip
Anelise Kologeski1, Caroline Concatto2, Fernanda Lima Kastensmidt1, Luigi Carro2
1Graduate Program in Microelectronics (PGMICRO), 2Graduate Program in Computing (PPGC)
Porto Alegre, Brazil
{alkologeski, cconcatto, fglima, carro}@inf.ufrgs.br
Abstract: The use of fault-tolerant structures in circuits is growing, because it is almost impossible to produce integrated circuits without any defect in nanometer technologies. Our fault-tolerant proposal can guarantee the functionality of a network-on-chip with multiple defects in any interconnection, and with multiple defective interconnections, by appropriately combining adaptive routing, data splitting, and remapping. Results show that the impact on communication delay and energy can be minimal compared with the fault-free system, and comparisons with the Hamming code demonstrate the advantage of the strategy.
I. INTRODUCTION
Fault tolerance is crucial to allow circuits with some defects to still reach the market, increasing the yield and lifetime of a chip while guaranteeing the correct functionality of the device. Based on previous test and diagnosis results, a network-on-chip can use embedded fault-tolerant solutions that provide correct communication in the network with reduced impact.
Networks-on-chip are composed of routers, network interfaces (NI), and communication channels known as links, channels, or interconnections; they connect all the cores that need to communicate, providing a high degree of parallelism in communication. However, integrating many components in a single chip has raised several concerns for integrated-circuit designers regarding the reliability and yield of the chips produced, since several types of faults can occur, as illustrated in Figure 1. Faults can occur at several points: processing elements (cores), interconnections, routers, or network interfaces. Because of the large number of interconnections, there is a high probability that faults occur in them, and they tend to occur close to each other [1]. Consequently, fault-tolerance techniques are being widely studied and embedded in circuits to guarantee a minimum yield and a certain reliability, and faults can be studied according to their type and/or location.
This work focuses on permanent and intermittent faults in the connections. Permanent faults can be modeled, for example, as short circuits between wires of the same channel or of different channels, and intermittent faults are crosstalk faults following the MAF model [2]. However, techniques that tolerate multiple faults can bring a high cost in area, performance, and power to networks-on-chip in general. Adaptive techniques can reduce these costs, since only faulty areas are configured to use the protection techniques. Thus, by using the best configuration of adaptive routing, data splitting, and remapping, it is possible to obtain better results than the traditional techniques used in the literature.
Figure 1. Example of faults in a network-on-chip (routers R1-R9, network interfaces NI1-NI9, cores 1-9; legend: router, router-router interconnection, network interface (NI), core, router-NI interconnection, fault situations).
Related work is presented next. The combination of the proposed techniques is described in Section III, Section IV discusses some results, and Section V concludes this work.
II. RELATED WORK
In [3] the authors state that the rate of defective wires and connections tends to be approximately 1 to 15%, and that faults tend to affect more than one interconnection per region, causing clustered faults [1]; this makes the need for protection against multiple faults evident.
Related work can be classified mainly by how the fault-tolerance techniques act to provide reliability. The techniques can be divided into dynamic and static, according to the type of diagnosis used. Dynamic techniques are always acting on the hardware to protect it, mainly against transient faults. In this case, diagnosis and detection occur at run time, and data correction may or may not be performed simultaneously. Examples of dynamic techniques are the Hamming code, parity, redundancy, and TMR. However, they require a high extra area cost, which directly affects power. Static techniques, on the other hand, require prior detection and diagnosis [4-6] to configure the fault-tolerance structures. The area required by static techniques is also significant, but the gains appear when the fault-tolerant hardware does not need to be fully used, avoiding unnecessary costs.
In [7], a combination of Hamming codes was used to protect the buffers, the router, and the interconnections. The authors protect the data against only a single fault in each interconnection, focusing on transient faults in the buffer and on crosstalk; cases with multiple faults cannot be handled by [7]. The reported results show a 32% frequency penalty in a 180 nm technology and more than 50% extra area, not counting the additional wires in the interconnections. In [8], for 130 nm, a technique was presented that applies a Hamming code to each half of the data, and retransmission can be used when Hamming is not sufficient to correct the faulty data. Even so, retransmission may not be enough to keep the data reliable when there are multiple faults. The main drawbacks of this method are the extra area and the excessive power consumption: the router becomes more than 3 times larger than the unprotected router, and latency increases almost 4 times. The proposal in [9] is similar to that of [8], the main difference being that the Hamming code is replaced by parity. However, [9] tolerates crosstalk and only a single permanent fault in each interconnection. If, for example, there are 2 permanent faults in an interconnection, each located in a different half of the wires, the parity bit will always be wrong, and even with retransmission it will not be possible to obtain error-free data. In [10], redundancy is used in some router components and in the interconnections to provide reliability, and a BIST was included in the implementation to provide fault diagnosis. However, for an interconnection that corresponds to around 5% of the total network area, connectivity was low, around 30% for up to 100 defective wires in an 8x8 network-on-chip.
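The limitation of single-error-correcting codes discussed above can be seen with a textbook Hamming(7,4) code. The sketch below is a generic illustration, not the encoder used in [7], [8], or [11]: it corrects any single flipped wire in a 7-bit codeword, but with two faulty wires the syndrome points to a third, healthy bit, so the decoder silently returns wrong data.

```python
def hamming74_encode(d):
    """d: 4 data bits -> 7-bit codeword in positions 1..7 = [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Correct at most one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3   # 1-based position of a single error, 0 if none
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


data = [1, 0, 1, 1]
code = hamming74_encode(data)
one_fault = code[:]
one_fault[5] ^= 1
two_faults = code[:]
two_faults[0] ^= 1
two_faults[1] ^= 1
print(hamming74_decode(one_fault) == data)   # single fault is corrected
print(hamming74_decode(two_faults))          # double fault is "corrected" into wrong data
```

This is why permanent, clustered faults call for strategies beyond a single error-correcting code per wire group, such as the routing, data-splitting, and remapping combination pursued in this work.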
In [11], a Hamming code is used and the interconnects are fully duplicated, allowing up to three faults to be tolerated and four faults to be detected. When two faults occur in each set, the implementation can only detect them, since Hamming can correct only a single fault per set of wires. Although the voltage on the wires is reduced, the interconnects were fully duplicated, which increases the probability of faults.
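The single-error limit mentioned for [11], which also applies to the Hamming baseline evaluated in Section IV, can be illustrated with a generic Hamming(12,8) code for an 8-bit word. This is a minimal textbook sketch, not the exact code of [11] or of the baseline used in this work; the function names are hypothetical.

```python
from __future__ import annotations

# Codeword positions 1..12: parity bits at the powers of two, data bits elsewhere.
PARITY_POSITIONS = [1, 2, 4, 8]
DATA_POSITIONS = [p for p in range(1, 13) if p & (p - 1)]  # 3, 5, 6, 7, 9, 10, 11, 12


def encode(byte: int) -> list[int]:
    """Encode 8 data bits into a Hamming(12,8) codeword (index 0 is unused)."""
    word = [0] * 13
    for i, pos in enumerate(DATA_POSITIONS):
        word[pos] = (byte >> i) & 1
    for p in PARITY_POSITIONS:
        # Parity bit p makes the group of positions whose index has bit p set even.
        word[p] = sum(word[q] for q in range(1, 13) if q & p) % 2
    return word


def decode(word: list[int]) -> int:
    """Correct at most one flipped bit and return the 8 data bits."""
    word = list(word)
    syndrome = 0
    for p in PARITY_POSITIONS:
        if sum(word[q] for q in range(1, 13) if q & p) % 2:
            syndrome |= p
    if 0 < syndrome <= 12:  # a single error makes the syndrome equal its position
        word[syndrome] ^= 1
    return sum(word[pos] << i for i, pos in enumerate(DATA_POSITIONS))


data = 0b10110010
codeword = encode(data)

single = list(codeword)
single[6] ^= 1                 # one faulty wire: corrected
assert decode(single) == data

double = list(codeword)
double[3] ^= 1
double[5] ^= 1                 # two faulty wires in the same set
print(decode(double) == data)  # False: the decoder "corrects" position 6 instead
```

With two errors the syndrome is the XOR of both positions, so the decoder flips a third bit: the double fault is not corrected, only (at best) detected.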
To overcome the limited fault coverage of dynamic techniques, [12] uses a static technique that allows partially faulty interconnects to be used, mainly to cope with high data traffic in the network. The capacity of the links to tolerate permanent faults can be divided into 25%, 50%, 75% and 100%, depending on the location of the faults. However, [12] assumes that the faults are always concentrated within the same group of wires, whereas our proposal can maintain communication with up to 50% of faulty wires located anywhere in the interconnect.
Other works in the literature use strategies based on routing functions to handle faulty interconnects and avoid faulty routers [13-14]. However, routing tables and virtual channels are needed to avoid deadlock in the network, and both options imply extra area and power. In turn, [15] and [16] combine adaptive routing and mapping to increase the reliability of networks-on-chip. Both works present a strategy that takes into account the application graph, the fault probability and the routing. However, neither proposal can handle faults between a router and a core: they either leave a router without a core or disable the core connected to the router when the link between them fails. In [4], a partially adaptive routing strategy to handle faulty interconnects was developed, with a minimal change to the XY routing path, by using a torus topology. Consequently, no virtual channels or tables are used, and the extra area is very small (less than 1%). For this reason, we chose the proposal of [4] to be combined with two other strategies to ensure network reliability: data division and task remapping.
III. DEVELOPED STRATEGIES
The technique proposed in this work combines the adaptive routing proposed in [4] with data division, which carries the data on half of the wires of an interconnect diagnosed as faulty, using only the fault-free wires that can still communicate correctly. This approach avoids additional wires in the interconnects and minimizes the amount of additional hardware in the critical communication path. Both strategies are also combined with task remapping, which can easily be applied to networks with homogeneous cores. In a very simple way, the communication weights can be computed from the traffic intensity on each channel, and cores with little communication can easily be mapped onto faulty regions to minimize the communication penalty.
The first technique applied when a fault occurs is adaptive routing, due to its simplicity and its low impact on communication time. Because the algorithm of [4] is only partially adaptive, there are cases in which no fault-free path can be found, especially in the presence of multiple faults. Figure 2 shows some of the situations in which the router stops being used: when the faults are in an interconnect between the router and its core, provided that this interconnect is the only means of communication between them; or when both inputs or both outputs in the same direction are faulty, preventing access to the router from that direction. For the cases in which adaptive routing cannot be used, the data are divided and sent over only half of the interconnect.
Figure 2. Fault cases that [4] cannot handle.

When no alternative path is possible through adaptive routing, data division must be used to avoid isolated areas in the network. Even when an interconnect is known to be faulty, it is unlikely that all of its wires are actually faulty, so the proposed technique allows only the fault-free wires of an interconnect to be used, always providing communication at half the original capacity of the fault-free interconnect.

The advantage of this proposal is that a single faulty wire does not make the whole interconnect unusable. Data that previously traveled in channels X bits wide are divided when they reach the faulty interconnect and from there travel to their final destination occupying only X/2 bits of the interconnect; at the destination, the network interface reassembles them. For example, the bit on the first wire of the first half of an X-wire interconnect can be carried on a position up to 1 + X/2, to avoid faults in the first half of the interconnect. The same idea applies to faulty wires in the second half of the interconnect, as in the case of the bit at position X/2 + 1, which can be shifted by up to X/2 positions for transport. In this way, the data are divided to use a faulty channel by means of multiplexers placed at the input and output of each router channel, which are configured beforehand, according to the test results, to select which wires will be used. The multiplexer control depends on the interconnect width and on the chosen degree of reliability. In this work, we chose to tolerate up to 50% of faulty wires in each interconnect, for an 8-bit channel width, as shown in Figure 3 for a case with 4 faults in the interconnect.

It is important to note that data division can have a variable impact on communication time, which in the worst case can be up to twice the original fault-free communication time. However, this impact is not the rule, as shown by some of the situations illustrated in Figure 4, where the use of data division is denoted DD; the impact varies with the injection rate set for the packets. The best case is illustrated by situation 1, where a divided packet adds only a few cycles of latency, while the worst case is illustrated by situation 2, where the communication time doubles.

Figure 3. Example of data division on an 8-bit interconnect with 50% faulty wires (output channel Dout7 to Dout0, 8-bit interconnect, input channel Din7 to Din0, DD blocks; wire pairs Dout7/Dout3, Dout6/Dout2, Dout5/Dout1, Dout4/Dout0).

Figure 4. Examples of the impact of DD on communication time.
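The wire selection and splitting described above can be sketched as a small behavioral model. This is not the authors' RTL: it assumes, following the labels of Figure 3 (Dout7/Dout3, ..., Dout4/Dout0), that bit j of each 4-bit half-word travels either on wire j or on wire j + 4, that the faulty wires are already known from the test phase, that faults are stuck-at values, and that the high half is sent first. All function names are hypothetical.

```python
from __future__ import annotations

WIDTH = 8           # channel width used in this work
HALF = WIDTH // 2   # bits carried per flit after data division


def select_wires(faulty: set[int]) -> list[int] | None:
    """For each half-word bit j, choose wire j or wire j + HALF, whichever is fault-free.

    Returns the chosen wire per bit, or None when both wires of a pair are faulty.
    """
    chosen = []
    for j in range(HALF):
        if j not in faulty:
            chosen.append(j)
        elif j + HALF not in faulty:
            chosen.append(j + HALF)
        else:
            return None
    return chosen


def split_word(word: int, wires: list[int]) -> list[list[int]]:
    """Sender side: split an 8-bit word into two flits, high half first."""
    flits = []
    for half in (word >> HALF, word & ((1 << HALF) - 1)):
        link = [0] * WIDTH                 # value driven on each physical wire
        for j, wire in enumerate(wires):
            link[wire] = (half >> j) & 1
        flits.append(link)
    return flits


def merge_word(flits: list[list[int]], wires: list[int]) -> int:
    """Receiver side (network interface): read the chosen wires and reassemble."""
    high, low = (sum(link[wire] << j for j, wire in enumerate(wires)) for link in flits)
    return (high << HALF) | low


def stuck_at(link: list[int], faulty: set[int], value: int = 1) -> list[int]:
    """Model permanent faults as wires stuck at a fixed logic value."""
    return [value if i in faulty else bit for i, bit in enumerate(link)]


faulty = {0, 2, 5, 7}                 # 4 of 8 wires faulty (50%)
wires = select_wires(faulty)          # [4, 1, 6, 3]
received = [stuck_at(f, faulty) for f in split_word(0xA5, wires)]
assert merge_word(received, wires) == 0xA5
```

Under this pairing assumption, up to 4 faults are tolerated as long as no pair (j, j + 4) loses both wires; `select_wires({1, 5})` returns `None`.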
Data division is used only for the cases not solved by adaptive routing, which makes it advantageous in most situations, since it avoids isolating communication in certain areas of the network. Data division is thus the only way to handle communication that must necessarily use a faulty path. When no fault is diagnosed, these resources can be switched off to reduce power [17-18]. In any case, when an interconnect requires data division, only the routers involved in the communication that use the faulty path need to be enabled to work with the technique, which means that the increase in power and energy can be reduced when there are few faults.
For the cases in which data division may severely degrade communication time (as illustrated with DD in the second case of Figure 4), the third and last strategy that can be employed is task remapping. With remapping, cores with low communication impact are placed strategically, so that they use the faulty interconnect with minimal impact on communication time. To obtain the delay of each core over a given time interval, the estimated number of packets to be sent by the core is multiplied by the packet size and divided by the core's packet injection rate, as shown in equation (1). The core with the smallest delay can then be mapped to the faulty location, minimizing the impact of the fault on the network.

Delay = (#_packets * packet_size) / rate    (1)
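Equation (1) and the selection rule can be sketched as follows. The class, field names and traffic numbers are hypothetical illustrations, not values from the benchmarks; the only requirement assumed is that packet size and rate use consistent units.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CoreTraffic:
    name: str
    packets: int       # estimated packets sent in the observed interval
    packet_size: int   # size of each packet (e.g. flits)
    rate: float        # injection rate, in the same size unit per cycle

    def delay(self) -> float:
        """Equation (1): Delay = (#_packets * packet_size) / rate."""
        return self.packets * self.packet_size / self.rate


def core_for_faulty_location(cores: list[CoreTraffic]) -> CoreTraffic:
    """Return the core with the smallest delay; it is mapped to the faulty location."""
    return min(cores, key=lambda core: core.delay())


cores = [
    CoreTraffic("A", packets=1000, packet_size=8, rate=0.50),
    CoreTraffic("B", packets=200, packet_size=8, rate=0.10),
    CoreTraffic("C", packets=50, packet_size=8, rate=0.20),
]
print(core_for_faulty_location(cores).name)  # C
```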
The remapping solution is based on mirroring the initially chosen placement. To preserve the characteristics of the original mapping, we chose to mirror the positions of the cores in the network vertically, horizontally or in both directions at once, as shown in Figure 5. Thus, it is always possible to obtain up to 4 arrangements of a mapping, for any number of cores in a network with mesh or torus topology. In the chosen example, each of the 4 cores can be mapped to the faulty position very simply, while preserving the structure of the original placement.
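The mirroring step can be sketched on a grid of core names. The functions are hypothetical, and which flip the paper calls "vertical" is an assumption here (reversing the row order); swapping the two labels does not change the set of arrangements.

```python
def mirror_vertical(mapping):
    """Reverse the row order (assumed meaning of vertical mirroring)."""
    return [list(row) for row in reversed(mapping)]


def mirror_horizontal(mapping):
    """Reverse each row (assumed meaning of horizontal mirroring)."""
    return [list(reversed(row)) for row in mapping]


def arrangements(mapping):
    """The up-to-four mirrored versions of a rows x cols core placement."""
    return {
        "original": [list(row) for row in mapping],
        "vertical": mirror_vertical(mapping),
        "horizontal": mirror_horizontal(mapping),
        "vertical+horizontal": mirror_horizontal(mirror_vertical(mapping)),
    }


original = [["A", "B"],
            ["C", "D"]]          # 2x2 example in the spirit of Figure 5
faulty_position = (0, 0)          # hypothetical router with a faulty core link
for name, grid in arrangements(original).items():
    row, col = faulty_position
    print(f"{name:>20}: core {grid[row][col]} is placed on the faulty router")
```

In the 2x2 case each of the four cores lands on the faulty position in exactly one arrangement, which matches the observation above; for symmetric placements some arrangements coincide, hence "up to" 4.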
Figure 5. Mapping possibilities adopted by mirroring the cores, for a configuration with 4 hypothetical cores (panels: original mapping, vertical mirroring, horizontal mirroring, and vertical and horizontal mirroring; routers R1 to R4, cores A to D).

In this work, all cores are placed in identical processors in the network, and for this reason they can easily exchange their mapping. For heterogeneous cores with different dimensions, the proposed strategy is only possible if some redundancy is used in the network.

IV. RESULTS
For the synthesis results, a 90 nm CMOS standard-cell library was used with the Synopsys Power Compiler tool. Table I presents the extra area, maximum operating frequency and power for an 8-bit channel width; RRADD stands for Router with Adaptive Routing and Data Division. Our proposal has 28% extra area, whereas Hamming has only 15%. However, our proposal has a smaller impact on the critical path, since Hamming has a long chain of XOR gates for encoding and decoding the data. When the power results are normalized to 500 MHz, our strategy still achieves better results, whereas Hamming only shows good power results when operating at maximum frequency, since Hamming has the lowest maximum frequency.

TABLE I. SYNTHESIS RESULTS.
Router | Logic area (µm²) | Maximum frequency (MHz) | Power @ max. freq. (mW) | Power @ 500 MHz (mW)
Original | 10,954 | 885 | 1.68 | 1.42
RRADD without DD active | 14,104 | 870 | 1.70 | 1.43
RRADD with DD active | 14,104 | 588 | 2.41 | 2.07
Hamming | 12,614 | 510 | 2.16 | 2.12

The power of the wires was also estimated, with HSPICE simulations using the distributed model [19]. Table II presents the results for data switching at 500 MHz and a wire length of 1 mm, without repeaters. The wire power is significant compared with that of a router: about 6 times the power of a router without fault tolerance for a 4x3 network, and 9 times for the same network with Hamming.

TABLE II. POWER RESULTS FOR THE WHOLE SET OF WIRES IN A NETWORK-ON-CHIP WITH 12 CORES (4x3 NETWORK, DATA SWITCHING AT 500 MHz).
Configuration | Power for 1 mm (mW @ 500 MHz)
Original and RRADD (864 wires) | 9.04
Hamming (1152 wires) | 12.06

For the communication time and energy results, two benchmarks with 12 cores each were used: VOPD [20] and MPEG4 [21]. For each benchmark, the number of interconnects used was analyzed, as listed in Table III. Only 54% of the interconnects are used by VOPD, whereas 70% are used by MPEG4. Therefore, in fault situations there is a high chance that the faults occur in, or can be mapped to, unused locations of the network, which can greatly reduce their impact on communication.

TABLE III. NUMBER OF INTERCONNECTS USED BY EACH BENCHMARK.
Interconnects | Between routers | Torus | Between router and core | Total
4x3 torus network (total) | 34 | 14 | 24 | 72
Used by VOPD | 16 | 1 | 22 | 39
Used by MPEG4 | 24 | 2 | 24 | 50
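The interconnect totals in Table III follow from the topology. A short sketch, assuming one core per router, each bidirectional connection counted as two unidirectional interconnects, and one wraparound connection per row and per column, reproduces them together with the usage percentages quoted above (50/72 is 69.4%, i.e. about 70%). The function name is hypothetical.

```python
from __future__ import annotations


def torus_link_counts(cols: int, rows: int) -> dict[str, int]:
    """Unidirectional interconnects of a cols x rows torus NoC with one core per router.

    Assumes each bidirectional connection counts as two interconnects and one
    wraparound connection per row and per column (cols, rows >= 3).
    """
    mesh = 2 * (rows * (cols - 1) + cols * (rows - 1))   # neighbour router links
    wrap = 2 * (rows + cols)                             # torus wraparound links
    core = 2 * rows * cols                               # router <-> core links
    return {"routers": mesh, "torus": wrap, "router-core": core,
            "total": mesh + wrap + core}


counts = torus_link_counts(cols=4, rows=3)
print(counts)  # {'routers': 34, 'torus': 14, 'router-core': 24, 'total': 72}

adaptive_candidates = counts["routers"] + counts["torus"]   # 48, as in Section A
for bench, used in (("VOPD", 39), ("MPEG4", 50)):
    print(f"{bench}: {used / counts['total']:.1%} of the interconnects in use")
```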
In all simulations, the comparisons were made with every core sending 1000 packets of information to its respective destinations. For MPEG4, there is an exception for rates below 1 MB/s: in these cases, we chose to send only 1 packet every thousand cycles, so that the slower communications would not completely compromise the execution of the application by severely delaying its execution time.
A) Situations Tolerated by Adaptive Routing Alone
The results refer to the network interconnects that can use adaptive routing, i.e., 48 of the 72 interconnects of a 4x3 network. For comparison with other solutions in the literature, results with Hamming are also included. Single-fault situations were considered in each of the router-to-router interconnects, and each proposal ran at its maximum allowed frequency without data division. The results are shown in Table IV: our proposal showed at most a 6.6% increase in computation time for the worst simulated case, whereas Hamming shows a constant impact in all cases in which adaptive routing is used.
TABLE IV. SIMULATION RESULTS CONSIDERING A SINGLE FAULT AND ADAPTIVE ROUTING.
Metric | VOPD | MPEG4
Total interconnects used | 17 (35%) | 26 (54%)
Maximum impact on communication time | 2.2% | 6.6%
Worst computation time | 135 µs | 84.3 µs
Computation time with Hamming | 232 µs | 130 µs
Each chosen case has a different impact on communication time. For Hamming, the fault location does not affect the time and energy results, and the original mapping is considered.
Figure 6 compares the energy dissipated in each of the cases listed in Table VI. Our proposal showed good energy efficiency compared with Hamming, which is always active at run time to correct permanent and transient faults. Our proposal showed that, even with 28% extra area, it is possible to reduce the network energy by switching off the parts that do not need fault tolerance, since the faults have already been diagnosed. According to Figure 6 and Table VI, after remapping the energy can be reduced by almost 20% (case 3) compared with our proposal before remapping, and by up to 55% (case 5) compared with the energy of the Hamming implementation. This shows that remapping can be employed, whenever possible, to minimize the delays of faulty communications.
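The percentages quoted above can be recomputed from the energy values in Table VI. The sketch below only restates the table; the dictionary layout and function name are illustrative. Note that case 1 reaches a similar reduction (about 55.5%) against the MPEG4 Hamming baseline.

```python
# Energy in µJ (benchmark, before remapping, after remapping), copied from Table VI
TABLE_VI = {
    "MPEG4 (case 1)": ("MPEG4", 2.37, 2.25),
    "MPEG4 (case 2)": ("MPEG4", 2.49, 2.25),
    "MPEG4 (case 3)": ("MPEG4", 4.53, 3.67),
    "MPEG4 (case 4)": ("MPEG4", 3.86, 3.49),
    "VOPD (case 5)": ("VOPD", 3.99, 3.92),
    "VOPD (case 6)": ("VOPD", 6.37, 6.08),
}
HAMMING_ENERGY = {"MPEG4": 5.06, "VOPD": 8.81}  # original mapping with Hamming


def reduction(reference: float, value: float) -> float:
    """Relative reduction of `value` with respect to `reference` (0.2 means 20%)."""
    return 1.0 - value / reference


for case, (bench, before, after) in TABLE_VI.items():
    print(f"{case}: {reduction(before, after):6.1%} vs. before remapping, "
          f"{reduction(HAMMING_ENERGY[bench], after):6.1%} vs. Hamming")
```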
B) Situations that Require Data Division
For the results presented in Table V, a single-fault situation was considered for each core-router interconnect, and the average impact on each communication time was taken into account. Power and energy results are also presented for each benchmark.
TABLE V. SIMULATION RESULTS CONSIDERING DATA DIVISION.
Proposal | VOPD avg. time (µs) | VOPD power (mW) | VOPD energy (µJ) | MPEG4 avg. time (µs) | MPEG4 power (mW) | MPEG4 energy (µJ)
Our proposal | 200.1 | 30.86 | 6.17 | 119.7 | 30.86 | 3.67
Hamming | 232.0 | 37.98 | 8.81 | 133.3 | 37.98 | 5.06
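Assuming the energy column is the product of average power and average time, the units work out as mW × µs = nJ, and the table can be cross-checked with a few lines. The computed products agree with the reported energies to within a few hundredths of a µJ; the names below are illustrative.

```python
def energy_uj(power_mw: float, time_us: float) -> float:
    """E = P * t; mW x µs gives nJ, divided by 1000 to obtain µJ."""
    return power_mw * time_us / 1000.0


# (average time in µs, power in mW, reported energy in µJ), from Table V
TABLE_V = {
    ("VOPD", "our proposal"): (200.1, 30.86, 6.17),
    ("VOPD", "Hamming"): (232.0, 37.98, 8.81),
    ("MPEG4", "our proposal"): (119.7, 30.86, 3.67),
    ("MPEG4", "Hamming"): (133.3, 37.98, 5.06),
}

for (bench, design), (time_us, power_mw, reported) in TABLE_V.items():
    computed = energy_uj(power_mw, time_us)
    print(f"{bench:5} {design:12}: P*t = {computed:.2f} µJ, reported {reported:.2f} µJ")
```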
C) Situations Combined with Remapping
For the examples presented next, remapping was chosen so that the faults were mapped onto interconnects not used by the network or onto interconnects with the smallest impact on communication (when remapping to an unused interconnect is not possible), in order to improve the total communication time of the network and, consequently, its energy. Some random situations were chosen to show the impact of remapping, as presented in Table VI, with faults in different router-core interconnects.
Figure 6. Energy comparison between the approaches after remapping.
D) Connectivity of the Proposal
Figure 7 shows the network connectivity as a function of the number of faults, from 0 to 100, distributed over any wire of the network. The best and worst connectivity scenarios were evaluated for a generic case, considering a 4x3 network and an 8x8 network, and the results were compared with the connectivity of [10]. The best scenario corresponds to the faults being fully distributed across the interconnects, and the worst case is considered when the faults affect more than 50% of the interconnects. For the 8x8 network with 100 faulty wires distributed across the network, the worst scenario shows a 30% connectivity loss for our proposal, whereas [10] loses almost 70% of its connectivity.
V. CONCLUSION
To provide fault tolerance in the interconnects, adaptive routing and data division were used, and both techniques can be combined with task remapping to minimize the impact of faults on the computation time of the applications and, in turn, on the energy dissipated by the network. The proposed approach can protect the network against multiple faults in an interconnect and against multiple faulty interconnects without extra wires, and shows better results than traditional techniques such as Hamming and component redundancy.
Figure 7. Network connectivity as a function of the number of faults in the interconnects (y-axis: % connectivity, 0 to 100; x-axis: number of faults, 0 to 100; curves: best case 4x3 and 8x8 networks, worst case 4x3 network, worst case 8x8 network, worst case 8x8 network [10]).
REFERENCES
[1] V. D. Agrawal, "Testing for Faults, Looking for Defects," 12th Latin American Test Workshop (LATW), keynote talk, March 2011.
[2] M. Cuviello, S. Dey, X. Bai, Y. Zhao, "Fault Modeling and Simulation for Crosstalk in System-on-Chip Interconnects," Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, San Jose, CA, pp. 297-303, 1999.
[3] A. Dehon, H. Naeimi, "Seven Strategies for Tolerating Highly Defective Fabrication," IEEE Design & Test of Computers, vol. 22, no. 4, pp. 306-315, 2005.
[4] C. Concatto, P. Almeida, F. Kastensmidt, E. Cota, M. Lubaszewski, M. Herve, "Improving Yield of Torus NoCs through Fault-Diagnosis-and-Repair of Interconnect Faults," 15th IEEE International On-Line Testing Symposium (IOLTS), pp. 61-66, 2009.
[5] M. Herve, E. Cota, F. L. Kastensmidt, M. Lubaszewski, "Diagnosis of Interconnect Shorts in Mesh NoCs," 3rd ACM/IEEE International Symposium on Networks-on-Chip, pp. 256-265, 2009.
[6] H. Yang, C. Papachristou, "A Method for Detecting Interconnect DSM Defects in Systems on Chip," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 25, no. 1, pp. 197-204, 2006.
[7] A. Frantz, F. Kastensmidt, L. Carro, E. Cota, "Dependable Network-on-Chip Router Able to Simultaneously Tolerate Soft Errors and Crosstalk," Proceedings of the International Test Conference (ITC), vol. 1, pp. 1-9, 2006.
[8] T. Lehtonen, P. Liljeberg, J. Plosila, "Online Reconfigurable Self-Timed Links for Fault Tolerant NoCs," VLSI Design, IEEE International, 2007.
[9] M. Braga, E. Cota, F. L. Kastensmidt, M. Lubaszewski, "Efficiently Using Data Splitting and Retransmission to Tolerate Faults in Networks-on-Chip Interconnects," Proceedings of the 2010 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 4101-4104, May 30-June 2, 2010.
[10] M. R. Kakoee, V. Bertacco, L. Benini, "ReliNoC: A Reliable Network for Priority-Based On-Chip Communication," Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1-6, 14-18 March 2011.
[11] A. Ganguly, P. P. Pande, B. Belzer, "Crosstalk-Aware Channel Coding Schemes for Energy Efficient and Reliable NOC Interconnects," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 17, no. 11, pp. 1626-1639, Nov. 2009.
[12] M. Palesi, S. Kumar, V. Catania, "Leveraging Partially Faulty Links Usage for Enhancing Yield and Performance in Networks-on-Chip," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 29, no. 3, pp. 426-440, March 2010.
[13] A. Dutta Choudhury, G. Palermo, C. Silvano, V. Zaccaria, "Yield Enhancement by Robust Application-specific Mapping on Network-on-Chips," Second International Workshop on Network on Chip Architectures (NoCArc'09), pp. 37-42, 2009.
[14] T. Schonwald, J. Zimmermann, O. Bringmann, W. Rosenstiel, "Fully Adaptive Fault-Tolerant Routing Algorithm for Network-on-Chip Architectures," 10th Euromicro Conference on Digital System Design Architecture, Methods and Tools, pp. 527-534, 2007.
[15] M. Koibuchi, H. Matsutani, H. Amano, T. Mark Pinkston, "A Lightweight Fault-Tolerant Mechanism for Network-on-Chip," 2nd ACM/IEEE International Symposium on Networks-on-Chip, pp. 13-22, 2008.
[16] R. Tornero, V. Sterrantino, M. Palesi, J. M. Orduña, "A Multi-objective Strategy for Concurrent Mapping and Routing in Networks on Chip," IEEE International Symposium on Parallel & Distributed Processing, pp. 1-8, 2009.
[17] C. Long, L. He, "Distributed Sleep Transistor Network for Power Reduction," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, pp. 937-946, 2004.
[18] K. Shi, D. Howard, "Sleep Transistor Design and Implementation - Simple Concepts Yet Challenges To Be Optimum," International Symposium on VLSI Design, Automation and Test, 2006.
[19] T. Sakurai, "Approximation of Wiring Delay in MOSFET LSI," IEEE Journal of Solid-State Circuits, vol. 18, no. 4, pp. 418-426, Aug. 1983.
[20] V.-D. Ngo, H.-N. Nguyen, H.-W. Choi, "Analyzing the Performance of Mesh and Fat-Tree Topologies for Network on Chip Design," LNCS (Springer-Verlag), pp. 300-310, 2005.
[21] D. Bertozzi, L. Benini, "Xpipes: A Network-on-Chip Architecture for Gigascale Systems-on-Chip," IEEE Circuits and Systems Magazine, vol. 4, no. 2, pp. 18-31, 2004.
TABLE VI. COMPARISON OF RESULTS BEFORE AND AFTER REMAPPING.
Benchmark | Before: computation time (µs) | Before: energy (µJ) | Remapping type | After: computation time (µs) | After: energy (µJ)
MPEG4 (case 1) | 80.6 | 2.37 | Horizontal | 76.6 | 2.25
MPEG4 (case 2) | 84.6 | 2.49 | Vertical | 76.6 | 2.25
MPEG4 (case 3) | 146.4 | 4.53 | Vertical | 119.2 | 3.67
MPEG4 (case 4) | 125.2 | 3.86 | Horizontal | 113.3 | 3.49
MPEG4 (original - Hamming) | 133.3 | 5.06 | - | 133.3 | 5.06
VOPD (case 5) | 135.7 | 3.99 | Horizontal | 133.4 | 3.92
VOPD (case 6) | 206.5 | 6.37 | Horizontal and vertical | 197.2 | 6.08
VOPD (original - Hamming) | 232.0 | 8.81 | - | 232.0 | 8.81
Transistor Level Design of a Switched Capacitor Integrating ADC with Programmable Input Range and Resolution
Thiago Brito Bezerra¹, Rafael O. Gomes¹, Sebastian Y. C. Catunda², Antonio Petraglia³, Raimundo C. S. e Freire⁴
¹Universidade Federal do Maranhão, São Luís - MA, Brazil
²Universidade Federal do Rio Grande do Norte, Natal - RN, Brazil
³Universidade Federal do Rio de Janeiro, Rio de Janeiro - RJ, Brazil
⁴Universidade Federal de Campina Grande, Campina Grande - PB, Brazil
brito.tb@gmail.com, rafaeloliveiranunes@hotmail.com, catunda@dee.ufma.br, petra@pads.ufrj.br, rcsfreire@dee.ufcg.edu.br
Abstract - A transistor-level design of a switched-capacitor integrating analog-to-digital converter (ADC) is presented, in which the input range can be programmed to match the output signal level of a sensor. The proposed circuit also allows the ADC resolution to be set, enabling a trade-off between resolution and conversion speed depending on the application.
Keywords: analog-to-digital converter, switched capacitors, integrating converter, programmable circuits.
I. INTRODUCTION
Programmable integrated circuits can be adjusted after fabrication to fit more than one application within a given set of applications. A programmable measurement system can be applied to the measurement of different quantities using a set of sensors with different signal characteristics and a single analog-to-digital converter (ADC). For the measuring system to operate with different types of sensors with different signal characteristics, the conditioning circuit must be programmable to provide different gain values and DC level adjustments. The output signal range of each sensor should be adjusted to be as close as possible to the input range of the ADC, to ensure maximum measurement quality. One solution to implement this signal-range adjustment is a measurement system with a programmable conditioning circuit. The procedure proposed in [1] made it possible to find the minimum set of programming values within a given range that ensures the full measurement range while keeping the loss of measurement resolution within acceptable limits. Most recent proposals deal with signal conditioning circuits that process the sensor output signal to make it fit the ADC input. As an example, a conditioning circuit with programmable gains was proposed in [2].
A possible different approach, which could reduce the number of signal conditioning amplification stages, is to employ an ADC with a programmable input range to fit each sensor output signal range. In [3], a programmable ADC architecture that avoids amplification stages, and is therefore independent of the conditioning circuit, was proposed using discrete components modeled at SPICE level. In this work, an ADC based on this architecture is designed at transistor level in the AMS 0.35 µm technology. The ADC also allows the converter resolution to be configured, enabling a trade-off between resolution and conversion speed depending on the application.

II. PROPOSED ARCHITECTURE
The proposed architecture is based on a continuous-time multi-slope integrating analog-to-digital converter [4], but uses switched capacitors instead of resistors, as shown in Figure 1. This architecture uses only one reference voltage and one capacitor to transfer positive and negative charges to the integrator. Charge is added to or subtracted from the integrator by selecting the appropriate switching sequence according to the comparator output. Switched-capacitor circuits, which emulate resistors with capacitors and switches, replace the resistors because they allow better accuracy in the operation of the converter and are more suitable for integration. The gains implemented with this technique are functions of capacitor ratios, which are more accurate than gains that depend on both resistor and capacitor values.

Some specifications were defined for this circuit design, as shown in Table I.
TABLE I. SPECIFICATIONS OF THE DESIGNED CIRCUIT.
Parameter | Value
Input range | 0 to 2.56 V
Resolution | Up to 8 bits
Power supply | 3.3 V
Technology | AMS 0.35 µm
Architecture | Discrete multi-ramp
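The two properties invoked in Section II can be illustrated numerically: a capacitor C switched at a clock frequency f_clk behaves like a resistor R = 1/(f_clk·C), and a charge-transfer gain depends only on a capacitor ratio. The sketch uses CR = 200 fF and Co = 3CR from later in the paper; the 1 MHz clock is a hypothetical value, since the clock frequency is not given here.

```python
def sc_equivalent_resistance(c_farads: float, f_clk_hz: float) -> float:
    """Equivalent resistance of a capacitor switched at f_clk: R = 1 / (f_clk * C)."""
    return 1.0 / (f_clk_hz * c_farads)


def charge_gain(c_in: float, c_out: float) -> float:
    """Per-cycle voltage gain of a charge transfer from c_in onto c_out."""
    return c_in / c_out


C_R = 200e-15        # 200 fF, the value used in Section III
F_CLK = 1e6          # hypothetical 1 MHz clock (not specified in the text)
print(f"{sc_equivalent_resistance(C_R, F_CLK) / 1e6:.1f} MOhm")  # 5.0 MOhm

# A global process shift scales every capacitor by the same factor, so the
# capacitor-ratio gain stays at 1/3 while an absolute RC product would move.
for shift in (0.9, 1.0, 1.1):
    print(shift, charge_gain(C_R * shift, 3 * C_R * shift))
```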
A. Description of circuit operation
The circuit shown in Figure 1 operates with two non-overlapping clock phases, φ1 and φ2, which control switches "1" and "2", respectively.
Figure 1. Architecture of the switched-capacitor multi-ramp integrating analog-to-digital converter.
Switches "A" and "B" are also controlled by phases φ1 and φ2, respectively, if the comparator output is logic "1"; otherwise, if the comparator output is logic "0", switches "A" and "B" are controlled by φ2 and φ1, respectively.
Thus, the operation of the circuit at a time instant k can be summarized as in Table II.

TABLE II. SEQUENCE OF OPERATION, WITH QI(k) = CI·VI AND QR(k) = CR·VR.
Vo | QO | VCo
"1" | QO(k-1) - QR(k) | VCo(k-1) - VR·CR/Co
"1" | QO(k-1) + QI(k) - QR(k) | VCo(k-1) + VI·CI/Co - VR·CR/Co
"0" | QO(k-1) | VCo(k-1)
"0" | QO(k-1) + QI(k) + QR(k) | VCo(k-1) + VI·CI/Co + VR·CR/Co

At the beginning of the conversion, the output capacitor is reset to zero through the RST switch, which then remains open for the whole conversion. In phase φ1, the counters NP and NN are reset to zero and the capacitor CI starts charging. In phase φ2, the voltage across the output capacitor equals the input signal multiplied by the gain CI/CO, completing one cycle. In phase φ1 of the second cycle, if the voltage across the output capacitor is greater than the voltage at the inverting input of the comparator, switch "A" closes first, subtracting from the output capacitor the reference voltage multiplied by the gain CR/CO, and then switch "B" closes, discharging CR. Otherwise, if the voltage across the output capacitor is lower than the voltage at the inverting input of the comparator, the switching sequence is "B" and then "A", adding to the output capacitor the reference voltage multiplied by the gain CR/Co and the input signal multiplied by the gain CI/Co. At the end of each cycle, the counters are incremented or decremented according to the logic level of the comparator.

The conversion process continues for N cycles, where N = 2^(r+1) and r is the converter resolution in bits. For this architecture, the relationship between the input voltage and the positive and negative counts is:

$$V_I = \frac{N_N - N_P}{N}\,\frac{C_R}{C_I}\,V_R \qquad (1)$$

Hence, for a fixed clock frequency there is a trade-off between resolution and speed, since the resolution is proportional to log2(N) and the speed is inversely proportional to N.

B. Voltage output range of the integrator
The proposed circuit was designed to operate from a single supply voltage. Thus, the output voltage of the integrator cannot go below zero volts. To avoid this situation, the minimum output voltage of the integrator must be considered; it occurs when the input voltage is zero, since the input signal only adds positive values to the output voltage. Thus:

$$V_{Co,\min} = -V_R\,\frac{C_R}{C_o} \qquad (2)$$

To limit the minimum output voltage of the integrator to zero, (2) is used to set the comparator voltage as:

$$V_{CP} = V_R\,\frac{C_R}{C_o} \qquad (3)$$
The integrator output voltage receives positive contributions from the input signal and from the reference voltage, so its maximum occurs for the maximum input signal. If this maximum input equals the reference voltage, the maximum output voltage is given by:

$$V_{Co,\max} = V_{CP} + V_R\,\frac{C_R}{C_o} + V_R\,\frac{C_I}{C_o} \qquad (4)$$
Assuming that the minimum value of CI equals CR and that the maximum output voltage is also VR, substituting (3) into (4) gives the relationship between Co and CR:

$$C_o \geq 3C_R \qquad (5)$$
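Plugging the design values into (2) to (5) shows the resulting integrator window. This sketch assumes a normalized reference VR = 1 and the smallest allowed output capacitor Co = 3CR, with CR = 200 fF as chosen in Section III; it checks that the output then spans exactly [0, VR].

```python
C_R = 200e-15          # reference capacitor from Section III (200 fF)
C_O = 3 * C_R          # smallest output capacitor allowed by (5)
V_R = 1.0              # reference voltage, normalized (assumption)
C_I = C_R              # smallest programmable input capacitor
V_I_MAX = V_R          # full-scale input when C_I = C_R

v_cp = V_R * C_R / C_O                                  # (3) comparator level
v_min = v_cp - V_R * C_R / C_O                          # (2) shifted up by V_CP
v_max = v_cp + V_R * C_R / C_O + V_I_MAX * C_I / C_O    # (4)

print(f"C_O = {C_O * 1e15:.0f} fF")
print(f"V_CP = {v_cp:.3f} V_R, integrator range [{v_min:.3f}, {v_max:.3f}] V_R")
```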
The maximum and minimum voltages reached at the integrator output are shown in Figure 2 for an 8-bit analog-to-digital converter, as a function of the input voltage normalized by VR, with CO = 3CR and CI = CR.
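A behavioral model in the spirit of Figure 2 can be written directly from Table II and equations (1) to (3). This is a sketch, not the authors' simulation: it uses normalized capacitances, assumes N = 2^(r+1) cycles, and assumes that in a subtracting cycle the reference charge is removed before the input charge arrives, so that intermediate value is the lowest point of the cycle. The function name is hypothetical.

```python
def convert(vi, vr=1.0, ci=1.0, cr=1.0, co=3.0, r=8):
    """Behavioral model of the multi-ramp SC integrating ADC (normalized units).

    Returns (estimated V_I from eq. (1), minimum and maximum integrator output).
    """
    n_cycles = 2 ** (r + 1)           # assumption: N = 2^(r+1) conversion cycles
    step_r = vr * cr / co             # reference contribution V_R*C_R/C_o
    step_i = vi * ci / co             # input contribution V_I*C_I/C_o
    v_cp = step_r                     # comparator level, eq. (3)
    v = 0.0                           # output capacitor reset through RST
    n_p = n_n = 0
    trace = [v]
    for _ in range(n_cycles):
        if v > v_cp:                  # comparator "1": "A" first, subtract reference
            v -= step_r
            trace.append(v)           # intermediate value, lowest point of the cycle
            v += step_i
            n_n += 1
        else:                         # comparator "0": "B" first, then add both charges
            v += step_i + step_r
            n_p += 1
        trace.append(v)
    vi_est = (n_n - n_p) / n_cycles * (cr / ci) * vr   # eq. (1)
    return vi_est, min(trace), max(trace)


est, lo, hi = convert(0.3)
print(f"estimate {est:.4f} V_R, integrator range [{lo:.3f}, {hi:.3f}] V_R")
```

Because the sum of all charge steps equals the final integrator voltage, the estimate undershoots VI by Co·VCo(final)/(N·CI), which is at most 3VR/N here: one more bit doubles N and the conversion time while halving this error bound, the resolution/speed trade-off noted after (1).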
of the ratios for adjusting the dynamic range are given as

1={1, 1.5, 2, 3, 4, 6, 8, 12, 16}. In short, this methodology
asserts that the gain must not be over dimensioned, in order to
assure the full measurement range. Hence, these gain values
guarantee a loss of resolution less than 1 bit for a desired gain
from 1 to 32. These values also define the ratio between CI
and CR as previously mentioned. The circuit was designed
using the technology AMS 0.35 m and using the CR=200 fF,
so CI={200f, 300f, 400f, 600f, 800f, 1.2p, 1.6p, 2.4p, 3.2p}F.
0.9
0.8
VCo/VR
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
The proposed architecture in Figure 4 was designed in the

technology AMS 0.35 m.
VI/VR
Figure 2. Behavioral simulations of voltage range on the integrator output

versus input voltage, normalized by VR
C. Programmable dynamic range values

In order to not modify the voltage range on the output
capacitor, a programmable capacitor is chosen in a way to
replace the fixed capacitor CI in the architecture shown in
Figure 1, with its minimum value equal to CR. The maximum
value of VI in (3) defines the input range. Thus, the CI value
can be determined by:
VR
(6)
CR
V I max
It can be observed in (6) that reducing the input dynamic
range is equivalent to apply a gain to the input signal. In this
study, five capacitance values were chosen, allowing a digital
selection to the dynamic range of the converter. In Figure 3
are demonstrated the values of capacitors and the
corresponding maximum dynamic range of the converter for
each capacitance, for CR=200 fF.
CI =
Figure 4. Architecture of the switched capacitor integrating A/D converter

with programmable input range and resolution.
The operational transcondutance amplifier chosen

architecture is a folded cascode, which has high output
resistance and high voltage gain. Figure 5 shows the OTA
schematics.
Figure 3. Behavioral simulation of the dynamic range of the converter input

to programmable capacitor values.
III.
CIRCUIT DESGIN
From the methodology presented in [1] and the

conditioning circuit with a gain stage shown in [2], the values
117
Figure 5. Architecture of folded cascode OTA
ISSN 977-2177-128009
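Equation (6) and the gain set of Section III can be checked numerically. The Python sketch below assumes CR = 200 fF and that the available CI/CR ratios are exactly the nine listed values. It computes each CI and its input range VI,max = VR·CR/CI. For a desired gain, it picks the largest available ratio that does not exceed it. This selection rule is our reading of the "no over-dimensioned gain" criterion of [1], not the authors' code, and the function names are hypothetical.

```python
import math

C_R = 200e-15                              # reference capacitor, 200 fF
RATIOS = [1, 1.5, 2, 3, 4, 6, 8, 12, 16]   # available C_I / C_R ratios (gains)


def ci_values(c_r=C_R, ratios=RATIOS):
    """Programmable input capacitor values, C_I = ratio * C_R."""
    return [r * c_r for r in ratios]


def input_range(v_r, c_i, c_r=C_R):
    """Maximum input voltage from (6): V_I,max = V_R * C_R / C_I."""
    return v_r * c_r / c_i


def select_gain(desired, ratios=RATIOS):
    """Largest available gain not exceeding the desired gain, so the
    measurement range is never lost; returns (gain, bits of resolution lost)."""
    candidates = [r for r in ratios if r <= desired]
    if not candidates:
        raise ValueError("desired gain is below the smallest available ratio")
    selected = max(candidates)
    return selected, math.log2(desired / selected)


if __name__ == "__main__":
    for c in ci_values():
        print(f"C_I = {c * 1e15:6.0f} fF -> V_I,max = {input_range(2.56, c):.3f} V")
    print(select_gain(5.0))
```

Because consecutive ratios differ by at most a factor of 2, and the largest ratio (16) is half of 32, log2(desired/selected) stays below 1 bit for any desired gain from 1 to 32.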
Table III shows the dimensions of the transistors used in the folded cascode OTA.
TABLE III. TRANSISTOR DIMENSIONS OF THE FOLDED CASCODE AMPLIFIER
Transistors | W (µm) | L (µm)
M1, M2 | 40 | 0.7
M3, M4, M5, M6 | 30 | 2
M7, M8 | 10 | 2
M9, M10 | 20 | 2
M11, M12 | 60 | 2
The OTA figures of merit are an open-loop gain of 78 dB, a GBW of 10 MHz, a slew rate of 80 V/µs, and a signal swing from 62 mV to 3.3 V. These results meet the design specifications.
For the comparator, an architecture with good accuracy was chosen. It has three functional blocks: a preamplifier, a decision stage, and an output buffer. This architecture is shown in Figure 6.
Figure 6. Architecture of the comparator.
The dimensions of the transistors used in the comparator are shown in Table IV.
TABLE IV. TRANSISTOR DIMENSIONS OF THE THREE-STAGE COMPARATOR
Transistors | W (µm) | L (µm)
M1, M2, M3, M4, M5, M6 | 4 | 0.35
M8, M9 | 0.5 | 0.35
M7, M10 | 0.4 | 0.35
M11 | 20 | 0.35
M12, M15, M17, M19 | 2 | 0.35
M13 | 1 | 0.35
M14, M16, M18 | 4 | 0.35
The comparator has an open-loop gain of 83 dB and a slew rate of 130 V/µs, meeting the required specifications.
The UP/DOWN counter and the shift register employed in the circuit were written in Verilog and synthesized at the transistor level. The shift register performs the division by two that yields the final result. The resolution of the ADC is changed by selecting the number of bits of the counter.
IV. SIMULATION RESULTS
The simulations were first performed using a behavioral simulation of the control algorithm of the switched capacitor, in order to analyze the charge contributions of the input signal and the reference signal. The behavioral results are shown in Figure 7 for a resolution of 5 bits and in Figure 8 for an 8-bit resolution.
Figure 7. Behavioral simulation with a resolution of 5 bits.
Figure 8. Behavioral simulation with a resolution of 8 bits.
The circuits were simulated at the transistor level, and the results were compared with and validated against the behavioral results. The transistor-level simulations were performed for an input signal of 1 V and a reference voltage of 2.56 V, with the programmable capacitor set to CI = 200 fF and an equivalent clock frequency of 1 MHz. Two resolutions, 5 and 8 bits, were considered for these specifications. Comparing the voltages across the output capacitor showed satisfactory results in the conversion process.
The transistor-level simulation is shown in Figure 9 for a resolution of 5 bits and in Figure 10 for an 8-bit resolution.
Figure 9. Transistor-level simulation with a resolution of 5 bits.
Figure 10. Transistor-level simulation with a resolution of 8 bits.
For a reference voltage VR = 2.56 V, a resolution of 5 bits, and VI varying over the full conversion range, Figure 11 illustrates the result of the behavioral simulation of this ADC. The capacitor ratios are Co = 3CR and CI = CR.
Figure 11. Behavioral simulation of the digital value as a function of the analog input, with a resolution of 5 bits.
Figure 12 shows the transistor-level simulation results of the ADC, using the same specifications as in the previous paragraph.
Figure 12. Transistor-level simulation of the digital value as a function of the analog input, with a resolution of 5 bits.
As can be seen, the results are close to those obtained in the behavioral simulation, and they show that the converter follows the mid-tread convention [5].
Static and dynamic measurements were performed to verify the operation of the converter. Figure 13 shows the DNL measurement and Figure 14 illustrates the INL measurement.
Figure 13. Differential nonlinearity error.
Figure 14. Integral nonlinearity error.
Figures 13 and 14 show that the worst-case negative DNL of the converter is -0.1 LSB and the worst-case positive DNL is 0.1 LSB. The worst-case negative INL is -0.8 LSB and the worst-case positive INL is 0.13 LSB.
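DNL and INL figures such as those in Figures 13 and 14 are commonly derived from the code-transition levels, following the definitions of IEEE Std 1241 [5]. The sketch below makes several assumptions. The transition levels are taken as already measured. The ideal LSB comes from an endpoint (terminal-based) fit. The transition values in the usage example are hypothetical and are not the paper's data.

```python
def dnl_inl(transitions):
    """Return (dnl, inl) in LSB from code-transition levels.

    transitions[k] is the input level at which the output changes from
    code k to code k + 1. The ideal LSB is taken from the endpoint fit:
    (last - first) / (number of transitions - 1).
    """
    if len(transitions) < 3:
        raise ValueError("need at least three transition levels")
    first, last = transitions[0], transitions[-1]
    lsb = (last - first) / (len(transitions) - 1)
    # DNL[k]: width of code k + 1 in ideal LSBs, minus 1
    dnl = [(transitions[k + 1] - transitions[k]) / lsb - 1.0
           for k in range(len(transitions) - 1)]
    # INL[k]: deviation of transition k from the ideal straight line
    inl = [(t - (first + k * lsb)) / lsb for k, t in enumerate(transitions)]
    return dnl, inl


if __name__ == "__main__":
    d, i = dnl_inl([0.5, 1.5, 2.6, 3.5, 4.5])   # hypothetical levels, in volts
    print(f"DNL range: {min(d):+.2f} .. {max(d):+.2f} LSB")
    print(f"INL range: {min(i):+.2f} .. {max(i):+.2f} LSB")
```

With an endpoint fit, INL is zero at the first and last transitions by construction. A best-fit line would spread the error more evenly, so the reported extremes depend on this choice.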
A key dynamic specification of an ADC is the signal-to-noise ratio (SNR). The test signal used to obtain the SNR was a 60 Hz sinusoid. The SNR was calculated from a 512-point FFT, shown in Figure 15.
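The FFT-based SNR and the ENOB conversion of equation (7) can be sketched with NumPy as below. The sketch has the following assumptions:
- It uses an ideal mid-tread quantizer instead of the transistor-level converter.
- It uses coherent sampling (7 cycles in 512 points, so no window is needed). With this choice the 60 Hz test tone implies a sampling rate of 60·512/7 Hz.
- All non-DC bins other than the signal bin count as noise, so harmonics are included and the result is strictly a SINAD.

The function names are hypothetical.

```python
import numpy as np


def quantize(x, n_bits, full_scale=1.0):
    """Ideal mid-tread quantizer with 2**n_bits levels over [-fs, fs)."""
    lsb = 2.0 * full_scale / 2 ** n_bits
    codes = np.clip(np.round(x / lsb), -2 ** (n_bits - 1), 2 ** (n_bits - 1) - 1)
    return codes * lsb


def snr_from_fft(samples, signal_bin):
    """SNR in dB of a coherently sampled record: the power in the signal
    bin over the power in every other non-DC bin."""
    spectrum = np.abs(np.fft.rfft(samples)) ** 2
    p_signal = spectrum[signal_bin]
    p_noise = spectrum[1:].sum() - p_signal
    return 10.0 * np.log10(p_signal / p_noise)


def enob(snr_db):
    """Equation (7): ENOB = (SNR - 1.76) / 6.02."""
    return (snr_db - 1.76) / 6.02


if __name__ == "__main__":
    n, cycles = 512, 7                         # 7 and 512 are coprime
    t = np.arange(n)
    x = 0.9375 * np.sin(2 * np.pi * cycles * t / n)   # 15 LSB amplitude at 5 bits
    snr = snr_from_fft(quantize(x, 5), cycles)
    print(f"SNR = {snr:.2f} dB, ENOB = {enob(snr):.2f} bits")
    print(f"ENOB for 42.26 dB: {enob(42.26):.3f}")
```

For an ideal 5-bit quantizer, the SNR comes out close to the textbook 6.02·5 + 1.76 dB. Evaluating (7) at 42.26 dB gives about 6.73 bits, which is consistent with the 6.72 reported in the text after truncation.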
Figure 15. FFT plot for the converter with a resolution of 5 bits.
From the FFT result, the signal-to-noise ratio is obtained, and the effective number of bits is computed from the following equation:
$$\mathrm{ENOB} = \frac{\mathrm{SNR} - 1.76}{6.02} \qquad (7)$$
The signal-to-noise ratio is 42.26 dB and the effective number of bits is 6.72. This result demonstrates that the converter works effectively at a resolution of 5 bits, with no missing codes during conversion.
V. FINAL CONSIDERATIONS
In this work, the transistor-level design of a switched-capacitor integrating A/D converter with programmable input range and resolution in the AMS 0.35 µm technology was proposed. A converter with programmable resolution greater than 8 bits can be implemented by making the up/down counter handle larger cycle counts. Increasing the resolution reduces the value of the LSB, which requires components with better accuracy. To reduce the conversion time, clock frequencies above 1 MHz could be used.
The technique used to adjust the ADC input range was to change the input capacitance, and the behavioral simulations of the different input ranges of the converter proved satisfactory. The programmable capacitor uses a minimum number of capacitances and preserves the signal measurement range, with a maximum loss of resolution of 1 LSB.
The circuit also enables a trade-off between ADC resolution and conversion speed, depending on the application.
The behavioral and transistor-level simulation results demonstrated the operation of the A/D converter developed in this work, allowing the verification and validation of its performance. The circuit is currently in the layout phase and will be sent for fabrication. After that, experiments will be performed to test and validate the proposed architecture.
ACKNOWLEDGMENT
The authors acknowledge CAPES, CNPq, and FAPEMA for financial support in the form of study and research fellowships.
REFERENCES
[1] S. Y. C. Catunda, J. F. Naviner, G. S. Deep, and R. C. S. Freire, "Designing a programmable analog signal conditioning circuit without loss of measurement range," IEEE Transactions on Instrumentation and Measurement, vol. 52, no. 5, pp. 1482-1487, 2003, ISSN 0018-9456.
[2] D. R. Belfort, S. Y. C. Catunda, F. R. Sousa, J. P. M. Dantas, and R. C. S. Freire, "Programmable analog signal conditioning circuit for integrated systems," in Instrumentation and Measurement Technology Conference Proceedings (IMTC 2008), Victoria, 2008. New Jersey: IEEE, 2008, vol. 1, pp. 1848-1852.
[3] R. O. Nunes, E. C. Gomes, S. Y. C. Catunda, D. R. Belfort, R. C. S. Freire, and F. R. Sousa, "Conversor analógico-digital integrador a capacitor chaveado com faixa de entrada programável," in XVIII Congresso Brasileiro de Automática, Bonito, MS, 2010.
[4] W. Goeke, "8.5-Digit Integrating Analog-to-Digital Converter with 16-Bit, 100,000-Sample-per-Second Performance," HP Journal, vol. 40, no. 2, pp. 8-15, April 1989.
[5] IEEE Standard for Terminology and Test Methods for Analog-to-Digital Converters, IEEE Std 1241-2000.
[6] N. Gueddah and M. Masmoudi, "A Programmable Resolution A/D Converter Modeling," Lebanese Science Journal, vol. 7, no. 2, 2006.
[7] J. J. Wnorowski, "Increasing A/D Conversion Resolution by Dynamic Scale Adjustment," S.M. thesis, Massachusetts Institute of Technology (MIT), USA, 2008.
[8] M. R. Saleem Taha, "Speed Improvements for Dual-Slope A/D Converters," IEEE Transactions on Instrumentation and Measurement, vol. IM-34, no. 4, 1985.
[9] G. Geelen, E. Paulus, D. Simanjuntak, H. Pastoor, and R. Verlinden, "A 90nm CMOS 1.2V 10b Power and Speed Programmable Pipelined ADC with 0.5pJ/Conversion-Step," ISSCC International Solid-State Circuits Conference 2006, Session 12, Nyquist ADCs, 12.1, 2006.
[10] H. C. Choi, Y. J. Kim, S. W. Yoo, S. Y. Hwang, and S.-H. Lee, "A Programmable 0.8-V 10-bit 60-MS/s 19.2-mW 0.13-µm CMOS ADC Operating Down to 0.5 V," IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 55, no. 4, 2008.
Optimization of the Non-local means algorithm using an FPGA implementation
Lucas Lucena Gambarra, José Antônio Gomes de Lima, Hamilton Soares da Silva, Leonardo Vidal Batista, Daniel Soares e Marques
Centro de Informática
Universidade Federal da Paraíba
João Pessoa, Brazil
{lucas, jose, hamilton, leonardo}@di.ufpb.br danielsmarx@gmail.com
Abstract This paper proposes a hardware implementation for

the non-local means algorithm for image denoising with a lower
computation time using pipelines, hardware parallelism and
piecewise linear approximation. It is about 170 times faster than
the original non-local means algorithm, yet produces
comparable results in terms of mean-squared error (MSE) and
perceptual image quality.
I. INTRODUCTION
Image denoising is one of the most important and widely studied problems in image processing and computer vision. The goal is to remove noise effectively while preserving the details of the original image as much as possible. Noise reduction is often required as a preprocessing step for other tasks, such as compression, segmentation, and recognition [1].
In 2005, Buades [2] presented an innovative algorithm for this purpose, known as non-local means (NLM). Several noise-removal approaches had been proposed before NLM, and they always operate locally on the image. NLM, in contrast, uses information encoded in the whole image to remove noise, and is notably more effective than older algorithms. However, its high computational complexity makes it difficult to use in practice, even for relatively small images. Buades [2] suggested using only a search window (windowed NLM) instead of the whole image to perform the filtering. Although this greatly reduces the execution time, the complexity remains high.
In this context, an alternative that reduces the execution time of NLM would be desirable. With this purpose, the present work proposes a dedicated hardware implementation of windowed NLM.
Section II introduces the theory of NLM and analyzes its complexity. Section III presents the denoising flow and the hardware implementation strategy. Synthesis information and the simulation methodology are explained in Section IV. Section V presents the results achieved in hardware and compares them with the software implementation. Finally, Section VI presents the conclusions.
II. THE NON-LOCAL MEANS ALGORITHM
First, the original NLM algorithm is introduced. Next, the version that uses search windows [2] is presented. An analysis of the computational complexity of each version is given at the end.
A. Non-local means [2]
Given a discrete noisy image $v = \{v(i) \mid i \in I\}$, the estimated value $NL[v](i)$ for a pixel $i$ is computed as a weighted average of all the pixels in the image,
$$NL[v](i) = \sum_{j \in I} w(i,j)\, v(j) \qquad (1)$$
where the family of weights $\{w(i,j)\}_j$ depends on the similarity between pixels $i$ and $j$, and satisfies $0 \leq w(i,j) \leq 1$ and $\sum_j w(i,j) = 1$.
The similarity between two pixels $i$ and $j$ depends on the similarity between the gray-level intensity vectors $v(\mathcal{N}_i)$ and $v(\mathcal{N}_j)$, where $\mathcal{N}_k$ denotes a square neighborhood of fixed size ($M \times M$) centered at pixel $k$. This similarity is measured by the weighted Euclidean distance $\|v(\mathcal{N}_i) - v(\mathcal{N}_j)\|_{2,a}^2$, where $a > 0$ is the standard deviation of the Gaussian kernel [3].
Pixels whose neighborhoods have similar gray levels receive larger weights. The weights are defined as
$$w(i,j) = \frac{1}{Z(i)}\, e^{-\frac{\|v(\mathcal{N}_i) - v(\mathcal{N}_j)\|_{2,a}^2}{h^2}} \qquad (2)$$
where $Z(i)$ is the normalizing constant
$$Z(i) = \sum_j e^{-\frac{\|v(\mathcal{N}_i) - v(\mathcal{N}_j)\|_{2,a}^2}{h^2}} \qquad (3)$$
and the parameter $h$ acts as a filtering factor that controls the decay of the exponential function.
NLM does not compare the gray level at a single point. Instead, it compares the geometric configuration of a whole neighborhood, which makes it more robust than other noise filters.
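A plain floating-point model of windowed NLM, equations (1)-(3) restricted to a W × W search window, is useful as a golden reference when checking a hardware version. The NumPy sketch below is such a model and is not the authors' code. Several details are our assumptions:
- reflective padding at the image borders;
- a Gaussian kernel normalized to unit sum for the weighted distance;
- illustrative default parameters.

Its four nested loops make the O(M²N²W²) cost discussed in this section explicit.

```python
import numpy as np


def gaussian_kernel(m, a):
    """Normalized M x M Gaussian kernel with standard deviation a (pixels)."""
    r = m // 2
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    k = np.exp(-(x ** 2 + y ** 2) / (2.0 * a ** 2))
    return k / k.sum()


def nlm_windowed(img, m=7, w=21, h=10.0, a=1.0):
    """Windowed NLM: each output pixel is the weighted mean (1) of the
    pixels in its W x W search window, with weights from (2) and (3)."""
    img = np.asarray(img, dtype=np.float64)
    rm, rw = m // 2, w // 2
    pad = np.pad(img, rm + rw, mode="reflect")   # border handling (assumption)
    kern = gaussian_kernel(m, a)
    out = np.empty_like(img)
    rows, cols = img.shape
    for r in range(rows):
        for c in range(cols):
            pr, pc = r + rm + rw, c + rm + rw    # pixel i in padded coordinates
            ref = pad[pr - rm:pr + rm + 1, pc - rm:pc + rm + 1]       # v(N_i)
            num, z = 0.0, 0.0                    # z accumulates Z(i), eq. (3)
            for qr in range(pr - rw, pr + rw + 1):
                for qc in range(pc - rw, pc + rw + 1):
                    patch = pad[qr - rm:qr + rm + 1, qc - rm:qc + rm + 1]  # v(N_j)
                    d2 = float(np.sum(kern * (ref - patch) ** 2))   # ||.||^2_{2,a}
                    wgt = np.exp(-d2 / h ** 2)   # unnormalized weight, eq. (2)
                    num += wgt * pad[qr, qc]
                    z += wgt
            out[r, c] = num / z                  # normalized mean, eq. (1)
    return out
```

This model is far too slow for large images in pure Python. It is meant for small test images, for example comparing `nlm_windowed(noisy, m=7, w=21, h=...)` pixel by pixel against a hardware simulation output.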
B. Windowed NLM and complexity
Assuming an image of N × N pixels and neighborhoods of size M × M, the complexity of the algorithm is $O(M^2 N^4)$. However, natural images tend to have large uniform areas, where the variance between nearby pixels is generally small. The results in [4] lead to the same conclusion, and even report better performance in uniform areas than the original NLM algorithm. Restricting the search for similar pixels to a window of W × W pixels, the final complexity of the algorithm becomes $O(M^2 N^2 W^2)$.
III. ACCELERATION OF NLM ON FPGA
Even with M = 7 and W = 21 (as suggested by the creators of NLM for filtering gray-level images [2]), the execution time of the software implementation does not fall to reasonable levels. Another immediate way to reduce the complexity would be to decrease the neighborhood size, that is, the value of M (to 5 or even 3). However, approaches that adopt this practice often introduce distortions in the filtered image, because there is not enough information to compute the weights [5].
To make the algorithm fast enough for many applications, the present work proposes a hardware implementation of windowed NLM on an FPGA (Field-Programmable Gate Array) device. Hardware development makes it possible to use pipelines and parallel execution. It also allows the construction of specialized memories that promptly deliver the data needed for the computations.
Hardware development constraints, such as the use of fractional values and computations involving the exponential function, were overcome with integer approximations and piecewise linear approximations, respectively.
Fig. 1 shows the denoising flow of NLM in hardware. Each phase of this flow is detailed below.
Figure 1. Denoising flow for NLM on FPGA.
A. Parallelization of the weighted squared Euclidean distance computation
To improve the performance of NLM, a way to parallelize the computation of the weighted Euclidean distance was devised.
This approach uses 7 × 7-pixel neighborhoods (M = 7), which implies 49 operations to compute the weighted Euclidean distance. These 49 operations are independent of each other and can therefore be performed in parallel. In addition, each operation can be split into a sequence of stages for a pipelined implementation. Once the pipeline is filled, one weighted Euclidean distance is computed per clock cycle.
B. Specialized dynamic memory (SDM) for neighborhoods
After the component that computes the weighted Euclidean distance was built, it became clear that it would not be used effectively unless the data of all 49 pixels were available at every clock cycle. Otherwise, execution would remain sequential, because the component would have to wait for data from the (sequential) memory before operating. A specialized memory supporting simultaneous access to several addresses therefore proved indispensable.
One alternative would be to replicate the window being processed in 25 memories (assuming each allows two reads per clock cycle). However, Fig. 2 shows how the memory positions are accessed when moving from one neighborhood to the next. The light red and red pixels belong to neighborhood n-1. When moving to neighborhood n, the light red pixels are discarded and the dark red pixels become part of neighborhood n. Hence, when moving from one neighborhood to the next, only seven of the 49 pixels are actually new; the remaining 42 only need to be relocated. This observation enabled a more efficient solution.
Figure 2. Two consecutive neighborhoods: neighborhood n-1, centered at pixel n-1; neighborhood n, centered at pixel n.
Fig. 3 shows how the pixels are exchanged when moving from one neighborhood to the next. Blue arrows mark the pixels moved between registers, red arrows the pixels that are discarded, and green arrows the positions that receive the seven new pixels of the new neighborhood. Only seven simultaneous fetches are therefore needed, provided that the other 42 pixels of the previous neighborhood are moved quickly. This guarantees that the data needed to compute the weighted Euclidean distance are always available in time. For this purpose, a specialized dynamic memory (SDM) was developed to allow simultaneous access to the seven memory positions. Replicating the window being processed in only four memories was sufficient. Since each memory allows two values to be read (or written) at the same time, the seven positions can be read simultaneously.
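The register-reuse idea described for Figs. 2 and 3 can be modeled in software to confirm the fetch count. After the initial fill, each one-pixel step moves 42 pixels register-to-register and fetches only 7 new ones. The sketch below is a hypothetical model with names of our own choosing. It slides a 7 × 7 array horizontally along one band of rows and checks every neighborhood against the image. It does not model the four dual-port memories.

```python
import numpy as np


class NeighborhoodWindow:
    """Software model of an M x M register array that slides one pixel to
    the right per step, fetching only M new pixels from memory."""

    def __init__(self, m=7):
        self.m = m
        self.regs = np.zeros((m, m), dtype=np.int64)
        self.reads = 0                       # pixels fetched from memory so far

    def load(self, block):
        """Initial fill: all M*M pixels are fetched."""
        self.regs[:, :] = block
        self.reads += self.m * self.m

    def slide_right(self, new_column):
        """Discard the leftmost column, move the other M*(M-1) pixels one
        position left (register to register), and fetch M new pixels."""
        self.regs[:, :-1] = self.regs[:, 1:].copy()
        self.regs[:, -1] = new_column
        self.reads += self.m


def scan_row(image, m=7):
    """Slide across the top M rows of image, checking each neighborhood;
    returns the total number of pixels fetched."""
    win = NeighborhoodWindow(m)
    win.load(image[:m, :m])
    for c in range(1, image.shape[1] - m + 1):
        win.slide_right(image[:m, c + m - 1])
        if not np.array_equal(win.regs, image[:m, c:c + m]):
            raise RuntimeError(f"register contents differ at column {c}")
    return win.reads
```

For a 7 × 20 strip, there are 14 neighborhoods. They cost 49 + 7·13 fetches instead of 49·14, which is the saving that lets seven parallel reads per cycle keep the distance pipeline busy.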
Figure 3. Pixel exchange from one neighborhood to the next.
C. Window memory
The NLM version proposed in this work uses windows of 21 × 21 (441) pixels. So that every pixel has a valid 7 × 7 neighborhood, even at the window borders, extended windows of 27 × 27 (729) pixels were used. Filtering one pixel requires one neighborhood (49 pixels) and one extended window (729 pixels), both centered on that pixel. Since the input consists of pixel pairs, (729 + 49)/2 = 389 cycles are needed to load the data for filtering one pixel. For each pixel of the (non-extended) window, its weight relative to the neighborhood of the pixel being processed is computed, so 441 cycles are spent in this step. During the weight computation, the memory in use must not be modified. To improve the execution time, a second specialized dynamic memory (SDM) was added. While one memory is used to compute the weights, the other is filled, so data are always available for the weight computation and the pipeline stays full.
Since 441 cycles are spent computing weights and only 389 cycles are needed to store the required data in memory, there is enough time to fill the extra memory. This guarantees that there will always be data to feed the pipeline.
D. Approximation of the exponential function
As can be seen in Equations (2) and (3), NLM makes intensive use of exponentials. However, many processors do not have a hardware exponentiation unit, so the operation is implemented in software by combining lookup tables, floating-point multiplications, and additions. These algorithms are very slow compared with a hardware implementation [6].
An analysis of the algorithm shows that NLM does not depend on the precision of the exponential. What matters is its decay as a function of the Euclidean distance. For a given value of h² (Equations 2 and 3), a few straight lines can approximate the exponential decay, as shown in [7]; see Fig. 4.
Figure 4. Approximation of the decay for h² = 200 using ten straight lines.
The points of each line can be computed as $y = \alpha x + \beta$, where $\alpha$ is the slope and $\beta$ is the intercept. Besides addition and subtraction, multiplication and division can also be implemented with pipelines [8, 9].
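Replacing exp(-d/h²) with a few straight lines can be sketched as follows, for h² = 200 and ten segments as in Fig. 4. Two choices here are our assumptions: uniformly spaced breakpoints, and a cut-off at d = 5h² beyond which the weight is set to zero. The paper does not state how its breakpoints were chosen, and [7] discusses optimal segment placement. Each segment stores a slope and an intercept, so evaluating it costs one multiplication and one addition, as in the line equation above.

```python
import numpy as np


def build_segments(h2=200.0, n_segments=10, cutoff=5.0):
    """Chord (secant) approximation of exp(-d / h2) on [0, cutoff * h2]."""
    xs = np.linspace(0.0, cutoff * h2, n_segments + 1)   # breakpoints
    ys = np.exp(-xs / h2)
    slopes = np.diff(ys) / np.diff(xs)                   # slope of each line
    intercepts = ys[:-1] - slopes * xs[:-1]              # intercept of each line
    return xs, slopes, intercepts


def approx_decay(d, xs, slopes, intercepts):
    """Evaluate slope * d + intercept on the segment containing d;
    distances at or beyond the last breakpoint get weight 0."""
    d = np.asarray(d, dtype=float)
    k = np.clip(np.searchsorted(xs, d, side="right") - 1, 0, len(slopes) - 1)
    y = slopes[k] * d + intercepts[k]
    return np.where(d >= xs[-1], 0.0, y)


if __name__ == "__main__":
    xs, a, b = build_segments()
    d = np.linspace(0.0, 1200.0, 7)
    print(np.round(approx_decay(d, xs, a, b), 4))
    print(np.round(np.exp(-d / 200.0), 4))
```

The largest chord error occurs in the first, steepest segment and is about (segment width)²/8 in units of h², roughly 0.03 here. This fits the observation that only the decay shape, not the exact value, matters for the weights.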
E. Computation of the final pixel value
Once the weights of all pixels are obtained, the final value of the pixel remains to be computed. According to (1), it suffices to compute the weighted average of all the pixels in the window, which is again done with pipelines.
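Since the hardware replaces fractional values with integer approximations, the weighted average of (1) can be modeled with integer weights and a single rounded integer division. In the sketch below, the 8-bit fractional scaling of the weights and the function name are our assumptions, not the paper's parameters. The denominator plays the role of Z(i) in (3).

```python
def weighted_mean_fixed_point(values, weights, frac_bits=8):
    """Weighted mean of integer pixel values, equation (1), with the
    weights quantized to integers scaled by 2**frac_bits."""
    scale = 1 << frac_bits
    q = [round(w * scale) for w in weights]            # integer weights
    num = sum(qi * v for qi, v in zip(q, values))      # sum of w(i,j) * v(j)
    den = sum(q)                                       # integer Z(i)
    if den == 0:
        raise ValueError("all quantized weights are zero")
    return (num + den // 2) // den                     # round to nearest


if __name__ == "__main__":
    print(weighted_mean_fixed_point([10, 20, 30, 200], [1.0, 0.5, 0.25, 0.01]))
```

The floating-point mean of the example is about 16.76, and the integer version returns 17. Differences of this size are the kind of quantization effect that Table I attributes to the integer approximation.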
IV. IMPLEMENTATION AND SIMULATION
The proposed filter was described in the SystemVerilog hardware description language (HDL). The description was compiled and synthesized for the EP3SL50F484C2 device of the Altera Stratix III family using Altera's Quartus II v.10.1 software.
A. Simulation
To validate the functionality, denoising simulations were performed with Mentor Graphics' QuestaSim v.10.0 software. Pixels of the same (noisy) image were sent both to the filter described in SystemVerilog and to another filter written in Java, which was assumed to be correct. The resulting pixels of both filters are compared to verify correctness. This process can be seen in Fig. 5.
Figure 5. Simulation methodology.
V. RESULTS
This section presents the FPGA synthesis results. It also compares the execution time and filtering quality of the FPGA and software versions of windowed NLM.
A. Area and frequency
On Altera's Stratix III EP3SL50F484C2 device, the developed denoising filter occupies 34% of the logic and 2% of the memory. A maximum operating frequency of 104.78 MHz was achieved (clock period of approximately 9.54 ns).
B. Filtering quality
The proposed hardware filter obtained results comparable to software NLM in terms of mean squared error (MSE). Fig. 6 allows a visual comparison between the software and FPGA versions of windowed NLM.
Figure 6. From left to right and top to bottom: original image; image with added noise; result of NLM in software (MSE 28.36); and result of NLM in hardware (MSE 28.10).
Table I and Fig. 7 compare NLM on FPGA with NLM in software in terms of MSE, for noise with different standard deviations. The differences are due to the use of straight lines to approximate the exponential decay and to the approximation of fractional numbers by integers.
TABLE I. MSE COMPARISONS
Method used / noise standard deviation | 10 | 15 | 20
NLM in software | 8.91 | 24.17 | 40.56 | 113.61
NLM on FPGA | 8.89 | 24.68 | 41.42 | 113.71
Figure 7. Plot of Table I.
C. Time performance
As suggested by Buades [2], M = 7 and a 21 × 21-pixel search window were used. The operations related to the weighted Euclidean distance computation are performed in parallel hardware.
Filtering one pixel takes 441 cycles of weight computation, and all other operations are pipelined. This figure, together with the clock period (9.54 ns, obtained in simulation), gives the time needed to filter an image: an image with N² pixels takes 441·N² cycles.
In Table II, experiments on a PC with a Core i5 2.53 GHz processor and 6 GB of RAM showed that software NLM takes approximately 12 minutes to filter a 1024 × 1024-pixel image. The FPGA version proposed here takes approximately 4.4 seconds to filter the same image.
The proposed design is approximately 170 times faster than the software version of the algorithm, as shown in Table II and Fig. 8.
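The FPGA column of Table II can be reproduced from the cycle count alone. The sketch below assumes 441 clock cycles per output pixel and a 9.54 ns clock (about 1/104.78 MHz). It ignores pipeline fill, image I/O, and host communication, which the paper does not quantify. The software times are copied from Table II.

```python
CYCLES_PER_PIXEL = 441        # one weight per pixel of the 21 x 21 window
CLOCK_PERIOD_S = 9.54e-9      # approximately 1 / 104.78 MHz

# Software execution times (s) from Table II, keyed by (width, height)
SOFTWARE_S = {(512, 512): 198.32, (1024, 1024): 740.12, (2592, 1944): 3740.50}


def fpga_time(width, height, cycles_per_pixel=CYCLES_PER_PIXEL,
              period=CLOCK_PERIOD_S):
    """Estimated filtering time in seconds: pixels * cycles per pixel * period."""
    return width * height * cycles_per_pixel * period


if __name__ == "__main__":
    for (w, h), sw in SOFTWARE_S.items():
        t = fpga_time(w, h)
        print(f"{w}x{h}: FPGA {t:.2f} s, speed-up {sw / t:.1f}")
```

The model gives about 1.10 s, 4.41 s, and 21.2 s for the three image sizes, which matches Table II. The ratios printed here differ slightly from the table because the table divides by the rounded FPGA times.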
TABLE II. PERFORMANCE RESULTS
Image size | NLM in software | NLM on FPGA | Ratio
512 × 512 | 198.32 s | 1.10 s | 180.3
1024 × 1024 | 740.12 s | 4.41 s | 167.8
2592 × 1944 | 3740.50 s | 21.2 s | 176.4
Figure 8. Plot of Table II.
VI. CONCLUSION
This work proposes an implementation of the windowed NLM algorithm using special-purpose hardware. The simulation results show that the algorithm runs on average 170 times faster than the software implementation, and that the noise reduction results are similar in both MSE and visual perception.
REFERENCES
[1] P. Coupé, P. Yger, and C. Barillot, "Fast Non Local Means Denoising for 3D MR Images," SpringerLink, 2006.
[2] A. Buades, B. Coll, and J. Morel, "A non-local algorithm for image denoising," IEEE International Conference on Computer Vision and Pattern Recognition, 2005.
[3] Website: http://www.stat.wisc.edu/~mchung/teaching/MIA/reading/diffusion.gaussian.kernel.pdf.pdf
[4] M. Mahmoudi and G. Sapiro, "Fast image and video denoising via nonlocal means of similar neighborhoods," Signal Processing Letters, 12(12):839-842, 2005.
[5] N. Shaham, "Métodos para aceleração do 'non-local means' algoritmo de redução de ruído," M.Sc. dissertation (Informática), 1st ed., Rio de Janeiro: Pontifícia Universidade Católica do Rio de Janeiro, vol. I, 2007.
[6] R. Pottathuparambil and R. Sass, "Implementation of a CORDIC-based Double Precision Exponential Core on an FPGA," Proceedings of the Fourth Annual Reconfigurable Systems Summer Institute (RSSI '08), Urbana, Illinois, USA, July 7-9, 2008.
[7] R. Bellman and R. Roth, "Curve Fitting by Segmented Straight Lines," Journal of the American Statistical Association, vol. 64, no. 327 (Sep. 1969), pp. 1079-1084.
[8] A. Panato, S. Silva, F. Wagner, M. Johann, R. Reis, and S. Bampi, "Design of very deep pipelined multipliers for FPGAs," Design, Automation and Test in Europe Conference and Exhibition, 2004.
[9] N. Takagi, S. Kadowaki, and K. Takagi, "A hardware algorithm for integer division," 17th IEEE Symposium on Computer Arithmetic (ARITH-17), pp. 140-146, 27-29 June 2005.
A Battery Charge Monitor Topology for Implantable

Medical Devices
Márcio Bender Machado1,2, Márcio Cherem Schneider1, Alfredo Arnaud3
1 UFSC - Universidade Federal de Santa Catarina, Brazil
2 IF-Sul - Instituto Federal Sul-Rio-Grandense, Brazil
3 DIE, Facultad de Ingeniería y Tecnologías, Universidad Católica, Uruguay
marciobma@gmail.com, marcio@eel.ufsc.br, aarnaud@ucu.edu.uy
Abstract This work proposes a very low power circuit able to monitor the charge of batteries used in implantable devices through voltage and impedance analysis. The battery monitor is composed of a sample-and-hold circuit and a Gm-C filter, in which the transconductance amplifiers employ series and parallel associations of transistors. In addition, the system contains level-shift circuits that limit the signal to the input range of the subsequent A/D converter (0 V to 1.25 V). The system functionality was verified through both simulation and prototypes implemented in a 0.35 µm technology.
I. INTRODUCTION
Many biomedical applications, such as pacemakers and implantable prostheses, depend on the operation of a battery. Thus, implantable devices need a measuring circuit to determine the battery charge consumed and/or estimate the remaining charge. This information is critical to health professionals, who must decide whether to recommend replacing the device or to postpone a delicate surgical procedure [1]. The circuit that monitors the battery charge is therefore fundamental, and it requires reliability, safety, and very low power consumption.
Several kinds of battery monitor circuits are available in the literature [1]-[6]. In general, most of these circuits use the traditional charge-integration topology shown in Fig. 1, which has continuous and undesirable energy consumption over the lifetime of the battery. This topology integrates the battery current using a passive device such as a resistor. In this example, the value of the consumed charge is stored in a digital memory.
Fig. 1. Traditional topology of charge integration.
Other topologies estimate the residual charge by combining current integration with impedance or voltage measurements throughout the battery lifetime. This works because batteries for implantable devices have physical characteristics, such as voltage and impedance, with well-defined dependencies on the consumed charge [5], [6].
This work proposes an integrated and safe topology that monitors the battery charge of implantable devices from measurements of internal resistance and voltage taken at specific times over the battery lifetime. Traditional current-integration systems consume energy continuously over the lifetime of the battery. The proposed system, in contrast, is on only while a measurement is taken. Therefore, the average power consumption is extremely low.
In our topology, the battery impedance is measured from the variation of the battery voltage caused by a controlled variation of the battery current during a short time slot. A design-oriented current-based MOSFET model [7], [8] has been used to achieve a good trade-off between area and power consumption.
II. THE PROPOSED TOPOLOGY
The proposed system is based on monitoring the impedance and voltage characteristics of the battery at defined time intervals throughout the battery lifetime. To reduce power consumption, we suggest taking the measurements once a day. The extracted values of impedance and voltage are converted into a digital signal by an A/D converter and stored in a memory.
The developed topology is shown in Fig. 2. When switch S1 is closed and both S2 and S3 are open, the battery voltage is stored in Csh1. When S1 is open and both S2 and S3 are closed, the additional consumption of the 5 µA current source (Iforce) causes a drop in the battery voltage, and this value is stored in Csh2. The difference between the voltages on the two capacitors is processed by a differential amplifier. Thus, the internal impedance is inferred by measuring the voltage variation caused by the current variation imposed on the battery.
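The two-phase measurement reduces to Ohm's law on the sampled voltages, R_BAT = (V_Csh1 - V_Csh2)/Iforce, with Iforce = 5 µA as stated above. The sketch below is a minimal numerical illustration. The function names and the example voltages are hypothetical. It also gives the inverse relation, which shows how small a drop the sample-and-hold must resolve.

```python
I_FORCE = 5e-6   # forced current from the text, 5 uA


def battery_impedance(v_csh1, v_csh2, i_force=I_FORCE):
    """Internal impedance from the unloaded (Csh1) and loaded (Csh2) samples."""
    if i_force <= 0:
        raise ValueError("i_force must be positive")
    return (v_csh1 - v_csh2) / i_force


def expected_drop(r_bat, i_force=I_FORCE):
    """Voltage drop that the sample-and-hold must resolve for a given R_BAT."""
    return r_bat * i_force


if __name__ == "__main__":
    print(battery_impedance(3.000, 2.999))   # 1 mV drop with 5 uA forced
    print(expected_drop(200.0))
```

With only 5 µA forced, a 200 Ω battery produces a 1 mV drop. This motivates amplifying and filtering the differential voltage before A/D conversion.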
Fig. 2. The battery charge monitor topology.
Thus, the internal battery impedance RBAT is
$$R_{BAT} = \frac{V_{Csh1} - V_{Csh2}}{I_{force}} \qquad (1)$$
When the measurements are taken, the on/off switch is closed; otherwise, the switch is open and the power consumption of the monitor is negligible. Note that the current source Ib is always on, since it is employed for biasing the other parts of the implantable device. The battery voltage is measured by a level-shift (source follower) circuit.
III. DEVELOPED CIRCUITS
A. The transistor model
The main expressions of the transistor model used in this work are summarized in Table I, where the symbols are defined as follows:
- if(r) is the normalized forward (reverse) saturation current, or the inversion level at the source (drain);
- μ is the mobility;
- n is the slope factor;
- C'ox is the oxide capacitance per unit area;
- φt is the thermal voltage;
- W is the channel width and L is the channel length;
- VT0 is the equilibrium threshold voltage;
- VS, VD, and VG are the source-to-bulk, drain-to-bulk, and gate-to-bulk voltages;
- kB is the Boltzmann constant and T is the absolute temperature;
- N* and Not are technological parameters related to the number of interface traps.
TABLE I. TRANSISTOR MODEL
Variable | Expression
Drain current | $I_D = I_F - I_R = I_S\,(i_f - i_r)$
Specific current | $I_S = \mu n C'_{ox}\,\frac{\phi_t^2}{2}\,\frac{W}{L}$
Source/drain to bulk voltage | $V_P - V_{S(D)} = \phi_t\left[\sqrt{1+i_{f(r)}} - 2 + \ln\left(\sqrt{1+i_{f(r)}} - 1\right)\right]$
Pinch-off voltage | $V_P \cong \frac{V_G - V_{T0}}{n}$
Drain-to-source saturation voltage | $V_{DS,sat} \cong \phi_t\left(\sqrt{1+i_f} + 3\right)$
Source (drain) transconductance | $g_{ms(d)} = \frac{2 I_S}{\phi_t}\left(\sqrt{1+i_{f(r)}} - 1\right)$
Power spectral density of thermal noise (saturated transistor) | $S_{Ith} = \frac{8}{3}\,k_B T\, n\, g_m\,\frac{\sqrt{1+i_f} + 1/2}{\sqrt{1+i_f} + 1}$
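The saturated-transistor relations of the ACM model can be evaluated directly from the inversion level i_f. The sketch below assumes saturation (i_r ≈ 0), so g_m = g_ms/n. It computes I_D, g_ms, g_m, V_DS,sat, and the thermal-noise density. For the noise it uses S_Ith = (8/3)·kB·T·n·g_m·(√(1+i_f) + 1/2)/(√(1+i_f) + 1). We state this all-region form as an assumption. It reduces to the shot-noise value 2qI_D in weak inversion and to (8/3)·kB·T·n·g_m in strong inversion. The parameter values in the usage lines are hypothetical.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant (J/K)
Q_E = 1.602176634e-19   # elementary charge (C)


def thermal_voltage(temp_k=300.0):
    """phi_t = k_B T / q."""
    return K_B * temp_k / Q_E


def specific_current(mu, n, cox, w, l, temp_k=300.0):
    """I_S = mu * n * C'ox * (phi_t^2 / 2) * (W / L)."""
    return mu * n * cox * thermal_voltage(temp_k) ** 2 / 2.0 * (w / l)


def saturated_small_signal(i_f, i_s, n, temp_k=300.0):
    """Small-signal quantities of a saturated transistor (i_r ~ 0)."""
    phi_t = thermal_voltage(temp_k)
    root = math.sqrt(1.0 + i_f)
    i_d = i_s * i_f                              # I_D = I_S (i_f - i_r)
    g_ms = 2.0 * i_s / phi_t * (root - 1.0)      # source transconductance
    g_m = g_ms / n                               # gate transconductance (saturation)
    v_dsat = phi_t * (root + 3.0)                # V_DS,sat
    s_ith = (8.0 / 3.0) * K_B * temp_k * n * g_m * (root + 0.5) / (root + 1.0)
    return {"i_d": i_d, "g_ms": g_ms, "g_m": g_m, "v_dsat": v_dsat, "s_ith": s_ith}


if __name__ == "__main__":
    i_s = specific_current(mu=0.04, n=1.3, cox=4.5e-3, w=1e-6, l=1e-6)  # hypothetical
    print(i_s, saturated_small_signal(i_f=3.0, i_s=i_s, n=1.3))
```

A useful consequence is g_m/I_D = 2/(n·φt·(√(1+i_f) + 1)). This expression is what makes the model convenient for trading transconductance against current when sizing the very low power filter.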
B. The impedance monitor

The impedance monitor consists of a sample-and-hold circuit (S1, S2, Csh1 and Csh2 in Fig. 2), which stores the battery voltage drop due to the forced current (Iforce). This circuit is composed of two poly capacitors and two minimum-area PMOS switches. The small differential voltage stored in the sample-and-hold circuit is amplified by a differential amplifier, which also performs a low-pass filtering function to attenuate the spurious signals generated by the switching circuit. For the filtering and differential amplifier functions, we used the active low-pass OTA-C filter [8], [9] shown in Fig. 3, which offers a good trade-off between accuracy and silicon real estate. Its transfer function is
$$H(s) = \frac{V_{out}}{V_{bat+} - V_{bat-}} = \frac{G_{m1}}{G_{m2}}\cdot\frac{1}{1 + sC_f/G_{m2}} \qquad (2)$$
where Gm1 and Gm2 are the transconductances of the two amplifiers and Cf is the filter capacitor.
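Equation (2) describes a single-pole low-pass filter. Its DC gain is $G_{m1}/G_{m2}$ and its pole is at $G_{m2}/C_f$. The minimal sketch below evaluates both, using the Table II transconductances. It then combines (1) with the DC gain to estimate $R_{BAT}$ from a settled output voltage. The value of $C_f$ is not given in the paper, so the 10 pF used below is hypothetical. `v_out` is assumed to be only the differential part of the output, with any common-mode level already removed.

```python
import math

GM1 = 56e-9  # A/V, Table II
GM2 = 7e-9   # A/V, Table II


def dc_gain(gm1=GM1, gm2=GM2):
    """Low-frequency gain of eq. (2): Gm1 / Gm2."""
    return gm1 / gm2


def cutoff_hz(cf, gm2=GM2):
    """Pole of eq. (2), s = -Gm2 / Cf, i.e. f_c = Gm2 / (2 pi Cf)."""
    return gm2 / (2 * math.pi * cf)


def magnitude(f, cf, gm1=GM1, gm2=GM2):
    """|H(j 2 pi f)| from eq. (2)."""
    s = 2j * math.pi * f
    return abs(gm1 / gm2 / (1 + s * cf / gm2))


def battery_impedance(v_out, i_force, gm1=GM1, gm2=GM2):
    """Eq. (1) with V_Csh1 - V_Csh2 recovered from the settled filter output."""
    return v_out / dc_gain(gm1, gm2) / i_force


cf = 10e-12  # hypothetical filter capacitor
print(dc_gain(), cutoff_hz(cf))  # gain of about 8, cutoff of about 111 Hz
```

With nA/V transconductances, even a 10 pF capacitor gives a time constant $C_f/G_{m2}$ of about 1.4 ms. This shows why small $G_m$ values are needed for low-frequency filtering.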
For low-frequency applications, large time constants and, thus, small transconductance values are required. At the same time, the OTA (operational transconductance amplifier, or simply Gm) at the output must operate at a relatively high differential voltage. To meet the (sometimes conflicting) filter requirements of a large time constant and high linearity, we divided the output current of the OTA (differential pair) through a series-parallel association of transistors [9], [10], as shown in Fig. 4. With this topology, the filter can operate with higher linearity and a small transconductance value, as required.
Fig. 4. OTA circuit that employs series and parallel associations of transistors for current division.
Analyses of noise and offset have been developed for the differential amplifier. Using the equations shown in Table I and references [8] and [9], the thermal (3) and flicker (4) noise of each OTA circuit are given by
$$\overline{V_n^2}_{thermal} \cong \frac{4 n k_B T\,(f_2 - f_1)\left(\sqrt{1+i_{f1}} + 1\right)}{G_m} \qquad (3)$$
$$\overline{V_n^2}_{flicker} \cong \frac{2 n k_B T \ln(f_2/f_1)}{N^{*} C'_{ox}}\left[\frac{N_{ot\_n}}{(WL)_1}\left(\sqrt{1+i_{f1}} + 1\right)^2 \cdots + \cdots \frac{N_{ot\_p}}{(WL)_3}\right] \qquad (4)$$
where f1 is related to the time interval in which the filter is on, and f2 is the bandwidth of the filter.
Using (3) and (4), we can calculate the total noise of the differential amplifier, including both the flicker and thermal noise terms, as
$$\overline{V_n^2}_{Amp} = \overline{V_n^2}_{Ota1} + \left(\frac{G_{m2}}{G_{m1}}\right)^2 \overline{V_n^2}_{Ota2} \qquad (5)$$
For simplicity, only the variations in the threshold voltage, $V_T$, and in the current factor, $\beta$, are considered in the mismatch analysis [11]. Thus
$$\sigma^2_{V_T} = \frac{A_{VT}^2}{2WL}, \qquad \sigma^2_{\beta} = \frac{A_{\beta}^2}{2WL} \qquad (6)$$
where $A_{VT}$ and $A_{\beta}$ are technology parameters. The total mismatch in each OTA is about
$$\sigma^2_{V_{in}} \cong 2\left[\sigma^2_{V_T} + \frac{I_D^2}{g_m^2}\sigma^2_{\beta}\right]_{M1} + 4\left(\frac{g_{m5}}{N g_{m1}}\right)^2\left[\sigma^2_{V_T} + \frac{I_D^2}{g_m^2}\sigma^2_{\beta}\right]_{M5} + 2\left(\frac{g_{m7}}{g_{m1}}\right)^2\left[\sigma^2_{V_T} + \frac{I_D^2}{g_m^2}\sigma^2_{\beta}\right]_{M7} + \cdots \qquad (7)$$
where the first term in square brackets refers to the source-coupled pair (M1 and M2), the second to the series-parallel current mirror (M3, M4, M5 and M6), and the third to the current mirror (M7 and M8) in Fig. 4.
To reduce silicon area, the filter capacitor Cf in Fig. 3 was implemented as a gate capacitor biased in strong inversion. This gives an area around five times smaller than that of a polysilicon capacitor in the same 0.35 μm process. Considering operation in strong inversion and the expressions of Table I, the relationship between the inversion level and the gate capacitance $C_G$ can be written as
$$C_G \cong C'_{ox} W L\left(1 - \frac{1}{n\sqrt{1+i_f}}\right) \qquad (8)$$
Fig. 3. Differential OTA-C filter used to amplify the signal from the impedance monitoring.
The differential amplifier presents a nonzero common-mode output signal. To move this signal into the appropriate range of the A/D converter, the source follower shown in Fig. 5 is used. Using the expressions of Table I, $V_{OUT}$ in Fig. 5 is given by
$$V_{OUT} \cong \frac{V_{IN} - V_{T0}}{n} + K\phi_t \qquad (9)$$
where
$$K = \sqrt{1 + \frac{I_B}{I_{S1}}} - 2 + \ln\left(\sqrt{1 + \frac{I_B}{I_{S1}}} - 1\right) \qquad (10)$$
Fig. 5. Source follower circuit.
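The mismatch model in (6) and the noise combination in (5) reduce to a few lines of arithmetic. The minimal sketch below implements (6), the source-coupled-pair term of (7) and (5). It uses the saturation relation $I_D/g_m = n\phi_t(\sqrt{1+i_f}+1)/2$, which follows from the Table I expressions. The Pelgrom coefficients are hypothetical inputs: the paper does not list values for the AMS 0.35 μm process. $\phi_t$ is assumed to be 25.85 mV.

```python
import math

PHI_T = 0.02585  # thermal voltage at about 300 K, in V (assumed)


def pelgrom_variances(a_vt, a_beta, w, l):
    """Eq. (6): sigma^2_VT = A_VT^2 / (2WL), sigma^2_beta = A_beta^2 / (2WL).

    a_vt in V*m, a_beta as a fraction*m (1 %*um -> 0.01e-6), w and l in m.
    """
    area = w * l
    return a_vt ** 2 / (2 * area), a_beta ** 2 / (2 * area)


def id_over_gm(i_f, n, phi_t=PHI_T):
    """I_D / g_m = n phi_t (sqrt(1 + i_f) + 1) / 2 for a saturated transistor."""
    return n * phi_t * (math.sqrt(1 + i_f) + 1) / 2


def pair_term(a_vt, a_beta, w, l, i_f, n):
    """First term of eq. (7): 2 [sigma^2_VT + (I_D/g_m)^2 sigma^2_beta] for M1-M2."""
    s2_vt, s2_beta = pelgrom_variances(a_vt, a_beta, w, l)
    return 2 * (s2_vt + id_over_gm(i_f, n) ** 2 * s2_beta)


def amplifier_noise(vn2_ota1, vn2_ota2, gm1=56e-9, gm2=7e-9):
    """Eq. (5); default transconductances are the Table II values."""
    return vn2_ota1 + (gm2 / gm1) ** 2 * vn2_ota2


# Hypothetical 10 um x 10 um pair with A_VT = 10 mV*um and A_beta = 1 %*um, weak inversion
sigma_pair = math.sqrt(pair_term(10e-3 * 1e-6, 0.01e-6, 10e-6, 10e-6, 0.1, 1.3))
print(f"pair contribution to sigma(V_in): {sigma_pair * 1e3:.2f} mV")
```

With the Table II values, $(G_{m2}/G_{m1})^2 = 1/64$. The second OTA therefore adds little to the total input-referred noise, and the first OTA dominates (5).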
C. The voltage monitor

To monitor the battery voltage over its useful lifetime, a level-shift circuit is used, which moves the signal into the range of the A/D converter. The voltage monitor shown in Fig. 6 provides the output voltage VOUT = VBAT - 2VGS. The bias current and the aspect ratio of the diode-connected transistors were designed to give a suitable value of VGS.
Fig. 6. Level-shift circuit used to monitor the battery voltage. VBAT is the battery voltage.
IV. EXPERIMENTAL RESULTS
The battery monitor, whose layout is shown in Fig. 7, was implemented in the AMS 0.35 μm technology. For this design we considered the Li/I2 battery [2], the standard for pacemakers. Over the lifetime of a Li/I2 battery, its voltage ranges from 2.8 V to 2.0 V and its impedance from 100 Ω to 10 kΩ. For safety, we took the minimum battery voltage as 2.4 V and the maximum impedance as 5 kΩ. The analog blocks have been designed to provide inputs to the A/D converter ranging from 0 to 1.25 V.
Fig. 7. Layout and micrograph of the battery monitor in the AMS 0.35 μm technology.
The experimental results of the monitor prototype are shown in Fig. 8 and Fig. 9. The battery characteristics were emulated by a power supply connected to a resistor. Fig. 8 shows the test of the impedance monitor circuit for a series resistance ranging from 100 Ω to 10 kΩ. Fig. 9 shows both the experimental and the simulated (BSIM 3v3.1 model) behavior of the voltage monitor for a battery voltage varying from 2.8 V to 2.0 V.
Table II summarizes the main characteristics of the implemented circuit.
Fig. 8. Experimental measurement of the impedance monitor, considering a power supply in series with a resistor (100 Ω to 10 kΩ).
Fig. 9. Experimental and simulated measurement of the voltage monitor circuit as a function of the battery voltage.
The results measured on the test chip confirm that the circuit works. The linearity of the circuit is quite acceptable, especially in the range of interest for the component under measurement (up to 5 kΩ for the Li/I2 battery used in pacemakers). The digital control stage was implemented on an FPGA.
TABLE II. MAIN CHARACTERISTICS OF THE MONITOR CIRCUIT
Variable                                      Value
Gain of the differential amplifier    Gm1     56 nA/V
                                      Gm2     7 nA/V
Voltage shift of the voltage monitor          1.6 V
Total area                                    ~0.15 mm²
Current consumption per day                   ~6.5 nAh
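The daily charge figure in Table II can be converted into an average current and a lifetime budget. The sketch below does that arithmetic. The 10-year lifetime in the usage lines is a hypothetical assumption, not a value from the paper.

```python
HOURS_PER_DAY = 24.0
DAYS_PER_YEAR = 365.25


def average_current(charge_per_day_ah):
    """Average current in A drawn by a block that uses charge_per_day_ah Ah per day."""
    return charge_per_day_ah / HOURS_PER_DAY


def charge_over(years, charge_per_day_ah):
    """Total charge in Ah drawn over the given number of years."""
    return charge_per_day_ah * DAYS_PER_YEAR * years


daily = 6.5e-9  # ~6.5 nAh per day, Table II
print(average_current(daily))   # roughly 0.27 nA on average
print(charge_over(10, daily))   # roughly 24 uAh over a hypothetical 10-year lifetime
```

A pacemaker Li/I2 cell is typically rated in ampere-hours, so the monitor's share of the battery budget is negligible.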
V. CONCLUSION
We have presented a battery monitor circuit. Its main
advantages are the extremely low power consumption and the
simplicity of the method. The monitor functionality has been
experimentally verified on a prototype implemented in a 0.35 μm CMOS technology.
ACKNOWLEDGMENT
The authors would like to acknowledge CNPq and CAPES,
Brazilian agencies for scientific development, and the Genius
Institute of Manaus, Brazil, for the general support of this
work.
REFERENCES
[1] A. E. Zadeh, "A micro-power precision switched-capacitor charge meter system for implantable medical devices," IEEE Midwest Symposium on Circuits and Systems, Lansing, US, pp. 1068-1072, Aug. 2000.
[2] J. G. Webster, Design of Cardiac Pacemakers, New York: IEEE Press, 1995.
[3] F. Silveira, D. Flandre, Low Power Analog CMOS for Cardiac Pacemakers, Boston: Kluwer Academic Publishers, 2004.
[4] L. S. Y. Wong, S. Hossain, A. Ta, J. Edvinsson, D. H. Rivas and H. Nääs, "A very low-power CMOS mixed-signal IC for implantable pacemaker applications," IEEE J. Solid-State Circuits, vol. 39, no. 12, pp. 2446-2456, December 2004.
[5] M. Obel, N. Skldengen, J. Lindber, "Method and circuit for determining the battery status in a medical implant," U.S. Patent 6,748,273, June 2004.
[6] C. R. Rogers, D. R. Merritt, C. L. Schmidt, J. Mukul, "System and method for monitoring power source longevity of an implantable medical device," U.S. Patent 6,901,293, May 2005.
[7] A. I. A. Cunha, M. C. Schneider, C. Galup-Montoro, "An MOS transistor model for analog circuit design," IEEE J. Solid-State Circuits, vol. 33, no. 10, pp. 1510-1519, October 1998.
[8] M. C. Schneider, C. Galup-Montoro, CMOS Analog Design Using All-Region MOSFET Modeling, Cambridge University Press, 2010.
[9] A. Arnaud, R. Fiorelli and C. Galup-Montoro, "Nanowatt, sub-nS OTAs, with sub-10-mV input offset, using series-parallel current mirrors," IEEE J. Solid-State Circuits, vol. 41, no. 9, pp. 2009-2018, September 2006.
[10] C. Galup-Montoro, M. C. Schneider, I. J. B. Loss, "Series-parallel association of FET's for high gain and high frequency applications," IEEE J. Solid-State Circuits, vol. 29, no. 9, pp. 1094-1101, September 1994.
[11] M. J. M. Pelgrom, A. C. J. Duinmaijer, A. P. G. Welbers, "Matching properties of MOS transistors," IEEE J. Solid-State Circuits, vol. 24, no. 5, pp. 1433-1440, October 1989.
A Schrödinger-Poisson CAD tool for calculation of tunneling through different gate-oxide potential profiles
Gabriela A. Rodríguez, Arturo Sarmiento and Edmundo Gutierrez
National Institute of Astrophysics, Optics and Electronics
Luis Enrique Erro No. 1, Puebla, 72840, Mexico
Emails: {gardz, jarocho, edmundo}@inaoep.mx
Abstract: The modeling of nano-scale MOSFETs involves a self-consistent solution of the Schrödinger-Poisson equations. In this paper, we introduce a CAD tool that allows the user to specify any potential profile through a piece-wise defined formulation. In this way, the user can not only incorporate the most commonly used potential profiles found in MOSFET gate oxides, but also experiment with new profiles that result from complex nano-scale multiple-stacked oxides. A comparison of gate current densities for different potential profiles is given.
I. INTRODUCTION
Nowadays, to obtain better device performance and higher integration density, scaling the MOSFET down to nanometer dimensions is an inevitable trend. At this scale, various quantum effects come to dominate device performance. For instance, gate-oxide tunneling and channel energy quantization are quantum mechanical effects that affect MOSFET performance [1]. As devices shrink, the gate oxide thickness is scaled to nanometer dimensions and the substrate is heavily doped to avoid short-channel effects. As a direct consequence, there is a high electric field at the silicon-oxide interface, which results in gate-oxide tunneling. In addition, the quantum well formed in the inversion layer confines the inversion channel under the interface. Therefore, classical mechanics cannot adequately describe charge transport in a nano-scale MOSFET, and quantum mechanics has to be taken into account. In confined quantum systems such as this one, the coupled Schrödinger-Poisson system of equations is used to model the charge transport of nano-scale MOSFETs.
In the last two decades, several research groups have proposed different numerical methods to solve the coupled Schrödinger-Poisson equations [2]-[8]. However, these numerical solvers support only a small variety of potential profiles at the gate oxide-substrate interface, such as symmetric, square, asymmetric and parabolic profiles. To tackle the new effects arising at nano-scale interfaces, where multiple gate-oxide stacks are used, a wider variety of gate-oxide potential barrier profiles is therefore required.
In this work, a CAD tool has been developed that allows the user to supply any type of potential profile. It can be used as a preliminary design tool as well as for research and educational purposes.
The mathematical and numerical models are presented in Sections II and III, respectively. In Section IV, simulations assuming different potential profiles are shown, and finally, in Section V, some conclusions are drawn.
II. SCHRÖDINGER AND POISSON EQUATIONS
The time-independent Schrödinger equation describes the motion of an electron (see Eq. 1), and the Poisson equation describes the relation between the space charge and the electrostatic potential (see Eq. 2).
$$\left[-\frac{\hbar^2}{2m^*}\frac{d^2}{dz^2} + V(z)\right]\psi = E\psi \qquad (1)$$
$$\frac{d^2\phi}{dz^2} = -\frac{q}{\varepsilon}\left(N_d - n(z)\right) \qquad (2)$$
In the Schrödinger equation (Eq. 1), $\psi$ is the wave function, E is the energy eigenvalue, V(z) is the potential energy and $m^*$ is the effective mass. In the Poisson equation (Eq. 2), $\varepsilon$ is the dielectric constant, $\phi$ is the electrostatic potential, n(z) is the electron density distribution and $N_d$ is the ionized donor concentration.
The carrier density n(z) is determined from the solution of the Schrödinger equation, using the wave functions $\psi_i$ of the subbands i:
$$n(z) = \sum_i \frac{m^* k T}{2\pi\hbar^2}\,|\psi_i(z)|^2 \ln\left[1 + \exp\left(-\frac{E_i}{kT}\right)\right] \qquad (3)$$
where k is the Boltzmann constant, T is the temperature and $\hbar$ is the reduced Planck constant.
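Equation (3) is a weighted sum of subband probability densities. The minimal sketch below evaluates it on a mesh. Assumptions: SI units, normalized wave functions given in m^(-1/2), and subband energies $E_i$ measured from the Fermi level, which is taken as 0 J. The form $\ln(1+e^{x})$ is evaluated in an overflow-safe way. The function names are hypothetical and do not come from the authors' tool.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J*s
KB = 1.380649e-23        # Boltzmann constant, J/K
M0 = 9.1093837015e-31    # free-electron mass, kg


def softplus(x):
    """ln(1 + e^x) without overflow for large positive x."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def electron_density(psis, energies, m_eff=0.25 * M0, temperature=300.0):
    """Eq. (3): n(z_j) = sum_i m* k T / (2 pi hbar^2) |psi_i(z_j)|^2 ln(1 + exp(-E_i / kT)).

    psis[i][j]  : normalized wave function of subband i at mesh point j (m^-1/2)
    energies[i] : E_i in J, relative to the Fermi level (assumed at 0 J)
    returns     : electron density at each mesh point (m^-3)
    """
    kt = KB * temperature
    prefactor = m_eff * kt / (2 * math.pi * HBAR ** 2)  # m^-2
    density = [0.0] * len(psis[0])
    for psi, e_i in zip(psis, energies):
        occupation = softplus(-e_i / kt)
        for j, value in enumerate(psi):
            density[j] += prefactor * abs(value) ** 2 * occupation
    return density
```

Subbands far above the Fermi level contribute almost nothing, so in practice the sum in (3) only needs the lowest few eigenstates.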
Thus, the electron density is obtained by solving the Schrödinger equation and is then introduced as an input initial condition into the Poisson equation. Because the charges redistribute, a new potential is calculated by solving the Poisson equation, and the Schrödinger equation is solved again with this new potential, in an iterative manner.
The electrostatic potential $\phi$ is related to the potential energy in the Schrödinger equation by equation (4), where $E_c$ is the potential energy profile due to the band offset at the oxide-substrate interface:
$$V(z) = -q\phi(z) + E_c(z) \qquad (4)$$
The main contribution of this work is that the potential profile $E_c$ can be represented by a piece-wise function, which allows the user to define a wider selection of profiles. During the discretization procedure, the electrostatic potential $\phi$ and the potential energy profile $E_c$ must be evaluated at each point of the mesh.
The coupling between the Schrödinger and Poisson equations is an iterative procedure that stops when a convergence criterion is fulfilled, namely when the difference between the potential energy values in successive iterations becomes smaller than a certain tolerance:
$$\left|V(z)_{i+1} - V(z)_i\right| < \text{tolerance}$$
Once the convergence criterion is reached, the current density is calculated using equation (5):
$$J(z) = \frac{q\hbar}{m^*}\,\mathrm{Im}\left(\psi^*\frac{d\psi}{dz}\right) \qquad (5)$$
III. NUMERICAL MODEL
To solve the Schrödinger and Poisson equations, they are discretized using the central finite-difference approach:
$$\frac{df(z)}{dz} \approx \frac{f(z_{j+1}) - f(z_{j-1})}{2\Delta z} \qquad (6)$$
$$\frac{d^2 f(z)}{dz^2} \approx \frac{f(z_{j-1}) - 2f(z_j) + f(z_{j+1})}{\Delta z^2} \qquad (7)$$
In this case, the cross section is assumed to be uniform in the x and y axes, with periodic boundary conditions; that is, the structure is discretized along the z-direction with a uniform mesh step, as depicted in Fig. 1.
Fig. 1. Discretization in z-axis of nano-scale MOSFET.
After substituting the finite-difference schemes (6) and (7) into equations (1) and (2), we obtain
$$-\frac{\hbar^2}{2m^*\Delta z^2}\left[\psi_{j-1} - 2\psi_j + \psi_{j+1}\right] + V_j\psi_j = E\psi_j \qquad (8)$$
and
$$\frac{\phi_{j-1} - 2\phi_j + \phi_{j+1}}{\Delta z^2} = -\frac{q}{\varepsilon}\left(N_d - n(z_j)\right) \qquad (9)$$
respectively, where j indicates the position along the z-axis and $\Delta z$ is the step size. Similarly, the current density is represented by its finite-difference equivalent, as shown in equation (10):
$$J_j = \frac{q\hbar}{m^*}\sum_i \mathrm{Im}\left[\psi^*_{i,j}\,\frac{\psi_{i,j+1} - \psi_{i,j-1}}{2\Delta z}\right] \qquad (10)$$
In addition, to avoid non-convergence, a damping factor w = 0.1 is used to calculate the potential V(z) used in the Schrödinger equation (see Eq. 4) [5], [7]:
$$V(z)_i = V(z)_i + w\left(V(z)_{i+1} - V(z)_i\right) \qquad (11)$$
IV. SIMULATIONS AND RESULTS
To analyze the impact of different potential profiles on the solution of the Schrödinger-Poisson system, simulations were carried out using the following profiles: Square (see Fig. 2(a)), Step (see Fig. 3(a)), Parabolic (see Fig. 4(a)), Gaussian (Pöschl-Teller) (see Fig. 5(a)), the potential well formed between a high-k dielectric and the silicon interface [9] (see Fig. 6(a)), and a triangular potential well [10] (see Fig. 7(a)).
To carry out the simulations, the boundary values for Vg are set as
$$V_g|_{z=0} = 0 \quad \text{and} \quad V_g|_{z=L} = V_G$$
where $V_G$ is the electric potential applied to the gate. In addition, the initial values of $V_D$ and $V_S$ are zero. Dielectric constants $\varepsilon_{SiO_2} = 4$ and $\varepsilon_{HfO_2} = 25$, and an effective mass $m^* = 0.25m_0$, were assumed in all cases. The oxide thickness is assumed to be approximately 2 nm and the channel thickness 4-5 nm. For the high-k dielectric potential, an equivalent oxide thickness EOT = 2 nm was assumed [11]. In all cases, a uniform mesh of 0.2 nm is used.
Figs. 2(a)-7(a) show how the potential bends when a voltage VG = 0 is applied, while Figs. 2(b)-7(b) show the variation of the electron density when VG = 0 is applied.
Fig. 2. Square profile. (a) shows the bending of the potential when Vg = 0 and (b) shows the electron density.
Fig. 3. Step profile. (a) shows the bending of the potential when Vg = 0.
Fig. 4. Parabolic profile. (a) shows the bending of the potential when Vg = 0.
Fig. 5. Gaussian profile. (a) shows the bending of the potential when Vg = 0.
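The piece-wise profile idea and the discretized Schrödinger equation (8) can be sketched with the standard library alone. The minimal sketch below evaluates a piece-wise $E_c(z)$ on a uniform mesh. It builds the tridiagonal matrix of (8) with $\psi = 0$ just outside the mesh (hard walls, an assumption) and finds the lowest energies by Sturm-sequence bisection. Bisection is a standard eigenvalue technique chosen here for illustration; the paper does not say which eigen-solver the tool uses. Energies are in eV, lengths in nm, and the 3.1 eV barrier in the usage lines is a hypothetical example.

```python
import math

HBAR2_OVER_2M0 = 0.0380998  # hbar^2 / (2 m0) in eV*nm^2


def piecewise_profile(z_points, pieces, default=0.0):
    """Evaluate a piece-wise Ec(z); pieces = [(z_start, z_end, f)], f(z) in eV on [z_start, z_end)."""
    profile = []
    for z in z_points:
        value = default
        for z_start, z_end, f in pieces:
            if z_start <= z < z_end:
                value = f(z)
                break
        profile.append(value)
    return profile


def sturm_count(diag, off, x):
    """Number of eigenvalues below x of a symmetric tridiagonal matrix with constant off-diagonal."""
    count, q = 0, 1.0
    for j, a in enumerate(diag):
        q = (a - x) - (off * off / q if j > 0 else 0.0)
        if q == 0.0:
            q = 1e-300  # nudge away from an exact zero pivot
        if q < 0.0:
            count += 1
    return count


def fd_eigenvalues(v, dz, m_rel, n_states, tol=1e-12):
    """Lowest n_states eigenvalues (eV) of eq. (8); v = V_j in eV, dz in nm, m_rel = m*/m0."""
    t = HBAR2_OVER_2M0 / (m_rel * dz * dz)              # hbar^2 / (2 m* dz^2) in eV
    diag = [2.0 * t + vj for vj in v]
    lower, upper = min(diag) - 2.0 * t, max(diag) + 2.0 * t  # Gershgorin bounds
    energies = []
    for k in range(n_states):
        lo, hi = lower, upper
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if sturm_count(diag, -t, mid) > k:
                hi = mid  # more than k eigenvalues below mid: E_k < mid
            else:
                lo = mid
        energies.append(0.5 * (lo + hi))
    return energies


def infinite_well_fd(n_pts, dz, m_rel, k):
    """Exact k-th (0-based) eigenvalue of the discrete particle-in-a-box, for validation."""
    t = HBAR2_OVER_2M0 / (m_rel * dz * dz)
    return 2.0 * t * (1.0 - math.cos((k + 1) * math.pi / (n_pts + 1)))


if __name__ == "__main__":
    dz, m_rel = 0.2, 0.25                    # 0.2 nm mesh and m* = 0.25 m0, as in Section IV
    z = [dz * (j + 1) for j in range(40)]    # 8 nm of interior mesh points
    ec = piecewise_profile(z, [(0.0, 2.0, lambda zz: 3.1)])  # hypothetical oxide barrier
    print(fd_eigenvalues(ec, dz, m_rel, n_states=3))
```

Bisection needs only the sign pattern of the pivots, so no matrix library is required, and any profile that `piecewise_profile` can express is handled in the same way.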
It is interesting to note that, in some cases, when a voltage VG is applied (e.g., VG = 1.0 V, Fig. 6(b)), the electron density redistributes into two potential wells. This happens because the bending of the potential forms an extra potential well. The electron densities provided by our CAD tool are similar to those reported in [3], [4], [7], specifically for the square profile.
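The densities discussed above come out of the self-consistent loop of Sections II and III. The minimal sketch below shows the loop's Poisson step (9), solved with the Thomas tridiagonal algorithm and Dirichlet values at both ends. It also shows the damped update (11) and a max-norm version of the convergence test. The choice of the Thomas algorithm and of the max-norm are assumptions made for illustration; inputs are in SI units.

```python
Q = 1.602176634e-19       # elementary charge, C
EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m


def solve_poisson(n, nd, dz, eps_r, phi_left, phi_right):
    """Eq. (9) on the interior points, with phi fixed to phi_left / phi_right at the ends.

    n, nd in m^-3, dz in m; returns phi (V) at each interior point.
    """
    eps = eps_r * EPS0
    size = len(n)
    # phi_{j-1} - 2 phi_j + phi_{j+1} = -(q dz^2 / eps) (N_d - n_j)
    rhs = [-Q * dz * dz / eps * (nd[j] - n[j]) for j in range(size)]
    rhs[0] -= phi_left
    rhs[-1] -= phi_right
    # Thomas algorithm: sub- and super-diagonal = 1, diagonal = -2
    c_prime = [0.0] * size
    d_prime = [0.0] * size
    c_prime[0] = 1.0 / -2.0
    d_prime[0] = rhs[0] / -2.0
    for j in range(1, size):
        denom = -2.0 - c_prime[j - 1]
        c_prime[j] = 1.0 / denom
        d_prime[j] = (rhs[j] - d_prime[j - 1]) / denom
    phi = [0.0] * size
    phi[-1] = d_prime[-1]
    for j in range(size - 2, -1, -1):
        phi[j] = d_prime[j] - c_prime[j] * phi[j + 1]
    return phi


def damped_update(v_old, v_new, w=0.1):
    """Eq. (11): move only a fraction w of the way towards the new potential."""
    return [a + w * (b - a) for a, b in zip(v_old, v_new)]


def converged(v_old, v_new, tol):
    """Convergence test on successive potential-energy iterates (max-norm, assumed)."""
    return max(abs(b - a) for a, b in zip(v_old, v_new)) < tol
```

A small damping factor such as w = 0.1 slows each iteration but prevents the charge and potential from oscillating between iterations. This is the non-convergence the text refers to.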
In addition, the probability current density was calculated using equation (5). Fig. 8 shows the current density for each of the potential profiles. For the profiles with a narrower channel (e.g., Figs. 2(a) and 3(a), 4 nm), the voltage required to tunnel through the barrier is lower than for wider channels (Figs. 4(a) and 5(a), 5 nm). Owing to the electrical properties of the high-k dielectric, the voltage required to tunnel through the barrier is lower for that profile even though its channel is wider (5 nm).
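The finite-difference current density (10) needs only the complex wave-function samples on the mesh. The minimal sketch below evaluates (10) for one state; summing its output over the occupied states gives the full sum in (10). It uses SI units and assumes $m^* = 0.25m_0$ as in the simulations. A plane wave $e^{ikz}$ is a convenient check, since (10) then gives exactly $(q\hbar/m^*)\sin(k\Delta z)/\Delta z$, which tends to $q\hbar k/m^*$ on a fine mesh.

```python
import cmath
import math

HBAR = 1.054571817e-34   # J*s
Q = 1.602176634e-19      # C
M0 = 9.1093837015e-31    # kg


def current_density(psi, dz, m_eff=0.25 * M0):
    """Eq. (10) for one state: J_j = (q hbar / m*) Im[psi_j^* (psi_{j+1} - psi_{j-1}) / (2 dz)].

    psi: complex samples on a uniform mesh; dz in m. Returns J at interior points j = 1..N-2.
    """
    pref = Q * HBAR / m_eff
    return [pref * (psi[j].conjugate() * (psi[j + 1] - psi[j - 1]) / (2 * dz)).imag
            for j in range(1, len(psi) - 1)]


if __name__ == "__main__":
    k, dz = 1e9, 1e-11  # wave number 1/nm on a 0.01 nm mesh
    plane_wave = [cmath.exp(1j * k * j * dz) for j in range(50)]
    j_num = current_density(plane_wave, dz)[0]
    print(j_num, Q * HBAR * k / (0.25 * M0), math.sin(k * dz) / (k * dz))
```

A purely real wave function, such as a bound state in a closed well, gives zero current. The tunneling current therefore comes from the complex, propagating part of the solution.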
In summary, potential profiles play a significant role in the current density. Therefore, to obtain a more realistic representation of the behavior of a nano-scale MOSFET, the potential profiles must be taken into account.
V. CONCLUSIONS AND FUTURE WORK
This paper has presented a CAD tool for solving the Schrödinger and Poisson equations. The tool accepts any potential profile as input and can be used as a fast, preliminary tool for research and design applications. Moreover, the current densities obtained with different potential profiles were compared. The results show that changing the potential profile produces a significant difference in the current density. Hence, more accurate simulations require more realistic potential profiles.
As future work, we propose to extend this work to two dimensions, which will allow the user to define any potential profile along the channel length and width axes of the nano-scale MOSFET. The calculation tool is being calibrated against experimental results, which will be presented in the near future.
Fig. 6. High-k dielectric profile. (a) shows the bending of the potential when Vg = 0 and (b) shows the electron density.
Fig. 7. Triangular profile. (a) shows the bending of the potential when Vg = 0 and (b) shows the electron density.
Fig. 8. Probability current density.
REFERENCES
[1] Trellakis A., Andlauer T. and Vogl P. Efficient Solution of the Schrödinger-Poisson Equations in Semiconductor Device Simulations. Lecture Notes in Computer Science 3743, pp. 602-609, 2006.
[2] Tan I-H., Snider G. L., Chang L. D. and Hu E. L. A self-consistent solution of Schrödinger-Poisson equations using a nonuniform mesh. Journal of Applied Physics, Vol. 68, No. 8, October 1990.
[3] Lo S. H., Buchanan D. A. and Taur Y. Modeling and characterization of quantization, polysilicon depletion, and direct tunneling effects in MOSFETs with ultrathin oxides. IBM Journal of Research and Development, Vol. 43, No. 3, May 1999.
[4] Abramo A., Cardin A., Selmi L. and Sangiorgi E. Two-Dimensional Quantum Mechanical Simulation of Charge Distribution in Silicon MOSFETs. IEEE Trans. on Electron Devices, Vol. 47, No. 10, October 2000.
[5] Driskill S. Heterostructure Device Simulations using a Self-Consistent Schrödinger-Poisson Formulation. REU Conference, 2004.
[6] Curatola G., Doornbos G., Loo J., Ponomarev Y. and Iannaccone G. Detailed Modeling of Sub-100-nm MOSFETs Based on Schrödinger DD Per Subband and Experiments and Evaluation of the Performance Gap to Ballistic Transport. IEEE Transactions on Electron Devices, Vol. 52, No. 8, 2005.
[7] Datta S. Quantum Transport: Atom to Transistor. Cambridge University Press, 2005.
[8] Karner M., Gehring A., Holzer S., Pourfath M., Wagner M., Goes W., Vasicek M., Baumgartner O., Kernstock C., Schnass G., Zeiler G., Grasser T., Kosina H. and Selberherr S. A Multi-Purpose Schrödinger-Poisson Solver for TCAD Applications. Journal of Computational Electronics, Vol. 6, pp. 179-182, 2007.
[9] Darbandy G., Ritzenthaler R., Lime F., Garduño I., Estrada M., Cerdeira A. and Iñiguez B. Analytical modeling of direct tunneling current through gate stacks for the determination of suitable high-k dielectrics for nanoscale double-gate MOSFETs. IOP Semiconductor Science and Technology, Vol. 26, 2011.
[10] Chang L., Yang K., Yeo Y.-C., Choi Y.-K., King T.-J. and Hu C. Reduction of direct-tunneling gate leakage current in double-gate and ultra-thin body MOSFETs. International Electron Devices Meeting (IEDM) Technical Digest, pp. 521-524, 2001.
[11] Zhao Y. and White M. H. Modeling of Direct Tunneling Current through Interfacial Oxide and High-K Gate Stacks. Solid-State Electronics, Vol. 48, Issue 10-11, 2004.
AN EDUCATIONAL TOOL FOR TEACHING SIMULATED ANNEALING AND PLACEMENT
Tania Mara Ferla, Guilherme Flach, Ricardo Reis
PGMICRO / PPGC - Instituto de Informática
Av. Bento Gonçalves 9500
Porto Alegre - Brazil
{tmferla, gaflach, reis}@inf.ufrgs.br
Abstract: In this work we present an educational and interactive tool that can be used as teaching material for the Simulated Annealing (SA) algorithm. SA is applied to the circuit placement problem and seeks near-optimal solutions for NP-complete problems. The interface provides a visualization of the SA steps, which makes the tool more interactive and didactic. The tool was developed with web standards (HTML 5, CSS and JavaScript), which makes it accessible on several platforms.
Keywords: VLSI teaching tools, Simulated Annealing (SA), placement, algorithms for physical synthesis.
I. INTRODUCTION
The design of an integrated circuit must take several objectives into account, such as area, speed, power dissipation, design time, testability and total cost. To achieve these objectives, several steps of layout synthesis must be considered, such as partitioning, placement and routing. Placement defines the positions of the cells of a circuit without overlapping them, while trying to minimize the length of the connections between cells [1]. Placement methods can be divided into three main categories: (1) simulated annealing; (2) partitioning; and (3) analytical methods.
SA [4] is known to provide near-optimal solutions, but at a high runtime cost. As a result, even relatively small circuits (tens of thousands of cells) cannot be handled entirely with SA in a feasible time. Nevertheless, SA has a place in hybrid solutions such as the Dragon placer [5], which combines partitioning and SA. SA can be used to solve smaller problems derived from the complete problem.
II. TEACHING TOOLS FOR VLSI
With the spread of distance learning, more and more web tools are being developed that give dynamic access to many kinds of content over the Internet. Microelectronics is no exception: web technologies can be used not only for distance learning but also to improve and support classroom learning. Several applets currently exist that visualize algorithms widely used in VLSI design, for example [6]. They show step by step how these algorithms work, which helps students grasp their dynamics more quickly. Online collections of resources for VLSI, physics and mathematics are also available, for example [6], [7], [8] and [9].
In this work, we present a parameterizable and interactive placer based on Simulated Annealing. With this placer, we aim to provide a didactic and interactive tool that can be used as teaching material for SA and placement and that gives a better understanding of how the algorithm works.
The tool runs in any web browser that supports HTML5, so it can be used on various electronic devices, such as computers and tablets, without installation.
This paper is organized as follows: Section II presents VLSI teaching tools; Section III presents the basic concepts of SA. The implementation of the algorithm is detailed in Section IV. Section V describes the SA interface. Section VI presents the results of the tests performed on the algorithm, and Section VII presents the conclusions of this work.
III. SIMULATED ANNEALING
SA is an optimization metaheuristic proposed in 1983 by Kirkpatrick et al. [12], based on a search method proposed by Metropolis et al. [13] in 1953. SA is used to solve various NP-complete problems, mainly in EDA, but also in many other areas such as telecommunications, biology, geology, computer science and medicine [14].
SA is based on an analogy with annealing, a thermodynamic process used in metallurgy to treat metals. The process basically consists of raising the temperature of the metal to a maximum value and then cooling it slowly until it solidifies again. Analogously, the algorithm ends when the temperature reaches a value close to zero, at which point the probability of accepting a worse solution is almost nonexistent.
Following these principles, the SA algorithm is defined by three main functions: the perturbation function, the cost function, and the function that decides whether to accept the perturbation. The code below [3] is written here as a runnable Python function in which perturbação returns a perturbed copy of the placement and custo returns its cost:
import math
import random

def simulated_annealing(estado, perturbação, custo, temp, valor_final, alfa=0.99):
    custo_atual = custo(estado)
    while temp > valor_final:
        temp *= alfa                                  # temp *= 0.99 (geometric cooling)
        candidato = perturbação(estado)               # perturbação(): move one cell
        novo_custo = custo(candidato)
        deltacost = novo_custo - custo_atual
        if novo_custo < custo_atual or random.random() < math.exp(-deltacost / temp):
            estado, custo_atual = candidato, novo_custo   # aceita()
        # otherwise rejeita(): the candidate is discarded and the placement is unchanged
    return estado, custo_atual
Where:
temp: temperature (or number of iterations)
faz_perturbação(): the function that moves a cell, i.e., performs the perturbation
novo_custo: the new cost, obtained after the changes are made, the cost being the HPWL
custo_atual: the cost before the perturbation
rand: a random number between 0 and 1
deltacost: new cost - old cost
aceitar(): accepts the perturbation
rejeitar(): rejects the perturbation
If the solution generated by the perturbation is better than the previous one, the perturbation is kept. Otherwise, a random number in the interval [0,1] is generated and, if it is smaller than (1), the perturbation is kept.
$$e^{-\Delta/T} \qquad (1)$$
The acceptance function (1) depends on two variables: $\Delta$ and T. $\Delta$ is the cost variation produced by the perturbation, and T is the temperature, which must be adapted to the circuit size: the larger the circuit, the higher the temperature should be.
The higher the temperature, the larger the value of (1) and, consequently, the greater the chance that a perturbation is accepted. Similarly, the larger $\Delta$ (i.e., the larger the cost increase), the smaller the chance of acceptance.
At high temperatures, SA behaves like a random algorithm: it makes random perturbations and accepts almost all of them. At low temperatures, however, it behaves like a greedy algorithm and accepts only the perturbations that improve the current state. The greedier it becomes, the higher the probability of getting stuck in a local minimum, since it only accepts better results.
In the implemented placer, the perturbation function randomly chooses a cell and moves it to a position that is also chosen at random. The cost function uses the half-perimeter wire length (HPWL) [10] as the cost value.
The HPWL is an estimate of the wire length needed to connect the pins of a net. It is computed as the half-perimeter of the smallest rectangle that encloses all the nodes of the net.
Fig. 1(a) shows the initial distribution of the cells and their connections, and Fig. 1(b) shows the final result after the simulation. The figure shows the initial HPWL (a), before SA placement, and the resulting placement, HPWL and overflow after placement (b).
Fig. 1. (a) Initial HPWL = 45825 and (b) final HPWL = 5065.
Fig. 1 shows that, to find the smallest HPWL, SA brings connected cells closer together. In the simulation shown in Fig. 1, the overflow (cell overlap) started at 39 and decreased to 0.
IV. IMPLEMENTATION
The SA of this work was developed for the web, using JavaScript and HTML5. The jQuery package was also used to draw the bars of some parameters that can be changed dynamically, such as the HPWL, overflow and temperature.
The parameters defined for this algorithm were:
- cells have random sizes between 1 and 5;
- cells may overlap, generating overflow; a weight (importance value) is then defined and used when the cost is computed, so that overflow takes priority over HPWL;
- the HPWL weight defaults to 1, but the user can change it as desired and thus observe the effect of the change on placement performance;
- the overflow weight defaults to 1000, but the user can also change it;
- the temperature is defined by the user.
Fig. 2. Flowchart of the main SA function.
The main SA function is basically defined by Fig. 2. Its steps are (following Fig. 2):
- first, a cell is selected at random;
- then the current cost is computed as pesoHPWL*hpwl_atual + pesoOverflow*overflow_atual, where pesoHPWL and pesoOverflow are the importance weights of the HPWL and the overflow;
- the cell is perturbed: the selected cell is moved to a randomly chosen location;
- the cost is computed again with the updated HPWL and overflow values;
- the delta cost is computed as the difference between the new cost and the old cost;
- finally, if the delta cost is less than 0 (meaning an improvement in the result), or if a random number between 0 and 1 is smaller than (1), the perturbation is accepted; otherwise, it is rejected and the move (perturbation) is undone.
V. INTERFACE
One of the goals of this work was to develop an interface that makes it easy to visualize the execution of the placement produced by SA [11], as shown in Fig. 3.
Fig. 3. SA interface.
Fig. 3 shows that some default values were set for the simulation, such as the temperature, the number of inner iterations, and the HPWL and overflow weights. However, the interface allows the user to change these defaults by entering the number of cells, connections, temperature and number of inner iterations to execute. The number of inner iterations is the number of times the algorithm runs before the screen is redrawn. For example, if it is set to 1, the user can follow the placement cell by cell by clicking the "Parar" (stop) and "Rodar" (run) buttons, i.e., running SA step by step. In a simulation with few cells, it is possible to see exactly how each cell moves, whether or not the move causes overflow, and how the HPWL changes.
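The quantities the interface displays (HPWL, overflow and the weighted cost pesoHPWL*hpwl_atual + pesoOverflow*overflow_atual from Section IV) can be sketched as follows. HPWL follows the definition in Section III. The overflow metric is a hypothetical definition chosen for illustration: cells occupy `size` consecutive grid slots in their row, and every extra occupant of a slot counts as one unit of overflow. The paper does not specify its exact metric.

```python
from collections import Counter


def hpwl(net, positions):
    """Half-perimeter of the smallest rectangle enclosing all cells of the net."""
    xs = [positions[cell][0] for cell in net]
    ys = [positions[cell][1] for cell in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def overflow(positions, sizes):
    """Hypothetical overlap metric: extra occupants per grid slot, summed over the grid."""
    occupancy = Counter()
    for cell, (x, y) in positions.items():
        for dx in range(sizes[cell]):  # a cell of size s covers s slots in its row
            occupancy[(x + dx, y)] += 1
    return sum(count - 1 for count in occupancy.values() if count > 1)


def placement_cost(nets, positions, sizes, pesoHPWL=1, pesoOverflow=1000):
    """Weighted cost from Section IV; the default weights are the tool's defaults."""
    total_hpwl = sum(hpwl(net, positions) for net in nets)
    return pesoHPWL * total_hpwl + pesoOverflow * overflow(positions, sizes)


positions = {"a": (0, 0), "b": (3, 4), "c": (1, 0)}
sizes = {"a": 2, "b": 1, "c": 1}
nets = [["a", "b"], ["a", "b", "c"]]
print(placement_cost(nets, positions, sizes))  # cell c overlaps the second slot of a
```

With the default weight of 1000, a single overlap outweighs a large HPWL improvement. Lowering that weight in the interface therefore lets the cells overlap, as Fig. 4 illustrates.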
Fig. 4. Changing the cell overflow weight during execution.
VI. RESULTS
The interface shows the effects of changing some of the parameters (HPWL, overflow and temperature) even during execution. For example, when the overflow (overlap) weight is reduced to a minimum value, the overflow has less influence relative to the HPWL; as Fig. 4 shows, this causes the cells to overlap.
Tests were carried out comparing the initial and final HPWL, and the initial overflow with the overflow obtained after running the algorithm. For this purpose, 12 circuits were generated at random. In all tests, the temperature, the HPWL weight and the overflow weight were set to 1000, 1 and 1000, respectively. The results are shown in Table 1.
TABLE 1 - Comparison of HPWL and overflow before and after execution

Circuit | # cells | # lines | Initial HPWL | Final HPWL       | Initial overflow | Final overflow
C1      | 100     | 100     | 47915        | 5120 (-89.3%)    | 51               | 0
C2      | 350     | 400     | 182595       | 49195 (-72.06%)  | 407              | 0
C3      | 700     | 535     | 236925       | 26150 (-88.96%)  | 1273             | 946 (-25.69%)
C4      | 805     | 638     | 274955       | 27080 (-90.15%)  | 1572             | 1323 (-15.84%)
C5      | 599     | 579     | 258905       | 46540 (-82.02%)  | 978              | 574 (-41.31%)
C6      | 458     | 685     | 302030       | 139390 (-53.85%) | 637              | 155 (-75.67%)
C7      | 235     | 123     | 56470        | 4850 (-91.41%)   | 193              | 0
C8      | 522     | 352     | 155320       | 26235 (-83.11%)  | 815              | 373 (-54.23%)
C9      | 255     | 300     | 130520       | 27145 (-79.20%)  | 225              | 0
C10     | 315     | 600     | 268185       | 92820 (-65.39%)  | 329              | 0
C11     | 153     | 245     | 110440       | 20660 (-81.29%)  | 88               | 0
C12     | 425     | 354     | 163135       | 76545 (-53.08%)  | 550              | 0

Based on the analysis of the test results, it can be concluded that this algorithm was more successful with smaller circuits, for example, those with up to 400 cells.
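The percentages in Table 1 appear to be the relative change (final - initial) / initial, expressed in percent. A small helper (the name `percent_change` is ours, for illustration) reproduces them from the raw values:

```python
def percent_change(initial, final):
    """Relative change from initial to final, in percent (negative means a reduction)."""
    return 100.0 * (final - initial) / initial

# Circuit C1, HPWL 47915 -> 5120, and circuit C6, overflow 637 -> 155
print(f"{percent_change(47915, 5120):.2f}%")  # prints -89.31%
print(f"{percent_change(637, 155):.2f}%")     # prints -75.67%
```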
VII. CONCLUSIONS
The SA developed in this project gives the user a good view of how the algorithm works, and it is dynamic, allowing values to be changed during execution. Another convenience is that the algorithm can be run step by step, which helps in understanding some basic placement concepts, such as HPWL. The SA is also more accessible, since it was developed to run on the web and can be accessed at http://www.inf.ufrgs.br/~tmferla/teste/sa.html.
The drawback of the SA developed in this work is its slow execution, as expected from the outset, since it was developed in HTML 5 for the web. However, the goal of the project is not execution time but visualization of the problem. Its advantage is an interface that lets the user change the HPWL, overflow, and temperature parameters manually and see the effects they have on the execution, for example, the trade-off between HPWL and overflow.
Even though many teaching tools already exist for various topics in integrated circuit design, much remains to be developed, since this area is very broad, especially on the Web, which enables the creation of more accessible materials.
REFERENCES
[1] Johann, M. Algoritmos para Síntese Física. EMICRO 2004. http://lcr.uns.edu.ar/electronica/Posgrado/EAMTA/Documents/CAD/Part3/EAMTAmjem2k4final.ppt
[2] Reis, Ricardo A. L. Concepção de Circuitos Integrados. Porto Alegre: II da UFRGS, 2000. 252 p.
[3] Reis, R.; Johann, M. Notas de Aula. http://www.inf.ufrgs.br/~johann/cmp241/aula07.partplacefloor.ppt.pdf
[4] S. Kirkpatrick, C. D. Gelatt, Jr., and M. P. Vecchi, "Optimization by simulated annealing," Science, 220(4598):671-680, May 1983.
[5] Dragon. http://er.cs.ucla.edu/Dragon/
[6] CADAPPLETS - Animations of VLSI CAD Algorithms. http://workbench.lafayette.edu/~nestorj/cadapplets/
[7] Falstad. http://www.falstad.com/mathphysics.html
[8] Formation of a PN Junction Diode and its Band Diagram. http://fiselect2.fceia.unr.edu.ar/fisica4/simbuffalo/education/pn/pnformation_B/index.html#
[9] VISUALGOS: N QUEENS. http://yuval.bar-or.org/index.php?item=9
[10] A. A. Kennings and I. L. Markov, "Smoothening Max-terms and Analytical Minimization of Half-Perimeter Wirelength," VLSI Design, vol. 14, no. 3, 2002, pp. 229-237.
[11] Simulated Annealing. http://www.inf.ufrgs.br/~tmferla/teste/sa.html
[12] S. Kirkpatrick, C. Gelatt, Jr., and M. Vecchi, "Optimization by simulated annealing," Science, vol. 220, no. 4598, May 1983, pp. 671-680.
[13] Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, (1953), "Equation of State Calculations by Fast Computing Machines," Journal of Chemical Physics, 21, 1087-1092.
[14] Simulated annealing: theory and applications. Kluwer Academic Publishers, Norwell, MA, USA, 1987. ISBN: 9-027-72513-6.
Novel Fabrication Techniques for RF-MEMS Devices

Georgina Rosas and Roberto Murphy
Puebla, Mexico
e-mail: girosas@inaoep.mx, rmurphy@inaoep.mx

Wilfrido Moreno
University of South Florida,
Tampa-FL, USA
e-mail: wmoreno@usf.edu
Abstract - MEMS parallel-plate capacitor structures are widely used in microelectromechanical systems (MEMS) as sensors, actuators, radiators, and optical system elements, among many other applications. However, the design and fabrication techniques strongly influence the behavior and performance of MEMS devices. In this paper, we present a novel release technique for RF-MEMS, together with an analysis of materials for high-frequency applications; the technique achieves a high yield while avoiding stiction and structural collapse. The structures used in this work are MEMS capacitors.
I. INTRODUCTION
The design and fabrication process of microelectromechanical systems (MEMS) are fundamental to device performance. MEMS capacitors have been proposed for many applications, such as sensors, radiators, antennas (RF systems), mirrors (optical systems), and material testing, among others. However, several aspects must be considered: the design of the structure (identifying and determining its critical parts) and the structural materials used in the fabrication process (film thickness, combination of materials, and deposition technique), among others. In addition, the performance of MEMS devices is affected by their mechanical behavior, in particular by the residual stress gradient, which can be controlled through an appropriate release process. On the other hand, recent novel MEMS designs have successfully exploited residual stress gradients to improve device performance, for instance by introducing a high ramp in the fabrication process, especially in the lithography steps of the sensors, so as to achieve larger displacement [1]. This work presents the design of MEMS capacitors and a novel fabrication technique based on surface micromachining, which is fully compatible with integrated circuit processes.
This paper is organized as follows. Section II first presents the design of a MEMS capacitor, then establishes and analyzes the materials for high-frequency devices based on their RF performance, which is evaluated considering material losses and electromagnetic properties. Section III explains the fabrication process flow. Finally, Section IV presents the results and the most relevant conclusions of this work.
II. MEMS CAPACITOR DESIGN
The MEMS capacitor consists of two parallel plates (two electrodes, one movable and positive, the other fixed and negative), whose capacitance can be varied by electrostatic actuation.
The capacitance between parallel plates is given by:
$$C = \frac{\varepsilon_0 \varepsilon_r W L}{g} \qquad (1)$$
where ε0 is the permittivity of free space, εr is the relative permittivity, W is the design width (W = 596 µm), L is the length (L = 596 µm) and g is the gap between the electrodes (from 1 to 30 µm).
Using these values in Equation (1), the capacitance varies from 0.69 to 6.29 pF.
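Equation (1) can be evaluated directly for the plate dimensions given above. The sketch below is an illustration only: it assumes an air gap (εr = 1 by default), ignores fringing fields, and the gap values in the loop are arbitrary examples rather than the authors' design points.

```python
EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m

def parallel_plate_capacitance(w, l, g, eps_r=1.0):
    """Equation (1): C = eps0 * eps_r * W * L / g, lengths in metres; fringing ignored."""
    return EPS0 * eps_r * w * l / g

W = L = 596e-6  # plate width and length from the text, in metres

for g_um in (1, 2, 5):
    c = parallel_plate_capacitance(W, L, g_um * 1e-6)
    print(f"g = {g_um} um -> C = {c * 1e12:.3f} pF")
```

Because C is inversely proportional to g, halving the gap doubles the capacitance, which is what makes the electrostatically actuated gap a tuning knob.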
Figure 1 shows a diagram of the variable MEMS capacitor structure. Design features such as dimples and holes, among others, were included to support functional stability, operation, and a proper release of the structure [2-4]. In this work, each dimple measures 15 × 15 µm. The structure includes a series of three dimples, which are isolated from the ground plane to avoid parasitic capacitances.
Figure 1. MEMS capacitor
The RF response was taken into account in the design and fabrication of the MEMS capacitor; that is, materials with low conductor and dielectric losses were chosen according to their properties (skin depth and dispersion parameters). The electromagnetic properties of the dielectrics (permittivity, permeability) were also considered. The type of substrate must also be taken into account when a MEMS capacitor is to be used as a high-frequency device. In this case, the substrate is a high-resistivity silicon wafer, rather than a float-zone silicon wafer, so that it behaves as closely as possible to an insulator. This type of substrate has been successfully used for the fabrication of low-loss passive components and RF-MEMS devices [5-6].
Another aspect to consider is the performance of the MEMS structure, which is assessed through its mechanical properties; some of these are listed in Table I, and the relevant ones depend on the particular application. In this work, a combination of titanium (Ti) and aluminum (Al) was used for the conductive lines, because titanium is a hard material (high Young's modulus), which gives the structure more robust mechanical stability, while aluminum is malleable and inexpensive. This combination of metals was evaluated in the RF regime. The device under test (DUT) was a coplanar transmission line to be used as a bias feed for the MEMS structure.
TABLE I. Summary of electrical and mechanical properties of the different metals used in the fabrication processes [7].

Material      | Resistivity [µΩ·m] | Young's Modulus [GPa] | Melting point [°C] | Poisson ratio | Tensile Stress [MPa]
Gold (Au)     | 0.02214            | 79-109                | 1064               | 0.42-0.44     | 124-170
Titanium (Ti) | 0.42               | 116-120               | 1668               | 0.32          | 240-370
Aluminum (Al) | 0.0282             | 70-79                 | 660.4              | 0.35          | 40-50
Copper (Cu)   | 0.01678            | 110-128               | 1084               | 0.34          | 210
Penetration depth (skin depth) is a measure of how deeply an electromagnetic wave penetrates a material [7]. The skin depth, usually written δ, is the distance over which the E (electric) and H (magnetic) fields and the current decay by a factor of 1/e. When the conductivity is high, as in metals, the skin depth is given to a very good approximation by [8]:
$$\delta = \sqrt{\frac{2\rho}{\omega \mu}} = \frac{1}{\sqrt{\pi f \mu \sigma}} \qquad (2)$$
where ρ = 1/σ is the resistivity, σ the conductivity, μ the permeability, f the frequency and ω = 2πf.
The skin depth has been determined for different metals. As Figure 2 shows, titanium presents a larger skin depth over frequency than the other materials (gold, aluminum and copper). For example, at 5.8 GHz the skin depth is 1.032 µm for gold, 1.071 µm for aluminum and 4.89 µm for titanium. Hence, if the skin depth is known, it is possible to determine the thickness of the metals to be used in the fabrication process in order to minimize losses. Since a combination of metals is used in this work to achieve better device performance, the total thickness can be calculated from the skin depth of each metal.
Figure 2. Skin Depths of Gold, Titanium, Aluminum and Copper
To verify the performance and losses of different metals (Ti, Au, Ti + Au and Ti + Al) in the RF regime, transmission lines were simulated using HFSS at a design frequency of 5.8 GHz, as illustrated in Figure 3.
The S21 insertion-loss response indicates the signal transmitted through the transmission line and picked up at the receiver, which is affected by impedance discontinuities, by series-resistance losses, and by losses due to the dissipation factor of the dielectric materials. Hence, based on fundamental material properties, higher frequencies will generally be more attenuated than lower frequencies. The insertion loss is a direct measure of the ratio of the transmitted amplitude at each frequency component to the incident amplitude at the input of the transmission line. In the simulations, the best response is obtained with a transmission line (Tx) made of Au; however, Ti + Al was selected because of its acceptable response and lower price.
Figure 3. Simulations of CPW transmission lines with different metals (Ti, Au, Ti + Al and Ti + Au): S21 insertion-loss response versus frequency.
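The skin-depth relation of Equation (2) is easy to evaluate numerically. The following is a sketch under stated assumptions: the metals are taken as non-magnetic (μ = μ0), the resistivities are the Table I values converted to Ω·m, and the thickness-to-skin-depth ratios use the 0.5 µm Ti and 2 µm Al layers of Section III. Because the result depends on the resistivity assumed, the values printed may differ slightly from those quoted in the text.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability, H/m (non-magnetic metals assumed)

def skin_depth(rho, f, mu_r=1.0):
    """Equation (2): delta = sqrt(rho / (pi * f * mu)); rho in ohm*m, f in Hz, result in m."""
    return math.sqrt(rho / (math.pi * f * MU0 * mu_r))

# Resistivities from Table I, converted from micro-ohm*m to ohm*m.
RHO = {"Au": 0.02214e-6, "Ti": 0.42e-6, "Al": 0.0282e-6, "Cu": 0.01678e-6}

F_DESIGN = 5.8e9  # design frequency used in the HFSS simulations, Hz

for metal, rho in RHO.items():
    print(f"{metal}: delta = {skin_depth(rho, F_DESIGN) * 1e6:.3f} um")

# How many skin depths thick each layer of the Ti/Al stack is at the design frequency.
for metal, thickness in (("Ti", 0.5e-6), ("Al", 2e-6)):
    ratio = thickness / skin_depth(RHO[metal], F_DESIGN)
    print(f"{metal}: t / delta = {ratio:.2f}")
```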
III. FABRICATION PROCESS
The fabrication process is based on surface micromachining technology, which, under the implemented process conditions, constitutes a novel fabrication technique in the sense that it can naturally be extended to reconfigurable MEMS devices, especially microwave devices.
Figure 4 illustrates the fabrication process of a tunable MEMS capacitor. It uses five materials and four mask levels on a high-resistivity silicon wafer (N type, <100> orientation, 4000-8000 Ω·cm resistivity and 490-530 µm thickness). First, silicon dioxide is grown on the silicon wafer as an electrical and thermal insulator, as shown in Figure 4(a). The MEMS capacitors consist of two metal layers (bi-layer), formed by titanium (Ti) and aluminum (Al), which are deposited as structural materials by sputtering, with one suspended level for the mechanical structures. Each metal level is deposited with a thickness of 0.5 µm of titanium and 2 µm of aluminum, to avoid losses due to skin depth and to give the structure more robust mechanical stability, as shown in Figures 4(b), 4(c) and 4(d). Subsequently, SU8-2002 is deposited on top of the first metal bi-layer as a dielectric, and the dimples are formed. Their purpose is to avoid the collapse of the structure and to prevent a DC short, thereby ensuring correct device operation, as shown in Figure 4(e). A 5 µm thick sacrificial layer of AZ-4620 is then deposited; the structure is spin-coated with positive photoresist (PR+) and patterned by UV lithography, as can be seen in Figures 4(f) and 4(g). The patterned sacrificial layer is then thermally cured. Next, a second 2 µm thick metal layer is deposited by sputtering. In this step, the mobile electrodes are formed, as shown in Figure 4(h). Finally, the structures are released using a positive photoresist remover, then annealed in a conventional oven and subsequently etched by plasma, as illustrated in Figure 4(i).
A. Release Technique
A new technique for releasing the MEMS structure was developed through a series of experiments involving both chemical and dry etching processes [9-11].
This new technique consists of three steps. First, a two-step chemical treatment: C5H9NO [12] removes the sacrificial layer, and methanol is then used to clean away residues and keep the structure wet. Second, a thermal process at 120 °C in a conventional oven under a nitrogen (N2) atmosphere for 15 min dries the structures. Finally, a dry etching process removes photoresist residues that could cause stiction between the electrodes, under the following conditions: oxygen (O2), 500 mTorr and 200 W for 10 min. The great advantage is that this avoids the use of a Critical Point Drying (CPD) process [13], which has several limitations, such as surface tension effects, small sample size and the use of CO2. The procedure outlined here is highly reliable, inexpensive, simple and easy to perform.
Figure 4. Fabrication Process. (a) Standard wafer cleaning followed by the thermal growth of SiO2, (b) 0.5 µm thick titanium deposited by sputtering, (c) 1.5 µm thick aluminum deposited by sputtering, (d) Ground plane patterned with Ti and Al by lithography, (e) Dimple formed by SU8-2002 patterning, (f) AZ-4620 sacrificial layer forming, (g) Definition of anchor by patterning, (h) Ti-Al deposited by sputtering and (i) MEMS release process.
IV. RESULTS AND CONCLUSIONS
The main challenges in the fabrication process were the development of the second metallization level, which defines a suspended level for the mechanical structure, and, most critically, the release process. As a result, a novel technique for releasing the RF-MEMS device was developed.
The thickness of the second metal layer was analyzed, and it was concluded that a thick layer is needed to ensure mechanical stability and reduced losses at high frequencies. In this work, the best thickness was found to lie between 2.5 and 4 µm. We also concluded that sputter deposition of the metals gave good results: the films show the same grain structure, the process is easy to control (pressure, bias and temperature), and it provides good step coverage and good adhesion at the titanium/aluminum interface.
Figures 5(a) and 5(b) show SEM (Scanning Electron Microscope) photographs of the MEMS capacitor. The MEMS capacitor was successfully released with the proposed procedure, which avoids stiction, deformation, and structural collapse.
Figure 6 shows a profile of the MEMS capacitor along the X axis, obtained by interferometry using a VEECO optical profiling system. The MEMS capacitor was characterized with the optical profiling system and SEM. The best results were obtained with the novel release procedure presented here. Its main advantage is that it avoids the surface tension effects that cause deformation, stiction, and collapse, which may occur when CPD is used. The C5H9NO used in this work has properties completely different from those of the acetone commonly used in the CPD technique. This chemical allows the structure to be released through the residual stress gradient, producing a wide gap between the electrodes of the MEMS capacitor.
Figure 5. SEM photographs. (a) MEMS capacitor in different views and (b) details (dimples) of the MEMS capacitor, showing a gap of 30 micrometers between electrodes.
Figure 6. Profile of the MEMS capacitor along the X axis, obtained by interferometry.
ACKNOWLEDGMENT
The authors wish to acknowledge CONACyT, Mexico, for
the partial support of this work through Grant 83774-Y, and
FORDECyT project number 115976. Georgina Rosas also
thanks CONACyT for the scholarship to undertake doctoral
studies, number 102735 and for the support in carrying out
this research. Special recognition is due to the
Nanotechnology Research Center (NREC) at the University of
South Florida (USF) for their valuable support during the
fabrication of the device.
REFERENCES
[1] Kuang-Shun Ou, Kuo Shen Chen, Tian-Shiang Yang and Sen-Yun Lee, "A Novel Semianalytical Approach for Finding Pull-In Voltages of Micro Cantilever Beams Subjected to Electrostatic Loads and Residual Stress Gradients," Journal of Microelectromechanical Systems, vol. 20, no. 2, April 2011.
[2] A. Dec and K. Suyama, "Micromachined Electro-Mechanically Tunable Capacitors," IEEE Transactions on Microwave Theory and Techniques, vol. 46, no. 12, December 1998.
[3] S. Pamidighantam, R. Puers, et al., "Pull-in Analysis of Electrostatically Actuated Beam Structures with Fixed-Fixed and Fixed-Free End Conditions," Journal of Micromechanics and Microengineering, 12, pp. 458-464, June 2002.
[4] David A. Czaplewski and Christopher W. Dyck, "A Soft Landing Waveform for Actuation of a Single-Pole Single-Throw Ohmic RF MEMS Switch," Journal of Microelectromechanical Systems, vol. 15, no. 6, December 2006.
[5] Héctor J. de los Santos, George Fischer, Harrie A. C. Tilmans and Joost T. M. van Beek, "RF MEMS for Ubiquitous Wireless Connectivity: Part 2 - Application," IEEE Microwave Magazine, 2004.
[6] J. T. M. van Beek, M. H. W. M. van Delden, A. van Dijken, P. van Eerd, A. B. M. Jansman, A. L. A. M. Kemmeren, Th. G. S. M. Rijks, P. G. Steeneken, J. den Toonder, M. J. E. Ulenaers, A. den Dekker, P. Lok, N. Pulsford, F. van Straten, and L. van Teeffelen, "High-Q integrated RF passives and RF-MEMS on silicon," in Mat. Res. Soc. Symp. Proc., Boston, 2003, vol. 783, pp. B3.1.1-B3.1.
[7] Manoj Gupta and Wong Wai Leong, Eugene, Microwaves and Metals. Singapore: John Wiley & Sons Pte Ltd, 2007.
[8] I. A. Glover, S. R. Pennock and P. R. Shepherd, Microwave Devices, Circuits and Subsystems for Communications Engineering. England: John Wiley & Sons, Ltd., 2005.
[9] N. Tas, T. Sonneberg, H. Jansen, R. Legtenberg and M. Elwenspoek, "Stiction in surface micromachining," Journal of Micromechanics and Microengineering, pp. 385-397, UK, 1996.
[10] C. H. Mastrangelo and C. H. Hsu, "Mechanical Stability and Adhesion of Microstructures under Capillary Forces - Part I: Basic Theory," Journal of Microelectromechanical Systems, vol. 2, no. 1, March 1993.
[11] C. H. Mastrangelo and C. H. Hsu, "Mechanical Stability and Adhesion of Microstructures under Capillary Forces - Part II: Experiments," Journal of Microelectromechanical Systems, vol. 2, no. 1, March 1993.
[12] Chemical Book, 2008. http://www.chemicalbook.com/ProductChemicalPropertiesCB2402488_EN.htm
[13] I. Jafri, H. Busta and S. Walsh, "Critical Point Drying and Cleaning for MEMS Technology," SPIE Conference on MEMS Reliability for Critical and Space Applications, September 1999.
The SHA-3 Keccak Algorithm on a Smart Card: Implementation and Performance

Fábio Dacêncio Pereira and Fernando Yokota
Centro Universitário Eurípides de Marília - UNIVEM
Marília, Brazil
{prof.fabiopereira, feryokota}@gmail.com

Edward David Moreno Ordonez
Universidade Federal de Sergipe - UFS
Aracaju, Brazil
edwdavid@gmail.com
Abstract - Information integrity is one of the information-security goals that stands out today with the advent of the Internet. One technique for ensuring the integrity of information is the use of hash functions, which produce a byte string (hash) that should be unique. However, many current functions can no longer resist malicious attacks and guarantee that a piece of information has only one hash. Through a public call, the National Institute of Standards and Technology (NIST) invited technology centers, companies, and the scientific community to submit proposals for the new hash function standard, called SHA-3. Once submitted, the candidates are published and evaluated in several respects. In this context, this work selected one of the finalist algorithms of the SHA-3 call (Keccak) and implemented it on smart cards, in order to obtain performance data for future comparison with related work.
I. INTRODUCTION
With the advent of the Internet, services such as communication, entertainment, and business, among others, have turned the worldwide computer network into a complex and organized arrangement of computing resources and people. The services, conveniences, and efficiency of the Internet are attractions that drive its continuous evolution. However, problems that were not considered before have now become a concern for users and companies, for example, the security of information in this environment.
This work focuses on one of the solutions for mitigating problems related to an important service in the area of information security: information integrity.
Information integrity is one of the information-security goals that stands out today. One technique for ensuring the integrity of information is the use of hash functions, which generate a fixed-size byte string (hash) from a given piece of information. This hash has the task of uniquely identifying the original information.
However, many current functions can no longer prevent malicious attacks and guarantee that a piece of information has only one hash. To address this problem, NIST issued a public call inviting technology centers, companies, and the scientific community to submit proposals for the new hash function standard, called SHA-3. Once submitted, the candidates are published and evaluated in several respects.
In this context, this work set out to select one of the finalist algorithms of the SHA-3 call and implement it on a smart card device, in order to obtain performance data for comparison with related work.
II. HASH FUNCTIONS
One of the principles of cryptography is integrity, that is, ensuring that information does not undergo any unwanted alteration during storage, transmission, or presentation. As mentioned, a specific category of algorithms, known as hash functions, is responsible for providing this type of security service.
Hash functions are mathematical functions that, from a piece of information of variable size (encrypted or not), generate a fixed-size string of values called the hash (or message digest) [1].
Hash functions are one-way operations, that is, it is computationally infeasible to recover the information from its hash. Moreover, encrypted information has a size similar to that of the original information (plaintext), whereas a hash function produces only a digest whose size is unrelated to the size of the original information.
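The fixed-size property can be seen with Python's standard `hashlib` module. This short sketch (the choice of algorithms and the helper name `digest_lengths` are ours, for illustration) hashes inputs of very different lengths and reports the digest size in bits, which depends only on the algorithm:

```python
import hashlib

ALGORITHMS = ("md5", "sha1", "sha256", "sha512", "sha3_256")

def digest_lengths(data: bytes) -> dict:
    """Digest length, in bits, produced by each algorithm for the given input."""
    return {name: 8 * len(hashlib.new(name, data).digest()) for name in ALGORITHMS}

for size in (0, 10, 1_000_000):
    print(size, digest_lengths(b"a" * size))
```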
Research funded by FAPESP, grant no. 2010/07835-2.
A hash function H has certain characteristics or properties that distinguish it from other similar functions [1]:
- Given a message M, it is easy to compute its hash h.
- Given h, it is computationally infeasible to find M such that H(M) = h. This property is known as preimage resistance.
- Given M, it is computationally infeasible to find M' ≠ M such that H(M') = H(M). This property is known as second-preimage resistance.
A collision-resistant hash function is a function H that, besides satisfying the characteristics above, also satisfies the following property [2]:
- It is computationally infeasible to find a pair M, M' with M ≠ M' such that H(M) = H(M'). This property is known as collision resistance.
Among the hash algorithms currently in use, the Message-Digest Algorithm 5 (MD5) and the Secure Hash Algorithms (SHA-1 and SHA-2) stand out.
MD5 is a 128-bit hash algorithm developed in 1991 by Ronald Rivest (one of the founders of RSA Data Security). It is currently widely used to verify file integrity and for logins in software that uses the Peer-to-Peer (P2P) protocol.
SHA was developed by the National Security Agency (NSA) and published by the National Institute of Standards and Technology (NIST), which standardized the function in the USA. The three SHA algorithms are structurally different and are known as SHA-0, SHA-1 and SHA-2. The SHA-2 family also includes four variants with similar structures that produce hashes of different sizes, SHA-224, SHA-256, SHA-384 and SHA-512, which are currently used in applications that require strong integrity guarantees.
However, successful attacks have been reported both against MD5 [3] and against SHA-0 and SHA-1 [4], producing collisions (different inputs that yield identical hashes). This violates the basic principle of hash functions, which is to guarantee the integrity of information.
III. SHA-3
SHA-2 currently remains secure, with no practical attack known; however, because it shares a structural heritage with its predecessor, SHA-1, it is viewed with suspicion and doubts have been raised about its security.
In response, a competition was opened in 2007 to choose a new hash function standard. Organized by the National Institute of Standards and Technology (NIST), the competition is currently in its third and final round, which began at the end of 2010 with the selection of five finalist algorithms. The winning algorithm is expected to be announced and published by NIST in 2012 [5].
The functions chosen for this third round are BLAKE, Grøstl, JH, Keccak and Skein. In this project, the hash function Keccak was chosen; it was developed by Guido Bertoni, Joan Daemen, Michaël Peeters and Gilles Van Assche.
IV. KECCAK
Keccak was chosen for this project because it has a relatively simple and versatile structure and has been well received in the competition.
Keccak belongs to the family of sponge functions, which use a fixed-size transformation or permutation to build an algorithm that, from an input of any length, produces an output of arbitrary length [6]. One of the great attractions of this new hash function is that, from a variable-length input, it can produce an output of unbounded length. Moreover, sponge functions offer reasonable security against all known generic attacks [6].
The Keccak algorithm consists of two parts: the function Keccak-f[b](A,RC), which performs the permutations and logical operations on the data, and the sponge function Keccak[r,c](M), which organizes and prepares the input data for processing by the permutation function and organizes the output values to produce the hash.
A. Permutation Function
Keccak uses permutations to generate the hash. For the permutation function, one of seven permutations can be chosen, denoted Keccak-f[b], where b ∈ {25, 50, 100, 200, 400, 800, 1600} is the permutation width. The permutation width is also the width of the state S in the sponge function.
The state values are organized in a 5x5 matrix A, which contains 25 positions of w bits each, where w ∈ {1, 2, 4, 8, 16, 32, 64}, such that
$$w = \frac{b}{25};$$
for example, if b is 1600, each position of the matrix holds 64 bits.
This function performs nr rounds; in each round, five steps apply logical operations and bit permutations to the data blocks in matrix A. The number of rounds nr depends on the permutation width and is given by
$$n_r = 12 + 2\ell, \quad \text{where } 2^{\ell} = w.$$
Following the example above, if w is 64, then ℓ = 6 and the number of rounds is 24, so the five steps of the permutation function are performed in each of the 24 rounds.
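The relations w = b/25 and nr = 12 + 2ℓ (with 2^ℓ = w) can be tabulated for all seven permutation widths with a few lines of Python; the function name `keccak_f_params` is ours, for illustration.

```python
VALID_WIDTHS = (25, 50, 100, 200, 400, 800, 1600)

def keccak_f_params(b):
    """Lane size w and number of rounds nr for Keccak-f[b]."""
    if b not in VALID_WIDTHS:
        raise ValueError(f"unsupported permutation width: {b}")
    w = b // 25
    l = w.bit_length() - 1  # 2**l == w because w is a power of two
    return w, 12 + 2 * l

for b in VALID_WIDTHS:
    w, nr = keccak_f_params(b)
    print(f"b = {b:4d}  w = {w:2d}  nr = {nr}")
```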
The five steps mentioned above are denoted by Greek letters: θ (theta), ρ (rho), π (pi), χ (chi) and ι (iota). Each has a different purpose and its own way of manipulating the data in the matrix.
Fig. 1 presents the pseudocode of the permutation function. The matrix A containing the information blocks and RC (round constants) are the input parameters; the logical operations XOR, NOT and AND, together with bit permutations and rotations, are applied to the data in matrix A, producing at the end of each round a new 5x5 output matrix A whose contents have been processed by the five steps.
+ c = 1600. Na competio do NIST, os desenvolvedores do

Keccak submeteram quatro propostas do tamanho em bits n
do hash gerado e dos parmetros r e c [8], que so: (i) n =
224: Keccak[r = 1152, c = 448]; (ii) n = 256: Keccak[r =
1088, c = 512]; (iii) n = 384: Keccak[r = 832, c = 768]; (iv) n
= 512: Keccak[r = 576, c = 1024].
A funo Keccak[r,c] possui as fases de inicializao,
padding, absoro e compresso, que podem ser vistas no
pseudocdigo representado na Fig. 2.
Keccak[r,c](M) {
Initialization and padding
S[x,y] = 0,
forall (x,y) in (04,04)
P = M || 0x01 || 0x00 || || 0x00
P = P xor (0x00 || || 0x00 || 0x80)
Keccak-f[b](A) {
forall i in 0nr-1
A = Round[b](A, RC[i])
return A
}
Round[b](A,RC) {
step
C[x] = A[x,0] xor A[x,1] xor A[x,2] xor A[x,3] xor A[x,4],
forall x in 04
D[x] = C[x-1] xor rot(C[x+1],1),
forall x in 04
A[x,y] = A[x,y] xor D[x],
and steps
B[y,2*x+3*y] = rot(A[x,y], r[x,y]),
Absorbing phase
forall block Pi in P
S[x,y] = S[x,y] xor Pi[x+5*y],
S = Keccak-f[r+c](S)
Squeezing phase
Z = empty string
while output is requested
Z = Z || S[x,y],
S = Keccak-f[r+c](S)
return Z
step
A[x,y] = B[x,y] xor ((not B[x+1,y]) and B[x+2,y]), forall (x,y) in (04,04)
forall (x,y) such that x+5*y < r/w
forall (x,y) such that x+5*y < r/w
}
step
A[0,0] = A[0,0] xor RC
return A
Figura 2. Funo de Esponja Keccak[r,c].
O parmetro RC utilizada pela funo de permutao

um vetor contendo 24 valores no formato hexadecimal que
so utilizados na etapa .
Na fase de inicializao criada uma matriz de estado S

de tamanho 5x5 que ir ser preenchida pela informao de
entrada. Na fase de padding ocorre o preenchimento da
informao de entrada M com um padro de 0x01 (valor
hexadecimal), seguido de n 0x00 necessrios seguido de 0x80
final, para tornar a informao M mltipla do parmetro r.
Em cada rodada nr um desses valores utilizado (o valor

referente ao nmero da rodada) para realizar a operao
lgica XOR no primeiro bloco da matriz A. Esta operao
tem como objetivo quebrar a simetria, para evitar falhas que
podem ser exploradas em ataques contra o algoritmo [7].
O preenchimento da informao necessrio, pois o

algoritmo disponibiliza a opo da mensagem de entrada ter
um tamanho varivel que nem sempre dispe de um tamanho
aceitvel (mltipla de r) para realizar as operaes de
permutao.
B. Sponge Function
The Keccak[r,c] function uses the sponge construction, which takes an input of variable length and produces an output of arbitrary length [7]. The function takes two parameters: r, the bitrate, and c, the capacity.
In the absorbing phase, the padded data stored in variable P is split into blocks Pi of r bits, which are XORed into the state matrix S. After each block is inserted, the permutation Keccak-f[b](S) is applied to S. This phase is repeated as long as there are blocks Pi that have not yet been processed by the permutation.
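The absorbing step S[x,y] = S[x,y] xor Pi[x+5*y] can be illustrated for a single block. The sketch assumes, as in the Keccak reference, that each w-bit lane is read from the block in little-endian byte order; the helper name is ours.

```python
def absorb_block(state, block: bytes, w: int):
    """XOR one r-bit block Pi into the 5x5 state: S[x,y] ^= Pi[x + 5*y].

    Lane i of the block (w // 8 bytes, little-endian) goes to x = i % 5, y = i // 5.
    """
    lane_bytes = w // 8
    for i in range(len(block) // lane_bytes):
        lane = int.from_bytes(block[i * lane_bytes:(i + 1) * lane_bytes], "little")
        x, y = i % 5, i // 5
        state[x][y] ^= lane
    return state


if __name__ == "__main__":
    s = [[0] * 5 for _ in range(5)]
    # An 18-byte block (r = 144) fills 9 lanes of 16 bits
    absorb_block(s, bytes(range(18)), 16)
    print([[hex(v) for v in column] for column in s])
```

The permutation Keccak-f[b] would be applied to the state after each call.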
Figure 1. Pseudocode of Keccak-f[b].
The parameter r defines the size of each block Pi once the data has been split into r-bit pieces. The data must be split because the algorithm cannot be applied to a very large message as a whole (nor would that be desirable for security reasons); instead, it is applied to small blocks.
The parameter c affects both the performance of the algorithm and the security of the generated hash: the larger c is, the more secure the hash, but the higher the computational cost, since for a fixed width b a larger c leaves a smaller rate r and therefore requires more permutation calls per message bit.
The sum of r and c defines the width b of the chosen permutation; for example, for the 1600-bit permutation, one of the adopted settings is r = 1024 and c = 576, so that r + c = 1600.
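To make the rate/capacity trade-off concrete, the sketch below counts the permutation calls needed for a message and reports the generic security level c/2 commonly stated for sponge functions. It assumes the byte-oriented padding of Fig. 2; the helper name is ours. For the Keccak[r = 144, c = 256] setting used later in this paper (b = 400), a short message needs one absorbing call plus one extra squeezing call to produce 224 bits.

```python
import math


def sponge_cost(msg_bytes, r, c, out_bits):
    """Permutation calls and generic security level of Keccak[r, c].

    Assumes byte-oriented padding (Fig. 2), which always adds at least one byte.
    """
    rate_bytes = r // 8
    absorb_calls = msg_bytes // rate_bytes + 1       # one call per padded r-bit block
    squeeze_calls = math.ceil(out_bits / r) - 1      # extra calls after the first r output bits
    return {
        "b": r + c,                  # permutation width
        "absorb_calls": absorb_calls,
        "squeeze_calls": squeeze_calls,
        "security_bits": c // 2,     # generic sponge security level c/2
    }


if __name__ == "__main__":
    print(sponge_cost(0, 144, 256, 224))    # Keccak[r = 144, c = 256], 224-bit hash
    print(sponge_cost(0, 1024, 576, 224))   # same output length from the 1600-bit permutation
```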
Finally, in the squeezing phase, hLength is defined, a variable that sets the size of the hash to be generated. The already-permuted data blocks of matrix S are then concatenated, producing a string Z, which is the hash value (in hexadecimal format) of the input, n bits long. The squeezing phase repeats while hLength − r > 0: the permutation is applied again to matrix S and its values are again concatenated to Z whenever the condition holds. This makes the hash generation even more random, preventing attacks that produce collisions [7].
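The permutation of Fig. 1 and the sponge of Fig. 2 can be sketched together in Python. This is a hedged, unoptimized illustration, not the authors' Java Card code: it assumes byte-aligned rates and little-endian lanes as in the Keccak reference, and it adds a `suffix` parameter (our addition) so that the same code can be cross-checked against Python's FIPS 202 `hashlib.sha3_256`, which pads with 0x06 instead of the original Keccak 0x01.

```python
import hashlib


def keccak_f(state, w):
    """Keccak-f[25*w] applied in place to state[x][y], a 5x5 list of w-bit lanes."""
    ell = w.bit_length() - 1
    nr = 12 + 2 * ell                       # 20 rounds for w = 16, 24 for w = 64
    mask = (1 << w) - 1

    def rot(v, n):
        n %= w
        return ((v << n) | (v >> (w - n))) & mask

    offsets = [[0] * 5 for _ in range(5)]   # rho offsets r[x][y]
    x, y = 1, 0
    for t in range(24):
        offsets[x][y] = ((t + 1) * (t + 2) // 2) % w
        x, y = y, (2 * x + 3 * y) % 5

    lfsr = 1                                # LFSR state that yields RC[0], RC[1], ...
    for _ in range(nr):
        # theta: XOR each lane with the parities of two neighbouring columns
        c = [state[x][0] ^ state[x][1] ^ state[x][2] ^ state[x][3] ^ state[x][4]
             for x in range(5)]
        d = [c[(x - 1) % 5] ^ rot(c[(x + 1) % 5], 1) for x in range(5)]
        for x in range(5):
            for y in range(5):
                state[x][y] ^= d[x]
        # rho and pi: rotate each lane and move it to B[y][2x+3y]
        b = [[0] * 5 for _ in range(5)]
        for x in range(5):
            for y in range(5):
                b[y][(2 * x + 3 * y) % 5] = rot(state[x][y], offsets[x][y])
        # chi: the only non-linear step
        for x in range(5):
            for y in range(5):
                state[x][y] = b[x][y] ^ (~b[(x + 1) % 5][y] & b[(x + 2) % 5][y])
        # iota: XOR the round constant into lane (0, 0)
        rc = 0
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                rc ^= 1 << ((1 << j) - 1)
        state[0][0] ^= rc & mask
    return state


def keccak(message, r, c, out_bits, suffix=0x01):
    """Keccak[r,c] sponge (Fig. 2): pad, absorb r-bit blocks, squeeze out_bits bits."""
    w = (r + c) // 25
    lane_bytes, rate_bytes = w // 8, r // 8
    padded = bytearray(message) + bytes([suffix])       # M || suffix
    padded += bytes(-len(padded) % rate_bytes)          # || 0x00 ... 0x00
    padded[-1] ^= 0x80                                  # xor final 0x80

    state = [[0] * 5 for _ in range(5)]                 # S[x,y] = 0
    for off in range(0, len(padded), rate_bytes):       # absorbing phase
        for i in range(rate_bytes // lane_bytes):
            lane = padded[off + i * lane_bytes: off + (i + 1) * lane_bytes]
            state[i % 5][i // 5] ^= int.from_bytes(lane, "little")
        keccak_f(state, w)

    out = bytearray()                                   # squeezing phase
    while True:
        for i in range(rate_bytes // lane_bytes):
            out += state[i % 5][i // 5].to_bytes(lane_bytes, "little")
        if len(out) * 8 >= out_bits:
            return bytes(out[: out_bits // 8])
        keccak_f(state, w)


if __name__ == "__main__":
    # Cross-check against FIPS 202 SHA3-256 (r = 1088, c = 512, suffix 0x06)
    assert keccak(b"abc", 1088, 512, 256, suffix=0x06) == hashlib.sha3_256(b"abc").digest()
    # Setting used in this paper: Keccak[r = 144, c = 256] on Keccak-f[400], 224-bit hash
    print(keccak(b"abc", 144, 256, 224).hex())
```

With r = 144 the first squeeze yields only 144 bits, so a second permutation is needed to reach 224 bits, which is the hLength − r > 0 condition described above.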
ISSN 977-2177-128009
V. SMART CARDS
One of the criteria used to select the winning SHA-3 algorithm is its ability to run on platforms with limited processing, memory and input/output resources. This work therefore explores the Keccak algorithm on smart card platforms.
Smart cards are, basically, integrated circuits embedded in cards, usually with a plastic body the size of a credit card [9]. Cards with other dimensions also exist, for example the SIM card (or GSM-SIM card), widely used today in mobile phones for identification, control and data storage.
Besides SIM cards, another area in which the use of smart cards is expanding is banking: banks are issuing credit cards with an embedded smart card chip alongside the familiar magnetic stripe.
Three types of smart cards are commonly found: (i) memory cards, which can only store, modify and delete data, and which communicate through metal contacts on the card body; (ii) microprocessor cards, which contain an embedded processor able to execute operations and applications, and which communicate in the same way as memory cards; and (iii) contactless cards, which may be of either of the first two types but have an antenna embedded in the plastic body and communicate via radio frequency.
Some cards support both interfaces, i.e., they can work either through the metal contacts or via radio frequency; these are called dual-interface smart cards. Regardless of type, all smart cards share the same constraint: limited hardware.
Smart cards with processing capability usually have processors with 8- or 16-bit buses, and their EEPROM can store a few hundred kilobytes. Some smart cards also have ROM and RAM in addition to EEPROM.
Communication between a contact card and the host takes place by exchanging blocks of data (bytes) called Application Protocol Data Units (APDUs), the application-level protocol defined by ISO 7816.
Many cards today come with Java Card technology built in, which makes it easier for developers to create and manage applications (banking, security and telecommunications, among others), since such applications are written in Java, with no need for smart-card-specific languages.
VI. JAVA CARDS
Java Card is the technology that allows smart cards to run small applications (applets) written in Java (a version with a more restricted set of programming features). Java Card technology defines a Java Card Runtime Environment (JCRE) and provides classes and methods that support application development [10]. It also allows these applets, once loaded onto the smart card, to be executed by the Java Card Virtual Machine (JCVM) that the card contains.
The benefits of the technology include: ease of application development, since it uses a high-level language; several security mechanisms, such as firewalls that isolate each application (preventing one application from improperly accessing another); hardware independence, so that an applet can run on any smart card that supports the Java Card platform; and the ability to store and manage multiple applications on the same card [10].
The JCRE defines the Java Card execution environment and is responsible for managing and providing the smart card's resources and for managing the execution of applets; in essence, it is the card's operating system. The JCRE consists of the Java Card virtual machine (JCVM), the Java Card APIs, manufacturer-specific applications and the JCRE system classes.
The JCVM executes the applets, controls memory allocation and manages instantiated objects, and it allows applets to run independently of the underlying hardware [10].
Applications are developed in Java, but with restrictions on features that the smart card does not support because of its limited processing and memory resources. Table I lists some of the features supported and not supported on cards that use the Java Card 2.2 platform [10].
TABLE I. SUPPORTED AND UNSUPPORTED JAVA FEATURES
Supported features                 | Unsupported features
boolean, byte and short types      | Threads
One-dimensional arrays             | double, float and long types
Java packages                      | Multidimensional arrays
Classes, interfaces and exceptions | char and String
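As an illustration of the APDU exchange described in Section V, the Python sketch below assembles a short command APDU in the ISO/IEC 7816-4 layout (CLA, INS, P1, P2, then optional Lc with command data and optional Le). The helper name and the example instruction byte INS = 0x10 for a hash command are hypothetical; they are not taken from the authors' applet.

```python
def build_command_apdu(cla, ins, p1, p2, data=b"", le=None):
    """Short command APDU in the ISO/IEC 7816-4 layout: CLA INS P1 P2 [Lc data] [Le]."""
    for name, value in (("CLA", cla), ("INS", ins), ("P1", p1), ("P2", p2)):
        if not 0 <= value <= 0xFF:
            raise ValueError(f"{name} must fit in one byte")
    if len(data) > 255:
        raise ValueError("a short APDU carries at most 255 data bytes")
    apdu = bytes([cla, ins, p1, p2])                 # 4-byte header
    if data:
        apdu += bytes([len(data)]) + bytes(data)     # Lc followed by the command data
    if le is not None:
        if not 1 <= le <= 256:
            raise ValueError("Le must be between 1 and 256")
        apdu += bytes([le % 256])                    # Le = 0x00 means 256 bytes expected
    return apdu


if __name__ == "__main__":
    # Hypothetical applet command: CLA 0x80 (proprietary class), INS 0x10 = "hash",
    # sending b"abc" and expecting a 28-byte (224-bit) digest back
    print(build_command_apdu(0x80, 0x10, 0x00, 0x00, b"abc", le=28).hex())
```

The card answers with a response APDU: optional data followed by the two status bytes SW1 SW2, where 0x90 0x00 signals success.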
VII. DEVELOPMENT OF KECCAK
The function was developed in Java using the Eclipse SDK v3.6.1 development environment together with the JCOP Tools plugin. JCOP Tools is a plugin for developing smart card applications; it was developed by IBM and is currently distributed by NXP. It provides a functional, user-friendly environment for developing and managing applications on smart card devices, for example making it easy to load, remove and run applications on the cards.
In addition, it provides a smart card simulator, making it possible to develop and test applications without a physical card connected to a reader. The JRE version used and supported by JCOP Tools is 1.5.0.16. The JCOP Tools plugin provides the debugger for Java programs, the bytecode compiler, the converter to CAP files and the JCOP Shell, used to send APDU commands to the card.
The permutation function used in this project is the C implementation by Guido Bertoni, Joan Daemen, Michaël Peeters and Gilles Van Assche, the creators of Keccak; it was converted to Java and adapted to Java Card technology and its restrictions. The sponge function was developed according to the function's pseudocode and based on the Python implementation.
The sponge function Keccak[r = 144, c = 256] takes as input data stored in an array M of variable length and produces a fixed-size output hash; in this implementation, the hash is 224 bits long. The sponge function prepares the input so that the permutation can be applied to it and the hash generated. This preparation takes place in three stages: padding, absorbing and, finally, squeezing.
The permutation function Keccak-f[400] receives the state array S already padded and absorbed by the sponge function. Its purpose is to manipulate the blocks of data through logical operations, bit rotations and position swaps in order to generate the hash.
Because of the limitations of Java Card technology, only variables of type byte and short could be used, which are 8 and 16 bits wide, respectively. For this reason, the permutation width b = 25 × w was adopted with w, the maximum word (lane) length, equal to 16; this yields the 400-bit permutation and, consequently, nr = 12 + 2ℓ = 20 rounds, where w = 2^ℓ. The function has five steps with specific purposes, θ, ρ, π, χ and ι, which are executed 20 times.
VIII. RESULTS
This section presents the analysis of the Keccak function and its behavior inside the card. In terms of logical operations, Keccak-f[b] uses approximately [7]:
76nr XORs, 25nr ANDs, 25nr NOTs and 29nr rotations of w bits.
Using Keccak-f[400] (nr = 20), this gives:
1520 XORs, 500 ANDs, 500 NOTs and 580 rotations of 16 bits.
Table II shows the memory usage. These values were obtained with commands executed inside the card, provided by Java Card.
TABLE II. MEMORY USED BY THE KECCAK FUNCTION
Function size (bytes) | EEPROM used (bytes) | RAM used (bytes)
1275                  | 2579                | 703
The Keccak function occupies 1275 bytes of the card's EEPROM; this value represents only the size of the function loaded in memory. During the function's operations, 2579 bytes of EEPROM and 703 bytes of RAM are used.
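The parameter choices of Section VII and the operation counts above follow from two short formulas, sketched below in Python. The function names are ours, and the counts use the per-round figures quoted from [7]. The `rot16_signed` helper only illustrates, as an assumption about how a short-based port might handle lanes, that a 16-bit lane held in a signed type has to be reinterpreted as unsigned before it is rotated.

```python
def keccak_f_parameters(w):
    """Width b = 25*w and rounds nr = 12 + 2*ell for lanes of w = 2**ell bits."""
    ell = w.bit_length() - 1
    if w < 1 or w != 1 << ell or ell > 6:
        raise ValueError("w must be a power of two from 1 to 64")
    return {"b": 25 * w, "ell": ell, "nr": 12 + 2 * ell}


def operation_counts(nr):
    """Approximate logical operations of Keccak-f[b] over nr rounds, per [7]."""
    return {"xor": 76 * nr, "and": 25 * nr, "not": 25 * nr, "rot": 29 * nr}


def rot16_signed(v, n):
    """Rotate a 16-bit lane stored as a signed short (-32768..32767) left by n bits."""
    u = v & 0xFFFF                              # reinterpret the short as unsigned
    n %= 16
    u = ((u << n) | (u >> (16 - n))) & 0xFFFF
    return u - 0x10000 if u & 0x8000 else u     # back to the signed short range


if __name__ == "__main__":
    params = keccak_f_parameters(16)            # lanes held in 16-bit shorts
    print(params, operation_counts(params["nr"]))
```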
Regarding execution time, Table III compares Keccak with the MD5 and SHA hash functions, both of which are provided by the card through Java Card technology. These functions were executed inside the card with a standard input value.
TABLE III. EXECUTION TIME COMPARISON OF THREE HASH FUNCTIONS
Function | Approximate execution time (s) | Hash size (bits)
MD5      | 0.194                          | 256
SHA      | 0.196                          | 300
Keccak   | 37.392                         | 224
As Table III shows, the Keccak function implemented in this project takes approximately 37 seconds to execute, roughly 190 times longer than MD5 and SHA, which run in less than 1/5 of a second. Such a long execution time is undesirable and, at present, makes the implementation impractical for most real applications.
It is important to note that the difference between MD5/SHA and Keccak is explained by the fact that Keccak was implemented on the card as an application (applet) in EEPROM/RAM, whereas MD5 and SHA are dedicated functions already present on this smart card model and can make use of the embedded coprocessors.
IX. CONCLUSION
No difficulty was found in adapting the function to Java Card technology, largely because the platform supports the Java language, which eased the implementation. It is worth stressing that the Keccak variant used is the one compatible with the card's architecture (Keccak-f[400]), which differs from the function proposed for the competition (Keccak-f[1600]); the latter offers higher security levels but requires more powerful hardware, which would make it infeasible to use on the card.
It was observed that the Keccak function implemented in this project requires a long processing time to compute the hash, which makes its use ineffective. The function should be optimized to reduce this time, and one way to do so is to use RAM effectively, since it offers access speeds at least 10 times faster than EEPROM.
With partial use of RAM, it was possible to reach the processing time reported in Section VIII, approximately 37 seconds, compared with 60 seconds in the first implementation attempt, in which the function used only EEPROM. With full use of RAM and better code optimization, the execution time is expected to decrease considerably and become acceptable.
One stage of this work was the comparison of the results with related work. In the work of Gouciem [11], cycles/byte (the number of clock cycles needed to process one byte) and throughput (the amount of data processed in a given time) were studied for microcontrollers commonly used in smart cards.
However, the implementation described in this project actually runs on smart cards, unlike the related work [11], which does not specify whether the Keccak algorithm was actually implemented on a smart card architecture or whether only the number of cycles was measured for the microcontroller commonly adopted as the processing and control unit of specific smart cards.
This directly affects the results, since delays of the complete smart card architecture, such as input/output (communication) mechanisms and the memory hierarchy, are not taken into account there. These elements are bottlenecks in most architectures and may lead to different results in terms of memory and execution time.
REFERENCES
[1] B. Schneier, Applied Cryptography: Protocols, Algorithms, and Source Code in C, 2nd ed. New York: John Wiley & Sons, 1996.
[2] A. J. Menezes, P. C. van Oorschot, S. A. Vanstone, Handbook of Applied Cryptography. CRC Press, 1996.
[3] X. Wang, D. Feng, X. Lai, H. Yu, Collisions for Hash Functions MD4, MD5, HAVAL-128 and RIPEMD. Crypto 2004, 2004.
[4] X. Wang, Y. L. Yin, H. Yu, Finding Collisions in the Full SHA-1. Crypto 2005, 2005.
[5] National Institute of Standards and Technology, Cryptographic Hash Algorithm Competition. 2008, http://csrc.nist.gov/groups/ST/hash/sha3/index.html.
[6] G. Bertoni, J. Daemen, M. Peeters, G. Van Assche, R. Van Keer, Cryptographic Sponges. November 2009, http://sponge.noekeon.org.
[7] G. Bertoni, J. Daemen, M. Peeters, G. Van Assche, The Keccak Reference. Version 3, January 2011, http://keccak.noekeon.org/Keccak-reference-3.0.pdf.
[8] G. Bertoni, J. Daemen, M. Peeters, G. Van Assche, The Keccak SHA-3 Submission. Version 3, January 2011, http://keccak.noekeon.org/Keccak-submission-3.pdf.
[9] W. Rankl, Smart Card Applications: Design Models for Using and Programming Smart Cards. Chichester: John Wiley & Sons, 2007.
[10] Z. Chen, Java Card Technology for Smart Cards: Architecture and Programmer's Guide. Addison-Wesley, 2000.
[11] M. Gouciem, Comparison of Seven SHA-3 Candidates Software Implementations on Smart Cards. October 2010, http://eprint.iacr.org/2010/531.pdf.
Accurate Modeling to Characterize the Distributed Substrate Effects in SiGe HBTs
German Alvarez-Botero, Roberto Murphy-Arteaga and Reydezel Torres-Torres
galvarez@inaoep.mx, rmurphy@inaoep.mx, reydezel@inaoep.mx
Instituto Nacional de Astrofísica, Optica y Electronica (INAOE)
Department of Electronics
Tonantzintla, Puebla, Mexico
Abstract: The applicability of a distributed RC model for representing the substrate parasitic effects in a SiGe HBT is demonstrated in this paper. In addition, the corresponding parameter extraction from S-parameter measurements is proposed, achieving excellent model-experiment correlation of the electrical behavior of the device's output characteristics up to 40 GHz.
Index Terms: Heterojunction Bipolar Transistor, Distributed model, Parameter Extraction.
I. INTRODUCTION
The requirements of modern communication systems place stringent demands on semiconductor technologies to provide high performance at low cost [1]. BiCMOS technology based on SiGe heterojunction bipolar transistors (HBTs) is an attractive solution to these demands, owing to the inherent properties of SiGe HBTs, such as low noise, high linearity and low power consumption. However, these remarkable characteristics can be considerably degraded in the microwave range by substrate parasitic effects, which become more important as the operating frequency rises. For this reason, in advanced RF circuit design, the impact of the substrate effects must be correctly accounted for in HBT modeling.
Adequate modeling of the substrate effects requires physically based models that follow the device structure. For instance, in the present analysis it is important to consider the buried layer (n+) (referred to hereafter as the sub-collector), the depletion region of the sub-collector-substrate junction, the resistive nature of the bulk substrate (p−) and the channel stopper (p). A cross section of the HBT under study showing this structure is presented in Fig. 1.
Fig. 1. Sketch of an HBT cross section showing its physical structure.
Usually, a lumped RC equivalent circuit model is used to represent the substrate characteristics. This is shown at the bottom of Fig. 2, where Csub accounts for the depletion capacitance formed at the bottom of the sub-collector-substrate junction. In series with this capacitance, Rsub accounts for the inner substrate resistance and the additional resistance associated with the channel stopper and the substrate contact. This lumped approach is widely used; however, it is restricted to relatively low frequencies, since its accuracy degrades as the frequency increases, yielding errors in the representation of the transistor output impedance and the actual gain [2].
Fig. 2. Sketch of an HBT cross section and its corresponding equivalent circuit model considering a lumped model for the substrate network.
The frequency limitations of the typical HBT model stem from the fact that it neglects the peripheral capacitance of the sub-collector-substrate junction and the resistive behavior of the bulk substrate, resulting in significant errors at high frequencies.
Bearing these arguments in mind, this paper proposes a physically oriented model that represents the electrical characteristics of the substrate and takes into account the distributed effects in an HBT. In addition, the corresponding parameter extraction methodology is developed, yielding a significant improvement in the modeled output characteristics of an HBT up to 40 GHz.
II. EXPERIMENT
On-wafer two-port S-parameter measurements up to 40 GHz were performed on a common-emitter SiGe HBT fabricated on a p-type Si substrate in a 0.13 μm BiCMOS technology, using a vector network analyzer (VNA) and ground-signal-ground (GSG) coplanar RF probes with a pitch of 100 μm. The equipment was previously calibrated up to the probe tips with an off-wafer LRM (line-reflect-match) procedure on an impedance standard substrate, establishing a reference impedance of 50 Ω.
Operating the transistor under the cold-HBT condition, defined as the condition in which the emitter-base and base-collector junctions are zero biased (so that both junctions are depleted), results in a simplified equivalent circuit that allows an accurate characterization of the substrate effects. Thus, the device under test (DUT) was biased at VBE = VBC = 0 to obtain the S-parameters used to develop the model proposed in this work. Afterwards, the pad parasitics were de-embedded from the experimental data by applying a three-step procedure based on measurements of open and short dummy structures [3].
III. PROPOSED MODEL AND RESULTS
The analysis presented here is based on the equivalent circuit shown in Fig. 2, which consists of the extrinsic resistances Rb, Rc and Re; the dynamic resistances Rbe, Rbcx and Rbci; the intrinsic base resistance Rbi; the junction capacitances Cbe, Cbcx and Cbci; the substrate admittance Ysub; and the intrinsic current gain, represented by the controlled source αie [4]. When the HBT is biased in a cold condition, the following conditions occur: i) no potential drop is present at the contacts; ii) all dynamic resistances present very large values (i.e., no transfer current exists); and iii) there is no current gain (i.e., α = 0). Therefore, the influence of the extrinsic and dynamic resistances, as well as that of the current source, can be neglected in the HBT equivalent circuit, resulting in the simplified model presented in Fig. 3.
Fig. 3. Small-signal equivalent circuit model for a SiGe HBT biased at VBE = VBC = 0.
Additionally, for a bipolar transistor fabricated on a high-resistivity substrate, the value of the intrinsic base resistance is much lower than Re(Zsub), where Zsub = 1/Ysub. Therefore, the influence of Rbi can be neglected in this case, simplifying the equivalent circuit to that illustrated in Fig. 4, where Cbc accounts for the combined effect of Cbcx and Cbci.
Fig. 4. Simplified HBT model in common-emitter configuration used to derive the proposed substrate model.
Then, the constitutive parameters of the equivalent circuit in Fig. 4 can be extracted using the experimental Y-parameters, obtained by transforming the measured S-parameters. In this case, the following expressions can be written:
$$\mathrm{Im}(Y_{11} + Y_{12}) = \omega C_{be} \tag{1}$$
$$-\mathrm{Im}(Y_{12}) = \omega C_{bc} \tag{2}$$
Thus, by plotting (1) and (2) versus ω, the values of Cbe and Cbc can be obtained from the respective slopes, as illustrated in Fig. 5.
[Fig. 5: Im(Y11+Y12) (mS) and Im(−Y12) (mS) versus f (GHz), experimental data and linear regressions; extracted slopes Cbe = 37.85 pF and Cbc = 11.97 pF.]
Fig. 5. Linear regressions used to determine Cbe and Cbc.
In accordance with the model shown in Fig. 4, the substrate admittance (Ysub) is related to the experimental data as:
$$Y_{sub} = Y_{12} + Y_{22} \tag{3}$$
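The extraction in Eqs. (1) to (3) can be checked numerically. The sketch below assumes the π topology implied by those equations (Cbe from base to the common emitter/substrate node, Cbc between base and collector, Ysub from collector to the common node), builds synthetic Y-parameters from hypothetical element values rather than the measured device, and recovers Cbe and Cbc as least-squares slopes through the origin versus ω, and Ysub from Eq. (3). The helper names are ours.

```python
import math


def pi_network_y(omega, c_be, c_bc, y_sub):
    """Y11, Y12, Y21, Y22 of the cold-HBT network assumed from Eqs. (1)-(3):
    Cbe base-to-ground, Cbc base-to-collector, Ysub collector-to-ground."""
    y_bc = 1j * omega * c_bc
    y11 = 1j * omega * c_be + y_bc
    y12 = -y_bc
    y22 = y_bc + y_sub
    return y11, y12, y12, y22          # reciprocal network: Y21 = Y12


def slope_through_origin(xs, ys):
    """Least-squares slope k of ys ~ k * xs with no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def extract_cold_hbt(freqs_hz, y_params):
    """Apply Eq. (1), Eq. (2) (slopes versus omega) and Eq. (3) to Y-parameter data."""
    omegas = [2 * math.pi * f for f in freqs_hz]
    c_be = slope_through_origin(omegas, [(y11 + y12).imag for y11, y12, _, _ in y_params])
    c_bc = slope_through_origin(omegas, [-y12.imag for _, y12, _, _ in y_params])
    y_sub = [y12 + y22 for _, y12, _, y22 in y_params]
    return c_be, c_bc, y_sub


if __name__ == "__main__":
    # Synthetic data: hypothetical element values, Ysub as a series R-C for illustration
    freqs = [k * 1e9 for k in range(1, 41)]                      # 1 to 40 GHz
    ysub_true = [1 / (500 + 1 / (1j * 2 * math.pi * f * 20e-15)) for f in freqs]
    data = [pi_network_y(2 * math.pi * f, 40e-15, 12e-15, ys)
            for f, ys in zip(freqs, ysub_true)]
    c_be, c_bc, _ = extract_cold_hbt(freqs, data)
    print(f"Cbe = {c_be * 1e15:.2f} fF, Cbc = {c_bc * 1e15:.2f} fF")
```

With measured data, the same fit would be run on de-embedded Y-parameters; forcing the line through the origin reflects the purely capacitive form of Eqs. (1) and (2).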
To model the substrate admittance, it is important to analyze the sub-collector-substrate junction carefully. In this regard, notice in Fig. 1 that the boundary conditions between the depletion region and the bulk substrate change as the channel stopper gets closer. This implies that the peripheral capacitance differs from the bottom capacitance depending on the proximity of the channel stopper and on the variation of the potential along the depletion region, which also influences the local resistance of the bulk substrate. Consequently, to accurately model the substrate parasitics in an HBT, it is necessary to consider these effects using a distributed model, as illustrated in Fig. 6.
Fig. 6. Sketch of an HBT cross section and its corresponding equivalent circuit model considering a distributed model for the substrate network.
From [5]-[7], an analytical expression for Ysub can be obtained:
$$Y_{sub} = \sqrt{\frac{j\omega c_{sub}}{r_{sub}}}\,\tanh\left(\sqrt{j\omega r_{sub} c_{sub}}\right) \tag{4}$$
where the total distributed resistance and capacitance, rsub and csub respectively, are given by the high-frequency (ω rsub csub ≫ 1) and low-frequency (ω rsub csub ≪ 1) limits of (4):
$$\mathrm{Re}(Y_{sub}) = \frac{\sqrt{\omega r_{sub} c_{sub}/2}}{r_{sub}} \tag{5}$$
$$\frac{\mathrm{Im}(Y_{sub})}{\omega} = c_{sub} \tag{6}$$
The resulting model, based on (5) and (6), is shown in Fig. 7, where a good correlation with the experimental data can be noticed.
IV. MODEL VERIFICATION
Since the S22 parameter is directly related to the output of the HBT and strongly affected by the substrate parasitics, the usefulness of the proposed model and extraction method is verified by performing a simulation with the equivalent circuit shown in Fig. 4 and evaluating the accuracy of the lumped and distributed models for S22. As shown in Fig. 8, a very good correlation between simulated and experimental data is achieved when the distributed model is used, reproducing both magnitude and phase and confirming the accuracy and consistency of the proposal for representing the HBT output characteristics.
[Fig. 8: magnitude of S22 (dB) and phase of S22 (deg) versus f (GHz); experimental data, distributed model and lumped model.]
Fig. 8. Comparison between experimental and simulated data for the S22 parameter using a distributed network for modeling the substrate parasitics in a SiGe HBT.
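Eq. (4) and the two asymptotic extraction rules of Eqs. (5) and (6) can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' fitting procedure: the element values are hypothetical, Eq. (6) is applied at a frequency where ω·rsub·csub ≪ 1, and Eq. (5) at one where ω·rsub·csub ≫ 1. The helper names are ours.

```python
import cmath
import math


def y_sub_distributed(omega, r_sub, c_sub):
    """Eq. (4): input admittance of an open-ended distributed RC line with
    total resistance r_sub and total capacitance c_sub."""
    theta = cmath.sqrt(1j * omega * r_sub * c_sub)
    return cmath.sqrt(1j * omega * c_sub / r_sub) * cmath.tanh(theta)


def extract_r_c(omega_low, y_low, omega_high, y_high):
    """Estimate (r_sub, c_sub) from the asymptotes of Eq. (4).

    Eq. (6), valid for omega*r*c << 1:  c_sub = Im(Y)/omega
    Eq. (5), valid for omega*r*c >> 1:  Re(Y) = sqrt(omega*r*c/2)/r,
                                       hence r_sub = omega*c_sub / (2*Re(Y)**2)
    """
    c_sub = y_low.imag / omega_low
    r_sub = omega_high * c_sub / (2.0 * y_high.real ** 2)
    return r_sub, c_sub


if __name__ == "__main__":
    r_true, c_true = 1e3, 1e-12          # hypothetical values, r*c = 1 ns
    w_lo, w_hi = 2 * math.pi * 10e6, 2 * math.pi * 40e9
    r_est, c_est = extract_r_c(w_lo, y_sub_distributed(w_lo, r_true, c_true),
                               w_hi, y_sub_distributed(w_hi, r_true, c_true))
    print(f"r_sub = {r_est:.1f} ohm, c_sub = {c_est * 1e12:.4f} pF")
```

The residual error comes from evaluating each asymptote at a finite frequency; with ω·rsub·csub ≈ 0.06 at the low point it stays well below 1 %.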
[Fig. 7: Re(Ysub) (mS) and Im(Ysub)/ω versus f (GHz), experimental data and distributed model, with csub = 22 pF and rsub = 5 kΩ.]
Fig. 7. Frequency dependence of the substrate parasitics for a SiGe HBT under the VBE = VBC = 0 bias condition.
V. CONCLUSIONS
A distributed model for representing the substrate parasitic effects in a SiGe HBT has been proposed and analyzed, together with an analytical method to extract its constitutive parameters from S-parameter measurements. A very good simulation-experiment correlation was obtained for the output electrical characteristics of the HBT up to 40 GHz, which is essential for accurate prediction of circuit behavior and for circuit optimization.
The proposed model is an important contribution to physics-based equivalent circuit modeling, since it helps to understand the substrate effects. Thus, the proposal can be used to improve integrated circuit design, to evaluate the process technology or to optimize the device structure.
VI. ACKNOWLEDGEMENTS
The authors acknowledge IMEC, Leuven, Belgium, for supplying the test structures. They also acknowledge the partial support of this project by CONACyT through Grant 83774Y and doctoral scholarship number 213292.
REFERENCES
[1] A. Joseph, J. Dunn, G. Freeman, D. Harame, D. Coolbaugh, R. Groves, K. Stein, R. Volant, S. Subbanna, V. Marangos, S. Onge, E. Eshun, P. Cooper, J. Johnson, J. Rieh, B. Jagannathan, V. Ramachandran, D. Ahlgren, D. Wang, and X. Wang, "Product applications and technology directions with SiGe BiCMOS," IEEE Journal of Solid-State Circuits, vol. 38, no. 9, pp. 1471-1478, Sep. 2003.
[2] S. Fregonese, D. Celi, T. Zimmer, C. Maneux, and P. Sulima, "A Scalable Substrate Network for Compact Modelling of Deep Trench Insulated HBT," Solid-State Electronics, vol. 49, no. 10, pp. 1623-1631, Oct. 2005.
[3] R. Torres-Torres, R. Murphy-Arteaga, and J. A. Reynoso-Hernandez, "Analytical Model and Parameter Extraction to Account for the Pad Parasitics in RF-CMOS," IEEE Transactions on Electron Devices, vol. 52, no. 7, pp. 1335-1342, 2005.
[4] M. Reisch, High-Frequency Bipolar Transistors, 1st ed. Springer, 2003.
[5] E. Abou-Allam and T. Manku, "A Small-Signal MOSFET Model for Radio Frequency IC Applications," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 16, no. 5, pp. 437-447, 1997.
[6] E. Abou-Allam and T. Manku, "An Improved Transmission-Line Model for MOS Transistors," IEEE Transactions on Circuits and Systems II, vol. 46, no. 11, pp. 1380-1387, 1999.
[7] M. Vaidyanathan and D. L. Pulfrey, "Extrapolated fmax of Heterojunction Bipolar Transistors," IEEE Transactions on Electron Devices, vol. 46, no. 2, pp. 301-309, 1999.
Extraction Methodology of the Substrate Parasitic Network of an RF-MOSFET with Separate Substrate DC Connection
Fabián Zárate-Rincón¹, Germán Álvarez-Botero², Reydezel Torres-Torres³ and Roberto Murphy-Arteaga⁴
¹fabian_zar@inaoep.mx, ²galvarez@inaoep.mx, ³reydezel@inaoep.mx, ⁴rmurphy@inaoep.mx
Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE)
Department of Electronics
Abstract: In this paper, a methodology to extract the parasitic substrate network of an RF-MOSFET with a channel length of 80 nm and a width of 3 μm is proposed. In this transistor, the source is disconnected from the substrate connection of the active area, so the device has a separate substrate DC connection, which allows the bulk potential to be varied. A mathematical formulation to find the source junction capacitance in the high-frequency regime, as a function of bulk potential, is presented for the first time. This capacitance influences the behavior of the device's output port at high frequencies. Excellent agreement between measurement and model was obtained up to 20 GHz. In addition, the proposal was used to characterize RF-MOSFET devices using two-port S-parameter measurements without having the source and substrate connected. This was possible because all capacitances were determined independently, which yielded an accurate electrical model that makes it possible to simulate the performance of the device when varying the substrate potential in the RF regime.
Index Terms: RF MOSFET, parameter extraction, substrate parasitic network.
I. INTRODUCTION
RF-MOSFETs have been increasingly used in high-frequency applications, so accurate models that describe the MOSFET in this regime are clearly needed. In such models, the substrate parasitic network is a critical element for characterization and modeling. Furthermore, in many applications it is not possible to connect the source to the substrate. For example, the substrate terminal of one of the two transistors in a cascode amplifier is not shorted to its source [1]. Also, the substrate resistance strongly influences circuit operation in both common-gate and common-drain topologies. It is therefore essential to fully characterize this type of transistor, which requires an extraction methodology that considers the separate substrate terminal.
Several recent articles have focused on the extraction of the substrate parasitic network [2], [3]. In [2], all parameters are extracted from three-port S-parameter measurement data, but the gate resistance is ignored. The circuit used in this case is extremely simplified, based on certain considerations [3]. First, the equivalent circuit assumes that the source and drain junction capacitances are equal, and the same is assumed for the gate-to-source and gate-to-drain capacitances. Furthermore, the effect of the substrate is represented by a single resistor, and source and drain are shorted. On the other hand, the substrate parasitic network in [4] and [5] is based on the series connection of a resistor and a capacitor, which means that the junction capacitance between source and substrate is neglected. In this paper, a method to directly extract the substrate resistance and capacitances is proposed. Unlike [3], the capacitances are found independently, and the gate resistance is taken into account. The model was verified using an RF-MOSFET with a separate substrate DC connection, obtaining a very good correlation.
II. EXPERIMENT
A Vector Network Analyzer (VNA) and ground-signal-ground coplanar Cascade probes with a 100 μm pitch were employed to measure the on-wafer two-port S-parameters, as shown in Fig. 1. The applied power was 10 μW. Before the device measurements, the system was calibrated using the LRM (line-reflect-match) procedure. The device under test (DUT) was an RF-NMOS with Lm = 80 nm, Wf = 3 μm and NF = 64. Additionally, this transistor has a substrate terminal that is disconnected from the source, which allows a DC bias to be applied to the substrate while performing the RF measurements. The device was turned off to determine the substrate parasitic network.
Fig. 1. Layout of the device under test (DUT) consisting of an RF-MOSFET surrounded by the pads in ground-signal-ground (GSG) configuration.
Before the measured data were interpreted, the pad parasitic effects were removed with the de-embedding technique outlined in [6] and [7]. This required measuring two additional dummy structures: an open and a short-all.
Fig. 2(a) shows the equivalent circuit with the substrate parasitic parameters included and the intrinsic parameters omitted. Cjs and Cjd are the source and drain junction capacitances. Cgs, Cgd and Cgb are the gate-to-source, gate-to-drain and gate-to-substrate capacitances, respectively. Rsb, Rdb, Rdsb1 and Rdsb2 are the resistances from source and drain to substrate. As shown in [2], Rdsb1 and Rdsb2 are only about 2.5 % of Rsb and Rdb, so they can be neglected. The circuit then simplifies to the one in Fig. 2(b), where Rsb and Rdb appear in parallel and are represented by Rb.
Fig. 2. Equivalent circuit of an RF-MOSFET when it is turned off and the intrinsic parameters are neglected: a) all substrate parameters are considered; b) the resistances Rdsb1 and Rdsb2 are very small and are not taken into account, and Rb = Rsb || Rdb.
III.
PROPOSED METHOD AND RESULTS
Fig. 3 illustrates the linear regressions of the experimental data used to find the elements of the substrate parasitic network. First, the capacitances Cgd, Cjd, Cgs and Cgb are obtained with (1) through (4). Then the resistances Rg and Rb are determined with (5) and (6):
$$\mathrm{Im}(Y_{11}) \approx \omega\,(C_{gs}+C_{gd}+C_{gb}) \qquad (1)$$
$$\mathrm{Im}(Y_{22}) \approx \omega\,(C_{gd}+C_{jd}) \qquad (2)$$
$$-\mathrm{Im}(Y_{12}) \approx \omega\,C_{gd} \qquad (3)$$
$$-\frac{1}{\mathrm{Im}(Z_{22})} \approx \omega\left(C_{jd}+\frac{C_{gs}C_{gd}}{C_{gs}+C_{gd}}\right) \qquad (4)$$
$$\mathrm{Re}(Y_{11}) \approx \omega^{2}R_{g}\,(C_{gs}+C_{gd}+C_{gb})^{2} \qquad (5)$$
$$\mathrm{Re}(Y_{22}) \approx \omega^{2}R_{b}\,C_{jd}^{2} \qquad (6)$$
The Y22 admittance and the Z22 impedance were fitted up to 4 GHz, because at these frequencies the effect of the junction capacitance Cjs is not appreciable. In the off-state equivalent circuit, Cjs is in parallel with Rb, and at these frequencies its admittance is small compared with 1/Rb. Fig. 3 shows the capacitive behavior of Y22 and of the inverse of Z22.
Cjs, on the other hand, was determined above 6 GHz from Y'. Y' is the admittance that remains after the equivalent circuit of Y22 (see Fig. 4(a)) is reduced by subtracting all elements up to Cjs. For this purpose, a delta-star transformation of impedances was performed at nodes 1 to 3, as shown in Fig. 4(b). Note that the source and substrate terminals are connected to the same AC node, because the DC bias between source and substrate is applied through a compensated Cascade DC probe. The components were then determined with (7) through (11):
$$Z_{1} = \frac{1}{j\omega C_{gd}C_{gb}\left(\frac{1}{C_{gd}}+\frac{1}{C_{jd}}+\frac{1}{C_{gb}}\right)} \qquad (7)$$
$$Z_{2} = \frac{1}{j\omega C_{gd}C_{jd}\left(\frac{1}{C_{gd}}+\frac{1}{C_{jd}}+\frac{1}{C_{gb}}\right)} \qquad (8)$$
$$Z_{3} = \frac{1}{j\omega C_{gb}C_{jd}\left(\frac{1}{C_{gd}}+\frac{1}{C_{jd}}+\frac{1}{C_{gb}}\right)} \qquad (9)$$
$$Z_{4} = R_{g} \qquad (10)$$
$$\mathrm{Im}\left\{\left[\left(\frac{1}{\frac{1}{Y_{22}}-Z_{2}}-\frac{1}{Z_{1}+\frac{1}{j\omega C_{gs}+\frac{1}{Z_{4}}}}\right)^{-1}-Z_{3}\right]^{-1}\right\} = \mathrm{Im}(Y') = \omega C_{js} \qquad (11)$$
As can be observed, the experimental data and the simulation results agree up to 20 GHz.
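The extraction sequence of (1) through (6) amounts to least-squares fits through the origin followed by an algebraic inversion. The Python sketch below illustrates that sequence; it is not the authors' code. The synthetic "measurements" are generated from the idealized relations (1)-(6), using the Table I values as assumed inputs, so the script only shows that the inversion recovers them. Real inputs would be the de-embedded Y- and Z-parameters over the fitting ranges of Fig. 3.

```python
import math

def fit_slope(xs, ys):
    """Least-squares slope of y = m*x (a regression line through the origin)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def extract_parameters(w, im_y11, im_y22, im_y12, im_z22, re_y11, re_y22):
    """Invert (1)-(6); w holds angular frequencies in rad/s."""
    s1 = fit_slope(w, im_y11)                        # (1): Cgs + Cgd + Cgb
    s2 = fit_slope(w, im_y22)                        # (2): Cgd + Cjd
    s3 = fit_slope(w, [-y for y in im_y12])          # (3): Cgd
    s4 = fit_slope(w, [-1.0 / z for z in im_z22])    # (4): Cjd + Cgs*Cgd/(Cgs + Cgd)
    cgd = s3
    cjd = s2 - cgd
    k = s4 - cjd                                     # series combination of Cgs and Cgd
    cgs = k * cgd / (cgd - k)
    cgb = s1 - cgs - cgd
    w2 = [x * x for x in w]
    rg = fit_slope(w2, re_y11) / s1 ** 2             # (5)
    rb = fit_slope(w2, re_y22) / cjd ** 2            # (6)
    return {"Cgd": cgd, "Cjd": cjd, "Cgs": cgs, "Cgb": cgb, "Rg": rg, "Rb": rb}

# Assumed "true" values (Table I), used only to synthesize test data.
true = {"Cgd": 60.2e-15, "Cjd": 102.5e-15, "Cgs": 62.2e-15,
        "Cgb": 3.0e-15, "Rg": 2.8, "Rb": 50.1}
w = [2 * math.pi * f for f in (0.5e9, 1e9, 2e9, 3e9, 4e9)]
ctot = true["Cgs"] + true["Cgd"] + true["Cgb"]
cseries = true["Cgs"] * true["Cgd"] / (true["Cgs"] + true["Cgd"])
data = dict(
    im_y11=[x * ctot for x in w],
    im_y22=[x * (true["Cgd"] + true["Cjd"]) for x in w],
    im_y12=[-x * true["Cgd"] for x in w],
    im_z22=[-1.0 / (x * (true["Cjd"] + cseries)) for x in w],
    re_y11=[x * x * true["Rg"] * ctot ** 2 for x in w],
    re_y22=[x * x * true["Rb"] * true["Cjd"] ** 2 for x in w],
)
result = extract_parameters(w, **data)
for name, value in result.items():
    print(f"{name}: {value:.4g}")
```

The order of the inversion matters. Cgd comes directly from (3). Cjd then follows from (2), Cgs from (4), and Cgb from the remainder of (1). This matches the extraction order listed in Table I.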
[Fig. 3 panels, each showing experimental data and its linear regression: -Im(Y12) (mS) vs ω up to 20 GHz, slope Cgd = 60.2 fF; Im(Y22) (mS) vs ω up to 4 GHz, slope Cgd + Cjd = 155.5 fF; -1/Im(Z22) (mS) vs ω up to 4 GHz, slope Cjd + CgsCgd/(Cgs + Cgd) = 132.4 fF; Im(Y11) (mS) vs ω up to 20 GHz, slope Cgd + Cgs + Cgb = 121.0 fF; Re(Y11) vs ω² up to 20 GHz, slope (Cgd + Cgs + Cgb)² Rg = 4.35 x 10^-26 Ω·F²; Re(Y22) vs ω² up to 4 GHz, slope Cjd² Rb = 5.3 x 10^-25 Ω·F²; Im(Y') (mS) vs ω from 6 to 20 GHz, slope Cjs = 98.2 fF.]
Fig. 3. Linear regression of experimental data for the parameter extraction of an RF-NMOS device with Lm = 80 nm, Wf = 3 µm and NF = 64; Vgs, Vds and Vbs are equal to 0 V.
TABLE I.
EXTRACTED PARAMETER VALUES
Parameter | Extracted value | Unit
Cgd | 60.2 | fF
Cjd | 102.5 | fF
Cgs | 62.2 | fF
Cgb | 3.0 | fF
Rg | 2.8 | Ω
Rb | 50.1 | Ω
Cjs | 98.2 | fF
Fig. 4. Equivalent circuit for Y22 (panels a and b). The nodes are drain (D), source (S) and substrate (B); Vgs, Vds and Vbs are equal to 0 V.
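Equations (7) through (11) can be checked numerically with the Python sketch below. It builds Y22 of the off-state circuit with port 1 AC-shorted, then peels off the star arms to recover Cjs and Rb from Y'. The topology is our reading of Fig. 2(b) and Fig. 4, and should be treated as an assumption: Cgd, Cgb and Cjd form the delta between drain, internal gate and internal body; Cgs in parallel with Rg connects the internal gate to ground; Cjs in parallel with Rb connects the internal body to ground. The element values are taken from Table I purely as test inputs, and the function names are hypothetical.

```python
import math

def y22_off_state(f, cgd, cjd, cgs, cgb, rg, rb, cjs):
    """Y22 of the assumed off-state circuit with port 1 AC-shorted (nodal analysis)."""
    w = 2 * math.pi * f
    ygd, yjd, ygb = 1j * w * cgd, 1j * w * cjd, 1j * w * cgb
    yg = 1j * w * cgs + 1 / rg      # internal gate to ground: Cgs in parallel with Rg
    yb = 1j * w * cjs + 1 / rb      # internal body to ground: Cjs in parallel with Rb
    # Drive the drain with 1 V and solve for the internal gate (vg) and body (vb) voltages.
    a11, a12 = ygd + ygb + yg, -ygb
    a21, a22 = -ygb, yjd + ygb + yb
    det = a11 * a22 - a12 * a21
    vg = (ygd * a22 - a12 * yjd) / det
    vb = (a11 * yjd - a21 * ygd) / det
    return ygd * (1 - vg) + yjd * (1 - vb)   # drain current = Y22 for a 1 V drive

def y_prime(f, y22, cgd, cjd, cgs, cgb, rg):
    """Remove everything except Cjs || Rb from Y22 using (7)-(11)."""
    w = 2 * math.pi * f
    s = 1 / cgd + 1 / cjd + 1 / cgb
    z1 = 1 / (1j * w * cgd * cgb * s)   # (7) star arm at the internal gate
    z2 = 1 / (1j * w * cgd * cjd * s)   # (8) star arm at the drain
    z3 = 1 / (1j * w * cgb * cjd * s)   # (9) star arm at the internal body
    z4 = rg                             # (10)
    z_gate = z1 + 1 / (1j * w * cgs + 1 / z4)
    z_body = 1 / (1 / (1 / y22 - z2) - 1 / z_gate) - z3
    return 1 / z_body                   # (11): Y' = j*w*Cjs + 1/Rb

values = dict(cgd=60.2e-15, cjd=102.5e-15, cgs=62.2e-15, cgb=3.0e-15, rg=2.8)
f = 10e9
y22 = y22_off_state(f, rb=50.1, cjs=98.2e-15, **values)
yp = y_prime(f, y22, **values)
cjs_est = yp.imag / (2 * math.pi * f)
rb_est = 1 / yp.real
print(f"Cjs = {cjs_est * 1e15:.2f} fF, Rb = {rb_est:.2f} ohm")
```

Rg enters only through Z4 in the gate branch. This is consistent with the remark below that Rg affects the extraction of Cjs but not of the other parameters.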
Table I lists, in order of extraction, the parameters obtained from the linear regressions of the experimental data. Cjs and Cjd are of the same order of magnitude, and so are Cgs and Cgd. Cgb, on the other hand, is only about 5 % of Cgs.
Figures 5 through 7 compare the experimental S-parameters with simulations of the proposed model (with Cjs and Cgb) and of the model in [3] and [4] (without Cjs and Cgb). Here, S12 is assumed to be equal to S21. Significant differences appear in S22 only above 6 GHz. Cjs can therefore be neglected in the low-frequency range and extracted at higher frequencies.
[Fig. 7 panels: magnitude (dB) and phase (deg.) of S22 versus frequency (GHz); experimental data, with Cjs and Cgb, without Cjs and Cgb.]
Fig. 7. Experimental and simulated data with/without Cjs and Cgb for the magnitude and phase of S22; Vgs, Vds and Vbs are equal to 0 V.
[Further S-parameter panels versus frequency (GHz): experimental data, with Cjs and Cgb, without Cjs and Cgb.]
Fig. 8. Experimental and simulated data of the proposed model and of the model in [2] for S22; Vgs, Vds and Vbs are equal to 0 V.
[S-parameter phase panels (deg.) versus frequency (GHz): experimental data, proposed model, model in [2].]
Figure 8 compares the proposed model with the one presented in [2] for S22. The assumptions in [2], Cgs = Cgb and Cjs = Cjd, do not allow a precise fit of S22 at high frequencies, whereas the proposed model does. In addition, [2] does not take Rg into account. Rg does not affect the extraction of most parameters of the substrate parasitic network; the exception is Cjs, as shown in (10) and (11). If Rg is neglected, the simulated S11 values do not match the experimental data, as shown in Fig. 9.
Fig. 9. Experimental and simulated data of the proposed model and of the model in [2] for S11; Vgs, Vds and Vbs are equal to 0 V.
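The comparisons above require the model to be evaluated as S-parameters. One common route is to compute the model's two-port Y-matrix and convert it with the standard Y-to-S relations for a real reference impedance, assumed here to be 50 Ω. This Python sketch is not necessarily the simulator flow used by the authors. The example Y-matrix is a hypothetical lossless approximation built from the Table I capacitances. Because it is reciprocal (Y12 = Y21), it also gives S12 = S21, the assumption made in the text.

```python
import cmath
import math

def y_to_s(y11, y12, y21, y22, z0=50.0):
    """Two-port Y-parameters (siemens) to S-parameters for a real reference impedance z0 (ohm)."""
    a11, a12, a21, a22 = (z0 * y for y in (y11, y12, y21, y22))  # normalized admittances
    d = (1 + a11) * (1 + a22) - a12 * a21
    s11 = ((1 - a11) * (1 + a22) + a12 * a21) / d
    s12 = -2 * a12 / d
    s21 = -2 * a21 / d
    s22 = ((1 + a11) * (1 - a22) + a12 * a21) / d
    return s11, s12, s21, s22

def db_and_degrees(s):
    """Magnitude in dB and phase in degrees, the quantities plotted in Figs. 7 to 9."""
    return 20 * math.log10(abs(s)), math.degrees(cmath.phase(s))

w = 2 * math.pi * 10e9
# Hypothetical lossless off-state Y-matrix from (1)-(3) with the Table I capacitances.
y11 = 1j * w * (62.2e-15 + 60.2e-15 + 3.0e-15)
y12 = y21 = -1j * w * 60.2e-15
y22 = 1j * w * (60.2e-15 + 102.5e-15)
for name, s in zip(("S11", "S12", "S21", "S22"), y_to_s(y11, y12, y21, y22)):
    mag_db, phase_deg = db_and_degrees(s)
    print(f"{name}: {mag_db:.2f} dB, {phase_deg:.1f} deg")
```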
IV.
CONCLUSIONS
A simple methodology to extract the substrate parasitic network has been presented. The method was used to characterize devices in which the source and substrate are disconnected, that is, devices with a separate substrate DC connection. Two-port S-parameter measurements were performed on this type of device, and the proposed model agrees very well with the experimental data. A clear advantage of the method is that the capacitances Cgs, Cgd, Cjs and Cjd can be determined independently, which gives deeper insight into the parasitic components of an RF-MOSFET. The proposal provides a simple characterization methodology, based on two-port S-parameter measurements, for RF-MOSFET structures with separate substrate DC connections. It allows designers to simulate complex circuits in which the source and the substrate (bulk) are biased at different potentials.
ACKNOWLEDGMENT
The authors acknowledge IMEC, Leuven, Belgium, for supplying the test structures. They also thank CONACyT, México, for partially supporting this project through Grant 83774-Y and for the scholarships awarded for master's and doctoral studies, numbers 375862 and 213292, respectively.
REFERENCES
[1] Choong-Yul Cha, Jin-Pil Kim, and Sang Gug Lee, "Small-Signal Substrate Resistance Effect in RF CMOS Cascode Amplifier," IEEE Microwave and Wireless Components Letters, vol. 13, no. 7, 2003, pp. 253-255.
[2] I. M. Kang, J. Duk Lee, and H. Shin, "Extraction of π-Type Substrate Resistance Based on Three-Port Measurement and the Model Verification up to 110 GHz," IEEE Electron Device Letters, vol. 28, no. 5, 2007, pp. 425-427.
[3] J. Han, Minkyu Je, and H. Shin, "A Simple and Accurate Method for Extracting Substrate Resistance of RF MOSFETs," IEEE Electron Device Letters, vol. 23, no. 7, 2002, pp. 434-436.
[4] N. Srirattana, D. Heo, H. M. Park, A. Raghavan, P. E. Allen, and J. Laskar, "A New Analytical Scalable Substrate Network Model for RF MOSFETs," IEEE MTT-S Digest, vol. 4, 2004, pp. 699-702.
[5] Y. Cheng and M. Matloubian, "On the high-frequency characteristics of substrate resistance in RF MOSFET," IEEE Electron Device Letters, vol. 21, no. 12, 2000, pp. 604-606.
[6] R. Torres-Torres, R. Murphy-Arteaga, and J. A. Reynoso-Hernández, "Analytical Model and Parameter Extraction to Account for Pad Parasitics in RF-CMOS," IEEE Transactions on Electron Devices, vol. 52, no. 7, 2005, pp. 1335-1342.
[7] H. Cho and D. E. Burk, "A Three-step method for the de-embedding of high-frequency S-parameter measurements," IEEE Transactions on Electron Devices, vol. 38, no. 6, 1991, pp. 1371-1375.
Modified Hypersphere Method Applied to Biparametric Homotopies: Simulation of Circuits with Bipolar Transistors
R. Castañeda-Sheissa¹, H. Vázquez-Leal¹, A. Yildirim²,³, U. Filobello-Niño¹, A. Sarmiento-Reyes⁴, L. Hernández-Martínez⁴
¹Facultad de Instrumentación Electrónica, Universidad Veracruzana, Xalapa, Veracruz, México
²Department of Mathematics, Ege University, Izmir, Turkey
³Department of Mathematics and Statistics, University of South Florida, Tampa, FL, USA
⁴Departamento de Electrónica, Instituto Nacional de Astrofísica, Óptica y Electrónica, Sta. María Tonantzintla, Puebla, México
hvazquez@uv.mx
Abstract: This article shows how the hypersphere technique can be adapted and applied to the tracing of multiparameter homotopies. It also presents a circle-based technique, derived from the hyperspheres, which is faster and simpler to implement than the hypersphere technique. Finally, both techniques are compared by applying them to the simulation of circuits with bipolar transistors.
I.
INTRODUCTION
The increasing complexity of circuits drives scientific progress in integrated-circuit simulation techniques. Homotopy methods have emerged as a novel and useful tool for finding the operating point of circuits [2, 6]. The widely used Newton-Raphson (NR) method, by contrast, suffers from convergence problems such as oscillation and divergence.
II.
MULTIPARAMETER HOMOTOPY
The first step in formulating a homotopy is to establish the equilibrium equation to be solved. This equation is derived from Kirchhoff's laws and is defined as
$$f(x) = 0, \quad f: \mathbb{R}^{n} \rightarrow \mathbb{R}^{n} \qquad (1)$$
where x represents the electrical variables of the circuit and n is the number of electrical variables.
Multiparameter homotopies [9, 3, 8, 11] add more than one homotopy parameter to the equilibrium equation. When the homotopy parameters are set to zero, the solution of H(·) is trivial. When the parameters reach one, the operating point has been found. The multiparameter homotopy function can be represented as
$$H(f(x), \lambda_{1}, \lambda_{2}, \ldots, \lambda_{k}) = 0, \quad H: \mathbb{R}^{n+k} \rightarrow \mathbb{R}^{n} \qquad (2)$$
where λ1, λ2, ..., λk ∈ [0, 1] are the homotopy parameters and k is the number of homotopy parameters. Multiparameter homotopies [9] were proposed to avoid pitchfork bifurcations, singularities and other problems that can arise along homotopy paths. For both single-parameter [2] and multiparameter homotopies, the tracing technique [7, 1] is a fundamental tool that can affect convergence, speed and the number of solutions found. We therefore propose applying two tracing techniques to multiparameter homotopies; they are described in the following sections.
III.
TRACING TECHNIQUES
To apply the tracing techniques described in this article, a biparametric homotopy based on the Newton homotopy method is used as an example:
$$H(f(x), \lambda_{1}, \lambda_{2}) = f(x, \lambda_{2}) - (1 - \lambda_{1})\, f(x_{i}, 0) \qquad (3)$$
With two parameters (λ1 and λ2), two simultaneous deformations, or transformations, take place: one in the function f and another in the function H. When [x, λ1, λ2] = [xi, 0, 0],
$$H(f(x), \lambda_{1}, \lambda_{2}) = f(x_{i}, 0) - f(x_{i}, 0) = 0, \qquad (4)$$
so the homotopy function is satisfied. Moreover, when [λ1, λ2] = [1, 1], it becomes
$$H(f(x), \lambda_{1}, \lambda_{2}) = f(x), \qquad (5)$$
so the solution of H is the solution of the equilibrium equation. However, since H contains two extra variables, two equations must be added to the system H so that it can be solved with more conventional techniques such as NR.
1.
Equation n+1. An equation defining the λ1-λ2 path is added; it is called the parametric function M(λ1, λ2). This equation passes through three points [λ1, λ2]: p1 = [0, 0], p2 = [A, B] and p3 = [1, 1]. The equation is
$$M(\lambda_{1}, \lambda_{2}) = -\lambda_{1} + \frac{A - B}{B(B-1)}\,\lambda_{2}^{2} + \frac{B^{2} - A}{B(B-1)}\,\lambda_{2} = 0 \qquad (6)$$
where p2 is defined by the user, as shown in Fig. 1(a). A and B take values in [0, 1]; in this form, B must differ from 0 and 1 so that the denominators do not vanish.
2.
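The Python sketch below implements a parametric function with the properties required of (6): it vanishes at p1, p2 and p3. It assumes that M expresses λ1 as the unique quadratic in λ2 through the three points, which requires B ≠ 0 and B ≠ 1. The parameter names are illustrative.

```python
def parametric_m(lam1, lam2, p2_a, p2_b):
    """M(lambda1, lambda2) for the quadratic lambda1(lambda2) through (0,0), (A,B), (1,1)."""
    den = p2_b * (p2_b - 1.0)
    if den == 0.0:
        raise ValueError("B must differ from 0 and 1")
    alpha = (p2_a - p2_b) / den          # coefficient of lambda2**2
    beta = (p2_b * p2_b - p2_a) / den    # coefficient of lambda2
    return -lam1 + alpha * lam2 ** 2 + beta * lam2

A, B = 0.2, 0.3   # p2 used in the case study of Section IV
for point in ((0.0, 0.0), (A, B), (1.0, 1.0)):
    print(point, parametric_m(point[0], point[1], A, B))
```

The printed values are zero up to floating-point rounding, which confirms that the curve starts at p1, passes through the user-defined p2 and ends at p3.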
Equation n+2. The hypersphere equation [10] is added:
$$S = (x_{1} - c_{1})^{2} + (x_{2} - c_{2})^{2} + \cdots + (\lambda_{1} - c_{n+1})^{2} + (\lambda_{2} - c_{n+2})^{2} - r^{2} = 0 \qquad (7)$$
where c = [c1, ..., cn+2] is the center of the hypersphere, which is updated at each iteration, and r << 1 is its radius (the step size).
The procedure can be summarized in the following steps [10] (see Fig. 1(b)):
1) The first hypersphere S0 is centered at t0 = [xi, p1], and the system of equations ((3), (6) and (7)) is solved with NR, using t0 as the starting point; this locates the point t1.
2) A new hypersphere S1 is centered at t1.
3) A prediction is made from the points t0 and t1. The predicted point k1 lies on hypersphere S1 and is used as the starting point for NR, which then locates the point t2 on the homotopy path.
4) Steps 2 and 3 are repeated until the path crosses the point p3.
5) The two points before and after p3 are used for an interpolation [4]. This article uses multidimensional linear interpolation (known as LERP), which yields an approximation xa of the solution xs of the equilibrium equation.
6) Finally, NR started from xa refines the accuracy of the operating point xs.
Equation (7) can be replaced by the equation of a circle in terms of the homotopy parameters:
$$C(\lambda) = (\lambda_{1} - c_{n+1})^{2} + (\lambda_{2} - c_{n+2})^{2} - r^{2} = 0 \qquad (8)$$
where r << 1. The remaining steps of the numerical continuation are the same as those described for the hypersphere technique.
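The Python sketch below makes the circle technique concrete. It traces the biparametric Newton homotopy (3) with the parametric function (6) and the circle (8), then applies the LERP and NR end game of steps 5 and 6. It is a simplified sketch, not the authors' implementation, and several elements are assumptions:

- the hypothetical scalar equation f(x, λ2) = x³ - 2x - 5λ2 and the starting point xi = 3;
- the quadratic form of M;
- forward-difference Jacobians;
- the first predictor, a step of length r along the diagonal of the λ-plane. It is used because a Newton iteration started exactly at the center of the circle has a singular Jacobian.

The radius r = 0.03 and p2 = [0.2, 0.3] follow the values reported in Section IV.

```python
import math

def f(x, lam2):
    """Hypothetical scalar circuit equation deformed by lambda2; f(x, 1) = x**3 - 2x - 5."""
    return [x[0] ** 3 - 2.0 * x[0] - 5.0 * lam2]

def m(lam1, lam2, a=0.2, b=0.3):
    """Parametric function (6): quadratic lambda1(lambda2) through (0,0), (a,b), (1,1)."""
    den = b * (b - 1.0)
    return -lam1 + (a - b) / den * lam2 ** 2 + (b * b - a) / den * lam2

def solve(mat, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(mat)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            fac = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= fac * aug[c][k]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        sol[r] = (aug[r][n] - sum(aug[r][k] * sol[k] for k in range(r + 1, n))) / aug[r][r]
    return sol

def newton(F, t, tol=1e-10, maxit=50, h=1e-7):
    """Newton-Raphson with a forward-difference Jacobian."""
    t = list(t)
    for _ in range(maxit):
        r = F(t)
        if max(abs(v) for v in r) < tol:
            return t
        cols = []
        for j in range(len(t)):
            tp = list(t)
            tp[j] += h
            cols.append([(a - b) / h for a, b in zip(F(tp), r)])
        jac = [list(row) for row in zip(*cols)]   # jac[i][j] = dF_i/dt_j
        step = solve(jac, [-v for v in r])
        t = [a + d for a, d in zip(t, step)]
    raise RuntimeError("Newton-Raphson did not converge")

def trace_circles(xi, r=0.03):
    n = len(xi)
    f0 = f(xi, 0.0)

    def system(t, c):
        x, l1, l2 = t[:n], t[n], t[n + 1]
        hom = [a - (1.0 - l1) * b for a, b in zip(f(x, l2), f0)]      # (3)
        return hom + [m(l1, l2),                                       # (6)
                      (l1 - c[0]) ** 2 + (l2 - c[1]) ** 2 - r * r]     # (8)

    prev = list(xi) + [0.0, 0.0]                          # t0 = [xi, p1]
    guess = list(xi) + [r / math.sqrt(2.0)] * 2           # assumed first predictor
    steps = 0
    while True:
        c = prev[n:]                                      # circle centered at the last point
        cur = newton(lambda t: system(t, c), guess)
        steps += 1
        if cur[n] >= 1.0:                                 # the path has crossed p3 = [1, 1]
            break
        guess = [2.0 * a - b for a, b in zip(cur, prev)]  # linear predictor
        prev = cur
    s = (1.0 - prev[n]) / (cur[n] - prev[n])              # LERP to lambda1 = 1
    xa = [a + s * (b - a) for a, b in zip(prev[:n], cur[:n])]
    xs = newton(lambda x: f(x, 1.0), xa)                  # final NR on f(x) = 0
    return xs, steps

xs, steps = trace_circles([3.0])
print(f"x_s = {xs[0]:.10f} after {steps} circle steps")
```

In this sketch the circle constrains only (λ1, λ2). The number of steps is therefore set by the length of the curve M divided by r, not by the starting point xi. That may explain why the circle technique reports the same iteration count for every starting point in Table I.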
IV.
CASE STUDY: CIRCUIT WITH BIPOLAR TRANSISTORS AND A DIODE
The following circuit [5] (see Fig. 2) has 9 solutions and has become a benchmark for homotopy methods applied to circuit analysis.
Fig. 1. a) Parametric function; b) Hypersphere technique.
Fig. 2. Chua circuit.
Using the system reported in [5], the augmented equilibrium equation f(v1, v2, v3, v4, λ2) is formulated as
$$f_{1} = 6.103168\,I_{s}\left(e^{40v_{1}}-1\right)\lambda_{2} + 4.36634\,v_{2} + 2.863168\,I_{s}\left(e^{40v_{2}}-1\right) - 12$$
$$f_{2} = 5.4\,v_{1} + 3.58\,I_{s}\left(e^{40v_{1}}-1\right)\lambda_{2} + 6.62\,I_{s}\left(e^{40v_{2}}-1\right) + v_{3} + 0.7\,I_{s}\left(e^{40v_{3}}-1\right) + 0.5\,I_{s}\left(e^{40v_{4}}-1\right) - 22$$
$$f_{3} = 6.103168\,I_{s}\left(e^{40v_{3}}-1\right) + 2.863168\,I_{s}\left(e^{40v_{4}}-1\right)\lambda_{2} + 4.36634\,v_{4} - 12$$
$$f_{4} = v_{1} + 0.7\,I_{s}\left(e^{40v_{1}}-1\right)\lambda_{2} + 0.5\,I_{s}\left(e^{40v_{2}}-1\right) + 5.4\,v_{3} + 3.58\,I_{s}\left(e^{40v_{3}}-1\right) + 6.62\,I_{s}\left(e^{40v_{4}}-1\right)\lambda_{2} - 20$$
where Is = 10^-6. The augmented system of equations is formulated with (3), (6), and (7) or (8), depending on the tracing technique used.
Table I summarizes the results of tracing four paths, each from a different starting point (xi1, xi2, xi3 and xi4).
Table I. Homotopy simulation points.
Hyperspheres
Initial point, at [λ1, λ2] = [0, 0] | Iter. | Operating point [v1, v2, v3, v4], at [λ1, λ2] = [1, 1]
xi1 = [-5, -5, -5, -5] | 519 | xs1 = [0.3830, -3.5446, 0.3851, -4.0990]
xi2 = [-1, -2, -1, 0] | 202 | xs2 = [0.3869, -4.6321, -0.8002, 0.3775]
xi3 = [-5, -0.5, -5, 0] | 216 | xs3 = [-0.5136, 0.3775, -0.9682, 0.3775]
xi4 = [-1, 0, 0, 0] | 168 | xs4 = [-1.0510, 0.3775, 0.3845, -3.9542]
Circles
Initial point, at [λ1, λ2] = [0, 0] | Iter. | Operating point [v1, v2, v3, v4], at [λ1, λ2] = [1, 1]
xi1 = [-5, -5, -5, -5] | 48 | xs1 = [0.3830, -3.5446, 0.3851, -4.0990]
xi2 = [-1, -2, -1, 0] | 48 | xs2 = [0.3869, -4.6321, -0.8002, 0.3775]
xi3 = [-5, -0.5, -5, 0] | 48 | xs3 = [-0.5136, 0.3775, -0.9682, 0.3775]
xi4 = [-1, 0, 0, 0] | 48 | xs4 = [-1.0510, 0.3775, 0.3845, -3.9542]
Fig. 3. Homotopy paths v2-λ1.
This process was repeated for the two tracing techniques, and the results are shown graphically in Figs. 3(a) and 3(b). Two conclusions stand out. First, the paths traced from the same starting point lead to exactly the same solution; a point-by-point comparison of the figures shows that they are in fact the same path. Second, although the paths are identical, the circle technique required a fixed number of iterations (48), far fewer than the hypersphere technique.
Table I shows that in the best case (starting point xi1), the circle technique needed 10.8 times fewer iterations than the hypersphere technique.
Both tracing techniques used a radius r = 0.03 and a parametric function M with p2 = [0.2, 0.3].
The circle technique can be modified by replacing one of the two homotopy parameters with an electrical variable of interest. For example, the simulation was repeated from the starting point xi1, changing only the circle of (8) to one in terms of the variables v1 and λ1. The already known homotopy path (see Fig. 3(b)) was then traced in a total of 191 iterations, and the same solution xs1 was found. One of the two homotopy parameters can also be combined with more than one electrical variable to implement a reduced hypersphere. Future work will address this aspect of the circle technique in more depth, together with its possible application to the simulation of VLSI circuits.
V.
CONCLUSION
This work showed that the hypersphere technique can be used to trace multiparameter homotopies. It also presented a tracing technique derived from the hyperspheres (circles), which is even simpler to program and faster than the hypersphere technique. These results make the circle technique an attractive tool for tracing multiparameter functions.
REFERENCES
[1] E. L. Allgower and K. Georg, "Numerical path following," 1994.
[2] R. C. Melville and L. Trajkovic, "Artificial parameter homotopy methods for the dc operating point problem," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 12, no. 6, pp. 861-877, 1997.
[3] J. Roychowdhury and R. Melville, "Delivering global dc convergence for large mixed-signal circuits via homotopy/continuation methods," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 2, no. 1, pp. 66-78, 2006.
[4] M. Sosonkina, L. T. Watson, and D. E. Stewart, "Note on the end game in homotopy zero curve tracking," ACM Transactions on Mathematical Software, vol. 22, no. 3, pp. 281-287, 1996.
[5] A. Ushida and L. O. Chua, "Tracing solution curves of nonlinear equations with sharp turning points," Circuit Theory and Applications, vol. 12, pp. 1-21, 1984.
[6] H. Vázquez-L., L. Hernández-M., and A. Sarmiento-R., "Double-bounded homotopy for analysing nonlinear resistive circuits," International Symposium on Circuits and Systems, vol. 4, pp. 3203-3206, 2005.
[7] H. Vázquez-L., L. Hernández-M., A. Sarmiento-R., and R. Castañeda-S., "Numerical continuation scheme for tracing the double bounded homotopy for analysing nonlinear circuits," International Conference on Communications, Circuits and Systems, pp. 1122-1126, 2005.
[8] H. Vázquez-L., L. Hernández-M., A. Sarmiento-R., and R. S. Murphy-A., "Improving multi-parameter homotopy via symbolic analysis techniques for circuit simulation," 2003 European Conference on Circuit Theory and Design II, pp. 402-405, 2003.
[9] D. M. Wolf and S. R. Sanders, "Multiparameter homotopy methods for finding dc operating points of nonlinear circuits," IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 43, no. 10, pp. 824-837, 1996.
[10] K. Yamamura, "Simple algorithms for tracing solution curves," IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 40, no. 8, pp. 537-541, 1993.
[11] H. Vázquez-Leal, R. Castañeda-Sheissa, F. Rábago-Bernal, L. Hernández-Martínez, A. Sarmiento-Reyes and U. Filobello-Niño, "Powering multiparameter homotopy-based simulation with a fast path-following technique," ISRN Applied Mathematics, vol. 2011, Article ID 610637, 7 pages, 2011.
Power Dissipation in
Transistor Networks versus Standard Cells
Gerson Scartezzini, Ricardo Reis
PPGC-PGMicro, Instituto de Informática
Universidade Federal do Rio Grande do Sul (UFRGS)
Porto Alegre, Brazil
gerson.scartezzini@inf.ufrgs.br, reis@inf.ufrgs.br
Abstract: Optimizing circuits to minimize power dissipation is becoming more important every day. Power-reduction techniques at the architectural level are no longer sufficient to contain power consumption in modern designs, especially those that use nanoscale CMOS technology. The classic standard-cell methodology is widely used in digital design, but it is far from achieving the maximum power reduction needed for devices with a limited power supply. It is therefore essential to develop a new digital design methodology that further reduces power dissipation, mainly the leakage component. This methodology must be able to perform transistor-level synthesis, automatically generating the layout of any required logic function and optimizing the circuit as much as possible. With this in mind, this work demonstrates quantitatively that transistor networks yield better results, in terms of power and delay, than the traditional pre-designed cells available in commercial standard-cell libraries.
I.
INTRODUCTION
New CMOS fabrication technologies have made it possible to create ever smaller devices and to integrate a very large number of transistors in the same silicon area. As a result, each technology generation has enabled increasingly complex systems. This technological progress and the continuous increase in the component density of new systems-on-chip make power dissipation even harder to minimize [1] [2].
A large share of today's designs is developed with standard cells; that is, the circuit layout is built by assembling pre-designed cells from a cell library [3]. One of the main problems of this methodology, however, is that commercial libraries offer a rather limited number of functions. This reduces the room for circuit optimization and limits the possibility of minimizing power dissipation at the physical level. Power dissipation in CMOS circuits consists basically of three components [4]: (i) dynamic power; (ii) static power; and (iii) short-circuit power. For a long time, dynamic power was the only concern as a significant source of power dissipation in CMOS circuits, and many techniques were developed to reduce it. This scenario has changed, however, and static (leakage) dissipation has become extremely significant in submicron technologies.
According to the projections of the International Technology Roadmap for Semiconductors 2009, static power dissipation is a highly relevant factor in the total consumption of CMOS circuits and is increasing at a rate of 10% per technology generation.
Developing low-power systems without degrading their performance has become very important in the semiconductor field, especially for mobile systems, where power resources are usually limited.
Over the years, much research has proposed methods to reduce power dissipation at the architectural level [5]. These efforts are not sufficient, however, when the goal is to minimize static (leakage) dissipation. This work aims to show quantitatively that automatic generation tools and transistor networks reduce static dissipation more efficiently than the traditional use of standard cells.
This paper is organized as follows. Section II describes the traditional standard-cell methodology and its limitations. Section III presents the benefits of using transistor networks instead of standard cells. Section IV describes the results of the experiment, and Sections V and VI present the conclusions and future work, respectively.
II.
METHODOLOGY
Seeking better performance and area, designs such as the 486 began to use the standard-cell methodology in the 1990s. This approach used a set of predefined functions (such as NAND, NOR and the inverter, among others) to abstract the physical level of the design. The designer could thus concentrate on high-level aspects without worrying about the physical design of the components. This methodology became the traditional way of developing designs and is still widely used today.
Because designing a cell set involves significant costs, a typical cell library contains a limited number of different functions (usually around 150). This limitation prevents the designer from truly optimizing the circuit at the physical level, since more complex functions must be built by combining the functions available in the library.
To give the designer more freedom and allow further optimization, a new methodology was proposed in [6] [7] [8]. In these approaches, any logic function can be designed as needed during the physical design stage. This allows complete physical optimization of the design, that is, the number of transistors in the design can be minimized. According to [6], this approach changes the level of abstraction of physical design. The task is no longer a simple placement and routing of pre-designed cells; it becomes the placement and routing of transistor networks.
III.
USE OF TRANSISTOR NETWORKS
According to [9], static power dissipation can be approximated by Equation 1:
$$P_{leakage} = V_{DD} \cdot N \cdot K_{design} \cdot I_{leak} \qquad (1)$$
The terms in Equation 1 are:
- Pleakage, the static dissipation;
- VDD, the supply voltage;
- N, the number of transistors;
- Kdesign, a design-dependent parameter that depends, for example, on the fraction of transistors active at a given moment;
- Ileak, a technology-dependent parameter that depends on technology parameters such as Vth.
Equation 1 suggests three common strategies for minimizing static power dissipation:
A first approach is to reduce the supply voltage of the circuit. This method is already widely used in industry, especially in new technologies, because it reduces not only static but also dynamic power [10].
A second approach is to reduce Kdesign, that is, the fraction of transistors in the active state. Many proposals have pursued this approach, mainly at the architectural level; one example is the Power Gating technique [11]. The same technique is also advisable for reducing the last component of the equation, Ileak.
The last strategy, and the goal of this work, is to reduce the number of transistors (N). Static consumption is directly proportional to the number of components, so reducing that number is a very efficient way to minimize consumption.
Reducing the number of transistors, however, requires a change in the traditional design methodology. As described in Section II, the standard-cell methodology makes a true physical optimization of the design difficult. Such an optimization requires a methodology that allows any transistor network to be designed. It also requires a set of tools that make the design of these networks as automatic as possible.
Transistor networks (for example, a complex gate with several AOI levels) can effectively replace a set of basic standard cells.
Figure 1 illustrates a logic function described with standard cells (left) and the same function described as a transistor network (right). This example shows a significant reduction in the number of transistors. The transistor-network implementation has 10 transistors, while its equivalent built from classic cells has 18, a reduction of more than 44% in the number of transistors.
Figure 1. Different design options for the same logic function.
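Equation 1 makes the effect of N easy to quantify. The Python sketch below compares the two implementations of Figure 1 under the simplifying assumption that VDD, Kdesign and Ileak are identical for both; in practice, transistor stacking and layout also change Kdesign and Ileak. The values of those three parameters are hypothetical and cancel out in the ratio.

```python
def leakage_power(vdd, n_transistors, k_design, i_leak):
    """Equation 1: P_leakage = V_DD * N * K_design * I_leak."""
    return vdd * n_transistors * k_design * i_leak

# Hypothetical parameter values; only the ratio between the two designs matters here.
VDD, K_DESIGN, I_LEAK = 3.3, 0.5, 1e-12
p_cells = leakage_power(VDD, 18, K_DESIGN, I_LEAK)    # standard-cell version in Figure 1
p_network = leakage_power(VDD, 10, K_DESIGN, I_LEAK)  # transistor-network version in Figure 1
reduction = 1 - p_network / p_cells
print(f"estimated leakage reduction: {reduction:.1%}")
```

Under these assumptions, the leakage reduction equals the reduction in transistor count, (18 - 10)/18.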
IV.
DESENVOLVIMENTO E RESULTADOS
De forma a verificar que a utilizao de redes de

transistores em projetos de circuitos integrados um mtodo
relevante para reduzir a dissipao de potncia, foi realizado
um estudo de caso, comparando diferentes funes lgicas
descritas como redes de transistores, e seu anlogo utilizando
clulas padro.
O processo de comparao foi conduzido em quatro
etapas: (A) Projeto e Dimensionamento, (B) Leiaute &
Extrao, (C) Simulao & Caracterizao e (D) Anlise e
Comparao.
165
ISSN 977-2177-128009
Figura 3. Leiaute da rede de transistores F=(A*B)+(C*D*E)

gerada automaticamente com a ferramenta ASTRAN.
Com o leiaute implementado, utilizamos a ferramenta

Virtuoso, da empresa Cadence, para realizar a extrao
eltrica de cada leiaute desenvolvido. O objetivo desta
extrao identificar os resistores e capacitores parasitas deste
circuito. Como resultado obteve-se uma nova descrio de
cada funo, contendo tambm os parasitas dela.
Figura 2. Fluxo de desenvolvimento estabelecido para a

realizao deste estudo de caso.
A. Projeto e Dimensionamento
Para o desenvolvimento deste experimento, foi utilizado
um PDK (Process Design Kit) comercial de 0.35m da
empresa AMS (Austria Micro Systems). Mesmo no sendo
um nodo tecnolgico ideal para a anlise de potncia esttica,
sua utilizao j uma boa referncia para se comparar
circuitos desenvolvidos por diferentes metodologias.
The first step of this experiment was to develop a set of 14 AOI (AND-OR-Invert) functions and 14 OAI (OR-AND-Invert) functions, for a total of 28 logic functions. For each of these functions, a transistor network was described in SPICE.
From these descriptions, both the pull-up and the pull-down networks were sized with the same pair of widths (wn = 1 µm and wp = 1.6·wn). To keep the same delay across all structures, the Logical Effort method [12] was applied to each description.
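The sizing step above can be illustrated with a small sketch. It assumes the usual Logical Effort rule that each transistor in a series stack of depth d is widened d times, so that every pull-up and pull-down path matches the drive of the unit inverter (wn = 1 µm, wp = 1.6·wn), and it reports each input's logical effort g as its input capacitance relative to that inverter. The group encoding and the names size_complex_gate and InputSizing are hypothetical; the paper does not state exactly how the method was applied.

```python
from dataclasses import dataclass

WN = 1.0         # unit nMOS width in µm (value given in the text)
WP = 1.6 * WN    # unit pMOS width in µm (value given in the text)


@dataclass
class InputSizing:
    wn: float  # width of the nMOS gated by this input (µm)
    wp: float  # width of the pMOS gated by this input (µm)
    g: float   # logical effort relative to the unit inverter


def size_complex_gate(groups, kind="AOI"):
    """Size a single-stage AOI/OAI network described by its group sizes.

    AOI, e.g. F = not((A*B)+C) -> groups = [2, 1]: each AND group is a series
    nMOS stack of depth k, and the pull-up places the groups in series
    (depth = number of groups). OAI is the dual, with n and p roles swapped.
    Assumption: a transistor in a series stack of depth d is widened d times.
    """
    n_groups = len(groups)
    c_inv = WN + WP  # input capacitance of the unit inverter, in width units
    sizing = []
    for k in groups:
        if kind == "AOI":
            wn, wp = k * WN, n_groups * WP
        elif kind == "OAI":
            wn, wp = n_groups * WN, k * WP
        else:
            raise ValueError("kind must be 'AOI' or 'OAI'")
        for _ in range(k):  # every input of the group gets the same sizing
            sizing.append(InputSizing(wn, wp, (wn + wp) / c_inv))
    return sizing


for s in size_complex_gate([2, 1], "AOI"):   # F = not((A*B)+C)
    print(s)
```

For F=(A*B)+C this gives g = 2.0 for inputs A and B and g ≈ 1.62 for C. Because each input drives exactly one nMOS and one pMOS, such a single-stage network uses 2 × (number of inputs) transistors, which agrees with the # column for the transistor networks in Table II (6 for F=(A*B)+C, 18 for F=(A*B*C)+(D*E*F)+(G*H*I)).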
B. Layout & Extraction
After the functions were designed and sized, a layout was generated for each of them with the ASTRAN tool [7]. This tool was used to generate the layouts automatically, that is, to draw the rectangles and to place and route the specified transistor networks. Figure 3 shows the layout of the function F=(A*B)+(C*D*E) generated by ASTRAN.
C. Simulation & Characterization
Using the post-extraction description, each function was simulated with the Spectre simulator. Different input signals were applied in order to analyze parameters such as rise time, fall time, transition time, and power dissipation.
In total, 3038 vectors were generated to characterize the 28 functions, all of them at the typical corner, that is, at a temperature of 25 °C and a supply voltage of 3.3 V.
For each vector, the cells were simulated with different input slopes and output loads. The circuit was fully characterized for all the slope and load configurations listed in Table I.
TABLE I. INPUT SLOPES AND OUTPUT LOADS USED IN THE CHARACTERIZATION
To automate the characterization process and the generation of the simulation vectors, the commercial tool Encounter Library Characterizer (ELC) from Cadence was used. The result is a Liberty (.lib) file with the characterization of all the functions.
D. Analysis and Comparison
To compare the simulation results, the average of the values in each lookup table of the .lib files was computed. For each analysis, a single value of transition, delay, and power was thus obtained and used in the comparison.
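The averaging step and the percentage reductions reported in Table II can be sketched as follows. The sketch assumes each .lib lookup table has already been read into a 2-D array indexed by input slope and output load (parsing the Liberty file is omitted), and the helper names are hypothetical.

```python
import numpy as np


def table_average(lookup_table):
    """Collapse one 2-D lookup table (input slope x output load) to a single mean."""
    return float(np.mean(np.asarray(lookup_table, dtype=float)))


def reduction_pct(standard_cell_value, transistor_network_value):
    """Percentage reduction of the transistor-network value relative to standard cells."""
    return 100.0 * (standard_cell_value - transistor_network_value) / standard_cell_value


# Example with the F=(A*B)+C row of Table II.
print(round(reduction_pct(10, 6), 1))         # transistor count: 40.0
print(round(reduction_pct(0.102, 0.037), 1))  # leakage in nW: 63.7
```

Recomputing from the rounded table entries gives 63.7% for leakage instead of the tabulated 63.8%, a difference consistent with the table values being rounded to three decimals.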
The same development and simulation process was carried out for the circuits built with standard cells. This provides concrete data for comparing the two methodologies (transistor networks and standard cells). The results for all logic functions are presented in Table II.
TABLE II. COMPARISON BETWEEN FUNCTIONS IMPLEMENTED USING STANDARD CELLS AND TRANSISTOR NETWORKS.
SC = standard cells; TN = transistor networks; # = number of transistors; delay = average delay (ns); dyn. power = average dynamic power (µW); leakage in nW; Red. = reduction (%).
Function | SC # | SC delay (ns) | SC dyn. power (µW) | SC leakage (nW) | TN # | TN delay (ns) | TN dyn. power (µW) | TN leakage (nW) | Red. # (%) | Red. delay (%) | Red. dyn. power (%) | Red. leakage (%)
F=(A*B)+C | 10 | 0.158 | 0.146 | 0.102 | 6 | 0.126 | 0.135 | 0.037 | 40.0 | 20.4 | 7.8 | 63.8
F=(A*B)+C+D | 12 | 0.125 | 0.118 | 0.104 | 8 | 0.103 | 0.112 | 0.041 | 33.3 | 17.6 | 5.2 | 61.0
F=(A*B)+(C*D*E) | 18 | 0.108 | 0.097 | 0.177 | 10 | 0.080 | 0.085 | 0.040 | 44.4 | 26.4 | 12.2 | 77.6
F=(A*B)+(C*D) | 16 | 0.128 | 0.116 | 0.172 | 8 | 0.093 | 0.105 | 0.038 | 50.0 | 27.7 | 9.1 | 77.8
F=(A*B)+(C*D)+E | 20 | 0.108 | 0.099 | 0.167 | 10 | 0.087 | 0.096 | 0.039 | 50.0 | 20.0 | 3.6 | 76.5
F=(A*B)+(C*D)+(E*F*G) | 26 | 0.083 | 0.074 | 0.246 | 14 | 0.064 | 0.073 | 0.042 | 46.2 | 22.8 | 2.2 | 82.8
F=(A*B*C)+(D*E*F) | 20 | 0.094 | 0.084 | 0.182 | 12 | 0.066 | 0.079 | 0.042 | 40.0 | 29.2 | 6.1 | 76.9
F=(A*B*C)+(D*E*F)+(G*H) | 28 | 0.075 | 0.067 | 0.251 | 16 | 0.060 | 0.070 | 0.043 | 42.9 | 19.2 | -4.3 | 82.8
F=(A*B*C)+(D*E*F)+G | 22 | 0.085 | 0.077 | 0.176 | 14 | 0.065 | 0.076 | 0.040 | 36.4 | 24.0 | 1.6 | 77.1
F=(A*B*C)+D | 12 | 0.131 | 0.119 | 0.105 | 8 | 0.109 | 0.113 | 0.042 | 33.3 | 17.0 | 5.0 | 60.3
F=(A*B*C)+(D*E)+F | 20 | 0.095 | 0.087 | 0.171 | 12 | 0.074 | 0.085 | 0.040 | 40.0 | 22.8 | 1.6 | 76.8
F=(A*B*C)+D+E | 14 | 0.110 | 0.102 | 0.102 | 10 | 0.089 | 0.099 | 0.040 | 28.6 | 19.1 | 3.6 | 60.9
F=(A*B)+(C*D)+(E*F) | 26 | 0.094 | 0.085 | 0.242 | 12 | 0.073 | 0.081 | 0.039 | 53.8 | 22.1 | 4.3 | 83.8
F=(A*B*C)+(D*E*F)+(G*H*I) | 30 | 0.068 | 0.061 | 0.259 | 18 | 0.053 | 0.063 | 0.044 | 40.0 | 22.8 | -3.9 | 83.1
F=(A+B)*C | 10 | 0.148 | 0.140 | 0.100 | 6 | 0.124 | 0.136 | 0.036 | 40.0 | 16.5 | 2.8 | 63.6
F=(A+B)*C*D | 12 | 0.114 | 0.108 | 0.104 | 8 | 0.097 | 0.109 | 0.040 | 33.3 | 15.2 | -0.6 | 61.0
F=(A+B)*(C+D+E) | 18 | 0.105 | 0.096 | 0.178 | 10 | 0.080 | 0.093 | 0.040 | 44.4 | 23.6 | 3.2 | 77.4
F=(A+B)*(C+D) | 16 | 0.124 | 0.113 | 0.173 | 8 | 0.094 | 0.107 | 0.039 | 50.0 | 24.1 | 5.4 | 77.6
F=(A+B)*(C+D)*E | 20 | 0.101 | 0.092 | 0.168 | 10 | 0.081 | 0.093 | 0.041 | 50.0 | 19.6 | -0.8 | 75.6
F=(A+B)*(C+D)*(E+F+G) | 26 | 0.081 | 0.072 | 0.247 | 14 | 0.064 | 0.076 | 0.042 | 46.2 | 20.3 | -5.5 | 82.8
F=(A+B+C)*(D+E+F) | 20 | 0.091 | 0.083 | 0.182 | 12 | 0.072 | 0.083 | 0.042 | 40.0 | 20.6 | 0.2 | 77.0
F=(A+B+C)*(D+E+F)*(G+H) | 28 | 0.073 | 0.065 | 0.252 | 16 | 0.057 | 0.069 | 0.043 | 42.9 | 20.9 | -6.9 | 82.9
F=(A+B+C)*(D+E+F)*G | 22 | 0.079 | 0.072 | 0.175 | 14 | 0.063 | 0.076 | 0.042 | 36.4 | 20.0 | -6.2 | 76.3
F=(A+B+C)*D | 12 | 0.123 | 0.115 | 0.105 | 8 | 0.099 | 0.111 | 0.040 | 33.3 | 19.6 | 3.4 | 62.1
F=(A+B+C)*(D+E)*F | 20 | 0.089 | 0.081 | 0.171 | 12 | 0.072 | 0.084 | 0.041 | 40.0 | 19.9 | -4.2 | 75.8
F=(A+B+C)*D*E | 14 | 0.100 | 0.094 | 0.103 | 10 | 0.081 | 0.096 | 0.041 | 28.6 | 18.7 | -2.0 | 60.5
F=(A+B)*(C+D)*(E+F) | 26 | 0.090 | 0.080 | 0.242 | 12 | 0.069 | 0.080 | 0.042 | 53.8 | 23.3 | 0.2 | 82.7
F=(A+B+C)*(D+E+F)*(G+H+I) | 30 | 0.066 | 0.059 | 0.260 | 18 | 0.056 | 0.067 | 0.044 | 40.0 | 14.2 | -15.0 | 83.2
Average reduction (%) | | | | | | | | | 41.4 | 21.0 | 1.0 | 74.3
V. CONCLUSIONS
From the results in Table II, a significant reduction in the number of transistors can be observed when the functions are described as transistor networks rather than as their standard-cell counterparts.
This reduction led to a significant improvement in leakage power dissipation, with an average reduction of 74%. The average delay also improved, by 21% on average, while the average dynamic power remained practically unchanged (1.0% average reduction, with increases for some functions). However, the most relevant result is obtained by summarizing the values of Table II in graphical form.
Figure 4 shows that both the leakage curve and the delay curve follow the reduction in the number of transistors almost linearly. Comparing the curves, the larger the percentage reduction in the number of transistors, the larger the reduction in static power (leakage) and in average delay. These data justify the use of transistor networks in more complex circuits, where there is great potential for reducing the number of components and where low power dissipation is required.
Figure 4. Normalized reduction curves for the number of transistors, leakage, and average delay.
VI. FUTURE WORK
As future work, we intend to carry out a similar study with more complex circuits to verify whether the reduction curves remain linear. Another important scenario we intend to address is rerunning these experiments in more modern technologies, such as 90 nm or 65 nm.
REFERENCES
[1] Kim, N.S.; Austin, T.; Blaauw, D.; Mudge, T.; Flautner, K.; Hu, J.S.; Irwin, M.J.; Kandemir, M.; Narayanan, V., "Leakage Current: Moore's Law Meets Static Power," IEEE Computer, vol. 36, pp. 68-75, 2003.
[2] Jeong, T. T. and Ambler, P. A., "Design Trade-Offs and Power Reduction Techniques for High Performance Circuits and System," in ICCSA 2006, vol. 3984, pp. 531-536.
[3] Reis, R. et al., Concepção de Circuitos Integrados, 2nd ed., Série Livros Didáticos do Instituto de Informática, Bookman, Porto Alegre, 2009, 258 pp. ISBN 9788577803477.
[4] Henzler, S., "Introduction to Low-Power Digital Integrated Circuit Design," in Power Management of Digital Circuits in Deep Sub-Micron CMOS Technologies, Springer Series in Advanced Microelectronics, vol. 25, pp. 1-21, 2007. DOI: 10.1007/1-4020-5081-X_1.
[5] Borkar, S., "Design challenges of technology scaling," IEEE Micro, vol. 19, no. 4, pp. 23-29, Jul.-Aug. 1999.
[6] Reis, A.; Reis, R.; Auvergne, D.; Robert, M., "Library Free Technology Mapping," in IFIP TC10 WG10.5 International Conference on Very Large Scale Integration, Gramado, Brazil, August 26-30, 1997, pp. 303-314. ISBN 0 412 82370 5.
[7] Ziesemer, A.; Lazzari, C.; Reis, R., "Transistor Level Automatic Layout Generator for non-Complementary CMOS Cells," in IFIP/CEDA VLSI-SoC 2007, International Conference on Very Large Scale Integration, Atlanta, USA, October 15-17, 2007, pp. 116-121. ISBN 978-1-4244-1710-0.
[8] Reis, R., "Physical Design Automation at Transistor Level," NORCHIP 2008, pp. 241-245, 16-17 Nov. 2008. doi: 10.1109/NORCHP.2008.4738270.
[9] Butts, J. A. and Sohi, G. S., "A static power model for architects," in Proc. of the 33rd Annual Intl. Symp. on Microarchitecture, 2000.
[10] Gonzalez, R.; Gordon, B.M.; Horowitz, M.A., "Supply and threshold voltage scaling for low power CMOS," IEEE Journal of Solid-State Circuits, vol. 32, no. 8, pp. 1210-1216, Aug. 1997. doi: 10.1109/4.604077.
[11] De-Shiuan Chiou; Shih-Hsin Chen; Shih-Chieh Chang; Chingwei Yeh, "Timing driven power gating," in 43rd ACM/IEEE Design Automation Conference, 2006, pp. 121-124. doi: 10.1109/DAC.2006.229189.
[12] Sutherland, I.; Sproull, B.; Harris, D., Logical Effort: Designing Fast CMOS Circuits, San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1999.
FAST ALGORITHM FOR CONTENT-BASED AUDIO SEARCH
Adriana Sanabria 1, Jaime Vitola 1, Cesar Pedraza 2, Johanna Sepulveda 3
1 Universidad Santo Tomas, Facultad de Ingeniería Electrónica, Bogotá, Colombia.
2 Universidad Santo Tomas, Facultad de Ingeniería de Telecomunicaciones, Bogotá, Colombia.
3 University of São Paulo, Microelectronics Laboratory LME, São Paulo, Brazil.
adriana.sanabria@ieee.org, jaimevitola, cesarpedraza@usantotomas.edu.co, jsepulveda@lme.usp.br
Abstract: Content-based audio search techniques, known as MIR (music information retrieval), face two main challenges: the robustness of the algorithm and its speed. This paper proposes a fast algorithm model for extracting audio information with the fingerprinting technique, using the LPC (Linear Prediction Coding) and Cepstrum coding algorithms and dynamic time warping (DTW). Tests showed a hit rate close to 100% and response times up to about 8 times shorter than those obtained with other techniques, allowing more searches to be performed in real time.
I. INTRODUCTION
There is great interest in recognizing audio tracks or pieces for various purposes, for example monitoring radio stations, verifying the legality of content published on the Internet, or identifying songs. To achieve this, different techniques have been proposed and implemented that make it possible to identify one audio stream among others.
There are two mechanisms for this recognition. In the first, a digital signature containing information such as the title and the author's name is embedded in the audio stream without affecting how the human ear perceives it; the signature is transmitted with the audio and extracted at the receiver for identification. In the second mechanism, the digital signature is extracted from the time- and frequency-domain characteristics of audio that has not been marked beforehand [1].
The general scheme of a fingerprint extraction system first involves acquiring and conditioning the audio signal, followed by processing that generates a bit string or word corresponding to the digital signature. This signature is compared with other signatures stored in a database, and finally a positive or negative identification result is delivered.
The process used to extract the digital signature may be complex to implement, but it should result in an algorithm that is, as far as possible, robust, accurate, reliable, scalable, extensible, versatile, and granular (with respect to the duration of the audio fragments to be analyzed), while taking the least time and having a low computational load [3].
This paper presents an algorithm for content-based audio search based on the cepstral components of the audio, a linear prediction coder, and the DTW (Dynamic Time Warping) comparison technique. The specific application of this algorithm is the recognition of commercial advertisements broadcast by a radio station. Section II presents the problem of content-based audio search. Section III then proposes a fast algorithm model for the extraction of digital signatures and for the comparison and real-time search of commercial advertisements. Section IV presents the results of the experiments carried out to verify the operation of the algorithm, and finally the conclusions are given.
II. BACKGROUND AND PROBLEM
The process of identifying an audio signal is divided into two parts: extracting the digital signature and comparing the obtained mark with the signatures stored in a database to determine the degree of similarity.
For the first stage, audio fingerprinting techniques such as AudioDNA have been developed [1], which quickly extract characteristics of the musical piece (AudioGenes). In this technique, fingerprint extraction starts with a front end that obtains characteristic values of blocks (10 ms frames) of the audio samples. Each frame is then multiplied by a Hamming window to reduce the undesirable effects of the rectangular window. The fast Fourier transform (FFT) is applied to the resulting signal and the MFCCs (Mel-Frequency Cepstral Coefficients) are computed, with the discrete cosine transform applied as the final step. The AudioGenes are modeled with HMMs (Hidden Markov Models).
For the digital signature extraction procedure, Sert et al. [2] propose, instead of performing the process for every frame, obtaining a signature for each advertisement, whose patterns will be very different from those of the others. This method takes advantage of the fact that audio signals generally contain fragments that repeat over time, such as choruses. The extraction is performed with the technique known as ASF (Audio Spectrum Flatness), which is also applied by blocks whose number is determined by a factor called decim that corresponds to a power of two [2]. Another algorithm that has been used among fingerprinting techniques is PCA (Principal Component Analysis) [3].
Fingerprint identification, in turn, consists of the database and the search engine, which compares signatures until it selects the corresponding one. The comparison is commonly performed with the cross-correlation algorithm. Cano et al. divide the main database of 100,000 titles into 10 sub-databases, run a corresponding number of independent parallel processes, and finally select the best answer among those obtained from all the processes [1]. These techniques are increasingly robust in order to cope with distortions introduced by broadcasting, given that stations use compressors to save memory and pitching (playing audio faster) to save time.
It is very common to use the human psychoacoustic model, which describes the ear's sensitivity to different audio parameters (frequency, time, tones), to reduce the amount of information to be processed [4].
III. IMPLEMENTED ALGORITHM
The proposed algorithm of Fig. 1 consists of two parts. The first is the extraction of the digital signature that uniquely characterizes the commercial under study, together with the real-time extraction of the features of the incoming audio; the second is the comparison of the signatures in search of a match. Finally, the algorithm decides whether or not the input is the commercial being sought, depending on the degree of similarity of the signatures.
Fig. 1: General scheme of the algorithm. (Recovered block labels: signature extraction; audio file; audio signal; normalization; windowing; LPC; cepstrum; audio signature; derivative; DTW; match? yes.)
A. Extraction of the audio features.
In fingerprint extraction, the audio signal is digitized and stored. It is then divided into small windows or frames of 1024 samples (64 ms) so that the analysis is carried out on smaller audio components. Two coding algorithms are applied to each window: LPC (Linear Prediction Coding) and Cepstrum. For each 1024-sample window processed with LPC, the first 21 coefficients are kept, since they contain enough information to characterize it. These coefficients are the input to the real cepstrum algorithm, which again outputs 21 values per window. This set of data forms the audio signature of the stored commercial. The features of the captured audio are extracted in the same way.
LPC is a powerful tool for estimating parameters of audio signals, such as tones, components, and the spectrum, quickly and accurately [5]. The method approximates an audio sample as a linear combination of the previous samples, as described by Equation 1, where p is the order of the predictor, usually a value between 12 and 18 [6], and the values a_k are known as the prediction coefficients. The term e[t] is the error between the approximation produced by the predictor and the actual value of the sample. Thus, an audio frame can be modeled as a time-varying system driven by an input of nearly periodic pulses.
$$x[t] = -a_1 x[t-1] - a_2 x[t-2] - \cdots - a_p x[t-p] + e[t] \qquad (1)$$
Several methods have been proposed for finding the linear prediction coefficients, such as:
1) Covariance
2) Autocorrelation
3) Lattice
4) Inverse formulation
5) Spectral estimation
6) Maximum likelihood
7) Inner product.
For this algorithm, the autocorrelation method was implemented, using the recursion devised by N. Levinson in 1947 and modified by J. Durbin in 1959.
The cepstrum is a transformation of the audio signal with two main properties: it separates the components of the signal and makes them combine linearly [7]. This technique was proposed by Bogert, Healy, and Tukey (1963) and Noll (1967). The real cepstrum of a signal x(n) is computed as shown in Equation 2.
$$C(n) = \mathcal{F}^{-1}\{\log|\mathcal{F}(x(n))|\} \qquad (2)$$
B. Comparison of the digital signatures.
Dynamic time warping (DTW) is a measure of similarity between time series based on an analysis of their shape. The measure is obtained with dynamic programming, which considers different paths between two time series and finds the one with the smallest distance. It is also defined as a technique for finding an optimal alignment between two time-dependent series [8]. The technique has been used in speech recognition, data mining, and information retrieval [9]. Despite the similarities between the Euclidean distance and DTW, the latter is much more robust: the former is the sum of the point-to-point distances between two time series, whereas dynamic time warping can compress or expand the time axis to find the minimum distance [10], as illustrated in Fig. 2. The figure shows that time is not divided uniformly; instead, the time interval can be stretched or shrunk to obtain the shortest distance between the two signals. Another advantage of this technique is that it can compare two time series even when they have different lengths, by computing accumulated distances: for two similar audio frames the accumulated distance is minimal, and it grows as the audio segments become more different.
Fig. 2: Compression or expansion of time in dynamic time warping (DTW). (Axes: time series 1 and time series 2.)
The DTW between two time series A and B is computed by building a matrix M of size n × m, where n is the length of A and m is the length of B. Each entry M_ij is the squared difference between A_i and B_j plus the minimum of the entries that precede it, namely M_(i-1)j, M_i(j-1), and M_(i-1)(j-1), as expressed by Equations 3 and 4. In this way, the last entry of the matrix holds the accumulated distance along the minimum-distance path from the origin to that entry, as illustrated in Fig. 3.
$$M_{ij} = (A_i - B_j)^2 + g \qquad (3)$$
$$g = \begin{cases} 0 & \text{if } i = j = 0 \\ M_{i(j-1)} & \text{if } i = 0,\ j > 0 \\ M_{(i-1)j} & \text{if } j = 0,\ i > 0 \\ \min\{M_{(i-1)j},\ M_{i(j-1)},\ M_{(i-1)(j-1)}\} & \text{if } i, j > 0 \end{cases} \qquad (4)$$
Fig. 3: Search for the shortest path. (Matrix axes: time series A and time series B; the path runs from cell (1,1) to cell (n,m).)
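The two stages described in this section, signature extraction (Equations 1 and 2) and DTW comparison (Equations 3 and 4), can be sketched end to end as shown below. This is a minimal illustration, not the authors' implementation, and several details are assumptions: an LPC order of 20 so that each window yields 21 coefficients including a_0 = 1, peak normalization of the input, a neutral predictor for silent windows, and the squared Euclidean distance between 21-value rows as the per-cell cost of Equation 3. The derivative and second normalization blocks of Fig. 1 and the match threshold are omitted, and all function names are hypothetical.

```python
import numpy as np

FRAME_LEN = 1024   # samples per window (64 ms in the text)
LPC_ORDER = 20     # assumption: order 20 -> 21 coefficients including a_0 = 1


def autocorrelation(frame, order):
    """Biased autocorrelation r[0..order] of one frame."""
    n = len(frame)
    return np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])


def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations for a[0..order] (a[0] = 1),
    using the sign convention of Eq. (1): x[t] = -a_1 x[t-1] - ... + e[t]."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err                             # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]    # RHS is built before assignment
        a[i] = k
        err *= 1.0 - k * k                         # prediction error power
    return a


def real_cepstrum(x, eps=1e-12):
    """Eq. (2): inverse FFT of the log-magnitude spectrum."""
    return np.real(np.fft.ifft(np.log(np.abs(np.fft.fft(x)) + eps)))


def signature(signal):
    """One row of 21 cepstral values per non-overlapping 1024-sample window."""
    x = np.asarray(signal, dtype=float)
    peak = np.max(np.abs(x)) if x.size else 0.0
    if peak > 0:
        x = x / peak                               # amplitude normalization
    rows = []
    for start in range(0, len(x) - FRAME_LEN + 1, FRAME_LEN):
        r = autocorrelation(x[start : start + FRAME_LEN], LPC_ORDER)
        if r[0] <= 0:                              # silent window: trivial predictor
            a = np.zeros(LPC_ORDER + 1)
            a[0] = 1.0
        else:
            a = levinson_durbin(r, LPC_ORDER)
        rows.append(real_cepstrum(a))              # cepstrum of the 21 LPC values
    return np.array(rows).reshape(-1, LPC_ORDER + 1)


def _as_rows(x):
    x = np.asarray(x, dtype=float)
    return x[:, None] if x.ndim == 1 else x        # scalar series -> one value per row


def dtw_distance(A, B):
    """Eqs. (3)-(4) with an infinite border row and column; for vector rows the
    squared Euclidean distance replaces (A_i - B_j)^2."""
    A, B = _as_rows(A), _as_rows(B)
    n, m = len(A), len(B)
    M = np.full((n + 1, m + 1), np.inf)
    M[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.sum((A[i - 1] - B[j - 1]) ** 2)
            M[i, j] = cost + min(M[i - 1, j], M[i, j - 1], M[i - 1, j - 1])
    return M[n, m]
```

The infinite border in dtw_distance reproduces the boundary cases of Equation 4, since the first row and column can only accumulate along one direction. In a search over a long stream, the comparison would be repeated for successive windows of the stream's signature; how the authors slide and threshold that window is not specified here.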
IV. RESULTS.
A. Experiments.
Two types of experiments were carried out to validate the algorithm. The first consisted of measuring the response times for searching audio tracks of 5, 10, 20, and 30 seconds in a 3600-second audio stream. The track lengths were chosen according to the typical durations of radio commercials. The second experiment consisted of verifying the effectiveness of the algorithm in terms of its reliability in distinguishing different advertisements, or possible combinations of audio, from one another. For this purpose, a test bench was built in which tracks were generated at random and, at intervals, the tracks to be searched for were also inserted at random.
B. Response times.
To obtain the response times, the algorithm was run on a computer with a Core i7 processor and 4 GB of memory. Table I shows the times obtained with the proposed algorithm and with the correlation technique [11]. The response times are notably lower, with speedups of up to 8 for the 30-second commercial. It follows that searching for longer audio tracks is more efficient.
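The Speedup column of Table I is the ratio between the response time of the correlation-based (XCORR) search and that of the proposed algorithm. A quick check with the tabulated values, assuming both columns are in the same time unit:

```python
# Response times from Table I: track length [s] -> (proposed, XCORR [11]).
table_i = {
    5: (37.25, 193.72),
    10: (58.28, 383.42),
    20: (101.62, 763.55),
    30: (141.13, 1140.59),
}


def speedup(t_proposed, t_xcorr):
    """Speedup of the proposed algorithm over the XCORR baseline."""
    return t_xcorr / t_proposed


for track_s, (t_alg, t_xcorr) in table_i.items():
    print(f"{track_s:>2} s track: speedup {speedup(t_alg, t_xcorr):.2f}")
```

The computed ratios (5.20, 6.58, 7.51, 8.08) reproduce the Speedup column to within 0.01.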
Tracks 30, 20, and 10 seconds long were inserted at random positions in the 3600-second audio stream. Figures 4, 5, and 6 show the probability of finding each track, as computed by the algorithm under test. The probability is close to one at the points where the tracks were found, which confirms the effectiveness of the algorithm for this case.
Table I: Response times for a search in a 3600-second audio stream.
Track [s] | Response time, proposed algorithm | Response time, XCORR algorithm [11] | Speedup
5 | 37.25 | 193.72 | 5.2
10 | 58.28 | 383.42 | 6.57
20 | 101.62 | 763.55 | 7.51
30 | 141.13 | 1140.59 | 8.08
C. Tests with random tracks.
Tests were run by playing tracks in random order and, at intervals, adding the track to be detected. The probability was found to vary with the position of the searched track, but in every case it was high enough for the track to be detected.
Figure 7 shows the maximum probability values obtained when a 30-second track was inserted into a 120-second one at several positions, over different repetitions of the test. The x axis of the plot shows the point at which the 30-second track was inserted.
Fig. 4: Probability computed for a 30-second track. (Panel "Probability for a 30-second commercial": probability vs. time [s].)
Figs. 5 and 6: Probability computed for the 20-second and 10-second tracks (probability vs. time [s]).
Fig. 7: Maximum probability for multiple tests with a 30-second track at different positions. (Panel "Probability with random tracks": probability vs. time at which the track is inserted [s].)
V. CONCLUSIONS AND FUTURE WORK.
A fast algorithm for content-based audio search was designed and tested. The search was accelerated by using the LPC and cepstrum techniques for signature extraction, which considerably reduce the amount of data to be processed compared with other techniques. Dynamic time warping (DTW) was implemented for the comparison; it proved effective at determining the degree of similarity between audio frames and more efficient than correlation, which may be decisive when aiming to increase the number of commercials the system can detect at any given time.
Its operation was verified for searching tracks of 10, 20, and 30 seconds, with a speedup of up to 8 compared with the correlation-based technique. The test with random tracks demonstrated the reliability of the algorithm in distinguishing the searched track from other data, as well as in locating its position in time. The computed probability was found to vary with the position in time, but in all cases the values obtained were high enough to distinguish the searched track from the others. However, the accuracy of the algorithm tends to improve for commercials longer than 20 seconds and to decrease drastically for durations below 10 seconds.
REFERENCES
[1] P. Cano, E. Batlle, T. Kalker, and J. Haitsma, "A Review of Algorithms for Audio Fingerprinting," in IEEE International Workshop on MMSP, pp. 169-173, 2002.
[2] M. Sert, B. Baykal, and A. Yazıcı, "A Robust and Time-Efficient Fingerprinting Model for Musical Audio," Audio, 2006.
[3] L. Shen, Y. Guan, Y. Wu, and Y. Zhao, "Fast audio fingerprint search strategy for song identification," in International Conference on Networking and Digital Society, pp. 259-262, 2009.
[4] G. Clarence, "Robust computer voice recognition using improved ," New Trends in Information and Service , International Conference, pp. 835-840, 2009.
[5] C. Di Brina, R. Niels, A. Overvelde, G. Levi, and W. Hulstijn, "Dynamic time warping: a new method in the study of poor handwriting," Human Movement Science, vol. 27, no. 2, pp. 242-255, 2008.
[6] J. Coleman, Introducing Speech and Language Processing. Press Syndicate of the University of Cambridge, 2005.
[7] J. Deller, J. Hansen, and J. Proakis, Discrete-Time Processing of Speech Signals. Wiley-Interscience, 2000.
[8] M. Müller, Information Retrieval for Music and Motion. Springer, 2007.
[9] V. Niennattrakul and C. A. Ratanamahatana, "On Clustering Multimedia Time Series Data Using K-Means and Dynamic Time Warping," in 2007 International Conference on Multimedia and Ubiquitous Engineering (MUE'07), pp. 733-738, 2007.
[10] E. K. and M. Pazzani, "Derivative Dynamic Time Warping," Science, pp. 1-11, 2000.
[11] J. Martinez, J. Vitola, A. Sanabria, and C. Pedraza, "Fast parallel audio fingerprinting implementation in reconfigurable hardware and GPUs," in 2011 VII Southern Conference on Programmable Logic (SPL), pp. 245-250, April 2011.
Deembedding On-Wafer S-parameters: Is a One-Step Procedure Enough?
Svetlana C. Sejas-García and Reydezel Torres-Torres
Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Department of Electronics
Tonantzintla, Puebla, Mexico
carsof@inaoep.mx, reydezel@inaoep.mx
Abstract: This paper presents a systematic analysis of the differences between the popular one- and two-step deembedding algorithms used to correct on-wafer measurements. The analysis identifies the potential errors introduced by the one-step deembedding when measuring low- and high-impedance devices at microwave frequencies. Using the information provided in this paper, microwave device designers can determine whether the simplest deembedding, which uses only one dummy structure, is sufficient for the type of device and the frequencies of interest.
Fig. 1. Sketch depicting the calibration and deembedding planes.
I. INTRODUCTION
Modeling and characterizing the small-signal behavior of
semiconductor devices on-wafer require reliable experimental
data, which are collected using test equipment that introduces
systematic errors [1]. These errors can be removed using
mathematical algorithms that move the measurement reference
plane closer to the device-under-test (DUT) [2]. Ideally, this
reference plane must see only the DUT so that the
experimental data contain only information about the desired
device.
When performing on-wafer measurements, the correction algorithms fall into two categories: calibration and deembedding procedures. In essence, both types of procedure serve the same purpose. However, the term calibration refers to the procedure that removes the errors introduced by the parasitics associated with the cables, connectors, probes, and other accessories needed to apply signals to, and sense signals from, the DUT [3], [4]. A deembedding procedure, on the other hand, is one used to remove the effects associated with the on-wafer pads and the interconnects that link the DUT to the probing pads.
To clarify the difference between these two procedures, Fig. 1 shows the measurement reference planes after calibration and after deembedding of measured two-port network parameters.
Whereas the calibration procedure is typically performed using data measured off-wafer on an impedance standard substrate (ISS) provided by the probe manufacturer [3], deembedding requires measuring additional on-wafer dummy structures [5], [6]. Unfortunately, these dummy structures take up precious die area, which increases the corresponding cost, and in many cases this area is not available. For this reason, it is desirable to use the minimum number of such structures when correcting the measurements for the pad parasitics. In fact, this is the main reason why the one-step deembedding is still very popular for removing the effect of these parasitics [7], since only a single open dummy is required. However, this deembedding does not properly account for the series parasitics associated with the pads and other on-wafer interconnects outside the DUT, which introduces errors in the collected data. For this reason, several deembedding procedures have been proposed in recent years [5], [8], [9].
Despite all the advances in the development of deembedding procedures, no analysis is available that determines the conditions under which a single dummy structure (together with a one-step deembedding) is enough to correct the pad parasitics in on-wafer measurements. Such an analysis is important for designers who are limited in die area and for those measuring at relatively low frequencies. Thus, this paper studies this topic using several test and dummy structures measured up to 30 GHz. From the results obtained, designers can make informed decisions about including the structures required for either one- or two-step (i.e., including a short dummy) deembedding procedures.
II. PROTOTYPES AND EXPERIMENTS
To develop and verify the proposal, test structures were fabricated in an RFCMOS process. These structures are named open (open circuit at both ports), short (short circuit at both ports), load (a broadband 50-Ω load interconnecting port 1 and port 2), and thru (a line of negligible length interconnecting port 1 and port 2). The pads in these structures were formed with aluminum on field oxide grown on a p-type silicon substrate with 20-Ω·cm resistivity. Fig. 2 shows the corresponding micrographs. Note that these pads are shielded from the substrate by a metal layer formed at a lower metal level (i.e., a ground shield). This is a common practice in modern test chips, since it reduces the negative impact of substrate losses on the measurements. In addition, the pads are designed for measuring S-parameters with ground-signal-ground (GSG) probes with a 150-µm pitch. For this purpose, a vector network analyzer setup was calibrated up to the probe tips by applying an off-wafer line-reflect-match (LRM) algorithm and an impedance standard substrate (ISS). These measurements are used throughout this paper to analyze the differences between the one-step deembedding (1SD), which uses only an open structure, and the two-step deembedding (2SD), which uses an open and a short.
Fig. 2. Micrographs of the fabricated test structures.
Fig. 3. Model representing the DUT embedded in the parasitic effects of the pad structures.
This project was sponsored by CONACyT Mexico under grant #128818 and scholarship #213385.
Fig. 5. Model that represents the short structure.
Fig. 3 shows the model of a DUT embedded in a test fixture consisting of pads and other interconnects. In this case, the parasitic effects associated with the test fixture are represented by generic admittance and impedance blocks. The admittance blocks account for the shunt parasitics associated with the capacitors formed between the pads and the ground shield, whereas the impedance blocks account for the series parasitics associated with the finite resistance of the pads and other interconnects. As is well known, all these parasitics must be removed from the measurements so that the experimental data correspond to the DUT alone. To eliminate these effects, a deembedding technique is performed as explained hereafter.
Fig. 4. Model that represents the open structure.
III. REVIEWING THE 1SD AND 2SD PROCEDURES
The simplest deembedding procedure is 1SD. In this case, an open dummy structure is used, in which each of the two pad-to-DUT interfaces is terminated with an open circuit; the corresponding equivalent circuit is shown in Fig. 4.
When comparing the models in Figs. 3 and 4 and neglecting the series impedances, the Y-parameters associated with the DUT can be obtained by applying the following matrix operation:
$$ Y_{DUT} = Y_{RAW} - Y_{OP} \qquad (1) $$
where the subscripts RAW and OP distinguish the parameters associated with the raw measurements (i.e., those including the test fixture effects) and those corresponding to the open structure, respectively.
Since the series parasitics are neglected in the 1SD, some errors remain. For this reason, the 2SD was developed. In this case, the deembedding process requires measuring both the open and the short structures. Thus, considering that the short structure can be represented by the model shown in Fig. 5, the Y-parameters associated with the DUT are obtained by means of:
$$ Y_{DUT} = \left[ \left( Y_{RAW} - Y_{OP} \right)^{-1} - Z_{SH} \right]^{-1} \qquad (2) $$
Fig. 6. Model that represents the thru structure.
Fig. 7. Model for obtaining S11 for the thru (reference impedance Z_ref = 50 Ω).
Fig. 8. S11 for the thru plotted in a Smith Chart up to 30 GHz (measurement in black, one-step deembedding in blue, two-step deembedding in red).
where the matrix Z_SH represents the experimental Z-parameters associated with the short structure.
From (1) and (2), it is clear that the matrix operations used in both deembedding procedures are simple and can easily be implemented in any data-processing software. Thus, the reason why the 1SD may be preferred by microwave and device engineers is that it uses only one dummy structure. However, this comes with a penalty in accuracy, which is discussed next.
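Because (1) and (2) reduce to a few 2×2 matrix operations per frequency point, they are easy to script. The following is a minimal NumPy sketch, not the authors' code. It assumes that Z_SH is obtained from the short dummy after subtracting the open admittances, Z_SH = (Y_SH - Y_OP)^-1, which is the usual open-short convention. The helper `embed` and all element values are hypothetical; the synthetic fixture keeps only one shunt admittance and one series impedance per port (no port-to-port coupling elements).

```python
import numpy as np


def deembed_1sd(y_raw, y_open):
    """One-step (open-only) deembedding, Eq. (1): Y_DUT = Y_RAW - Y_OP."""
    return y_raw - y_open


def deembed_2sd(y_raw, y_open, y_short):
    """Two-step (open-short) deembedding in the form of Eq. (2).

    Assumption: Z_SH is the short-dummy Z-matrix with the open admittances
    removed, Z_SH = (Y_SH - Y_OP)^-1. Works on single 2x2 matrices or on
    stacks of shape (..., 2, 2), e.g. one matrix per frequency point.
    """
    z_inner = np.linalg.inv(y_raw - y_open)   # shunt (pad) parasitics removed
    z_sh = np.linalg.inv(y_short - y_open)    # series parasitics seen by the short
    return np.linalg.inv(z_inner - z_sh)      # back to Y-parameters of the DUT


def embed(y_dut, y_shunt, z_series):
    """Hypothetical fixture: series Z at each DUT port, then a shunt Y at each pad."""
    z = np.linalg.inv(y_dut) + np.diag(z_series)
    return np.linalg.inv(z) + np.diag(y_shunt)


if __name__ == "__main__":
    w = 2 * np.pi * 10e9                              # 10 GHz, illustrative
    y_shunt = [1j * w * 50e-15] * 2                   # hypothetical 50 fF pad capacitance
    z_series = [2 + 1j * w * 40e-12] * 2              # hypothetical 2 ohm + 40 pH
    y_dut = np.array([[0.021, -0.02], [-0.02, 0.021]], dtype=complex)  # ~50-ohm series element
    y_raw = embed(y_dut, y_shunt, z_series)
    y_open = np.diag(y_shunt)                                      # open dummy: pad shunts only
    y_short = np.diag(y_shunt) + np.diag(1 / np.array(z_series))   # short dummy
    print("1SD max error:", np.abs(deembed_1sd(y_raw, y_open) - y_dut).max())
    print("2SD max error:", np.abs(deembed_2sd(y_raw, y_open, y_short) - y_dut).max())
```

Under these assumptions the 2SD result recovers the synthetic DUT to numerical precision, while the 1SD result still contains the series parasitics.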
IV. COMPARISON USING THRU AND LOAD STRUCTURES
To compare the 1SD and 2SD procedures, the deembedded measurements corresponding to the thru and load structures were analyzed.
Ideally, after deembedding the effect of the pads from the measurements of the thru structure, the DUT corresponds to a short line that ideally presents zero series impedance. A more realistic model, however, is that shown in Fig. 6. Assuming that the parasitic capacitance introduced by this short line is very small, the equivalent circuit model for obtaining the reflection parameter S11 is illustrated in Fig. 7. In this case, an RF source with a reference impedance (here, 50 Ω) applies a signal to port-1, and port-2 is terminated with the same reference impedance. Since S11 is obtained as:
$$ S_{11} = \frac{Z_L - Z_{ref}}{Z_L + Z_{ref}} \qquad (3) $$
where Z_L is the sum of the low impedance associated with the thru plus the reference impedance at port-2, the result is expected to correspond approximately to a measurement in matched condition (i.e., a point at the center of the Smith Chart).
Fig. 9. S21 for the thru plotted in a Smith Chart up to 30 GHz (measurement in black, one-step deembedding in blue, two-step deembedding in red).
Fig. 8 shows that the raw S11 measurements differ considerably from the ideal behavior of a thru structure; these data reveal the considerable impact of the capacitive parasitics. Once the 1SD is applied, however, the corrected data present noticeable inductive effects associated with the series parasitics, which are still not removed. In contrast, the 2SD also takes these series parasitics into consideration, and the corresponding data approach the single point expected for an ideal thru. This is not perfectly achieved because the measured thru still presents a finite series impedance, as depicted in the model shown in Fig. 6. Similar reasoning can be applied when analyzing the transmission parameter S21 of the thru. In this case, the signal reaching port-2 is ideally expected to be neither attenuated (i.e., |S21| ≈ 1) nor delayed (i.e., phase(S21) ≈ 0) with respect to the signal applied at port-1. The data corresponding to this parameter are shown in Fig. 9. Again, substantially better results are obtained using the 2SD procedure.
For the load structure, after removing the effect of the test fixture, port-1 and port-2 are interconnected through a 50-Ω load. The corresponding equivalent circuit is shown in Fig. 10. Similarly to the case of the thru, assuming that the parasitic capacitive coupling of the load to ground is negligible, the equivalent circuit model for obtaining the reflection parameter S11 is that illustrated in Fig. 11. Now, the RF source with a reference impedance applies a signal to port-1, the signal travels through the load, and port-2 is terminated with the reference impedance.
Fig. 10. Model that represents the load structure.
Fig. 11. Model for obtaining S11 for the load (reference impedance Z_ref = 50 Ω).
Fig. 12. S11 for the load plotted in a Smith Chart up to 30 GHz (measurement in black, one-step deembedding in blue, two-step deembedding in red).
The representation of S11 for this load in a Smith Chart is also very simple. Notice in Fig. 11 that the impedance seen at point P1 by the RF source is 100 Ω, which can be mapped to the Smith Chart using (3) as S11 ≈ 1/3. As in the case of the thru, some inductive parasitics remain after performing the 1SD. However, in this case the difference is smaller because the impedance of the load is considerably larger than that of the thru. In other words, the error introduced in the deembedded data when applying the 1SD is lower for the load than for the thru. This is explained in more detail in the following section.
For completeness, Fig. 13 shows S21 for the load; the difference between 1SD and 2SD is less pronounced than that observed in Fig. 9 for the thru.
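The mapping of the ideal thru and load onto the Smith Chart follows directly from (3). The short Python sketch below evaluates it; the function names are illustrative, and the reactive residual on the thru is a hypothetical value, not a measured one.

```python
import math


def s11_from_load(z_load, z_ref=50.0):
    """Eq. (3): reflection coefficient of impedance z_load referred to z_ref."""
    return (z_load - z_ref) / (z_load + z_ref)


def to_db(value):
    """Magnitude in dB, 20*log10(|value|)."""
    return 20 * math.log10(abs(value))


# Ideal thru: zero series impedance, so the source sees only the 50-ohm port-2 termination.
s11_thru = s11_from_load(0.0 + 50.0)
# Load structure: 50-ohm DUT in series with the 50-ohm termination at port 2.
s11_load = s11_from_load(50.0 + 50.0)
# Hypothetical thru with a 5-ohm inductive residual left by the 1SD: S11 leaves the chart center.
s11_thru_residual = s11_from_load(50.0 + 5j)

print(s11_thru, s11_load, to_db(s11_load), abs(s11_thru_residual))
```

The load lands at S11 = 1/3 (about -9.5 dB), whereas the ideal thru sits at the center of the chart, so any residual series reactance moves the thru visibly away from its ideal point.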
Fig. 13. S21 for the load plotted in a Smith Chart up to 30 GHz (measurement in black, one-step deembedding in blue, two-step deembedding in red).
Figs. 14 and 15 show the magnitude of the reflection and transmission parameters for the thru and load structures after performing 1SD and 2SD. Notice that the error is less than 0.5 dB up to 30 GHz in all cases (except for |S11| of the thru, whose reflection is very low). However, the relative error is much larger for the thru, since its insertion loss and reflection magnitude are considerably smaller than those of the load. Thus, when measuring low-impedance devices such as inductors or transistors, erroneous data can be expected even at low frequencies if only a 1SD procedure is applied.
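A simple circuit argument shows why the same uncorrected series parasitic matters more for a low-impedance DUT. The sketch below treats the DUT as a single series impedance between two 50-Ω ports, where S21 = 2Z0/(2Z0 + Z), and adds a hypothetical residual (1 Ω plus 20 pH at 30 GHz) standing in for what the 1SD leaves behind. It ignores the shunt parasitics and is an illustration under these assumptions, not a model of the measured structures.

```python
import math


def s21_series(z_series, z_ref=50.0):
    """S21 of an impedance placed in series between two z_ref ports: 2*Z0 / (2*Z0 + Z)."""
    return 2 * z_ref / (2 * z_ref + z_series)


def residual_error_db(z_dut, z_residual, z_ref=50.0):
    """Error in |S21| (dB) when a series parasitic z_residual is left uncorrected."""
    ideal_db = 20 * math.log10(abs(s21_series(z_dut, z_ref)))
    biased_db = 20 * math.log10(abs(s21_series(z_dut + z_residual, z_ref)))
    return ideal_db - biased_db


# Hypothetical residual: 1 ohm plus 20 pH at 30 GHz (about 3.8 ohm of reactance).
z_res = 1 + 1j * 2 * math.pi * 30e9 * 20e-12

for name, z_dut in (("thru", 0.0), ("load", 50.0)):
    ideal = 20 * math.log10(abs(s21_series(z_dut)))
    print(name, "ideal |S21| (dB):", round(ideal, 2),
          "error (dB):", round(residual_error_db(z_dut, z_res), 3))
```

Both absolute errors stay well below 0.5 dB, but for the thru the error is measured against an ideal |S21| of 0 dB, whereas the load has an ideal |S21| of about -3.5 dB, which is why the relative impact is much larger for low-impedance DUTs.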
V. DISCUSSION
A final comparison is carried out in this section to point out the differences between the two studied methods when applied to DUTs with relatively low and high impedances. As mentioned before, the 1SD neglects the series parasitics associated with the test fixture. These impedances are on the order of a few ohms and can be neglected when measuring high-impedance DUTs, for instance, a MOS capacitor.
VI. CONCLUSIONS
An exhaustive analysis of the experimental data collected from several on-wafer structures was carried out to verify the applicability of the commonly used 1-step and 2-step deembedding procedures. The results show that using only an open structure to perform deembedding may introduce severe errors in the analysis of DUTs, which become more serious as the DUT impedance becomes comparable with the series parasitics introduced by the test fixture. In this case, using a 2-step deembedding is mandatory. However, it was also observed that the error in dB introduced by the 1SD procedure remains approximately constant when the impedance of the DUT varies; thus, its value relative to the actual data associated with the DUT becomes small when the corresponding impedance is large. In this case, the 1-step deembedding can be used even at frequencies as high as 30 GHz. Bear in mind that this is only valid for devices with high impedance, such as small capacitors or the input port of some amplifiers.
Fig. 14. Magnitude of the reflection and transmission parameters (|S11| and |S21| in dB versus f in GHz) for the thru structure up to 30 GHz: raw measurement, one-step deembedding, and two-step deembedding.
Fig. 15. Magnitude of the reflection and transmission parameters (|S11| and |S21| in dB versus f in GHz) for the load structure up to 30 GHz: raw measurement, one-step deembedding, and two-step deembedding.
ACKNOWLEDGMENT
The authors thank imec vzw for supplying the test structures.
REFERENCES
[1] V. Camarchia, V. Teppati, S. Corbellini, and M. Pirola, "Microwave Measurements Part I: Non-linear Measurements," IEEE Instrumentation & Measurement Magazine, vol. 10, no. 3, pp. 34-39, Jun. 2007.
[2] G. Engen and C. A. Hoer, "Thru-Reflect-Line: An Improved Technique for Calibrating the Dual Six-Port Automatic Network Analyzer," IEEE Trans. Microw. Theory Tech., vol. 27, no. 12, pp. 987-993, Dec. 1979.
[3] A. Davidson, K. Jones, and E. Strid, "LRM and LRRM Calibrations with Automatic Determination of Load Inductance," 36th ARFTG Conference Digest-Fall, vol. 18, pp. 57-63, Nov. 1990.
[4] J. Stenarson and K. Yhland, "A Reformulation of TRL and LRM for S-Parameters," Microwave Measurement Conference, pp. 1-4, Jun. 2009.
[5] L. Tiemeijer, R. Havens, A. Jansman, and Y. Bouttement, "Comparison of the 'Pad-Open-Short' and 'Open-Short-Load' Deembedding Techniques for Accurate On-Wafer RF Characterization of High-Quality Passives," IEEE Trans. Microw. Theory Tech., vol. 53, no. 2, pp. 723-729, Feb. 2005.
[6] R. Torres-Torres, R. Murphy-Arteaga, and J. Reynoso-Hernandez, "Analytical Model and Parameter Extraction to Account for the Pad Parasitics in RF-CMOS," IEEE Trans. Electron Dev., vol. 52, no. 7, pp. 1335-1342, Jul. 2005.
[7] Y. Kuo-Liang and G. Jyh-Chyurn, "A New Method for Layout-Dependent Parasitic Capacitance Analysis and Effective Mobility Extraction in Nanoscale Multifinger MOSFETs," IEEE Trans. Electron Dev., vol. 58, no. 9, pp. 2838-2846, Sept. 2011.
[8] C. Hanjin and D. Burk, "A Three-Step Method for the De-Embedding of High-Frequency S-Parameter Measurements," IEEE Trans. Electron Dev., vol. 38, no. 6, pp. 2838-2846, Jun. 1991.
[9] M. Drakaki, A. Hatzopoulos, and S. Siskos, "De-Embedding Method for On-Wafer RF CMOS Inductor Measurements," Microelectronics Journal, vol. 40, pp. 958-965, Feb. 2009.
Architecture for myoelectric feature extraction by H.O.S. of four sMES channels
Salvador Antonio Arroyo Díaz
Alejandro Díaz Sanchez
Apolo Z. Escudero Uribe
National Institute for Astrophysics, Optics and Electronics (INAOE). Autonomous University of Puebla (BUAP)
National Institute for Astrophysics, Optics and Electronics (INAOE). Puebla Institute of Technology.
EOTEC Labs, México
zeus.escuder@eotec.com, sarroyo@inaoep.mx, adiazsan@inaoep.mx
Abstract: In this paper, an FPGA-based parallel architecture for myoelectric signal feature extraction employing higher-order statistics (HOS) is presented. The three-channel myoelectric signal is obtained directly from the user's muscles in order to derive prosthesis movement commands that emulate a biological elbow. The proposed architecture was described in the Very High Speed Integrated Circuit (VHSIC) Hardware Description Language (VHDL) and functionally verified on a Xilinx board that uses a Spartan-3 XC3S700AN FG484-4 FPGA. Experimental results are presented and establish a maximum operating frequency of 44.570 MHz.
I. INTRODUCTION
Current research in prosthetics has focused on the development of prosthetic hands and prosthetic legs [1], [2]. Nowadays, most implementations of mechanical and myoelectric prosthetic elbows [3] are serial and have a single degree of freedom (DOF), such as the Utah Arm [4] and the Edinburgh Arm [5]. In contrast, a complete and functional prosthetic elbow must have three motorized axes in order to provide 3 DOFs [5], [6].
For such devices, control of the parallel mechanism is an unavoidable requirement. There are several ways to control a prosthesis, such as voice commands, switches with programmed routines, movements of any part of the user's body, and myoelectric signals [9]. Myoelectric signals acquired with surface electrodes (sMES) have recently been used for this purpose. Their particular time and frequency characteristics have been exploited to obtain refined movement commands and to provide functional movement similar to that of the biological elbow mechanism. Several methods have been proposed to classify myoelectric signals for prosthesis control [10]-[13]. Recent studies have shown that, depending on the level of maximum voluntary contraction (MVC), the probability density function of the sMES may become more Laplacian than Gaussian. Therefore, by assuming non-Gaussian distributions, this paper proposes the use of higher-order statistics as a feature extractor for the classification of sMES.
In [14], the use of higher-order statistics is proposed to extract the characteristics of this kind of signal. In order to increase the processing rate, the algorithms were coded in C and implemented on a special-purpose chip. However, to achieve higher processing rates, the algorithm can instead be described in a high-level hardware description language and implemented in a Field Programmable Gate Array (FPGA). This approach can be considered an attractive alternative for computationally intensive algorithms [15], [16].
In the next sections, we describe the sMES acquisition process and its use for feature extraction to control a prosthetic arm. The main characteristics of HOS, a methodology to implement the architecture that calculates the cross moments, and the FPGA implementation results are also discussed. Finally, some concluding remarks are presented.
II. SMES ACQUISITION PROCESS
The electromyographic signal observed at the skin surface (sMES) is the sum of many small potentials generated in the muscle fibers [17]. Because EMG signals are non-stationary and have highly complex time-frequency characteristics, they cannot be analyzed using classical methods such as the Fourier transform.
Although the short-time Fourier transform can be used to satisfy the stationarity condition for such non-stationary signals, its performance depends on choosing an appropriate length for the analyzed segment of the signal. To overcome that problem, higher-order statistics, a widely used technique in signal analysis [17], were used as the feature extraction method. The process to obtain the myoelectric signal is shown in Figure 1.
As shown in Figure 1, the classification problem can be divided into the stages of feature extraction, dimensionality reduction, and pattern classification. For sMES classification, several techniques have been used, such as time-domain features, autoregressive coefficients, cepstral coefficients, and wavelet transform coefficients. Unfortunately, these methods become computationally intensive when multiple features must be extracted, because they use high-complexity algorithms and have large memory requirements. On the other hand, the HOS feature extraction method is easy to implement and requires only simple arithmetic operations that can easily be realized in programmable devices such as FPGAs or DSPs.
Fig. 1: sMES Classification Block Diagram.
III. FEATURE EXTRACTION
Historically, the analysis of stochastic processes has been limited to second-order statistics of stationary signals. Since a Gaussian stochastic process is completely defined by its first- and second-order statistics, it can be analyzed using only its autocorrelation and power spectral density (PSD). Nevertheless, second-order statistics are insufficient for the analysis of certain types of signals, and it is necessary to extend the analysis to higher-order statistics. That is the case of biomedical signals, which are widely treated as non-Gaussian, non-stationary processes [17]-[20].
Although the statistical moments provide enough information about the probability density function (PDF) and their mathematical characteristics are useful in signal analysis, their evaluation is highly complex (Equations 2-5). The k-th-order moment of a process x(t) is given by:
$$ M_{k,x}(\tau_1, \tau_2, \ldots, \tau_{k-1}) = E\left\{ x(t) \prod_{i=1}^{k-1} x(t+\tau_i) \right\} \qquad (1) $$
It can be observed that the cumulants and the statistical moments are closely related. If x(t) is a zero-mean process, the k-th cumulant C_{k,x} can be calculated using the relation between the cumulants and the statistical moments of the process. Equations 2-5 show the relations used to evaluate the cumulants up to the fourth order [20], [21]:
$$ C_{1,x} = E\{x(t)\} \qquad (2) $$
$$ C_{2,x}(\tau_1) = E\{x(t)\,x(t+\tau_1)\} \qquad (3) $$
$$ C_{3,x}(\tau_1,\tau_2) = E\{x(t)\,x(t+\tau_1)\,x(t+\tau_2)\} \qquad (4) $$
$$ C_{4,x}(\tau_1,\tau_2,\tau_3) = E\{x(t)\,x(t+\tau_1)\,x(t+\tau_2)\,x(t+\tau_3)\} - C_{2,x}(\tau_1)\,C_{2,x}(\tau_2-\tau_3) - C_{2,x}(\tau_2)\,C_{2,x}(\tau_3-\tau_1) - C_{2,x}(\tau_3)\,C_{2,x}(\tau_1-\tau_2) \qquad (5) $$
where E{·} is the expectation operator, x(t) is the sMES sample, and k represents the order of the cumulant to be computed. If x(n) is a zero-mean process, the second- and third-order cumulants are equal to the corresponding cross moments, and the lags τ1, τ2, and τ3 are treated as constants when calculating the third- and fourth-order cumulants [20]. Therefore, the higher-order cross-moment function can be computed from the equation [22]:
$$ M_k(\tau_1, \tau_2, \ldots, \tau_k) = \frac{1}{N} \sum_{n=l_1}^{l_2} x_0(n)\, x_{l_1}(n+\tau_1) \cdots x_{l_2}(n+\tau_k) \qquad (6) $$
where N is the length of each data record, and l1 and l2 are the maximum and minimum lags of the cross-moment functions, respectively.
IV. ARCHITECTURAL DESIGN
Despite its complexity, the evaluation process can be simplified by converting Equation 6 into an iterative matrix multiplication to compute the third-order cross moment, as described in [7] and [21]. Let M_i be a matrix whose elements are samples of the third-order cross moments defined in Equation 6, where i = -q, -q+1, ..., q, and q is the maximum lag of the third-order cross-moment function.
In that way, all the third-order cross moments are evaluated by computing the entries of matrix M_i for different values of i. The entries of M_i are obtained by first multiplying the auxiliary matrices X and Y_i: the third-order sample matrix M_i is equal to the product of the matrices X, Y_i, and Z, where X is a (2q+1)×N matrix, Z is an N×(2q+1) rectangular matrix (Z = X(x2)^T), and Y_i is a diagonal square matrix with entries x0(0)x3(i), x0(1)x3(1+i), x0(2)x3(2+i), ..., x0(N-1)x3(N-1+i), where x0(0)x3(i) is the nonzero element in the first row, x0(1)x3(1+i) is the nonzero element in the second row, and so on. The complete formulation used to construct the matrices is shown in Equation 7, in which samples with indices outside 0, ..., N-1 are zero:
$$ M_i = X\,Y_i\,Z, \qquad X_{r,n} = x_1(n+r-q), \qquad Y_i = \mathrm{diag}\big( x_0(n)\,x_3(n+i) \big)_{n=0}^{N-1}, \qquad Z_{n,c} = x_2(n+c-q), \qquad r, c = 0, \ldots, 2q \qquad (7) $$
Fig. 2: Architecture Block Diagram.
The block diagram of the full computation process is shown in Figure 2. It uses two arrays: the first, MM1, performs the multiplication of the X matrix (a Hankel matrix) by the diagonal matrix Y_i, and the results are sent to array MM2. The second array, MM2, multiplies XY_i by Z.
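The equivalence between the direct sum and the matrix formulation of Equation 7 can be checked numerically. The NumPy sketch below is not the authors' VHDL; it assumes four channels x0, x1, x2, x3 indexed as in Equation 7, zero padding outside the data record, and it omits the 1/N factor of Equation 6 (divide by N to normalize). The helper names are hypothetical. The product X·Y_i is formed by scaling each column of X by the corresponding diagonal entry, which is the operation assigned to array MM1, while the final product plays the role of MM2.

```python
import numpy as np


def shifted(x, shift):
    """Return y with y[n] = x[n + shift], zero where n + shift falls outside the record."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    y = np.zeros(n)
    lo, hi = max(0, -shift), min(n, n - shift)
    if lo < hi:
        y[lo:hi] = x[lo + shift:hi + shift]
    return y


def cross_moment_matrix(x0, x1, x2, x3, q, i):
    """Unnormalized M_i = X @ Y_i @ Z following Eq. (7) (no 1/N factor)."""
    idx = range(2 * q + 1)
    X = np.vstack([shifted(x1, r - q) for r in idx])        # (2q+1) x N, Hankel structure
    Z = np.vstack([shifted(x2, c - q) for c in idx]).T      # N x (2q+1), i.e. X(x2)^T
    y_diag = np.asarray(x0, dtype=float) * shifted(x3, i)   # diagonal of Y_i: x0(n) x3(n+i)
    W = X * y_diag          # equals X @ diag(y_diag): column n of X scaled (role of MM1)
    return W @ Z            # (X Y_i) Z (role of MM2)


def cross_moment_direct(x0, x1, x2, x3, q, i):
    """Reference: explicit sum over n of x1(n+r-q) x0(n) x3(n+i) x2(n+c-q)."""
    N = len(x0)
    M = np.zeros((2 * q + 1, 2 * q + 1))
    for r in range(2 * q + 1):
        for c in range(2 * q + 1):
            for n in range(N):
                a, b, d = n + r - q, n + c - q, n + i
                if 0 <= a < N and 0 <= b < N and 0 <= d < N:
                    M[r, c] += x1[a] * x0[n] * x3[d] * x2[b]
    return M


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x0, x1, x2, x3 = rng.standard_normal((4, 64))   # synthetic zero-mean channels
    q = 3
    for i in range(-q, q + 1):
        assert np.allclose(cross_moment_matrix(x0, x1, x2, x3, q, i),
                           cross_moment_direct(x0, x1, x2, x3, q, i))
    M0 = cross_moment_matrix(x0, x1, x2, x3, q, 0) / len(x0)   # normalized as in Eq. (6)
    print(M0.shape)
```

The matrix form replaces the nested loops with two regular products, which is what makes the computation a natural fit for the systolic arrays MM1 and MM2.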
For matrix multiplication purposes, a systolic-array-based architecture was used, as shown in Figure 3. Systolic arrays are often used to realize computationally intensive algorithms with inherent parallelism, and their major features are: (1) simple and regular design; (2) concurrency; and (3) nearest-neighbor communication.
Figure 3 also shows the systolic architecture for array MM1, and the equivalent one for array MM2, for N1 = N2 = 4. MM1 and MM2 consist of sixteen identical processing elements, PE1 and PE2, respectively. Each processing element contains a multiply-accumulate (MAC) unit, and each MAC unit consists of a multiplier, an adder, and a storage register.
Each PE in the MM1 array multiplies a diagonal element of Y_i [Y11, Y22, Y33, ...] by one element of matrix X during each clock period. The first column of PE1s is responsible for producing the first column of the product XY_i, referred to as W in Figure 3, while the second column generates the second column, and so on. The entries are stored in an output buffer to be used later by the next array, MM2. In the same way, array MM2, shown in Figure 4, computes the whole product of (XY_i) with Z.
Figure 3: Full Matrix Multiplier Architecture.
Fig. 4: Matrix multiplier architecture implementing a single PE of array MM1 or MM2.
V. IMPLEMENTATION RESULTS
Because of their low cost and inherent reconfigurability, Spartan-3 FPGAs are well suited for signal processing applications such as the computation of third-order cross moments. The algorithm was coded in VHDL and realized in an FPGA using Xilinx ISE 9.2i for synthesis and place-and-route. The Xilinx ISE simulator was used to verify the design in simulation before it was implemented on the Xilinx Spartan-3 FPGA.
The Spartan-3 FPGA architecture consists of five fundamental programmable functional elements, including configurable logic blocks (CLBs) with RAM-based look-up tables (LUTs) to implement logic and storage elements. It also includes internal block RAM, which provides data storage in the form of 18-Kbit dual-port blocks, and two-input 18-bit multiplier blocks [21]. The implementation of a single PE block is shown in Figure 4.
The number of PEs required for the computation of third-order cross moments depends on the required number of data N. Hence, the algorithm, which was formulated as a product of three matrices, becomes computationally intensive [22], and the complexity is further increased by the parameters q and N. Simulation results of the FPGA implementation are shown in Fig. 5.
The design achieves a maximum frequency of 44.570 MHz. Implementation results generated by Xilinx ISE 9.2i are reported in Table I.
Fig. 5: Data flow sampled from the FPGA implementation of the cross-moment architecture using a TLA HP 1663A.
TABLE I
IMPLEMENTATION RESULTS
FPGA XC3S700AN FG484-4
Resources | Utilization
Number of slices | 4608 out of 5888
Number of 4-input LUTs | 9206 out of 11776
Number of multipliers | 18 out of 32
Minimum period | 22.436 ns
Maximum frequency | 44.570 MHz
Power consumption | 100 mW
As shown in Table II, the highest percentage of correct classification was obtained when the full architecture was used to calculate the third-order cross moment.
TABLE II
CLASSIFICATION RESULTS FOR EACH FEATURE EXTRACTION TYPE
Feature extraction | Flexion | Extension | Pronation | Supination | Percentage | Frequency
Time domain | 73.7 (8.56) | 76.2 (4.30) | 50.3 (7.19) | 75.20 (3.15) | 68.8 (5.80) | 106.13 MHz
3rd order, 3 features | 90.0 (5.12) | 63.7 (2.24) | 68.8 (5.68) | 89.0 (6.19) | 85.4 (4.56) | 82.73 MHz
3rd order, 4 features | 93.4 (4.27) | 89.7 (3.54) | 88.8 (6.23) | 92.6 (4.20) | 89.37 (4.80) | 44.57 MHz
VI. CONCLUSIONS
The calculation of HOS requires less computation than sMES feature extraction using the wavelet transform or the wavelet packet transform, while improving the classification accuracy.
In order to calculate the HOS, an FPGA-based architecture for the computation of third-order cross moments based on a novel matrix multiplication algorithm is presented; the use of a unified parallel systolic array allows the HOS to be computed with a clock period of about 23 ns, with reduced latency for computing the first- and second-order moments. By decomposing the original estimation algorithm into two parts and by storing and reusing the computed products of autocorrelation terms, we reduce data broadcasting and computational redundancy.
Each PE of the linear array has one multiplier less, but it is connected to its neighbors, so both synthesized designs, the array MM2 and the linear array (MM1), exhibit the same minimal latency, with the full array starting to provide results slightly earlier.
ACKNOWLEDGMENT
The authors would like to thank the National Council of Science and Technology (CONACyT) of México for the financial support given through scholarship number 104369.
REFERENCES
[1] R. Merletti, S. H. Roy, et al., "Modeling of surface myoelectric signals, Part II: Model-based signal interpretation," IEEE Trans. Biomed. Eng., vol. 46, no. 7, pp. 821-829, 1999.
[2] K. Englehart, B. Hudgins, M. Stevenson, and P. A. Parker, "A dynamic feedforward neural network for subset classification of myoelectric signal patterns," presented at the 19th Annu. Int. Conf. IEEE EMBS/CMBEC 21, Montreal, QC, Canada, 1995.
[3] P. Prakash, C. A. Salini, et al., "Adaptive whitening in electromyogram amplitude estimation for epoch-based applications," IEEE Trans. Biomed. Eng., vol. 52, no. 2, pp. 331-334, 2005.
[4] D. Staudenmann, I. Kingma, et al., "Improving EMG-based muscle force estimation by using a high-density EMG grid and principal component analysis," IEEE Trans. Biomed. Eng., vol. 53, no. 4, pp. 712-719, 2006.
[5] T. Kiryu, C. J. De Luca, and Y. Saitoh, "AR modeling of myoelectric interference signal during a ramp contraction," IEEE Trans. Biomed. Eng., vol. 41, pp. 1031-1038, Nov. 1994.
[6] C. M. Light, P. H. Chappell, et al., "Intelligent multifunction myoelectric control of hand prostheses," J. Med. Eng. Technol., vol. 26, no. 4, pp. 139-146, 2002.
[7] A. Del Boca and D. C. Park, "Myoelectric signal recognition using fuzzy clustering and artificial neural networks in real time," in Proc. IEEE Int. Conf. Neural Networks, vol. 5, Orlando, FL, 2004, pp. 3098-3103.
[8] Y. Al-Assaf and H. Al-Nashash, "Surface myoelectric signal classification for prostheses control," J. Med. Eng. Technol., vol. 29, no. 5, pp. 203-207, 2005.
[9] K. Englehart, "Signal representation for classification of the transient myoelectric signal," Ph.D. dissertation, Univ. New Brunswick, Fredericton, NB, Canada, 2001.
[10] T. D'Alessio and S. Conforto, "Extraction of the envelope from surface EMG signals," IEEE Eng. Med. Biol. Mag., vol. 20, pp. 55-61, 2001.
[11] B. Karlik, M. O. Tokhi, et al., "A fuzzy clustering neural network architecture for multifunction upper-limb prosthesis," IEEE Trans. Biomed. Eng., vol. 50, no. 11, pp. 1255-1261, 2003.
[12] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification. Wiley-Interscience, 2002, pp. 2243-2246.
[13] E. A. Clancy, O. Bida, et al., "Influence of advanced electromyogram (EMG) amplitude processors on EMG-to-torque estimation," J. Biomech., vol. 39, pp. 2690-2698, 2006.
[14] M. Zecca, S. Micera, et al., "Control of multifunctional prosthetic hands by processing the electromyographic signal," Critical Reviews in Biomedical Engineering, 2002.
[15] G. Paraskevas et al., "Study of the carry angle of the human elbow joint in full extension: a morphometric analysis," Springer-Verlag, 2003.
[16] J. V. Basmajian and C. J. De Luca, Muscles Alive: Their Functions Revealed by Electromyography. Baltimore: The Williams & Wilkins Company, 1985.
[17] F. H. Y. Chan, Y. S. Yang, F. K. Lam, Y. T. Zhang, and P. A. Parker, "Fuzzy EMG classification for prosthesis control," IEEE Trans. Rehabilitation Eng., vol. 8, pp. 305-311, 2000.
[18] P. Paajarvi and J. P. Leblanc, "Online adaptive blind deconvolution based on third-order moments," IEEE Signal Processing Letters, vol. 12, no. 12, pp. 863-866, Dec. 2005.
[19] L. Wenkai, "Blind channel estimation using zero-lag slice of third-order moment," IEEE Signal Processing Letters, vol. 12, no. 10, pp. 725-727, Oct. 2005.
[20] T. Tuan, S. Kao, A. Rahman, S. Das, and S. Trimberger, "A 90 nm low-power FPGA for battery-powered applications," in Proc. 14th ACM/SIGDA Int. Symp. on FPGAs, Feb. 2006, pp. 3-11.
[21] S. M. Qasim and S. A. Abbasi, "A novel FPGA-based approach for digital waveform generation using orthogonal functions," Journal of Circuits, Systems and Computers, vol. 16, no. 6, pp. 895-909, 2007.
[22] S. A. Alshebeili, "Estimation of higher-order moments via discrete orthogonal Laguerre functions," in Proc. 3rd IEEE Int. Conf. on Signal Processing, vol. 1, pp. 11-14, Oct. 2006.
FPGA-BASED MOBILE ROBOT CONTROL
Karina Rosas Paleta, Dr. Arroyo Díaz Salvador A.
rpkarina@gmail.com, arroyo@ece.buap.mx

Abstract
This article describes the implementation of a jellyfish-shaped robot based on a Xilinx Spartan-3 FPGA (field-programmable gate array). The robot moves forward and backward under PWM (pulse-width modulation) control. The same robot is also controlled by a PIC (peripheral interface controller) with Bluetooth communication and a MATLAB GUI (graphical user interface) that communicates with the PC over Bluetooth.
These controls were developed to provide versatile teleoperated control and low-cost robot prototyping.
The article also derives the mathematical model of one of the tentacles using the Euler-Lagrange method. It uses Simulink to design a PD controller in block-diagram form; this controller was only simulated and was not implemented on the robot.

Keywords: Control, Bluetooth, FPGA, Simulink, PWM.

I. INTRODUCTION
An autonomous robot is an important research topic in the fields of advanced robotics and intelligent systems [1].
The jellyfish robot belongs to the training, teaching and research category of the International Federation of Robotics, which was established in 1988.
The jellyfish robot is teleoperated through wired and wireless (Bluetooth) communication. This lets the operator handle the uncertainty present in the work environment while the robot operates in real time.
It is desirable to develop specialized hardware-driven solutions that run at high speed and offer additional advantages such as reconfiguration and portability. Programmable hardware devices such as field-programmable gate arrays (FPGAs) currently offer advanced functions and resources for the rapid prototyping of systems on chip [3], [4].
In this work, a jellyfish-shaped robot is built that is controlled over Bluetooth and through a graphical user interface, and the methodology used to develop it is described. The robot's motion is intended for exploration, surveillance and future applications.
Hybrid controllers are based on hardware/software (HW/SW) co-design techniques, as used in embedded systems. These techniques have the flexibility to integrate the algorithms modularly into a complete system.

II. DESCRIPTION OF THE MODELING AND CONTROL
One of the most widely used procedures for obtaining closed-form dynamic models of robots is based on Lagrange's equations of motion. Modeling with Lagrange's equations requires two key concepts: kinetic energy and potential energy. The mechanical structure of the jellyfish robot's arm is shown below.
The arm consists of two rigid links of lengths $l_1$ and $l_2$ and masses $m_1$ and $m_2$, respectively. Joints 1 and 2 are revolute. The robot moves in the vertical x-y plane (Fig. 1).
The distances from the joint axes to the centers of mass are denoted $l_{c1}$ and $l_{c2}$, respectively. $I_1$ and $I_2$ are the moments of inertia of the links about the axis that passes through their centers of mass and is perpendicular to the x-y plane.
The angle $q_1$ is measured from the downward vertical, and $q_2$ is measured from the extension of link 1 to link 2. Both angles are positive counterclockwise.
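Figure 1: Shape of one arm of the robot.

The coordinates of the center of mass of link 1 in the x-y plane are:
$$x_1 = l_{c1}\sin q_1, \qquad y_1 = -l_{c1}\cos q_1$$
The coordinates of the center of mass of link 2 in the x-y plane are:
$$x_2 = l_1\sin q_1 + l_{c2}\sin(q_1+q_2), \qquad y_2 = -l_1\cos q_1 - l_{c2}\cos(q_1+q_2)$$
The dynamic equations of the arm, obtained with the Lagrange method, are:
$$\tau_1 = M_{11}\ddot q_1 + M_{12}\ddot q_2 + C_{11}\dot q_1 + C_{12}\dot q_2 + g_1 + f_1\dot q_1 \qquad (5)$$
$$\tau_2 = M_{21}\ddot q_1 + M_{22}\ddot q_2 + C_{21}\dot q_1 + C_{22}\dot q_2 + g_2 + f_2\dot q_2 \qquad (6)$$
Equations (5) and (6) can be written compactly as equation (7):
$$M(q)\ddot q + C(q,\dot q)\dot q + g(q) + F\dot q = \tau \qquad (7)$$
where:
$M(q)$ = inertia matrix
$C(q,\dot q)$ = Coriolis matrix
$g(q)$ = gravity vector
$F$ = friction matrix
$\tau$ = applied torque
In matrix form, equation (7) reads:
$$\begin{bmatrix} M_{11} & M_{12} \\ M_{21} & M_{22} \end{bmatrix}\begin{bmatrix} \ddot q_1 \\ \ddot q_2 \end{bmatrix} + \begin{bmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{bmatrix}\begin{bmatrix} \dot q_1 \\ \dot q_2 \end{bmatrix} + \begin{bmatrix} g_1 \\ g_2 \end{bmatrix} + \begin{bmatrix} f_1 & 0 \\ 0 & f_2 \end{bmatrix}\begin{bmatrix} \dot q_1 \\ \dot q_2 \end{bmatrix} = \begin{bmatrix} \tau_1 \\ \tau_2 \end{bmatrix}$$
For the geometry of Fig. 1, the standard entries are given below, with $g_0$ the gravitational acceleration:
$$M_{11} = m_1 l_{c1}^2 + m_2\left(l_1^2 + l_{c2}^2 + 2 l_1 l_{c2}\cos q_2\right) + I_1 + I_2$$
$$M_{12} = M_{21} = m_2\left(l_{c2}^2 + l_1 l_{c2}\cos q_2\right) + I_2, \qquad M_{22} = m_2 l_{c2}^2 + I_2$$
$$C_{11} = -m_2 l_1 l_{c2}\sin q_2\,\dot q_2, \quad C_{12} = -m_2 l_1 l_{c2}\sin q_2\,(\dot q_1 + \dot q_2), \quad C_{21} = m_2 l_1 l_{c2}\sin q_2\,\dot q_1, \quad C_{22} = 0$$
$$g_1 = (m_1 l_{c1} + m_2 l_1)\,g_0\sin q_1 + m_2 l_{c2}\,g_0\sin(q_1+q_2), \qquad g_2 = m_2 l_{c2}\,g_0\sin(q_1+q_2)$$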
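The terms of the dynamic model (7) can be evaluated with the minimal Python sketch below. It assumes the standard two-link pendulum entries, with angles measured from the downward vertical. The parameter values in `PARAMS` are hypothetical, because the paper does not report the arm's masses, lengths or inertias. The sketch also provides $\dot M(q)$ so that two structural properties can be checked: $M(q)$ is symmetric positive definite, and $\dot M - 2C$ is skew-symmetric. Any correct model of this form must satisfy both.

```python
import math

G0 = 9.81  # gravitational acceleration g_0, m/s^2

# Hypothetical arm parameters (SI units); the paper does not report them.
PARAMS = dict(m1=0.05, m2=0.03, l1=0.10, lc1=0.05, lc2=0.04, I1=4e-5, I2=2e-5)


def inertia(q, p=PARAMS):
    """Inertia matrix M(q) of eq. (7) for the two-link arm."""
    c2 = math.cos(q[1])
    m11 = (p["m1"] * p["lc1"] ** 2
           + p["m2"] * (p["l1"] ** 2 + p["lc2"] ** 2 + 2 * p["l1"] * p["lc2"] * c2)
           + p["I1"] + p["I2"])
    m12 = p["m2"] * (p["lc2"] ** 2 + p["l1"] * p["lc2"] * c2) + p["I2"]
    m22 = p["m2"] * p["lc2"] ** 2 + p["I2"]
    return [[m11, m12], [m12, m22]]


def coriolis(q, dq, p=PARAMS):
    """Coriolis/centrifugal matrix C(q, dq) of eq. (7)."""
    h = p["m2"] * p["l1"] * p["lc2"] * math.sin(q[1])
    return [[-h * dq[1], -h * (dq[0] + dq[1])],
            [h * dq[0], 0.0]]


def gravity(q, p=PARAMS):
    """Gravity vector g(q); angles are measured from the downward vertical."""
    s1, s12 = math.sin(q[0]), math.sin(q[0] + q[1])
    g1 = (p["m1"] * p["lc1"] + p["m2"] * p["l1"]) * G0 * s1 + p["m2"] * p["lc2"] * G0 * s12
    g2 = p["m2"] * p["lc2"] * G0 * s12
    return [g1, g2]


def inertia_dot(q, dq, p=PARAMS):
    """Time derivative of M(q); only q2 appears in M, so dM/dt = (dM/dq2) * dq2."""
    d = -p["m2"] * p["l1"] * p["lc2"] * math.sin(q[1]) * dq[1]
    return [[2 * d, d], [d, 0.0]]
```

The skew-symmetry of $\dot M - 2C$ is a quick consistency check. A sign error in any Coriolis entry breaks it, so it is a useful test before any simulation is trusted.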
The dynamic equations (5) and (6) that model the jellyfish arm are obtained by applying the Lagrange method, which consists of four steps:
Step 1: Compute the kinetic energy.
Step 2: Compute the potential energy.
Step 3: Compute the Lagrangian.
Step 4: Develop Lagrange's equations.
Proportional-Derivative (PD) control is an immediate extension of proportional control with velocity feedback. Its control law contains a term proportional to the position error $\tilde q$, as in proportional control with velocity feedback. It also contains a term proportional to the derivative of that error, the velocity error $\dot{\tilde q}$.
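The PD control law is given by:
$$\tau = K_p\tilde q + K_v\dot{\tilde q}, \qquad \tilde q = q_d - q \qquad (8)$$
where $\tau_1$ is the torque applied to link 1, $\tau_2$ is the torque applied to link 2, $q_d$ is the desired position, and $K_p$ and $K_v$ are the proportional and derivative gain matrices.
Substituting the PD law (8) into the dynamic equation (7) gives the closed-loop PD model, equation (9):
$$M(q)\ddot q + C(q,\dot q)\dot q + g(q) + F\dot q = K_p\tilde q + K_v\dot{\tilde q} \qquad (9)$$
The state variables of the dynamic model are the positions $q_1$ and $q_2$ and the velocities $\dot q_1$ and $\dot q_2$. In state-variable form, the dynamic model is:
$$\frac{d}{dt}\begin{bmatrix} q \\ \dot q \end{bmatrix} = \begin{bmatrix} \dot q \\ M(q)^{-1}\left[K_p\tilde q + K_v\dot{\tilde q} - C(q,\dot q)\dot q - g(q) - F\dot q\right] \end{bmatrix} \qquad (10)$$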
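The closed-loop model (10) can be simulated with the minimal Python sketch below. It assumes a constant set point, so the velocity error equals $-\dot q$, and it uses the standard two-link model entries with viscous friction. All parameter and gain values are hypothetical, because the paper reports none of them. Each step solves the 2x2 system $M(q)\ddot q = b$ directly and advances the state with a semi-implicit Euler step. With a pure PD law and no gravity compensation, this sketch settles where $K_p\tilde q = g(q)$, so a small position offset remains. The offset shrinks as $K_p$ grows.

```python
import math

G0 = 9.81
# Hypothetical parameters: the paper reports no masses, lengths, friction or gains.
P = dict(m1=0.05, m2=0.03, l1=0.10, lc1=0.05, lc2=0.04, I1=4e-5, I2=2e-5)
FV = (1e-3, 1e-3)   # viscous friction coefficients f1, f2 (diagonal F)
KP = (2.0, 1.0)     # diagonal proportional gains
KV = (0.05, 0.02)   # diagonal derivative gains


def model(q, dq, p=P):
    """Return the entries of M(q), the vector C(q, dq) dq and g(q) for eq. (7)."""
    c2, s2 = math.cos(q[1]), math.sin(q[1])
    m11 = (p["m1"] * p["lc1"] ** 2 + p["m2"] * (p["l1"] ** 2 + p["lc2"] ** 2
           + 2 * p["l1"] * p["lc2"] * c2) + p["I1"] + p["I2"])
    m12 = p["m2"] * (p["lc2"] ** 2 + p["l1"] * p["lc2"] * c2) + p["I2"]
    m22 = p["m2"] * p["lc2"] ** 2 + p["I2"]
    h = p["m2"] * p["l1"] * p["lc2"] * s2
    cdq = (-h * dq[1] * (2 * dq[0] + dq[1]), h * dq[0] ** 2)
    s1, s12 = math.sin(q[0]), math.sin(q[0] + q[1])
    g = ((p["m1"] * p["lc1"] + p["m2"] * p["l1"]) * G0 * s1 + p["m2"] * p["lc2"] * G0 * s12,
         p["m2"] * p["lc2"] * G0 * s12)
    return (m11, m12, m22), cdq, g


def pd_torque(q, dq, qd):
    """PD law (8) with a constant set point, so the velocity error is -dq."""
    return tuple(KP[i] * (qd[i] - q[i]) - KV[i] * dq[i] for i in range(2))


def step(q, dq, qd, dt):
    """One semi-implicit Euler step of the state-space model (10)."""
    (m11, m12, m22), cdq, g = model(q, dq)
    tau = pd_torque(q, dq, qd)
    # b = tau - C dq - g - F dq; solve M ddq = b for the 2x2 case (Cramer's rule).
    b = [tau[i] - cdq[i] - g[i] - FV[i] * dq[i] for i in range(2)]
    det = m11 * m22 - m12 * m12
    ddq = ((m22 * b[0] - m12 * b[1]) / det, (m11 * b[1] - m12 * b[0]) / det)
    dq = [dq[i] + dt * ddq[i] for i in range(2)]  # update velocities first
    q = [q[i] + dt * dq[i] for i in range(2)]     # then positions with new velocities
    return q, dq


def simulate(qd, t_end=3.0, dt=1e-4):
    """Integrate from rest at q = 0 and return the final (q, dq)."""
    q, dq = [0.0, 0.0], [0.0, 0.0]
    for _ in range(int(round(t_end / dt))):
        q, dq = step(q, dq, qd, dt)
    return q, dq
```

For example, `simulate((0.5, 0.3))` brings the arm to rest close to the set point. The residual error on each joint equals $g_i(q)/K_{p,i}$.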
In block-diagram form, the PD-controlled model is expressed as follows.
Figure 2: PD control diagram for the 2-degree-of-freedom arm of the jellyfish.

III. ELECTRONICS
Once the mechanical structure of the prototype is assembled, electronics must be added to implement the control digitally. The prototype must also meet requirements such as low power consumption and modularity.
The jellyfish robot has several particularities that make flexibility and modularity the main requirement for its electronics, so that future improvements can be incorporated. The solution has two parts. One is a wired link with a Spartan-3 board and an H-bridge. The other is a wireless link with a PC, Bluetooth and an H-bridge, which lets the computer control the mechanism through a GUI (graphical user interface).

IV. DESIGN AND CONSTRUCTION OF THE PROTOTYPE
As noted, the jellyfish robot is controlled over a wired link by a Xilinx Spartan-3 FPGA board. The board generates PWM so that the robot moves forward and backward using the board's push buttons. The same robot is also controlled by a PIC with Bluetooth communication and a MATLAB GUI, which drives the robot's motion from the computer keyboard. The Bluetooth module communicates with the PC through RS-232 serial communication, and the received data are used to generate the PWM signals that control the motors.
The information is sent over an asynchronous serial link at 9600 bps.
Figure 3: Communications hardware architecture.
Figure 4: System architecture. An 8-bit character is sent from a virtual COM port at 9600 bps.
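The paper does not list the FPGA's PWM logic. A common hardware structure is a free-running counter compared against a duty-cycle register. The Python model below is a behavioral sketch of that idea, cycle by cycle, under that assumption. It adds a hypothetical mapping from a direction command (as selected by the push buttons) to the two H-bridge inputs. The names `pwm_samples`, `h_bridge_inputs` and `pwm_frequency` are illustrative, and the 50 MHz clock used in the check is an assumed value.

```python
def pwm_samples(duty, period, cycles=1):
    """Counter/comparator PWM: output is 1 while the free-running counter is below duty."""
    if not 0 <= duty <= period:
        raise ValueError("duty must be in [0, period]")
    out = []
    for _ in range(cycles):
        for counter in range(period):  # counter wraps to 0 after period - 1
            out.append(1 if counter < duty else 0)
    return out


def h_bridge_inputs(pwm_bit, direction):
    """Map one PWM bit to the two H-bridge inputs (IN1, IN2).

    direction is 'forward', 'backward' or anything else for stop
    (a hypothetical mapping of the push-button commands).
    """
    if direction == "forward":
        return (pwm_bit, 0)
    if direction == "backward":
        return (0, pwm_bit)
    return (0, 0)


def pwm_frequency(f_clk_hz, period):
    """PWM frequency when the counter advances once per clock cycle."""
    return f_clk_hz / period
```

Swapping which H-bridge input receives the PWM reverses the motor. The duty value sets the average voltage, and therefore the speed.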
Figure 6: Arm control in Simulink.

V. CONTROL SOFTWARE
The GUI module is developed in MATLAB.
The interface connects to the Bluetooth virtual COM port (COM14) and to the PIC's Bluetooth control board. It drives the motors with PWM using the H, J and K keys of the computer.
The same interface reports whether the PIC received the data.
It also provides buttons for sending keys and for connecting.
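Each key press (H, J or K) leaves the GUI as a single 8-bit character at 9600 bps. The sketch below shows what that looks like on the asynchronous serial line, presumably the UART between the Bluetooth module and the PIC. It assumes 8N1 framing (one start bit, eight data bits sent LSB first, no parity, one stop bit), which is a common default but is not stated in the paper. Under that assumption, one character occupies 10 bit times, about 1.04 ms, so at most 960 characters per second can be sent. The function names are illustrative.

```python
BAUD = 9600  # bit rate stated in the paper


def uart_frame(byte):
    """Line bits for one 8N1 character: start bit (0), 8 data bits LSB first, stop bit (1)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must fit in 8 bits")
    data = [(byte >> i) & 1 for i in range(8)]  # LSB is transmitted first
    return [0] + data + [1]


def uart_decode(bits):
    """Recover the byte from a 10-bit 8N1 frame, checking start and stop bits."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


def char_time_s(baud=BAUD, bits_per_frame=10):
    """Time on the wire for one character, in seconds."""
    return bits_per_frame / baud
```

For example, `uart_frame(ord("H"))` yields `[0, 0, 0, 0, 1, 0, 0, 1, 0, 1]`, because 'H' is 0x48.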
Figure 5: Control interface with RS-232 communication for the Bluetooth link.

VI. RESULTS
Using the methodology described above, a block diagram was developed in Simulink. It was used to observe the PD control, the desired positions for link 1 and link 2, and the position errors of links 1 and 2, which tend to zero.
In this case, the desired position variables for link 1 and link 2 were entered as inputs.
Figure 6: Desired positions for link 1 and link 2.
Figure 7: Position errors for link 1 and link 2, which tend to 0.
The wired control with the FPGA was also shown to work.
Figure 7: Wired control.
Finally, the wireless control was shown working between the PC and the PIC over Bluetooth.
Figure 8: Wireless control.

VII. CONCLUSIONS
The FPGA and PIC architectures establish a clear and efficient organization of the robot's software elements. They provide a methodology for developing the control system and for later implementing particular tasks. The approach is versatile, since it allows different kinds of tasks to be implemented and favors code reuse.
Implementing the dynamic model on the robot itself would require a more complex platform.

ACKNOWLEDGMENTS
I thank Conacyt for its funding and the Facultad de Ciencias de la Electrónica for its support of this project.

REFERENCES
[1] C. Balaguer, R. Aracil, A. Barrientos, L. F. Peñín, Fundamentos de Robótica, McGraw-Hill, 1997.
[2] J. O. Gray, "Recent developments in advanced robotics and intelligent systems," Comput. Control Eng. J., vol. 7, no. 6, pp. 267-276, Dec. 1996.
[3] R. Carrasco, A. Cipriano, "Sistema de guiado para un robot móvil basado en lógica difusa," XV Congreso de la Asociación Chilena de Control Automático, Oct. 2002.
[4] P. Chongstitvatana, "A FPGA-based behavioral control system for a mobile robot," IEEE Asia-Pacific Conference on Circuits and Systems, Chiangmai, Thailand, 1998.
[5] J. J. Rodriguez-Andina, M. J. Moure, and M. D. Valdes, "Features, design tools, and application domains of FPGAs," IEEE Transactions on Industrial Electronics, vol. 54, no. 4, pp. 1810-1823, Aug. 2007.
[6] R. N. Jazar, Theory of Applied Robotics, Springer, 2010.
[7] S. Sagiroglu, N. Yilma, M. A. Wani, "Web robot learning powered by Bluetooth communication system," Department of Computer Engineering, Faculty of Engineering-Architecture, Gazi University, Ankara, Turkiye.
[8] S. Kale, "FPGA-based controller for a mobile robot," Dept. of Electronics & Telecommunication Engg., International Journal of Computer Science and Information Security (IJCSIS), vol. 3, no. 1, 2009.
[9] S. Brown, Z. Vranesic, Fundamentos de lógica digital con diseño VHDL, McGraw-Hill, 2000.
[10] Bluetooth modules, http://www.rovingnetworks.com/products/RN_41
[11] MATLAB, http://www.mathworks.com/
[12] Xilinx ISE WebPack 13.2. Available at http://www.xilinx.com/ise/logic_design_prod/webpack.htm.
Multiport Analysis of Two-Dimensional Nanosystems in a Magnetic Field Based on the NEGF Formalism
Victor H. Vega
Luis Enrique Erro No. 1, Puebla, 72840, México
Email: hvega@inaoep.mx
Edmundo Gutiérrez
Luis Enrique Erro No. 1, Puebla, 72840, México
Email: edmundo@inaoep.mx

Abstract: Modeling and characterization of nanoscale semiconductor devices require the wave nature of the electron to be accounted for. The Non-Equilibrium Green's Function (NEGF) formalism is well suited to capturing these wave-nature quantum effects. This work therefore introduces a systematic description of different multiport nanostructures modulated by a magnetic field. The approach allows a full comparison of the transmission probability and the current-voltage relationship under different electrostatic potential profiles, and it can be used for device design optimization.

I. INTRODUCTION
To improve device performance and increase integration density, modern transistors are being scaled down to 10 nm and below. At these dimensions, electrons (or holes) travel from source to drain without suffering inelastic scattering. Under this condition, the dynamics of the carriers in the channel is deterministic and is described by the Schrödinger equation. The Non-Equilibrium Green's Function (NEGF) formalism provides an efficient way to impose the boundary conditions needed to solve the Schrödinger equation. The NEGF formalism can therefore be used to analyze the wave nature of carrier transport in a large variety of nanoscale semiconductor devices.
The devices analyzed in this work consist of reservoirs and baths that surround and perturb the nanostructure of interest. A reservoir is a subsystem that is assumed to remain in local equilibrium at all times and that can exchange particles with the nanostructure. A bath is a subsystem that exchanges only energy, not particles. The transistor clearly falls into this category. The source and drain contacts are the reservoirs, the ideal gate is the bath that induces an electric field, and the channel is the part of interest. In this case, the channel is said to be open to the contacts. It must be emphasized that the analysis includes only elastic scattering inside the channel and that all heat dissipation occurs in the contacts.
There are a variety of nanoscale MOSFET models. However, many of them use semiclassical approaches that treat electrons only as particles, neglecting phenomena that arise from their wave nature [1], [2]. Some models include the influence of a magnetic field classically, which is incorrect at high magnetic field intensities [3]. Other descriptions consider only two ports, or include the influence of another port by post-processing the resulting two-port data [4]. There are quantum models of one-dimensional nanostructures [5], [6], but in that case the charges cannot be deflected by a magnetic field. A multiport analysis that uses a magnetic field as an auxiliary modeling tool can therefore support the development of a new wave-nature transistor model, which is much needed for the coming 10 nm era.
To overcome the drawbacks of previous works, a systematic analysis of different representative nanostructures is presented. This analysis can explain the non-ohmic behavior of some nanosystems under certain bias conditions. In addition, extreme cases of the analysis that agree with the theory of quantum transport are presented to validate the results.
This document is organized as follows. Section II describes the NEGF formalism. Section III presents the quantum transport analysis of one-dimensional systems. Section IV extends the analysis to two-dimensional multiport structures under the influence of a magnetic field and presents the simulation results.

II. NEGF FORMALISM
An open system such as the transistor is described mathematically by the Hamiltonian of the channel ($H$) and the self-energies of the source and drain contacts ($\Sigma_S$, $\Sigma_D$) [7]. The Hamiltonian represents the dynamics of the charge carriers inside the channel (kinetic and potential energy). The self-energies represent the degree of coupling between the contacts and the channel (see Fig. 1).
Fig. 1. Schematic view of the channel coupled to the contacts.
The time-independent Schrödinger equation for one particle in an open system is (in matrix form):
$$\left[EI - H - \Sigma_S - \Sigma_D\right]\psi = S \qquad (1)$$
where $E$ is the total energy, $I$ is the identity matrix of the same size as the Hamiltonian of the isolated channel $H$, $\psi$ is the wave function, and $S$ corresponds to an external perturbation (current and voltage probes). This equation is the discrete version of a non-homogeneous differential equation.
The Hamiltonian of the isolated channel, including the electromagnetic fields, is:
$$H = \frac{\left(i\hbar\nabla + q\mathbf{A}\right)^2}{2m^*} + U(\mathbf{r}) \qquad (2)$$
where $\mathbf{A}$ and $U$ are the vector and scalar potentials of the electromagnetic field and $m^*$ is the effective mass. $H$ represents a closed system and does not describe the flow of electrons into and out of the channel. On the other hand, $H + \Sigma_S + \Sigma_D$ can be considered an effective Hamiltonian that allows the desired charge flow to occur when the contacts impose the appropriate boundary conditions. The energy-dependent quantities $\Sigma_S$ and $\Sigma_D$ can be determined numerically by a recursive procedure [7].
Since this work focuses on macroscopic observables such as the current, it is important to determine the probability that an electron injected into the channel from one contact is transmitted to the other contact. The transmission probability is defined as [7]:
$$T(E) = \mathrm{Tr}\left[\Gamma_S\, G\, \Gamma_D\, G^{\dagger}\right] \qquad (3)$$
$$\Gamma_S = i\left(\Sigma_S - \Sigma_S^{\dagger}\right), \qquad \Gamma_D = i\left(\Sigma_D - \Sigma_D^{\dagger}\right) \qquad (4)$$
where
$$G = \left[EI - H - \Sigma_S - \Sigma_D\right]^{-1} \qquad (5)$$
Note that $G$ is the Green's function of the open system and $(\cdot)^{\dagger}$ denotes the conjugate transpose. Finally, the current as a function of the transmission is [7]:
$$I = \frac{q}{h}\int T(E)\left[f_S(E) - f_D(E)\right]dE \qquad (6)$$
Here, $f_S$ and $f_D$ are the Fermi functions of the source and drain contacts, respectively:
$$f_S(E) = \frac{1}{e^{(E-\mu_S)/k_B T} + 1}, \qquad f_D(E) = \frac{1}{e^{(E-\mu_D)/k_B T} + 1} \qquad (7)$$
where $k_B$ and $T$ are the Boltzmann constant and the temperature, respectively, and $\mu_S$ and $\mu_D$ are the electrochemical potentials of the source and drain contacts, respectively.

III. ONE-DIMENSIONAL ANALYSIS (B = 0)
Fig. 2 depicts the band diagram, or potential profile $U(x)$ versus position, of a MOSFET under different bias voltages. At low gate voltages, the energy barrier between source and drain is high and the device is off. A high drain bias lowers the energy in the drain. When a high gate voltage lowers the potential energy barrier, electrons flow from source to drain. Transistor action occurs by modulating the height of this energy barrier.
Fig. 2. Energy band diagrams under (a) low drain bias and (b) high drain bias. The parameter is the gate voltage.
The best-known structure that shows the result of treating electrons as quantum waves is the double-barrier structure, which consists of two tunneling barriers in series. Fig. 3 shows the resulting transmission for a double barrier with a height of 0.4 eV. Electrons with energy E = 0.3 eV have unity transmission probability (resonant tunneling). Fig. 4 depicts the corresponding current-voltage relationship $I_D$-$V_{DS}$. The effective mass of gallium arsenide at room temperature was used for all simulations.
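Equations (3) to (7) can be combined for a one-dimensional channel, as in the minimal pure-Python sketch below. It makes several assumptions that are not from the paper. The effective-mass Hamiltonian is discretized on a grid of spacing a = 0.3 nm (nearest-neighbor hopping $t_0 = \hbar^2/2m^*a^2$), with the GaAs effective mass 0.067 $m_0$. Both contacts are flat semi-infinite leads at U = 0, and the current integral ignores any potential drop across the channel. In 1-D only $G_{1N}$ is needed for (3), so the code solves the tridiagonal system of (5) for one column. The barrier and well widths in the check are hypothetical, so the resonance positions are not expected to match Fig. 3.

```python
import cmath
import math

HB2M0 = 3.80998      # hbar^2 / (2 m0) in eV * Angstrom^2
KB_EV = 8.617333e-5  # Boltzmann constant in eV/K
E2_H = 3.874046e-5   # q^2 / h in A/V: with E in eV, eq. (6) becomes I = E2_H * integral


def transmission(E, U, a=3.0, m_eff=0.067):
    """T(E) of eqs. (3)-(5) for a 1-D tight-binding chain with on-site energies U (eV)."""
    t0 = HB2M0 / (m_eff * a * a)      # hopping energy hbar^2 / (2 m* a^2)
    cos_ka = 1.0 - E / (2.0 * t0)     # lead dispersion E = 2 t0 (1 - cos ka)
    if not -1.0 < cos_ka < 1.0:
        return 0.0                    # no propagating states in the leads
    ka = math.acos(cos_ka)
    sigma = -t0 * cmath.exp(1j * ka)  # retarded self-energy on each end site
    gamma = 2.0 * t0 * math.sin(ka)   # Gamma = i (Sigma - Sigma^dagger), eq. (4)
    n = len(U)
    diag = [complex(E - 2.0 * t0 - u) for u in U]  # diagonal of E I - H
    diag[0] -= sigma                  # source contact
    diag[-1] -= sigma                 # drain contact
    # Thomas algorithm for (E I - H - Sigma_S - Sigma_D) x = e_N; off-diagonals are +t0.
    # x is the last column of G, so x[0] = G_{1N}.
    rhs = [0j] * (n - 1) + [1 + 0j]
    cp, dp = [0j] * n, [0j] * n
    cp[0], dp[0] = t0 / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - t0 * cp[i - 1]
        cp[i] = t0 / m
        dp[i] = (rhs[i] - t0 * dp[i - 1]) / m
    x = [0j] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return gamma * gamma * abs(x[0]) ** 2  # Tr[Gamma_S G Gamma_D G^dagger] in 1-D


def fermi(E, mu, temp=300.0):
    """Fermi function of eq. (7)."""
    x = (E - mu) / (KB_EV * temp)
    return 0.0 if x > 700.0 else 1.0 / (math.exp(x) + 1.0)


def current(U, mu_s, mu_d, e_max=1.0, n_pts=2001, temp=300.0):
    """Current of eq. (6) in amperes, by the trapezoidal rule over [0, e_max] eV."""
    dE = e_max / (n_pts - 1)
    total = 0.0
    for k in range(n_pts):
        E = k * dE
        w = 0.5 if k in (0, n_pts - 1) else 1.0
        total += w * transmission(E, U) * (fermi(E, mu_s, temp) - fermi(E, mu_d, temp))
    return E2_H * total * dE
```

A flat channel is a useful sanity check. Its transmission is exactly 1 inside the band, so the current reduces to $(q^2/h)(\mu_S - \mu_D)$ when both electrochemical potentials lie well inside the band. A double-barrier profile such as `[0.0]*5 + [0.4]*5 + [0.0]*15 + [0.4]*5 + [0.0]*5` shows transmission below 1 away from its resonances.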
IV. TWO-DIMENSIONAL ANALYSIS
The transverse modes of a two-dimensional system give rise to discrete subbands, or transmission modes, separated from each other by a certain amount of energy. Hence, more than one electron with the same total energy can be transmitted simultaneously, in analogy with an electromagnetic waveguide in which different propagation modes can coexist. To save simulation time, the structure chosen for analysis is simple and small: a nanowire 6 nm wide and 15 nm long.
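The subband picture can be made concrete with a short sketch. It assumes hard-wall confinement across the 6 nm width, a finite-difference grid of 0.3 nm and the GaAs effective mass. Under these assumptions, the transverse energies are $\varepsilon_n = 2t_0\,[1 - \cos(n\pi/(M+1))]$ for $M$ interior sites. Without scattering, each subband whose onset lies below $E$ contributes a transmission of 1, which produces the staircase characteristic of ballistic transport. The resulting numbers depend on the assumed grid, mass and wall model, so they are not expected to reproduce the onsets reported for Fig. 5.

```python
import math

HB2M0 = 3.80998  # hbar^2 / (2 m0), eV * Angstrom^2


def mode_onsets(width_nm=6.0, a=3.0, m_eff=0.067):
    """Transverse subband energies (eV) of a hard-wall wire on a grid of spacing a (Angstrom)."""
    t0 = HB2M0 / (m_eff * a * a)
    m = round(width_nm * 10.0 / a) - 1  # interior transverse sites; walls at both ends
    return [2.0 * t0 * (1.0 - math.cos(n * math.pi / (m + 1))) for n in range(1, m + 1)]


def ballistic_transmission(E, onsets, a=3.0, m_eff=0.067):
    """Number of propagating modes at energy E; each contributes T = 1 without scattering."""
    t0 = HB2M0 / (m_eff * a * a)
    # A mode propagates if the remaining longitudinal energy lies inside the 1-D band.
    return sum(1 for eps in onsets if 0.0 < E - eps < 4.0 * t0)
```

In the continuum limit, the first onset approaches $\hbar^2\pi^2/(2m^*W^2) \approx 0.156$ eV for W = 6 nm, and the onsets grow roughly as $n^2$.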
Fig. 3. Electron transmission probability as a function of energy for electron motion across the double-barrier structure. Inset: potential profile versus position.
Fig. 4. Drain current ($I_D$) for a series of drain-source voltages ($V_{DS}$) applied to a double-barrier structure.
A. Different barrier heights (B = 0)
Fig. 5 shows the simulated transmission of the nanowire under study (6 nm wide and 15 nm long), which makes the activation of the different transmission modes visible. With no potential barrier (barrier height = 0), the transmission is ballistic. The discontinuities clearly show the onset energies of the different transmission modes (E = 0.2, 0.5, 1.2, 2.1, 3.3 and 4.6 eV). Thus, one transverse mode propagates in the energy range between 0.2 and 0.5 eV, two transverse modes propagate between 0.5 and 1.2 eV, and so on. As expected, the transmission becomes continuous and decreases in magnitude in the presence of a potential barrier (Fig. 5).
Fig. 5. Transmission of a two-dimensional structure for different barrier heights.
B. Including the magnetic field (B ≠ 0)
A perpendicular magnetic field (normal to the surface of the two-dimensional structure) is now added to the analysis. Because the structure under analysis is very small, very high magnetic field intensities are needed to observe its effects. Despite this, the analysis does not lack generality. For instance, for a real MOSFET with a width of 1 µm, the magnetic field intensities at which quantum phenomena become evident are relatively small, and classical effects require even smaller intensities. Fig. 6 depicts the transmission of the quantum wire of the previous subsection for different magnetic field intensities, with the ballistic case shown as a reference. At high enough magnetic field intensities (B = 2100 T), the transmission becomes ballistic again, as the formation of discontinuities shows. Fig. 7 shows that ballistic transmission can be achieved even in the presence of a potential barrier. This is known as the quantum Hall effect. In this regime, carrier transport takes place near the edges of the structure, and the carrier states responsible for it are known as edge states. Moreover, quantized transmission gives rise to quantized conductance. In the quantum Hall regime, this quantization is so precise that it has been established as an electrical resistance standard [8].
Fig. 6. Transmission of the nanowire for different magnetic field intensities. The potential barrier is absent in all cases.
Fig. 7. Transmission of the nanowire for different magnetic field intensities. The height of the potential barrier is 0.4 eV in all cases.
The MOSFET can be considered a three-port system if the gate is treated as a contact, or a four-port system if the gate and the bulk both have contacts.
C. Multiport analysis (B = 0)
In general, the presence of active or inactive ports modifies the quantum transport properties of the device being analyzed (see Fig. 8) [7]. As references, Fig. 9 shows two cases: the ballistic transmission from port 1 to port 2 without ports 3 and 4, and from port 3 to port 4 without ports 1 and 2. Interestingly, the transmission from port 1 to port 2 decreases if port 3, port 4, or both are present but inactive, that is, when no current flows through these ports.
Fig. 8. Four-port two-dimensional system.
Fig. 9. Transmissions for the multiport two-dimensional system.
D. Multiport analysis including the magnetic field (B ≠ 0)
The transmission of all the systems analyzed so far is symmetric (time-reversal invariant) [9]. This means that the transmission from port m to port n, $T(m \to n)$, equals the transmission from port n to port m, $T(n \to m)$:
$$T(m \to n) = T(n \to m) \qquad (8)$$
With a magnetic field present, this no longer holds. In general, the only thing that can be said is:
$$T(m \to n)_{+B} = T(n \to m)_{-B} \qquad (9)$$
The only exception is the two-port system:
$$T(1 \to 2)_{+B} = T(1 \to 2)_{-B} \qquad (10)$$
Relation (9) is known as the Onsager-Casimir reciprocity relation for nanosystems. Fig. 10 shows that the simulator developed for this analysis reproduces the behavior predicted by the Onsager-Casimir reciprocity relation, which validates the analysis.
Fig. 10. The Onsager-Casimir reciprocity relation is matched.
E. Drain current-gate voltage relationship ($I_D$-$V_G$) of the three-port two-dimensional system in magnetic fields of opposite directions
Fig. 11 shows how reversing the magnetic field direction affects the drain current-gate voltage curve. For $V_G < 0.8$ V, the values of $I_D$ are larger for the curve corresponding to the positive magnetic field. For $V_G > 0.8$ V, they are larger for the negative magnetic field.
V. CONCLUSIONS
We introduced a model and analysis of a variety of semiconductor nanostructures. The model accounts for the wave-particle nature of electrons by treating the devices as multiport, waveguide-like systems. It also accounts for the influence of an external magnetic field and of different potential profiles. This makes it an exploratory tool for studying magneto-quantum effects in nanoscale semiconductor devices such as MOSFETs, FinFETs, nanowires and quantum dots. We believe this analysis and model may also help device designers optimize the electrical performance of nanoscale semiconductor devices.
Fig. 11. $I_D$-$V_G$ relationship of the three-port two-dimensional system in magnetic fields of opposite directions.
REFERENCES
[1] M. S. Lundstrom et al., Nanoscale Transistors: Device Physics, Modeling and Simulation, Springer, 2006.
[2] Y. Tsividis, Operation and Modeling of the MOS Transistor, 2nd ed., McGraw-Hill, New York, 1999.
[3] R. S. Popovic, Hall Effect Devices, 2nd ed., UK, 2004.
[4] W. L. Vosloo and J. P. Holtzhausen, "The prediction of insulator leakage currents from environmental data," IEEE AFRICON, 6th, 2-4 Oct. 2002, vol. 2, pp. 603-608.
[5] P. Harrison, Quantum Wells, Wires and Dots: Theoretical and Computational Physics of Semiconductor Nanostructures, 2nd ed., Wiley, 2005.
[6] A. F. Levi, Applied Quantum Mechanics, Cambridge University Press, 2003.
[7] S. Datta, Electronic Transport in Mesoscopic Systems, Cambridge University Press, Cambridge, UK, 1996.
[8] B. Jeckelmann and B. Jeanneret, "The Quantum Hall Effect as an Electrical Resistance Standard," in Magnetism, vol. III, G. T. Rado and H. Suhl, Eds. New York: Academic, 1963, pp. 271-350.
[9] L. I. Schiff, Quantum Mechanics, 3rd ed., McGraw-Hill, 1995.


Proceedings of the XVIII International IBERCHIP Workshop

Playa del Carmen, Mexico, February 29-March 2, 2012
