Image Processing

Passport Photo Compression Technique With JPEG2000

Jun Hou (1), Ran Li (2), Yan Cheng (3), Haojie Shi (2)

1. Shanghai Key Lab of Modern Optical System
University of Shanghai for Science and Technology
Shanghai, 200093, China
fjshj@hotmail.com

2. School of Optical Electrical and Computer Engineering
University of Shanghai for Science and Technology

3. East China University of Political Science and Law

Abstract: This paper proposes a compression algorithm for formal facial images based on JPEG2000 coding. First, object segmentation is used to locate the facial region: greyscale processing and projections are applied, and facial features are extracted and located by analysing minima and maxima. Second, the background is reassigned to reduce the amount of less valuable information in the final bitstream. Finally, the facial region is compressed using the region of interest (ROI) coding technique of JPEG2000. The produced bitstream is fully compatible with the JPEG2000 standard. As a result, images compressed by the proposed algorithm can be distributed widely and viewed with any picture tool that supports the JPEG2000 format. Experiments demonstrate the efficiency of the method.
Index Terms - formal ID photo compression, JPEG2000 compression, region segmentation, up-shifting
I. INTRODUCTION
Efficient compression of human facial images has been explored for years. Geometric pre-analysis of the image is used in [1] and [2] for this purpose: feature detection locates the semantic landmarks of the face. In [3], principal component analysis (PCA) is applied to train a transform that optimizes the energy compaction of the images. The work in [4] trains a transform and then processes small image tiles, using independent component analysis (ICA) to represent the tiles. Treating a group of images as a 3-D tensor, [5] and [6] decompose the images into three-way rank-one approximations. In [7], a given face is deformed into a canonical form in which the same facial features are mapped to the same spatial locations, and a tree of vector-quantization dictionaries is constructed for each location. The work in [8] trains K-SVD dictionaries for predefined image patches; encoding is based on sparse coding of each patch with the relevant trained dictionary. In [9], the spatial content and the frequency distribution of each image are combined to produce a quantization scheme that reflects the spatial and frequency differences of each image being processed. The shape-adaptive discrete wavelet transform (DWT), which can be applied to objects of arbitrary shape, is introduced in [10]. Although the work in [10] achieves high compression efficiency, it requires a predefined codebook in the decoder, and a new codebook is required whenever the image resolution changes. Classified energy and pattern blocks (CEPB) are constructed from training blocks and stored at both the transmitter and the receiver, and a matching process determines the index of the CEPB that best matches the input. The scheme therefore requires a special lookup table on both sides.
JPEG2000 is widely used in medical imaging and digital photography, and it achieves better compression quality than older compression methods. Closest to our approach, the work in [11] uses JPEG2000 coding to compress facial images. It describes the background boundary with a Freeman chain code followed by arithmetic coding, and then up-shifts the facial area by one bitplane. Unfortunately, the resulting bitstream is not compatible with the standard, which severely restricts its applications. In [12], the most perceptually significant region of the image is identified on the basis of simple energy measures. However, the effectiveness of that method depends on the accuracy of this classification, which in turn limits its application.
In this paper, we address the compression of color frontal facial images with JPEG2000 coding. Our purpose is to produce a bitstream that is compatible with the standard while achieving higher compression quality. Every codec that supports the JPEG2000 format can decode the bitstream without additional requirements. The images we deal with in this paper are passport-type photos: a frontal view of the face, a plain background, no dark glasses, no hats and no other nonstandard clothing. An example is shown in Fig. 1. The image is processed by object segmentation techniques, followed by region of interest (ROI) coding.

Fig. 1. An example of a passport-type picture
1389 978-1-4673-5560-5/13/$31.00 ©2013 IEEE
Proceedings of 2013 IEEE
International Conference on Mechatronics and Automation
August 4 - 7, Takamatsu, Japan
The paper is organized as follows. Section II presents a detailed description of the proposed algorithm. In Section III, the scheme is demonstrated and compared with previous algorithms. Concluding remarks are given in Section IV.

II. PROPOSED ALGORITHM
A frontal facial image can be divided into three parts: the facial region, the body region and the background. Each region is treated with a different scheme.
A. Region Segmentation
Background information carries no value for understanding the image, so the proposed method spends no bits on this region. It first samples the background color at several locations. A linear background model is then constructed for the whole image to compensate for non-uniform illumination.
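The text does not give the form of the linear background model. A minimal sketch, assuming the model is a plane value = a + b*x + c*y fitted per color channel by least squares to the background samples, and that compensation means subtracting the linear ramp b*x + c*y so the background settles near the constant a. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# One background sample: column x, row y, and the channel value observed there.
Sample = Tuple[float, float, float]


def _det3(a: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def fit_background_plane(samples: Sequence[Sample]) -> Tuple[float, float, float]:
    """Least-squares fit of value ~ a + b*x + c*y for one color channel (assumed model)."""
    # Normal equations (A^T A) w = A^T v, where each row of A is [1, x, y].
    ata = [[0.0] * 3 for _ in range(3)]
    atv = [0.0] * 3
    for x, y, v in samples:
        row = (1.0, float(x), float(y))
        for r in range(3):
            atv[r] += row[r] * v
            for c in range(3):
                ata[r][c] += row[r] * row[c]
    d = _det3(ata)
    if abs(d) < 1e-12:
        raise ValueError("need at least three non-collinear background samples")
    # Cramer's rule: replace column k by the right-hand side.
    coeffs = []
    for k in range(3):
        m = [row[:] for row in ata]
        for r in range(3):
            m[r][k] = atv[r]
        coeffs.append(_det3(m) / d)
    return coeffs[0], coeffs[1], coeffs[2]


def remove_illumination_ramp(channel: List[List[float]],
                             plane: Tuple[float, float, float]) -> List[List[float]]:
    """Subtract the linear part b*x + c*y so that the background settles near a."""
    _, b, c = plane
    return [[p - (b * x + c * y) for x, p in enumerate(row)]
            for y, row in enumerate(channel)]
```

Only three non-collinear samples are strictly needed; sampling more locations makes the fit less sensitive to noise in any single sample.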

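$$\hat{p}_{i,j} = \frac{1}{8} \sum_{u=i-1}^{i+1} \sum_{v=j-1}^{j+1} p_{u,v}, \qquad (u, v) \neq (i, j)$$

where $\hat{p}_{i,j}$ is the prediction value for pixel $p_{i,j}$, and

$$(i, j) \in \text{Background} \quad \text{if } \left| p_{i,j} - \hat{p}_{i,j} \right| < \text{threshold}, \qquad (i, j) \notin \text{Background} \quad \text{else}.$$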

If (i, j) belongs to the background, it is set to a uniform value, so that the whole background becomes uniform. The background is then detected by an edge operator and a threshold decision, followed by morphological filtering.
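The background test above compares each pixel with a prediction from its neighbours. A minimal sketch of that test on one greylevel channel, assuming the prediction is the mean of the 8 surrounding pixels, the decision uses the absolute difference, border pixels are handled by clamping indices, and the uniform value is the mean of the detected background pixels. These choices and the function names are assumptions, not details given in the text.

```python
from typing import List

Image = List[List[float]]


def predict_from_neighbours(img: Image, i: int, j: int) -> float:
    """Mean of the 8 neighbours of (i, j); out-of-range indices are clamped (assumption)."""
    rows, cols = len(img), len(img[0])
    total = 0.0
    for u in range(i - 1, i + 2):
        for v in range(j - 1, j + 2):
            if u == i and v == j:
                continue  # the centre pixel is excluded, hence the division by 8
            uu = min(max(u, 0), rows - 1)
            vv = min(max(v, 0), cols - 1)
            total += img[uu][vv]
    return total / 8.0


def background_mask(img: Image, threshold: float) -> List[List[bool]]:
    """True where |p - p_hat| < threshold, i.e. the pixel is smooth enough to be background."""
    return [[abs(img[i][j] - predict_from_neighbours(img, i, j)) < threshold
             for j in range(len(img[0]))] for i in range(len(img))]


def make_background_uniform(img: Image, mask: List[List[bool]]) -> Image:
    """Replace every background pixel by the mean background value (assumed choice)."""
    values = [p for row, mrow in zip(img, mask) for p, m in zip(row, mrow) if m]
    if not values:
        return [row[:] for row in img]
    level = sum(values) / len(values)
    return [[level if m else p for p, m in zip(row, mrow)]
            for row, mrow in zip(img, mask)]
```

For a flat 5x5 patch of value 100 with a single 200 pixel in the centre and a threshold of 20, the centre fails the test (difference 100) while its neighbours pass (predicted 112.5, difference 12.5), so only the outlier is kept as foreground.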
Facial feature extraction is based on the observation that, in the intensity image, facial features differ from the rest of the face because of their low brightness. For the eyes, this is caused by the color of the pupils and the sunken eye sockets. The light red color of the lips emphasizes the mouth against its surrounding region. This motivates using the intensity information in the interior of the connected components. The proposed algorithm enhances facial features by applying greyscale erosion and an extremum sharpening operation.
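The text names the two enhancement operations without giving their parameters. A minimal sketch, assuming a flat 3x3 window with clamped borders: greyscale erosion replaces each pixel by its local minimum, and extremum sharpening moves each pixel to whichever of its local minimum or maximum is closer. The window size and the tie rule (ties go to the minimum, favouring dark features) are hypothetical choices.

```python
from typing import List

Image = List[List[float]]


def _window(img: Image, i: int, j: int) -> List[float]:
    """3x3 neighbourhood of (i, j) including the centre, with clamped borders."""
    rows, cols = len(img), len(img[0])
    return [img[min(max(u, 0), rows - 1)][min(max(v, 0), cols - 1)]
            for u in range(i - 1, i + 2) for v in range(j - 1, j + 2)]


def grey_erode(img: Image) -> Image:
    """Greyscale erosion with a flat 3x3 structuring element: the local minimum."""
    return [[min(_window(img, i, j)) for j in range(len(img[0]))]
            for i in range(len(img))]


def extremum_sharpen(img: Image) -> Image:
    """Move each pixel to the nearer of its local minimum and local maximum."""
    out = []
    for i in range(len(img)):
        row = []
        for j in range(len(img[0])):
            w = _window(img, i, j)
            lo, hi = min(w), max(w)
            p = img[i][j]
            # Ties go to the minimum, which favours the dark facial features.
            row.append(lo if p - lo <= hi - p else hi)
        out.append(row)
    return out
```

Erosion first widens dark regions such as the eyes and mouth; sharpening then turns gradual transitions into steps, which makes the minima in the projections below more pronounced.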
We use a min-max analysis method similar to [13] to locate the key features and then outline the face. Before facial feature detection, every background pixel is set to 0. First, the projections of the topographic greylevel relief of the connected component are evaluated.
Facial features are extracted by searching for local minima and maxima, as described below. First, the y-axis projection is obtained by computing the mean greylevel value of each row of the connected component:
$$m_j = \frac{1}{M} \sum_{i=1}^{M} p_{i,j}$$

where the image has N rows and M columns, so that m_j is the mean greylevel of row j. Local minima and maxima are then searched for in the smoothed y relief. After the gradient of the row means along the y coordinate is calculated, significant local minima are found where the gradient changes from negative to positive. For each significant minimum in the y coordinate, an x relief is computed by averaging, for each column, the greylevel values of the three neighbouring rows.
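The projection step can be sketched as follows. This is a minimal illustration, assuming a moving-average smoother with clamped ends and a strict sign-change test for minima; the significance check the text mentions (e.g. a minimum depth) is omitted, and all names are hypothetical.

```python
from typing import List

Image = List[List[float]]


def y_relief(img: Image) -> List[float]:
    """m_j: mean greylevel of each row (the y-axis projection)."""
    return [sum(row) / len(row) for row in img]


def smooth(signal: List[float], radius: int = 1) -> List[float]:
    """Moving average over 2*radius+1 samples with clamped ends (assumed smoother)."""
    n = len(signal)
    return [sum(signal[min(max(k + d, 0), n - 1)] for d in range(-radius, radius + 1))
            / (2 * radius + 1) for k in range(n)]


def local_minima(signal: List[float]) -> List[int]:
    """Indices where the gradient changes sign from negative to positive."""
    return [k for k in range(1, len(signal) - 1)
            if signal[k] - signal[k - 1] < 0 and signal[k + 1] - signal[k] > 0]


def x_relief(img: Image, row: int) -> List[float]:
    """Column-wise mean of the three rows centred on `row` (clamped at the borders)."""
    rows = [img[min(max(row + d, 0), len(img) - 1)] for d in (-1, 0, 1)]
    return [sum(col) / 3.0 for col in zip(*rows)]
```

A typical use is `for j in local_minima(smooth(y_relief(face))): xr = smooth(x_relief(face, j))`, after which the minima and maxima of each `xr` become the feature candidates.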
Similarly, after smoothing the x reliefs, we obtain their local minima and maxima. Beginning with the uppermost minimum of the y relief, we search through the lists of minima and maxima of the x reliefs to find facial feature candidates. For the eyes, we look for two minima that satisfy the requirements for eyes, taking into account their relative position inside the facial contour, the significance of the maximum between the minima, the ratio of the distance between the minima to the head width, and the similarity of their greylevel values. For the mouth, we search for two maxima that form the borders of the mouth region. This yields a set of facial feature candidates. Next, the candidates are clustered according to their left and right x coordinates using the unsupervised min-max clustering algorithm, which significantly decreases the number of candidates and yields representative candidates for the facial features. All possible face constellations are then built and assessed according to the vertical symmetry of the constellation, the distances between facial features, and the assessment of each individual feature. In this way, the facial feature constellations are ranked and the best one is selected. The key features used are shown in Fig. 2. Note that only the chin is actually needed to outline the facial contour; the eyes and mouth are not of direct concern. However, the chin is located less accurately than the other features, so the eyes and mouth must be located to verify the chin location.
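The text does not specify its min-max clustering. A minimal sketch, assuming the classic maximin-distance heuristic on (left x, right x) pairs: start from one candidate, repeatedly promote the candidate farthest from its nearest centre, and stop once that distance falls below a fraction `theta` of the distance between the first two centres. `theta` and the stopping rule are hypothetical; other maximin variants use the mean inter-centre distance instead.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (left x, right x) of one feature candidate


def maximin_cluster(points: Sequence[Point],
                    theta: float = 0.5) -> Tuple[List[Point], List[int]]:
    """Maximin-distance clustering; returns (centres, label of each point)."""
    if not points:
        return [], []
    centres = [points[0]]          # start from an arbitrary candidate
    reference = 0.0                # distance between the first two centres
    while True:
        # Distance from every candidate to its nearest existing centre.
        nearest = [min(math.dist(p, c) for c in centres) for p in points]
        k = max(range(len(points)), key=nearest.__getitem__)
        if len(centres) == 1:
            if nearest[k] == 0:
                break              # all candidates coincide: one cluster
            reference = nearest[k]
        elif nearest[k] <= theta * reference:
            break                  # farthest candidate is close enough: stop
        centres.append(points[k])
    # Assign each candidate to its nearest centre; the centres act as representatives.
    labels = [min(range(len(centres)), key=lambda c: math.dist(p, centres[c]))
              for p in points]
    return centres, labels
```

The method needs no preset number of clusters, which suits this step because the number of distinct feature candidates varies from image to image.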
After the eyes and mouth are located, the vertical distance between them is calculated and denoted DIS. From this distance we can estimate the region where the tip of the chin should lie: the distance between the mouth and the tip of the chin is about 0.3*DIS to 0.65*DIS. After the tip of the chin is found, the contour of the mandible is traced from the chin along the edge image obtained with the Sobel operator. Sometimes the mandible contour and the hair contour do not form a closed curve. In most of these cases this is caused by the bare neck, which is almost monochrome. We therefore simply start from the end of the mandible curve, find the nearest (outer) hair curve and link the two roughly.
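The chin search can be sketched numerically. This is a minimal illustration, assuming image rows grow downwards, that the chin tip is taken as the row with the strongest vertical Sobel response along one column inside the 0.3*DIS to 0.65*DIS window, and that borders are clamped. The selection criterion and the names are assumptions; the text only states that the edge image comes from the Sobel operator.

```python
from typing import List, Tuple

Image = List[List[float]]


def chin_search_window(eye_y: float, mouth_y: float) -> Tuple[int, int]:
    """Rows where the chin tip is expected: 0.3*DIS to 0.65*DIS below the mouth."""
    dis = mouth_y - eye_y  # vertical eye-to-mouth distance DIS
    return round(mouth_y + 0.3 * dis), round(mouth_y + 0.65 * dis)


def sobel_y(img: Image, r: int, c: int) -> float:
    """Vertical Sobel response at (r, c) with clamped borders."""
    rows, cols = len(img), len(img[0])

    def px(u: int, v: int) -> float:
        return img[min(max(u, 0), rows - 1)][min(max(v, 0), cols - 1)]

    below = px(r + 1, c - 1) + 2 * px(r + 1, c) + px(r + 1, c + 1)
    above = px(r - 1, c - 1) + 2 * px(r - 1, c) + px(r - 1, c + 1)
    return below - above


def find_chin_row(img: Image, col: int, window: Tuple[int, int]) -> int:
    """Row inside the window with the strongest horizontal edge along `col` (assumed rule)."""
    lo, hi = window
    return max(range(lo, hi + 1), key=lambda r: abs(sobel_y(img, r, col)))
```

For example, with the eyes at row 40 and the mouth at row 80, DIS = 40 and the chin tip is searched for between rows 92 and 106.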
Finally, the body region is obtained by subtracting the facial region from the foreground. The body region can be compressed at a lower quality than the facial region, since it carries very little information for person identification.

Fig. 2. Features used in facial contour detection
B. Quality Grade Image
JPEG2000 supports ROI coding through two methods: the general scaling-based method (GSBM) and the maximum shift (MAXSHIFT) method. GSBM needs to encode and transmit the shape information of the ROI, which increases both the computational complexity and the bitrate, and in practice only rectangular and elliptical ROI shapes are used with it. MAXSHIFT scales up the coefficients associated with the ROI so that they lie above all background coefficients. At the decoder, ROI and background coefficients are distinguished by their magnitude, so there is no need to transmit the shape information of the ROI explicitly. We therefore use MAXSHIFT to give the facial region the highest priority.
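The MAXSHIFT principle can be shown on a flat list of quantized integer coefficients. This is a minimal arithmetic sketch, not a codec: a real encoder applies the shift to code-block bitplanes and signals s in the codestream. The encoder picks the smallest s with 2**s greater than every background magnitude; the decoder then treats any coefficient with magnitude of at least 2**s as ROI, so no shape information is needed.

```python
from typing import List, Sequence, Tuple


def maxshift_encode(coeffs: Sequence[int], roi: Sequence[bool]) -> Tuple[List[int], int]:
    """Scale ROI coefficients by 2**s so that they exceed every background magnitude."""
    bg_max = max((abs(c) for c, r in zip(coeffs, roi) if not r), default=0)
    s = bg_max.bit_length()  # smallest s with 2**s > bg_max
    return [c << s if r else c for c, r in zip(coeffs, roi)], s


def maxshift_decode(shifted: Sequence[int], s: int) -> Tuple[List[int], List[bool]]:
    """Recover coefficients and the ROI mask from magnitudes alone (no shape info)."""
    threshold = 1 << s
    values, roi = [], []
    for c in shifted:
        is_roi = abs(c) >= threshold
        # ROI values are exact multiples of 2**s, so the arithmetic shift is exact.
        values.append(c >> s if is_roi else c)
        roi.append(is_roi)
    return values, roi
```

With coefficients [3, -5, 2, 0, -7] and ROI mask [True, False, True, True, False], the largest background magnitude is 7, so s = 3 and the ROI values become 24 and 16. Decoding restores all values exactly; the zero ROI coefficient is reported as background, which does not change its value.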
Data in the body region are not up-scaled. Information about the uniform background concentrates in the LL subband and increases the bitrate. Data in the body region contain more information than those in the background, so the body region should be coded before the background. However, the background is sometimes encoded earlier than the body region because of its bright color. This is inconsistent with our intent, which is to encode the body region before the background. We therefore set all component values in the background to zero. The foreground cannot be confused with the zeroed background, because the luminance in the foreground is always non-zero.
The codes in the range 0xFFA0 to 0xFFAF are reserved for user applications in the JPEG2000 standard bitstream [14]. This range provides a convenient extension of the JPEG2000 bitstream for carrying the uniform background information. A color image usually has three components; suppose each has a bit depth of 8. Each component value is right-shifted by four bits to obtain a single hexadecimal digit X_i (i = 1, 2, 3), and adding 0xFFA0 to X_i gives a 16-bit code in the reserved range. Thus, for a color image, only three 16-bit values are needed to code the background region. Sending the background information is optional. For a large image, 48 extra bits may be negligible in the total bitrate, but for a small image they may be a significant cost. The attached side information can be extracted and multiplied by a factor that depends on the bit depth of the data (2^4 = 16 for 8-bit components shifted by four bits) to retrieve the background roughly.
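The side-information arithmetic is small enough to check directly. A minimal sketch, assuming the four-bit shift generalizes to other depths as `bit_depth - 4` (the text only fixes the 8-bit case); the function names are hypothetical.

```python
from typing import List, Sequence

USER_BASE = 0xFFA0  # first code of the 0xFFA0..0xFFAF range used in the text


def encode_background(components: Sequence[int], bit_depth: int = 8) -> List[int]:
    """Keep the 4 most significant bits of each component and offset into 0xFFA0..0xFFAF."""
    shift = bit_depth - 4  # 4 for 8-bit data, as in the text
    return [USER_BASE + (v >> shift) for v in components]


def decode_background(codes: Sequence[int], bit_depth: int = 8) -> List[int]:
    """Approximate background: strip the offset and multiply by 2**(bit_depth - 4)."""
    factor = 1 << (bit_depth - 4)  # 16 for 8-bit components
    values = []
    for code in codes:
        if not USER_BASE <= code <= USER_BASE + 0xF:
            raise ValueError(f"0x{code:04X} is outside 0xFFA0..0xFFAF")
        values.append((code - USER_BASE) * factor)
    return values
```

A background of (Y, U, V) = (235, 128, 16) becomes the codes 0xFFAE, 0xFFA8 and 0xFFA1 (48 bits in total) and is recovered as (224, 128, 16): exact for multiples of 16, otherwise within 15 levels of the original.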
C. Algorithm Description
The steps of the proposed algorithm are as follows.
1) Make the background uniform. Record the value of each component and denote them Y_b, U_b and V_b respectively.
2) Segment the foreground and the background.
3) Segment the facial region and the body region. Determine the hair color and outline the hair. Use a color model to detect the skin region roughly, and then locate the eyes and mouth. Find the chin according to the positions of the irises and the mouth. Outline the facial contour. Subtract the facial region to get the body region.
4) Set all component values in the background to zero.
5) Compress the image using JPEG2000 coding with MAXSHIFT.
6) (Optional) Edit the side information about the background. Right-shift Y_b, U_b and V_b by four bits to get the four-bit values Y_b', U_b' and V_b'. Add 0xFFA0 to Y_b', U_b' and V_b' respectively, and attach the results to the end of the bitstream.

Fig. 3. Compression result at 0.0625 bpp. The image on the right is compressed by the proposed algorithm. The image on the left is coded by [11].
The produced bitstream is fully compatible with the JPEG2000 standard, and any JPEG2000 codec can display the foreground correctly. Even if the receiver does not know what the reserved data mean, the understanding of the key information is not affected. Since the background is a single color, any image-editing software can change it freely.
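Step 6 appends the three codes after the JPEG2000 bitstream. A minimal sketch of packing them as big-endian 16-bit values and splitting them off again, assuming the codes are always the last 6 bytes of the file; the helper names are hypothetical. The test uses a dummy byte string that only starts with the SOC marker (0xFF4F) and ends with the EOC marker (0xFFD9), not a real codestream.

```python
import struct
from typing import List, Optional, Tuple

SIDE_INFO = struct.Struct(">3H")  # three big-endian 16-bit codes = 6 bytes


def attach_side_info(bitstream: bytes, codes: List[int]) -> bytes:
    """Append the three background codes after the end of the JPEG2000 bitstream."""
    return bitstream + SIDE_INFO.pack(*codes)


def split_side_info(data: bytes) -> Tuple[bytes, Optional[List[int]]]:
    """Return (bitstream, codes); codes is None when no side information is present."""
    if len(data) >= SIDE_INFO.size:
        codes = list(SIDE_INFO.unpack(data[-SIDE_INFO.size:]))
        if all(0xFFA0 <= c <= 0xFFAF for c in codes):
            return data[:-SIDE_INFO.size], codes
    return data, None
```

Because a codestream ends with 0xFFD9, its last 6 bytes can never all fall in 0xFFA0 to 0xFFAF, so a file without side information is returned unchanged.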

III. SIMULATIONS
The JPEG2000 encoder Kakadu, which supports MAXSHIFT, is used to compress the images, with the reversible wavelet transform used for decomposition. For comparison, the facial images are also compressed by the algorithm in [11], in which the region of interest is up-shifted by one bitplane. Figs. 3 to 5 illustrate the results at high compression ratios: Fig. 3 at 0.0625 bpp (bits per pixel) and Figs. 4 and 5 at 0.05 bpp. The results obtained with our method show better performance in facial understanding.
IV. CONCLUSION
A frontal facial image compression method has been presented and its performance explored in this paper. The method first locates the background. Object segmentation is then used to divide the foreground into the facial region and the body region, and the facial region is set as the ROI in the subsequent JPEG2000 compression. Moreover, the uniform background can be set to zero before the wavelet transform to reduce the bitrate that the background occupies in the LL subband. The background information can be sent as user-reserved data or omitted. In our experiments, the proposed approach performs better than the method in [11], and the produced bitstream is fully compatible with the JPEG2000 standard.
REFERENCES
[1] M. Sakalli, H. Yan, and A. Fu, "A region-based scheme using RKLT and predictive classified vector quantization," Comput. Vis. Image Understand., vol. 75, no. 3, pp. 269-280, 1999.
[2] O. N. Gerek and C. Hatice, "Segmentation based coding of human face images for retrieval," Signal Process., vol. 84, no. 6, pp. 1041-1047, 2004.
[3] B. Moghaddam and A. Pentland, "An automatic system for model-based coding of faces," Proc. Conf. Data Compression 1995 (DCC 1995), pp. 362-365, Mar. 1995.
[4] A. J. Ferreira and M. A. T. Figueiredo, "On the use of independent component analysis for image compression," Signal Process.: Image Commun., vol. 21, no. 5, pp. 378-389, 2006.
[5] K. Inoue and K. Urahama, "DSVD: A tensor-based image compression and recognition method," Proc. IEEE Int. Symp. Circuits and Systems, pp. 2326, May 2005.
[6] T. Hazan, S. Polak, and A. Shashua, "Sparse image coding using a 3D non-negative tensor factorization," Proc. 10th IEEE Int. Conf. Computer Vision, pp. 1721, Oct. 2005.
[7] M. Elad, R. Goldenberg, and R. Kimmel, "Low bit-rate compression of facial images," IEEE Trans. Image Process., vol. 16, no. 9, pp. 2379-2383, 2007.

Fig. 4. Compression result at 0.05 bpp. The image on the right is compressed by the proposed algorithm. The image on the left is coded by [11].

[8] O. Bryt and M. Elad, "Compression of facial images using the K-SVD algorithm," Journal of Visual Communication and Image Representation, vol. 19, no. 4, pp. 270-282, May 2008.
[9] E. Barzykina, P. Nasiopoulos, and R. K. Ward, "Image compression for facial photographs based on wavelet transform," Proc. IEEE Pacific Rim Conf. on Communications, Computers and Signal Processing, vol. 1, pp. 322-325, Aug. 1997.
[10] L. Zhu, G. Y. Wang, and C. Wang, "Formal photograph compression algorithm based on object segmentation," International Journal of Automation and Computing, vol. 5, no. 3, pp. 276-283, 2008.
[11] Q. Zhu and S. Wang, "Color personal ID photo compression based on object segmentation," Proc. Pacific Rim Conf. on Communications, Computers and Signal Processing, pp. 2426, Aug. 2005.
[12] S. Battiato, A. Buemi, G. Impoco, and M. Mancuso, "JPEG2000 coded image optimization using a content-dependent approach," IEEE Trans. Consumer Electronics, vol. 48, no. 3, pp. 400-408, Aug. 2002.
[13] K. Sobottka and I. Pitas, "Segmentation and tracking of faces in color images," Proc. 2nd Int. Conf. on Automatic Face and Gesture Recognition, pp. 236-241, 1996.
[14] P. Chung, C. Wu, and Y. Huang, "A JPEG 2000 error resilience method using uneven block-sized information included markers," IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 3, pp. 420-424, Mar. 2005.

Fig. 5. Performance at 0.05 bpp. The image on the left is compressed by the proposed algorithm. The image on the right is coded by [11].
