Vous êtes sur la page 1sur 6

ComparisonofBrotli,Deflate,Zopfli,LZMA,LZHAM

andBzip2CompressionAlgorithms
JyrkiAlakuijala,EvgeniiKliuchnikov,ZoltanSzabadka,andLodeVandevenne
Google,Inc.


Abstract This paper compares six compression techniques, and based on theresults
proposes that brotli could be used as a replacement of the common deflate algorithm.
We compared the performance of brotli bymeasuringthecompressionratioandspeed,
as wellasdecompressionspeedonthreedifferentcorpora:theCanterburycompression
corpus, an ad hoc crawled web content corpus, and enwik8. On all three corpora we
show performance superior to that of deflate. Further, we show that Zopfli, LZMA,
LZHAM and bzip2 use significantly more CPU time for either compression or
decompressionandcouldnotalwaysworkasdirectreplacementsofdeflate.

Introduction
Muchofthepracticallosslessdatacompressionisdonewiththedeflatealgorithm,notonly
becauseitiswellsupportedbyexistingsystems,butalsobecauseitisrelativelysimpleandfast
toencodeanddecode.In2013welaunchedZopfli [1]
,acompressionalgorithmthatallowsfor
densercompressionwhileremainingcompatiblewiththedeflateformat.WhileZopfliisnow
wellacceptedinthefield,therewereopinionsexpressedthatweshouldmoveonfromthe
deflatefileformattoamodernsolution.Brotli
[2]
isourattemptatbuildingacompressionformat
andanexampleimplementationofthisformatthatisfundamentallymoreefficientthandeflate.
Inthispaperwemeasuretheperformanceofourimplementationandcompareitwithdeflate
andafewothercompressionalgorithms.

Methods
Thetestswererunwitha22bitwindowsizeforbrotli,LZMAandLZHAM,anda15bitwindow
sizefordeflateandzopfli.Weuseda22bitwindowsizebecausepastexperienceshowedthat
largerwindowscanbeslowertodecode.Largerwindowsizestendtogiveahigher
compressionratioattheexpenseofdecodingspeed.Fordeflateandzopfliweusedthe
maximumsizeallowedbytheformat.Theversionsofthealgorithmstestedare:
brotliversion0.2.0
[2]
,
deflatealgorithmfromzlib1.2.8 [3]
,
Zopfliversionfromgithub20150901 [4],
LZMAimplementationin7zip9.20.1 [5]
,
LZHAM1.0stable1 [6],and
bzip21.0.6,6Sept2010 [7]
.

ThetestcomputerweusedisanIntelXeonCPUE51650v2runningat3.5GHzwithsix
coresandsixadditionalhyperthreadingcontexts.Werunlinux3.13.0.Allcodecswere
compiledusingthesamecompiler,GCC4.8.4atO2leveloptimization.Alltestswererun
singlethreadedonanotherwiseidlecomputer.

ThecompressioncorporaweusedinthetestingaretheCanterburycompressioncorpus [8]
,an
adhoccrawledwebcontentcorpus,1285files,70611753bytestotal,andenwik8,asinglefile
corpusthatisusedintheHutterprize[9]
.Theaveragefilesizeonthewebcontentcorpusis
only55kB,sothelargerwindowsizeadvantageofadvancedalgorithmsoverdeflatemostly
disappearsthere.

Wemeasuredthecompressionratio,compressionspeedanddecompressionspeedfor
selectedalgorithmsandcompressionlevels.Thecompressionanddecompressionspeedof
eachalgorithmweremeasuredwiththesamebenchmarkprogramthatcalledthecompression
anddecompressionroutinesofeachalgorithmfromstaticallylinkedlibraries.

Welimitedtheselectionofalgorithmstothosethatgenerallyhaveahighercompressionratio
thanthatofdeflate.Forthisreasonweexcludedalgorithmslikelz4andzstdfromthisstudy.

Unlikeotheralgorithmscomparedhere,brotliincludesastaticdictionary.Itcontains13504
wordsorsyllablesofEnglish,Spanish,Chinese,Hindi,RussianandArabic,aswellascommon
phrasesusedinmachinereadablelanguages,particularlyHTMLandJavaScript.Thetotalsize
ofthestaticdictionaryis122784bytes.Thestaticdictionaryisextendedbyamechanismof
transformsthatslightlychangethewordsinthedictionary.Atotalof1633984sequences,
althoughnotallofthemunique,canbeconstructedbyusingthe121transforms.Toreducethe
amountofbiasthestaticdictionarygivestotheresults,weusedamultilingualwebcorpusof93
differentlanguageswhereonly122ofthe1285documents(9.5%)areinlanguagessupported
byourstaticdictionary.

Inaveragingovertheresultsofindividualfilesandoverthecorporawechosetousegeometric
meaninsteadofthemorecommonarithmeticmean.Thegeometricmeangivesabitmore
weightforpoorperformance,i.e.,ifaparticularalgorithmcompressesonefiletypeextremely
fastordensely,itwillnotbepropagatedintotheresultsasstronglyaswithanarithmeticmean.
Results
Tables1,2and3showtheresultsofthreedifferentcorpora.Figure1showsthedecompression
speedvs.compressionratioonCanterburycorpus,showinggraphicallysomeoftheresults
fromTable1.Wecanseefromthetablesthatbrotliatqualitysetting1(shorthandnotationin
thisdocumentbrotli:1)compressesanddecompressesroughlythesamespeedasdeflate:1,but
offers1216%highercompressionratio.Brotli:9isagainroughlysimilarwithdeflate:9onthe
Canterburyandwebcontentcorpora,butgivesaspeedincreaseof28%indecodingofenwik8,
andacompressionratioincreaseof13 1%.Brotli:11issignificantlyfasterincompression
2
thanzopfliandgives2026%highercompressionratio.

Brotligivesslightlyfasterdecompressionthandeflateforthetestedcorpora,whileother
advancedalgorithms(LZMA,LZHAMandbzip2)areslowerthandeflate.Thegeometricmeans
forallreporteddecompressionspeedsinthetableare342.2MB/sforbrotliand323.6MB/sfor
deflate,a5.7%advantageforbrotli.

Incompressionbrotli:1issimilarly5.7%fasterthandeflate:1,butbrotli:9happenstobe32.3%
slowerthandeflate:9.However,oneshouldnotcomparecompressionspeedsimplybythe
qualitysetting.Amoreusefulcomparisonistoconsidercompressionspeedforanaimed
compressionratio.Oftenbrotli:1isclosetodeflate:9,andsometimesevenexceedingits
compressionratio.Forexample,whencompressingtheCanterburycorpusdownto3.3ratio
onecoulduseeitherbrotli:1at98.3MB/sordeflate:9at15.5MB/s.

LZMAcancompressenwik8witha2.5%highercompressionratio,butthatcomeswitha
penaltyof3.5timeslongerdecompressiontime.LZHAM:4issomewhatsimilarinperformance
withbrotli:11:1%highercompressionratioatacostof25%slowerdecompressionspeed.On
shorterfiles,likeCanterburycorpusandwebdocuments,brotliscompressionratiosare
unmatchedbyLZMAandLZHAM.

Bzip2wasabletocompresssometextfilesratherwell,butintheoverallresultsitfallsbehind.


Table1.ThistableshowstheresultsofcompressionalgorithmsontheCanterburycorpus.The
Canterburycorpuscontains11files,andweshowthegeometricmeanforthemeasured
attributes:compressionratio,compressionspeedanddecompressionspeed.
Algorithm: Compression Compressionspeed Decompression
qualitysetting ratio [MB/s] speed[MB/s]
brotli:1 3.381 98.3 334.0
brotli:9 3.965 17.0 354.5
brotli:11 4.347 0.5 289.5
deflate:1 2.913 93.5 323.0
deflate:9 3.371 15.5 347.3
zopfli 3.580 0.2 342.1
lzma:1 3.847 10.2 70.0
lzma:9 4.240 3.9 71.7
lzham:1 3.836 3.9 116.0
lzham:4 3.952 0.5 117.7
bzip2:1 3.757 11.8 40.4
bzip2:9 3.869 12.0 40.2


Figure 1. The decompression speed vs. compression ratio of the Canterbury corpus as a
scatter plot. For the decompression speed vs. compression ratio, brotli:9 and brotli:11 form the
paretooptimalfront.

Table2.Resultsofthecompressionalgorithmsonasampleofdocumentscrawledfromthe
Internet.Thesampleconsistsof1285HTMLdocuments,with93differentlanguages.
Algorithm: Compression Compressionspeed Decompression
qualitysetting ratio [MB/s] speed[MB/s]
brotli:1 5.217 145.2 508.4
brotli:9 6.253 30.1 508.7
brotli:11 6.938 0.6 441.8
deflate:1 4.666 146.9 434.8
deflate:9 5.528 32.9 484.1
zopfli 5.770 0.2 460.1
lzma:1 5.825 7.9 100.5
lzma:9 6.231 4.4 102.2
lzham:1 5.580 4.7 168.7
lzham:4 5.768 0.2 172.7
bzip2:1 5.710 11.0 52.3
bzip2:9 5.867 11.1 52.3

Table3.
Resultsofdifferentcompressionalgorithmsontheenwik8file.
Algorithm: Compression Compressionspeed Decompression
qualitysetting ratio [MB/s] speed[MB/s]
brotli:1 2.711 78.3 228.6
brotli:9 3.308 5.6 279.4
brotli:11 3.607 0.4 257.4
deflate:1 2.364 70.8 211.7
deflate:9 2.742 18.1 217.4
zopfli 2.857 0.6 227.7
lzma:1 3.106 9.8 60.6
lzma:9 3.696 3.44 71.8
lzham:1 3.335 2.4 177.9
lzham:4 3.643 0.4 192.2
bzip2:1 3.007 12.3 30.8
bzip2:9 3.447 12.4 30.3
DiscussionandConclusions
Thesecomparisonsaredonewithafixedwidth(22bits)backwardreferencewindow.Other
algorithmscouldpossiblybenefitfromdifferentwidths.LZMAandLZHAMarecommonly
appliedwithalargerwindow.Inthisstudywearelookingforareplacementcandidatealgorithm
forthedeflatealgorithm,andalargerwindowcouldslowdownencodinganddecodingaswell
asusemorememoryduringdecoding.Applyingthesamebackwardreferencewindowsizein
LZMA,LZHAMandbrotliremovesonecomplicationfromthecomparison.

Brotliusesastaticdictionarythatcanbehelpfulforcompressingshortfiles.Otheralgorithms
couldbeeasilymodifiedtodothesame,andtheywouldobtainslightlybettercompression
ratios.Foralongfilelikeenwik8astaticdictionaryisnotveryhelpful.Canterburycorpus
containsshortdocumentswithEnglish,andtherebrotlisstaticdictionarymightbegivingitan
unfairadvantage.

Ourresultsindicatethatbrotli,andonlybrotlioutofallthebenchmarkedalgorithms,wouldbea
goodreplacementforthecommonusecasesofthedeflatealgorithminallthreeaspects,
compressionratio,compressionspeed,anddecompressionspeed.

References
1. https://zopfli.googlecode.com/files/Data_compression_using_Zopfli.pdf
2. https://github.com/google/brotli/releases/tag/v0.2.0
3. http://www.zlib.net/
4. https://github.com/google/zopfli/commit/89cf773beef75d7f4d6d378debdf299378c3314e
5. http://www.7zip.org/history.txt
6. https://github.com/richgel999/lzham_codec/releases/tag/v1_0_stable1
7. http://www.bzip.org/
8. http://corpus.canterbury.ac.nz/
9. http://prize.hutter1.net/

Vous aimerez peut-être aussi