What is the most compressed file ever, in percent decrease? Post the original size and the
output size. Also include the algorithm/process used.
25 Answers
It is, however, worth pointing out that there is no theoretical upper bound on how much a
given piece of data can be compressed by a suitably chosen algorithm. A pathological
example can easily achieve an arbitrary amount of compression.
To get the compressed version of a file, you store the index of the first digit of your data
within the digits of pi, together with the length of your data, as the compressed value. This
can compress your data to an insanely small percentage of its original size. It also takes an
unbelievable amount of time just to find that index, unless you get lucky.
The result can, however, also be larger than the original, since you might find the
subsequence you're looking for only after a very large number of digits. In fact it's quite
likely that you'll "compress" to an even larger value. This is something that the joke pages
rarely point out. In fact the pifs @philipl/pifs (Pi Filesystem) project claims 100%
compression, which is just mathematically incorrect.
However, pi compression definitely compresses some values to much smaller sizes. For any
given percentage of compression you'd like, pi compression can beat it for some example file.
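As a toy illustration (not any real project's code), here is roughly what the scheme looks
like in Python, assuming mpmath is available for generating digits of pi; pi_compress and
pi_decompress are hypothetical names:

    # Toy "pi compression": store (offset, length) of the data's decimal
    # digits inside pi. Each byte is written as three decimal digits.
    from mpmath import mp

    MAX_DIGITS = 100_000  # how many digits of pi to search

    def pi_digits(n=MAX_DIGITS):
        mp.dps = n + 2
        return mp.nstr(mp.pi, n).replace(".", "")

    def pi_compress(data):
        digits = pi_digits()
        needle = "".join("%03d" % b for b in data)
        offset = digits.find(needle)
        return None if offset == -1 else (offset, len(data))

    def pi_decompress(offset, length):
        digits = pi_digits()
        run = digits[offset:offset + 3 * length]
        return bytes(int(run[i:i + 3]) for i in range(0, len(run), 3))

Even a 3-byte input needs a particular 9-digit run, which on average first appears around
the billionth digit, so the stored (offset, length) pair is usually larger than the data
itself, exactly as described above.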
So how much can I compress a well-chosen file? Just take whatever the best answer has
been so far and add 0.1%.
It can compress a 786 KB BMP down to an arbitrarily small size, dependent on system
specifications.
The disk usage of Lena compressed using the LenPEG 3 algorithm will be exactly 0%. Since this
is less than the disk usage of a machine that doesn't contain any copy of Lena at all (even
one compressed using any other compression algorithm), it is literally impossible to top its
compression ratio. In fact, this means that the compression ratio is an arbitrarily large
negative number! Furthermore, since one need not demonstrate an image compression algorithm
on any image except Lena, this is all the information you need to realize that it is truly
the perfect compression algorithm.
I mean, how many other answers to this question have demonstrated a negative
compression ratio? I'll tell you how many. None. LenPEG 3 will not be beat.
(P.S. If you want to see what a LenPEG 3 compressed Lena looks like, it's very easy to find
out. All new hard disks, thumb drives, and SD cards are sold with a copy these days. That's
how impressed the industry is with this amazing feat!)
Edit: Everyone go upvote Ian Kelly's comment proposing a new algorithm for LenPEG 4. It's
a great idea.
Suppose you want to compress the BluRay of the movie “The Wizard of Oz” - about 4 GBytes
- into one bit. Here’s the pseudocode for the compressor:

    if (file == WizardOfOz)
        { output 1; }
    else
        { output 0; output file; }

There you go - if the file is The Wizard of Oz, it outputs one bit (a 1). The decompressor
reverses this:

    if (first_bit == 1)
        output WizardOfOz;
    else
        output rest_of_input;
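In runnable form, the joke looks like this in Python (OZ_BYTES is a stand-in for the actual
4 GB of movie data, and a whole marker byte is emitted instead of a single bit for
simplicity):

    # One-bit (well, one-byte) "compressor" for a single known file.
    OZ_BYTES = b"...stand-in for the 4 GB BluRay contents..."

    def compress(data):
        return b"\x01" if data == OZ_BYTES else b"\x00" + data

    def decompress(blob):
        return OZ_BYTES if blob[:1] == b"\x01" else blob[1:]

    assert decompress(compress(OZ_BYTES)) == OZ_BYTES
    assert decompress(compress(b"anything else")) == b"anything else"

The catch, of course, is that every file other than The Wizard of Oz grows by one marker.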
Here's a quick test that creates bzip2 archives of files consisting only of zeros, for sizes
of 1, 10, 100, 1000 and 10000 MB. It ran in about 3 minutes on my box.
Size (bytes)  File
45            testcomp1M.bz2
49            testcomp10M.bz2
113           testcomp100M.bz2
753           testcomp1000M.bz2
7346          testcomp10000M.bz2
The relative size keeps shrinking as the input grows, so you could probably achieve an
arbitrarily high compression ratio with really huge files full of zeros.
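The experiment is easy to reproduce; here is a sketch using Python's built-in bz2 module
rather than the command-line tool, with sizes capped to keep memory use sane:

    # Compress runs of zero bytes with bzip2 and print the output sizes.
    import bz2

    for mb in (1, 10, 100):
        data = bytes(mb * 1024 * 1024)   # mb megabytes of zero bytes
        print(mb, "MB ->", len(bz2.compress(data)), "bytes")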
Kelly Kinkade, CS major for a few years way back. Programmer for far longer.
Answered 203w ago
I got in a small bit of trouble once by creating a zip bomb in a public directory on one of the
UNIX machines where I went to college:

    dd if=/dev/zero bs=65536 count=65536 | compress > interesting-file.Z

The resulting file was fairly small,
but if uncompressed would expand to 4 gigabytes of zeros. Predictably, some junior system
admin, trolling about, saw it, copied it to his home directory, and uncompressed it to look at
it... and almost immediately filled the file system. He got in trouble for doing so, and tried
to blame me for his foolishness.
Cedric Mamo
Answered 202w ago
All compression is based on statistics in some way or another: more frequent patterns are
represented with fewer symbols, and less frequent patterns with more symbols.
If I have the string "AAAAAAABBC", there are 7 As, 2 Bs and 1 C. So I can choose to represent
A as "0", B as "10" and C as "11". You always use the probability of a symbol (or a sequence)
appearing. There are lots of different methods, but almost all compression is based on
this concept.
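A quick sanity check of those hand-picked codes in Python (just the fixed code table from
the example, not a full Huffman construction):

    # Encode "AAAAAAABBC" with the variable-length codes chosen above.
    codes = {"A": "0", "B": "10", "C": "11"}
    s = "AAAAAAABBC"
    encoded = "".join(codes[c] for c in s)
    print(encoded)                                  # 0000000101011
    print(len(encoded), "bits vs", 8 * len(s), "bits of plain ASCII")

13 bits instead of 80: the frequent symbol got the short code, which is the whole idea.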
So if I had the string "AAAAAAAAAAAAAAAAAA", then at every step I know the next symbol will
be an A with 100% certainty (the string has no other letters), so in theory it can be
compressed to almost nothing. In practice, you'd just have to store the letter itself (and
the length).
So basically, asking what is the most compressed file ever is a bit pointless, because as I
said, it depends on the data just as much as it does on the algorithm itself.
So again, depending on the data, you can get compression ratios to be as large as you want.
You just put a long string of repeated characters in there and any algorithm will produce the
absolute best compression ratio it can that way. And if the compression ratio doesn't seem
high enough, you just make the string longer: it will still compress to nearly the same size,
because to the algorithm there is still complete certainty about what the next character
will be.
JPEG compression allows you to compress images quite a lot because it doesn't matter too
much if the red in your image is 0xFF or 0xFE (usually).
The PAQ compression algorithms achieve some of the best compression ratios known. However,
they are also among the slowest, taking well over 10 hours to compress 1 GB on even the best
CPUs.
All PAQ compressors use a context mixing algorithm, described on Wikipedia. A large
number of models independently predict the next bit of input. The predictions are combined
using a neural network, and the result is arithmetic coded. There are specialized models for
text, binary data, x86 executable code, and BMP, TIFF, and JPEG images (except in the paq8hp*
series, which is tuned for English text only).
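To give a feel for the mixing step (a minimal sketch of logistic mixing with hypothetical
names, not PAQ's actual code): each model outputs a probability that the next bit is 1, and
a one-layer mixer learns how much to trust each model:

    # Logistic mixing: combine several bit predictions in the stretched
    # (log-odds) domain with learned weights.
    import math

    def stretch(p):
        return math.log(p / (1.0 - p))

    def squash(x):
        return 1.0 / (1.0 + math.exp(-x))

    class Mixer:
        def __init__(self, n_models, lr=0.02):
            self.w = [0.0] * n_models
            self.lr = lr

        def predict(self, probs):
            self.x = [stretch(p) for p in probs]
            self.p = squash(sum(w * x for w, x in zip(self.w, self.x)))
            return self.p

        def update(self, bit):
            # move weights toward whichever models were right
            err = bit - self.p
            self.w = [w + self.lr * err * x for w, x in zip(self.w, self.x)]

The mixed probability would then drive an arithmetic coder, like the one sketched later in
this document.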
Best Compression Software:
PeaZip (Giorgio Tani) is a GUI front end for Windows and Linux that supports the paq8o,
lpaq1, and many other compression formats.
PeaZip is a free cross-platform file archiver and compressor that provides a unified portable
GUI for many open-source technologies like 7-Zip, FreeArc, PAQ, UPX...
Create 7Z, ARC, BZ2, GZ, *PAQ, PEA, QUAD/BALZ, TAR, UPX, WIM, XZ, ZIP files.
Open and extract over 180 archive types: ACE, ARJ, CAB, DMG, ISO, LHA, RAR, UDF,
ZIPX files and more...
Features of PeaZip include extracting, creating and converting multiple archives at once,
creating self-extracting archives, splitting/joining files, strong encryption with two-factor
authentication, an encrypted password manager, secure deletion, duplicate-file finding, hash
calculation, and exporting job definitions as scripts.
File Compression
File compression is used to reduce the file size of one or more files.
When a file or a group of files is compressed, the resulting "archive"
often takes up 50% to 90% less disk space than the original file(s).
Common types of file compression include Zip, Gzip, RAR, StuffIt, and
7z compression. Each one of these compression methods uses a
unique algorithm to compress the data.
So how does a file compression utility actually compress data? While
each compression algorithm is different, they all work in a similar
fashion. The goal is to remove redundant data in each file by replacing
common patterns with smaller variables. For example, words in a plain
text document might get replaced with numbers or another type of
short identifier. These identifiers then reference the original words that
are saved in a key within the compressed file. For instance, the word
"computer" may be replaced with the number 5, which takes up much
less space than the word "computer." The more times the word
"computer" is found in the text document, the more effective the
compression will be.
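A toy version of that word-substitution idea (hypothetical helper names; real compressors
work on bit patterns rather than words):

    # Replace each distinct word with a numeric token plus a key.
    def dict_compress(text):
        words = text.split()
        key = {w: str(i) for i, w in enumerate(dict.fromkeys(words))}
        return " ".join(key[w] for w in words), key

    def dict_decompress(tokens, key):
        inv = {v: w for w, v in key.items()}
        return " ".join(inv[t] for t in tokens.split())

    packed, key = dict_compress("the computer on the computer desk")
    print(packed)   # 0 1 2 0 1 3
    assert dict_decompress(packed, key) == "the computer on the computer desk"

The more often "computer" repeats, the more its short token pays for the key entry,
mirroring the point above.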
While file compression works well with text files, binary files can also
be compressed. By locating repeated binary patterns, a compression
algorithm can significantly reduce the size of binary files, such
as applications and disk images. However, once a file is compressed, it
must be decompressed in order to be used. Therefore, if
you download or receive a compressed file, you will need to use a file
decompression program, such as WinZip or StuffIt Expander, to
decompress the file before you can view the original contents.
Algorithm
An algorithm is a set of instructions designed to perform a specific
task. This can be a simple process, such as multiplying two numbers,
or a complex operation, such as playing a compressed video
file. Search engines use proprietary algorithms to display the most
relevant results from their search index for specific queries.
In computer programming, algorithms are often created as functions.
These functions serve as small programs that can be referenced by a
larger program. For example, an image viewing application may
include a library of functions that each use a custom algorithm to
render different image file formats. An image editing program may
contain algorithms designed to process image data. Examples of
image processing algorithms include cropping, resizing, sharpening,
blurring, red-eye reduction, and color enhancement.
In many cases, there are multiple ways to perform a specific operation
within a software program. Therefore, programmers usually seek to
create the most efficient algorithms possible. By using highly-efficient
algorithms, developers can ensure their programs run as fast as
possible and use minimal system resources. Of course, not all
algorithms are created perfectly the first time. Therefore, developers
often improve existing algorithms and include them in future software
updates. When you see a new version of a software program that has
been "optimized" or has "faster performance," it most likely means the new
version includes more efficient algorithms.
A New Algorithm for Data Compression Optimization
I Made Agus Dwi Suarjaya
Information Technology Department, Udayana University, Bali, Indonesia
Abstract— People tend to store a lot of files in their storage. When the storage nears its
limit, they try to reduce the size of those files by using data compression software. In this
paper we propose a new algorithm for data compression, called j-bit encoding (JBE). This
algorithm manipulates each bit of data inside a file to minimize the size without losing any
data after decoding, which classifies it as lossless compression. This basic algorithm is
intended to be combined with other data compression algorithms to optimize the compression
ratio. The performance of this algorithm is measured by comparing combinations of different
data compression algorithms.
Keywords- algorithms; data compression; j-bit encoding; JBE;
lossless.
I. INTRODUCTION
Data compression is a way to reduce storage cost by eliminating redundancies that occur in
most files. There are two types of compression, lossy and lossless. Lossy compression reduces
file size by eliminating some unneeded data that won't be noticed by humans after decoding;
it is often used for video and audio compression. Lossless compression, on the other hand,
manipulates each bit of data inside a file to minimize the size without losing any data after
decoding. This is important because if a file loses even a single bit after decoding, the
file is corrupted. Data compression can also be used as an in-network processing technique to
save energy, because reducing the amount of data reduces the amount transmitted and/or
decreases the transfer time [1]. There are some well-known data compression algorithms. In
this paper we will take a look at various data compression algorithms that can be used in
combination with our proposed algorithm. Those algorithms can be classified into
transformation and compression algorithms. A transformation algorithm does not compress data
but rearranges or changes data to optimize the input for the next transformation or
compression algorithm in the sequence. Most compression methods are both physical and
logical. They are physical because they look only at the bits in the input stream and ignore
the meaning of the contents of the input. Such a method translates one bit stream into
another, shorter one. The only way to understand and decode the output stream is by knowing
how it was encoded. They are logical because they look only at individual contents in the
source stream and replace common contents with short codes. A logical compression method is
useful and effective (achieves the best compression ratio) on certain types of data [2].

II. RELATED ALGORITHMS
A. Run-Length Encoding
Run-length encoding (RLE) is one of the basic techniques
for data compression. The idea behind this approach is this: if a data item d occurs n
consecutive times in the input stream, replace the n occurrences with the single pair nd [2].
RLE is mainly used to compress runs of the same byte [3]. This approach is useful when
repetition often occurs inside the data, which is why RLE is a good choice for compressing
bitmap images, especially low-bit-depth ones such as 8-bit bitmap images.
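A minimal sketch of byte-level RLE, matching the pair-of-(n, d) description above
(illustrative names, not the paper's code):

    # Run-length encoding: each run of n copies of byte d becomes (n, d).
    def rle_encode(data):
        out, i = [], 0
        while i < len(data):
            j = i
            while j < len(data) and data[j] == data[i]:
                j += 1
            out.append((j - i, data[i]))
            i = j
        return out

    def rle_decode(pairs):
        return b"".join(bytes([d]) * n for n, d in pairs)

    assert rle_decode(rle_encode(b"aaaabbc")) == b"aaaabbc"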
B. Burrows-Wheeler Transform
The Burrows-Wheeler transform (BWT)
works in block mode, while most others work in streaming mode. This algorithm is classified
as a transformation algorithm because the main idea is to rearrange (by adding and sorting)
and concentrate symbols. These concentrated symbols can then be used as input for another
algorithm to achieve good compression ratios. Since the BWT operates on data in memory, you
may encounter files too big to process in one fell swoop. In these cases, the file must be
split up and processed a block at a time [3]. To speed up the sorting process, it is possible
to sort in parallel or to use larger input blocks if more memory is available.
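A naive sketch of the transform on one in-memory block (quadratic in the block size; real
implementations use suffix sorting):

    # Burrows-Wheeler transform: sort all rotations of the block (with a
    # sentinel marking the end) and keep the last column.
    def bwt(block, sentinel=b"\x00"):
        s = block + sentinel
        rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
        return b"".join(r[-1:] for r in rotations)

    print(bwt(b"banana"))   # b"annb\x00aa": like symbols grouped together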
C. Move-to-Front Transform
The move-to-front transform (MTF) is another basic technique for data compression. MTF is a
transformation algorithm which does not compress data but can sometimes help to reduce
redundancy [5]. The main idea is to move to the front the symbols that occur most often, so
those symbols will have smaller output numbers. This technique is intended to be used as an
optimization for other algorithms like the Burrows-Wheeler transform.
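A sketch of MTF over bytes; recently seen symbols get small indices, so the runs produced by
a BWT turn into runs of small numbers:

    # Move-to-front: output each byte's current position in the table,
    # then move that byte to the front.
    def mtf_encode(data):
        table = list(range(256))
        out = []
        for b in data:
            i = table.index(b)
            out.append(i)
            table.insert(0, table.pop(i))
        return out

    print(mtf_encode(b"aaabbb"))   # [97, 0, 0, 98, 0, 0]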
D. Arithmetic Coding
Arithmetic coding (ARI) uses a statistical method to compress data. The method starts with a
certain interval, reads the input file symbol by symbol, and uses the probability of each
symbol to narrow the interval. Specifying a narrower interval requires more bits, so the
number constructed by the algorithm grows continuously. To achieve compression, the algorithm
is designed such that a high-probability symbol narrows the interval less than a
low-probability symbol, so that high-probability symbols contribute fewer bits to the
output [2].
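A floating-point toy of that interval narrowing (illustration only; real coders use integer
arithmetic with renormalization, and the probs table here is made up):

    # Narrow [lo, hi) by each symbol's probability slice; any number in
    # the final interval identifies the message.
    def arith_encode(symbols, probs):
        lo, hi = 0.0, 1.0
        for s in symbols:
            span = hi - lo
            c = 0.0
            for sym, p in probs.items():
                if sym == s:
                    lo, hi = lo + span * c, lo + span * (c + p)
                    break
                c += p
        return (lo + hi) / 2

    probs = {"A": 0.7, "B": 0.2, "C": 0.1}
    # "AAB" ends in a wide interval (few bits); "CCB" in a narrow one.
    print(arith_encode("AAB", probs), arith_encode("CCB", probs))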
Compiler Design
Explain syntax directed translation, with examples.
Compiler Design | Syntax Directed Translation
Background: A parser uses a CFG (context-free grammar) to validate the input string and
produce output for the next phase of the compiler. The output could be either a parse tree or
an abstract syntax tree. Now, to interleave semantic analysis with the syntax analysis phase
of the compiler, we use syntax directed translation.
Definition
Syntax directed translation (SDT) consists of rules augmented onto the grammar that
facilitate semantic analysis. SDT involves passing information bottom-up and/or top-down the
parse tree in the form of attributes attached to the nodes. Syntax directed translation rules
use 1) lexical values of nodes, 2) constants, and 3) attributes associated with the
non-terminals in their definitions.
Example
E -> E+T | T
T -> T*F | F
F -> INTLIT
This is a grammar to syntactically validate an expression having additions and
multiplications in it. Now, to carry out semantic analysis, we will augment this grammar with
SDT rules in order to pass information up the parse tree and check for semantic errors, if
any. In this example we will focus on evaluation of the given expression, as we don't have
any semantic assertions to check in this very basic example. Augmented with translation
rules, the grammar becomes:
E -> E+T { E.val = E.val + T.val }
E -> T { E.val = T.val }
T -> T*F { T.val = T.val * F.val }
T -> F { T.val = F.val }
F -> INTLIT { F.val = INTLIT.lexval }
For understanding translation rules further, take the first SDT augmented onto the
[ E -> E+T ] production rule. The translation rule in question has val as an attribute for
both the non-terminals E and T. The right-hand side of the translation rule corresponds to
attribute values of the right-side nodes of the production rule, and vice versa.
Generalizing, SDT rules are augmented onto a CFG to associate 1) a set of attributes with
every node of the grammar and 2) a set of translation rules with every production rule, using
attributes, constants and lexical values.
Let's take a string to see how semantic analysis happens: S = 2+3*4. The corresponding parse
tree (figure omitted here) places the * node for 3 and 4 below the + node.
To evaluate the translation rules, we can employ a single depth-first traversal of the parse
tree. This works here only because the SDT rules impose no particular evaluation order beyond
children's attributes being computed before their parents', which holds for a grammar whose
attributes are all synthesized. Otherwise, we would have to figure out the best-suited plan
to traverse the parse tree and evaluate all the attributes in one or more traversals. For
better understanding, we will move bottom-up, left to right, in computing the translation
rules of our example.
The annotated parse tree (figure omitted) shows how semantic analysis could happen. The flow
of information is bottom-up, and all the children's attributes are computed before their
parents', as discussed above. Right-hand-side nodes are sometimes annotated with subscript 1
to distinguish between child and parent.
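As a sketch (with a hypothetical Node helper, not from the article), the bottom-up evaluation
of 2+3*4 with only synthesized attributes looks like this in Python:

    # Post-order evaluation: children's val is computed before the
    # parent's rule fires, as in the SDT rules above.
    class Node:
        def __init__(self, op=None, value=None, children=()):
            self.op, self.value, self.children = op, value, children

    def evaluate(node):
        if node.op is None:        # F -> INTLIT { F.val = INTLIT.lexval }
            return node.value
        left, right = (evaluate(c) for c in node.children)
        if node.op == "+":         # E -> E+T { E.val = E.val + T.val }
            return left + right
        return left * right        # T -> T*F { T.val = T.val * F.val }

    tree = Node("+", children=(Node(value=2),
                               Node("*", children=(Node(value=3), Node(value=4)))))
    print(evaluate(tree))          # 14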
Additional Information
Synthesized Attributes are such attributes that depend
only on the attribute values of children nodes.
Thus [ E -> E+T { E.val = E.val + T.val } ] has a
synthesized attribute val corresponding to node E. If all the
semantic attributes in an augmented grammar are
synthesized, one depth first search traversal in any order
is sufficient for semantic analysis phase.
Inherited Attributes are attributes that depend on the attributes of the parent and/or
siblings.
Thus [ Ep -> E+T { Ep.val = E.val + T.val, T.val = Ep.val } ], where E and Ep are the same
grammar symbol, annotated to differentiate between parent and child, has an inherited
attribute val corresponding to node T.
4 Answers
Although what is "correct" always depends on theory, there are
various things that are definitely not quite right with your trees.
Tree #1
the founder of the church of England
The whole thing taken together is an NP (it starts with a definite
article and can serve as the subject of a sentence, so it is
something nominal, not prepositional), so the root of the tree
should be labelled NP rather than PP.
In general, an XP must always have an X as its head.
Thus, when there is an NP, there must be an N as the head, and
for a PP, there is a P head. This principle is not always followed
in your trees.
The same goes for NPs. Now, I don't know which theory you are using, because there are
basically two opposing approaches: 1) make the whole thing an NP, i.e. a phrase with an N
head to which the determiner is a specifier, or 2) make it a DP headed by the determiner.
I'll use the NP analysis here:
You can now argue about whether the PP "of the church of
England" is an adjunct rather than a complement, but in this case
I find the latter approach more plausible. So within N', we have
an N head "founder" and a PP complement "of the church of
England":
Now about the PP. As said above, the head of the PP must be a
P of which the complement is an NP, thus:
The NP "the church of England" again branches into the
determiner and the N' "church of England":
Again, you could also argue about making the PP "of England"
an adjunct, but here too I find a complement more plausible.
The PP "of England" itself looks similar as the other PPs, with
the difference that the NP "England" doesn't have a DP
specifier:
Tree #2
the brother of the girl who left us
I'll keep my explanation a bit briefer here.
Similarly as above, you have an NP in which the N' consists of
the N head "brother" and a PP complement "of the girl who left
us":
As for the single correct tree: there cannot be one. The details of what a tree looks like
always depend on the theory. In particular, there are opposing views on how to account for
determiner + noun (making it an NP, as I did, or a DP, with consequences for the internal
structure), whether to omit redundant bar levels (as I did), and whether the PPs are
complements or adjuncts to the NPs.
Which solution is deemed correct depends on what theory
you are using.
You really should take a look again into the basics of how
phrase structure trees work. For example, having a VP with a
P head, as you did in your second tree, makes absolutely no
sense. It seems that there are some substantial assumptions
about phrase structure trees that are not quite clear to you yet.
You must always make sure that the labels of your (sub)trees are in accordance with what is
in the tree: a PP consists of a P and an NP complement; if you have an NP, then it must have
an N as its head; and an expression like "the brother of ..." is certainly not a VP.
Once you have gained a better understanding of how phrase structure trees work, what a phrase
consists of, and what relations hold between constituents, it will become far more obvious to
you how to assign a tree structure to a sentence.
answered Jan 28 '17 at 13:06, edited Jan 28 '17 at 17:20
lemontree♦
The two sentences are NPs.
check this:
A Simple Language
Consider the following grammar:
P→D;E
D → D ; D | id : T
T → char | integer | array [ num ] of T | ↑ T
E → literal | num | id | E mod E | E [ E ] | E ↑
Translation scheme:
P → D ; E
D → D ; D
D → id : T { addtype(id.entry, T.type) }
T → char { T.type := char }
T → integer { T.type := integer }
T → ↑ T1 { T.type := pointer(T1.type) }
T → array [ num ] of T1 { T.type := array(1…num.val, T1.type) }
In the following rules, the attribute type for E gives the type expression assigned to the
expression generated by E.
E → literal { E.type := char }
E → num { E.type := integer }
E → id { E.type := lookup(id.entry) }
E → E1 mod E2 { E.type := if E1.type = integer and E2.type = integer then integer else
type_error }
E → E1 [ E2 ] { E.type := if E2.type = integer and E1.type = array(s, t) then t else
type_error }
E → E1 ↑ { E.type := if E1.type = pointer(t) then t else type_error }
3. While statement:
S → while E do S1 { S.type := if E.type = boolean then S1.type else type_error }
4. Sequence of statements:
S → S1 ; S2 { S.type := if S1.type = void and S2.type = void then void else type_error }
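The E-rules can be rendered directly as a recursive checker. Here is a hypothetical sketch
in Python over a toy tuple-based AST (not from the source; array bounds are dropped for
brevity):

    # Type expressions: "char", "integer", ("array", t), ("pointer", t).
    def etype(e, env):
        kind = e[0]
        if kind == "literal":
            return "char"
        if kind == "num":
            return "integer"
        if kind == "id":
            return env.get(e[1], "type_error")    # lookup(id.entry)
        if kind == "mod":
            ok = etype(e[1], env) == "integer" == etype(e[2], env)
            return "integer" if ok else "type_error"
        if kind == "index":                       # E1 [ E2 ]
            t1, t2 = etype(e[1], env), etype(e[2], env)
            if t2 == "integer" and isinstance(t1, tuple) and t1[0] == "array":
                return t1[1]                      # element type t
            return "type_error"
        if kind == "deref":                       # E1 ↑
            t = etype(e[1], env)
            return t[1] if isinstance(t, tuple) and t[0] == "pointer" else "type_error"
        return "type_error"

    env = {"x": ("array", "integer")}   # from: x : array [ num ] of integer
    print(etype(("index", ("id", "x"), ("num", 5)), env))   # integer
    print(etype(("mod", ("id", "x"), ("num", 5)), env))     # type_error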