Vous êtes sur la page 1sur 19

# Binary Data with struct

31.07.2012

## What is binary data?

Let's look at a file that contains nothing but
Hello, world!. It has a size of 14 bytes:
Content: Hello, world!
Byte number: 0123456789ABCD
The file might also be represented by:
01001000
01101111
01101111
00100001

31.07.2012

## 01100101 01101100 01101100

00101100 00100000 01110111
01110010 01101100 01100100
00001010

## Representation of binary as hex

The binary representation is clumsy.
One byte can be represented unambiguously
bin
01101111 00101100 00100000
dec
111
44
32
hex
6F
2C
20
ascii
o
,

31.07.2012

31.07.2012

## Data storage example: 7 numbers

We want to store 7 short numbers:
7, 29, 67, 98, 113, 169, 204
Traditionally, we might store them in a text
file (CSV) like this:
7,29,67,98,113,169,204
This text file would have a size of 22 bytes.

31.07.2012

## Data storage example: 7 numbers

We want to store 7 short numbers:
7, 29, 67, 98, 113, 169, 204
If we encode each number as one byte, the
file is only 7 bytes large:
07 1D 43 62 71 A8 CC

31.07.2012

## Data storage example: 2 floats

Storing two float numbers: and e
An ASCII file with one number on each line:
3.1415926535897931
2.7182818284590451
This file is 37 bytes long.

31.07.2012

## Data storage example: 2 floats

Storing two float numbers: and e
In binary float format, one number is 4 bytes
large, so two numbers are only 8 bytes:

31.07.2012

## Python's struct module

Binary data is represented by strings
(so don't try to print and understand it):
>>> struct.pack('5c',*'Hello')
'Hello'
>>> struct.pack('f',math.pi)
'\xdb\x0fI@'
>>> struct.pack('f',math.e)
'T\xf8-@'

31.07.2012

## Python's struct module

struct.pack() converts things to binary:
struct.pack('10B',*range(8,18))
08 09 0A 0B 0C 0D 0E 0F 10 11
Note: (*range(3)) is the same as (0,1,2)
struct.pack('2f4s3i',

CD CC BC 40 85 EB 6D 41 48 45 41 44
41 7E 00 00 EE 00 00 00 63 00 00 00

31.07.2012

## Python's struct module

Data structure described by format string:
struct.pack('2f4s3i',
'2f4s3i' 2 floats, then a string of length 4, then 3 ints
s: string
h: short
H: unsigned short
Q: unsigned long long
f: float
x: ignore byte
(for the complete list, see Python Library docs, section 7.3)

31.07.2012

## Python's struct module

>>> struct.unpack('2sHh',
'\x79\x6F\x55\x6b\xFF\xA9')
('yo', 27477, -22017)

31.07.2012

## Python's struct module

struct.unpack() used on a file:
f.open('myfile','rb')
f.seek(0,2)
fsize = f.tell()
#get file size
f.seek(0,0)
#jump back
while f.tell() < fsize:

## ('yo', 5921, -8529)

('da', 632, 28442)
('go', 57732, -4272)

31.07.2012

## Smaller files faster file operations/transfers

Faster processing: Less interpretation
(conversion, parsing, seeking)
struct yields native Python objects
(good for slicing, no conversion needed)

31.07.2012

## KuT-Vortrag Michael Bker

Performance comparisons

in ASCII: 25kB

31.07.2012

## Handling of variable-length dataas strings often

arecomplicates matters greatly (need stop
bytes, seeking etc.), often to the point of
impracticability
Limited precision of decimal fraction formats
(especially float)
Precise documentation of binary file structure is

31.07.2012

## KuT-Vortrag Michael Bker

Thank you!

51 75 65 73 74 69 6F 6E 73 3F
Q

31.07.2012

## KuT-Vortrag Michael Bker

Backup slide:
Performance comparison code
ASCII file:
0\n1\n2\n3\n4\n5\n \n59999

f=open('casc','r')
print sum([ int(j) for j in f.readlines() ])

31.07.2012

## KuT-Vortrag Michael Bker

Backup slide:
Performance comparison code
Binary file:
0000 0100 0200 0300 0400 0500 0600 0700
0800 0900 0A00 0B00 0C00 0D00 0E00 0F00
1000 1100 5EEA 5FEA