
Binary Data with struct

31.07.2012

KuT-Vortrag Michael Bker

What is binary data?


Let's look at a file that contains nothing but
Hello, world! followed by a newline. It has a size of 14 bytes:
Content:     Hello, world!
Byte number: 0123456789ABCD   (byte D is the newline)
The file might also be represented by:
01001000 01100101 01101100 01101100
01101111 00101100 00100000 01110111
01101111 01110010 01101100 01100100
00100001 00001010
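
The bit pattern can be reproduced from the text itself. The following is a minimal sketch, written for Python 3; the helper name `to_bits` is made up for illustration.

```python
def to_bits(data):
    """Return each byte of `data` as an 8-character string of 0s and 1s."""
    return [format(byte, "08b") for byte in data]

# 13 visible characters plus the trailing newline give 14 bytes.
content = "Hello, world!\n".encode("ascii")
bits = to_bits(content)

# Print four bytes per row.
for i in range(0, len(bits), 4):
    print(" ".join(bits[i:i + 4]))
```

The first row printed starts with 01001000, the code for the letter H.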


Representation of binary as hex


The binary representation is clumsy.
One byte can be represented unambiguously
by a two-digit hexadecimal number:
bin    01101111  00101100  00100000
dec    111       44        32
hex    6F        2C        20
ascii  o         ,         (space)
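
The four views of a byte can be generated with Python's built-in formatting. This is only a sketch for Python 3; the function name `describe` is hypothetical.

```python
def describe(byte):
    """Return the binary, decimal, hex and ASCII views of one byte value (0-255)."""
    return format(byte, "08b"), str(byte), format(byte, "02X"), repr(chr(byte))

# Iterating over a bytes object yields integers in Python 3.
for byte in b"o, ":
    print("  ".join(describe(byte)))
```

The ASCII column uses repr() so that the space character stays visible.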


Data storage example: 7 numbers


We want to store 7 short numbers:
7, 29, 67, 98, 113, 169, 204
Traditionally, we might store them in a text
file (CSV) like this:
7,29,67,98,113,169,204
This text file would have a size of 22 bytes.

If we encode each number as one byte, the
file is only 7 bytes large:
07 1D 43 62 71 A9 CC
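
Both sizes can be checked quickly. The sketch below assumes Python 3 and a CSV line without a trailing newline; the variable names are illustrative.

```python
numbers = [7, 29, 67, 98, 113, 169, 204]

# Text form: digits and commas, one ASCII byte per character.
as_text = ",".join(str(n) for n in numbers).encode("ascii")

# Binary form: one byte per number (works only because every value is in 0..255).
as_bytes = bytes(numbers)

hex_dump = " ".join(format(b, "02X") for b in as_bytes)
print(len(as_text), len(as_bytes))
print(hex_dump)
```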


Data storage example: 2 floats


Storing two float numbers: π and e
An ASCII file with one number on each line:
3.1415926535897931
2.7182818284590451
This file is 37 bytes long.

In binary float format, one number is 4 bytes
large, so two numbers are only 8 bytes
(shown here in little-endian byte order):
DB 0F 49 40 54 F8 2D 40
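
A short sketch of the comparison, assuming Python 3 and no newline after the last text line. The "<" prefix (little-endian, standard sizes) is an assumption added to make the result independent of the machine.

```python
import math
import struct

# Text form: two lines of 18 characters, separated by one newline.
as_text = "3.1415926535897931\n2.7182818284590451".encode("ascii")

# Binary form: two 4-byte floats; "<" is an assumption (little-endian).
as_binary = struct.pack("<2f", math.pi, math.e)

hex_dump = " ".join(format(b, "02X") for b in as_binary)
print(len(as_text), len(as_binary))
print(hex_dump)
```

The smaller size has a price: a 4-byte float keeps only about 7 significant decimal digits.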


Python's struct module


Binary data is represented by strings (bytes objects in Python 3),
so don't try to print and understand it:
>>> import math, struct
>>> struct.pack('5c',*'Hello')
'Hello'
>>> struct.pack('f',math.pi)
'\xdb\x0fI@'
>>> struct.pack('f',math.e)
'T\xf8-@'


Python's struct module


struct.pack() converts things to binary:
struct.pack('10B',*range(8,18))
08 09 0A 0B 0C 0D 0E 0F 10 11
Note: f(*range(3)) is the same as f(0,1,2)
struct.pack('2f4s3i',
            5.9,14.87,'HEAD',32321,238,99)
CD CC BC 40 85 EB 6D 41 48 45 41 44
41 7E 00 00 EE 00 00 00 63 00 00 00
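
The hex dumps above can be reproduced as follows. This sketch assumes Python 3, where strings to be packed must be bytes (b'HEAD'). The `hexdump` helper is made up for illustration, and the "<" prefix is an assumption that fixes little-endian byte order and 4-byte ints.

```python
import struct

def hexdump(data):
    """Format bytes as space-separated two-digit hex numbers."""
    return " ".join(format(b, "02X") for b in data)

# Ten unsigned bytes with the values 8..17.
ten_bytes = struct.pack("10B", *range(8, 18))

# Two floats, a 4-byte string, three ints ("<" is an assumption, see above).
packed = struct.pack("<2f4s3i", 5.9, 14.87, b"HEAD", 32321, 238, 99)

print(hexdump(ten_bytes))
print(hexdump(packed))
```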


Python's struct module


Data structure described by format string:
struct.pack('2f4s3i',
5.9,14.87,'HEAD',32321,238,99)
'2f4s3i': 2 floats, then a string of length 4, then 3 ints
s: string
h: short
H: unsigned short
Q: unsigned long long
f: float
x: pad byte (skipped when unpacking)
(for the complete list, see Python Library docs, section 7.3)
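
struct.calcsize() reports how many bytes a format string describes. The sketch below assumes Python 3 and uses the "<" prefix (standard sizes, no alignment padding), which is an assumption not made on the slide.

```python
import struct

# Size in bytes of each format discussed above.
sizes = {fmt: struct.calcsize(fmt)
         for fmt in ("<2f4s3i", "<4s", "<h", "<H", "<Q", "<f", "<x")}
for fmt, size in sizes.items():
    print(fmt, size)

# "x" writes a zero byte when packing and is skipped when unpacking,
# so it takes no argument and yields no value.
record = struct.pack("<HxH", 1, 2)
values = struct.unpack("<HxH", record)
print(record, values)
```

Without a prefix, struct uses the machine's native sizes and alignment, so the numbers may differ between platforms.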


Python's struct module


struct.unpack() reads things from binary:
>>> struct.unpack('2sHh',
'\x79\x6F\x55\x6b\xFF\xA9')
('yo', 27477, -22017)
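
The same readout as a Python 3 sketch. Here the input is a bytes object and the 2s field comes back as bytes. The "<" prefix is an assumption that makes the little-endian interpretation explicit.

```python
import struct

raw = bytes([0x79, 0x6F, 0x55, 0x6B, 0xFF, 0xA9])

# 2-byte string, unsigned short, signed short ("<" = little-endian, assumed).
name, unsigned_value, signed_value = struct.unpack("<2sHh", raw)

print(name.decode("ascii"), unsigned_value, signed_value)
```

The bytes FF A9 read as a signed short give a negative number because the highest bit is set.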


Python's struct module


struct.unpack() used on a file:
import struct
f = open('myfile','rb')
f.seek(0,2)          # jump to end of file
fsize = f.tell()     # get file size
f.seek(0,0)          # jump back to the start
while f.tell() < fsize:
    print(struct.unpack('2sHh',f.read(6)))
f.close()

('yo', 5921, -8529)
('da', 632, 28442)
('go', 57732, -4272)
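
One possible variation, sketched for Python 3: instead of comparing f.tell() with the file size, read fixed-size chunks until nothing is left. The names `RECORD` and `read_records` are hypothetical, and an in-memory io.BytesIO stands in for the file 'myfile', whose real contents are not known here.

```python
import io
import struct

# Assumed record layout: 2-byte string, unsigned short, signed short, little-endian.
RECORD = struct.Struct("<2sHh")

def read_records(stream):
    """Yield one tuple per complete record until the stream is exhausted."""
    while True:
        chunk = stream.read(RECORD.size)
        if len(chunk) < RECORD.size:  # end of data, or a truncated last record
            break
        yield RECORD.unpack(chunk)

# Stand-in for open('myfile', 'rb'), built from the three records shown above.
sample = io.BytesIO(
    RECORD.pack(b"yo", 5921, -8529)
    + RECORD.pack(b"da", 632, 28442)
    + RECORD.pack(b"go", 57732, -4272)
)
records = list(read_records(sample))
for record in records:
    print(record)
```

A precompiled struct.Struct avoids re-parsing the format string for every record.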


Advantages of binary data


Advantages:
Smaller files, and therefore faster file operations/transfers
Faster processing: less interpretation (conversion, parsing, seeking)
struct yields native Python objects (good for slicing, no conversion needed)


Performance comparisons

Storing range(-2500,2500) in file
    in ASCII: 25 kB
    in binary (signed short): 10 kB

Reading range(60000) from file, then sum()
    ASCII file: ~160 ms
    binary file (unsigned short): ~50 ms
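
The storage sizes can be checked without writing any file. This is a sketch for Python 3 and assumes one number per line in the ASCII case, as on the backup slides; the exact layout used for the figures above is not stated.

```python
import struct

values = list(range(-2500, 2500))

# ASCII: decimal digits (and minus signs) plus one newline per number.
ascii_data = "".join("%d\n" % v for v in values).encode("ascii")

# Binary: one signed short (2 bytes) per number; "<" (little-endian) is an assumption.
binary_data = struct.pack("<%dh" % len(values), *values)

print(len(ascii_data), len(binary_data))
```

Under these assumptions the ASCII data is roughly 25 kB and the binary data exactly 10 000 bytes.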


Disadvantages of binary data


Disadvantages:
Handling of variable-length data (as strings often are) complicates matters greatly (need stop bytes, seeking etc.), often to the point of impracticability
Limited precision of decimal fraction formats (especially float)
Precise documentation of binary file structure is indispensable for successful readout
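
Two of these points can be illustrated in a few lines. The sketch below (Python 3) stores variable-length strings with a 2-byte length prefix, which is a different technique from the stop bytes mentioned above, and then shows the precision lost by a 4-byte float. The function names are made up for illustration.

```python
import math
import struct

def pack_string(text):
    """Encode text as a 2-byte length followed by that many UTF-8 bytes."""
    payload = text.encode("utf-8")
    return struct.pack("<H", len(payload)) + payload

def unpack_string(data, offset=0):
    """Decode one length-prefixed string; return it with the next offset."""
    (length,) = struct.unpack_from("<H", data, offset)
    start = offset + 2
    return data[start:start + length].decode("utf-8"), start + length

# Two strings of different length in one byte sequence.
blob = pack_string("yo") + pack_string("binary")
first, pos = unpack_string(blob)
second, pos = unpack_string(blob, pos)

# Round-tripping pi through a 4-byte float changes its value.
(pi_single,) = struct.unpack("<f", struct.pack("<f", math.pi))
print(first, second)
print(pi_single, math.pi)
```

Even this simple scheme means the reader has to walk through the data record by record, since record positions can no longer be computed from a fixed size.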


Thank you!

51 75 65 73 74 69 6F 6E 73 3F


Backup slide:
Performance comparison code
ASCII file:
0\n1\n2\n3\n4\n5\n...\n59999

Readout code:
f = open('casc','r')
print(sum(int(j) for j in f))   # one integer per line
f.close()


Backup slide:
Performance comparison code
Binary file:
0000 0100 0200 0300 0400 0500 0600 0700
0800 0900 0A00 0B00 0C00 0D00 0E00 0F00
1000 1100 ... 5EEA 5FEA

Readout code:
import struct
f = open('cbin','rb')
print(sum(struct.unpack('60000H',f.read())))   # 60000 unsigned shorts
f.close()
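
To repeat the comparison, both files have to be created first. The following Python 3 sketch writes them to a temporary directory and times both readouts with timeit. The file names follow the slides; the little-endian "<" prefix and the helper names are assumptions.

```python
import os
import struct
import tempfile
import timeit

COUNT = 60000
workdir = tempfile.mkdtemp()
ascii_path = os.path.join(workdir, "casc")
binary_path = os.path.join(workdir, "cbin")

# Create the two test files: one number per line, and packed unsigned shorts.
with open(ascii_path, "w") as f:
    f.write("".join("%d\n" % n for n in range(COUNT)))
with open(binary_path, "wb") as f:
    f.write(struct.pack("<%dH" % COUNT, *range(COUNT)))

def sum_ascii():
    """Parse every line as an integer and add them up."""
    with open(ascii_path, "r") as f:
        return sum(int(line) for line in f)

def sum_binary():
    """Unpack all unsigned shorts in one call and add them up."""
    with open(binary_path, "rb") as f:
        return sum(struct.unpack("<%dH" % COUNT, f.read()))

for func in (sum_ascii, sum_binary):
    seconds = timeit.timeit(func, number=10) / 10
    print(func.__name__, func(), "%.1f ms" % (seconds * 1000))
```

Absolute timings depend on the machine and the Python version, so they will not match the figures quoted earlier.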
