In-memory Databases
Peter Storeng
www.firstderivatives.com
What is Memory?
• Random Access Memory (RAM)
• A form of computer data storage
• As opposed to Sequential Access Memory (SAM)
• "Random" means any byte can be read directly through its address, in roughly the same time wherever it is stored
• The bit size of a CPU tells you how many bytes it can process at once.
• E.g. a 16-bit CPU can process 2 bytes at a time (1 byte = 8 bits, so 16 bits = 2 bytes)
• E.g. suppose we want to access memory location 2,871,405. In binary this
address is "10101111010 00001101101".
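The conversion can be checked with Python's built-in `format` and `int` functions. This only illustrates the arithmetic; the split into two 11-bit groups simply reproduces how the address is written above.

```python
# The memory address from the slide, as an ordinary integer.
address = 2_871_405

binary = format(address, "b")   # base-2 string without the "0b" prefix
print(binary)                   # 1010111101000001101101
print(len(binary), "bits")      # 22 bits

# Round trip: parsing the bit string gives back the same address.
assert int(binary, 2) == address

# Two 11-bit groups, matching how the address is written on the slide.
print(binary[:11], binary[11:])  # 10101111010 00001101101
```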
Types of RAM
• 2 main types: DRAM (Dynamic) & SRAM (Static). (There are loads of other
types!)
• Most machines use both types: a small amount of SRAM (e.g. for CPU caches) and a large
amount of DRAM (used for main memory)
• DRAM uses 1 capacitor and 1 transistor per bit; SRAM uses 4-6 transistors per bit and
no capacitor. The DRAM capacitor leaks charge, so DRAM must be refreshed periodically (hence "Dynamic")
Hard Drive
• A hard drive has a magnetic read-write head that rides on a finger-like
mechanism just above a spinning metallic disk.
• When the computer requests data from the disk, it has to wait for the
data on the disk to "come around" to the head's position.
• Access becomes slower the more data you have to read from scattered locations, since each read pays this wait.
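The "come around" wait can be estimated with simple arithmetic: on average the wanted data is half a revolution away from the head. The sketch below assumes a 7200 rpm drive (a common speed, not a figure from these slides) and ignores seek time (moving the head), so its totals are lower bounds.

```python
# ASSUMPTION: a 7200 rpm drive; real drives spin at other speeds too.
RPM = 7200

seconds_per_revolution = 60 / RPM
# On average the requested sector is half a revolution away from the head.
avg_rotational_latency_ms = seconds_per_revolution / 2 * 1000

print(f"{seconds_per_revolution * 1000:.2f} ms per revolution")  # 8.33 ms per revolution
print(f"{avg_rotational_latency_ms:.2f} ms average wait")         # 4.17 ms average wait

# Hypothetical workload: 1,000 records at scattered positions, one average wait each.
# Seek time is ignored, so this is a lower bound.
scattered_reads = 1_000
total_wait_s = scattered_reads * avg_rotational_latency_ms / 1000
print(f"{total_wait_s:.2f} s")  # 4.17 s
```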
Solid State Drives
• Another type of drive, the solid-state drive (SSD), has no moving parts and
stores data in (flash) memory chips, similar to how data is stored in RAM
In-memory or on disk?
What is an In-Memory DB System?
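• A database is a collection of organised data
• An in-memory database (IMDB) system keeps that data primarily in main memory (RAM) rather than on disk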
Advantages of an IMDB
• Working with data in memory is much faster than writing to
and reading from a file system.
Disadvantages
However…
• RAM is getting cheaper with advances in technology
Relational Databases
Definition: A relational database is one where the data is
organised as a set of formally described tables from which
data can be accessed easily.
Primary key table
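As an illustration, Python's standard `sqlite3` module can create a relational database that lives entirely in RAM by connecting to `":memory:"`. The `trades` table and its columns are hypothetical; `trade_id` is the primary key, so each row is identified by a unique value and a row with a duplicate key is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # database held in RAM for this connection only
conn.execute(
    "CREATE TABLE trades (trade_id INTEGER PRIMARY KEY, sym TEXT NOT NULL, price REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [(1, "IBM", 101.0), (2, "MSFT", 102.0)],
)

# Rows are found by their key value.
row = conn.execute("SELECT sym, price FROM trades WHERE trade_id = ?", (2,)).fetchone()
print(row)  # ('MSFT', 102.0)

# A second row with an existing key violates the primary key and is rejected.
try:
    conn.execute("INSERT INTO trades VALUES (?, ?, ?)", (1, "AAPL", 99.0))
except sqlite3.IntegrityError as err:
    print("rejected:", err)

count = conn.execute("SELECT COUNT(*) FROM trades").fetchone()[0]
print(count)  # 2
```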
ACID
Definition: ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee
that database transactions are processed reliably
• Atomicity: If one part of the transaction fails, the entire transaction fails and the
DB state is left unchanged.
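Atomicity can be seen with an in-memory SQLite database. In this sketch (table and values are hypothetical), a transaction first debits an account and then fails on a duplicate primary key; using the connection as a context manager rolls back the whole transaction on error, so the debit is undone too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL NOT NULL)")
with conn:  # commits when the block succeeds
    conn.execute("INSERT INTO accounts VALUES (1, 100.0)")

# A two-step transaction whose second step fails.
try:
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE id = 1")
        conn.execute("INSERT INTO accounts VALUES (1, 50.0)")  # duplicate primary key
except sqlite3.IntegrityError:
    print("transaction failed and was rolled back")

balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
print(balance)  # 100.0: the UPDATE was undone along with the failed INSERT
```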
Row Oriented DB
For storing Tables:
• Each row in the table is stored as one
entity. To access a particular value
within the row we first have to read
the entire row.
Column Oriented DB
• Stores data by column. A particular
column can be accessed without
reading in redundant extra data
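The difference can be sketched in Python using the four rows from the kdb+ example later in these slides. Python lists and dicts do not control physical memory layout, so this only illustrates which data each query has to touch; `array("d", ...)` stands in for a packed column of 8-byte floats.

```python
from array import array

# Row-oriented: each row is kept together as one record.
rows = [
    ("2009.01.01", "09:30:00", 101.0),
    ("2009.01.01", "09:31:00", 102.0),
    ("2009.01.02", "09:30:00", 101.5),
    ("2009.01.02", "09:31:00", 102.5),
]

# Column-oriented: each column is kept together as one vector.
columns = {
    "date": ["2009.01.01", "2009.01.01", "2009.01.02", "2009.01.02"],
    "ti": ["09:30:00", "09:31:00", "09:30:00", "09:31:00"],
    "p": array("d", [101.0, 102.0, 101.5, 102.5]),  # packed 8-byte floats
}

# Average price, row layout: every row is visited just to pull out one field.
avg_from_rows = sum(r[2] for r in rows) / len(rows)

# Average price, column layout: only the "p" column is read.
avg_from_columns = sum(columns["p"]) / len(columns["p"])

print(avg_from_rows, avg_from_columns)  # 101.75 101.75
```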
OLAP and OLTP
• OLTP (On-line Transaction Processing) is a class of IT
system that facilitates and manages transaction-oriented
applications. Typical operations are INSERT, UPDATE and
DELETE.
OLAP and OLTP
• OLAP and OLTP are usually done on separate databases, even
though the data you want to analyse is the data created by
OLTP
• So the data created by OLTP usually has to be extracted and loaded
into a separate database where it can be analysed
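A minimal sketch of that extract-and-load step, assuming two separate in-memory SQLite databases stand in for the OLTP system and the analysis database (the schema and trades are hypothetical):

```python
import sqlite3

# Hypothetical OLTP system: records individual trades as they happen.
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE trades (id INTEGER PRIMARY KEY, sym TEXT, qty INTEGER, price REAL)")
with oltp:
    oltp.executemany(
        "INSERT INTO trades (sym, qty, price) VALUES (?, ?, ?)",
        [("IBM", 100, 101.0), ("MSFT", 200, 102.0), ("IBM", 50, 101.5)],
    )

# Separate (hypothetical) analysis database.
olap = sqlite3.connect(":memory:")
olap.execute("CREATE TABLE trades (sym TEXT, qty INTEGER, price REAL)")

# Extract from the OLTP side, load into the analysis side.
extracted = oltp.execute("SELECT sym, qty, price FROM trades").fetchall()
with olap:
    olap.executemany("INSERT INTO trades VALUES (?, ?, ?)", extracted)

# Analytical query: total quantity traded per symbol.
totals = olap.execute("SELECT sym, SUM(qty) FROM trades GROUP BY sym ORDER BY sym").fetchall()
print(totals)  # [('IBM', 150), ('MSFT', 200)]
```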
Attributes in kdb+ Context
• We apply attributes to a specific column in a table to
reduce storage requirements or speed up retrieval.
Attributes in kdb+ Context
• Grouped Attribute: We apply this attribute to a column which
has significant repetition. We store a hashtable (dictionary)
which maps each unique column value to a list of the positions of all
its occurrences. The disadvantage is that this index has to be
stored in addition to the column itself.
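The idea behind the grouped attribute can be sketched with a Python dictionary. This illustrates the concept only, not kdb+'s actual internal representation; the `sym` values are hypothetical.

```python
from collections import defaultdict

# A column with significant repetition.
sym = ["IBM", "MSFT", "IBM", "AAPL", "MSFT", "IBM"]

# Group index: each distinct value -> list of positions where it occurs.
group_index = defaultdict(list)
for position, value in enumerate(sym):
    group_index[value].append(position)

# A lookup returns the positions directly instead of scanning the whole column.
print(group_index["IBM"])  # [0, 2, 5]

# The cost: the index is stored on top of the column itself.
extra_entries = sum(len(p) for p in group_index.values()) + len(group_index)
print(extra_entries)  # 9: six positions plus three distinct keys
```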
Paging
• In computer operating systems, paging is one of the memory-management schemes by
which a computer stores and retrieves data from secondary storage for
use in main memory.
Paging
• The main functions of paging are performed when a program tries to
access pages that are not currently mapped to physical memory (RAM).
This situation is known as a page fault. The operating system must then
take control and handle the page fault, in a manner invisible to the
program. Therefore, the operating system must:
Page Table
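A toy model of a page table, assuming 4 KiB pages and four physical frames (both illustrative choices). A virtual address is split into a page number and an offset; if the page is not in the table, a page fault is counted and the page is given a free frame before translation continues, invisibly to the caller. Real operating systems also evict pages when memory is full, which this sketch leaves out.

```python
# ASSUMPTIONS: 4 KiB pages and 4 physical frames; no eviction is modelled.
PAGE_SIZE = 4096
NUM_FRAMES = 4

page_table = {}                      # virtual page number -> physical frame number
free_frames = list(range(NUM_FRAMES))
page_faults = 0

def translate(virtual_address: int) -> int:
    """Return the physical address for virtual_address, handling a page fault if needed."""
    global page_faults
    page_number, offset = divmod(virtual_address, PAGE_SIZE)
    if page_number not in page_table:
        # Page fault: the page is not mapped to RAM, so the "OS" assigns it a free frame.
        # (Raises IndexError if no frame is free, since eviction is not modelled.)
        page_faults += 1
        page_table[page_number] = free_frames.pop(0)
    frame = page_table[page_number]
    return frame * PAGE_SIZE + offset

print(translate(8200))  # page 2, offset 8 -> frame 0 -> 8
print(translate(8300))  # same page, no fault -> 108
print(translate(100))   # page 0 -> frame 1 -> 4196
print(page_faults)      # 2
```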
Memory Mapping
• A memory-mapped file is a file that has been mapped
(i.e., not copied) into virtual memory such that it looks as
though it has been loaded into memory. Rather than actually
being copied into virtual memory, a range of virtual memory
addresses is simply marked off for use by the file.
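Python's standard `mmap` module exposes this OS facility. In the sketch below (the file name and contents are hypothetical), mapping the file only reserves a range of virtual addresses; reading a slice makes the OS page in the page(s) holding those bytes rather than copying the whole file.

```python
import mmap
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical 10,000-byte file to map.
    path = os.path.join(tmp, "prices.bin")
    with open(path, "wb") as f:
        f.write(b"0123456789" * 1000)

    with open(path, "rb") as f:
        # Length 0 maps the whole file; the file is not copied into memory up front.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            size = len(mm)
            # Slicing touches the mapping, so the OS pages in the needed page(s) on demand.
            chunk = mm[5000:5005]

print(size)   # 10000
print(chunk)  # b'01234'
```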
Memory Mapping
• Typically, the backing storage for virtual memory is the system paging
file. But with memory-mapped files, things change.
• As each page of the file is accessed and copied into memory so that the
CPU can access it, it bypasses the paging file.
• In this sense, the file takes the place of the paging file as the backing
storage for the particular range of virtual memory addresses into which it
has been mapped.
• The files themselves act as virtual extensions of the paging file and serve
as the backing storage for the unmodified pages in their associated virtual
memory address range.
Memory Mapping
• In kdb+ we typically have an HDB (historical database) worker process with
the tables memory-mapped
Memory mapped Tables in kdb+
\l /db
t
/ this is unsafe as it will load all of t!
date ti p
-------------------------
2009.01.01 09:30:00 101
2009.01.01 09:31:00 102
2009.01.02 09:30:00 101.5
2009.01.02 09:31:00 102.5
`t insert (10:00:00;42.0)
'splay
/ insert cannot modify the memory-mapped table on disk: it fails with a 'splay error
`t upsert (10:00:00; 42.0)
`t
/ upsert succeeds, and t now shows the new 10:00:00 row
t
ti p
--------------
09:30:00 101.5
09:31:00 33.5
10:00:00 42
\l /db
t
ti p
--------------
09:30:00 101.5
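09:31:00 33.5
/ after reloading, the 10:00:00 row is gone: the upsert only changed the copy in this session, not the files on disk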
Questions?
SAP HANA
• In-memory Relational Database Management
System
SAP HANA
SAP HANA is an Appliance
• An appliance is the marriage of hardware and
software