
First Derivatives

In-memory Databases
Peter Storeng

www.firstderivatives.com
What is Memory?
• Random Access Memory (RAM)
• A form of computer data storage
• As opposed to Sequential Access Memory (SAM)
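• "Random" refers to each byte of data having its own address, so any byte can be accessed directly rather than by reading through the data in sequence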

• The lowest level of data is a bit (0 or 1, with 0 representing off and 1 representing on). We call a collection of 8 bits a byte.

• So each address holds one byte

• The bit size of a CPU (its word size) tells you how many bits, and so how many bytes, it can process at once.
• E.g. a 16-bit CPU can process 2 bytes at a time (1 byte = 8 bits, so 16 bits = 2 bytes)

• E.g. suppose we want to access memory location 2,871,405. This corresponds to the binary address "10101111010 00001101101".

• First, "00001101101" would be sent to select the "row", and then "10101111010" would be sent to select the "column". This combination selects the unique location of memory address 2,871,405.
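The binary arithmetic above can be checked with a short Python sketch. It assumes a 22-bit address split evenly into two 11-bit halves and follows the slide's assignment (low half selects the row, high half the column); real memory controllers choose their own address mappings, and the function name `split_address` is purely illustrative.

```python
def split_address(addr: int, half_bits: int = 11) -> tuple[str, str]:
    """Split addr into (high, low) halves of half_bits bits each, as binary strings."""
    if addr >= 1 << (2 * half_bits):
        raise ValueError("address does not fit in the assumed address width")
    mask = (1 << half_bits) - 1
    high = (addr >> half_bits) & mask  # upper 11 bits
    low = addr & mask                  # lower 11 bits
    return format(high, f"0{half_bits}b"), format(low, f"0{half_bits}b")


high, low = split_address(2_871_405)
row, column = low, high  # slide's convention: the low half is sent first, as the row
print("row:", row, "column:", column)
# row: 00001101101 column: 10101111010
```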

Types of RAM
• 2 main types: DRAM (Dynamic) & SRAM (Static). (There are loads of other
types!)

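• They differ in the technology they use to hold data: DRAM stores each bit as charge on a capacitor, which leaks and must be refreshed periodically (hence "dynamic"); SRAM holds each bit in a flip-flop that keeps its state as long as power is supplied (hence "static")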

• Both are still volatile: they lose their contents when power is removed

• SRAM is faster (used in the CPU cache)

• Most machines use both types. Small amount of SRAM and a large
amount of DRAM (used for main memory)

• DRAM uses 1 transistor and 1 capacitor per bit; SRAM uses no capacitor but 4-6 transistors per bit, which is why it is larger and more expensive

Hard Drive
• A hard drive has a magnetic read-write head that rides on a finger-like
mechanism just above a spinning metallic disk.
• When the computer requests data from the disk, it has to wait for the
data on the disk to "come around" to the head's position.
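• Access becomes slower the more data you have to read, especially when it is scattered across the disk and the head must also move between tracks.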

Solid State Drives
• Another type of drive, known as a solid-state drive (SSD), has no moving parts and stores data in a way similar to how it is stored in memory

• Therefore it is much faster than traditional rotating hard disk drives.

• However, unlike RAM, it retains data without power

• SSDs are still about 10 times more expensive per unit of storage when compared to HDDs.

In-memory or on disk?

• As you go up the pyramid, storage gets faster, but we are able to hold less data and it gets more expensive per byte
• We have to decide on a trade-off between cost and speed.

What is an In-Memory DB System?
• A database is a collection of organised data

• A database management system (DBMS) is the software that allows storing, modifying and extracting data from the database

• An in-memory database system is a DBMS that primarily relies on main memory for computer data storage

Advantages of an IMDB
• Working with data in memory is much faster than writing to
and reading from a file system.

• Their design is typically simpler than that of on-disk databases, so IMDBs can also impose significantly lower memory and CPU requirements.

• Removes the multiple copies of data that on-disk databases keep (e.g. in file system and database caches), thereby reducing memory consumption. This simplified processing makes for greater reliability and minimizes CPU demands.

Disadvantages

• Memory is volatile, so data held only in memory is not persisted. If the power goes out you lose everything!

• Up until recently a server with a large amount of RAM would have been in the 32-64GB range, while core enterprise databases were commonly 100-500GB in size.

However…
• RAM is getting cheaper with advances in technology

• Most IMDBs offer features for persisting data periodically

• This, combined with transaction logging, ensures no data loss:

snapshot + transaction log = database at time of crash

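• Consider the tickerplant log file that an RDB replays on startup: each entry has the form (`upd;t;x), where t is a table name and x is the data to add to it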
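The recovery equation can be illustrated with a minimal Python sketch. The log entries here are hypothetical tuples that only mimic the shape of a (`upd;t;x) message; this is not how kdb+ stores or replays its log, just the idea of restoring the last snapshot and re-applying every logged update made after it.

```python
import copy
from typing import Any

LogEntry = tuple[str, str, Any]  # (function name, table name, row), mimicking (`upd;t;x)


def upd(db: dict[str, list], table: str, row: Any) -> None:
    """Append one row to a table, creating the table if needed."""
    db.setdefault(table, []).append(row)


def recover(snapshot: dict[str, list], log: list[LogEntry]) -> dict[str, list]:
    """Rebuild the database at the time of the crash: snapshot + replayed log."""
    db = copy.deepcopy(snapshot)  # never mutate the persisted snapshot
    handlers = {"upd": upd}
    for func, table, row in log:
        handlers[func](db, table, row)  # re-apply each logged message in order
    return db


snapshot = {"trade": [("IBM", 100)]}
log = [("upd", "trade", ("MSFT", 200)), ("upd", "quote", ("IBM", 99.5))]
print(recover(snapshot, log))
# {'trade': [('IBM', 100), ('MSFT', 200)], 'quote': [('IBM', 99.5)]}
```

Copying the snapshot before replaying keeps the persisted image intact, so recovery can be repeated if it is itself interrupted.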

Relational Databases
Definition: A relational database is one where the data is
organised as a set of formally described tables from which
data can be accessed easily.

• Relational DBs enforce referential integrity

• Definition: Referential integrity is a database concept that ensures that relationships between tables remain consistent. When one table has a foreign key to another table, the concept of referential integrity states that you may not add a record to the table that contains the foreign key unless there is a corresponding record in the linked table.
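The insertion rule can be seen in SQLite through Python's standard `sqlite3` module. SQLite is used here only as a convenient relational engine, and the `artist`/`album` schema and the `add_album` helper are hypothetical. Note that SQLite only checks foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.execute("CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE album (title TEXT, artist_id INTEGER REFERENCES artist(id))")
conn.execute("INSERT INTO artist VALUES (1, 'Example Artist')")


def add_album(title: str, artist_id: int) -> bool:
    """Try to insert an album; return False if referential integrity would break."""
    try:
        conn.execute("INSERT INTO album VALUES (?, ?)", (title, artist_id))
        return True
    except sqlite3.IntegrityError:  # no matching row in the primary key table
        return False


print(add_album("First Album", 1))    # True: artist 1 exists
print(add_album("Orphan Album", 99))  # False: there is no artist 99
```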

(Figure: a primary key table of artists and a foreign key table that references it by artist id)

• Here we have deleted artist id 4, so "Eat the Rich" now refers to an artist that no longer exists, breaking referential integrity
• Some RDBMSs enforce referential integrity either by also deleting the rows whose foreign key refers to the deleted value (a cascading delete) or by returning an error.
• Kdb+ enforces referential integrity by requiring that every value in a foreign key column appear in the primary key column of the referenced table
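The two RDBMS behaviours can be compared in SQLite, again via Python's `sqlite3` with a hypothetical schema (the artist name is made up). With SQLite's default foreign-key action, deleting a referenced artist is rejected with an error; declaring `ON DELETE CASCADE` instead removes the rows that refer to it. This shows generic SQL behaviour only, not how kdb+ works.

```python
import sqlite3


def make_db(on_delete: str) -> sqlite3.Connection:
    """Build an artist/album pair of tables using the given ON DELETE clause."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE artist (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE album (title TEXT, "
        f"artist_id INTEGER REFERENCES artist(id) {on_delete})"
    )
    conn.execute("INSERT INTO artist VALUES (4, 'Artist Four')")
    conn.execute("INSERT INTO album VALUES ('Eat the Rich', 4)")
    return conn


def delete_artist(conn: sqlite3.Connection, artist_id: int) -> str:
    """Delete an artist and report what happened to the referencing rows."""
    try:
        conn.execute("DELETE FROM artist WHERE id = ?", (artist_id,))
    except sqlite3.IntegrityError as exc:
        return f"error: {exc}"
    remaining = conn.execute("SELECT COUNT(*) FROM album").fetchone()[0]
    return f"deleted; {remaining} album(s) left"


print(delete_artist(make_db(""), 4))                   # rejected with an error
print(delete_artist(make_db("ON DELETE CASCADE"), 4))  # referencing row removed too
```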

ACID
Definition: a set of properties that guarantee
that database transactions are processed reliably

• Atomicity: If one part of the transaction fails, the entire transaction fails and the DB state is left unchanged.

• Consistency: A transaction will bring the DB from one valid state to another. Otherwise we roll back to the last valid state

• Isolation: Concurrent transactions must not interfere with one another; the result must be as if they had run one after another

• Durability: Ensures that once a transaction has been committed it will remain so even after power losses, crashes etc.
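Atomicity can be demonstrated with a small Python sketch using the standard `sqlite3` module; the bank `account` table and the helper names are hypothetical. Using the connection as a context manager commits on success and rolls back if an exception escapes, so a transfer whose debit violates a `CHECK` constraint leaves both balances untouched.

```python
import sqlite3


def make_bank() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account (name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:  # commit the initial balances
        conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move money as one transaction: both updates happen, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            # If this debit violates the CHECK constraint, the credit above is undone too.
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn: sqlite3.Connection) -> dict[str, int]:
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


bank = make_bank()
print(transfer(bank, "alice", "bob", 500), balances(bank))  # False {'alice': 100, 'bob': 50}
print(transfer(bank, "alice", "bob", 30), balances(bank))   # True {'alice': 70, 'bob': 80}
```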

Row Oriented DB
For storing Tables:
• Each row in the table is stored as one
entity. To access a particular value
within the row we first have to read
the entire row.

Column Oriented DB
• Stores data by column. A particular
column can be accessed without
reading in the data of the other columns
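The difference in access pattern can be sketched in Python under simplifying assumptions: a hypothetical three-column table, a row store of packed fixed-width records, and a column store of one contiguous array per column. Real engines add pages, compression and caching; the sketch only counts the bytes each layout has to touch to sum a single column.

```python
import struct
from array import array

# Hypothetical table with columns (id, price, size)
rows = [(1, 101.0, 300), (2, 102.0, 200), (3, 101.5, 100)]

# Row store: each record packed contiguously (8-byte int, 8-byte float, 8-byte int)
ROW = struct.Struct("<qdq")
row_store = b"".join(ROW.pack(*r) for r in rows)

# Column store: one contiguous array per column
col_store = {
    "id": array("q", [r[0] for r in rows]),
    "price": array("d", [r[1] for r in rows]),
    "size": array("q", [r[2] for r in rows]),
}


def sum_price_row_store(buf: bytes) -> tuple[float, int]:
    """Sum the price column by decoding every whole row."""
    total, bytes_read = 0.0, 0
    for offset in range(0, len(buf), ROW.size):
        _id, price, _size = ROW.unpack_from(buf, offset)
        bytes_read += ROW.size
        total += price
    return total, bytes_read


def sum_price_col_store(cols: dict[str, array]) -> tuple[float, int]:
    """Sum the price column; only that column's bytes are touched."""
    prices = cols["price"]
    return sum(prices), prices.itemsize * len(prices)


print(sum_price_row_store(row_store))  # (304.5, 72)
print(sum_price_col_store(col_store))  # (304.5, 24)
```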

OLAP and OLTP
• OLTP (On-line Transaction Processing) is a class of IT
system that facilitates and manages transaction-
oriented applications. Typically INSERT, UPDATE,
DELETE.

• OLAP (On-line Analytical Processing) is a class of IT system used to answer analytical queries swiftly. Typically SELECT. Queries can be complex and involve aggregation. OLAP data comes from various OLTP databases.

OLAP and OLTP
• OLAP and OLTP are usually done on separate databases even
though the data you want to analyse is the data created from
OLTP

• So usually you would have to take the data created by OLTP
and load it into a separate database where it can be
analysed

• However, modern DBMSs like kdb+ and SAP HANA combine these into one structure. This allows analytics to be performed immediately on newly created data
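The two workload shapes can be sketched in Python against a single in-memory SQLite database. SQLite is neither kdb+ nor SAP HANA; it is used only to show OLTP-style small write transactions followed immediately by an OLAP-style aggregate over the same data, with no separate load step. The `trade` table and its values are hypothetical, and VWAP (volume-weighted average price) is just an example aggregate.

```python
import sqlite3


def load_trades(conn: sqlite3.Connection, trades: list[tuple[str, float, int]]) -> None:
    """OLTP-style workload: each trade is written in its own small transaction."""
    for sym, price, size in trades:
        with conn:
            conn.execute("INSERT INTO trade VALUES (?, ?, ?)", (sym, price, size))


def volume_and_vwap(conn: sqlite3.Connection) -> list[tuple]:
    """OLAP-style workload: aggregate over the same, just-written data."""
    return conn.execute(
        "SELECT sym, SUM(size), SUM(price * size) / SUM(size) "
        "FROM trade GROUP BY sym ORDER BY sym"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trade (sym TEXT, price REAL, size INTEGER)")
load_trades(conn, [("IBM", 101.0, 300), ("MSFT", 50.0, 100), ("IBM", 102.0, 100)])
print(volume_and_vwap(conn))  # [('IBM', 400, 101.25), ('MSFT', 100, 50.0)]
```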

Attributes in kdb+ Context
• We apply attributes to a specific column in a table to
reduce storage requirements or speed retrieval.

• Sorted attribute: Speeds search. Indicates that a column is sorted in ascending order, which means that when retrieving data we can replace a linear search with a much faster binary search.

• Unique attribute: Indicates that every value in the column is distinct. Lookups are faster because once we have found a match we know we can stop looking, instead of searching through the rest of the column.
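A rough Python analogue of these lookups (an illustration of the search strategies, not kdb+'s internal implementation): `bisect` replaces the full scan when a column is known to be sorted, and when values are known to be distinct a scan can return at the first match. The helper names are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Optional, Sequence


def find_linear(col: Sequence, value) -> list[int]:
    """No attribute: every element must be checked."""
    return [i for i, v in enumerate(col) if v == value]


def find_sorted(col: Sequence, value) -> range:
    """Ascending column: two O(log n) binary searches bound the matching run."""
    return range(bisect_left(col, value), bisect_right(col, value))


def find_unique(col: Sequence, value) -> Optional[int]:
    """Distinct values: stop at the first match (only valid if values are unique)."""
    for i, v in enumerate(col):
        if v == value:
            return i
    return None


prices = [100, 101, 101, 103, 107]
print(find_linear(prices, 101))        # [1, 2]
print(list(find_sorted(prices, 101)))  # [1, 2]
print(find_unique([7, 2, 9, 4], 9))    # 2
```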

Attributes in kdb+ Context
• Grouped attribute: We apply this attribute to a column which
has significant repetition. We store a hash table (dictionary)
which maps each unique column value to a list of the positions of all
its occurrences. The disadvantage is that we actually have to
store this extra data.

• Parted attribute: Indicates that the column represents a step function in which all occurrences of a particular output value are adjacent. Again lookup is much faster, as linear search is replaced by a hash table lookup. (The parted attribute creates a hash table mapping each unique output value to the position of its first occurrence.)
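The two index structures described above can be sketched with Python dictionaries. This mirrors the descriptions on the slides (value to all positions for grouped, value to first position for parted) and is not kdb+'s actual storage format; the function names are hypothetical.

```python
from collections import defaultdict


def build_grouped_index(col: list) -> dict:
    """Grouped: map each distinct value to the positions of all its occurrences."""
    index = defaultdict(list)
    for pos, v in enumerate(col):
        index[v].append(pos)
    return dict(index)


def build_parted_index(col: list) -> dict:
    """Parted: map each distinct value to the position of its first occurrence."""
    index = {}
    for pos, v in enumerate(col):
        if v not in index:
            index[v] = pos
        elif col[pos - 1] != v:  # value reappears after a different value
            raise ValueError("column is not parted")
    return index


def parted_lookup(col: list, index: dict, value) -> list[int]:
    """Jump to the first occurrence, then read only that value's contiguous run."""
    start = index.get(value)
    if start is None:
        return []
    end = start
    while end < len(col) and col[end] == value:
        end += 1
    return list(range(start, end))


syms = ["a", "a", "b", "b", "b", "c"]
parted = build_parted_index(syms)
print(build_grouped_index(["x", "y", "x"]))  # {'x': [0, 2], 'y': [1]}
print(parted)                                # {'a': 0, 'b': 2, 'c': 5}
print(parted_lookup(syms, parted, "b"))      # [2, 3, 4]
```

The grouped index stores every position, which is the extra data mentioned above; the parted index stores one position per value and relies on the run being contiguous.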
Virtual Memory
• Often the amount of RAM available to the
CPU is not enough to run all programs at
once.

• Virtual memory allows the computer to look for areas of RAM that have not been used recently and copy them onto the hard disk, freeing up RAM

• The OS does this automatically for us

Paging
• In computer operating systems, paging is one of the memory-management schemes by
which a computer can store and retrieve data from secondary storage for
use in main memory.

• The operating system retrieves data from secondary storage in same-size blocks called pages.

• Paging is an important part of virtual memory implementation in most contemporary general-purpose operating systems, allowing them to use disk storage for data that does not fit into physical random-access memory (RAM).
• An access to a page that is already in RAM is a page hit; an access to a page that is not currently in RAM is a page fault.
• Good discussion here:
http://www.cs.umd.edu/class/spring2003/cmsc311/Notes/Memory/virtual.html
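Address translation with a page table can be sketched in Python. The sketch assumes 4 KiB pages and models the page table as a plain dictionary from virtual page number to physical frame number; real page tables are multi-level hardware structures, and the names here are hypothetical.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages


class PageFault(Exception):
    """Raised when the virtual page is not currently mapped to a physical frame."""


def translate(vaddr: int, page_table: dict[int, int]) -> int:
    """Map a virtual address to a physical address via a page table."""
    page, offset = divmod(vaddr, PAGE_SIZE)  # page number + offset within the page
    frame = page_table.get(page)
    if frame is None:
        raise PageFault(page)  # page fault: the OS must load the page first
    return frame * PAGE_SIZE + offset  # page hit: same offset inside the frame


page_table = {0: 5, 1: 2}  # virtual page -> physical frame
print(translate(4100, page_table))  # page 1, offset 4 -> 2 * 4096 + 4 = 8196
```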

Paging
• The main functions of paging are performed when a program tries to
access pages that are not currently mapped to physical memory (RAM).
This situation is known as a page fault. The operating system must then
take control and handle the page fault, in a manner invisible to the
program. Therefore, the operating system must:

1. Determine the location of the data in auxiliary storage.
2. Obtain an empty page frame in RAM to use as a container for the data.
3. Load the requested data into the available page frame.
4. Update the page table to show the new data.
5. Return control to the program, transparently retrying the instruction that
caused the page fault.
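The five steps can be walked through with a toy Python simulation. It assumes read-only pages (so evicted pages never need writing back) and a first-in-first-out replacement policy; real operating systems rely on hardware support and more sophisticated policies. The `Pager` class is hypothetical.

```python
from collections import OrderedDict


class Pager:
    """Toy demand pager: pages live in a backing store until first accessed."""

    def __init__(self, backing_store: dict[int, bytes], num_frames: int):
        self.backing_store = backing_store       # "disk": page number -> contents
        self.frames: list = [None] * num_frames  # physical page frames ("RAM")
        self.page_table: OrderedDict = OrderedDict()  # page -> frame, oldest first
        self.faults = 0

    def read(self, page: int) -> bytes:
        if page not in self.page_table:
            self._handle_fault(page)
        # 5. control returns and the access is retried, now as a page hit
        return self.frames[self.page_table[page]]

    def _handle_fault(self, page: int) -> None:
        self.faults += 1
        data = self.backing_store[page]  # 1. locate the data in auxiliary storage
        frame = self._free_frame()       # 2. obtain an empty page frame
        self.frames[frame] = data        # 3. load the data into the frame
        self.page_table[page] = frame    # 4. update the page table

    def _free_frame(self) -> int:
        used = set(self.page_table.values())
        for frame in range(len(self.frames)):
            if frame not in used:
                return frame
        # No free frame: evict the oldest mapped page (FIFO). Pages are read-only,
        # so nothing has to be written back to the backing store.
        _victim_page, frame = self.page_table.popitem(last=False)
        return frame


pager = Pager({0: b"page0", 1: b"page1", 2: b"page2"}, num_frames=2)
for page in [0, 0, 1, 2, 0]:
    pager.read(page)
print(pager.faults)  # 4
```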

Page Table

Memory Mapping
• A memory-mapped file is a file that has been mapped
(i.e., not copied) into virtual memory such that it looks as
though it has been loaded into memory. Rather than actually
being copied into virtual memory, a range of virtual memory
addresses is simply marked off for use by the file.

• You can then access the file as though it were memory

• The OS transparently loads parts of the file into physical memory as you access them. You don't have to concern yourself with which parts are and are not in memory at any given time.
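Python's standard `mmap` module exposes this OS mechanism directly, so it can serve as a small illustration (kdb+ manages its own mappings of column files; this sketch shows only the general OS behaviour). The file is mapped rather than read, and a slice of the mapping is accessed as if it were memory; the helper name and sample file are hypothetical.

```python
import mmap
import os
import tempfile


def read_via_mmap(path: str, start: int, length: int) -> bytes:
    """Read a byte range by mapping the file instead of copying it with read()."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file (which must not be empty).
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Slicing looks like a memory access; the OS pages in only what is touched.
            return mm[start:start + length]


data = bytes(range(256)) * 4096  # 1 MiB sample file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
try:
    print(read_via_mmap(tmp.name, 1000, 4) == data[1000:1004])  # True
finally:
    os.remove(tmp.name)
```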

Memory Mapping
• As each page of the file is accessed and copied into memory so that the
CPU can access it, it bypasses the paging file.
• In this sense, the file takes the place of the paging file as the backing
storage for the particular range of virtual memory addresses into which it
has been mapped.
• Typically, the backing storage for virtual memory is the system paging
file. But with memory-mapped files, things change.
• The files themselves act as virtual extensions of the paging file and serve
as the backing storage for their associated virtual memory address range
for unmodified pages.

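• So memory mapping lets us skip much of the cumbersome process described previously: the file itself is the backing store, so its data never has to be copied to or from the system paging file, which is typically much faster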

Memory Mapping
• In kdb+ we typically have HDB worker processes with
the tables memory-mapped

• This decreases query time significantly

• Note that in kdb+ operations on mapped data are one-directional:
• Changes made to mapped data are not persisted
• Some changes are not allowed at all
• Others are allowed, but the change is not made in the persisted image
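The "allowed, but not persisted" case can be loosely compared with a copy-on-write mapping, shown here with Python's `mmap.ACCESS_COPY`. This is an OS-level analogy chosen for illustration, not a description of how kdb+ implements mapped tables; the helper name and sample file are hypothetical.

```python
import mmap
import os
import tempfile


def change_mapped_copy(path: str, new_prefix: bytes) -> tuple[bytes, bytes]:
    """Modify a copy-on-write mapping; return (bytes seen in memory, bytes on disk)."""
    n = len(new_prefix)
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as mm:
            mm[0:n] = new_prefix  # allowed, but only this process's copy changes
            in_memory = mm[0:n]
    with open(path, "rb") as f:
        on_disk = f.read(n)  # the persisted image is untouched
    return in_memory, on_disk


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello world")
try:
    print(change_mapped_copy(tmp.name, b"HELLO"))  # (b'HELLO', b'hello')
finally:
    os.remove(tmp.name)
```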

Memory mapped Tables in kdb+
\l /db
t
/ this is unsafe as it will load all of t!
date ti p
-------------------------
2009.01.01 09:30:00 101
2009.01.01 09:31:00 102
2009.01.02 09:30:00 101.5
2009.01.02 09:31:00 102.5

`t insert (10:00:00;42.0)
'splay

`t upsert (10:00:00; 42.0)
`t
t
ti p
--------------
09:30:00 101.5
09:31:00 33.5
10:00:00 42

\l /db
t
ti p
--------------
09:30:00 101.5
09:31:00 33.5

Questions?

• Slides will be sent out after the presentation

SAP HANA
• In-memory Relational Database Management
System

• 100% ACID compliant

• Can support graph & text processing

• Takes advantage of multi-core processors and solid-state drives for best performance.

SAP HANA

• Supports both row and column oriented physical representations of relational tables

• Specify at table definition whether the new table is to be stored in row or column oriented format.

SAP HANA is an Appliance
• An appliance is the marriage of hardware and
software

• You could download SAP HANA onto your computer, but you wouldn't see much performance improvement

• Hardware needs to be tuned for the software & software adjusted for the hardware

