
04/03/2022

Lesson: HBase (Hadoop Database)


Datastore systems

⚫ RDBMS (Relational Database Management System), limitations:
− Increased complexity of SQL
− Sharding introduces complexity (sharding is the process of partitioning the data in a database or search engine into smaller, distinct chunks, or shards)
− Single point of failure
− More complex backups
− Added operational complexity

⚫ NoSQL:
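The sharding definition in the RDBMS list can be illustrated with a minimal Python sketch of range-based sharding, which is also the scheme HBase uses for its regions (each region holds a contiguous range of row keys). The split points and the `shard_for` and `partition` helpers are hypothetical, chosen only for illustration; hash-based sharding is an equally common alternative.

```python
from bisect import bisect_right

# Hypothetical split points: shard 0 holds keys < "g", shard 1 holds
# "g" <= key < "n", shard 2 holds "n" <= key < "t", shard 3 holds key >= "t".
SPLIT_POINTS = ["g", "n", "t"]


def shard_for(key: str, split_points: list[str] = SPLIT_POINTS) -> int:
    """Return the index of the shard whose key range contains `key`."""
    # bisect_right counts the split points that are <= key,
    # which is exactly the index of the range the key falls into.
    return bisect_right(split_points, key)


def partition(rows: dict[str, str], split_points: list[str] = SPLIT_POINTS) -> list[dict[str, str]]:
    """Divide rows into len(split_points) + 1 distinct chunks (shards)."""
    shards: list[dict[str, str]] = [{} for _ in range(len(split_points) + 1)]
    for key, value in rows.items():
        shards[shard_for(key, split_points)][key] = value
    return shards


if __name__ == "__main__":
    customers = {"alice": "Paris", "hugo": "Lyon", "nora": "Lille", "zoe": "Nice"}
    for i, shard in enumerate(partition(customers)):
        print(i, shard)
```

Every key lands in exactly one shard, so the chunks are distinct. The complexity the slide refers to lies around this function: moving data when split points change, and answering queries that span several shards.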

NoSQL datastores

− NoSQL:
⚫ Also known as "Not only SQL" or "non-relational", NoSQL was introduced specifically to handle the growth in data types, data access, and data availability needs
⚫ Why consider NoSQL?
− Flexibility
− Scalability: NoSQL datastores scale horizontally (by adding machines) rather than vertically (by upgrading a single machine)
− Availability
− Lower operational costs
− Specialized capabilities

NoSQL datastores

Four types of NoSQL datastores:

❑ Key-value stores: Memcached, Redis, and Riak
❑ Graph stores: Neo4j and Sesame
❑ Column stores: HBase, Cassandra
❑ Document stores: MongoDB, CouchDB, Cloudant, and MarkLogic


Key-Value store

A key-value system acts like a huge hash table distributed across the network.
The key uniquely identifies a piece of data and is used to access and manage it. The value can contain any type of data.

Key-value oriented storage
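A minimal Python sketch of the "hash table distributed across the network" idea, assuming the network is simulated by a list of in-process dicts (one per hypothetical node) and that keys are routed to a node by a stable hash modulo the node count. Real systems use more elaborate placement (Redis Cluster uses hash slots, Riak uses consistent hashing); the class and method names here are illustrative only.

```python
import hashlib
from typing import Any


class DistributedKeyValueStore:
    """Toy key-value store: one dict per simulated node, keys routed by hash."""

    def __init__(self, num_nodes: int) -> None:
        # Each "node" is a plain dict standing in for a server on the network.
        self.nodes: list[dict[str, Any]] = [{} for _ in range(num_nodes)]

    def _node_for(self, key: str) -> dict[str, Any]:
        # Use a stable hash (Python's built-in hash() of str is salted per process)
        # so that the same key always maps to the same node.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def put(self, key: str, value: Any) -> None:
        # The key uniquely identifies the value; writing it again overwrites it.
        self._node_for(key)[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        return self._node_for(key).get(key, default)

    def delete(self, key: str) -> None:
        self._node_for(key).pop(key, None)


if __name__ == "__main__":
    store = DistributedKeyValueStore(num_nodes=3)
    store.put("user:42", {"name": "Alice", "tags": ["admin"]})  # any value type
    store.put("counter", 7)
    print(store.get("user:42"), store.get("counter"))
```

Modulo placement keeps the sketch short, but adding a node remaps most keys; consistent hashing is the usual remedy.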


Document store
❑ It also relies on the [key, value] paradigm, where the value, called a document, has a tree structure: it is made up of a list of "field":"value" pairs.
The document format is mainly JSON or XML, and the system can interpret it.

❑ Terminology:
o collection: the equivalent of a table
o document: the equivalent of a record. However, records in the same collection do not necessarily have the same structure
Document-oriented storage
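The collection/document terminology can be sketched in Python with a list of dicts standing in for a collection of JSON documents. The `products` data and the `find` helper are hypothetical; `find` is only loosely modeled on the equality filters of MongoDB's `find`. The point is that documents in one collection can have different structures, and that because the store understands the format, it can query by field.

```python
import json

# A hypothetical "products" collection: each document is a tree of
# field/value pairs, and documents need not share the same structure.
collection = [
    {"_id": 1, "name": "skateboard", "price": 49.9, "tags": ["sport", "outdoor"]},
    {"_id": 2, "name": "helmet", "size": {"min": 54, "max": 58}},
    {"_id": 3, "name": "gloves", "price": 15.0},
]


def find(docs, **criteria):
    """Return the documents whose top-level fields equal every given criterion."""
    return [d for d in docs if all(d.get(field) == value for field, value in criteria.items())]


if __name__ == "__main__":
    print(find(collection, name="helmet"))
    # Documents serialize naturally to JSON.
    print(json.dumps(collection[1]))
```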


Graph-oriented storage

• NoSQL databases are used in particular for distributed data stores with high storage-capacity requirements.

Tech giants such as Twitter, Facebook, and Google collect several terabytes of data about their users every day.


HBase

⚫ Column-oriented
⚫ HBase is an open-source, non-relational, distributed database modeled after Google's Bigtable and written in Java
⚫ Allows random, real-time read/write access to Big Data
⚫ Appropriate for OLTP (online transaction processing)
⚫ Main purpose of HBase: reading and writing large volumes of data


Creating tables in HBase

⚫ Starting the HBase shell

> /usr/bin/hbase shell

⚫ Creating the table and its column families

create <table name>, <column family 1>, <column family 2>, ...

> create 'products', 'characteristics', 'inventory'

⚫ Adding a column with data

put <table name>, <row key>, <column family:column>, <value>

> put 'products', '123', 'characteristics:description', 'skateboard'
> put 'products', '123', 'inventory:count', '100'
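To make the shell commands above concrete, here is a hedged Python sketch of the logical model they manipulate: a map from row key to `family:qualifier` to timestamped versions. `ToyHTable` and its methods are hypothetical and are not the HBase client API (real clients use, for example, the Java `Table` interface). The sketch mirrors two behaviors of HBase: writing to an undeclared column family fails, while new column qualifiers inside a declared family can be written freely. It also keeps every version, whereas HBase keeps only as many as the column family's `VERSIONS` setting allows.

```python
import time
from collections import defaultdict
from typing import Optional


class ToyHTable:
    """Hypothetical in-memory stand-in for an HBase table (not the HBase API)."""

    def __init__(self, name: str, *families: str) -> None:
        # As with the shell's `create`, column families are fixed at creation time.
        self.name = name
        self.families = set(families)
        # row key -> "family:qualifier" -> [(timestamp, value), ...], newest first
        self.rows: dict = defaultdict(dict)

    def put(self, row: str, column: str, value: str, ts: Optional[int] = None) -> None:
        family = column.split(":", 1)[0]
        # Writing to an undeclared column family is rejected...
        if family not in self.families:
            raise ValueError(f"unknown column family: {family!r}")
        # ...but any new qualifier inside a declared family is accepted as-is.
        versions = self.rows[row].setdefault(column, [])
        versions.append((time.time_ns() if ts is None else ts, value))
        versions.sort(reverse=True)  # newest timestamp first

    def get(self, row: str) -> dict:
        """Return the latest value of every cell in `row` ({} if the row is absent)."""
        return {col: versions[0][1] for col, versions in self.rows.get(row, {}).items()}


if __name__ == "__main__":
    # Mirrors: create 'products', 'characteristics', 'inventory'
    products = ToyHTable("products", "characteristics", "inventory")
    products.put("123", "characteristics:description", "skateboard")
    products.put("123", "inventory:count", "100")
    print(products.get("123"))
```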

hbase> create 't1', 'cf1', 'cf2', 'cf3'

➔ This creates a directory for t1 (default is the namespace):
/apps/hbase/data/data/default/t1/

➔ and, inside the directory of the table's region (here 8a45456f26ee4569360c6af03e893ed6), one subdirectory per column family:

/apps/hbase/data/data/default/t1/8a45456f26ee4569360c6af03e893ed6/cf1
/apps/hbase/data/data/default/t1/8a45456f26ee4569360c6af03e893ed6/cf2
/apps/hbase/data/data/default/t1/8a45456f26ee4569360c6af03e893ed6/cf3


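The physical storage of a table

Logical view: rows ra to rf; column family CF1 holds columns C1 and C2, CF2 holds C3 and C4, CF3 holds C5 and C6.

Row key | CF1: C1  C2 | CF2: C3  C4 | CF3: C5  C6
ra      | val         |             |
rb      |             |             |
rc      |             |             |
rd      |             |             |
re      |             |             |
rf      |             |             |

Physical view: one file per column family; each entry stores the row key, family:column, version, and value, sorted by row key.

File 1 for CF1:
ra CF1:C1 version val
ra CF1:C2 version val
rb CF1:C1 version val
rb CF1:C2 version val
...
rf CF1:C2 version val

File 2 for CF2:
ra CF2:C3 version val
ra CF2:C4 version val
rb CF2:C3 version val
rb CF2:C4 version val
...
rf CF2:C4 version val

File 3 for CF3:
ra CF3:C5 version val
ra CF3:C6 version val
rb CF3:C5 version val
rb CF3:C6 version val
...
rf CF3:C6 version val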
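The per-family layout can be reproduced with a short Python sketch that takes the cells of a logical table and groups them into one sorted list per column family, each list standing in for that family's file. The cell values are made up, and real HBase store files (HFiles) add indexes, compression, and a richer key format, all of which this sketch ignores. One consequence of the layout is that a read touching only CF1 never has to open the files of CF2 or CF3.

```python
from collections import defaultdict

# Cells of a logical table: (row key, "family:column", version, value).
# The values are hypothetical placeholders.
cells = [
    ("rb", "CF2:C3", 1, "v5"),
    ("ra", "CF1:C1", 1, "v1"),
    ("ra", "CF3:C5", 1, "v3"),
    ("rb", "CF1:C2", 1, "v4"),
    ("ra", "CF2:C4", 1, "v2"),
]


def split_by_family(cells):
    """Group cells into one sorted "file" (a list) per column family."""
    files = defaultdict(list)
    for row, column, version, value in cells:
        family = column.split(":", 1)[0]
        files[family].append((row, column, version, value))
    # Within a file, entries are ordered by row key, then column;
    # versions of the same cell come newest first, hence -version.
    for entries in files.values():
        entries.sort(key=lambda e: (e[0], e[1], -e[2]))
    return dict(files)


if __name__ == "__main__":
    for family, entries in sorted(split_by_family(cells).items()):
        print(f"File for {family}:")
        for entry in entries:
            print("  ", *entry)
```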


Notion of Regions


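- There is no notion of data type (int, string, ...): values are stored as uninterpreted byte arrays
- The column family name and the column qualifier are stored in the key of every cell
- HBase has no fixed schema beyond its column families: column qualifiers do not have to be declared and can differ from row to row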
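Because HBase has no data types and repeats the family and qualifier in every key, the client must encode values itself, and long column names cost space in every cell. A minimal Python sketch, assuming big-endian 8-byte integers (the same encoding as `Bytes.toBytes(long)` in HBase's Java API) and a deliberately simplified, hypothetical key layout; `cell_key` is not HBase's real KeyValue format.

```python
import struct


def encode_long(n: int) -> bytes:
    # Big-endian 8-byte signed integer; for non-negative values,
    # byte order then matches numeric order.
    return struct.pack(">q", n)


def decode_long(b: bytes) -> int:
    return struct.unpack(">q", b)[0]


def encode_str(s: str) -> bytes:
    return s.encode("utf-8")


def cell_key(row: bytes, family: bytes, qualifier: bytes, ts: int) -> bytes:
    """Hypothetical, simplified cell key: each cell repeats row, family and qualifier.

    (HBase's real KeyValue layout also stores length prefixes and a key type.)
    """
    return row + b"\x00" + family + b":" + qualifier + struct.pack(">q", ts)


if __name__ == "__main__":
    value = encode_long(100)
    key = cell_key(encode_str("123"), b"inventory", b"count", 1_646_352_000_000)
    # The key is several times larger than the 8-byte value it identifies.
    print(len(value), len(key))
```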


ACID properties
Atomicity guarantees that each transaction is treated as a single "unit", which
either succeeds completely, or fails completely: if any of the statements
constituting a transaction fails to complete, the entire transaction fails and the
database is left unchanged.

Consistency ensures that a transaction can only bring the database from one
valid state to another, maintaining database invariants: any data written to the
database must be valid according to all defined rules.

Isolation: transactions are often executed concurrently (e.g., reading and writing to multiple tables at the same time). Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially.

Durability guarantees that once a transaction has been committed, it will remain committed even in the case of a system failure.
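Atomicity, and the role of consistency rules, can be observed with Python's built-in `sqlite3` module, used here only because it ships with Python; SQLite is a relational engine, not HBase. The account table, its data, and the `transfer` helper are hypothetical. The transfer credits the destination first, so when the debit violates the CHECK constraint, the credit already executed must be undone for the database to be left unchanged.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with two accounts (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    # The CHECK constraint is a consistency rule: a balance may never go negative.
    conn.execute(
        "CREATE TABLE account (name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as one transaction: both updates or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK failed on the debit; the credit above was rolled back too.
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM account"))


if __name__ == "__main__":
    db = make_db()
    print(transfer(db, "alice", "bob", 30), balances(db))   # succeeds
    print(transfer(db, "alice", "bob", 500), balances(db))  # fails, nothing changes
```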
