What is HBase?
The Hadoop database: a distributed, scalable, big data store.
An open-source, versioned, non-relational database.
Random, realtime read/write access to your Big Data.
Hosting of very large tables -- billions of rows x millions of columns -- atop clusters of commodity hardware.
Difference Between HBase and Hadoop/HDFS?
HDFS is a distributed file system, well suited for the storage of large files.
HBase, built on top of HDFS, provides fast record lookups (and updates) for large tables.
Features
Linear and modular scalability.
Strictly consistent reads and writes.
Automatic and configurable sharding of tables.
Automatic failover support between RegionServers.
Easy-to-use Java API for client access.
Block cache and Bloom filters for realtime queries.
HBase Overview
HBase Data Model
The table is lexicographically sorted on rowkeys. A cell can hold multiple timestamped versions; here Alice's password has two versions, at ts1 = 1 and ts2 = 2.

Rowkey | Email                  | Age | Password
-------+------------------------+-----+-------------------------------------------
Alice  | alice@wonderland.com   | 23  | newpassword (ts2 = 2), trickedyou (ts1 = 1)
Bob    | bob@myworld.com        | 25  | Iambob
Eve    | hithere@getintouch.com | 30  | nice1pass
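The data model above can be sketched as a sorted, versioned map: each value is addressed by (rowkey, column family, qualifier, timestamp). This is an illustrative Python model, not HBase code; the class and method names are made up.

```python
# Sketch of the HBase logical data model: a sorted, multi-versioned map.
class SketchTable:
    def __init__(self):
        # (rowkey, family, qualifier) -> {timestamp: value}
        self.cells = {}

    def put(self, row, family, qualifier, value, ts):
        self.cells.setdefault((row, family, qualifier), {})[ts] = value

    def get(self, row, family, qualifier):
        versions = self.cells.get((row, family, qualifier), {})
        if not versions:
            return None
        return versions[max(versions)]  # the newest timestamp wins

    def scan(self):
        # Rows come back in lexicographic order of the rowkey,
        # mirroring HBase's sort-on-rowkey guarantee.
        return sorted({row for (row, _, _) in self.cells})

t = SketchTable()
t.put("Alice", "cf", "password", "trickedyou", ts=1)
t.put("Alice", "cf", "password", "newpassword", ts=2)
t.put("Eve", "cf", "age", "30", ts=1)
t.put("Bob", "cf", "age", "25", ts=1)
print(t.get("Alice", "cf", "password"))  # 'newpassword'
print(t.scan())                          # ['Alice', 'Bob', 'Eve']
```

A get without an explicit timestamp returns the newest version, which is why Alice's password reads as "newpassword" even though "trickedyou" is still stored.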
[Figure: region lookup flow. The example table T1 maps names to roles (Andy: Arch, Brad: Arch, Dheeraj: Ops, Eleanor: PgM, Francis: Dev, Govind: Dev, Rajiv: Ops, Sumeet: PM, Vandana: Dev) and is split into regions T1R1, T1R2, and T1R3 hosted on RegionServers RS1 and RS2. To find Sumeet's role with HBase, the client first contacts ZooKeeper, a separate cluster of ZK nodes, to retrieve the RegionServer hosting the -ROOT- region. -ROOT- points to the meta regions (M1, M2), and meta maps each table region to its hosting RegionServer. The client then reads the row directly from the RegionServer hosting the region that contains "Sumeet" (here RS2).]
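Once the catalog has been consulted, finding the right region is a search through sorted region start keys: each region covers [start_key, next_start_key). A hedged sketch of that last step, using made-up region names and assignments:

```python
import bisect

def locate(rowkey, region_map):
    """Find the region holding rowkey.
    region_map: list of (start_key, region_name), sorted by start_key.
    The answer is the region with the greatest start key <= rowkey."""
    starts = [start for start, _ in region_map]
    idx = bisect.bisect_right(starts, rowkey) - 1
    return region_map[idx][1]

# Hypothetical split of table T1 into three regions (boundaries invented):
t1_regions = [("", "T1R1@RS1"), ("Eleanor", "T1R2@RS2"), ("Rajiv", "T1R3@RS2")]
print(locate("Sumeet", t1_regions))  # 'T1R3@RS2'
print(locate("Andy", t1_regions))    # 'T1R1@RS1'
```

Clients cache these region locations, so ZooKeeper and the catalog regions are not consulted on every request.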
Master Server
The Master server is responsible for monitoring all RegionServer instances in the cluster, and it is the interface for all metadata changes.
Startup Behavior
If run in a multi-Master environment, all Masters compete to run the cluster.
If the active Master loses its lease in ZooKeeper (or the Master shuts down), then
the remaining Masters jostle to take over the Master role.
Runtime Impact
Because clients talk directly to the RegionServers, the cluster can still function in a "steady state" if the Master goes down.
However, the Master controls critical functions such as RegionServer failover and completing region splits. So while the cluster can still run for a short time without the Master, the Master should be restarted as soon as possible.
Processes
The Master runs several background threads:
LoadBalancer: Periodically, and when there are no regions in transition, the load balancer runs and moves regions around to balance the cluster's load.
CatalogJanitor: Periodically checks and cleans up the hbase:meta table. See <<arch.catalog.meta>> in the HBase Reference Guide for more information on the meta table.
HBase High-level Architecture
Source: http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html
Store
MemStore
The MemStore holds in-memory modifications to the Store.
When a flush is requested, the current MemStore is moved to a snapshot and a new, empty MemStore takes its place.
HBase continues to serve edits from the new MemStore and the backing snapshot until the flusher reports that the flush succeeded.
At that point, the snapshot is discarded. Note that when a flush happens, all MemStores that belong to the same region are flushed together.
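The snapshot-and-flush sequence above can be sketched in a few lines. This is an illustrative model, not HBase internals; the class and method names are invented, and it assumes at most one flush is in progress at a time.

```python
# Sketch of the MemStore flush dance described above.
class MemStoreSketch:
    def __init__(self):
        self.active = {}      # current in-memory edits: key -> value
        self.snapshot = {}    # edits frozen while a flush is in progress
        self.storefiles = []  # "on-disk" files produced by finished flushes

    def put(self, key, value):
        self.active[key] = value

    def get(self, key):
        # Reads check the active MemStore, then the backing snapshot
        # (until the flush completes), then flushed StoreFiles.
        if key in self.active:
            return self.active[key]
        if key in self.snapshot:
            return self.snapshot[key]
        for sf in reversed(self.storefiles):
            if key in sf:
                return sf[key]
        return None

    def start_flush(self):
        # Current MemStore becomes the snapshot; a fresh one takes over.
        self.snapshot, self.active = self.active, {}

    def finish_flush(self):
        # The flusher reports success: persist, then discard the snapshot.
        self.storefiles.append(dict(self.snapshot))
        self.snapshot = {}

m = MemStoreSketch()
m.put("row1", "a")
m.start_flush()           # "row1" moves into the snapshot
m.put("row2", "b")        # new edits land in the fresh MemStore
print(m.get("row1"))      # 'a' -- still readable from the snapshot
m.finish_flush()
print(len(m.storefiles))  # 1
```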
MemStore Flush
A flush is triggered:
When a MemStore reaches the size specified by hbase.hregion.memstore.flush.size.
When overall MemStore usage reaches the value specified by hbase.regionserver.global.memstore.upperLimit.
When the number of WAL log entries in a given region server's WAL reaches the value specified in hbase.regionserver.max.logs. In this case the flush order is based on time, oldest MemStores first.
Compactions
Minor Compactions
Select a small number of small, adjacent StoreFiles and rewrite them as a single StoreFile.
Do not drop (filter out) deletes or expired versions, because of potential side effects.
The end result of a minor compaction is fewer, larger StoreFiles for a given Store.
Major Compaction
The end result of a major compaction is a single StoreFile per Store.
Major compactions also process delete markers and remove versions beyond the configured maximum.
Major Compactions Can Impact Performance
Major compactions improve read performance.
On a highly loaded system, however, a running major compaction can adversely affect performance.
Default: run once in a 7-day period. This is sometimes inappropriate for systems in production, where operators often prefer to trigger major compactions manually during off-peak hours.
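The minor/major distinction can be sketched as follows. This is a simplified illustrative model (invented names; a tombstone here masks every version of its cell, whereas real HBase masks by timestamp):

```python
# Sketch of compaction semantics. A StoreFile is modeled as a dict:
# (row, qualifier, timestamp) -> value, or TOMBSTONE for a delete marker.
TOMBSTONE = object()

def merge(storefiles):
    merged = {}
    for sf in storefiles:  # later files win on identical coordinates
        merged.update(sf)
    return merged

def minor_compact(storefiles):
    # Rewrites several small files as one; keeps deletes and old versions.
    return merge(storefiles)

def major_compact(storefiles, max_versions=1):
    merged = merge(storefiles)
    # Drop tombstones and the dead values they mask (simplified: all
    # versions of a deleted cell are masked).
    deleted = {(r, q) for (r, q, ts), v in merged.items() if v is TOMBSTONE}
    live = {k: v for k, v in merged.items()
            if v is not TOMBSTONE and (k[0], k[1]) not in deleted}
    # Keep only the newest max_versions per (row, qualifier).
    by_cell = {}
    for (r, q, ts), v in live.items():
        by_cell.setdefault((r, q), []).append((ts, v))
    out = {}
    for (r, q), versions in by_cell.items():
        for ts, v in sorted(versions, reverse=True)[:max_versions]:
            out[(r, q, ts)] = v
    return out

sf1 = {("rowA", "col", 1): "old", ("rowA", "col", 2): "new"}
sf2 = {("rowB", "col", 3): TOMBSTONE}
result = major_compact([sf1, sf2], max_versions=1)
print(result)  # {('rowA', 'col', 2): 'new'} -- excess version and delete gone
```

A minor compaction of the same two files would keep all four entries, tombstone included, which is why only major compactions reclaim the space held by deleted and expired data.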
HBase Operations
get(<ROW>)
put(<ROW>, Map<KEY,VALUE>)
scan(<TABLE>)
delete(<ROW>)
increment()
Check the HTable class for further details on these operations.
Caution:
No queries (no SQL; access is by rowkey get or scan only)
No secondary indexes
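The semantics of this operation set can be sketched with an in-memory model. This is not the real HTable API; the class and signatures are invented for illustration.

```python
# Sketch of the basic HBase operation set (semantics only).
class OpsSketch:
    def __init__(self):
        self.rows = {}  # rowkey -> {qualifier: value}

    def put(self, row, kv):  # put(<ROW>, Map<KEY,VALUE>)
        self.rows.setdefault(row, {}).update(kv)

    def get(self, row):      # get(<ROW>)
        return self.rows.get(row, {})

    def delete(self, row):   # delete(<ROW>)
        self.rows.pop(row, None)

    def scan(self, start="", stop=None):
        # A range scan in lexicographic rowkey order -- the closest
        # thing to a "query" HBase offers without secondary indexes.
        for row in sorted(self.rows):
            if row >= start and (stop is None or row < stop):
                yield row, self.rows[row]

    def increment(self, row, qualifier, amount=1):
        # Models the atomic counter increment.
        cur = self.rows.setdefault(row, {}).get(qualifier, 0)
        self.rows[row][qualifier] = cur + amount
        return self.rows[row][qualifier]

t = OpsSketch()
t.put("user1", {"name": "Alice"})
t.increment("user1", "visits")
t.increment("user1", "visits")
print(t.get("user1"))  # {'name': 'Alice', 'visits': 2}
```

Note what is missing: there is no way to ask "find rows where visits > 1" without scanning, which is exactly the "no queries, no secondary indexes" caution above.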
Namespace Management
A namespace can be created, removed, or altered.
Namespace membership is determined during table creation by specifying a fully-qualified table name of the form <namespace>:<table>.
Predefined namespaces
There are two predefined special namespaces:
hbase - system namespace, used to contain HBase internal tables
default - tables with no explicitly specified namespace automatically fall into this namespace
Examples
#namespace=foo and table qualifier=bar
create 'foo:bar', 'fam'
#namespace=default and table qualifier=bar
create 'bar', 'fam'
Current Limitations
Deletes Mask Puts
A delete masks puts, even puts that happen after the delete was entered, if the put's timestamp is at or before the delete's timestamp.
Deletes are handled by writing new markers called tombstones. These tombstones, along with the dead values, are cleaned up on major compactions.
These issues should not be a problem if you always use increasing versions for new puts to a row.
Example
A delete of everything <= T, followed by a put with a timestamp <= T: the put is masked by the tombstone even though it was issued later.
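The masking rule can be made concrete with a tiny sketch (illustrative function, not HBase code):

```python
# Sketch of timestamp-based masking: a delete at time T masks any put
# with timestamp <= T, even if the put is issued AFTER the delete.
def visible_versions(puts, delete_ts):
    """puts: {timestamp: value}; delete_ts: tombstone timestamp or None."""
    if delete_ts is None:
        return dict(puts)
    return {ts: v for ts, v in puts.items() if ts > delete_ts}

delete_ts = 100                 # delete of everything at or before T=100
puts = {90: "late put"}         # issued after the delete, but ts <= 100
print(visible_versions(puts, delete_ts))   # {} -- the put is masked
print(visible_versions({150: "ok"}, 100))  # {150: 'ok'} -- newer ts survives
```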
Major Compactions Change Query Results
Create three cell versions at t1, t2, and t3, with a maximum-versions setting of 2. When getting all versions, only the values at t2 and t3 are returned. But if you delete the version at t2 or t3, the one at t1 appears again. Once a major compaction has run, the excess version at t1 is physically removed, so this behavior no longer occurs.
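The t1/t2/t3 scenario above, as a simplified one-cell model (illustrative only):

```python
# Sketch of the max-versions behavior: before a major compaction, excess
# old versions still exist on disk; a get merely hides them.
MAX_VERSIONS = 2

def visible(cell):
    """cell: {timestamp: value}. A get returns only the newest
    MAX_VERSIONS timestamps (returned here in ascending order)."""
    return sorted(sorted(cell, reverse=True)[:MAX_VERSIONS])

cell = {1: "v1", 2: "v2", 3: "v3"}  # versions written at t1, t2, t3
print(visible(cell))                # [2, 3] -- t1 exists but is hidden

del cell[3]                         # delete t3 before any major compaction
print(visible(cell))                # [1, 2] -- t1 "reappears"

# A major compaction physically removes versions beyond MAX_VERSIONS, so
# after one the cell would hold only {2: 'v2', 3: 'v3'}; deleting t3 then
# leaves just t2, and t1 cannot come back.
```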
Column Metadata
HBase stores no column metadata outside of the internal KeyValue instances for a ColumnFamily.
The only way to get a complete set of the columns that exist in a ColumnFamily is to process all the rows.
Joins
HBase does not support joins. Instead:
Denormalize the data when writing to HBase, or
Keep lookup tables and perform the join between HBase tables in your application.
Column Families
HBase currently does not do well with anything above two or three
column.
If one column family is carrying the bulk of the data bringing on
flushes, the adjacent families will also be flushed even though the
amount of data they carry is small.
When many column families exist, flushing and compaction
interaction can make for a bunch of needless i/o. Try to make do
with one column family if you can in your schemas. Only introduce a
second and third column family in the case where data access is
usually column scoped
If ColumnFamilyA has 1 million rows and ColumnFamilyB has 1 billion
rows, ColumnFamilyAs data will likely be spread across many, many
regions (and RegionServers). This makes mass scans for
ColumnFamilyA less efficient.
Rowkey Design
Hotspotting
Hotspotting occurs when a large amount of client traffic is directed at one node, or only a few nodes, of a cluster.
This can also have adverse effects on other regions hosted by the same RegionServer, as that host is unable to service the requested load.
Salting
Add a randomly-assigned prefix to the rowkey.
Increases throughput on writes, but has a cost during reads: every read must fan out across all possible prefixes.
Hashing
Use a one-way hash of the rowkey, which preserves predictability during reads.
A client that knows the original key can reconstruct the complete rowkey and retrieve the row as normal.
Reversing the Key
Reverse a fixed-width or numeric rowkey so that the part that changes most often (the least significant digits) comes first.
This effectively randomizes the rowkeys, but sacrifices row-ordering properties.
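The three transforms can be sketched as follows. Bucket count, key formats, and the hash choice are made-up for illustration:

```python
import hashlib

NUM_BUCKETS = 4  # assumed bucket count for salting

def salted(rowkey):
    # Salting: a prefix spreads writes over NUM_BUCKETS key ranges.
    # (Deterministic here for testability; real salts may be random,
    # forcing reads to fan out over every prefix.)
    bucket = sum(rowkey.encode()) % NUM_BUCKETS
    return f"{bucket}-{rowkey}"

def hashed(rowkey):
    # Hashing: one-way but deterministic, so a reader who knows the
    # original key can recompute the stored key.
    prefix = hashlib.md5(rowkey.encode()).hexdigest()[:8]
    return f"{prefix}-{rowkey}"

def reversed_key(rowkey):
    # Reversing: the fast-changing least-significant digits come first.
    return rowkey[::-1]

print(hashed("user42") == hashed("user42"))  # True -- predictable on read
print(reversed_key("20240101"))              # '10104202'
```

The tradeoff is the same in all three: writes scatter across regions, but sequential scans over the natural key order are lost.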
Optimizations
Monotonically Increasing Row Keys / Timeseries Data
With monotonically increasing keys, all clients pound one of the table's regions (and thus a single node), then the next, and so on.
Solution: randomize the input records.
Avoid using a raw timestamp or a sequence (e.g., 1, 2, 3) as the rowkey.
A composite key such as [metric_type][event_timestamp] works well if there are dozens or hundreds (or more) of different metric types to spread the load across.
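A minimal sketch of such a composite key (the separator and zero-padding scheme are assumptions, not an HBase convention):

```python
# Sketch: composite rowkey [metric_type][event_timestamp].
# Zero-padding the timestamp keeps string order == chronological order.
def rowkey(metric_type, event_ts_millis):
    return f"{metric_type}:{event_ts_millis:013d}"

keys = [rowkey("cpu", 1700000000000),
        rowkey("mem", 1600000000000),
        rowkey("cpu", 1600000000000)]
print(sorted(keys))
# ['cpu:1600000000000', 'cpu:1700000000000', 'mem:1600000000000']
# -- grouped by metric type, chronological within each type, so concurrent
#    writers for different metrics land in different key ranges.
```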
Minimize Row and Column Sizes
If your rowkeys and column names are large, especially compared to the size of the cell value, you may run up against some interesting scenarios.
Whatever patterns you select for ColumnFamilies, attributes, and rowkeys will be repeated several billion times in your data, and large keys also inflate the StoreFile indices; compression does not eliminate this overhead.
Shorter Column Families
Preferably one character (e.g., "d" for data/default).
Shorter Attributes
Although verbose attribute names (e.g., "myVeryImportantAttribute") are easier to read, prefer shorter attribute names (e.g., "via") for storage in HBase.
Rowkey Length
Keep rowkeys as short as is reasonable while still being useful for the required data access patterns (e.g., Get vs. Scan).
Expect tradeoffs when designing rowkeys.
HBase Fits When
You have enough hardware: HBase wants a cluster of commodity machines.
You use a variable schema, where each row is slightly different.
Your schema is sparse: most of the columns are NULL in each row (e.g., a web map).
HBase Don'ts
Don't try to use HBase as a MySQL replacement.
Don't use it when you ONLY do large batch processing (raw HDFS is usually best for that).
Batch jobs may lose some data locality if a major compaction has not recently run.
Questions?