Availability and Operational Stability of NoSQL

posted 2 years ago in Dev Platform category by Hye Jeong Lee

What are the advantages of NoSQL compared to RDBMS? NoSQL products offer a number of advantages that RDBMS products do not, including high performance, scalability, and availability. However, no product is perfect in every aspect. When you examine NoSQL products closely, you will find weak points as well as outstanding benefits. For this reason, it is critical to use verified NoSQL products. In this article, I will analyze the distribution and availability of these products from the operational aspect. The selected targets are Cassandra, HBase and MongoDB. We have already covered these three solutions in What is NoSQL for? and NoSQL Benchmarking. You can refer to those articles for an introduction and a performance comparison.

Fail-over and Data Consistency of Cassandra

Cassandra shows excellent performance in data distribution and availability. First, I will examine its distribution capability. Cassandra distributes data using consistent hashing.

By using consistent hashing, the client can find the node where a key is stored without querying any metadata: it calculates the hash value of the key and locates the owning node from that hash value alone. One can think of consistent hashing as a series of hash values placed sequentially in a ring, with each node handling one section of the ring. If a node is added to the ring, part of the range of a specific node (one with a large amount of data) is split off and assigned to the new one. If a node is removed, its range is assigned to a neighboring node. In this way, Cassandra minimizes the number of nodes affected by adding or removing nodes.

Cassandra runs without a master server. In other words, there is no specific server that manages data distribution or failover, so Cassandra has no Single Point of Failure (SPoF). Instead of a master server, each node periodically shares metadata with the others; this is called the gossip protocol. Through gossip, a node can check whether another node is alive or dead.

Cassandra improves its availability by providing tunable consistency levels. When the level is low, there may be no service downtime even when a node is down. For example, when one of the three nodes storing a replicated key is down, a write request requiring all replicas cannot immediately return success, because the replica cannot be written to the failed node at the time of the request. However, when the consistency level is set to quorum or to 1, and at least that many replica nodes are alive, success is returned immediately. For this reason, a request error occurs only when all three nodes are down simultaneously.
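The ring behavior described above can be sketched in a few lines of Python. This is a toy model, not Cassandra's implementation (real Cassandra assigns tokens explicitly, and later versions use virtual nodes); the `HashRing` class and the node names are illustrative only.

```python
import bisect
import hashlib

def ring_hash(key: str) -> int:
    """Map a key onto the ring (a fixed integer space)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    """Toy consistent-hashing ring: each node owns the arc up to its token."""
    def __init__(self, nodes):
        self.tokens = sorted((ring_hash(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's hash to the first node token."""
        idx = bisect.bisect(self.tokens, (ring_hash(key), "")) % len(self.tokens)
        return self.tokens[idx][1]

    def add_node(self, node: str):
        bisect.insort(self.tokens, (ring_hash(node), node))

    def remove_node(self, node: str):
        self.tokens = [(t, n) for t, n in self.tokens if n != node]

ring = HashRing(["node-a", "node-b", "node-c"])
owners = {k: ring.node_for(k) for k in (f"user:{i}" for i in range(50))}
ring.add_node("node-d")
# Only the keys falling in node-d's new arc change owner; every other
# key stays where it was, which is the point of consistent hashing.
moved = [k for k in owners if ring.node_for(k) != owners[k]]
```

Note that no central metadata is consulted in `node_for`: any client holding the token list (which Cassandra nodes exchange via gossip) can route a key by itself.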
But then, is it really true that data reads and writes are not affected by failed nodes?

To prove this, I reproduced a node failure under a constant stream of service requests while adding new nodes. The results were as follows:

Removing a Node and Adding a New Node

The following is the result of removing an existing node and adding a new node.

1. When a node is explicitly removed with the management tool, the data stored in that node is migrated to the remaining nodes and then the node is removed.
2. When a new node is added, which is called bootstrapping, the added node communicates with the seed nodes to report that it has been added. Based on the configuration, the new node will bootstrap either to the range on the ring specified in the configuration, or to the range near the node with the most disk space used that does not have another node bootstrapping into it.
3. The data is migrated from that node to the new node.
4. The new node becomes available once the migration process is complete.

Adding a New Node after a Node Failure

The following is the result of adding a new node after a node failure.

1. When a node is down, the data stored in that node is not migrated to other nodes, and the service continues with two replicas. In other words, no error is returned even when service requests are received during this time.
2. When a new node is added, it is assigned to a specific area of the ring. However, bootstrapping is not performed; bootstrapping can be performed only when the number of data replicas to be migrated is three.
3. The added node has no data in it, but it handles requests because it can provide service. If a read request arrives at this time, the node returns no data for the key. If the replication factor is 3 and the read consistency level is 1, 1/3 of the read requests may return no data. And if the consistency level is set to quorum, 1/6 of the read requests may return empty data. In short, no read consistency is assured until the failure has been recovered from. At consistency level 1, the coordinating node is in fact most likely to receive the response from the new node first, because the new node performs no disk I/O (it has no data). For this reason, a new node has a higher chance of returning empty data than the existing nodes.
4. When running Read Repair on the new node using the management tool, the node is rebuilt with replica data read from the other nodes. Read consistency is broken until the Read Repair is complete.

Cassandra can provide service without error even when a node fails. Although Cassandra shows good performance when writing data, the same is not true when reading, because a prolonged Read Repair means prolonged data inconsistency. Therefore, to maintain read consistency during a node failure, the following method should be applied: set the read consistency level to 'all' and execute the read. In this case, the latest data from all replicas is obtained. If a read fails, Cassandra retries it; the Read Repair triggered by the first read can then serve as the source of the restored data for the second read. However, this method assumes that the Read Repair completes before the second read. (When the consistency level is low, the Read Repair is performed in the background, in a thread separate from the one processing the read operation.)
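The 1/3 figure for consistency level 1 can be checked with a quick simulation. This is a hypothetical sketch that assumes the coordinator takes the answer from one of the three replicas uniformly at random; as noted above, the empty node actually tends to answer first, so the real fraction of empty reads can be even higher.

```python
import random

def empty_read_fraction(replica_factor=3, empty_replicas=1,
                        trials=100_000, seed=42):
    """Estimate the share of CL=ONE reads answered by a data-less replica.

    Models a fresh node that joined without bootstrapping: it owns the
    key range but holds no data, and at consistency level ONE the
    coordinator may accept its empty response as the read result.
    Assumes a uniformly random choice of responding replica.
    """
    rng = random.Random(seed)
    empty = sum(rng.randrange(replica_factor) < empty_replicas
                for _ in range(trials))
    return empty / trials

# With RF=3 and one empty replica, about 1/3 of CL=ONE reads return empty.
fraction = empty_read_fraction()
```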

Failure Factors and Recovery Methods of HBase

HBase consists of several components, which are shown below (figure from HBase: The Definitive Guide):

HRegionServer takes care of data distribution, while HMaster monitors the HRegionServers. HDFS stores and replicates the data, and ZooKeeper keeps the location information of HMaster and elects a master. If redundancy is not established for each component, every one of these components becomes a SPoF.

HRegionServer can be described in more detail as follows. HRegionServer distributes data in a unit called a 'region.' A region is the result of dividing a big table, in which the data is stored sorted, by the sorting-key range (like a tablet in Bigtable). The key-range information of each region is stored in a separate region called the meta region, and the region that stores the location of the meta region is called the root region. In short, the region servers store a hierarchical tree consisting of the root region, meta regions, and data regions.

If a region server goes down, the regions that the failed server covered are unavailable until they are assigned to another server. Therefore, service downtime occurs until the regions are recovered.
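The root/meta/data lookup tree can be sketched as follows. This is a simplified model with illustrative region names; a real HBase client caches these locations and contacts the region server holding each region instead of calling a local function.

```python
import bisect

class RegionMap:
    """Sorted start-key -> region name: one level of HBase's lookup tree."""
    def __init__(self, regions):
        # regions: (start_key, region_name) pairs covering the key space;
        # an empty start key "" means "from the beginning of the table".
        self.starts, self.names = zip(*sorted(regions))

    def locate(self, key: str) -> str:
        """Find the region whose [start, next_start) range holds the key."""
        idx = bisect.bisect_right(self.starts, key) - 1
        return self.names[max(idx, 0)]

# Hypothetical two-level hierarchy: root region -> meta region -> data regions.
meta = RegionMap([("", "data-region-1"), ("m", "data-region-2")])
root = RegionMap([("", "meta-region-1")])

def lookup(key: str) -> str:
    meta_region = root.locate(key)  # 1. root says which meta region to ask
    # (a real client would now contact the server serving meta_region)
    return meta.locate(key)         # 2. meta says which data region holds key
```

Because the tree is just sorted key ranges, losing a region server makes exactly the ranges it served unreachable until they are reassigned, which is the downtime discussed above.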
If this is the case, then how long will this downtime be?

Let's estimate the downtime while examining the failover process.

Region Server Failure

When a failure occurs in a region server, the data is recovered through the steps described below:

1. HMaster detects the failure and directs one of the other servers to take over the service of the failed server.
2. The directed HRegionServer first reads the WAL (Write-Ahead Log) of the new region and recovers the MemStore of that region.
3. Once the MemStore is completely recovered, HMaster modifies the meta region that stores the location of the region to restart the region's service.
4. The data of the region stored on disk is recovered by HDFS.

In the end, the recovery time required is the time it takes to detect the failure, read the log, and create the new region. Since the server assigned to the recovered region can access the data file in HDFS, no data migration occurs on HDFS. Therefore, the downtime is not significantly long.
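Step 2, the WAL replay, can be sketched as follows. This is a toy model: real HBase WAL entries carry column families, timestamps and more, but the principle is the same: replaying the log in sequence order rebuilds the in-memory store the failed server lost.

```python
def recover_memstore(wal_entries):
    """Rebuild a region's MemStore by replaying its Write-Ahead Log.

    Each entry is (sequence_id, row_key, value). Replaying in sequence
    order leaves the MemStore holding the latest value per row, which is
    all a newly assigned region server needs before reopening the region.
    """
    memstore = {}
    for _, row_key, value in sorted(wal_entries):
        memstore[row_key] = value
    return memstore

# Log entries may arrive out of order; the sequence id restores order.
wal = [(1, "row1", "a"), (3, "row1", "c"), (2, "row2", "b")]
recovered = recover_memstore(wal)
```

No bulk data moves here: the replay cost is proportional to the log, not to the region's on-disk data, which is why the downtime stays short.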

HDFS Failure
HDFS consists of one name node and several data nodes. The name node stores the metadata, so when it goes down a service failure occurs. However, if one of the data nodes goes down, no service failure occurs because the data has replicas. Instead, the data stored in the failed data node is rebuilt on the other nodes to restore the replication factor to normal (recovery). During this time, a large amount of data replication may take place, slowing down read requests from the service or application, because the disk I/O for reads competes with the data replication.
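The recovery work triggered by a data-node failure can be sketched as follows. The block map and node names are illustrative; a real name node also chooses target nodes for the new copies and throttles the replication traffic.

```python
def blocks_to_rereplicate(block_locations, failed_node, target_replicas=3):
    """List blocks that lost a replica and must be copied elsewhere.

    block_locations maps block_id -> set of data nodes holding a replica.
    The copy traffic generated for these blocks is what competes with
    the disk I/O of ordinary read requests during recovery.
    """
    work = []
    for block, nodes in block_locations.items():
        live = nodes - {failed_node}
        if len(live) < target_replicas:
            work.append(block)
    return sorted(work)

blocks = {
    "blk-1": {"dn1", "dn2", "dn3"},   # loses a replica if dn3 fails
    "blk-2": {"dn1", "dn2", "dn4"},   # unaffected by dn3 failing
}
pending = blocks_to_rereplicate(blocks, failed_node="dn3")
```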

Replication and Failover of MongoDB

MongoDB asynchronously replicates data from the master to the slaves. The advantages of asynchronous replication are that it does not degrade the performance of the master, and service performance does not degrade when a slave is added. However, data may be lost when a failure occurs, because the replicas can be inconsistent. MongoDB recovers from failure in a way similar to the HA of an RDBMS: it elects a new master when a failure occurs. Let's take a look at the two scenarios below.

Node Failure
Configure three nodes with one master and two slaves, then stop the master node. One of the two slaves is automatically elected as the new master. The time it takes to elect a new master when a failure occurs is a couple of seconds, so this downtime is not long. However, once the nodes have settled into a master and a slave after the first failover, if the master goes down again, no new master is elected.
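The majority requirement behind this behavior can be sketched as follows. This is a toy model, not MongoDB's actual election protocol; it only illustrates why two live nodes out of a three-node set can elect a master while a single survivor cannot.

```python
def elect_master(nodes_up):
    """Toy election: a new master needs a strict majority of the set.

    nodes_up maps node name -> whether the node is reachable. The set
    size counts all configured members, dead or alive, which is why a
    lone survivor of a three-node set can never elect itself.
    """
    total = len(nodes_up)
    alive = [name for name, up in nodes_up.items() if up]
    if len(alive) > total / 2:
        return min(alive)  # deterministic pick stands in for priority rules
    return None            # no majority, no master

# First failure: 2 of 3 alive, a master is elected.
# Second failure: 1 of 3 alive, no majority, no master.
```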

Adding a Node
Enter data into the master. Assume the size of the data is 5 GB, which is smaller than the memory size. Then add a new slave to the master. In this case, adding the new slave does not degrade the performance of the master, and it takes several minutes for the added slave to replicate all the data. In MongoDB, the performance degradation caused by a failed or added node is minimal.

However, if a failure occurs while the replicas on the master and a slave are inconsistent, the data that has not yet been replicated to the slave may be lost. In MongoDB, the master writes its operation history to the Oplog on the local server, and the slave reads that log and applies it to its own database to replicate. If a failure occurs while the slave has not yet finished reading the master's log, the unread data is lost. In addition, if the master's log fills up and wraps around while the slave has not finished replicating its content, all the data on the master is read and stored on the slave, rather than being replicated from the log; this is called a data sync. If a failure occurs on the master in this situation, a large amount of data may be lost.
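The Oplog mechanism described above can be sketched as follows. This is a toy model assuming integer operation ids and a simple capped buffer; the real Oplog is a capped collection keyed by timestamps, but the failure mode is the same: a slave that falls off the end of the log can only recover with a full data sync.

```python
from collections import deque

class Oplog:
    """Capped operation log: the oldest entries fall off when full."""
    def __init__(self, capacity):
        self.entries = deque(maxlen=capacity)  # (op_id, operation)
        self.next_id = 0

    def append(self, op):
        self.entries.append((self.next_id, op))
        self.next_id += 1

def slave_catch_up(oplog, last_applied_id):
    """Return the ops a slave still needs, or None if it must full-sync.

    If the oldest retained op is newer than last_applied_id + 1, the
    slave has fallen off the log: the missing ops were overwritten, so
    replication from the log is impossible and all master data must be
    copied instead (the costly "data sync" case).
    """
    pending = [(i, op) for i, op in oplog.entries if i > last_applied_id]
    if pending and pending[0][0] != last_applied_id + 1:
        return None  # gap in the log -> full data sync required
    return [op for _, op in pending]

log = Oplog(capacity=3)
for op in ["w0", "w1", "w2", "w3", "w4"]:
    log.append(op)
# A slave at op 1 can still catch up; a slave at op 0 cannot.
```

If the master dies during such a full sync, everything the slave had not yet copied is gone, which is the large-loss scenario described above.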

So far, I have reviewed the failover behavior of Cassandra, HBase and MongoDB.

Cassandra offers high availability for write operations. However, it takes a long time to recover data from a failure, because Cassandra identifies all the data to recover and then reads and writes the latest version of each piece of data. Also, since it responds to service requests while the added node is still in the process of data recovery, an incorrect read result may be returned. In addition, Read Repair is executed twice when reading the data to recover. Although it provides hinted handoff for operation execution failures as well as for node failures, an incorrect result may be returned if the data to be recovered is read first. Therefore, unless the consistency level is raised, it is not suitable for services that require consistent read processing.

Because of its configuration, HBase has many factors that may cause a failure. However, while Cassandra has to recover data after a failure, HBase does not need to recover data unless the failure occurs in HDFS. This gives HBase a short downtime. The downtime during an HDFS failure is not that long either. Read performance may suffer while data is being recovered, but data consistency is maintained. In this way, higher availability is offered if the SPoF components are made redundant.

MongoDB provides automatic failover and has a short downtime. However, its asynchronous replication method may cause data loss after a failover. Thus, before choosing the database solution that will suit your purpose, you should consider these characteristics of each product. For reference, the CUBRID RDBMS provides synchronous High Availability for data consistency, which results in no data loss, though it lacks the performance of NoSQL solutions.

See also

The Story behind LINE App Development (Dev Platform, 9 months ago, by Esen Sagynov)
NoSQL Benchmarking (Dev Platform, 2 years ago, by Hye Jeong Lee)
Log Analysis System Using Hadoop and MongoDB (Dev Platform, 2 years ago, by Jeong Hyun Lee)
What is NoSQL for? (Dev Platform, 2 years ago, by Kyu Jae Lee)
Overview of New High-Availability Features in CUBRID 8.4.0 (CUBRID Life, 2 years ago, by Esen Sagynov)


