
2010 Second WRI World Congress on Software Engineering

Google Cloud Computing Platform Technology Architecture and the Impact of Its Cost
JIA Xiaojing
Central University of Finance and Economics, Beijing, China, 100081

Abstract—This paper compares the technology architecture of the Google cloud computing platform with that of traditional IT systems, and argues that the key to the platform's extremely low cost is the application of a "top-down" design method to infrastructure construction.

Keywords—cloud computing; cost; technology architecture


I. INTRODUCTION

Without doubt, cloud computing was the most popular topic in the IT industry in 2009. Google, Amazon, Yahoo and other Internet service providers, as well as IBM, Microsoft and other IT companies, have proposed their own cloud computing strategies, and telecom operators have also paid great attention to cloud computing. The extremely low cost of the Google cloud computing platform has drawn particular attention: Google claims that, thanks to cloud computing, its computational cost is only 1/100 of its competitors' and its storage cost only 1/30. If this is true, how did Google do it? This paper compares the technology architecture of the Google cloud computing platform with that of traditional IT systems, analyzes in depth the key technologies behind the platform's extremely low cost, and identifies the fundamental reasons for its low computing and storage costs.

II. THE KEY TECHNOLOGIES OF THE GOOGLE CLOUD COMPUTING PLATFORM

Figure 1. The technology architecture of the Google cloud computing platform

The Google cloud computing platform is built on clusters of large numbers of x86 servers, with the node as the basic processing unit. Its overall technology architecture is shown in Fig. 1. Apart from a few nodes devoted to specific management functions (such as the GFS Master, Chubby and the Scheduler), all nodes in the architecture are homogeneous: each runs the Big Table Server, GFS chunkserver and Map Reduce Job core function modules at the same time, corresponding to the three key technologies of data storage, data management and the programming model. These three technologies are therefore the focus of this paper.

A. Data Storage Technology

The Web search business requires massive data storage that is at the same time highly available, highly reliable and economical. Google therefore developed GFS, a distributed file system based on several assumptions: hardware failures are the norm; the system must support very large data sets; the dominant processing mode is write-once, read-many; and concurrency is high.

GFS consists of one Master and a large number of chunk servers, as shown in Fig. 2. The Master stores all the metadata of the file system, including the namespace, access control information, file-to-chunk mapping and chunk location information. GFS files are cut into 64 MB chunks for storage. To ensure data reliability, GFS uses redundant storage: each piece of data is saved as at least three copies, two on different nodes of the same rack, to take full advantage of the bandwidth within the rack, and the third on a node of a different rack, to survive a whole-rack failure.

To prevent a large number of read operations from making the Master the bottleneck of the system, the client does not read data through the Master; after obtaining the location of the target chunk from the Master, it interacts with the chunk server directly. GFS write operations separate the control flow from the data flow: after obtaining write authorization from the Master, the client pushes the data to all replicas; once every replica has received the data, the client sends the write control signal to the primary replica, which applies the update, forwards the control signal to the other replicas, and acknowledges the write after all replicas have updated the data.
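To make the read path and the replica placement concrete, here is a minimal, illustrative sketch in Python. It is not Google's (unpublished) interface; all class and function names are hypothetical, and real chunkservers are of course remote processes rather than local objects.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # GFS files are cut into 64 MB chunks


class Master:
    """Metadata only: maps (file, chunk index) to chunkserver addresses."""

    def __init__(self, chunk_table):
        self.chunk_table = chunk_table

    def locate(self, file_name, offset):
        return self.chunk_table[(file_name, offset // CHUNK_SIZE)]


class ChunkServer:
    """Stores raw chunk bytes and serves reads directly to clients."""

    def __init__(self, chunks):
        self.chunks = chunks  # (file, chunk index) -> bytes

    def read(self, file_name, chunk_index, start, length):
        return self.chunks[(file_name, chunk_index)][start:start + length]


def place_replicas(same_rack_nodes, other_rack_nodes):
    """Two replicas on different nodes of one rack (cheap in-rack
    bandwidth) plus one on another rack (survives a rack failure)."""
    return same_rack_nodes[:2] + other_rack_nodes[:1]


def gfs_read(master, servers, file_name, offset, length):
    # Step 1: ask the master only for replica locations (metadata).
    addresses = master.locate(file_name, offset)
    # Step 2: fetch the bytes directly from a chunkserver, bypassing
    # the master so it never becomes an I/O bottleneck.
    server = servers[addresses[0]]
    return server.read(file_name, offset // CHUNK_SIZE,
                       offset % CHUNK_SIZE, length)


# Toy usage: one chunk, three replica addresses, one visible server.
cs = ChunkServer({("web/index.html", 0): b"<html>...</html>"})
master = Master({("web/index.html", 0): place_replicas(
    ["cs-17", "cs-18", "cs-40"], ["cs-91", "cs-92"])})
print(gfs_read(master, {"cs-17": cs}, "web/index.html", 0, 6))  # b'<html>'
```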

Funded projects: National Natural Science Foundation (70801067); Humanities and Social Science Fund Project under the Ministry of Education of the PRC (07JC630052); the Special Project for Youth under the Ministry of Education of the PRC (EFA080250).

978-0-7695-4303-1/10 $26.00 © 2010 IEEE    DOI 10.1109/WCSE.2010.93


Through the co-design of server and client, GFS achieves optimized performance and availability for the applications it supports. A number of GFS clusters are deployed in the Google cloud computing platform; some clusters have more than 1,000 storage nodes and more than 300 TB of disk space, and are accessed frequently and continuously by hundreds of clients on different machines.
Figure 3. Logical architecture of Big Table

Figure 2. System structure of GFS

B. Data Management Technology

Google has developed Big Table, a large-scale database system with weak consistency requirements. Big Table is optimized for data reads and adopts column-based distributed data storage management to improve data access efficiency. Its basic elements are the row, the column, the record tablet and the timestamp. Among them, the record tablet is a collection of rows, as shown in Fig. 3. Data items in Big Table are sorted in lexicographic order of row key, and each row is assigned dynamically to a record tablet; each tablet server node is responsible for managing about 100 record tablets. The timestamp is a 64-bit integer that distinguishes different versions of the same data. A column family is a collection of several columns, and Big Table access permissions are controlled at column-family granularity.

The Big Table system depends on the underlying cluster infrastructure: a distributed cluster task scheduler, the GFS file system described above, and the distributed lock service Chubby, as shown in Fig. 4. Big Table uses Chubby, a highly robust coarse-grained lock service, to store the pointer to the Root Tablet, and uses one server as the master server to maintain and operate on the metadata. When a client reads data, it first obtains the location of the Root Tablet from the Chubby server; from the Root Tablet it reads the location of the Metadata Tablet; from the Metadata Tablet it reads the location information of the User Table containing the target data; and finally it reads the target data item from the User Table. The Big Table main server not only manages the metadata but is also responsible for remotely managing and allocating the tablet servers. The client communicates with the main server through the programming interface for control traffic, i.e. to obtain metadata, and communicates directly, for data traffic, with the tablet server that handles the specific read and write requests.
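The following toy model illustrates the elements just listed: rows kept sorted by key, cells addressed by (column family, qualifier), and timestamped versions. It is a sketch under the assumption that an in-memory dictionary per tablet is an acceptable stand-in for Big Table's storage machinery; the names are hypothetical, not Google's API.

```python
import bisect


class Tablet:
    """A contiguous, sorted range of rows; each tablet server manages
    roughly 100 of these (illustrative stand-in, not Google's code)."""

    def __init__(self):
        self.row_keys = []  # kept sorted: Big Table orders rows by key
        self.cells = {}     # row -> {(family, qualifier): [(ts, value), ...]}

    def write(self, row, family, qualifier, timestamp, value):
        if row not in self.cells:
            bisect.insort(self.row_keys, row)
            self.cells[row] = {}
        versions = self.cells[row].setdefault((family, qualifier), [])
        versions.append((timestamp, value))  # ts is a 64-bit int in Big Table
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest first

    def read(self, row, family, qualifier):
        """Return (timestamp, value) of the newest version of one cell."""
        return self.cells[row][(family, qualifier)][0]


t = Tablet()
t.write("com.example.www", "anchor", "text", 1, b"old link text")
t.write("com.example.www", "anchor", "text", 2, b"new link text")
print(t.read("com.example.www", "anchor", "text"))  # (2, b'new link text')
```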

Figure 4. Big Table storage services architecture
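As a companion to Fig. 4, this schematic traces the three-step location lookup described above. Plain dictionaries stand in for Chubby and the tablet servers; the point is the chain of hops, and all server names are invented for illustration.

```python
chubby = {"root_tablet": "tabletserver-1"}

root_tablet = {           # served by tabletserver-1
    "metadata_tablet": "tabletserver-2",
}

metadata_tablet = {       # served by tabletserver-2
    "user_table": "tabletserver-3",
}


def locate_user_data():
    hops = [chubby["root_tablet"]]                 # 1. Chubby -> Root Tablet
    hops.append(root_tablet["metadata_tablet"])    # 2. Root -> Metadata Tablet
    hops.append(metadata_tablet["user_table"])     # 3. Metadata -> User Table
    return hops  # real clients cache these hops to skip steps 1 and 2


print(locate_user_data())
# ['tabletserver-1', 'tabletserver-2', 'tabletserver-3']
```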

C. Programming Model

Google constructed the Map Reduce programming framework to support parallel computing. Map Reduce is not only a programming model for processing and generating large data sets but also an efficient task scheduling model. Two simple operations, Map and Reduce, constitute its basic unit of computation. The programmer can complete the development of a distributed parallel program simply by specifying, in the Map function, how each block of input data is processed and, in the Reduce function, how the intermediate results of the block processing are merged. When a Map Reduce program runs on the cluster, the programmer need not care how the input data is partitioned, allocated or scheduled; the system also handles node failures within the cluster and communication between nodes.

The execution of a Map Reduce program involves five steps: the input file is read; the file blocks are assigned to many Map workers in parallel; the intermediate files are written (locally); many Reduce workers run; and the final result is output. Writing intermediate files locally reduces the pressure on network bandwidth and the time spent writing them. When a Reduce worker runs, it fetches the data it needs from the nodes holding the intermediate files via remote procedure calls, according to the intermediate-file location information obtained from the Master.
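A minimal word count in this style, offered as a sketch: the real framework shards the input across machines, writes intermediate files locally and handles worker failures, while this single-process version shows only the two user-supplied functions and the grouping of intermediate results by key.

```python
from collections import defaultdict


def map_fn(document):
    """User-supplied Map: emit (key, value) pairs for one input block."""
    for word in document.split():
        yield word, 1


def reduce_fn(word, counts):
    """User-supplied Reduce: merge all intermediate values for one key."""
    return word, sum(counts)


def run_mapreduce(documents):
    # The framework's job: run maps, group intermediate pairs by key
    # (the "shuffle"), then run reduces. Here it is all one process;
    # in the real system these are parallel workers on many nodes.
    intermediate = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            intermediate[key].append(value)
    return dict(reduce_fn(k, v) for k, v in intermediate.items())


print(run_mapreduce(["the cloud", "the cluster", "the cloud platform"]))
# {'the': 3, 'cloud': 2, 'cluster': 1, 'platform': 1}
```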
III. THE DIFFERENCES BETWEEN THE GOOGLE CLOUD COMPUTING PLATFORM AND TRADITIONAL IT SYSTEMS

Traditional IT systems, especially large-scale ones, are almost all built on clusters of high-performance UNIX servers. Their architecture has gone through the development stages of host/terminal and client/server; with the development of the Internet, the three-tier Browser/Server architecture is now used by most mainstream IT systems. The differences between the Google cloud computing platform and traditional IT systems are analyzed below from three aspects: data storage, data management and the programming framework.


A. Data Storage Technology

For data storage, reliability, I/O throughput and scalability are the core technical indicators. Data storage technologies in traditional IT systems include direct-attached storage (DAS), network-attached storage (NAS) and the storage area network (SAN). In the Google cloud computing platform, each node is an inexpensive x86 server; every node performs computation while also managing its locally stored data through the GFS chunk server, so computation and data are kept together. GFS renounces RAID technology in favor of the simple redundant storage method described above, which not only meets the reliability requirement but also effectively improves read performance. To limit the processing load of a single node, the amount of data managed by a single Google node is generally kept below 1 TB; the requirement of massive data storage is satisfied by having a large number of nodes operate in parallel.

B. Data Management Technology

Traditional IT systems adopt centralized data storage and realize centralized data management with a relational database management system (RDBMS). To prevent the database server from becoming the bottleneck of system performance, they mainly apply data caching, indexing and data partitioning techniques. Nevertheless, these technologies cannot play their full role for the full-text search over massive data required by Web search and the other applications carried by the cloud computing platform. Big Table therefore designs a simplified table structure tailored to the high proportion of read operations in its applications and adopts column-based distributed data storage management, which satisfies the requirements of massive data management, high concurrency and stringent response times very well.

Traditional IT systems generally divide tasks across server clusters to reduce the database server's burden and improve overall system performance. For example, in the B/S structure, the Web server receives user requests from the browser, the application server calls the appropriate business logic components to complete the processing, and the database server only implements the database query, modification and update functions. For the Google cloud computing platform, however, the traditional B/S structure cannot significantly reduce the database server load, because applications such as Web search are dominated by simple read operations and contain little complex business logic. Google therefore introduces parallel computing into the database system: data is scattered over a large number of

completely homogeneous nodes, and data management services are provided by the tablet servers, which distribute the processing load evenly across the nodes. This greatly improves the performance of the database system.

C. Programming Framework

In traditional IT systems, concurrency, through techniques such as multi-processing and multi-threading, is the common programming framework used to exploit the advantages of the multi-tasking UNIX operating system and improve processing performance. Compared with Google's Map Reduce programming framework, the main differences are as follows. In the conventional parallel execution mode, data is centrally managed (usually by an RDBMS); each application can directly manipulate the data in the database, and the database system is responsible for ensuring the consistency and integrity of the data. In the Map Reduce model, data is managed dispersedly by the individual nodes; there is no separate, centralized database system, each node can only operate on the data it manages, and the intervention of upper-layer application software is needed to ensure the consistency and integrity of cross-node data. The Map Reduce mode also adds processing links: the Map step that decomposes the task, the Reduce step that merges the results, parallel processing across multiple worker nodes, failure handling for worker nodes, and coordination and communication between the worker nodes. The system processing load therefore increases, but this is well worth it considering the enormous advantages that large-scale parallel processing brings.

In fact, the Map Reduce programming model is applied not only to the Google cloud computing platform but also to multi-core and multi-processor systems, the Cell processor and heterogeneous clusters. However, the model is only suitable for writing loosely coupled tasks, that is, programs with a high degree of parallelism. The future of the Map Reduce programming model lies in improving it so that programmers can easily write tightly coupled programs and tasks are scheduled and executed efficiently at run time. Meanwhile, the development tools around Hadoop, the open-source implementation of Map Reduce, still need to be improved and enhanced; in particular, the scheduling algorithm is too simple, and the algorithm that decides whether a task needs speculative execution causes too many tasks to be speculated, reducing overall system performance.
IV. COST ANALYSIS OF THE GOOGLE CLOUD COMPUTING PLATFORM

The unique technology architecture of the Google cloud computing platform has a profound impact on its overall cost, mainly in the following respects:


- Due to distributed data storage and data management, Google reduces the capacity requirements on each single node. It does not need to buy expensive UNIX servers and SAN storage devices, but builds its server clusters from low-cost consumer-grade x86 chipsets and built-in hard drives, significantly reducing construction investment. According to an estimate by the China Mobile Research Institute, its equipment investment is only 1/6 of that of a UNIX platform of the same processing capacity.

- Apart from a small number of management nodes, all nodes of the Google cloud computing platform are homogeneous and jointly perform the functions of data storage, data management and task management. It is therefore easy to standardize the equipment, purchase tailored computer motherboards in bulk, and cut all components unrelated to computation (such as monitors, peripheral interfaces and even the chassis shell), further reducing the investment in equipment.

- The Google cloud computing platform takes hardware failure as the norm and achieves high availability through software fault tolerance, switching automatically between nodes, which significantly reduces equipment redundancy. For example, supposing that the reliability of a single node is 95%, a traditional IT system in 1+1 backup mode reaches a system reliability of 99.75% at the price of 100% equipment redundancy, whereas the Google cloud computing platform achieves the same level of reliability with only about 12% equipment redundancy on 100 nodes, decreasing equipment investment by 44% (see the sketch after this list). The reduction of redundant equipment also improves equipment utilization: Google claims that its equipment utilization rate can reach 280% of that of typical enterprises.

- Building on the unique advantages of parallel computing, Google has developed excellent load balancing technology that ensures business continuity by dynamically switching load between data centers worldwide. For a single data center, the requirements on power, air conditioning and other ancillary equipment are therefore significantly reduced. Google's data centers install only a small backup battery on each server motherboard and abandon the uninterruptible power supply (UPS) needed by traditional IT systems; data centers are built in cold mountainous and similar areas, and the dedicated machine-room air conditioning is replaced by groundwater cooling systems. These measures greatly reduce the construction investment and operating cost of ancillary equipment. Google claims that the average PUE of its six data centers is 1.21, that its best data center achieved a PUE of 1.15 over a year and of 1.13 in a particular quarter, whereas the PUE of a traditional data center is generally 3.0 or more.
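The redundancy figures in the third point can be checked with a short calculation, assuming independent node failures and a simple binomial model (an assumption of this sketch, not a statement of Google's actual failure model):

```python
from math import comb

p = 0.95  # single-node reliability assumed in the text


def cluster_reliability(total_nodes, needed_nodes):
    """P(at least needed_nodes of total_nodes are up), binomial model."""
    return sum(comb(total_nodes, k) * p**k * (1 - p)**(total_nodes - k)
               for k in range(needed_nodes, total_nodes + 1))


# Traditional 1+1 backup: two nodes, at least one must be up.
print(f"1+1 backup, 100% redundancy:  {1 - (1 - p)**2:.4f}")  # 0.9975
# Cloud-style: 100 nodes of work plus 12 spares (12% redundancy).
print(f"112 nodes, need 100 of them:  {cluster_reliability(112, 100):.4f}")
# The second figure comes out above 0.9975, consistent with the claim
# that ~12% redundancy suffices where 1+1 backup needs 100%.
```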

The Google cloud computing platform uses Linux and proprietary software components developed in-house, so its software investment is almost zero. For traditional IT systems, operating systems, databases, middleware and other software generally account for more than 15% of construction investment, and the operational cost of upgrades and maintenance support also weighs significantly on total cost.

Based on the above analysis we can conclude that the low computing and storage costs claimed by Google are entirely plausible, which fully shows that the technology architecture plays a decisive role in the cost of an IT system.

V. CONCLUSIONS

Analyzing the technology architecture of the Google platform reveals three basic characteristics: the system is built on large-scale clusters of cheap servers; the infrastructure and the upper-layer applications are co-designed to achieve maximum efficiency in the utilization of hardware resources; and node fault tolerance is achieved through software. These stand in strong contrast to traditional IT systems based on clusters of high-performance UNIX servers. In fact, this difference in platform technology architecture comes from a completely different design philosophy. Traditional IT systems apply a "bottom-up" design method, loading the upper applications by stacking layers; they stress that the infrastructure is transparent to the applications, that management is centralized and tiered, and that heterogeneous devices are interconnected through industry standards. Such a system is essentially a general-purpose platform. Google applies a "top-down" design method: starting from the upper application, the infrastructure is reconstructed around the operational characteristics of that specific application (rather than optimized in a general sense). Its cloud computing platform is in essence a proprietary platform, and this is the basic reason why Google achieves such extremely low computing and storage costs.

