
01 IT Infrastructure

What is a Data Center?

An overview of the complexity of a data center; definition; examples; its main components; other components and resources; data center tiers; an overview of the course, its references, and its logistics.

Data center
A data center is a facility that houses information storage and other physical information technology (IT) resources used to process, communicate, and store information.

Data center: images

Go to: google.com, search for "data center" > Images
to see pictures of some major data centers

Go to: https://www.google.com/about/datacenters/inside/streetview
to take a tour of a Google data center

Data center: How many servers?

Microsoft has more than 1 million servers, according to CEO Steve Ballmer (July 2013)

Facebook has hundreds of thousands of servers (Facebook's N. Ahmad, June 2013)

Akamai Technologies: 127,000 servers (company, July 2013)

Intel: 75,000 servers (company, August, 2011)

eBay: 54,011 servers (DSE dashboard, July 2013)

Data center: How many servers?

Google: The company doesn't release numbers, but a recent report from energy expert Jonathan Koomey
estimated that Google had 900,000 servers, based on an extrapolation from data Google provided on its total
energy usage. Google's recently revealed container data center holds more than 45,000 servers, and that's a
single facility built in 2005.
Amazon: It runs the world's largest online store and one of the world's largest cloud computing operations.
Amazon says very little about its data center operations, but we know that it bought $86 million in servers from
Rackable in 2008, and stores 40 billion objects in its S3 storage service. A 2009 analysis by Randy Bias
estimates that 40,000 servers are dedicated to running Amazon Web Services EC2.
HP/EDS: While server ownership is less distinct with system integrators, EDS has an enormous data center
operation. Company documents (PDF) say EDS is managing 380,000 servers in 180 data centers.

Data center: Complexity

On that tour you can see that a data center is a complex structure that involves a large volume of resources, people, and technologies to provide processing services.

Data center: 5 Key Components

Application: A computer program that provides the logic for computing operations
Database management system (DBMS): Provides a structured way to store data in logically
organized tables that are interrelated
Host or compute: A computing platform (hardware, firmware, and software) that runs applications
and databases
Network: A data path that facilitates communication among various networked devices
Storage: A device that stores data persistently for subsequent use.
These core elements are typically viewed and managed as separate entities, but all the elements must
work together to address data-processing requirements.

Data center: Other Resources

Although the previous resources are the main focus of this course, other resources still need to be considered when building and maintaining a data center:
Facilities: space, physical installations, cooling equipment, etc.
Power: dedicated power sources
Processes: operation, security, provisioning, etc.
People: ultimately responsible for all of it

Data center: main components

FIGURE 4.1: The main components of a typical


Data center: typical components

FIGURE 1.1: Typical elements in warehouse-scale systems: 1U server (left), 7' rack with Ethernet switch (middle), and diagram of a small cluster with a cluster-level Ethernet switch/router (right).

Data center: typical components

FIGURE 4.2: Datacenter raised floor with hot-cold aisle setup

Data center: energy

Currently, the typical 3-year cost (operating expenses + amortized capital expenses) of powering and
cooling servers is approximately 1.5 times the cost of the server hardware itself, and the projections
for 2012 go much higher. Energy efficiency measures are thus of high importance for data center
designers, operators, and owners.

Data center: Tier Classifications

Tier I datacenters have a single path for power and cooling distribution, without redundant components.
Tier II adds redundant components to this design (N + 1), improving availability.
Tier III datacenters have multiple power and cooling distribution paths but only one active
path. They also have redundant components and are concurrently maintainable, that is,
they provide redundancy even during maintenance, usually with an N + 2 setup.
Tier IV datacenters have two active power and cooling distribution paths, redundant components
in each path, and are supposed to tolerate any single equipment failure without
impacting the load.

Discussion and exercises
Why centralize computing resources in a data center? Relate this to the fact that many companies have geographically distributed data centers.
List the main types of applications provided by a data center.
Is a Tier IV data center justified for a hospital (fewer than 100 servers) when we find hosting data centers (more than 1,000 servers) at Tier II or III?

Recommended reading
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
Luiz André Barroso and Urs Hölzle, 2009

Course Overview

General program
Storage Systems
~ 5 weeks
Data types, SCSI and FC connections, storage networks: SAN, NAS, and CAS
Business Continuity
~ 2 weeks
Failure types, backup, data replication, failure recovery time
Cloud Computing
~ 3 weeks
Virtualization, cloud computing, cloud service models
Physical Aspects of a Data Center
~ 1 week
Power and cooling, energy efficiency, green data centers

Course Overview

Information Storage and Management: Storing, Managing, and Protecting Digital Information in Classic, Virtualized, and Cloud Environments
2nd Edition. Edited by Somasundaram Gnanasundaram, Alok Shrivastava

UNIX and Linux System Administration Handbook
Evi Nemeth [et al.]. 4th ed. ISBN 978-0-13-148005-6

+ Recommended readings throughout the course

Course Overview

2 intermediate assessments
1 final assessment
Intermediate average: MI = (1 x P1 + 2 x P2 + 1 x Activities) / 5
Final average: MF = (MI + PF) / 2

02 IT Infrastructure

Information Storage

The growth of data and of the importance of information; data types; evolution of storage technologies; data center structure and requirements; the information lifecycle.

Information and data
Information: increasingly important
Exponential growth in the importance and volume of information, and in the corporate world's dependence on it
The challenges of protecting and managing data grow accordingly

Exponential growth
Computerworld - In 2011 alone, 1.8 zettabytes (or 1.8 trillion gigabytes) of data will be created, the equivalent to every
U.S. citizen writing 3 tweets per minute for 26,976 years. And over the next decade, the number of servers managing
the world's data stores will grow by ten times.

Oops, a break: KB, MB, GB, ... :-)
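As a quick check on the unit ladder, the sketch below converts between decimal (SI) storage units. It assumes powers of 1000, which is the convention behind the "1.8 zettabytes = 1.8 trillion gigabytes" figure quoted above; the names `UNITS`, `to_bytes`, and `convert` are illustrative only.

```python
# Decimal (SI) storage units: each one is 1000 times the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def to_bytes(value: float, unit: str) -> float:
    """Convert a value expressed in the given unit to bytes."""
    return value * 1000 ** UNITS.index(unit)


def convert(value: float, src: str, dst: str) -> float:
    """Convert a value from one SI storage unit to another."""
    return to_bytes(value, src) / 1000 ** UNITS.index(dst)


if __name__ == "__main__":
    # 1.8 ZB expressed in GB (about 1.8 trillion GB).
    print(f"1.8 ZB = {convert(1.8, 'ZB', 'GB'):.3e} GB")
```

Binary units (KiB, MiB, and so on) use powers of 1024 instead, so the same ladder would give slightly different figures.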

Growth: example 1
10,000,000,000 photos
2-3 Terabytes of photos are being uploaded
to the site every day
Serve over 15 billion photo images per day
Photo traffic now peaks at over 300,000
images served per second

Growth: example 2
England: one surveillance camera for every 14 citizens
4 million cameras recording images every day
Do you have, or can you find on the Internet, other examples?

The storage challenge:
Store, protect, optimize, and leverage* this enormous and growing amount of data

*leverage: think of how storage supports the ability to generate information from the data

Data types

Unstructured

Big Data: new challenges

Big Data:
New challenges for data storage in information centers

Storage devices
Storage devices vary according to the type of data, the rate at which it is created and used, and the capacity required.

Devices such as a media card in a cell phone or digital camera, DVDs, CD-ROMs, and disk drives in personal computers are examples of storage devices.
Businesses have several options available for storing data, including internal hard disks, external disk arrays, and tapes.

Evolution of storage devices:
From non-intelligent internal storage to intelligent networked storage.

Redundant Array of Independent Disks (RAID)
Direct-attached storage (DAS)
Storage area network (SAN): a dedicated, high-performance Fibre Channel (FC) network that facilitates block-level communication between servers and storage.
Network-attached storage (NAS): dedicated storage for file-serving applications. Unlike a SAN, it connects to an existing communication network (LAN) and provides file access to heterogeneous clients.
Internet Protocol SAN (IP-SAN): one of the latest evolutions in storage architecture, IP-SAN is a convergence of technologies used in SAN and NAS.

Typical architecture
A typical data center processing architecture that uses a storage area network (SAN)

Old to Modern approach

Key Characteristics of a DC

ILM Information Life Cycle Management

The information lifecycle is the change in the value of information over time.
When data is first created, it often has the highest value and is used frequently. As data ages, it is accessed less frequently and is of less value to the organization. Understanding the information lifecycle helps to deploy appropriate storage infrastructure, according to the changing value of information.

A proactive strategy that enables an IT organization to effectively manage data throughout its lifecycle

ILM Storage Hierarchy

A basic idea is that cost x speed x storage capacity naturally define storage tiers. To be stored efficiently, information must have a storage cost that matches its value to the organization.

ILM Information Life Cycle Management

ILM Process

ILM Benefits
Improved utilization
Tiered storage platforms, low costs
Simplified management
Processes, tools, and automation
Simplified backup and recovery
A wider range of options to balance the need for business continuity
Maintaining compliance
Knowledge of what data needs to be protected for what length of time
Lower Total Cost of Ownership
By aligning the infrastructure and management costs with information value

But there is a cost ($) and, in practice, ILM is not always easy to implement.

Discussion and exercises
Could exponential growth in data and data centers mean an equal increase in IT professionals and resources ($) in the coming years?
Consider the data from a sale at a supermarket checkout. Is the value of this information the same over time (for example, in the first days, after months, and after a year)?
Name features or capabilities you would expect from an ILM automation tool.
In your opinion, which type of data, structured or unstructured, seems to be growing faster today, and why?
What advantages do you see in networked storage over internal storage?

Recommended reading
Chapter 1
Information Storage and Management: Storing, Managing, and Protecting Digital Information in Classic, Virtualized, and Cloud Environments
2nd Edition. Edited by Somasundaram Gnanasundaram, Alok Shrivastava

03 IT Infrastructure

Storage Environment

Main components of hosts and storage; connectivity types: PCI, IDE/ATA, SCSI, etc.; components of a disk drive; disk drive performance; file systems; LVM (Logical Volume Manager)

Main Components of the Storage Environment
Application: A computer program that provides the logic for computing operations
Database management system (DBMS): Provides a structured way to store data in logically organized tables that are interrelated
Host or compute: A computing platform (hardware, firmware, and software) that runs applications and databases
Network: A data path that facilitates communication among various networked devices
Storage: A device that stores data persistently for subsequent use

Host, Connectivity, and Storage

Hosts: Physical Components

Applications run on hosts that can range from simple laptops to complex server clusters. Physical components of a host:

Disk device and internal memory
I/O devices
Host-to-host communication: Network Interface Card (NIC)
Host-to-storage-device communication: Host Bus Adapter (HBA)

Hosts: Logical Components

Application data access can be classified as:
Block-level access:
Data stored and retrieved in blocks, specifying the LBA (logical block address)
File-level access:
Data stored and retrieved by specifying the name and path of files

Operating system
Resides between the applications and the hardware
Controls the environment
File System
A file is a collection of related records or data stored as a unit
A file system is a hierarchical structure of files
Examples: FAT32, NTFS, UNIX FS, EXT2/3, and HDFS

Hosts: Logical Components

LVM Logical Volume Manager
Responsible for creating and controlling host level logical storage
Physical view of storage is converted to a logical view by mapping
Logical data blocks are mapped to physical data blocks
Usually offered as part of the operating system or as third party
host software
Device Drivers
Enables operating system to recognize the device
Provides API to access and control devices
Hardware dependent and operating system specific

LVM Partitioning & Concatenation
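The mapping of logical data blocks to physical data blocks can be sketched for the concatenation case, where several physical volumes are joined end to end into one logical volume. This is a simplified illustration, not how any specific volume manager is implemented; the class name `ConcatenatedVolume` and the block counts are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatenatedVolume:
    """A logical volume built by concatenating physical volumes end to end."""

    def __init__(self, physical_sizes):
        # physical_sizes: number of blocks on each physical volume (hypothetical).
        self.sizes = list(physical_sizes)
        # starts[i] is the first logical block stored on physical volume i.
        self.starts = [0] + list(accumulate(self.sizes))[:-1]
        self.total_blocks = sum(self.sizes)

    def map_block(self, logical_block):
        """Return (physical volume index, physical block) for a logical block."""
        if not 0 <= logical_block < self.total_blocks:
            raise IndexError("logical block out of range")
        pv = bisect_right(self.starts, logical_block) - 1
        return pv, logical_block - self.starts[pv]


if __name__ == "__main__":
    # Three small physical volumes presented to the host as one 250-block volume.
    volume = ConcatenatedVolume([100, 100, 50])
    for block in (0, 150, 249):
        print(block, "->", volume.map_block(block))
```

The host sees a single address space of `total_blocks` blocks; only the volume manager knows which physical volume holds each block.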

LVM Logical Volume Manager

Files: Storage-User path

Protocols define a format for
communication between sending
and receiving devices

Tightly connected entities such as central processor to RAM, or storage buffers to controllers (example PCI)
Directly attached entities connected at moderate distances such as host to storage (example IDE/ATA)
Network connected entities such as networked hosts, NAS or SAN (example SCSI or FC)

PCI (Peripheral Component Interconnect) is used for the local bus system
It is an interconnection between the microprocessor and attached devices, and it supports Plug and Play
PCI is 32/64-bit; throughput is 133 MB/s
PCI Express is an enhanced version of the PCI bus with higher throughput and clock speed

Integrated Device Electronics (IDE) / Advanced Technology Attachment (ATA)

Most popular interface used with modern hard disks
Good performance at low cost, Inexpensive storage interconnect
Used for internal connectivity

Serial Advanced Technology Attachment (SATA)

Serial version of the IDE /ATA specification
Hot-pluggable; the enhanced version of the bus provides up to 6 Gb/s (revision 3.0)

Parallel SCSI (Small Computer System Interface)

Most popular hard disk interface for servers
Higher cost than IDE/ATA
Supports multiple simultaneous data accesses
Used primarily in higher-end environments
Data transfer speeds of 320 MB/s (Ultra-320 SCSI) to 3 Gb/s (SAS 300)

Storage Media
Magnetic Tape
Low cost solution for long term data storage
Sequential data access, Single application access at a time, Physical wear and tear and Storage/retrieval overheads
Optical Disks
Popularly used as distribution medium in small, single-user computing environments
Write once and read many (WORM): CD-ROM, DVD-ROM
Limited in capacity and speed
Disk Drive
Most popular storage medium with large storage capacity
Random read/write access
Ideal for performance intensive online application
Solid State Media or FLASH DRIVES
No moving parts, like the integrated circuits and motherboards in computers

Disk Drive Components

Disk Drive: Physical Structure and Logical Addressing

Disk Drive Performance

Disk Service Time
Time taken by a disk to complete an I/O request

Seek Time
Rotational Latency
Approx. 5.5 ms for a 5400-rpm drive, 2.0 ms for a 15000-rpm drive

Data Transfer Rate
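The rotational latency figures above can be reproduced with a short calculation. The sketch assumes that average rotational latency equals the time for half a rotation, which is the usual textbook definition; the function name is illustrative.

```python
def average_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in ms, taken as the time for half a rotation."""
    ms_per_rotation = 60_000 / rpm  # 60,000 ms in one minute
    return ms_per_rotation / 2


if __name__ == "__main__":
    for rpm in (5400, 7200, 15000):
        print(f"{rpm:>6} rpm: {average_rotational_latency_ms(rpm):.2f} ms")
```

Faster spindles shorten this component of disk service time, but seek time and data transfer time still have to be added to it.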

Which one is largest?

Disk Drive Performance Calc

I/O arrival rate, a

Average inter-arrival time, Ra = 1 / a
Utilization, U = Rs / Ra, where Rs is the service time
Average response time, R = Rs / (1 - U)
Average queue size = U^2 / (1 - U)
Time spent by a request in the queue = U x R

Disk Drive Performance Calc

Consider a disk I/O system in which I/O requests arrive at a rate of 100 I/Os per second. The service time, Rs, is 8 ms.
I/O arrival rate, a = 100 IOPS
Average inter-arrival time, Ra = 1 / a = 10 ms
Utilization, U = Rs / Ra = 8 ms / 10 ms = 0.8 = 80%
Average response time, R = Rs / (1 - U) = 8 ms / (1 - 0.8) = 40 ms
Average queue size = U^2 / (1 - U) = 0.64 / 0.2 = 3.2
Time spent by a request in the queue = U x R = 0.8 x 40 ms = 32 ms
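The same calculation can be written as a small function, which makes it easy to try other arrival rates and service times. This is a minimal sketch of the formulas above; the names `DiskQueueMetrics` and `disk_metrics` are illustrative, and the error raised at 100% utilization is a design choice of this example.

```python
from dataclasses import dataclass


@dataclass
class DiskQueueMetrics:
    utilization: float        # U, as a fraction between 0 and 1
    response_time_ms: float   # R = Rs / (1 - U)
    queue_size: float         # U^2 / (1 - U)
    time_in_queue_ms: float   # U x R


def disk_metrics(arrival_rate_iops: float, service_time_ms: float) -> DiskQueueMetrics:
    """Apply the disk performance formulas to one arrival rate and service time."""
    inter_arrival_ms = 1000.0 / arrival_rate_iops    # Ra = 1 / a
    u = service_time_ms / inter_arrival_ms           # U = Rs / Ra
    if u >= 1:
        # Requests arrive faster than the disk can serve them.
        raise ValueError("utilization is 100% or more: the queue grows without bound")
    r = service_time_ms / (1 - u)
    return DiskQueueMetrics(u, r, u ** 2 / (1 - u), u * r)


if __name__ == "__main__":
    print(disk_metrics(arrival_rate_iops=100, service_time_ms=8))
```

With 100 IOPS and Rs = 8 ms the function reproduces the values of the worked example, up to floating-point rounding.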

Utilization x Performance

Consider a disk I/O system in which I/O requests arrive at a rate of 100 I/Os per second. The service time, Rs, is 4 ms.
Utilization of the I/O controller (U = a x Rs)
Total response time (R = Rs / (1 - U))
Calculate the same values when the service time is doubled
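To see how response time reacts to utilization, the two cases of this exercise can be computed side by side. The sketch below uses only the two formulas given in the exercise; the function names are illustrative, and returning infinity at saturation is a choice made for this example.

```python
def utilization(arrival_rate_iops: float, service_time_ms: float) -> float:
    """U = a x Rs, with Rs converted from milliseconds to seconds."""
    return arrival_rate_iops * service_time_ms / 1000.0


def response_time_ms(arrival_rate_iops: float, service_time_ms: float) -> float:
    """R = Rs / (1 - U); infinite once the controller is saturated."""
    u = utilization(arrival_rate_iops, service_time_ms)
    if u >= 1:
        return float("inf")
    return service_time_ms / (1 - u)


if __name__ == "__main__":
    for rs in (4.0, 8.0):
        u = utilization(100, rs)
        r = response_time_ms(100, rs)
        print(f"Rs = {rs} ms: U = {u:.0%}, R = {r:.2f} ms")
```

Doubling the service time doubles the utilization, but the response time grows by much more than a factor of two, because the (1 - U) term in the denominator shrinks at the same time.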

Flash Disk Drives

Discussion and exercises
Give examples of PCI and SCSI connections.
A database requires a 2 TB disk, but the available disk drives are only 500 GB each. Which logical component of the system can be used to solve this problem, and how?
Does a 500 GB disk really have 500 GB of usable space?
A system uses 10 disks of 500 GB and has been showing I/O performance problems (high response time). With only additional disk volumes available, how would you solve this problem?
Change the disk performance calculation example to 3,000 IOPS. What response time and queue size do you get?

Recommended reading
Chapter 2
Information Storage and Management: Storing, Managing, and Protecting Digital Information in Classic, Virtualized, and Cloud Environments
2nd Edition. Edited by Somasundaram Gnanasundaram, Alok Shrivastava

03 IT Infrastructure


MTBF; RAID Protection; Mirroring and Parity; RAID levels; write penalty

Why RAID?

Redundant Array of Inexpensive Disks vs. Redundant Array of Independent Disks

Performance limitation of disk drive

An individual drive has a certain life expectancy
Measured in MTBF (Mean Time Between Failures)
The greater the number of HDDs in a storage array, the higher the probability of a disk failure.
For example: if the MTBF of a drive is 750,000 hours and there are 100 drives in the array, then the MTBF of the array becomes 750,000 / 100, or 7,500 hours
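The MTBF example can be checked with a few lines of code. The sketch applies the same simplification as the slide (array MTBF = drive MTBF divided by the number of drives); the function name and the conversion to days are illustrative additions.

```python
def array_mtbf_hours(drive_mtbf_hours: float, drive_count: int) -> float:
    """MTBF of an array of identical drives, using MTBF_array = MTBF_drive / N."""
    if drive_count < 1:
        raise ValueError("an array needs at least one drive")
    return drive_mtbf_hours / drive_count


if __name__ == "__main__":
    hours = array_mtbf_hours(750_000, 100)
    print(f"Array MTBF: {hours:,.0f} hours (about {hours / 24:.1f} days)")
```

A single drive with an MTBF of 750,000 hours looks very reliable, but in a 100-drive array some drive is expected to fail in less than a year, which is the problem RAID addresses.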

RAID was introduced to mitigate this problem

RAID provides:
Increased capacity
Higher availability
Increased performance

Disk array components

Hard Disks

RAID Array

Hardware (usually a specialized disk controller card): best choice!
o Controls all drives attached to it
o Array(s) appear to the host operating system as a regular disk drive
o Provided with administrative software
Software
o Runs as part of the operating system
o Performance is dependent on CPU workload
o Does not support all RAID levels

Unix, Oracle, and other systems

RAID levels

Disk Stripes

Mirroring & Parity

RAID 0, RAID 1 and write penalty

Write Penalty vs. Full Protection...

Nested RAID 1+0 0+1

RAID 1+0 Striped Mirror
RAID 0+1 Mirrored Stripe

RAID 3, 4
Stripes data for high performance and uses parity for improved fault tolerance. One drive is dedicated to parity information. If a drive fails, its data can be reconstructed from the remaining data drives and the parity drive.
For RAID 3, data reads and writes are done across the entire stripe. It provides good bandwidth for large sequential data access, such as video streaming.
For RAID 4, data reads and writes can be performed independently on a single disk.
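Parity-based reconstruction can be illustrated with byte-wise XOR, the operation commonly used to compute RAID parity. This is a toy sketch with three data strips and one parity strip; the strip contents and the helper name `xor_blocks` are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe: a strip on each of three data drives (hypothetical contents).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)  # strip written to the dedicated parity drive

# Suppose the second data drive fails: rebuild its strip from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
```

Because XOR is its own inverse, any single missing strip (data or parity) can be recomputed from the others, but two simultaneous failures cannot be recovered with a single parity strip.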

RAID 5, 6
RAID 5 is similar to RAID 4, except that the parity is distributed
across all disks instead of stored on a dedicated disk.
This overcomes the write bottleneck on the parity disk.
It is largely used by Database systems
RAID 6 is similar to RAID 5, except that it includes a second parity element to allow survival in the event of two disk failures. The probability of this happening increases as the number of drives in the array increases.

RAID Comparative

Columns: Storage Efficiency, Read Performance, Write Performance. For the parity-based levels, storage efficiency is expressed in terms of n, where n = number of disks.

RAID 0
Read: very good for both random and sequential reads
Write: very good

RAID 1
Read: better than a single disk
Write: slower than a single disk, as every write must be committed to two disks

RAID 3
Read: good for random reads and very good for sequential reads
Write: poor to fair for small random writes; good for large, sequential writes

RAID 5
Read: very good for random reads; good for sequential reads
Write: fair for random writes (slower due to parity overhead); fair to good for sequential writes

RAID 6
Cost: moderate, but more than RAID 5
Read: very good for random reads; good for sequential reads
Write: good for small, random writes (has write penalty)

Nested RAID (1+0, 0+1)
Read: very good

Compute penalty example

Consider an application that generates 5,200 IOPS, with 60 percent of them being reads.
The disk load in RAID 5 is calculated as follows:
RAID 5 disk load = 0.6 x 5,200 + 4 x (0.4 x 5,200) [because the write penalty for RAID 5 is 4]
= 3,120 + 4 x 2,080
= 3,120 + 8,320
= 11,440 IOPS
The disk load in RAID 1 is calculated as follows:
RAID 1 disk load = 0.6 x 5,200 + 2 x (0.4 x 5,200) [because every write manifests as two writes to the disks]
= 3,120 + 2 x 2,080
= 3,120 + 4,160
= 7,280 IOPS
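The write-penalty calculation generalizes to any read/write mix. The sketch below encodes only the two penalties stated in the example (4 for RAID 5, 2 for RAID 1); the function and dictionary names are illustrative.

```python
# Write penalties taken from the worked example above.
WRITE_PENALTY = {"RAID 1": 2, "RAID 5": 4}


def disk_load_iops(app_iops: float, read_fraction: float, write_penalty: int) -> float:
    """Back-end disk load: reads pass through once, writes cost `write_penalty` I/Os."""
    reads = read_fraction * app_iops
    writes = (1 - read_fraction) * app_iops
    return reads + write_penalty * writes


if __name__ == "__main__":
    for level, penalty in WRITE_PENALTY.items():
        load = disk_load_iops(5200, 0.6, penalty)
        print(f"{level}: {load:,.0f} IOPS on the disks")
```

Changing `read_fraction` shows how quickly the back-end load grows for write-heavy workloads, especially with the larger RAID 5 penalty.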

Hot spare disks


Discussion and exercises
Why is there a WRITE penalty but no READ penalty in RAID mechanisms?
Local disk controllers in servers generally implement RAID 1, while large storage systems generally opt for RAID 5 or its variants. Why?
Compare the mirroring and parity mechanisms.
Change the write-penalty calculation example so that only a fraction of the operations are writes. Is there a penalty for RAID 0?
What kind of bottleneck does RAID 3 present when compared with RAID 5?

Recommended reading
Chapter 3
Information Storage and Management: Storing, Managing, and Protecting Digital Information in Classic, Virtualized, and Cloud Environments
2nd Edition. Edited by Somasundaram Gnanasundaram, Alok Shrivastava

03 IT Infrastructure

Intelligent Storage

Components of an intelligent storage system; benefits of an intelligent storage system; I/O optimization; front end; back end; intelligent cache algorithms and protection

What is an Intelligent Storage System

Intelligent Storage Systems are RAID arrays that are:
Highly optimized for I/O processing
Have large amounts of cache for improving I/O performance
Have operating environments that provide:
Intelligence for managing cache
Array resource allocation
Connectivity for heterogeneous hosts
Advanced array based local and remote replication options

Benefits of Intelligent Storage

Increased capacity
Improved performance
Easier data management
Improved data availability and protection
Enhanced Business Continuity support
Improved security and access control

Components of Storage System

Figure: an intelligent storage system, showing its front end and back end.

Intelligent Storage System: Front End

Front End Command Queuing

Figure: I/O processing without optimization (FIFO) compared with I/O processing with command queuing.
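The benefit of command queuing over FIFO can be sketched by comparing how far a disk head travels in each case. The reordering policy used here (always serve the pending request closest to the current head position) is an assumption chosen for illustration; real controllers may use other algorithms, and the track numbers are made up.

```python
def total_head_travel(start, tracks):
    """Total distance the head moves when serving tracks in the given order."""
    travel, position = 0, start
    for track in tracks:
        travel += abs(track - position)
        position = track
    return travel


def reorder_shortest_seek_first(start, tracks):
    """Reorder queued requests so the nearest pending track is served next."""
    pending, position, order = list(tracks), start, []
    while pending:
        nearest = min(pending, key=lambda track: abs(track - position))
        pending.remove(nearest)
        order.append(nearest)
        position = nearest
    return order


if __name__ == "__main__":
    requests = [90, 10, 80, 20]  # hypothetical track numbers, in arrival order
    queued = reorder_shortest_seek_first(0, requests)
    print("FIFO order:  ", requests, "travel =", total_head_travel(0, requests))
    print("Queued order:", queued, "travel =", total_head_travel(0, queued))
```

In this example the FIFO order moves the head 300 tracks, while the reordered queue serves the same four requests with 90 tracks of movement.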

Intelligent Storage System: Cache

Write Operation with Cache

Write-through cache

Read with Cache: Hits and Misses

Data found in cache = hit
No data found = miss

Cache Management: Algorithms

Least Recently Used (LRU)
Discards least recently used data

Most Recently Used (MRU)
Discards most recently used data

Figure: cache contents ordered from new data to oldest data.
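An LRU read cache that also counts hits and misses can be sketched with an ordered dictionary. This is a minimal illustration, not the algorithm of any particular array; the class name `LRUCache` and the `fetch_from_disk` callback are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Read cache that discards the least recently used block when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # oldest entry first, newest entry last
        self.hits = 0
        self.misses = 0

    def read(self, block, fetch_from_disk):
        if block in self.pages:
            # Hit: serve from cache and mark the block as most recently used.
            self.hits += 1
            self.pages.move_to_end(block)
            return self.pages[block]
        # Miss: go to the back end, then keep a copy in cache.
        self.misses += 1
        data = fetch_from_disk(block)
        self.pages[block] = data
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used block
        return data


if __name__ == "__main__":
    cache = LRUCache(capacity=2)
    for block in (1, 2, 1, 3, 2):
        cache.read(block, fetch_from_disk=lambda b: f"data-{b}")
    print("hits:", cache.hits, "misses:", cache.misses, "cached:", list(cache.pages))
```

An MRU policy would differ only in the eviction step, discarding the entry that was used most recently instead of the oldest one. The hit count divided by the total number of reads gives the hit ratio of the cache.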

Cache Management: Watermarking

Manage peak I/O request bursts through flushing/de-staging

Idle flushing, High Watermark flushing and Forced flushing
For maximum performance: Provide headroom in write cache for I/O bursts
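The three flushing modes can be pictured as a decision based on how full the write cache is. The sketch below is an assumption-laden illustration: the 80% high watermark is a hypothetical value, and the rule that forced flushing starts when the cache is completely full is how this example interprets the term.

```python
def flushing_mode(cache_used_fraction: float, high_watermark: float = 0.8) -> str:
    """Pick a flushing mode from the fraction of write cache in use.

    high_watermark is a hypothetical threshold chosen for this example.
    """
    if not 0.0 <= cache_used_fraction <= 1.0:
        raise ValueError("cache_used_fraction must be between 0 and 1")
    if cache_used_fraction >= 1.0:
        return "forced flushing"          # cache is full, flushing takes priority
    if cache_used_fraction >= high_watermark:
        return "high watermark flushing"  # cache is filling up, flush more aggressively
    return "idle flushing"                # normal background de-staging


if __name__ == "__main__":
    for used in (0.3, 0.85, 1.0):
        print(f"{used:.0%} used -> {flushing_mode(used)}")
```

The gap between the high watermark and 100% is the headroom that absorbs I/O bursts before forced flushing starts to affect host response time.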

Figure: cache utilization, marking the high watermark and 100%.


Cache Data Protection

Protecting cache data against failure:
Cache mirroring
Each write to the cache is held in two different memory locations on two
independent memory cards

Cache vaulting
Cache is exposed to the risk of uncommitted data loss due to power failure

Intelligent Storage System: Back End

Intelligent Storage System: Physical Disks

What the Host Sees: RAID Sets and LUNs

Figure: Host 1 and Host 2 connected through the front end and back end of the intelligent storage system.

LUN Masking
LUN: Logical Unit Number

LUN masking is an access control mechanism

Process of masking LUNs from unauthorized access
Implemented on storage arrays
Storage group: a logical entity that contains one or more LUNs and one host
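LUN masking can be modeled as a lookup from a host to the set of LUNs in its storage group. The sketch is a conceptual model only, not the interface of any real array; the class name, method names, and host identifiers are hypothetical.

```python
class StorageArray:
    """Toy model of LUN masking: each host sees only the LUNs in its storage group."""

    def __init__(self):
        self._storage_groups = {}  # host identifier -> set of LUN numbers

    def create_storage_group(self, host, luns):
        """Associate one host with one or more LUNs."""
        self._storage_groups[host] = set(luns)

    def visible_luns(self, host):
        """LUNs presented to the host; everything else stays masked."""
        return sorted(self._storage_groups.get(host, set()))

    def can_access(self, host, lun):
        return lun in self._storage_groups.get(host, set())


if __name__ == "__main__":
    array = StorageArray()
    array.create_storage_group("host-1", [0, 1])
    array.create_storage_group("host-2", [2])
    print(array.visible_luns("host-1"), array.can_access("host-2", 0))
```

A host without a storage group sees no LUNs at all, which is the safe default for an access control mechanism.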

ISS: High-end Storage Systems

Also referred to as active-active arrays
I/Os are serviced through all the available paths

High-end array capabilities:
Large storage capacity
Huge cache to service host I/Os
Fault-tolerant architecture
Multiple front-end ports and support for interface protocols
High scalability
Ability to handle large amounts of concurrent I/Os

Designed for large enterprises

Figure: active-active configuration.

Midrange Storage Systems

Also referred to as active-passive arrays
Host can perform I/Os to LUNs only through active paths
Other paths remain passive until the active path fails

Midrange arrays have two controllers, each with cache, RAID controllers, and disk drive interfaces

Designed for small and medium enterprises
Less scalable than high-end arrays

Discussion and exercises
Name at least 2 mechanisms found in intelligent storage systems.
Explain the two main cache management mechanisms found in intelligent storage systems.
Why does the front-end command queue in the systems studied make sense for access to solid-state disks?
What are the differences between READ and WRITE operations in the cache?
How do you think we can measure the cache efficiency of an intelligent storage system?
Why don't we find this intelligence in local internal storage systems?

Recommended reading
Chapter 4
Information Storage and Management: Storing, Managing, and Protecting Digital Information in Classic, Virtualized, and Cloud Environments
2nd Edition. Edited by Somasundaram Gnanasundaram, Alok Shrivastava
