
Amazon EC2 Cloud Computing and Application Design

Jorge Noa CTO, HyperStratus Jorge.Noa@HyperStratus.com


v8

Copyright 2009 HyperStratus

About HyperStratus
Silicon Valley-based cloud computing consultancy
Founded by executives with deep experience in corporate IT, enterprise software, and global consultancy
We assist clients in establishing cloud computing strategies, cloud application architectures, system selection, and implementations
We also provide cloud computing training and workshops

Topics Covered

Introduction to Cloud Architecture
Basic Amazon AWS Concepts and Considerations
AWS Cloud Application Design and Best Practices

Introduction to Cloud Architecture

What is the Cloud?

UC Berkeley RAD Lab Definition


Huge Resources: The illusion of infinite computing resources available on demand, thereby eliminating the need for Cloud Computing users to plan far ahead for provisioning

No Commitment: The elimination of an up-front commitment by Cloud users, thereby allowing companies to start small and increase hardware resources only when there is an increase in their needs

Pay by the Drink: The ability to pay for use of computing resources on a short-term basis as needed (e.g., processors by the hour and storage by the day) and release them as needed

Key Cloud Benefits

Huge Resources: IT agility, as systems can be sized to meet demand. As load scales, system resources are easily obtained to ensure SLAs can be met

No Commitment: No longer face the tradeoff between overprovisioning (waste of capital) and underprovisioning (waste of users)

Pay by the Drink: Move IT payments from CAPEX to OPEX. Pay only for actual resources consumed. Tie IT cost to business benefit received

Cloud Service Categories


Infrastructure as a Service (IaaS)
Amazon EC2, GoGrid, Eucalyptus

Platform as a Service (PaaS)
Google AppEngine (Python, Java), Windows Azure (.Net)

Software as a Service (SaaS)
Salesforce.com, Gmail

How the Cloud is Delivered


[Figure: cloud delivery models placed on a spectrum running from "More Structured, Less Control" to "Less Structured, More Control". In the order shown: Public Cloud SaaS, Public Cloud PaaS, Private Cloud IaaS, Public Cloud IaaS]

IaaS Cloud Providers

[Figure: IaaS providers placed on two axes, Public vs. Private and Isolated vs. Shared]
Public Cloud: Amazon (AWS), GoGrid, Rackspace
Virtual Private Cloud: CohesiveFT (VPN Cubed), Amazon VPC (IPsec VPN)
Internal Private Cloud: IBM, HP, Cisco/VMware, Microsoft, 3Tera, Eucalyptus
External Private Cloud: Terremark, HP (EDS), AT&T, IBM

Cloud Application Example

Grows from 1MM to 100+MM insurance claims/day in one week
Traditional solution: $750K new hardware + $30K/month maintenance/hosting
Cloud solution: $600/month Amazon Web Services
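
To put the two solutions in the example on the same footing, the arithmetic can be sketched as below. The helper name and the 12-month horizon are illustrative assumptions; only the dollar figures come from the slide.

```python
def cumulative_cost(upfront: float, monthly: float, months: int) -> float:
    """Total spend after `months` months: one-time cost plus recurring cost."""
    return upfront + monthly * months


# Dollar figures from the example above; the 12-month horizon is an assumption.
traditional_first_year = cumulative_cost(750_000, 30_000, 12)
cloud_first_year = cumulative_cost(0, 600, 12)

if __name__ == "__main__":
    print(f"Traditional, first year: ${traditional_first_year:,.0f}")
    print(f"Cloud, first year:       ${cloud_first_year:,.0f}")
```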

Cloud Taxonomy

Source: Christofer Hoff, Cloud Security Alliance Security Guidance for Critical Areas of Focus in Cloud Computing, Page 22

Foundation of cloud is virtualization
Upper cloud services are incremental to lower cloud services
Lower level services are key for higher level services

IaaS/PaaS in Detail
Components Providers

Adapted: Christofer Hoff, The Frogs Who Desired a King



Amazon AWS EC2 is an IaaS environment with RESTful Web Services API to allocate & manage resources

IaaS/PaaS in Detail
Components Providers

Adapted: Christofer Hoff, The Frogs Who Desired a King

AWS SQS, SimpleDB, and CloudFront are PaaS Middleware
Google AppEngine and Microsoft Azure are PaaS AppServers

Basic Amazon AWS Concepts and Considerations

Amazon Web Services


Elastic Compute Cloud, EC2 (IaaS)
Simple Storage Service, S3 (IaaS)
Elastic Block Store, EBS (IaaS)
SimpleDB, SDB (PaaS)
Simple Queue Service, SQS (PaaS)
CloudFront (S3-based Content Delivery Network, PaaS)
Consistent AWS Web Services API

IaaS Taxonomy :
AWS Components
VM Images: Gold-Master Amazon Machine Images (AMI)
VM Compute: EC2 Instance Types
VM Storage: Default Local Disks, EBS, S3
Network: Regions, Availability Zones, Virtual NICs
IPAM/DNS
  IPAM (Internet Protocol Address Management): dynamic internal & external IP Addresses and fixed Elastic IP Addresses
  DNS (Domain Name System): automatic AWS DNS name assignment

IaaS Taxonomy :
AWS Components (cont)

Security: Network Firewall Security Groups, S3 file ACLs
IAM/Auth (Identity Access Mgmt): AWS Credentials & X.509 Certificates
VMM (Virtual Machine Mgmt): Self-Discovery, Auto-Configuration
LB & Transport (Load Balancing): AWS Auto-Scaling
API: Web API, Command-Line Tools
Mgmt: AWS Mgmt Console, Firefox Elasticfox plug-in

PaaS Taxonomy :
AWS Components

Messaging/Queuing: Simple Queue Service (SQS)
Database: SimpleDB (SDB)

IaaS Network Component :


EC2 Regions & Zones
Amazon EC2 locations are composed of Regions, each of which contains one or more Availability Zones.
Regions are geographically dispersed and located in separate geographic areas or countries.
  Currently only two Regions: us-east-1, eu-west-1

Availability Zones are distinct datacenter locations that are engineered to be insulated from failures in other Zones and provide inexpensive, low latency network connectivity to other Availability Zones in the same Region
  E.g. us-east-1a, us-east-1b
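
The Region/Zone naming shown above can be used in code to decide whether two instances share a Region. The sketch below assumes, based only on the examples on this slide, that a Zone name is its Region name followed by a single lowercase letter; `region_of` and `same_region` are hypothetical helper names.

```python
import re

# Assumed convention, inferred from the examples above:
# Zone name = Region name + one lowercase letter (us-east-1 + a).
_ZONE_PATTERN = re.compile(r"^(?P<region>[a-z]+-[a-z]+-\d+)(?P<zone>[a-z])$")


def region_of(zone_name: str) -> str:
    """Return the Region part of an Availability Zone name."""
    match = _ZONE_PATTERN.match(zone_name)
    if match is None:
        raise ValueError(f"not a recognised Availability Zone name: {zone_name!r}")
    return match.group("region")


def same_region(zone_a: str, zone_b: str) -> bool:
    """True when both Zones belong to the same Region."""
    return region_of(zone_a) == region_of(zone_b)


if __name__ == "__main__":
    print(region_of("us-east-1a"))
    print(same_region("us-east-1a", "us-east-1b"))
```

A check like this is useful when deciding whether traffic between two instances stays on the intra-Region network or crosses Regions.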

IaaS Network Component :


EC2 Regions & Zones (cont)

Traffic between Availability Zones in a single Region is on AWS-controlled redundant infrastructure
All traffic between Regions is across a multiple Tier-1 Public Internet infrastructure

IaaS Compute Component:


AWS EC2

EC2 is based upon Xen Hypervisor (with significant constraints)
1 EC2-CU = CPU capacity of 1.0-1.2 GHz 2007 Opteron or 2007 Xeon
Compute capacity is defined at granular levels
  i.e. number of CPU cores and Compute Units per core (1 core @ 1 CU up to 8 cores @ 2.5 CU)
Virtual Memory ranges are 1.7GB, 7.5GB and 15GB depending on instance type
Default quota of 20 VM instances per account

IaaS Compute Component :


EC2 Compute Unit

Several AWS benchmarks and tests manage the consistency and predictability of the performance of an EC2 Compute Unit
Over time, there may be several different types of physical commodity hardware underlying EC2 instances, but EC2-CU performance should remain constant

EC2 Standard Linux Instance Types


Small (AWS name: m1.small)
  CPU: 1 EC2-CU (1 virtual core with 1 EC2 Compute Unit)
  Memory: 1.7 GB (917MB swap)
  Storage (unformatted): 170GB instance storage (160GB plus 10GB root partition, 1 spindle)
  Platform: 32-bit
  I/O: Moderate
  Cost/hour: $0.085 ($747 a year or $490.30 a year Reserved)

Large (AWS name: m1.large)
  CPU: 4 EC2-CU (2 virtual cores with 2 EC2 Compute Units each)
  Memory: 7.5 GB (No swap)
  Storage (unformatted): 910GB instance storage (2 x 450 GB plus 10GB root partition, 3 spindles)
  Platform: 64-bit
  I/O: High
  Cost/hour: $0.34 ($2978 a year or $1961 a year Reserved)

Extra Large (AWS name: m1.xlarge)
  CPU: 8 EC2-CU (4 virtual cores with 2 EC2 Compute Units each)
  Memory: 15 GB (No swap)
  Storage (unformatted): 1810GB instance storage (4 x 450GB plus 10GB root partition, 5 spindles)
  Platform: 64-bit
  I/O: High
  Cost/hour: $0.68 ($5957 a year or $3922 a year Reserved)

EC2 High-CPU Linux Instance Types


High-CPU Medium (AWS name: c1.medium)
  CPU: 5 EC2-CU (2 virtual cores with 2.5 EC2 Compute Units each)
  Memory: 1.7 GB (917MB swap)
  Storage (unformatted): 370 GB instance storage (360 GB plus 10 GB root partition, 1 spindle)
  Platform: 32-bit
  I/O: Moderate
  Cost/hour: $0.17 ($1489 a year or $981 a year Reserved)

High-CPU Extra Large (AWS name: c1.xlarge)
  CPU: 20 EC2-CU (8 virtual cores with 2.5 EC2 Compute Units each)
  Memory: 7.5 GB (No swap)
  Storage (unformatted): 1810 GB instance storage (4 x 450 GB plus 10 GB root partition, 5 spindles)
  Platform: 64-bit
  I/O: High
  Cost/hour: $0.68 ($5957 a year or $3922 a year Reserved)
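
The total EC2-CU and yearly on-demand figures in the two instance-type tables follow from simple arithmetic, sketched below. The sketch assumes the yearly figure is the hourly rate times 8,760 hours (24 x 365); that reproduces the listed amounts for m1.large, m1.xlarge, c1.medium and c1.xlarge, while m1.small comes out near $745 rather than the $747 shown, so the slide may have used a slightly different hour count there. Reserved pricing is not modelled.

```python
HOURS_PER_YEAR = 24 * 365  # 8760; an assumption about how the yearly figures were derived

# (virtual cores, EC2-CU per core, on-demand USD per hour), copied from the tables above
INSTANCE_TYPES = {
    "m1.small": (1, 1.0, 0.085),
    "m1.large": (2, 2.0, 0.34),
    "m1.xlarge": (4, 2.0, 0.68),
    "c1.medium": (2, 2.5, 0.17),
    "c1.xlarge": (8, 2.5, 0.68),
}


def total_compute_units(name: str) -> float:
    """Total EC2-CU = virtual cores x Compute Units per core."""
    cores, cu_per_core, _ = INSTANCE_TYPES[name]
    return cores * cu_per_core


def annual_on_demand_cost(name: str) -> float:
    """Cost of running one instance of this type for a full year, on demand."""
    _, _, hourly_rate = INSTANCE_TYPES[name]
    return hourly_rate * HOURS_PER_YEAR


if __name__ == "__main__":
    for name in INSTANCE_TYPES:
        print(f"{name:10s} {total_compute_units(name):5.1f} EC2-CU "
              f"${annual_on_demand_cost(name):,.0f}/year")
```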

IaaS Storage Component :

EC2, EBS, S3
EC2 Instance Default Local Storage: ephemeral virtual disks that are an integral part of the EC2 VM instance
  Range from 170GB to 1.8TB total space, 1 to 5 disks

Elastic Block Store (EBS): additional persistent disk volumes that can be attached and mounted on a running VM
  1TB max per volume, default quota of 20 volumes

S3 File storage: reliable, web URL accessible, file-based storage
  5GB max per file

IaaS Storage Component :


EBS

An EBS volume is created in a user specified AWS Availability Zone.
AWS equivalent of a local SAN RAID disk; it can only be attached to one running EC2 instance at a time, in the same Zone
Appears to the running VM's OS as a standard disk drive (e.g. /dev/sdg)
Must be partitioned and/or formatted with a file system before being mounted
Higher reliability, lower latency and higher throughput than Instance Default Storage
Supports live snapshots to S3

IaaS Storage Component :


S3

S3 File storage: reliable, web URL accessible file storage (e.g. <bucket>.s3.amazonaws.com/file_1.mpg).
Buckets are created in user assigned Regions (e.g. us-east-1, eu-west-1)
Unlimited number of index folders and files (i.e. objects) per bucket, 5GB max per file
Files in a bucket are replicated to dispersed Zones in the bucket's Region

IaaS Storage Component :


EC2 Ephemeral Storage Notes

All Default Local instance storage devices (i.e. non-EBS EC2 volumes) are ephemeral and all data on them is lost when the instance is terminated (or crashes and cannot be rebooted).
Use S3, EBS, or SDB for permanent data.
Analogous to the file system lifecycle of a Linux Live-CD that uses RAM drives
However, default instance storage data is retained on reboot.
This is a major EC2 constraint that must be taken into consideration in an application's design.

IaaS Storage Component :


Default Ephemeral Storage Devices
Location    Description
/dev/sda1   Formatted and mounted as 10GB root (/) on all instance types.
/dev/sda2   Formatted and mounted as /mnt on m1.small (150GB) and c1.medium (350GB) instances
/dev/sda3   Formatted and mounted as /swap on m1.small and c1.medium instances (Size 939MB)
/dev/sdb    Formatted and mounted as /mnt on m1.large, m1.xlarge, and c1.xlarge instances (430GB)
/dev/sdc    Not formatted or mounted on m1.large, m1.xlarge, and c1.xlarge instances (450GB raw)
/dev/sdd    Not formatted or mounted on m1.xlarge and c1.xlarge instances (450GB raw)
/dev/sde    Not formatted or mounted on m1.xlarge and c1.xlarge instances (450GB raw)

IaaS Image Component:


EC2 and AMIs

EC2 saves a bootable VM root image as an Amazon Machine Image (AMI).
An AMI is digitally signed and encrypted by the owner using a private X.509 key.
AWS has a copy of the corresponding public X.509 certificate for decrypting an AMI at EC2 Instance launch time
An AMI is equivalent to a Gold Master image of the configured VM for an EC2 instance
Multiple EC2 instances can be launched from the same AMI

IaaS Image Component :


S3 and AMIs

EC2 AMIs are stored in S3 as a bundle of segmented 10MB files, and EC2 VM instances are instantiated (launched) from their S3 AMI.
Users can create their own AMIs from scratch (P2V); use pre-built public AMIs; or use a pre-built AMI as a starting point and then add custom software assets to finalize the desired AMI.
Updating an EC2 AMI requires a full bundling process and results in an additional AMI, different from the original one.

IaaS Image Component :


EBS and AMIs

A running EC2 Instance can be imaged as an EBS-Backed AMI and saved as an EBS Snapshot.
Instances launched from these EBS-Backed AMI snapshots launch much faster and use persistent default storage.
  Persistent 15GB root file system.
EBS-Backed instances can be Stopped and Started and the contents of the local storage will persist.

Caution: if a running instance is Terminated, the EBS volume will be deleted.

EC2 Dynamic Data : Typical S3 Usage Pattern

EC2 Dynamic Data : Typical EBS Usage Pattern

IaaS Network Component :


EC2 Virtual NIC

Each EC2 Instance has only one Virtual NIC that is assigned a dynamic EC2 MAC Address and internal private IP Address
AWS VM prevents network cross-talk among users
No visibility beyond individual machine NIC traffic, even among correlated machines in the same application configuration
Communicating within multi-tier VM configurations typically involves dynamic DNS server registration

IaaS IPAM/DNS Component :


EC2 IP Addresses & DNS
No customer control of initial VM IP Address or DNS name assignments
EC2 routers map two IP addresses to the EC2 Instance
  dynamic EC2 Private Address (RFC-1918, e.g. 10.x.x.x)
  dynamic EC2 Public Address using Network Address Translation (NAT) (Note: public address range belongs to AWS)
Auto-generated DNS name has IP Address as a component of the name.
Fixed Elastic-IP Addresses are pre-allocated for an AWS account and later assigned to a running EC2 instance.

IaaS Security Component :


EC2 Security Groups & ACLs

EC2 Security Groups function as network firewall configurations.
  A Security Group is a named collection of incoming network traffic rules for an EC2 account.

Access to each S3 file is controlled by its own Access Control List (ACL).
  ACL allows READ, WRITE, and FULL CONTROL (includes access to ACL) privileges on:
    Everyone
    Authenticated Users (only valid AWS users)
    A list of individual AWS users or groups
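
A Security Group as "a named collection of incoming network traffic rules" can be modelled roughly as below. This is a simplified local model for reasoning about rules, not the AWS API: `IngressRule` and `is_allowed` are hypothetical names, and only protocol, port range and source CIDR are considered. Traffic that matches no rule is denied.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Iterable


@dataclass(frozen=True)
class IngressRule:
    """One incoming-traffic rule (hypothetical structure for illustration)."""
    protocol: str   # "tcp" or "udp"
    from_port: int  # first port in the allowed range
    to_port: int    # last port in the allowed range
    cidr: str       # allowed source range, e.g. "0.0.0.0/0"


def is_allowed(rules: Iterable[IngressRule], protocol: str, port: int, source_ip: str) -> bool:
    """Return True if any rule in the group admits this incoming connection."""
    source = ip_address(source_ip)
    for rule in rules:
        if (rule.protocol == protocol
                and rule.from_port <= port <= rule.to_port
                and source in ip_network(rule.cidr)):
            return True
    return False  # nothing matched: deny


if __name__ == "__main__":
    web_group = [
        IngressRule("tcp", 80, 80, "0.0.0.0/0"),       # HTTP from anywhere
        IngressRule("tcp", 22, 22, "203.0.113.0/24"),  # SSH from one example range
    ]
    print(is_allowed(web_group, "tcp", 80, "198.51.100.7"))
    print(is_allowed(web_group, "tcp", 22, "198.51.100.7"))
```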

PaaS Messaging/Queuing Component : AWS SQS

Highly Reliable Message Queuing Service with built-in redundancy within user assigned Regions
Messages accessible from anywhere via Web API
Up to 8 KB of Unicode data per message
Messages can be retained in queues for up to 4 days
Messages can be sent and read simultaneously but FIFO not guaranteed
Queues can be securely shared with other AWS accounts and anonymously.
Queue sharing can also be restricted by IP address and time-of-day.

PaaS Database Component : AWS SimpleDB Beta

Enhanced MyISAM-like database service
Simple web services interface to create and store multiple data sets and query your data
Data is automatically indexed
Data stored in Region and automatically replicated to dispersed Zones
Requests originating from an application running in the same Amazon Region will have near-LAN latency.

PaaS Database Component : AWS SimpleDB Beta (cont)

Similar to MyISAM with enhanced features
  No SQL grammar support
  No table JOIN
  Simple WHERE criteria

100 domains (tables) quota per account, max 10GB per domain, max 256 attributes (columns) per row, max 1KB data per attribute (cell)
Typically used to store App logs, EC2 Instance configurations, Application state, Instance status, analytics, indexes to S3 data
Scale-out is as simple as creating new domains, rather than building out new servers.
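
One way to read "scale-out is as simple as creating new domains" is to spread items across several domains by hashing the item key. The sketch below shows only the key-to-domain mapping; it does not call SimpleDB, and the function name, the `app_data` prefix and the naming scheme are assumptions made for illustration.

```python
import hashlib


def domain_for(item_key: str, num_domains: int, prefix: str = "app_data") -> str:
    """Map an item key to one of `num_domains` domain names, deterministically."""
    if num_domains < 1:
        raise ValueError("num_domains must be at least 1")
    digest = hashlib.sha256(item_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % num_domains
    return f"{prefix}_{index:03d}"


if __name__ == "__main__":
    for key in ("claim-1001", "claim-1002", "claim-1003"):
        print(key, "->", domain_for(key, num_domains=4))
```

Note that with plain modulo hashing, changing `num_domains` remaps most keys, so adding domains later means either migrating data or using a scheme such as consistent hashing.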

AWS Cloud Application Design and Best Practices

Cloud App Design Attributes


Abstract Resources: Focus on your needs, not on hardware specs. As your needs change, so should your resources.

On-Demand Provisioning: Ask for what you need, exactly when you need it. Get rid of it when you don't need it.

Scalability: Design should allow for resources to scale up or down depending on usage needs.

No Up-Front Costs: No contracts or long-term commitments. Pay only for what you use but design for the possibility of enhanced resource usage.

Dynamism: Each machine instance must be capable of dynamically identifying its configuration and relationship to other resources in the system.

AWS Cloud Application Design: 10 Best Practices


1. Build cloud apps, not apps in the cloud
2. Virtualize the application stack
3. Design for failure and nothing fails
4. Design for scalability
5. Loose coupling lets you maximize plug&play
6. Design for dynamism
7. Build Security into every component
8. Leverage native cloud storage options
9. Leverage best cloud Management Tools
10. Don't fear cloud constraints

Best Practices:
Don't Just Build Apps in the Cloud
[Figure: traditional tiered application stack: Load Balancer, Web Tier, Business Tier, Data Tier and Messaging, each with its own back-up]
Source: GigaSpace, Practical Guide for Developing Enterprise Application on the Cloud

Don't simply port traditional Apps to the Cloud
Traditional Application Stacks are architected in functional silos
Each silo has its own machines, network, management, and support

Build Cloud Apps:


Virtualize the Application Stack
[Figure: virtualized application stack: Users, Load Balancer, Web Processing Units, Business Processing Units, DB]
Source: GigaSpace, Practical Guide for Developing Enterprise Application on the Cloud

Re-factor to use standardized VM containers.
Each instance should use self-discovery, be self-configurable, and be network independent
Use cloud standardized Messaging & DB when possible
Leverage inherent EBS replication and snapshots for DBMS

Build Cloud Apps:


Compensate for Ephemeral Storage
EC2 instance default storage can only be used for transient data (e.g. intermediate or temp data files).
Don't use it for archival data logs such as login logs or error dumps.
  Consider using SDB to store persistent archival data records that can be associated with a key (e.g. timestamp)

If it is OK to recover only from the most recent backup, consider restoring data from S3 at boot-up and backing up current data to S3 at shutdown.
If not OK, use EBS attached volumes for all persistent file data.
DBMS should always use EBS volumes

Build Cloud Apps:


Compensate for Ephemeral Storage (cont)

Consider using soft-links (Linux) to map portions of the ephemeral Default Storage application file tree to persistent EBS volumes
  This can be used for archival data logs such as login logs or error dumps (i.e. /var/log/ files can be soft-linked to an EBS volume).

If only small chunks of persistent storage are needed for each Instance, consider using EBS volumes exported on EC2 NFS servers.
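
The soft-link technique can be sketched as follows, assuming a Linux instance. Temporary directories stand in for the ephemeral disk and for a mounted EBS volume, and `link_to_persistent` is a hypothetical helper, so the example runs anywhere symbolic links are permitted.

```python
import os
import tempfile


def link_to_persistent(app_path: str, persistent_dir: str) -> None:
    """Make `app_path` a symbolic link pointing at `persistent_dir`."""
    os.makedirs(persistent_dir, exist_ok=True)
    if os.path.islink(app_path):
        os.remove(app_path)  # replace a stale link
    elif os.path.exists(app_path):
        raise FileExistsError(f"{app_path} exists and is not a symlink")
    parent = os.path.dirname(app_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    os.symlink(persistent_dir, app_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Stand-ins: a directory on the "EBS mount" and a log path on the "ephemeral disk".
        ebs_logs = os.path.join(root, "ebs_mount", "logs")
        app_logs = os.path.join(root, "ephemeral", "var", "log", "myapp")
        link_to_persistent(app_logs, ebs_logs)
        with open(os.path.join(app_logs, "error.log"), "w") as handle:
            handle.write("written through the link\n")
        print(os.listdir(ebs_logs))  # the file physically lives on the persistent side
```

The application keeps writing to its usual path, while the data lands on the volume that survives instance termination.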

Build Cloud Apps:


Compensate for Dynamic IP Address

Attach ElasticIP for Internet-facing EC2 instances (e.g. the HAProxy load-balancer instance)
Use dynamic DNS registration of EC2 instance internal IP address or use SDB
EC2 instances should only use the internal IP address for communicating with each other (free!).

Best Practices:
Design for Failure

"Everything fails, all the time," Werner Vogels, CTO Amazon.com
Avoid single points of failure
Assume everything fails, and design backwards
Design for failure and your App won't fail

Design for Failure:


What Can Fail in AWS?

The EC2 Instance may crash
Portions of a Zone may not be accessible (i.e. internal network problem within the Zone)
  EC2 Instance in a Zone may not be launch-able
  EBS volumes in a Zone may not be accessible

AWS Services in a Region may not be accessible (very low probability)
  S3 buckets in a Region may not be accessible
  SDB domains (tables) in a Region may not be accessible
  SQS Queues in a Region may not be accessible

Design for Failure:


Use Failure Tolerant Features
Use Elastic IP addresses (or their DNS names) for consistent and re-mappable routes
Use multiple EC2 Availability Zones
Use EBS for persistent file systems and snapshots.
  Snapshots can be used to restore EBS volumes on other Zones
  Use Rsync for real-time synchronization of EBS volumes across Zones

Create multiple DBMS slaves across Availability Zones
Use real-time monitoring (Amazon CloudWatch or RightScale)

Best Practices:
Design for Scalability

A scalable architecture is critical to take advantage of a scalable infrastructure
No central point of data storage contention
  Shared Nothing
  Sharding
  Distributed Caching

Loose coupling of processing requestors and responders

Design for Scalability : Use AWS Elastic Features

Use Load Balancing on multiple layers: either your own (e.g. HAProxy EC2 instance) or AWS Elastic Load Balancing
Use Cloud monitoring systems: either your own (e.g. CollectD) or AWS CloudWatch
Use Auto-scaling technology (Free with CloudWatch)
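
The decision at the heart of auto-scaling can be illustrated with a simple threshold rule, sketched below. This is not the AWS Auto-Scaling API: the function name, the CPU thresholds and the one-instance step are illustrative assumptions, and the default upper bound of 20 simply mirrors the default per-account instance quota mentioned earlier.

```python
def desired_instance_count(
    current: int,
    avg_cpu_percent: float,
    scale_up_at: float = 70.0,    # assumed threshold
    scale_down_at: float = 30.0,  # assumed threshold
    min_instances: int = 1,
    max_instances: int = 20,
) -> int:
    """Return how many instances the cluster should run after this evaluation."""
    if avg_cpu_percent > scale_up_at:
        target = current + 1      # load is high: add one instance
    elif avg_cpu_percent < scale_down_at:
        target = current - 1      # load is low: remove one instance
    else:
        target = current          # inside the comfort band: no change
    return max(min_instances, min(max_instances, target))


if __name__ == "__main__":
    print(desired_instance_count(current=2, avg_cpu_percent=85.0))
    print(desired_instance_count(current=2, avg_cpu_percent=10.0))
```

Keeping a gap between the two thresholds avoids flapping, where the cluster repeatedly adds and removes the same instance.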

Design for Scalability

Source: RightScale

Best Practices:
Build Loosely Coupled Systems

Use independent components
Design everything as a Black Box with well defined inputs and outputs
Use subsystem de-coupling for Hybrid models
Use load-balanced clusters of Black Boxes to maximize plug&play

Loose Coupling:
Use Message Queues
[Figure: Tight Coupling (Controller A, Controller B and Controller C connected directly) compared with Loose Coupling using Queues (clusters of Controller A, B and C instances connected through queues Q1, Q2 and Q3)]

Use a message queue system such as Amazon SQS or Gearman to pass along requests
Each message queue consumer can be a cluster of EC2 instances
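
The queue-based decoupling described above can be tried locally with Python's standard `queue` module standing in for SQS or Gearman, and worker threads standing in for a cluster of consumer instances. The function name and the upper-casing "work" are placeholders; nothing here talks to AWS.

```python
import queue
import threading
from typing import Iterable, List


def run_pipeline(requests: Iterable[str], num_workers: int = 3) -> List[str]:
    """Push requests through a queue consumed by a pool of workers."""
    work_q: "queue.Queue" = queue.Queue()  # stand-in for an SQS or Gearman queue
    results: "queue.Queue" = queue.Queue()

    def worker() -> None:
        while True:
            item = work_q.get()
            if item is None:           # sentinel: no more work for this worker
                break
            results.put(item.upper())  # placeholder for real processing

    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in workers:
        thread.start()
    for request in requests:           # the producer only knows about the queue
        work_q.put(request)
    for _ in workers:                  # one sentinel per worker
        work_q.put(None)
    for thread in workers:
        thread.join()

    processed = []
    while not results.empty():
        processed.append(results.get())
    return processed


if __name__ == "__main__":
    print(sorted(run_pipeline(["claim-a", "claim-b", "claim-c"])))
```

The producer never addresses a specific consumer, so workers can be added or removed freely. As with SQS, the order in which results arrive is not guaranteed, which is why the usage line sorts them.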

Best Practices:
Design for Dynamism
Don't assume health or fixed location of components
Use designs that are resilient to reboot and relaunch
Bootstrap your instances based on self-discovery (e.g. EC2 Metadata API)
  Store configurations in SimpleDB to bootstrap instances

Enable dynamic configuration
  Store application, subsystem, and EC2 instance state in SimpleDB so instances can know health of system
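
A bootstrap step based on self-discovery might look like the sketch below. It reads identity values from the EC2 instance metadata endpoint and then looks up configuration in a store; a plain dictionary stands in for a SimpleDB domain. The `bootstrap` function, the store layout and the `default` key are assumptions for illustration. The metadata fetch only works on an EC2 instance, and instances configured to require session tokens may reject this plain request, so the fetch function is injectable.

```python
import urllib.request
from typing import Callable, Dict

METADATA_URL = "http://169.254.169.254/latest/meta-data/"


def fetch_metadata(key: str, timeout: float = 2.0) -> str:
    """Read one value from the instance metadata service (EC2 only)."""
    with urllib.request.urlopen(METADATA_URL + key, timeout=timeout) as response:
        return response.read().decode("utf-8")


def bootstrap(config_store: Dict[str, dict],
              fetch: Callable[[str], str] = fetch_metadata) -> dict:
    """Discover who this instance is, then load its configuration."""
    instance_id = fetch("instance-id")
    zone = fetch("placement/availability-zone")
    local_ip = fetch("local-ipv4")
    # Per-instance settings win; otherwise fall back to the shared default entry.
    settings = config_store.get(instance_id, config_store.get("default", {}))
    return {"instance_id": instance_id, "zone": zone, "local_ip": local_ip, **settings}


if __name__ == "__main__":
    # Fake metadata so the sketch also runs off-cloud.
    fake = {
        "instance-id": "i-12345678",
        "placement/availability-zone": "us-east-1a",
        "local-ipv4": "10.0.0.5",
    }
    store = {"default": {"role": "web"}, "i-12345678": {"role": "db"}}
    print(bootstrap(store, fetch=fake.__getitem__))
```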

Best Practices:
Security in every component
Use de-perimeterized security model
Create distinct network Security Groups for each Amazon EC2 instance cluster
Use group-based network rules for controlling access between components
Restrict external access to specific IP ranges
Encrypt data at-rest in Amazon S3
Encrypt data in-transit (SSL)
Consider encrypted EBS file systems for sensitive data

Best Practices:
Leverage Storage Solutions

Amazon S3: large static objects
Amazon CloudFront: content distribution
Amazon SimpleDB: simple data indexing/querying
Amazon EC2 local disk drive: transient data
Amazon EBS: RDBMS persistent storage + S3 Snapshots

Best Practices:
Leverage Best AWS Mgt Tools
Management of any but the simplest cloud application configurations is very cumbersome without advanced tools.
RightScale is a script-based instance provisioning, monitoring, & auto-scaling system
  Supports collaborative sharing & reuse of scripts

Kaavo Infrastructure & Middleware On Demand (IMOD) is an Application Centric Management System
  Manages a multitier cloud application system as though it were a monolithic application

Best Practices:
Don't fear cloud constraints

Think out of the box and leverage cloud features to solve EC2 constraints
Components expect Static IP addresses?
  Boot script for software reconfiguration from SimpleDB or use Dynamic DNS

Local data center DBMS has better IOPS?
  Try multiple read-only replicas, sharding, or DB clustering

AWS Management Tools

AWS Management Tools:


Basic Tools

Amazon native AWS tools only leverage basic AWS API capability
  AWS Management Console

Firefox plugins are slightly more advanced
  Elasticfox: EC2 Instance, EBS, EIP management
  S3 Organizer: S3 file upload/download (similar to ftp plugin)

CloudBerry Explorer: Windows S3 file upload/download application, slightly better than S3 Organizer

AWS Management Tools:


Ideal Advanced Tools
Attaching EBS volumes, EIPs, and other resources should be scripted and managed by a Cloud Deployment & Mgmt System (CDMS)
CDMS should incorporate standards-based Performance Monitoring services
Should incorporate standards-based Event Notification services
Should incorporate Auto-scaling configuration services as remediation of Performance/Load Events
CDMS should incorporate Administrator Collaboration allowing sharing and partitioning of admin responsibilities

AWS Management Tools:


Ideal Advanced Tools (cont)
Allow for automated provisioning of EC2 instances
Should allow sharing of scripts and launch/terminate of instances based on group roles or at least read/write/execute rights.
Should allow for re-use of generalized scripts
Should allow for auto-scaling based on dynamic load evaluation functions
CDMS should support escalating event notification to groups of users.
  Should have interfaces to other EMS (e.g. Nagios)

AWS Management Tools:


RightScale
Script-based instance provisioning, monitoring, & auto-scaling system
Manages complex deployments involving multiple instance clusters
Re-use of version-controlled scripts in different deployments
Full automation of auto-scaling, remediation, notification and automatic configuration
Cloud application developer and administrator collaboration framework

RightScale Provisioning Pattern

Adapted: 2009 CommunityOne West Conference: Practical Cloud Computing Patterns

RightScale proxy server uses modified Push Pattern


Boot Finished event triggers automated provisioning commands sequence

RightScale Lifecycle Mgmt Pattern

RightScale uses an Injection Pattern to push individual command scripts into a running EC2 instance or an entire deployed cluster of instances
Boot Scripts are automatically run at Instance Launch after OS boot_finished event
Operational Scripts are run during automated Event Handling or manual operations
Decommissioning Scripts are automatically run prior to Instance Termination

Current RightScale Cloud Service Monitoring Pattern

Source: 2009 CommunityOne West Conference: Practical Cloud Computing Patterns

Based on collectd framework

AWS Management Tools:


Scalr

Similar to RightScale features: instance provisioning, monitoring, & auto-scaling system
Less reliant on on-the-fly provisioning.
Suite of Scalr AMIs available for common application configurations.
Manages complex deployments involving multiple instance clusters
Significantly less expensive
Open source code available for local use.

AWS Management Tools:


Kaavo IMOD
Application Centric Management System
Proxy server manages complex multitier cloud application system as if it were a monolithic application via IMOD System Definitions
Quickstart: Kaavo provides out of the box System Definitions for deploying popular multi-tier HA infrastructure: Ruby on Rails, LAMP, Tomcat, JBoss
IMOD workflow engine monitors application run-time state events and responds dynamically with user customized Event Workflows (e.g. MySQL scale-up/scale-down)

Q&A : More Resources

www.hyperstratus.com
White Paper: Migrating Applications to the Cloud: An Amazon Web Services Case Study
Cloud Computing Workshops (via Unitek Education)
Jorge.Noa@hyperstratus.com
