
FRANCO-GERMAN SUMMER UNIVERSITY FOR YOUNG RESEARCHERS 2011
(UNIVERSITÉ D'ÉTÉ FRANCO-ALLEMANDE POUR JEUNES CHERCHEURS / DEUTSCH-FRANZÖSISCHE SOMMERUNIVERSITÄT FÜR NACHWUCHSWISSENSCHAFTLER)

CLOUD COMPUTING:
CHALLENGES AND OPPORTUNITIES

Windows Azure as a Platform as a Service (PaaS)


17.7. - 22.7.2011
Jared Jackson
Microsoft Research

Before we begin: Some Results


[Figure: two pie charts of ice cream flavor percentages, "Ice Cream Consumption" and "Favorite Ice Cream" (attendee survey).]

Source: International Ice Cream Association (makeicecream.com)

Windows Azure Overview

Web Application Model Comparison

Ad Hoc Application Model:
- Machines running IIS / ASP.NET
- Machines running Windows Services
- Machines running SQL Server

Windows Azure Application Model:
- Web Role instances
- Worker Role instances
- Azure Storage (Blob / Queue / Table) and SQL Azure

Key Components

Fabric Controller
Manages hardware and virtual machines for the service

Compute
Web Roles: web application front end
Worker Roles: utility compute
VM Roles: custom compute role; you own and customize the VM

Storage
Blobs: binary objects
Tables: entity storage
Queues: role coordination

SQL Azure
SQL in the cloud

Key Components
Fabric Controller

Think of it as an automated IT department


Cloud Layer on top of:
Windows Server 2008
A custom version of Hyper-V called the Windows Azure Hypervisor
Allows for automated management of virtual machines

Its job is to provision, deploy, monitor, and maintain applications in data centers.
Applications have a shape and a configuration.

The service definition describes the shape of a service:

Role types
Role VM sizes
External and internal endpoints
Local storage

The service configuration configures a running service:

Instance count
Storage keys
Application-specific settings

Key Components
Fabric Controller

Manages nodes and edges in the fabric (the hardware)

Power-on automation devices


Routers / Switches
Hardware load balancers
Physical servers
Virtual servers

State transitions
Current State
Goal State
Does what is needed to reach and maintain the goal state

It's a perfect IT employee!

Never sleeps
Doesn't ever ask for a raise
Always does what you tell it to do in the service definition and configuration settings

Creating a New Project

Windows Azure Compute

Key Components - Compute


Web Roles
Web Front End
Cloud web server
Web pages
Web services

You can create the following types:


ASP.NET web roles
ASP.NET MVC 2 web roles
WCF service web roles
Worker roles
CGI-based web roles

Key Components - Compute


Worker Roles

Utility compute
Windows Server 2008
Background processing
Each role can define an amount of local storage.
Protected space on the local drive, considered volatile storage.
May communicate with outside services
Azure Storage
SQL Azure
Other Web services
Can expose external and internal endpoints

Suggested Application Model


Using queues for reliable messaging

Scalable, Fault Tolerant Applications


Queues are the application glue:
Decouple parts of the application, so each can be scaled independently
Resource allocation: different priority queues and backend servers
Mask faults in worker roles (reliable messaging)
A minimal sketch of this pattern follows below.
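A minimal sketch of the work-ticket flow, using the same 2011-era Microsoft.WindowsAzure.StorageClient library as the code later in this deck (the connection string, queue name, and helper names are illustrative, not part of the original):

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class WorkTicketExample
{
    // Illustrative account; replace with your own connection string.
    static CloudStorageAccount account =
        CloudStorageAccount.Parse("DefaultEndpointsProtocol=https;AccountName=myaccount;AccountKey=mykey");

    // Web role side: enqueue a small "work ticket" that points at a blob holding the real payload.
    public static void EnqueueTicket(string blobName)
    {
        CloudQueueClient queueClient = account.CreateCloudQueueClient();
        CloudQueue queue = queueClient.GetQueueReference("workitems");
        queue.CreateIfNotExist();
        queue.AddMessage(new CloudQueueMessage(blobName));   // the ticket stays well under the 8KB limit
    }

    // Worker role side: dequeue, process, and delete only after success (at-least-once semantics).
    public static void ProcessOneTicket()
    {
        CloudQueueClient queueClient = account.CreateCloudQueueClient();
        CloudQueue queue = queueClient.GetQueueReference("workitems");
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromMinutes(5)); // visibility timeout
        if (msg == null) return;                                           // queue is empty

        ProcessBlob(msg.AsString);      // do the real work - make it idempotent
        queue.DeleteMessage(msg);       // if we crash before this, the message reappears for another worker
    }

    static void ProcessBlob(string blobName) { /* application-specific work */ }
}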

Key Components - Compute

VM Roles

Customized role: you own the box
How it works:
Download the base guest OS to Server 2008 Hyper-V
Customize the OS as you need to
Upload the difference VHD
Azure runs your VM role using the base OS plus your difference VHD

Application Hosting

Grokking the service model


Imagine white-boarding out your service architecture with boxes for nodes and arrows
describing how they communicate
The service model is the same diagram written down in a declarative format
You give the Fabric the service model and the binaries that go with each of those nodes
The Fabric can provision, deploy and manage that diagram for you
Find a hardware home

Copy and launch your app binaries


Monitor your app and the hardware
In case of failure, take action. Perhaps even relocate your app

At all times, the diagram stays whole

Automated Service Management


Provide code + service model
Platform identifies and allocates resources, deploys the service, manages
service health
Configuration is handled by two files
ServiceDefinition.csdef
ServiceConfiguration.cscfg

Service Definition

Service Configuration

GUI
Double click on Role Name in Azure Project

Deploying to the cloud


We can deploy from the portal or from script
VS builds two files.
Encrypted package of your code
Your config file
You must create an Azure account, then a service, and then
you deploy your code.
Can take up to 20 minutes
(which is better than six months)

Service Management API


REST based API to manage your services
X509-certs for authentication
Lets you create, delete, change, upgrade, swap, and more
Lots of community and MSFT-built tools around the API
- Easy to roll your own
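A rough sketch of calling the Service Management API directly with an HttpWebRequest and a management certificate; the subscription ID, certificate thumbprint, and version string below are placeholders, not values from the original deck:

using System;
using System.IO;
using System.Net;
using System.Security.Cryptography.X509Certificates;

class ListHostedServices
{
    static void Main()
    {
        string subscriptionId = "<subscription-id>";          // placeholder
        string thumbprint = "<management-cert-thumbprint>";   // placeholder

        // Find the management certificate that was uploaded to the subscription.
        X509Store store = new X509Store(StoreName.My, StoreLocation.CurrentUser);
        store.Open(OpenFlags.ReadOnly);
        X509Certificate2 cert = store.Certificates
            .Find(X509FindType.FindByThumbprint, thumbprint, false)[0];

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(
            "https://management.core.windows.net/" + subscriptionId + "/services/hostedservices");
        request.Method = "GET";
        request.Headers.Add("x-ms-version", "2011-02-25");    // version header; check the currently supported value
        request.ClientCertificates.Add(cert);                  // the X509 cert authenticates the call

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            Console.WriteLine(reader.ReadToEnd());              // XML list of hosted services
        }
        store.Close();
    }
}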

The Secret Sauce - The Fabric

The Fabric is the brain behind Windows Azure:
1. Process the service model
   1. Determine resource requirements
   2. Create role images
2. Allocate resources
3. Prepare nodes
   1. Place role images on nodes
   2. Configure settings
   3. Start roles
4. Configure load balancers
5. Maintain service health
   1. If a role fails, restart the role, based on policy
   2. If a node fails, migrate the role, based on policy

Storage

Durable Storage, At Massive Scale


Blob
- Massive files e.g. videos, logs

Drive
- Use standard file system APIs

Tables
- Non-relational, but with few scale limits
- Use SQL Azure for relational data
Queues
- Facilitate loosely-coupled, reliable, systems

Blob Features and Functions


Store Large Objects (up to 1TB in size)
Can be served through Windows
Azure CDN service
Standard REST Interface
PutBlob
Inserts a new blob, overwrites the existing blob

GetBlob
Get whole blob or a specific range

DeleteBlob
CopyBlob
SnapshotBlob
LeaseBlob

Two Types of Blobs Under the Hood


Block Blob
Targeted at streaming workloads
Each blob consists of a sequence of blocks
Each block is identified by a Block ID
Size limit: 200GB per blob

Page Blob
Targeted at random read/write workloads
Each blob consists of an array of pages
Each page is identified by its offset from the start of the blob
Size limit: 1TB per blob

Windows Azure Drive


Provides a durable NTFS volume for Windows Azure applications to use
Use existing NTFS APIs to access a durable drive
Durability and survival of data on application failover
Enables migrating existing NTFS applications to the cloud

A Windows Azure Drive is a Page Blob
Example: mount a Page Blob as X:\
http://<accountname>.blob.core.windows.net/<containername>/<blobname>

All writes to the drive are made durable to the Page Blob
The drive is made durable through standard Page Blob replication
The drive persists as a Page Blob even when not mounted
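A rough sketch of creating and mounting a drive from role code, assuming the 2011-era CloudDrive API; the local resource name, blob path, and sizes are illustrative:

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.ServiceRuntime;   // RoleEnvironment
using Microsoft.WindowsAzure.StorageClient;    // CloudDrive and its extension methods

public class DriveExample
{
    public static string MountDataDrive(CloudStorageAccount account)
    {
        // Local resource (declared in ServiceDefinition.csdef) used as the drive's read cache.
        LocalResource cache = RoleEnvironment.GetLocalResource("DriveCache");   // "DriveCache" is an assumed name
        CloudDrive.InitializeCache(cache.RootPath, cache.MaximumSizeInMegabytes);

        // The drive is backed by a Page Blob, e.g. http://<account>.blob.core.windows.net/drives/data.vhd
        CloudDrive drive = account.CreateCloudDrive(
            account.BlobEndpoint + "/drives/data.vhd");

        try { drive.Create(1024); }        // create a 1GB drive the first time
        catch (CloudDriveException) { }    // the drive already exists - fine

        // Mount returns a path such as "X:\" that existing NTFS code can use directly.
        return drive.Mount(cache.MaximumSizeInMegabytes, DriveMountOptions.None);
    }
}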

Windows Azure Tables


Provides Structured Storage
Massively Scalable Tables
Billions of entities (rows) and TBs of data
Can use thousands of servers as traffic grows

Highly Available & Durable


Data is replicated several times

Familiar and Easy to use API


WCF Data Services and OData
.NET classes and LINQ
REST with any platform or language

Windows Azure Queues


Queues are performance-efficient, highly available, and provide reliable message delivery
Simple, asynchronous work dispatch
Programming semantics ensure that a message can be processed at least once

Access is provided via REST

Storage Partitioning
Understanding partitioning is key to understanding performance

Every data object has a partition key
Different for each data type (blobs, entities, queues)
Controls entity locality

Partition key is the unit of scale
A partition can be served by a single server
The system load balances partitions based on traffic pattern
Load balancing can take a few minutes to kick in
It can take a couple of seconds for a partition to become available on a different server

Server Busy
Use exponential backoff on Server Busy: either the system is load balancing to meet your traffic needs, or single-partition limits have been reached

Partition Keys In Each Abstraction

Blobs: Container name + Blob name
Every blob and its snapshots are in a single partition

Container Name | Blob Name
image | annarbor/bighouse.jpg
image | foxborough/gillette.jpg
video | annarbor/bighouse.jpg

Entities: TableName + PartitionKey
Entities with the same PartitionKey value are served from the same partition

PartitionKey (CustomerId) | RowKey (RowKind) | Name | CreditCardNumber | OrderTotal
Customer-John Smith | ... | John Smith | xxxx-xxxx-xxxx-xxxx |
Customer-John Smith | Order 1 | | | $35.12
Customer-Bill Johnson | ... | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
Customer-Bill Johnson | Order 3 | | | $10.00

Messages: Queue Name
All messages for a single queue belong to the same partition

Queue | Message
jobs | Message1
jobs | Message2
workflow | Message1

Scalability Targets

Storage Account
Capacity: up to 100 TB
Transactions: up to a few thousand requests per second
Bandwidth: up to a few hundred megabytes per second

Single Blob Partition
Throughput: up to 60 MB/s

Single Queue/Table Partition
Up to 500 transactions per second

To go above these numbers, partition between multiple storage accounts and partitions
When a limit is hit, the app will see "503 Server Busy"; applications should implement exponential backoff (a retry sketch follows below)
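One way to implement that backoff, sketched with the StorageClient library used elsewhere in this deck; the retry counts and delays are arbitrary choices, not values from the original:

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

public static class RetryHelper
{
    // Retry an operation with truncated exponential backoff when the storage
    // service reports a server-side problem such as "503 Server Busy".
    public static void WithBackoff(Action operation, int maxAttempts = 6)
    {
        TimeSpan delay = TimeSpan.FromMilliseconds(100);     // initial delay (arbitrary)
        TimeSpan maxDelay = TimeSpan.FromSeconds(10);        // truncation point (arbitrary)

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                operation();
                return;
            }
            catch (StorageServerException)                   // 5xx from the storage service
            {
                if (attempt >= maxAttempts) throw;
                Thread.Sleep(delay);
                delay = TimeSpan.FromMilliseconds(
                    Math.Min(delay.TotalMilliseconds * 2, maxDelay.TotalMilliseconds));
            }
        }
    }
}

// Usage: RetryHelper.WithBackoff(() => container.CreateIfNotExist());

The StorageClient library also ships built-in retry policies (e.g. RetryPolicies.RetryExponential) that can be attached to a blob, table, or queue client for a similar effect.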

Partitions and Partition Ranges

PartitionKey (Category) | RowKey (Title) | Timestamp | ReleaseDate
Action | Fast & Furious | ... | 2009
Action | The Bourne Ultimatum | ... | 2007
Animation | Open Season 2 | ... | 2009
Animation | The Ant Bully | ... | 2006
Comedy | Office Space | ... | 1999
SciFi | X-Men Origins: Wolverine | ... | 2009
War | Defiance | ... | 2008

Entities with the same PartitionKey form a partition; contiguous partitions form partition ranges that can be split across servers as the table grows.

Key Selection: Things to Consider


Scalability

Distribute load as much as possible


Hot partitions can be load balanced
PartitionKey is critical for scalability

Query Efficiency & Speed

Avoid frequent large scans


Parallelize queries
Point queries are most efficient

Entity group transactions

Transactions across a single partition


Transaction semantics & Reduce round trips

See http://www.microsoftpdc.com/2009/SVC09
and http://azurescope.cloudapp.net
for more information

Expect Continuation Tokens - Seriously!

Maximum of 1000 rows in a response
At the end of a partition range boundary
Maximum of 5 seconds to execute the query
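The StorageClient library used later in this deck can follow continuation tokens for you when the query is wrapped with AsTableServiceQuery(); a hedged sketch, with the entity type and table name made up for illustration:

using System.Linq;
using Microsoft.WindowsAzure.StorageClient;

public class MovieEntity : TableServiceEntity   // illustrative entity
{
    public string Title { get; set; }
    public int ReleaseYear { get; set; }
}

public static class ContinuationExample
{
    public static int CountActionMovies(TableServiceContext context)
    {
        CloudTableQuery<MovieEntity> query =
            (from m in context.CreateQuery<MovieEntity>("Movies")
             where m.PartitionKey == "Action"
             select m).AsTableServiceQuery();

        int count = 0;
        // Enumerating a CloudTableQuery issues the follow-up requests carrying the
        // continuation tokens; a plain DataServiceQuery would stop at the first
        // 1000-row / 5-second / partition-boundary cut-off.
        foreach (MovieEntity m in query)
            count++;
        return count;
    }
}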

Tables Recap

Benefits: efficient for frequently used queries; supports batch transactions; distributes load.

Best practices:
Select a PartitionKey and RowKey that help scale
Avoid append-only patterns: distribute by using a hash etc. as a prefix
Always handle continuation tokens: expect them for range queries
OR predicates are not optimized: execute the queries that form the OR predicates as separate queries
Implement a back-off strategy for retries: Server Busy means partitions are being load balanced to meet traffic needs, or the load on a single partition has exceeded the limits

WCF Data Services

Use a new context for each logical operation


AddObject/AttachTo can throw exception if entity is already being tracked
Point query throws an exception if resource does not exist. Use
IgnoreResourceNotFoundException
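For example, a small helper in the style of the TableHelper code later in this deck:

using Microsoft.WindowsAzure.StorageClient;

public static class ContextFactory
{
    public static TableServiceContext CreateContext(CloudTableClient client)
    {
        TableServiceContext context = client.GetDataServiceContext();
        // Without this, a point query (PartitionKey + RowKey lookup) that matches nothing
        // throws a DataServiceQueryException instead of simply returning no entity.
        context.IgnoreResourceNotFoundException = true;
        return context;
    }
}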

Queues
Their Unique Role in Building Reliable, Scalable Applications
Want roles that work closely together, but are not bound together.
Tight coupling leads to brittleness
This can aid in scaling and performance

A queue can hold an unlimited number of messages


Messages must be serializable as XML
Limited to 8KB in size
Commonly use the work ticket pattern

Why not simply use a table?

Queue Terminology

Message Lifecycle
A Web Role adds a message with PutMessage; a Worker Role retrieves it with GetMessage, which hides it for the visibility timeout, and removes it with DeleteMessage; if the timeout expires first, the message becomes visible again to other workers.

PutMessage request:
POST http://myaccount.queue.core.windows.net/myqueue/messages

GetMessage response:
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: application/xml
Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0
Date: Tue, 09 Dec 2008 21:04:30 GMT

<?xml version="1.0" encoding="utf-8"?>
<QueueMessagesList>
  <QueueMessage>
    <MessageId>5974b586-0df3-4e2d-ad0c-18e3892bfca2</MessageId>
    <InsertionTime>Mon, 22 Sep 2008 23:29:20 GMT</InsertionTime>
    <ExpirationTime>Mon, 29 Sep 2008 23:29:20 GMT</ExpirationTime>
    <PopReceipt>YzQ4Yzg1MDIGM0MDFiZDAwYzEw</PopReceipt>
    <TimeNextVisible>Tue, 23 Sep 2008 05:29:20 GMT</TimeNextVisible>
    <MessageText>PHRlc3Q+dG...dGVzdD4=</MessageText>
  </QueueMessage>
</QueueMessagesList>

RemoveMessage request:
DELETE http://myaccount.queue.core.windows.net/myqueue/messages/messageid?popreceipt=YzQ4Yzg1MDIGM0MDFiZDAwYzEw

Truncated Exponential Back Off Polling

Consider a back-off polling approach:
Each empty poll increases the polling interval by 2x
A successful poll resets the interval back to 1
Cap the interval at some maximum (hence "truncated")
A minimal sketch follows below.
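A minimal sketch of such a polling loop for a worker role; the interval choices are arbitrary:

using System;
using System.Threading;
using Microsoft.WindowsAzure.StorageClient;

public class BackoffPoller
{
    public void PollQueue(CloudQueue queue)
    {
        TimeSpan interval = TimeSpan.FromSeconds(1);        // starting interval
        TimeSpan maxInterval = TimeSpan.FromSeconds(32);    // truncation point

        while (true)
        {
            CloudQueueMessage msg = queue.GetMessage();
            if (msg != null)
            {
                Process(msg);                                // application-specific work
                queue.DeleteMessage(msg);
                interval = TimeSpan.FromSeconds(1);          // successful poll: reset the interval
            }
            else
            {
                Thread.Sleep(interval);                      // empty poll: wait, then double the interval
                interval = TimeSpan.FromSeconds(
                    Math.Min(interval.TotalSeconds * 2, maxInterval.TotalSeconds));
            }
        }
    }

    void Process(CloudQueueMessage msg) { /* application-specific work */ }
}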

Removing Poison Messages

Scenario: producers P1 and P2 put messages on queue Q; consumers C1 and C2 dequeue them with a 30-second visibility timeout.

1. C1: GetMessage(Q, 30 s) returns msg 1
2. C2: GetMessage(Q, 30 s) returns msg 2
3. C2 consumes msg 2
4. C2: DeleteMessage(Q, msg 2)
5. C1 crashes
6. msg 1 becomes visible again 30 s after its dequeue
7. C2: GetMessage(Q, 30 s) returns msg 1
8. C2 crashes
9. msg 1 becomes visible again 30 s after its dequeue
10. C1 restarts
11. C1: GetMessage(Q, 30 s) returns msg 1
12. DequeueCount > 2, so msg 1 is treated as a poison message
13. DeleteMessage(Q, msg 1)
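The dequeue count used in step 12 is exposed on each message; a hedged sketch of the worker-side check, where the threshold and the dead-letter handling are application choices:

using System;
using Microsoft.WindowsAzure.StorageClient;

public class PoisonMessageGuard
{
    private const int MaxDequeueCount = 3;   // threshold is an application choice

    public void ProcessNext(CloudQueue queue)
    {
        CloudQueueMessage msg = queue.GetMessage(TimeSpan.FromSeconds(30));
        if (msg == null) return;

        if (msg.DequeueCount > MaxDequeueCount)
        {
            // Poison message: it has failed too many times. Log it or park it somewhere
            // (e.g. a blob or a separate "dead letter" queue), then remove it from this queue.
            queue.DeleteMessage(msg);
            return;
        }

        DoWork(msg);                 // must be idempotent - it may run more than once
        queue.DeleteMessage(msg);    // only delete after successful processing
    }

    private void DoWork(CloudQueueMessage msg) { /* application-specific work */ }
}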

Queues Recap

Make message processing idempotent: no need to deal with failures
Do not rely on order: invisible messages result in out-of-order processing
Use the dequeue count to remove poison messages: enforce a threshold on a message's dequeue count
Use a blob to store message data, with a reference in the message: handles messages > 8KB; batch messages; garbage collect orphaned blobs
Use the message count to scale: dynamically increase/reduce workers

Windows Azure Storage Takeaways


Blobs
Drives
Tables
Queues

http://blogs.msdn.com/windowsazurestorage/
http://azurescope.cloudapp.net

A Quick Exercise

Then let's look at some code and some tools

Code - AccountInformation.cs
public class AccountInformation
{
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static string accountName = "jjstore";

    private static StorageCredentialsAccountAndKey credentials;

    internal static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}

Code - BlobHelper.cs
public class BlobHelper
{
private static string defaultContainerName = "school";
private CloudBlobClient client = null;
private CloudBlobContainer container = null;
private void InitContainer()
{
if (client == null)
client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
container = client.GetContainerReference(defaultContainerName);

container.CreateIfNotExist();
BlobContainerPermissions permissions = container.GetPermissions();
permissions.PublicAccess = BlobContainerPublicAccessType.Container;
container.SetPermissions(permissions);
}

Code - BlobHelper.cs
public void WriteFileToBlob(string filePath)
{
if (client == null || container == null)
InitContainer();
FileInfo file = new FileInfo(filePath);

CloudBlob blob = container.GetBlobReference(file.Name);


blob.Properties.ContentType = GetContentType(file.Extension);
blob.UploadFile(file.FullName);
// Or if you want to write a string replace the last line with:
// blob.UploadText(someString);
// And make sure you set the content type to the appropriate MIME type (e.g. text/plain)
}


Code - BlobHelper.cs
public string GetBlobText(string blobName)
{
if (client == null || container == null)
InitContainer();
CloudBlob blob = container.GetBlobReference(blobName);
try
{
return blob.DownloadText();
}
catch (Exception)
{
// The blob probably does not exist or there is no connection available
return null;
}
}


Application Code - Blobs


private void SaveToCloudButton_Click(object sender, RoutedEventArgs e)
{
StringBuilder buff = new StringBuilder();

buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
foreach (AttendeeEntity attendee in attendees)
{
buff.AppendLine(attendee.ToCsvString());
}
blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}

The blob is now available at:


http://<AccountName>.blob.core.windows.net/<ContainerName>/<BlobName>
Or in this case:
http://jjstore.blob.core.windows.net/school/SummerSchoolAttendees.txt

Code - TableEntities
using Microsoft.WindowsAzure.StorageClient;
public class AttendeeEntity : TableServiceEntity
{
public string FirstName { get; set; }
public string LastName { get; set; }
public string Email { get; set; }
public DateTime Birthday { get; set; }
public string FavoriteIceCream { get; set; }
public int YearsInPhD { get; set; }
public bool Graduated { get; set; }


Code - TableEntities
public void UpdateFrom(AttendeeEntity other)
{
FirstName = other.FirstName;
LastName = other.LastName;
Email = other.Email;
Birthday = other.Birthday;
FavoriteIceCream = other.FavoriteIceCream;

YearsInPhD = other.YearsInPhD;
Graduated = other.Graduated;
UpdateKeys();
}
public void UpdateKeys()
{
PartitionKey = "SummerSchool";
RowKey = Email;
}

Code - TableHelper.cs
public class TableHelper {
private CloudTableClient client = null;
private TableServiceContext context = null;
private Dictionary<string,AttendeeEntity> allAttendees = null;
private string tableName = "Attendees";
private CloudTableClient Client {
get {
if (client == null)
client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
return client;
}
}
private TableServiceContext Context {
get {
if (context == null)
context = Client.GetDataServiceContext();
return context;
} } }

Code - TableHelper.cs
private void ReadAllAttendees()
{
allAttendees = new Dictionary<string, AttendeeEntity>();
CloudTableQuery<AttendeeEntity> query =
Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
try
{
foreach (AttendeeEntity attendee in query)
{
allAttendees[attendee.Email] = attendee;
}
}
catch (Exception)
{
// No entries in table - or other exception
}
}

Code - TableHelper.cs
public void DeleteAttendee(string email)
{
if (allAttendees == null)
ReadAllAttendees();
if (!allAttendees.ContainsKey(email))
return;
AttendeeEntity attendee = allAttendees[email];
// Delete from the cloud table
Context.DeleteObject(attendee);
Context.SaveChanges();

// Delete from the memory cache


allAttendees.Remove(email);
}


Code - TableHelper.cs
public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();

    if (allAttendees.ContainsKey(email))
        return allAttendees[email];

    return null;
}

Remember that this only works for tables (or queries on tables) that easily fit in memory
This is one of many design patterns for working with tables

Pseudo Code - TableHelper.cs


public void UpdateAttendees(List<AttendeeEntity> updatedAttendees) {
foreach (AttendeeEntity attendee in updatedAttendees) {
UpdateAttendee(attendee, false);
}
Context.SaveChanges(SaveChangesOptions.Batch);
}
public void UpdateAttendee(AttendeeEntity attendee) {
UpdateAttendee(attendee, true);
}
private void UpdateAttendee(AttendeeEntity attendee, bool saveChanges) {
if (allAttendees.ContainsKey(attendee.Email)) {
AttendeeEntity existingAttendee = allAttendees[attendee.Email];
existingAttendee.UpdateFrom(attendee);
Context.UpdateObject(existingAttendee);
} else {
Context.AddObject(tableName, attendee);
}
if (saveChanges) Context.SaveChanges();
}

Application Code - Cloud Tables

private void SaveButton_Click(object sender, RoutedEventArgs e)
{
    // Write to table
    tableHelper.UpdateAttendees(attendees);
}

That's it! Now your tables are accessible using REST service calls or any cloud storage tool.

Tools - Fiddler2

Best Practices

Picking the Right VM Size

Having the correct VM size can make a big difference in costs
Fundamental choice: fewer, larger VMs vs. many smaller instances
If you scale better than linearly across cores, larger VMs could save you money
Pretty rare to see linear scaling across 8 cores
More instances may provide better uptime and reliability (more failures needed to take your service down)
The only real right answer: experiment with multiple sizes and instance counts in order to measure and find what is ideal for you

Using Your VM to the Maximum

Remember:
1 role instance == 1 VM running Windows
1 role instance != one specific task for your code
You're paying for the entire VM, so why not use it?

Common mistake: splitting code into multiple roles, each not using much CPU
Balance between using up CPU vs. having free capacity in times of need
Multiple ways to use your CPU to the fullest

Exploiting Concurrency
Spin up additional processes, each with a specific task or as a unit of concurrency
May not be ideal if the number of active processes exceeds the number of cores
Use multithreading aggressively
In networking code, correct usage of NT IO Completion Ports will let the kernel schedule the precise number of threads
In .NET 4, use the Task Parallel Library (a short sketch follows below)
Data parallelism
Task parallelism
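A small .NET 4 example of both flavors; the work items here are stand-ins for real CPU-bound work:

using System;
using System.Threading.Tasks;

class TplExamples
{
    static void Main()
    {
        // Data parallelism: apply the same operation to every element,
        // letting the TPL spread iterations across the VM's cores.
        int[] inputs = { 1, 2, 3, 4, 5, 6, 7, 8 };
        Parallel.ForEach(inputs, item => Console.WriteLine(Compute(item)));

        // Task parallelism: run different pieces of work concurrently.
        Task<int> t1 = Task.Factory.StartNew(() => Compute(10));
        Task<int> t2 = Task.Factory.StartNew(() => Compute(20));
        Task.WaitAll(t1, t2);
        Console.WriteLine(t1.Result + t2.Result);
    }

    static int Compute(int x) { return x * x; }   // stand-in for real work
}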

Finding Good Code Neighbors

Typically code falls into one or more of these categories:
Memory intensive
CPU intensive
Network IO intensive
Storage IO intensive

Find code that is intensive with different resources to live together
Example: distributed network caches are typically network- and memory-intensive; they may be a good neighbor for storage IO-intensive code

Scaling Appropriately

Monitor your application and make sure you're scaled appropriately (not over-scaled)
Spinning VMs up and down automatically is good at large scale
Remember that VMs take a few minutes to come up and cost ~$3 a day (give or take) to keep running
Being too aggressive in spinning down VMs can result in poor user experience
Trade-off between the risk of failure or poor user experience from not having excess capacity and the cost of having idling VMs (performance vs. cost)

Storage Costs

Understand an application's storage profile and how storage billing works
Make service choices based on your app profile
E.g. SQL Azure has a flat fee, while Windows Azure Tables charges per transaction
Service choice can make a big cost difference based on your app profile

Caching and compressing help a lot with storage costs

Saving Bandwidth Costs

Bandwidth costs are a huge part of any popular web app's billing profile
Saving bandwidth costs often leads to savings in other places
Sending fewer things over the wire often means getting fewer things from storage
Sending fewer things means your VM has time to do other tasks

All of these tips have the side benefit of improving your web app's performance and user experience

Compressing Content

1. Gzip all output content
All modern browsers can decompress on the fly
Compared to Compress, Gzip has much better compression and freedom from patented algorithms
2. Trade off compute costs for storage size
3. Minimize image sizes
Use Portable Network Graphics (PNGs)
Crush your PNGs
Strip needless metadata
Make all PNGs palette PNGs

Pipeline: Uncompressed content -> Gzip -> Minify JavaScript -> Minify CSS -> Minify images -> Compressed content
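One common way to do step 1 in an ASP.NET web role is to wrap the response stream when the client advertises gzip support; a sketch, wired up in Global.asax for example (IIS dynamic compression can achieve the same thing):

using System;
using System.IO.Compression;
using System.Web;

public class Global : HttpApplication
{
    protected void Application_BeginRequest(object sender, EventArgs e)
    {
        HttpRequest request = HttpContext.Current.Request;
        HttpResponse response = HttpContext.Current.Response;

        string acceptEncoding = request.Headers["Accept-Encoding"];
        if (!string.IsNullOrEmpty(acceptEncoding) && acceptEncoding.Contains("gzip"))
        {
            // Wrap the output stream so everything written to the response is gzip-compressed.
            response.Filter = new GZipStream(response.Filter, CompressionMode.Compress);
            response.AppendHeader("Content-Encoding", "gzip");
        }
    }
}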

Best Practices Summary


Doing less is the key to saving costs

Measure everything

Know your application profile in and out

Research Examples in the Cloud

on another set of slides

Map Reduce on Azure

Elastic MapReduce on Amazon Web Services has traditionally been the only option for Map Reduce jobs in the cloud
Hadoop implementation
Hadoop has a long history and has been improved for stability
Originally designed for cluster systems

Microsoft Research this week is announcing a project code-named Daytona for Map Reduce jobs on Azure
Designed from the start to use cloud primitives
Built-in fault tolerance
REST-based interface for writing your own clients

Project Daytona - Map Reduce on Azure

http://research.microsoft.com/en-us/projects/azure/daytona.aspx

Questions and Discussion

Thank you for hosting me at the Summer School



BLAST (Basic Local Alignment Search Tool)

The most important software in bioinformatics


Identify similarity between bio-sequences

Computationally intensive

Large number of pairwise alignment operations


A BLAST run can take 700 to 1000 CPU hours
Sequence databases are growing exponentially
GenBank doubled in size in about 15 months

It is easy to parallelize BLAST


Segment the input
Segment processing (querying) is pleasingly parallel

Segment the database (e.g., mpiBLAST)


Needs special result reduction processing

Large volume of data

A normal BLAST database can be as large as 10GB
100 nodes means the peak storage bandwidth could reach 1TB
The output of BLAST is usually 10-100x larger than the input

Parallel BLAST engine on Azure


Query-segmentation data-parallel pattern
split the input sequences
query partitions in parallel
merge results together when done

Follows the general suggested application model


Web Role + Queue + Worker

With three special considerations


Batch job management
Task parallelism on an elastic Cloud
Wei Lu, Jared Jackson, and Roger Barga, AzureBlast: A Case Study of Developing Science Applications on the Cloud, in Proceedings of the 1st Workshop
on Scientific Cloud Computing (Science Cloud 2010), Association for Computing Machinery, Inc., 21 June 2010

A simple Split/Join pattern

Leverage the multiple cores of one instance
The -a (number of processors) argument of NCBI-BLAST: 1, 2, 4, 8 for small, medium, large, and extra-large instance sizes

Task granularity
Large partitions: load imbalance
Small partitions: unnecessary overheads (NCBI-BLAST start-up overhead, data transfer overhead)
Best practice: use test runs to profile, and set the partition size to mitigate the overhead

[Diagram: a splitting task fans work out to multiple BLAST tasks; a merging task joins their results.]

Value of visibilityTimeout for each BLAST task

Essentially an estimate of the task run time:
too small: repeated computation
too large: unnecessarily long waiting period in case of instance failure
Best practice:
Estimate the value based on the number of pair-bases in the partition and on test runs
Watch out for the 2-hour maximum limitation

Task size vs. performance
Benefit of the warm cache effect
100 sequences per partition is the best choice

Instance size vs. performance
Super-linear speedup with larger worker instances
Primarily due to the memory capability

Task size / instance size vs. cost
The extra-large instance generated the best and most economical throughput
Fully utilizes the resources

[Architecture diagram: a Web Role hosts the web portal and web service; a Job Management Role runs job registration, the job scheduler, and the scaling engine, recording jobs in a Job Registry (Azure Table) and dispatching work through a global dispatch queue; Worker instances execute the splitting task, BLAST tasks, and merging task; a database-updating role refreshes the NCBI databases; BLAST databases, temporary data, etc. are kept in Azure Blob storage.]

ASP.NET program hosted by a web role instance
Submit jobs
Track job status and logs

Authentication/authorization based on Live ID
An accepted job is stored into the job registry table
Fault tolerance: avoid in-memory state

Job Portal

[Diagram detail: the web portal and web service front the job registration, job scheduler, scaling engine, and job registry components.]

R. palustris as a platform for H2 production


Eric Shadt, SAGE

Sam Phattarasukol Harwood Lab, UW

Blasted ~5,000 proteins (700K sequences)


Against all NCBI non-redundant proteins: completed in 30 min
Against ~5,000 proteins from another strain: completed in less than 30 sec

AzureBLAST significantly saved computing time

Discovering Homologs
Discover the interrelationships of known protein sequences

All against All query


The database is also the input query
The protein database is large (4.2 GB size)
Totally 9,865,668 sequences to be queried

Theoretically, 100 billion sequence comparisons!

Performance estimation
Based on the sampling-running on one extra-large Azure instance
Would require 3,216,731 minutes (6.1 years) on one desktop

One of the biggest BLAST jobs as far as we know

This scale of experiment is usually infeasible for most scientists

Allocated a total of ~4000 instances
475 extra-large VMs (8 cores per VM) across four datacenters: US (2), West Europe, and North Europe
8 deployments of AzureBLAST; each deployment has its own co-located storage service
Divide the 10 million sequences into multiple segments; each segment is submitted to one deployment as one job for execution
Each segment consists of smaller partitions
When the load is imbalanced, redistribute it manually


Total size of the output result is ~230GB
The total number of hits is 1,764,579,487

Started on March 25th; the last task completed on April 8th (10 days of compute)
But based on our estimates, the real working instance time should be 6-8 days
Look into the log data to analyze what took place


A normal log record should be:

3/31/2010 6:14  RD00155D3611B0  Executing the task 251523...
3/31/2010 6:25  RD00155D3611B0  Execution of task 251523 is done, it took 10.9 mins
3/31/2010 6:25  RD00155D3611B0  Executing the task 251553...
3/31/2010 6:44  RD00155D3611B0  Execution of task 251553 is done, it took 19.3 mins
3/31/2010 6:44  RD00155D3611B0  Executing the task 251600...
3/31/2010 7:02  RD00155D3611B0  Execution of task 251600 is done, it took 17.27 mins

Otherwise, something is wrong (e.g., the task failed to complete):

3/31/2010 8:22   RD00155D3611B0  Executing the task 251774...
3/31/2010 9:50   RD00155D3611B0  Executing the task 251895...
3/31/2010 11:12  RD00155D3611B0  Execution of task 251895 is done, it took 82 mins

North Europe Data Center: 34,256 tasks processed in total
All 62 compute nodes lost tasks and then came back in a group
This is an update domain (~6 nodes in one group, ~30 mins)

West Europe Data Center: 30,976 tasks were completed before the job was killed
35 nodes experienced blob writing failures at the same time
A reasonable guess: the fault domain at work

MODISAzure:
Computing Evapotranspiration (ET) in the Cloud

"You never miss the water till the well has run dry" - Irish proverb

Evapotranspiration (ET) is the release of water to the atmosphere by evaporation from open water bodies and transpiration, or evaporation through plant membranes, by plants.

Penman-Monteith (1964):

ET = (Δ·Rn + ρa·cp·δq·ga) / ((Δ + γ·(1 + ga/gs)) · λv)

ET = water volume evapotranspired (m^3 s^-1 m^-2)
Δ = rate of change of saturation specific humidity with air temperature (Pa K^-1)
λv = latent heat of vaporization (J/g)
Rn = net radiation (W m^-2)
cp = specific heat capacity of air (J kg^-1 K^-1)
ρa = dry air density (kg m^-3)
δq = vapor pressure deficit (Pa)
ga = conductivity of air (inverse of ra) (m s^-1)
gs = conductivity of plant stoma, air (inverse of rs) (m s^-1)
γ = psychrometric constant (≈ 66 Pa K^-1)

Lots of inputs: big data reduction
Some of the inputs are not so simple
Estimating resistance/conductivity across a catchment can be tricky

Input datasets:
FLUXNET curated sensor dataset: 30GB, 960 files
FLUXNET curated field dataset: 2 KB, 1 file
Climate classification: ~1MB, 1 file
Vegetative clumping: ~5MB, 1 file
NCEP/NCAR: ~100MB, 4K files
NASA MODIS imagery source archives: 5 TB, 600K files
20 US years = 1 global year

Data collection (map) stage
Downloads requested input tiles from NASA ftp sites
Includes geospatial lookup for non-sinusoidal tiles that will contribute to a reprojected sinusoidal tile

Reprojection (map) stage
Converts source tile(s) to intermediate result sinusoidal tiles
Simple nearest neighbor or spline algorithms

Derivation reduction stage
First stage visible to the scientist
Computes ET in our initial use

Analysis reduction stage
Optional second stage visible to the scientist
Enables production of science analysis artifacts such as maps, tables, virtual sensors

[Pipeline diagram: scientists submit requests through the AzureMODIS service web role portal; a request queue feeds the data collection stage, which pulls source imagery from download sites via a download queue and records source metadata; a reprojection queue feeds the reprojection stage; reduction queues #1 and #2 feed the derivation and analysis reduction stages; science results are then available for download.]

http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx

[Diagram: a <PipelineStage> request goes to the MODISAzure Service (Web Role) and is persisted onto a <PipelineStage> Job Queue; the Service Monitor (Worker Role) parses and persists <PipelineStage>JobStatus and <PipelineStage>TaskStatus entries and dispatches work onto a <PipelineStage> Task Queue consumed by GenericWorker (Worker Role) instances reading <Input> Data Storage.]

MODISAzure Service is the Web Role front door
Receives all user requests
Queues each request to the appropriate Download, Reprojection, or Reduction job queue

Service Monitor is a dedicated Worker Role
Parses all job requests into tasks - recoverable units of work
Execution status of all jobs and tasks is persisted in Tables

All work is actually done by a Worker Role
Dequeues tasks created by the Service Monitor
Retries failed tasks 3 times
Maintains all task status
Sandboxes the science or other executable
Marshals all storage from/to Azure blob storage to/from local Azure Worker instance files

Reprojection Request

[Diagram: a reprojection request enters the job queue; the Service Monitor (Worker Role) parses and persists it, then dispatches tasks to the task queue for GenericWorker (Worker Role) instances, which read swath source data storage and write reprojection data storage.]

ReprojectionJobStatus: each entity specifies a single reprojection job request
ReprojectionTaskStatus: each entity specifies a single reprojection task (i.e. a single tile) and points to a ScanTimeList
ScanTimeList: query this table to get the list of satellite scan times that cover a target tile
SwathGranuleMeta: query this table to get geo-metadata (e.g. boundaries) for each swath tile

Computational costs are driven by data scale and the need to run the reduction multiple times
Storage costs are driven by data scale and the 6-month project duration
Small with respect to the people costs, even at graduate student rates!

[The pipeline diagram repeated, annotated with per-stage data volumes and costs:]

Data Collection Stage: 400-500 GB, 60K files, 10 MB/sec, 11 hours, <10 workers; $50 upload, $450 storage
Reprojection Stage: 400 GB, 45K files, 3500 CPU hours, 20-100 workers; $420 CPU, $60 download
Derivation Reduction Stage: 5-7 GB, 5.5K files, 1800 CPU hours, 20-100 workers; $216 CPU, $1 download, $6 storage
Analysis Reduction Stage: <10 GB, ~1K files, 1800 CPU hours, 20-100 workers; $216 CPU, $2 download, $9 storage

Total: $1420

Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems.
Equally important, they can increase participation in research, providing needed resources to users/communities without ready access.
Clouds are suitable for loosely coupled, data-parallel applications and can support many interesting programming patterns, but tightly coupled, low-latency applications do not perform optimally on clouds today.
They provide valuable fault tolerance and scalability abstractions.
Clouds act as an amplifier for familiar client tools and on-premises compute.
Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers.
