DEUTSCH-FRANZÖSISCHE SOMMERUNIVERSITÄT FÜR NACHWUCHSWISSENSCHAFTLER 2011
(German-French Summer University for Young Researchers 2011)
CLOUD COMPUTING: CHALLENGES AND OPPORTUNITIES
[Pie charts: attendees' favorite ice cream flavors. Vanilla (23% / 33%) and Chocolate (13% / 11%) lead; Strawberry and Stracciatella reach 10%; Pistachio and Butter Pecan 7%; Neapolitan and Chocolate Chip 4%; Walnut, Cinnamon, Tiramisu, Cherry, Malaga, Amarena, Mango, and Banana Coffee 3% each; Other 29%.]
[Diagram: mapping an on-premises deployment to Windows Azure. Machines running IIS / ASP.NET become Web Role instances; machines running Windows services become Worker Role instances; machines running SQL Server map to Azure Storage (Blob / Queue / Table) and SQL Azure.]
Key Components
- Fabric Controller
- Compute: Web Roles, Worker Roles, VM Roles
- Storage: Blobs (binary objects), Tables (entity storage), Queues (role coordination)
- SQL Azure
Key Components
Fabric Controller
Its job is to provision, deploy, monitor, and maintain applications in the data centers.
Applications have a shape and a configuration:
- Role types
- Role VM sizes
- External and internal endpoints
- Local storage
Key Components
Fabric Controller
State transitions: the controller compares the Current State against the Goal State, and does what is needed to reach and maintain the goal state.
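The goal-state idea can be sketched as a reconciliation loop. This is a minimal illustration, assuming the controller tracks per-role instance counts; the names are ours, not the fabric controller's real API.

```python
def reconcile(current, goal):
    """Return the actions needed to move `current` toward `goal`.

    Both arguments map role name -> number of healthy instances.
    """
    actions = []
    for role, wanted in sorted(goal.items()):
        have = current.get(role, 0)
        if have < wanted:
            actions.append(("start", role, wanted - have))  # provision more instances
        elif have > wanted:
            actions.append(("stop", role, have - wanted))   # scale down
    return actions
```

Running this whenever the observed state changes (e.g. after a node failure) is what "maintain the goal state" means: the failed instance shows up as a deficit and the controller starts a replacement.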
- Utility compute on Windows Server 2008, typically for background processing.
- Each role can define an amount of local storage: protected space on the local drive, considered volatile storage.
- May communicate with outside services: Azure Storage, SQL Azure, other Web services.
- Can expose external and internal endpoints.
Customized Role: you own the box.
How it works:
1. Download the Guest OS to Server 2008 Hyper-V.
2. Customize the OS as you need to.
3. Upload the differences VHD.
Azure then runs your VM role using the base OS plus your differences VHD.
Application Hosting
Deployment is driven by the Service Definition and Service Configuration (GUI: double-click the role name in the Azure project).
The fabric controller then allocates resources and prepares nodes; on each node it places the role images, configures settings, and starts the roles.
Storage
- Drives: use standard file system APIs.
- Tables: non-relational, but with few scale limits; use SQL Azure for relational data.
- Queues: facilitate loosely coupled, reliable systems.
Blob operations:
- GetBlob: get the whole blob or a specific byte range.
- DeleteBlob
- CopyBlob
- SnapshotBlob
- LeaseBlob
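These operations are plain REST calls against the blob endpoint. As a sketch (not a complete request: real calls also need a signed Authorization header), a GetBlob request for a byte range can be assembled like this:

```python
def get_blob_request(account, container, blob, byte_range=None):
    """Return (url, headers) for a GetBlob call; `byte_range` is an
    optional (start, end) pair selecting part of the blob."""
    url = "https://%s.blob.core.windows.net/%s/%s" % (account, container, blob)
    headers = {"x-ms-version": "2009-09-19"}  # storage API version of this era
    if byte_range is not None:
        start, end = byte_range
        headers["Range"] = "bytes=%d-%d" % (start, end)  # fetch only this slice
    return url, headers
```

The range form is what "get a specific range" means above: the service returns only the requested bytes, which matters for large blobs.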
Storage Partitioning
Understanding partitioning is key to understanding performance.
Every data object has a partition key; an overloaded partition answers "Server Busy."

Container Name | Blob Name
image          | annarbor/bighouse.jpg
image          | foxborough/gillette.jpg
video          | annarbor/bighouse.jpg
PartitionKey (CustomerId) | RowKey (RowKind) | Name         | CreditCardNumber    | OrderTotal
John Smith                | Customer         | John Smith   | xxxx-xxxx-xxxx-xxxx |
John Smith                | Order 1          |              |                     | $35.12
Bill Johnson              | Customer         | Bill Johnson | xxxx-xxxx-xxxx-xxxx |
Bill Johnson              | Order 3          |              |                     | $10.00
Queue    | Message
jobs     | Message1
jobs     | Message2
workflow | Message1
Scalability Targets (per Storage Account)
- Capacity: up to 100 TB
- Transactions: up to a few thousand requests per second
- Bandwidth: up to a few hundred megabytes per second
To go above these numbers, partition between multiple storage accounts and partitions.
When a limit is hit, the application will see "503 Server Busy"; applications should implement exponential backoff.
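Exponential backoff can be sketched as follows. This is an illustration of the retry pattern, not the storage SDK's built-in retry policy; `ServerBusy` stands in for the 503 response, and jitter is added so that many clients do not retry in lockstep.

```python
import random
import time

class ServerBusy(Exception):
    """Stand-in for the '503 Server Busy' response."""

def call_with_backoff(op, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Retry `op`, doubling the wait after each ServerBusy failure."""
    for attempt in range(max_attempts):
        try:
            return op()
        except ServerBusy:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            # base_delay * 2^attempt, scaled by random jitter in [1, 2)
            sleep(base_delay * (2 ** attempt) * (1 + random.random()))
```

Backing off exponentially gives an overloaded partition time to be load-balanced away instead of hammering it with immediate retries.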
PartitionKey (Category) | RowKey (Title)           | Timestamp | ReleaseDate
Action                  | The Bourne Ultimatum     | …         | 2007
Animation               | Open Season 2            | …         | 2009
Animation               | The Ant Bully            | …         | 2006
Comedy                  | Office Space             | …         | 1999
SciFi                   | X-Men Origins: Wolverine | …         | 2009
War                     | Defiance                 | …         | 2008
See http://www.microsoftpdc.com/2009/SVC09 and http://azurescope.cloudapp.net for more information.
Continuation tokens are returned:
- at a maximum of 1000 rows in a response;
- at the end of a partition range boundary.
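Handling continuation tokens always means looping until the service stops returning one. A sketch, where `fetch_page(token)` stands in for one query round-trip and returns `(rows, next_token)` with `next_token` being `None` on the last page:

```python
def fetch_all(fetch_page):
    """Drain a paged query by following continuation tokens."""
    rows, token = [], None
    while True:
        page, token = fetch_page(token)
        rows.extend(page)   # each page holds at most 1000 rows
        if token is None:   # no token: the result set is complete
            return rows
```

Note that a token can arrive even for small result sets (at a partition range boundary), which is why the loop must not assume "fewer than 1000 rows means done".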
Tables Recap
- Select a PartitionKey and RowKey that help scale; avoid append-only patterns.
- Always handle continuation tokens.
- OR predicates are not optimized.
- Implement a back-off strategy for retries: "Server Busy" means the load on a single partition has exceeded the limits.
- Load balance partitions to meet traffic needs.
Queues
Their Unique Role in Building Reliable, Scalable Applications
- You want roles that work closely together, but are not bound together; tight coupling leads to brittleness.
- This decoupling can aid in scaling and performance.
Queue Terminology: Message Lifecycle
[Diagram: a worker role calls PutMessage to enqueue Msg 1; a consumer calls GetMessage, which hides the message for a timeout; RemoveMessage deletes it once processed. Sample response: HTTP/1.1 200 OK, Transfer-Encoding: chunked, Content-Type: application/xml, Date: Tue, 09 Dec 2008 21:04:30 GMT, Server: Nephos Queue Service Version 1.0 Microsoft-HTTPAPI/2.0.]
Producer/consumer walkthrough (producers P1, P2; consumers C1, C2):
1. C1 calls GetMessage(Q, 30 s) and receives msg 1.
2. C2 calls GetMessage(Q, 30 s) and receives msg 2.
3. C2 consumes msg 2.
4. C2 calls DeleteMessage(Q, msg 2).
5. C1 crashes.
6. msg 1 becomes visible again 30 s after its dequeue.
7. C2 calls GetMessage(Q, 30 s) and receives msg 1.
8. C2 crashes.
9. msg 1 becomes visible again 30 s after its dequeue.
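The walkthrough above can be replayed with a toy queue that models the visibility timeout using a logical clock. This is a sketch of the semantics only, not the real service API:

```python
class ToyQueue:
    def __init__(self):
        self.now = 0
        self.msgs = []  # entries are [visible_at, msg_id, body]

    def put(self, body):
        self.msgs.append([0, len(self.msgs) + 1, body])

    def get(self, timeout):
        for m in self.msgs:
            if m[0] <= self.now:
                m[0] = self.now + timeout  # hide until the timeout expires
                return m[1], m[2]
        return None  # nothing currently visible

    def delete(self, msg_id):
        self.msgs = [m for m in self.msgs if m[1] != msg_id]

    def tick(self, seconds):
        self.now += seconds
```

If a consumer crashes after `get` but before `delete`, the message simply reappears when the clock passes `visible_at`, which is exactly steps 5-7 of the walkthrough.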
Queues Recap
- Make message processing idempotent: then redelivery after a failure is harmless.
- Invisible messages result in out-of-order delivery.
- Enforce a threshold on a message's dequeue count.
- Messages > 8 KB: keep the payload elsewhere (e.g. in a blob) and garbage collect orphaned blobs; batch messages where possible.
- Dynamically increase/reduce workers.
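The dequeue-count threshold can be sketched as follows: a message whose handler keeps failing is set aside after `max_dequeue` attempts instead of circulating forever and poisoning the queue. The names here are illustrative, not a library API.

```python
def drain(messages, handler, max_dequeue=3):
    """`messages` is a list of (msg_id, body). Returns (done, poison)."""
    pending = list(messages)
    counts, done, poison = {}, [], []
    while pending:
        msg_id, body = pending.pop(0)
        counts[msg_id] = counts.get(msg_id, 0) + 1
        if counts[msg_id] > max_dequeue:
            poison.append(msg_id)           # give up: dead-letter the message
            continue
        try:
            handler(body)
            done.append(msg_id)             # success: "delete" the message
        except Exception:
            pending.append((msg_id, body))  # failure: message reappears
    return done, poison
```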
http://blogs.msdn.com/windowsazurestorage/
http://azurescope.cloudapp.net
A Quick Exercise
Code - AccountInformation.cs
public class AccountInformation
{
    // The account name is a placeholder; the key is deliberately fake.
    private static string accountName = "YOURACCOUNTNAME";
    private static string storageKey = "tHiSiSnOtMyKeY";
    private static StorageCredentialsAccountAndKey credentials;

    public static StorageCredentialsAccountAndKey Credentials
    {
        get
        {
            if (credentials == null)
                credentials = new StorageCredentialsAccountAndKey(accountName, storageKey);
            return credentials;
        }
    }
}
Code - BlobHelper.cs
public class BlobHelper
{
private static string defaultContainerName = "school";
private CloudBlobClient client = null;
private CloudBlobContainer container = null;
private void InitContainer()
{
if (client == null)
client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudBlobClient();
container = client.GetContainerReference(defaultContainerName);
container.CreateIfNotExist();
BlobContainerPermissions permissions = container.GetPermissions();
permissions.PublicAccess = BlobContainerPublicAccessType.Container;
container.SetPermissions(permissions);
}
Code - BlobHelper.cs
public void WriteFileToBlob(string filePath)
{
    if (client == null || container == null)
        InitContainer();
    FileInfo file = new FileInfo(filePath);
    // Upload the file's contents under its file name
    CloudBlob blob = container.GetBlobReference(file.Name);
    blob.UploadFile(filePath);
}
Code - BlobHelper.cs
public string GetBlobText(string blobName)
{
if (client == null || container == null)
InitContainer();
CloudBlob blob = container.GetBlobReference(blobName);
try
{
return blob.DownloadText();
}
catch (Exception)
{
// The blob probably does not exist or there is no connection available
return null;
}
}
buff.AppendLine("LastName,FirstName,Email,Birthday,NativeLanguage,FavoriteIceCream,YearsInPhD,Graduated");
foreach (AttendeeEntity attendee in attendees)
{
buff.AppendLine(attendee.ToCsvString());
}
blobHelper.WriteStringToBlob("SummerSchoolAttendees.txt", buff.ToString());
}
Code - TableEntities
using Microsoft.WindowsAzure.StorageClient;

public class AttendeeEntity : TableServiceEntity
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
    public string Email { get; set; }
    public DateTime Birthday { get; set; }
    public string NativeLanguage { get; set; }   // matches the CSV header above
    public string FavoriteIceCream { get; set; }
    public int YearsInPhD { get; set; }
    public bool Graduated { get; set; }

    public void UpdateFrom(AttendeeEntity other)
    {
        FirstName = other.FirstName;
        LastName = other.LastName;
        Email = other.Email;
        Birthday = other.Birthday;
        NativeLanguage = other.NativeLanguage;
        FavoriteIceCream = other.FavoriteIceCream;
        YearsInPhD = other.YearsInPhD;
        Graduated = other.Graduated;
        UpdateKeys();
    }

    public void UpdateKeys()
    {
        PartitionKey = "SummerSchool";
        RowKey = Email;
    }
}
Code - TableHelper.cs
public class TableHelper
{
    private CloudTableClient client = null;
    private TableServiceContext context = null;
    private Dictionary<string, AttendeeEntity> allAttendees = null;
    private string tableName = "Attendees";

    private CloudTableClient Client
    {
        get
        {
            if (client == null)
                client = new CloudStorageAccount(AccountInformation.Credentials, false).CreateCloudTableClient();
            return client;
        }
    }

    private TableServiceContext Context
    {
        get
        {
            if (context == null)
                context = Client.GetDataServiceContext();
            return context;
        }
    }
}
Code - TableHelper.cs
private void ReadAllAttendees()
{
allAttendees = new Dictionary<string, AttendeeEntity>();
CloudTableQuery<AttendeeEntity> query =
Context.CreateQuery<AttendeeEntity>(tableName).AsTableServiceQuery();
try
{
foreach (AttendeeEntity attendee in query)
{
allAttendees[attendee.Email] = attendee;
}
}
catch (Exception)
{
// No entries in table - or other exception
}
}
Code - TableHelper.cs
public void DeleteAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (!allAttendees.ContainsKey(email))
        return;
    AttendeeEntity attendee = allAttendees[email];
    // Delete from the cloud table
    Context.DeleteObject(attendee);
    Context.SaveChanges();
    // Keep the local cache consistent
    allAttendees.Remove(email);
}
Code - TableHelper.cs
public AttendeeEntity GetAttendee(string email)
{
    if (allAttendees == null)
        ReadAllAttendees();
    if (allAttendees.ContainsKey(email))
        return allAttendees[email];
    return null;
}
Remember that this only works for tables (or queries on tables) that easily fit in memory; this is one of many design patterns for working with tables.
That's it! Now your tables are accessible using REST service calls or any cloud storage tool.
Tools - Fiddler2
Best Practices
- If you scale better than linearly across cores, larger VMs could save you money; it is pretty rare to see linear scaling across 8 cores.
- More instances may provide better uptime and reliability (more failures are needed to take your service down).
- The only real right answer: experiment with multiple sizes and instance counts in order to measure and find what is ideal for you.
- A common mistake: splitting code into multiple roles, each not using up its CPU. Balance using up CPU against having free capacity in times of need.
There are multiple ways to use your CPU to the fullest.
Exploiting Concurrency
- Spin up additional processes, each with a specific task or as a unit of concurrency; this may not be ideal if the number of active processes exceeds the number of cores.
- Use multithreading aggressively. In networking code, correct usage of NT I/O Completion Ports will let the kernel schedule the precise number of threads.
- In .NET 4, use the Task Parallel Library: data parallelism and task parallelism.
[Diagram: workloads classified as CPU-intensive, network-I/O-intensive, or storage-I/O-intensive.]
Scaling Appropriately
- Monitor your application and make sure you're scaled appropriately (not over-scaled).
- Spinning VMs up and down automatically is good at large scale, but remember that VMs take a few minutes to come up and cost roughly $3 a day (give or take) to keep running.
- Being too aggressive in spinning down VMs can result in poor user experience: there is a trade-off between the risk of failure or poor user experience from having no excess capacity, and the cost of idling VMs (performance vs. cost).
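The monitoring trade-off can be made concrete with a toy threshold rule. The thresholds, floor, and ceiling below are illustrative assumptions, not recommendations:

```python
def target_instances(avg_cpu, current, floor=2, ceiling=20):
    """Scale up when average CPU is high, down when it is low.

    The floor keeps at least two instances running for uptime;
    the ceiling caps cost when load spikes.
    """
    if avg_cpu > 0.75:
        return min(current + 1, ceiling)
    if avg_cpu < 0.25:
        return max(current - 1, floor)
    return current
```

Because VMs take minutes to come up, a real policy would also dampen oscillation, e.g. by requiring several consecutive high-CPU samples before scaling.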
Storage Costs
- Understand an application's storage profile and how storage billing works.
- Make service choices based on your app profile: e.g. SQL Azure has a flat fee while Windows Azure Tables charges per transaction, so the service choice can make a big cost difference depending on your profile.
Compressing Content
1. Gzip all output content.
2. Minify JavaScript, CSS, and images.
[Diagram: uncompressed content passes through gzip and minification to become compressed content.]
Measure everything
http://research.microsoft.com/en-us/projects/azure/daytona.aspx
Computationally intensive: the BLAST task.
Task granularity:
- Too large a partition causes load imbalance.
- Too small a partition causes unnecessary overheads: NCBI-BLAST startup overhead and data-transfer overhead.
- Best practice: do test runs to profile, and set the partition size to mitigate the overhead.
[Diagram: a splitting task fans BLAST tasks out to workers; a merging task combines their results.]
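The granularity trade-off can be made concrete with a toy cost model: total time is the per-query work plus a fixed per-task overhead (BLAST startup, data transfer). The constants are made up; the profiling runs suggested above would supply real ones.

```python
import math

def split(queries, size):
    """Partition the query list into BLAST sub-tasks of `size` queries."""
    return [queries[i:i + size] for i in range(0, len(queries), size)]

def total_overhead(n_queries, size, per_task_overhead):
    """Fixed overhead paid once per sub-task: more, smaller tasks cost more."""
    return math.ceil(n_queries / size) * per_task_overhead
```

Halving the partition size doubles the number of tasks and hence the fixed overhead, while oversized partitions leave some workers idle at the end; the profiled sweet spot sits between the two.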
[Diagram: AzureBLAST architecture. A web role hosts the web portal and web service; job registration goes to the Job Scheduler, which feeds a global dispatch queue consumed by worker roles. The job registry and NCBI database metadata are kept in Azure Tables; BLAST databases, temporary data, etc. are kept in Azure Blobs; a database-updating role refreshes the NCBI databases; a scaling engine adjusts the number of workers.]
- Authentication/authorization is based on Live ID.
- The accepted job is stored into the job registry table: fault tolerance, avoiding in-memory state.
Case study: Discovering Homologs, i.e. discovering the interrelationships of known protein sequences.
- Performance estimation, based on sample runs on one extra-large Azure instance: the job would require 3,216,731 minutes (6.1 years) on one desktop.
- 8 deployments of AzureBLAST: 475 extra-large VMs (8 cores per VM) across four datacenters: US (2), Western and North Europe.
- Started on March 25th; the last task completed on April 8th (10 days of compute). But based on our estimates, the real working instance time should be 6-8 days, so we looked into the log data to analyze what took place.
[Charts: per-deployment instance counts over time. Log analysis of instance RD00155D3611B0 shows repeated gaps in activity, e.g. from 3/31/2010 9:50 to 3/31/2010 11:12 (~30 mins each). In the West Europe datacenter, 30,976 tasks were completed before the job was killed.]
MODISAzure: Computing Evapotranspiration (ET) in the Cloud
"You never miss the water till the well has run dry." (Irish proverb)
Data sources:
- FLUXNET curated sensor dataset: 30 GB, 960 files
- Climate classification: ~1 MB, 1 file
- Vegetative clumping: ~5 MB, 1 file
- NCEP/NCAR: ~100 MB, 4K files
- NASA MODIS imagery source archives: 5 TB, 600K files
- FLUXNET curated field dataset: 2 KB, 1 file
20 US years = 1 global year.
[Diagram: scientists use the AzureMODIS service web role portal; requests enter a request queue; a download stage (download queue, source metadata) feeds the reprojection stage (reprojection queue), followed by reduction #1 and reduction #2 queues; science results are available for download as scientific results.]
http://research.microsoft.com/en-us/projects/azure/azuremodis.aspx
[Diagram: generic pipeline-stage pattern. The MODISAzure service (web role) persists a <PipelineStage>Request to the <PipelineStage> job queue; a Service Monitor (worker role) records <PipelineStage>JobStatus and <PipelineStage>TaskStatus and dispatches work onto the <PipelineStage> task queue; GenericWorker (worker role) instances consume the tasks and read <Input>Data Storage. Instantiated for reprojection: a Reprojection Request enters the job queue; the Service Monitor persists ReprojectionJobStatus and ReprojectionTaskStatus; GenericWorkers consume the task queue, with ScanTimeList and SwathGranuleMeta pointing into the Swath Source Data Storage and output going to the Reprojection Data Storage.]
- Computational costs are driven by the data scale and the need to run the reduction multiple times.
- Storage costs are driven by the data scale and the 6-month project duration.
- Both are small with respect to the people costs, even at graduate-student rates!
Reprojection stage example: 400 GB in 45K files; 3500 CPU-hours ($420); $60 of downloads; 20-100 workers. Total: $1420.
- Clouds are the largest-scale computer centers ever constructed and have the potential to be important to both large- and small-scale science problems.
- Equally important, they can increase participation in research, providing needed resources to users and communities without ready access.
- Clouds are suitable for loosely coupled data-parallel applications and can support many interesting programming patterns, but tightly coupled, low-latency applications do not perform optimally on clouds today.
- They provide valuable fault-tolerance and scalability abstractions.
- Clouds act as an amplifier for familiar client tools and on-premises compute.
- Cloud services to support research provide considerable leverage for both individual researchers and entire communities of researchers.