
Table of contents

Introduction..........................................................................................................................................11
Route 53..........................................................................................................................................11
Simple Email Service.....................................................................................................................12
Identity and Access Management...................................................................................................12
Simple Storage Service...................................................................................................................12
Elastic Compute Cloud...................................................................................................................13
Elastic Block Store.........................................................................................................................13
CloudWatch....................................................................................................................................13
AWS components................................................................................................................................14
Amazon cluster....................................................................................................................................14
Storage.................................................................................................................................................15
Databases.............................................................................................................................................16
Administration and security................................................................................................................16
Networking..........................................................................................................................................16
Analytics..............................................................................................................................................16
Application services............................................................................................................................16
Deployment and management.............................................................................................................17
Mobile services....................................................................................................................................17
How does AWS work?.........................................................................................................................17
Who uses AWS?..................................................................................................................................17
Why use AWS?....................................................................................................................................18
Scalability and adaptability of AWS....................................................................................................18
AWS’s security and reliability.............................................................................................................18
Analysis...............................................................................................................................................26
Azure Databricks.................................................................................................................................26
Azure Stream Analytics.......................................................................................................................26
SQL Data Warehouse..........................................................................................................................26
HDInsight............................................................................................................................................26
Data Factory........................................................................................................................................26
Data Lake Analytics............................................................................................................................26
Event Hubs..........................................................................................................................................26
Power BI Embedded............................................................................................................................26
Azure Analysis Services......................................................................................................................26
R Server for HDInsight.......................................................................................................................27
Data Catalog........................................................................................................................................27
Azure Data Lake Storage....................................................................................................................27
Azure Data Explorer............................................................................................................................27
Databases.............................................................................................................................................27
SQL Server on virtual machines..........................................................................................................27
Azure SQL Database...........................................................................................................................27
Azure Cosmos DB...............................................................................................................................27
SQL Data Warehouse..........................................................................................................................27
Data Factory........................................................................................................................................27
Azure Cache for Redis.........................................................................................................................27
SQL Server Stretch Database..............................................................................................................28
Table Storage.......................................................................................................................................28
Azure Database for PostgreSQL.........................................................................................................28
Azure Database for MariaDB..............................................................................................................28
Azure Database for MySQL................................................................................................................28
Azure Database Migration Service......................................................................................................28
Compute...............................................................................................................................................28
Virtual Machines..................................................................................................................................28
Virtual Machine Scale Sets..................................................................................................................28
Azure Kubernetes Service (AKS).......................................................................................................28
Azure Functions..................................................................................................................................28
Service Fabric......................................................................................................................................29
App Service.........................................................................................................................................29
Container Instances.............................................................................................................................29
Batch....................................................................................................................................................29
Azure Batch AI....................................................................................................................................29
SQL Server on virtual machines..........................................................................................................29
Cloud Services.....................................................................................................................................29
SAP HANA on Azure Large Instances................................................................................................29
Web Apps.............................................................................................................................................29
Mobile Apps.........................................................................................................................................29
API Apps..............................................................................................................................................29
Linux virtual machines........................................................................................................................30
Azure CycleCloud...............................................................................................................................30
Containers.............................................................................................................................................30
Azure Kubernetes Service (AKS).......................................................................................................30
Azure Functions..................................................................................................................................30
Service Fabric......................................................................................................................................30
App Service.........................................................................................................................................30
Container Instances.............................................................................................................................30
Container Registry...............................................................................................................................30
Web Apps.............................................................................................................................................30
Mobile Apps.........................................................................................................................................30
API Apps..............................................................................................................................................31
Web App for Containers......................................................................................................................31
DevOps................................................................................................................................................31
Azure DevOps.....................................................................................................................................31
Azure Pipelines....................................................................................................................................31
Azure Boards.......................................................................................................................................31
Azure Repos.........................................................................................................................................31
Azure Artifacts....................................................................................................................................31
Azure Test Plans..................................................................................................................................31
Azure DevTest Labs.............................................................................................................................31
DevOps tool integrations.....................................................................................................................31
Media...................................................................................................................................................32
Content Delivery Network..................................................................................................................32
Media Services....................................................................................................................................32
Encoding..............................................................................................................................................32
Live and On-Demand Streaming........................................................................................................32
Azure Media Player.............................................................................................................................32
Content Protection...............................................................................................................................32
Media Analytics...................................................................................................................................32
Video Indexer.......................................................................................................................................32
Management and Governance.............................................................................................................32
Azure Backup.......................................................................................................................................32
Azure Site Recovery............................................................................................................................32
Azure Advisor......................................................................................................................................33
Scheduler.............................................................................................................................................33
Automation..........................................................................................................................................33
Traffic Manager...................................................................................................................................33
Azure Monitor.....................................................................................................................................33
Network Watcher.................................................................................................................................33
Azure Service Health...........................................................................................................................33
Microsoft Azure Portal........................................................................................................................33
Azure Resource Manager....................................................................................................................33
Cloud Shell..........................................................................................................................................33
Azure Mobile App................................................................................................................................33
Azure Policy........................................................................................................................................34
Cost Management................................................................................................................................34
Azure Managed Applications..............................................................................................................34
Azure Migrate......................................................................................................................................34
Azure Blueprints..................................................................................................................................34
AI + Machine Learning.......................................................................................................................34
Azure Batch AI....................................................................................................................................34
Azure Bot Service................................................................................................................................34
Azure Databricks.................................................................................................................................34
Azure Search........................................................................................................................................34
Bing Autosuggest.................................................................................................................................34
Bing Custom Search............................................................................................................................35
Bing Entity Search...............................................................................................................................35
Bing Image Search..............................................................................................................................35
Bing News Search................................................................................................................................35
Bing Spell Check.................................................................................................................................35
Bing Video Search...............................................................................................................................35
Bing Visual Search...............................................................................................................................35
Bing Web Search..................................................................................................................................35
Cognitive Services................................................................................................................................35
Computer Vision..................................................................................................................................35
Content Moderator..............................................................................................................................36
Custom Speech.....................................................................................................................................36
Custom Vision......................................................................................................................................36
Data Science Virtual Machines...........................................................................................................36
Emotion...............................................................................................................................................36
Face.....................................................................................................................................................36
Azure Machine Learning Service........................................................................................................36
Machine Learning Studio....................................................................................................................36
Microsoft Genomics............................................................................................................................36
Translator Speech (conversation translation)......................................................................................36
Language Understanding.....................................................................................................................36
Linguistic Analysis...............................................................................................................................37
QnA Maker..........................................................................................................................................37
Speaker Recognition............................................................................................................................37
Speech Translation...............................................................................................................................37
Speech Recognition.............................................................................................................................37
Text Analytics......................................................................................................................................37
Text to Speech......................................................................................................................................37
Translator Text.....................................................................................................................................37
Video Indexer.......................................................................................................................................37
Identity.................................................................................................................................................37
Azure Active Directory.......................................................................................................................37
Azure Information Protection..............................................................................................................38
Azure Active Directory Domain Services...........................................................................................38
Azure Active Directory B2C...............................................................................................................38
Integration...........................................................................................................................................38
Event Grid...........................................................................................................................................38
Logic Apps..........................................................................................................................................38
API Management.................................................................................................................................38
Service Bus..........................................................................................................................................38
Internet of Things................................................................................................................................38
Azure Functions..................................................................................................................................38
Azure IoT Hub.....................................................................................................................................38
Azure IoT Edge...................................................................................................................................39
Azure IoT Central................................................................................................................................39
Azure IoT Solution Accelerators.........................................................................................................39
Azure Sphere.......................................................................................................................................39
Azure Time Series Insights..................................................................................................................39
Azure Maps.........................................................................................................................................39
Event Grid...........................................................................................................................................39
Windows 10 IoT Core Services...........................................................................................................39
Azure Machine Learning Service........................................................................................................39
Machine Learning Studio....................................................................................................................39
Azure Stream Analytics.......................................................................................................................40
Logic Apps..........................................................................................................................................40
Notification Hubs.................................................................................................................................40
Azure Cosmos DB...............................................................................................................................40
API Management.................................................................................................................................40
Azure Digital Twins............................................................................................................................40
Migration.............................................................................................................................................40
Azure Site Recovery............................................................................................................................40
Cost Management................................................................................................................................40
Azure Database Migration Service......................................................................................................40
Azure Migrate......................................................................................................................................40
Data Box..............................................................................................................................................41
Networking..........................................................................................................................................41
Content Delivery Network...................................................................................................................41
ExpressRoute.......................................................................................................................................41
Azure DNS..........................................................................................................................................41
Virtual Network...................................................................................................................................41
Traffic Manager...................................................................................................................................41
Load Balancer......................................................................................................................................41
VPN gateway.......................................................................................................................................41
Application Gateway............................................................................................................................41
Azure DDoS Protection.......................................................................................................................41
Network Watcher.................................................................................................................................42
Azure Firewall.....................................................................................................................................42
Virtual WAN........................................................................................................................................42
Azure Front Door Service...................................................................................................................42
Mobile.................................................................................................................................................42
App Service.........................................................................................................................................42
Azure Maps.........................................................................................................................................42
Notification Hubs.................................................................................................................................42
Web Apps.............................................................................................................................................42
Mobile Apps.........................................................................................................................................42
API Apps..............................................................................................................................................42
Azure Mobile App................................................................................................................................43
Visual Studio Application Center........................................................................................................43
Xamarin...............................................................................................................................................43
Web App for Containers......................................................................................................................43
Developer Tools....................................................................................................................................43
Visual Studio.......................................................................................................................................43
Visual Studio Code..............................................................................................................................43
Software Development Kits (SDKs)...................................................................................................43
Azure DevOps.....................................................................................................................................43
CLI.......................................................................................................................................................43

Azure Pipelines....................................................................................................................................43
Azure Lab Services.............................................................................................................................44
Azure DevTest Labs.............................................................................................................................44
Integration of development tools.........................................................................................................44
Security................................................................................................................................................44
Azure Active Directory.......................................................................................................................44
Azure Information Protection..............................................................................................................44
Azure Active Directory Domain Services...........................................................................................44
Key Vault.............................................................................................................................................44
Security Center....................................................................................................................................44
Azure Dedicated HSM (hardware security module)............................................................................44
VPN Gateway.......................................................................................................................................44
Application Gateway............................................................................................................................45
Azure DDoS Protection.......................................................................................................................45
Storage.................................................................................................................................................45
Storage.................................................................................................................................................45
Azure Backup.......................................................................................................................................45
StorSimple...........................................................................................................................................45
Azure Data Lake Storage.....................................................................................................................45
Blob storage.........................................................................................................................................45
Disk Storage........................................................................................................................................45
Managed Disks.....................................................................................................................................45
Queue Storage.....................................................................................................................................45
File storage..........................................................................................................................................46
Data Box..............................................................................................................................................46
Avere vFXT for Azure..........................................................................................................................46
Storage Explorer..................................................................................................................................46
Archive storage.....................................................................................................................................46
Azure NetApp Files.............................................................................................................................46
Web......................................................................................................................................................46
App Service.........................................................................................................................................46
Content Delivery Network...................................................................................................................46
Azure Search........................................................................................................................................46
Notification Hubs.................................................................................................................................46
API Management.................................................................................................................................47
Web Apps.............................................................................................................................................47
Mobile Apps.........................................................................................................................................47
API Apps..............................................................................................................................................47
Web App for Containers......................................................................................................................47
Azure SignalR Service........................................................................................................................47
Games...........................................................................................................................................104
What is Amazon Athena: a complete overview............................................................................107
Amazon Athena Features..............................................................................................................108
Amazon Athena Use Cases...........................................................................................................111

What is Amazon Athena: pricing..................................................................................................112
What is Amazon CloudSearch?.........................................................................................................112
Overview of Amazon CloudSearch Benefits.....................................................................................113
Overview of Amazon CloudSearch Features....................................................................................115
How Much Does Amazon CloudSearch Cost?..................................................................................116
Amazon EMR....................................................................................................................................130
Processing big data with Amazon EMR.......................................................................................131
Architecture............................................................................................................................................132
Setup.......................................................................................................................................................132
AWS Region.................................................................................................................................132
1. Create VPC....................................................................................................................................133
2. Create KeyPair..............................................................................................................................133
SSH...............................................................................................................................................134
3. Create IAM Role...........................................................................................................................134
4. Create AWS Elasticsearch Domain...............................................................................................137
Elasticsearch Version....................................................................................................................137
5. Start SearchBlox IndexServer via Amazon Marketplace..............................................................142
Integrate with IAM Role...............................................................................................................147
SSH into SearchBlox IndexServer....................................................................................................148
Increase RAM for SearchBlox in AWS..........................................................................................151
Kibana and Amazon Elasticsearch Service............................................................................................153
Amazon Kinesis................................................................................................................................161
When Should I Use Amazon Aurora and When Should I use RDS MySQL?..................................163
An introduction to Amazon RDS.............................................................................................163
So, Aurora or RDS MySQL?...................................................................................................164
Performance considerations................................................................................................164
Capacity Planning...............................................................................................................165
Replication..........................................................................................................................165
Monitoring...........................................................................................................................166
Costs....................................................................................................................................166
TL;DR.................................................................................................................................167
Drawbacks and Alternatives to DynamoDB.....................................................................................168
Rules of the Game.............................................................................................................................169
Let the Games Begin!........................................................................................................................170
YCSB Workload A, Uniform Distribution........................................................................................172
Real Life Use-case: Zipfian Distribution..........................................................................................173
The Dress that Broke DynamoDB.....................................................................................................176
Scylla vs DynamoDB – Single (Hot) Partition.................................................................................177
Additional Factors.............................................................................................................................177
Cross-region Replication and Global Tables................................................................................177
Explicit Caching is Expensive and Bad for You...........................................................................180
Freedom........................................................................................................................................180
No Limits......................................................................................................................................181
Total Cost of Ownership (TCO)........................................................................................................181

ElastiCache versus self-hosted Redis on EC2...................................................................................184
The Practical Comparison: ElastiCache Vs. Self-hosted Redis on EC2...........................................185
Deep Diving into the Practicalities of ElastiCache and Self-hosted Redis on EC2..........................187
ElastiCache: Supports Fully Managed Redis and Memcached...............................................187
ElastiCache: Scales Automatically According to Requirements.............................................187
ElastiCache: Instances with More Than One vCPU Cannot Utilize All the Cores..................187
Self Hosted Redis on EC2: Allows You to Update Latest Version ASAP...............................187
Self Hosted Redis on EC2: Provides the Freedom to Modify Configurations........................187
Self Hosted Redis on EC2: Unavailability of Pertinent Metrics Makes Maintenance Tedious188
Self Hosted Redis on EC2: Instance Limitations.....................................................................188
What is AWS EC2?.......................................................................................................................188
Why AWS EC2?............................................................................................................................189
Let’s understand the types of EC2 Computing Instances:.................................................................191
EBS-optimized Instances..............................................................................................................193
Security in AWS EC2 .......................................................................................................................196
Auto Scaling......................................................................................................................................196
AWS EC2 Pricing..............................................................................................................................197
AWS EC2 Use Case..........................................................................................................................198
QLDB, Blockchain Database with Amazon Web Services...........................................................209
Similar Problems, With a Key Difference.........................................................................................209
The Benefits of Blockchain, But for Closed Systems.......................................................................210
Amazon S3.............................................................................................................................................211
How Amazon S3 works................................................................................................................211
Amazon S3 features......................................................................................................................212
Amazon S3 storage classes...........................................................................................................212
Working with buckets...................................................................................................................213
Protecting your data......................................................................................................................213
Comparing AWS vs Azure vs Google Cloud Platforms For Enterprise App Development..............215
Amazon Web Services..................................................................................................................215
Features.........................................................................................................................................215
Pricing...........................................................................................................................................215
Advantages...................................................................................................................................216
Microsoft Azure............................................................................................................................216
Features.........................................................................................................................................216
Pricing...........................................................................................................................................216
Advantages...................................................................................................................................217
Google Cloud Platform.................................................................................................................217
Features.........................................................................................................................................217
Pricing...........................................................................................................................................217
Advantages...................................................................................................................................218
Google Cloud Platform:....................................................................................................................218
Cloud IAM:......................................................................................................................................218
Evolution of Identity and Access Management.................................................................................219
Why Cloud IAM?..............................................................................................................................220

The Fundamentals of Google Compute Engine (GCE).........................................................................226
Virtual Machine (VM) Instances..................................................................................................226
Machine Types..............................................................................................................................226
Custom machine types..................................................................................................................227
Disks.............................................................................................................................................227
Persistent Disks.............................................................................................................................228
Local SSD.....................................................................................................................................229
Images...........................................................................................................................................229
Zones............................................................................................................................................230
What if I choose a zone and want it changed afterward?.............................................................231
Networking & Firewall.................................................................................................................231
Availability Policy........................................................................................................................232
Preemptibility...............................................................................................................................232
Automatic Restart.........................................................................................................................233
On host maintenance.....................................................................................................................233
Other Options...............................................................................................................................233
Accessing the VM.........................................................................................................................233
Pricing...........................................................................................................................................234
Serverless Showdown: AWS Lambda vs Firebase Google Cloud Functions....................................238
Function Creation — Lambda........................................................................................................240
Function Creation — Google Cloud Functions.............................................................................241
Deployment...................................................................................................................................243
Testing — Lambda.........................................................................................................................243
Testing — Google Cloud Functions...............................................................................................243
Pricing...........................................................................................................................................244
Conclusion....................................................................................................................................245
Cloud Shell:.......................................................................................................................................247
What is Google Cloud Shell.........................................................................................................247
What does it come with?..........................................................................................................247
Tips and Tricks.............................................................................................................................247
1. Running a web server (with auto-HTTPS for FREE!).........................................................247
2. Get extra power with “boost mode”.....................................................................................248
3. Edit your files with a GUI....................................................................................................248
4. Upload/Download files........................................................................................................248
5. Persist binary/program installations.....................................................................................248
Bonus: Open in Cloud Shell....................................................................................................248
Firebase Components:.......................................................................................................................249
Firebase.........................................................................................................................................249
A Brief History.........................................................................................................................249
Firebase Services.....................................................................................................................249
Realtime Database........................................................................................................................251
Authentication...............................................................................................................................251
Firebase Cloud Messaging (FCM)................................................................................................252
Firebase Database Query..............................................................................................................253

How to Store Data? => Firebase Storage.....................................................................................253
Firebase Test Labs........................................................................................................................254
Instrumentation Test.................................................................................................................254
Remote Config..............................................................................................................................254
Firebase App Indexing..................................................................................................................254
Firebase Dynamic Links...............................................................................................................255
Firestore........................................................................................................................................255
Improved Querying and Data Structure...................................................................................255
Query with Firestore................................................................................................................256
Better Scalability......................................................................................................................256
Multi-Region Database............................................................................................................256
Different Pricing Model...........................................................................................................257
Google Cloud Console:.....................................................................................................................258
OpenShift on OpenStack 1-2-3: Bringing IaaS and PaaS Together..................................................259
Overview...........................................................................................................................................259
OpenShift and the Case for OpenStack.............................................................................................260
OpenShift integration with OpenStack..............................................................................................260
OpenShift on OpenStack Architectures.............................................................................................261
Resource vs AutoScaling Groups.................................................................................................262
Non-HA........................................................................................................................................262
HA................................................................................................................................................263
Deploying OpenStack........................................................................................................................264
Deploying OpenShift on OpenStack 1-2-3.......................................................................................266
Step 1............................................................................................................................................266
Step Two.......................................................................................................................................268
Step Three.....................................................................................................................................270
Optional.............................................................................................................................................271
Pivotal Cloud Foundry vs Kubernetes: Choose the best way to deploy native cloud applications 273
“Application” PaaS vs. “Container” PaaS.........................................................................................274
Pivotal Cloud Foundry......................................................................................................................275
Features.........................................................................................................................................276
Installation and Usability..............................................................................................................277
Best Use Cases..............................................................................................................................277
Kubernetes.........................................................................................................................................277
Features.........................................................................................................................................278
Installation and Usability..............................................................................................................279
Best Use Cases..............................................................................................................................279
Best of Both Worlds: Cloud Foundry Container Runtime................................................................279
Conclusion:........................................................................................................................................280
Bibliography:.....................................................................................................................................281

Introduction:
Since cloud computing was first deployed in the professional world, many platforms have emerged
to serve the needs of end users and other stakeholders. These began with Amazon Web Services,
followed by Google Cloud Platform, Microsoft Azure, and IBM Bluemix, along with Red Hat-backed
platforms such as OpenShift and OpenStack. The open source world also has its own platform,
Cloud Foundry.

Over the last couple of years, the popularity of cloud computing has grown dramatically, and
along with it the dominance of Amazon Web Services (AWS) in the market. Unfortunately, AWS
doesn’t do a great job of explaining exactly what AWS is, how its pieces work together, or what typical
use cases for its components may be. This post is an effort to address this by providing a whirlwind
overview of the key AWS components and how they can be effectively used.
Great, so what is AWS? Generally speaking, Amazon Web Services is a loosely coupled collection of
“cloud” infrastructure services that allows customers to “rent” computing resources. What this means is
that using AWS, you as the client are able to flexibly provision various computing resources on a “pay
as you go” pricing model. Expecting a huge traffic spike? AWS has you covered. Need to flexibly store
between 1 GB and 100 GB of photos? AWS has you covered. Additionally, the components that
make up AWS are loosely coupled, meaning that they can work independently or in concert
with other AWS resources.

Since AWS components are loosely coupled, you can mix and match only what you need. Here is an
overview of the key services.

Route53

What is it? Route53 is a highly available, scalable, and feature-rich Domain Name System (DNS) web
service. What a DNS service does is translate a domain name like “setfive.com” into an IP address like
64.22.80.79, which allows a client’s computer to “find” the correct server for a given domain name. In
addition, Route53 has several advanced features normally only available in pricey enterprise DNS
solutions. Route53 would typically replace the DNS service provided by your registrar, like GoDaddy
or Register.com.

Should you use it? Definitely. Although it isn’t free, after last year’s prolonged GoDaddy outage it’s clear
that DNS is a critical component, and using a company that treats it as such is important.
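To make the domain-to-IP translation concrete, here is a toy resolver sketch in Python. The record table and helper function are purely illustrative and not part of Route53's API; a real DNS lookup involves recursive resolution, record types, and TTLs.

```python
# Illustrative only: a dictionary standing in for DNS A records.
# "setfive.com" -> 64.22.80.79 is the example from the text above.
RECORDS = {
    "setfive.com": "64.22.80.79",
}

def resolve(domain: str) -> str:
    """Translate a domain name into an IP address, like a DNS A-record lookup."""
    try:
        return RECORDS[domain]
    except KeyError:
        # Real DNS servers answer NXDOMAIN for unknown names.
        raise LookupError(f"NXDOMAIN: no record for {domain}")

print(resolve("setfive.com"))  # -> 64.22.80.79
```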

Simple Email Service

What is it? Simple Email Service (SES) is a hosted transactional email service. It allows you to easily
send highly deliverable emails using a RESTful API call or via regular SMTP without running your
own email infrastructure.

Should you use it? Maybe. SES is comparable to services like SendGrid in that it offers a highly
deliverable email service. Although it is missing some of the features that you’ll find on SendGrid, its
pricing is attractive and the integration is straightforward. We normally use SES for application emails
(think “Forgot your password”) but then use MailChimp or SendGrid for marketing blasts and that
seems to work pretty well.

Identity and Access Management

What is it? Identity and Access Management (IAM) provides enhanced security and identity
management for your AWS account. In addition, it allows you to enable multi-factor authentication to
enhance the security of your AWS account.

Should you use it? Definitely. If you have more than one person accessing your AWS account, IAM
will allow everyone to get a separate account with fine-grained permissions. Multi-factor authentication
is also critically important, since a compromise at the infrastructure level would be catastrophic for
most businesses.
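Fine-grained permissions in IAM are expressed as JSON policy documents. The sketch below builds a minimal read-only policy; the bucket name is hypothetical, and real policies support many more fields (conditions, principals, deny statements).

```python
import json

def make_policy(actions, resource):
    """Build a minimal IAM-style policy document allowing `actions` on `resource`."""
    return {
        "Version": "2012-10-17",  # the standard IAM policy language version
        "Statement": [{
            "Effect": "Allow",
            "Action": actions,
            "Resource": resource,
        }],
    }

# Grant read-only access to a single (hypothetical) S3 bucket:
policy = make_policy(["s3:GetObject"], "arn:aws:s3:::example-bucket/*")
print(json.dumps(policy, indent=2))
```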

Simple Storage Service

What is it? Simple storage service (S3) is a flexible, scalable, and highly available storage web
service. Think of S3 like having an infinitely large hard drive where you can store files which are then
accessible via a unique URL. S3 also supports access control, expiration times, and several other useful
features. Additionally, the payment model for S3 is “pay as you go” so you’ll only be billed for the
amount of data you store and how much bandwidth you use to transfer it in and out.

Should you use it? Definitely. S3 is probably the most widely used AWS service because of its
attractive pricing and ease of use. If you’re running a site with lots of static assets (images, CSS assets,
etc.), you’ll probably get a “free” performance boost by hosting those assets on S3. Additionally, S3 is
an ideal solution for incremental backups, both data and code. We use S3 extensively, usually for
hosting static files, frequently backing up MySQL databases, and backing up git repositories. The new
AWS S3 Console also makes administering S3 and using it non-programmatically much easier.
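The "pay as you go" model lends itself to simple cost arithmetic. The sketch below estimates a monthly S3 bill; the per-GB rates are illustrative placeholders, not current AWS prices.

```python
def s3_monthly_cost(gb_stored, gb_transferred,
                    storage_rate=0.03, transfer_rate=0.09):
    """Estimate a monthly S3 bill: storage plus outbound transfer.

    The default rates are illustrative placeholders, not AWS's current prices.
    """
    return round(gb_stored * storage_rate + gb_transferred * transfer_rate, 2)

# Storing 100 GB of photos and serving 50 GB out in a month:
print(s3_monthly_cost(100, 50))  # -> 7.5
```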

Elastic Compute Cloud

What is it? Elastic Compute Cloud (EC2) is the central piece of the AWS ecosystem. EC2 provides
flexible, on-demand computing resources with a “pay as you go” pricing model. Concretely, what this
means is that you can “rent” computing resources for as long as you need them and process any
workload on the machines you’ve provisioned. Because of its flexibility, EC2 is an attractive
alternative to buying traditional servers for unpredictable workloads.

Should you use it? Maybe. Whether or not to use EC2 is always a controversial discussion because the
complexity it introduces doesn’t always justify its benefits. As a rule of thumb, if you have
unpredictable workloads like sporadic traffic using EC2 to run your infrastructure is probably a
worthwhile investment. However, if you’re confident that you can predict the resources you’ll need you
might be better served by a “normal” VPS solution like Linode.

Elastic Block Store

What is it? Elastic Block Store (EBS) provides persistent storage volumes that attach to EC2 instances,
allowing you to persist data past the lifespan of a single instance. Due to the architecture of Elastic
Compute Cloud, all the storage systems on an instance are ephemeral. This means that when an instance is
terminated all the data stored on that instance is lost. EBS addresses this issue by providing persistent
storage that appears on instances as a regular hard drive.

Should you use it? Maybe. If you’re using EC2, you’ll have to weigh the choice between using only
ephemeral instance storage or using EBS to persist data. Beyond that, EBS has well documented
performance issues so you’ll have to be cognizant of that while designing your infrastructure.

CloudWatch

What is it? CloudWatch provides monitoring for AWS resources including EC2 and EBS. CloudWatch
enables administrators to view and collect key metrics and also set a series of alarms to be notified in
case of trouble. In addition, CloudWatch can aggregate metrics across EC2 instances which provides
useful insight into how your entire stack is operating.

Should you use it? Probably. CloudWatch is significantly easier to set up and use than tools
like Nagios, but it’s also less feature-rich. We’ve had some success coupling CloudWatch
with PagerDuty to provide alerts in case of critical service interruptions. You’ll probably need
additional monitoring on top of CloudWatch, but it’s certainly a good baseline to start with.
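At its core, a CloudWatch alarm fires when a metric breaches a threshold for a number of consecutive evaluation periods. Here is a minimal re-implementation of that evaluation logic, a sketch rather than CloudWatch's actual API:

```python
def breaches_alarm(samples, threshold, periods=3):
    """Return True if `samples` exceed `threshold` for `periods`
    consecutive datapoints -- the basic check a CloudWatch alarm performs."""
    streak = 0
    for value in samples:
        streak = streak + 1 if value > threshold else 0
        if streak >= periods:
            return True
    return False

cpu_percent = [40, 85, 92, 95, 97, 60]  # hypothetical CPU utilization samples
print(breaches_alarm(cpu_percent, threshold=80))  # -> True
```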

AWS stands for Amazon Web Services. Think of electricity: early on, a factory would typically build
its own power plant for its own use. Later, power experts ran centralized electricity plants that supplied
reliable power to many factories at once, generating electricity more efficiently and at a lower price.
The AWS cloud follows a similar model: instead of building large-scale infrastructure themselves,
companies can opt for cloud services and get all the infrastructure they could ever need.

AWS is a growing cloud computing platform with a significant share of the cloud computing market
relative to its competitors. AWS is geographically diversified into regions to ensure robustness and to
tolerate outages. There are central hubs in Japan, the Eastern USA, two locations in the Western USA,
Brazil, Ireland, Singapore, and Australia. Over 50 services, covering application services, networking,
storage, mobile, management, compute, and more, are readily available to clients.
For enterprises and start-ups alike, services can be deployed quickly without much up-front capital.
AWS collaborates closely with companies such as GE, Pinterest, and MLB, so cloud clients can pin,
power, and play with the features of the AWS cloud. Let’s now dig into the components of AWS.

AWS components
To assess the cloud computing capabilities of AWS, we first have to look at the core components of
the cloud. AWS has many components, but we will cover only the key ones.

Amazon cluster
Also known as Amazon compute, AWS offers EC2 (Elastic Compute Cloud) and ELB (Elastic Load
Balancing) as its lead computing services. It is by virtue of these instances that companies can
scale up or down based on need. System admins and developers use EC2 to provision and boot
computing instances in the cloud, and pricing is based on usage. First-time AWS users get
around 750 hours of EC2 per month free for the first year. Beyond that there are three pricing models:
on-demand, spot instances, and reserved instances.

Depending on location, size, complexity, and storage requirements, on-demand prices range from $0.13
to $4.60.
Reserved instance pricing requires users to reserve an instance in advance for a term of one to three
years. AWS offers up to a 75% discount off on-demand pricing when users reserve their cloud
instances.
Spot instance pricing lets users bid on unused compute instances. Spot prices vary with usage and the
time of day, week, or month.
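Using the figures quoted above (a $0.13 hourly on-demand rate and a flat 75% reserved discount; actual discounts vary by term and instance type), the savings arithmetic looks like this:

```python
def yearly_costs(on_demand_hourly, discount=0.75, hours=24 * 365):
    """Compare a year of on-demand vs. reserved pricing at a flat discount.

    The 75% default comes from the text above; real discounts vary.
    """
    on_demand = round(on_demand_hourly * hours, 2)
    reserved = round(on_demand * (1 - discount), 2)
    return on_demand, reserved

on_demand, reserved = yearly_costs(0.13)
print(on_demand, reserved)  # -> 1138.8 284.7
```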
For fault tolerance with less human intervention, AWS ELB distributes application traffic across
EC2 instances. The ELB service is free for up to 15 GB of data processing and 750 hours of monthly
service for a year. Larger loads are charged on an hourly basis plus a fee per GB transferred.
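Conceptually, ELB spreads incoming requests across a pool of EC2 instances. A round-robin sketch (real ELB also health-checks targets and supports other routing algorithms; the instance IDs are made up) might look like:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Toy load balancer: hand each request to the next instance in the pool."""
    def __init__(self, instance_ids):
        self._pool = cycle(instance_ids)

    def route(self, request):
        """Return (instance_id, request) for the instance that should serve it."""
        return next(self._pool), request

# Hypothetical instance IDs:
lb = RoundRobinBalancer(["i-0a1", "i-0b2", "i-0c3"])
for req in ["GET /", "GET /img", "POST /api", "GET /"]:
    print(lb.route(req))
```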

“When I get into complex customer situations that leverage combinations of AWS services, AWS
Certification has allowed me to immediately add value.”
– Ryan Fackett, Director, Foghorn Consulting, Advanced APN Consulting Partner, AWS Certified
Solutions Architect – Professional

Storage
Amazon’s Simple Storage Service (S3), Elastic Block Store (EBS), and CloudFront are the three
storage choices of Amazon. Storage in AWS is provided on a pay-as-you-go model. Amazon S3 can
store any amount of data and is used for content storage, backup, archiving and disaster recovery,
and data analysis storage.
Along with a free EC2 instance for the first year, AWS also offers 5 GB of cloud storage, 20,000 GET
requests, and 5,000 PUT requests on S3 free for the first year. After the first year, pricing is $0.03 per
GB per month for up to 1 TB. EBS is very helpful for scaling EC2 instances; pricing depends on the
geographic region, the disk technology used, and the GBs of provisioned storage required.
CloudFront is a content delivery option for developers and businesses that provides low latency and
high data transfer speeds.

Databases
Along with in-memory caching and data warehousing in the petabyte range, AWS also scales
relational and NoSQL databases. DynamoDB is the NoSQL database, offering high scale at low cost.
Using EC2 and EBS, users can also operate their own databases on AWS. Relational Database
Service (RDS) and Amazon Redshift are the two database services from AWS.
Amazon RDS is used to operate and scale MySQL, Oracle, SQL Server, or PostgreSQL servers on
AWS; pricing is based on instance hours and the amount of storage. Redshift is a data warehouse
service that stores data in columns rather than rows. Pricing is based on instance hours, for example
$0.25 per hour.

Administration and security


AWS Directory Service links AWS clouds directly to on-premises directories. CloudWatch monitors
AWS cloud resources. AWS CloudTrail records API calls for your AWS account, and it does this at
no charge.

Networking
Amazon VPC (Virtual Private Cloud) provides versatile networking capability in AWS, meaning it
provides built-in security and a private cloud. VPC comes free with EC2. AWS Direct Connect lets
users connect directly to the cloud, bypassing the internet; it is priced on an hourly basis.

Analytics
AWS offers services for data analytics on all fronts, including Hadoop, orchestration, real-time
streaming, and data warehousing. EMR (Elastic MapReduce) is the analytics facilitator used by
businesses, data analysts, researchers, and developers to process large chunks of data. Pricing is on
an hourly basis. Redshift also provides some analytics capabilities.

Application services
Amazon SQS (Simple Queue Service) is used to automate workflows between different services. A
dedicated queue stores the messages. The service is free up to 1 million messages per month; after
that, $0.50 is charged for every million messages.
SWF (Simple Workflow Service) is a task management and coordination service for AWS. 10,000
activity tasks, 30,000 workflow days, and 1,000 initiated executions per year are free for users. Above
that, users pay around $0.0001 per workflow.
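The SQS figures quoted above (first million messages free, then $0.50 per million) translate into a simple cost function:

```python
def sqs_monthly_cost(messages, free_tier=1_000_000, rate_per_million=0.50):
    """Monthly SQS message cost using the free tier and rate quoted above."""
    billable = max(0, messages - free_tier)
    return billable / 1_000_000 * rate_per_million

print(sqs_monthly_cost(500_000))    # -> 0.0 (inside the free tier)
print(sqs_monthly_cost(3_500_000))  # -> 1.25
```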

Deployment and management


Elastic Beanstalk deploys and scales web applications written in Java, .NET, PHP, Node.js, Python,
and Ruby. Application health and log files can be easily monitored. CloudFormation helps businesses
and developers declare and provision the AWS resources they need.

Mobile services
Amazon Cognito and Mobile Analytics are two popular AWS mobile services. Cognito identifies users
and syncs data across their mobile devices. Up to 10 GB of cloud sync storage and 1 million sync
operations per month are free; beyond that, users pay around $0.15 for every 10,000 operations.
Mobile Analytics tracks applications at scale and delivers usage data within 60 minutes. Up to one
million events per month are free; above that, pricing is $1 for every million events.

How does AWS work?


You basically need to sign up for an AWS account, which requires credit card details on your part.
Upon creating the account, you can start exploring the AWS Management Console and its services.
Amazon hosts 10-minute tutorials on launching a virtual machine with Amazon EC2, uploading files
to the cloud with S3, and creating and querying a NoSQL database with DynamoDB. These are only
from the compute section; you can find similar how-to videos on databases, AWS developer tools,
messaging, storage and content delivery, and much more.

“If we hadn’t gone through AWS training, our progress would have been much slower.
There would have been a lot of pitfalls.”
– Christian Boehm, Head, Data center infrastructure, Siemens

Who uses AWS?


The pay-as-you-go model is a great benefit for those who can’t afford in-house infrastructure.
Startups especially face this situation: usually cash-strapped, they turn to cloud services to fulfill
their infrastructure requirements. Amazon says the active users of its cloud number over 1 million.

Small and medium firms and startups make up the majority of that space, with enterprise users
numbering only around 100,000.

Why use AWS?


EC2 units give world-class performance at an hourly rate. At large corporations, traditional hardware
sits unused as much as 90% of the time, yet it requires heavy maintenance, and during peak hours it
may not be sufficient to provide competent service. If such an organization shifted to AWS, all of
these woes would essentially come to an end: companies needn't worry about maintenance or the cost
involved with it, and no matter how much demand there is, AWS can scale to that level. AWS is also
helpful for big data analytics, and code can be deployed continuously since DevOps processes are
expertly supported.

Scalability and adaptability of AWS


Scalability and adaptability are terms that define AWS. Building a business from scratch is tough.
AWS lightens the load in this regard, providing all the tools companies need to get started in the
cloud. The cost of migration is low, so enterprises can migrate services from their existing
infrastructure to AWS. Netflix is the best example of this, as almost every part of the company has
been migrated to the AWS cloud.

Because AWS services can be used flexibly, customers needn't worry about their computing usage.
Whether usage is very high or very low, AWS will scale whichever way the company needs. This
high adaptability is what AWS truly stands for.

AWS’s security and reliability


Compared to a self-hosted company website, AWS is more secure. Dozens of data centers across the
world are continuously monitored by vigilant teams. Because the data centers are spread out, a natural
disaster or outage at one location won't affect the others. Compare that with keeping all of your data in
one location where many people can get their hands on it, which is the case for most enterprises. The
locations of AWS data centers are kept secret, and any issues pertaining to them are resolved promptly.

Amazon is so huge because of AWS along with its retail arm. As noted, AWS offers up to 75%
discounts when instances are reserved in advance, yet it is hugely profitable and growing rapidly.
That it can rake in such profits while providing services at such discounted rates shows how
massively it is being used. AWS's IaaS cloud is reportedly 10 times larger than its 14 closest
competitors combined. Stats like this are only possible because of the strong capability AWS
provides. Even so, large enterprises remain hesitant to make the transition to AWS, mainly citing
security issues. But again, consider Netflix, which grew tremendously using AWS; enterprises can
leverage AWS to achieve the same feat in their own game. If you own an enterprise, use AWS for
your infrastructure; if you are a developer, engineer, or data professional, at least try out the free
AWS instances; and if you are a stock trader, consider Amazon stock before large enterprises realize
the potential in AWS.

Analytics

Amazon Athena – Query data in S3 using SQL
Amazon CloudSearch – Managed search service
Amazon EMR – Hosted Hadoop framework
Amazon Elasticsearch Service – Run and scale Elasticsearch clusters
Amazon Kinesis – Work with real-time streaming data
Amazon Managed Streaming for Apache Kafka – Fully managed Apache Kafka service
Amazon Redshift – Fast, simple, cost-effective data warehousing
Amazon QuickSight – Fast business analytics service
AWS Data Pipeline – Orchestration service for periodic, data-driven workflows
AWS Glue – Prepare and load data
AWS Lake Formation – Build a secure data lake in days

Application Integration

AWS Step Functions – Coordinate distributed applications
Amazon MQ – Managed message broker for ActiveMQ
Amazon Simple Notification Service (SNS) – Managed pub/sub topics for messaging
Amazon Simple Queue Service (SQS) – Managed message queues
AWS AppSync – Build data-driven apps with real-time and offline capabilities

AR & VR

Amazon Sumerian – Build and run AR and VR applications

AWS Cost Management

AWS Cost Explorer – Analyze your AWS costs and usage
AWS Budgets – Set custom budgets for costs and usage
AWS Cost and Usage Report – Access comprehensive information on your costs and usage
Reserved Instance Reporting – Gain more control over your Reserved Instances

Blockchain

Amazon Managed Blockchain – Create and manage scalable blockchain networks
Amazon Quantum Ledger Database (QLDB) – Fully managed ledger database

Business Applications

Alexa for Business – Empower your organization with Alexa
Amazon Chime – Meetings, video calls, and chat made easy
Amazon WorkDocs – Enterprise storage and sharing service
Amazon WorkMail – Secure and managed business email and calendaring

Compute

Amazon EC2 – Virtual servers in the cloud
Amazon EC2 Auto Scaling – Scale compute capacity to meet demand
Amazon Elastic Container Registry – Store and retrieve Docker images
Amazon Elastic Container Service – Run and manage Docker containers
Amazon Elastic Container Service for Kubernetes – Run managed Kubernetes on AWS
Amazon Lightsail – Launch and manage virtual private servers
AWS Batch – Run batch jobs at any scale
AWS Elastic Beanstalk – Run and manage web apps
AWS Fargate – Run containers without managing servers or clusters
AWS Lambda – Run your code in response to events
AWS Outposts – Run AWS services on premises
AWS Serverless Application Repository – Discover, deploy, and publish serverless applications
Elastic Load Balancing (ELB) – Distribute incoming traffic across multiple targets
VMware Cloud on AWS – Build a hybrid cloud without custom hardware

Customer Engagement

Amazon Connect – Cloud-based call center
Amazon Pinpoint – Engage users via email, SMS, and push messages
Amazon Simple Email Service (SES) – Send and receive emails

Databases

Amazon Aurora – High-performance managed relational database
Amazon DynamoDB – Managed NoSQL database
Amazon ElastiCache – In-memory caching system
Amazon Neptune – Fully managed graph database service
Amazon Quantum Ledger Database (QLDB) – Fully managed ledger database
Amazon RDS – Managed relational database service for MySQL, PostgreSQL, Oracle, SQL Server, and MariaDB
Amazon RDS on VMware – Automate on-premises database management
Amazon Redshift – Fast, simple, cost-effective data warehousing
Amazon Timestream – Fully managed time-series database
AWS Database Migration Service – Migrate databases with minimal downtime

Desktop and Application Streaming

Amazon WorkSpaces – Desktop computing service
Amazon AppStream 2.0 – Stream desktop applications securely to a browser

Developer Tools

AWS CodeStar – Develop and deploy AWS applications
Amazon Corretto – Production-ready distribution of OpenJDK
AWS Cloud9 – Write, run, and debug code in a cloud IDE
AWS CodeBuild – Build and test code
AWS CodeCommit – Store code in private Git repositories
AWS CodeDeploy – Automate code deployment
AWS CodePipeline – Release software using continuous delivery
AWS Command Line Interface – Unified tool to manage AWS services
AWS Tools and SDKs – Tools and SDKs for AWS
AWS X-Ray – Analyze and debug your applications
Game Tech

Amazon GameLift – Simple, fast, cost-effective dedicated game server hosting
Amazon Lumberyard – A free, cross-platform 3D game engine with full source code, integrated with
AWS and Twitch

Internet of Things (IoT)

AWS IoT Core – Connect devices to the cloud
Amazon FreeRTOS – IoT operating system for microcontrollers
AWS Greengrass – Local compute, messaging, and sync for devices
AWS IoT 1-Click – One-click creation of an AWS Lambda trigger
AWS IoT Analytics – Analytics for IoT devices
AWS IoT Button – Cloud-programmable dash button
AWS IoT Device Defender – Security management for IoT devices
AWS IoT Device Management – Onboard, organize, and remotely manage IoT devices
AWS IoT Events – IoT event detection and response
AWS IoT SiteWise – Collect and interpret IoT data
AWS IoT Things Graph – Easily connect devices and web services
AWS Partner Device Catalog – Curated catalog of AWS-compatible IoT hardware
Machine Learning

Amazon SageMaker – Build, train, and deploy machine learning models at scale
Amazon Comprehend – Discover insights and relationships in text
Amazon Elastic Inference – Deep learning inference acceleration
Amazon Forecast – Increase forecast accuracy using machine learning
Amazon Lex – Build voice and text chatbots
Amazon Personalize – Build real-time recommendations into your applications
Amazon Polly – Turn text into lifelike speech
Amazon Rekognition – Analyze images and videos
Amazon SageMaker Ground Truth – Build accurate machine learning training datasets
Amazon Textract – Extract text and data from documents
Amazon Translate – Natural and fluent language translation
Amazon Transcribe – Automatic speech recognition
AWS Deep Learning AMIs – Quickly start deep learning on EC2
AWS DeepLens – Deep-learning-enabled video camera
AWS DeepRacer – Autonomous 1/18th-scale race car driven by machine learning
AWS Inferentia – Machine learning inference chip
Apache MXNet on AWS – Scalable, high-performance deep learning
TensorFlow on AWS – Open-source machine intelligence library
Management and Governance

Amazon CloudWatch – Monitor resources and applications
AWS Auto Scaling – Scale multiple resources to meet demand
AWS CloudFormation – Create and manage resources with templates
AWS CloudTrail – Track user activity and API usage
AWS Command Line Interface – Unified tool to manage AWS services
AWS Config – Track resource inventory and changes
AWS Control Tower – Set up and govern a secure, compliant multi-account environment
AWS Console Mobile Application – Access resources on the go
AWS License Manager – Track, manage, and control license usage
AWS Management Console – Web-based user interface
AWS Managed Services – Infrastructure operations management for AWS
AWS OpsWorks – Automate operations with Chef and Puppet
AWS Personal Health Dashboard – Personalized view of AWS service health
AWS Service Catalog – Create and use standardized products
AWS Systems Manager – Gain operational insights and take action
AWS Trusted Advisor – Optimize performance and security
AWS Well-Architected Tool – Review and improve your workloads
Media Services

Amazon Elastic Transcoder – Simple, scalable media transcoding
Amazon Kinesis Video Streams – Process and analyze video streams
AWS Elemental MediaConnect – Reliable and secure live video transport
AWS Elemental MediaConvert – Convert file-based video content
AWS Elemental MediaLive – Convert live video content
AWS Elemental MediaPackage – Video origination and packaging
AWS Elemental MediaStore – Media storage and simple HTTP origin
AWS Elemental MediaTailor – Video personalization and monetization
Migration and Transfer

AWS Migration Hub – Track migrations from a single place
AWS Application Discovery Service – Discover on-premises applications to streamline migration
AWS Database Migration Service – Migrate databases with minimal downtime
AWS DataSync – Simple, fast, online data transfer
AWS Server Migration Service – Migrate on-premises servers to AWS
AWS Snow Family – Physical devices to migrate data into and out of AWS
AWS Transfer for SFTP – Fully managed SFTP service
Mobile

AWS Amplify – Build and deploy mobile and web applications
Amazon API Gateway – Build, deploy, and manage APIs
Amazon Pinpoint – Push notifications for mobile apps
AWS AppSync – Real-time and offline mobile data apps
AWS Device Farm – Test Android, Fire OS, and iOS apps on real devices in the cloud

Networking and Content Delivery

Amazon VPC – Isolated cloud resources
Amazon API Gateway – Build, deploy, and manage APIs
Amazon CloudFront – Global content delivery network
Amazon Route 53 – Scalable domain name system
AWS PrivateLink – Securely access services hosted on AWS
AWS App Mesh – Monitor and control microservices
AWS Cloud Map – Application resource registry for microservices
AWS Direct Connect – Dedicated network connection to AWS
AWS Global Accelerator – Improve application availability and performance
AWS Transit Gateway – Easily scale VPC and account connections
Robotics

AWS RoboMaker – Develop, test, and deploy robotics applications

Satellite

AWS Ground Station – Fully managed ground station as a service

Security, Identity, and Compliance

AWS Identity and Access Management (IAM) – Manage user access and encryption keys
Amazon Cloud Directory – Create flexible cloud-native directories
Amazon Cognito – Identity management for your apps
Amazon GuardDuty – Managed threat detection service
Amazon Inspector – Analyze application security
Amazon Macie – Discover, classify, and protect your data
AWS Artifact – On-demand access to AWS compliance reports
AWS Certificate Manager – Provision, manage, and deploy SSL/TLS certificates
AWS CloudHSM – Hardware-based key storage for regulatory compliance
AWS Directory Service – Host and manage Active Directory
AWS Firewall Manager – Central management of firewall rules
AWS Key Management Service – Create and control encryption keys
AWS Organizations – Policy-based management of multiple AWS accounts
AWS Secrets Manager – Rotate, manage, and retrieve secrets
AWS Security Hub – Unified security and compliance center
AWS Shield – DDoS protection
AWS Single Sign-On – Cloud single sign-on (SSO)
AWS WAF – Filter malicious web traffic

Storage

Amazon Simple Storage Service (S3) – Scalable storage in the cloud
Amazon Elastic Block Store (EBS) – Block storage volumes for EC2
Amazon Elastic File System (EFS) – Fully managed file system for EC2
Amazon FSx for Lustre – Fully managed compute-intensive file system
Amazon FSx for Windows File Server – Fully managed native Windows file system
Amazon Glacier – Low-cost archive storage in the cloud
AWS Snow Family – Physical devices to migrate data into and out of AWS
AWS Storage Gateway – Hybrid storage integration

Azure

Analytics

• Azure Databricks – Fast, easy, and collaborative Apache Spark-based analytics platform
• Azure Stream Analytics – Real-time processing of data streams from millions of IoT devices
• SQL Data Warehouse – Elastic data warehouse as a service with enterprise-class features
• HDInsight – Provision cloud Hadoop, Spark, R Server, HBase, and Storm clusters
• Data Factory – Hybrid data integration at enterprise scale, made easy
• Data Lake Analytics – Distributed analytics service that makes big data easy
• Event Hubs – Receive telemetry from millions of devices
• Power BI Embedded – Embed fully interactive, stunning data visualizations in your applications
• Azure Analysis Services – Enterprise-grade analytics engine as a service
• R Server for HDInsight – Predictive analytics, machine learning, and statistical modeling for big data
• Data Catalog – Get more value from your enterprise data assets
• Azure Data Lake Storage – Secure, massively scalable Data Lake functionality built on Azure Blob Storage
• Azure Data Explorer – Fast and highly scalable data exploration service

Databases

• SQL Server on virtual machines – Host enterprise SQL Server apps in the cloud
• Azure SQL Database – Managed relational SQL database as a service
• Azure Cosmos DB – Globally distributed, multi-model database for any scale
• SQL Data Warehouse – Elastic data warehouse as a service with enterprise-class features
• Data Factory – Hybrid data integration at enterprise scale, made easy
• Azure Cache for Redis – Power applications with high-throughput, low-latency data access
• SQL Server Stretch Database – Dynamically stretch on-premises SQL Server databases to Azure
• Table Storage – NoSQL key-value storage using semi-structured datasets
• Azure Database for PostgreSQL – Managed PostgreSQL database service for app developers
• Azure Database for MariaDB – Managed MariaDB database service for app developers
• Azure Database for MySQL – Managed MySQL database service for app developers
• Azure Database Migration Service – Simplify on-premises database migration to the cloud

Compute

• Virtual Machines – Provision Windows and Linux virtual machines in seconds
• Virtual Machine Scale Sets – Manage and scale up to thousands of Windows and Linux virtual machines
• Azure Kubernetes Service (AKS) – Simplify the deployment, management, and operations of Kubernetes
• Azure Functions – Process events with serverless code
• Service Fabric – Develop microservices and orchestrate containers on Windows or Linux
• App Service – Quickly create powerful cloud apps for web and mobile
• Container Instances – Easily run containers on Azure without managing servers
• Batch – Cloud-scale job scheduling and compute management
• Azure Batch AI – Easily experiment with and train your deep learning and AI models in parallel, at scale
• SQL Server on virtual machines – Host enterprise SQL Server apps in the cloud
• Cloud Services – Create highly available, infinitely scalable cloud applications and APIs
• SAP HANA on Azure Large Instances – Run the largest SAP HANA workloads from a hyperscale cloud provider
• Web Apps – Quickly create and deploy mission-critical web apps
• Mobile Apps – Build and host the backend for any mobile app
• API Apps – Easily build and consume cloud APIs
• Linux Virtual Machines – Provision virtual machines for Ubuntu, Red Hat, and more
• Azure CycleCloud – Create, manage, operate, and optimize HPC and big compute clusters of any scale

Containers

• Azure Kubernetes Service (AKS) – Simplify the deployment, management, and operations of Kubernetes
• Azure Functions – Process events with serverless code
• Service Fabric – Develop microservices and orchestrate containers on Windows or Linux
• App Service – Quickly create powerful cloud apps for web and mobile
• Container Instances – Easily run containers on Azure without managing servers
• Container Registry – Store and manage container images across all types of Azure deployments
• Web Apps – Quickly create and deploy mission-critical web apps
• Mobile Apps – Build and host the backend for any mobile app
• API Apps – Easily build and consume cloud APIs
• Web App for Containers – Easily deploy and run containerized web apps that scale with your business

 DevOps

 Azure DevOps
Services for teams to share code, track work and ship software

 Azure Pipelines
Continuously build, test and deploy to any platform and cloud

 Azure Boards
Plan and track work across your teams and discuss it with them

 Azure Repos
Get unlimited, cloud-hosted private Git repos for your project

 Azure Artifacts
Create, host and share packages with your team

 Azure Test Plans
Test and ship with confidence using a toolkit for manual and exploratory testing

 Azure DevTest Labs
Quickly create environments using reusable templates and artifacts

 DevOps tool integrations
Use your favorite DevOps tools with Azure
 Media

 Content Delivery Network
Ensure secure, reliable content delivery with broad global reach

 Media Services
Encode, store and stream video and audio at scale

 Encoding
Studio-grade encoding at cloud scale

 Live and on-demand streaming
Deliver content to virtually all devices with scale that meets business needs

 Azure Media Player
A single player for all your playback needs

 Content Protection
Deliver content securely using AES, PlayReady, Widevine and Fairplay

 Media Analytics
Gain insights from video files with speech and vision services

 Video Indexer
Unlock insights from your videos

 Management and Governance

 Azure Backup
Simplify data protection and protect against ransomware

 Azure Site Recovery
Keep your business running with the built-in disaster recovery service

 Azure Advisor
Your personalized recommendation engine for Azure best practices

 Scheduler
Run your jobs on simple or complex recurring schedules

 Automation
Simplify cloud management with process automation

 Traffic Manager
Route incoming traffic for high performance and availability

 Azure Monitor
Full observability into your applications, infrastructure and network

 Network Watcher
Network performance monitoring and diagnostics solution

 Azure Service Health
Personalized guidance and support for when issues in Azure services affect you

 Microsoft Azure portal
Build, manage and monitor all Azure products in a single console

 Azure Resource Manager
Simplify how you manage your app resources

 Cloud Shell
Streamline Azure administration with a browser-based shell

 Azure mobile app
Stay connected to your Azure resources anytime, anywhere

 Azure Policy
Implement corporate governance and standards at scale for your Azure resources

 Cost Management
Optimize your cloud spend while maximizing cloud potential

 Azure Managed Applications
Simplify the management of cloud offerings

 Azure Migrate
Easily discover, assess, right-size and migrate your on-premises virtual machines to Azure

 Azure Blueprints
Enable quick, repeatable creation of governed environments

 AI + Machine Learning

 Azure Batch AI
Easily experiment with and train your deep learning and AI models in parallel and at scale

 Azure Bot Service
Intelligent, serverless bot service that scales on demand

 Azure Databricks
Fast, easy and collaborative Apache Spark-based platform

 Azure Search
Fully managed search as a service

 Bing Autosuggest
Give your apps intelligent autosuggest options for searches

 Bing Custom Search
An easy-to-use, ad-free, commercial-grade search tool that delivers the results you want

 Bing Entity Search
Enrich your experiences by identifying and augmenting information about entities from the web

 Bing Image Search
Search for images and get comprehensive results

 Bing News Search
Search for news and get comprehensive results

 Bing Spell Check
Detect and correct spelling mistakes in your app

 Bing Video Search
Search for videos and get comprehensive results

 Bing Visual Search
Get rich insights to create compelling image-based applications on the device of your choice.

 Bing Web Search
Get enhanced search details from billions of web documents

 Cognitive Services
Add smart API capabilities to enable contextual applications

 Computer Vision
Distill actionable information from images

 Content Moderator
Automated moderation of images, text and video

 Custom Speech
Overcome speech recognition barriers such as speaking style, background noise and vocabulary

 Custom Vision
Easily customize your own computer vision models to fit your use case

 Data Science Virtual Machines
Rich pre-configured environment for artificial intelligence development

 Emotion
Personalize user experiences with emotion recognition

 Face
Detect, identify, analyze and organize faces in your photos

 Azure Machine Learning service
Bring artificial intelligence to everyone with an end-to-end, scalable, trusted platform that includes
experimentation and model management.

 Machine Learning Studio
Easily build, deploy and manage predictive analytics solutions

 Microsoft Genomics
Power genomic sequencing and research insights

 Translator Speech
Easily perform real-time speech translation with a simple REST API call

 Language Understanding
Teach your apps to understand commands from your users

 Linguistic Analysis
Simplify complex language concepts and parse text with the Linguistic Analysis API

 QnA Maker
Distill information into conversational, easy-to-navigate answers

 Speaker Recognition
Identify and verify speakers by their voice

 Speech Translation
Easily integrate real-time speech translation into your app

 Speech Recognition
The Speech Recognition API is part of the Azure Cognitive Services speech services

 Text Analytics
Easily evaluate sentiment and topics to understand what customers want

 Text to Speech
Convert text to speech to create more natural, accessible interfaces

 Translator Text
Easily perform machine translation with a simple REST API call

 Video Indexer
Unlock insights from your videos

 Identity

 Azure Active Directory
Synchronize on-premises directories and enable single sign-on

 Azure Information Protection
Better protect your sensitive information, anytime, anywhere

 Azure Active Directory Domain Services
Join Azure virtual machines to a domain without domain controllers

 Azure Active Directory B2C
Consumer identity and access management in the cloud

 Integration

 Event Grid
Get reliable event delivery at massive scale

 Logic Apps
Automate the access and use of data across clouds without writing code

 API Management
Publish APIs to developers, partners and employees securely and at scale

 Service Bus
Connect across private and public cloud environments

 Internet of Things

 Azure Functions
Process events with serverless code

 Azure IoT Hub
Connect, monitor and manage billions of IoT assets

 Azure IoT Edge
Extend cloud intelligence and analytics to edge devices

 Azure IoT Central
Experience the simplicity of SaaS for IoT, with no cloud expertise required

 Azure IoT solution accelerators
Create fully customizable solutions with templates for common IoT scenarios

 Azure Sphere
Securely connect microcontroller (MCU)-powered devices from the silicon to the cloud

 Azure Time Series Insights
Explore and analyze time-series data from IoT devices

 Azure Maps
Simple and secure location APIs provide geospatial context to data

 Event Grid
Get reliable event delivery at massive scale

 Windows 10 IoT Core Services
Long-term OS support and services to manage device updates and assess device health

 Azure Machine Learning service
Bring artificial intelligence to everyone with an end-to-end, scalable, trusted platform that includes
experimentation and model management.

 Machine Learning Studio
Easily build, deploy and manage predictive analytics solutions

 Azure Stream Analytics
Real-time data stream processing from millions of IoT devices

 Logic Apps
Automate the access and use of data across clouds without writing code

 Notification Hubs
Send push notifications to any platform from any back end

 Azure Cosmos DB
Globally distributed, multi-model database for any scale

 API Management
Publish APIs to developers, partners and employees securely and at scale

 Azure Digital Twins
Build next-generation IoT spatial intelligence solutions

 Migration

 Azure Site Recovery
Keep your business running with the built-in disaster recovery service

 Cost Management
Optimize your cloud spend while maximizing cloud potential

 Azure Database Migration Service
Simplify on-premises database migration to the cloud

 Azure Migrate
Easily discover, assess, right-size and migrate your on-premises virtual machines to Azure

 Data Box
Secure, ruggedized appliance for Azure data transfer

 Networking

 Content Delivery Network
Ensure secure, reliable content delivery with broad global reach

 ExpressRoute
Dedicated private-network fiber connections to Azure

 Azure DNS
Host your DNS domain in Azure

 Virtual Network
Provision private networks, with optional connection to on-premises datacenters

 Traffic Manager
Route incoming traffic for high performance and availability

 Load Balancer
Deliver high availability and network performance to your applications

 VPN Gateway
Establish secure, cross-premises connectivity

 Application Gateway
Build secure, scalable, highly available web front ends in Azure

 Azure DDoS Protection
Protect your applications from distributed denial of service (DDoS) attacks

 Network Watcher
Network performance monitoring and diagnostics solution

 Azure Firewall
Native firewall capabilities with built-in high availability, unrestricted cloud scalability and zero
maintenance

 Virtual WAN
Optimize and automate branch-to-branch connectivity through Azure

 Azure Front Door Service
Scalable, security-enhanced delivery point for global, microservice-based web applications

 Mobile

 App Service
Quickly create powerful cloud apps for web and mobile

 Azure Maps
Simple and secure location APIs provide geospatial context to data

 Notification Hubs
Send push notifications to any platform from any back end

 Web Apps
Quickly create and deploy mission-critical web apps at scale

 Mobile Apps
Build and host the back end for any mobile app

 API Apps
Easily build and consume cloud APIs

 Azure mobile app
Stay connected to your Azure resources anytime, anywhere

 Visual Studio App Center
Continuously build, test, release and monitor your apps

 Xamarin
Create cloud-powered mobile apps faster

 Web App for Containers
Easily deploy and run containerized web apps that scale with your business

 Development tools

 Visual Studio
A powerful, flexible environment for developing applications in the cloud

 Visual Studio Code
A powerful, lightweight code editor for cloud development

 Software Development Kits (SDKs)
Get the Software Development Kits (SDKs) and command-line tools you need

 Azure DevOps
Services for teams to share code, track work and ship software

 CLI
Build, deploy, diagnose and manage multi-platform, scalable apps and services

 Azure Pipelines
Continuously build, test and deploy to any platform and cloud

 Azure Lab Services
Set up labs for classrooms, trials, development and testing, and other scenarios

 Azure DevTest Labs
Quickly create environments using reusable templates and artifacts

 Developer tool integrations
Use familiar development tools, including Eclipse, IntelliJ and Maven, with Azure

 Security

 Azure Active Directory
Synchronize on-premises directories and enable single sign-on

 Azure Information Protection
Better protect your sensitive information, anytime, anywhere

 Azure Active Directory Domain Services
Join Azure virtual machines to a domain without domain controllers

 Key Vault
Safeguard keys and other secrets and keep them under your control

 Security Center
Unify security management and enable advanced threat protection across hybrid cloud workloads

 Azure Dedicated Hardware Security Module (HSM)
Manage the hardware security modules that you use in the cloud

 VPN Gateway
Establish secure, cross-premises connectivity

 Application Gateway
Build secure, scalable, highly available web front ends in Azure

 Azure DDoS Protection
Protect your applications from distributed denial of service (DDoS) attacks

 Storage

 Storage
Durable, highly available and massively scalable cloud storage

 Azure Backup
Simplify data protection and protect against ransomware

 StorSimple
Lower costs with an enterprise hybrid cloud storage solution

 Azure Data Lake Storage
Massively scalable, secure data lake functionality built on Azure Blob Storage

 Blob Storage
REST-based object storage for unstructured data

 Disk Storage
Persistent, secured disk options supporting virtual machines

 Managed Disks
Persistent, secured disk storage for Azure virtual machines

 Queue Storage
Scale your applications according to traffic

 File Storage
File shares that use the standard SMB 3.0 protocol

 Data Box
Secure, ruggedized appliance for Azure data transfer

 Avere vFXT for Azure
Run high-performance, file-based workloads in the cloud

 Storage Explorer
View and interact with Azure storage resources

 Archive Storage
Industry-leading price point for storing rarely accessed data

 Azure NetApp Files
Powerful file shares backed by an enterprise file system, accessible over the Network File System
(NFS) protocol

 Web

 App Service
Quickly create powerful cloud apps for web and mobile

 Content Delivery Network
Ensure secure, reliable content delivery with broad global reach

 Azure Search
Fully managed search as a service

 Notification Hubs
Send push notifications to any platform from any back end

 API Management
Publish APIs to developers, partners and employees securely and at scale

 Web Apps
Quickly create and deploy mission-critical web apps at scale

 Mobile Apps
Build and host the back end for any mobile app

 API Apps
Easily build and consume cloud APIs

 Web App for Containers
Easily deploy and run containerized web apps that scale with your business

 Azure SignalR Service
Add real-time web functionality with ease

Bluemix

Databases

Data Store for memcached

Experimental
Data Store for memcached is an object-caching cloud service compatible with the memcached API.

mongodb
Experimental
Obsolete
This service is no longer available. Please search for Compose services in the main catalog instead.

mysql
Experimental
Obsolete
This service is no longer available. Please search for Compose services in the main catalog instead.

postgresql
Experimental
Obsolete
This service is no longer available. Please search for Compose services in the main catalog instead.

XPages NoSQL Database

Experimental
Create an IBM Notes NSF database to store your Domino XPages data.
Developer Tools

Automated Accessibility Tester

Experimental
Integrate automated accessibility auditing and reporting capabilities into your DevOps deployment
processes.

Digital Content Checker

Experimental
Automated accessibility verification of HTML and EPUB documents.

Security and Identity

IBM Identity Mixer

Experimental
IBM Identity Mixer service for Bluemix
Starter Kits

XPages Web Starter

Experimental

Create a sample implementation that demonstrates how to use an IBM XPages application connected
to an IBM XPages NoSQL Database service.

Web and Mobile

rabbitmq
Experimental
Obsolete
This service is no longer available. Please search for Compose services in the main catalog instead.

redis
Experimental
Obsolete
This service is no longer available. Please search for Compose services in the main catalog instead.
Web and Application

Cost and Asset Management
Experimental
Hybrid Cloud Cost and Asset Management service broker

Historical Instrument Analytics

Experimental
Leverage sophisticated IBM Algorithmics financial models to price and evaluate financial securities for
historical dates.

Instrument Analytics
Experimental
Leverage sophisticated IBM Algorithmics financial models to price and compute analytics on financial
securities.

Investment Portfolio
Experimental
Maintain a record of your investment portfolios through time.

Portfolio Optimization
Experimental
Construct or rebalance investment portfolios based on investor goals, mandates and preferences.

Predictive Market Scenarios

Experimental
Create conditional scenarios to model how, given a change to a subset of market factors, the broader
set of market factors is expected to change.

Real-Time Payments
Experimental
Manage participants, tokens and containers, and initiate and receive real-time payments.

Simulated Historical Instrument Analytics
Experimental
Leverage sophisticated IBM Algorithmics financial models to price and compute analytics on financial
securities for a historical date, under a given scenario.

Simulated Instrument Analytics

Experimental
Leverage sophisticated IBM Algorithmics financial models to price and compute analytics on financial
securities under a given scenario.

Google Cloud Platform

Compute

With Google Cloud's global compute infrastructure, build and scale faster than ever on one of the
largest and fastest private networks in the world.

Scalable, high-performance virtual machines

App Engine

PaaS for applications and back ends

Kubernetes Engine

Run containerized applications

GKE On-Prem ALPHA

Get your applications cloud-ready and migrate them at your own pace

Cloud Functions

Event-driven serverless compute platform

Knative

Components to create modern, Kubernetes-native cloud software

Shielded VMs BETA

Hardened virtual machines on GCP

Container Security

Secure your container environment on GCP

Kubernetes Engine

Smooth, on-demand scalability

Databases

Cloud SQL

Fully managed MySQL and PostgreSQL database service

Cloud Bigtable

Column-oriented NoSQL database service

Cloud Spanner

Mission-critical, scalable relational database service

Cloud Datastore

NoSQL document database service

Cloud Memorystore BETA

Fully managed in-memory data store service

Cloud Firestore BETA

Store mobile and web app data at global scale

Firebase Realtime Database

Store and sync data in real time

Cloud Spanner

Mission-critical, scalable relational database service

Management Tools

Stackdriver

Monitoring and management for services, containers, applications and infrastructure

Monitoring

Monitoring for applications on AWS and GCP

Service Monitoring EARLY ACCESS

Stackdriver Service Monitoring for Istio and Google App Engine

Logging

Logging for applications on AWS and GCP

Error Reporting

Identifies application errors and helps you analyze them

Trace

Find performance bottlenecks in production

Debugger

Investigate your code's behavior in production

Profiler

Low-impact CPU and memory profiling to reduce latency

Transparent Service Level Indicators

Monitor Google Cloud services and their effects on your workloads

Cloud Deployment Manager

Manage cloud resources with simple, easy-to-use templates

Cloud Console

GCP's integrated management console

Cloud Shell

Command-line management from any browser

Cloud Mobile App

Manage GCP services from your mobile device

Cloud Billing API

Automate billing management on GCP

Cloud APIs

Programmatic interfaces for all GCP services

API Platform and Ecosystems

Apigee API Platform

Develop, secure, deploy and monitor your APIs everywhere

Apigee Healthcare APIX

Accelerate building new digital services based on FHIR APIs

Apigee Open Banking APIX

Accelerate open banking and PSD2 compliance

Apigee Sense

Intelligent behavior detection to protect APIs from attacks

API Analytics

Insight into operational and business metrics for APIs

API Monetization

Flexible, easy-to-use solution to realize value from APIs

Cloud Endpoints

Develop, deploy and manage APIs on GCP

Developer Portal

Provide a turnkey, self-service platform to developers and API teams

Cloud Healthcare API

Actionable healthcare insights delivered through secure APIs

Storage

Cloud Storage

Object storage with global edge caching

Persistent Disk

Block storage for VM instances

Cloud Storage for Firebase

Store and serve content with ease

Cloud Filestore BETA

High-performance file storage

Migration

Data Transfer

Command-line tools for developers to transfer data over the network

Transfer Appliance

Rackable storage server for shipping large volumes of data to Google Cloud

Cloud Storage Transfer Service

Transfer data between cloud storage services, such as AWS S3 and Google Cloud Storage

BigQuery Data Transfer Service

Fully managed data import for Google BigQuery

Networking

Virtual Private Cloud (VPC)

VPC networking for your GCP resources

Cloud Load Balancing

Scalable, high-performance load balancing

Cloud Armor

Protect your services against DoS and web attacks

Cloud CDN

Content delivery on Google's global network

Cloud Interconnect

Connect directly to GCP's network edge

Cloud DNS

Reliable, resilient, low-latency DNS

Network Service Tiers

Optimize your network for performance or cost

Network Telemetry

In-depth network telemetry to keep your services secure

Developer Tools

Cloud SDK

CLI for GCP products and services

Container Registry

Store, manage and secure your Docker container images

Cloud Build

Continuously build, test and deploy

Cloud Source Repositories

A single place for your team to store, manage and track code

Cloud Tools for IntelliJ

Debug cloud applications running in production from IntelliJ

Cloud Tools for PowerShell

Full cloud control from Windows PowerShell

Cloud Tools for Visual Studio

Deploy Visual Studio applications to GCP

Cloud Tools for Eclipse

Deploy Eclipse projects to GCP

App Engine Gradle Plugin

Use Gradle with your App Engine projects

App Engine Maven Plugin

Use Maven with your App Engine projects

Cloud Test Lab

On-demand test infrastructure for Android apps

Firebase Crashlytics

Prioritize and fix stability issues faster

Stackdriver Kubernetes Monitoring

Aggregate metrics, logs, events and metadata from Kubernetes and Prometheus

Internet of Things

Cloud IoT Core

Securely connect and manage your devices

Edge TPU EARLY ACCESS

Purpose-built ASIC designed to run inference at the edge

Cloud IoT Edge ALPHA

Bring Google AI capabilities to the edge

Media Solutions

Anvato

Stream live and on-demand video to any device

Zync Render

Render directly from your 3D modeling tools, quickly and inexpensively

CLOUD-COMPUTING-RELATED PRODUCTS

Apigee

API control and visibility across your enterprise and multiple clouds

Firebase

Build better mobile apps, improve app quality and grow your business

Data Analytics and Machine Learning

Data Analytics

BigQuery

Fully managed, highly scalable data warehouse with built-in ML

Cloud Dataflow

Real-time batch and stream data processing

Cloud Dataproc

Managed Spark and Hadoop service

Cloud Datalab

Explore, analyze and visualize large datasets

Cloud Dataprep

Cloud data service to explore, clean and prepare data for analysis

Cloud Pub/Sub

Ingest event streams from anywhere, at any scale

Cloud Composer

A fully managed workflow orchestration service built on Apache Airflow

Genomics

Power your research with Google Genomics

Google Analytics 360 Suite *

Enterprise analytics for optimized customer experiences

Google Data Studio * BETA

Make better strategic decisions with advanced data visualization

Firebase Performance Monitoring

Gain insight into your app's performance

BigQuery

Fully managed, highly scalable data warehouse with built-in ML

AI and Machine Learning

Cloud AutoML BETA

Easily train high-quality, custom ML models

Cloud TPU

Train and run ML models faster than ever

Cloud Machine Learning Engine

Build high-quality models and deploy them into production

Cloud Talent Solution

Put AI to work on your hiring needs

Dialogflow Enterprise Edition

Build conversational interfaces across devices and platforms

Cloud Natural Language

Derive insights from unstructured text

Cloud Speech-to-Text

Speech-to-text transcription powered by machine learning

Cloud Text-to-Speech

Text-to-speech conversion powered by machine learning

Cloud Translation

Instantly translate text from one language to another

Cloud Vision

Derive insights from images through machine learning

Cloud Video Intelligence

Extract metadata from videos

Firebase Predictions BETA

Define dynamic user groups based on predicted behavior

Deep Learning VM Image BETA

Preconfigured VMs for deep learning applications

Cloud AutoML BETA

Train high-quality custom ML models with minimal effort, regardless of your level of expertise

Identity and Security

Protect your users' identities and help meet your policy, regulatory and business objectives with
Google Cloud security solutions.

Cloud Identity

Easily manage user identities, devices and applications from a single console

Security

Cloud IAM

Fine-grained identity and access management

Firebase Authentication

Simple, free, multi-platform sign-in

Cloud Identity-Aware Proxy

Use identity and context to secure access to applications deployed on GCP

Cloud Data Loss Prevention API

Discover and redact sensitive data

Security Key Enforcement

Enforce the use of security keys to help prevent phishing

Titan Security Key

Defend against account takeovers from phishing attacks

Cloud HSM ALPHA

Protect cryptographic keys with a fully managed hardware security module service

VPC Service Controls ALPHA

Define security perimeters for sensitive data in Google Cloud Platform services

Cloud Key Management Service

Manage encryption keys on GCP

Resource Manager

Hierarchically manage GCP resources

Cloud Security Command Center ALPHA

Comprehensive security and data-risk management platform for GCP

Cloud Security Scanner

Automatically scan your App Engine applications

Access Transparency

Gain visibility into your cloud provider through near real-time logs

Binary Authorization ALPHA

Deploy only trusted containers on Kubernetes Engine

Collaboration and Productivity

Reimagine the way you work. G Suite's tools for teams let them collaborate, create and iterate
together more quickly.

G Suite

An integrated suite of secure, cloud-native collaboration and productivity apps powered by
Google AI

Communication

Gmail

Secure, intelligent email for modern businesses

Calendar

Online calendars designed for teamwork

Hangouts Chat

Secure team messaging

Hangouts Meet

Video conferencing made simple

Google+

Secure enterprise social network

Control

Admin

Enterprise management for G Suite

Vault

Archiving and eDiscovery for email, files and chats

Mobile Device Management

Mobile device management for Android, iOS, Windows and more

Creation

Docs

Collaborative real-time editing

Sheets

Fast, advanced online spreadsheets

Slides

Collaborative creation of effective presentations

Forms

Easy creation of surveys and forms

Sites

Easy-to-create team sites

Keep

Capture ideas and stay organized

Access

Drive

Secure cloud storage and file sharing

Cloud Search

Powerful search across G Suite

PRODUCTIVITY-RELATED PRODUCTS

Hire by Google

Recruit faster. Hire is a collaborative recruiting application that integrates seamlessly with G Suite

Google Maps Platform

Create immersive location experiences and grow your business with comprehensive, real-time
data

Maps

Bring the world to your users with customized maps and Street View

Routes

Help users get to their destination with comprehensive data and real-time traffic information

Places

Help users discover the world with detailed information on more than 100 million points of interest

Ride Sharing

Embed Google Maps in your ridesharing or taxi app to provide reliable, real-time routes

Games

Create immersive, true-to-life games anywhere in the world

Asset Tracking

Take advantage of accurate, real-time global positioning data for your fleet, your employees and
your devices

Games

Create immersive, true-to-life games with up-to-date global data

Browser, Hardware and OS

Purpose-built for mobile and international workforces, Google Cloud's browser, meeting hardware,
devices and OS help your team stay connected.

Chrome Enterprise

Easily manage Chromebooks with Chrome OS and the Chrome Browser

Android Enterprise

Securely deploy smart devices, operating systems and business applications

Jamboard

A collaborative digital whiteboard to bring your ideas to life

Hangouts Meet hardware

A fast, efficient video conferencing system for your meeting rooms

What is Amazon Athena: a complete overview
Amazon Athena is probably the most promising of the services announced last week in
Las Vegas. In fact, big data was one of the main topics discussed at re:Invent 2016, together
with AI and IoT. We gathered a lot of information on Athena at the special session led by
Rahul Pathak, general manager of Amazon EMR at AWS. In this post, I will cover Athena’s
main features, use cases, and pricing details.

What is Amazon Athena? It is an interactive query service that makes it easy to directly
analyze data on Amazon S3 using standard SQL. It means that you can store structured data
on S3 and query that data as you’d do with an SQL database. Athena is serverless,
meaning that there is no infrastructure to manage, no setup, servers, or data warehouses.
The power of S3 storage is fully unleashed by the new Athena query engine without the need
for maintenance. No infrastructure or administration is required: You can just create a table,
load some data, and start querying.

As mentioned during the session, Athena complements Amazon Redshift and Amazon
EMR.

Amazon Athena Features


Athena is backed by Presto, an open source distributed SQL query engine that allows
you to run interactive analytic queries against data sources of all sizes, ranging from
gigabytes to petabytes. Create Table statements, or DDL (Data Definition Language), are written in
Apache Hive syntax, which is meant to facilitate reading, writing, and managing large and distributed
datasets. Hive supports SQL, but also allows concepts such as external tables and data
partitioning. Your metadata—such as table definitions, column names, etc.—is stored in the
Athena metadata store.
As with any standard DBMS (Database Management System), Athena supports complex
joins, nested queries, and window functions. Complex data types, such as arrays and
structs, are also supported. Partitioning is easy to achieve by any key, including custom date and
time keys. Of course, you can connect to Athena with your favourite SQL client.
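As a sketch of what such Hive-style DDL looks like, the helper below builds a partitioned CREATE EXTERNAL TABLE statement of the kind Athena accepts. The table, column, and bucket names are hypothetical; in practice you would paste the generated statement into the Athena console or submit it through the API.

```python
# Build a Hive-style DDL statement for a partitioned Athena table.
# All identifiers (table, columns, bucket) are illustrative, not real resources.
def athena_create_table_ddl(table: str, columns: dict, s3_location: str) -> str:
    cols = ",\n  ".join(f"{name} {ctype}" for name, ctype in columns.items())
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        "PARTITIONED BY (year string, month string)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'"
    )

ddl = athena_create_table_ddl(
    "web_logs",
    {"request_time": "string", "uri": "string", "status": "int"},
    "s3://example-log-bucket/logs/",
)
print(ddl)
```

Partitioning by keys such as year and month lets Athena skip S3 objects outside the requested partitions, which reduces both query time and the amount of data scanned.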
You can store data in the form of objects with several file formats:

Text files, CSV, raw logs

Apache web logs

JSON

Compressed files

Columnar formats, such as Apache Parquet or Apache ORC

Eventually, you may want to use Hive CTAS statements or Spark to convert your data to the ORC or
Parquet format.

As soon as you perform a query, you will obtain a data stream directly from Amazon S3, just
as if you were querying a real SQL database. Queries can be executed both through the APIs
and from the AWS Console. By using the AWS Console, you will also get the query running time
and the amount of data scanned, in bytes.

With Amazon Athena, you won’t have to worry about scaling, performance, and
maintenance. You will have enough compute resources to get fast, interactive query
performance. Athena will automatically execute queries in parallel over petabytes of data.
Therefore, most results will come back within seconds. This is made possible because Athena
uses warm compute pools across multiple Availability Zones.

As Rahul Pathak pointed out, Amazon Athena is really fast:

Athena is tuned for performance.

Queries are automatically parallelized.

You can get results streamed directly from the console.

You can store query results in Amazon S3.

In my personal opinion, performance is still an open concern, as no benchmarks for big
datasets have been publicly released, although we got very interesting performance
results during the session. The presenter used the Apache Parquet format and, with just
20 lines of PySpark code running on EMR, converted 1 TB of textual data into 130 GB
of Apache Parquet data. This approach also optimized space occupation and query time,
resulting in much lower costs.
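Sticking with the session's numbers, a quick back-of-the-envelope calculation (using the $5-per-TB price discussed in the pricing section below) shows why the conversion pays off:

```python
# Scan-cost comparison at $5 per TB of data scanned, using the
# session's 1 TB -> 130 GB Parquet conversion figure.
PRICE_PER_TB = 5.00

text_cost = 1.0 * PRICE_PER_TB              # 1 TB of raw text
parquet_cost = (130 / 1024) * PRICE_PER_TB  # 130 GB expressed in TB

print(f"text scan:    ${text_cost:.2f}")     # $5.00
print(f"parquet scan: ${parquet_cost:.2f}")  # $0.63
```

On top of that, a columnar format lets Athena skip the columns a query does not touch, so real queries often scan far less than the full 130 GB.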

Finally, the built-in integration with Amazon QuickSight allows you to visualize your data.

Amazon Athena Use Cases


During the session, Rahul Pathak presented two common use cases where Athena could be
a game changer:

Log storage and analysis

Data warehouse for events

In such scenarios, the need to store gigabytes or petabytes of structured data can be a
real problem. Accessing that data in a fast, easy, and secure way is even more difficult,
painful, and time-consuming. Athena is focused on solving these problems by combining
the power of Amazon S3 storage with the SQL query language. This allows you to operate on
your data easily and without worrying about scaling. Indeed, you will get results within
seconds, even on very large datasets.

What is Amazon Athena: pricing


Athena’s pricing is very simple: You pay only for the queries you run and you will be charged
$5 per TB of scanned data from Amazon S3.

DDL statements (CREATE, ALTER, DROP), partitioning queries, and failed queries are
completely free. If you cancel a query, you will be charged only for the scanned data up to
that point. Of course, you can reduce costs by using compression, columnar formats,
and partitions. With these techniques, Athena will have to scan less data from Amazon S3.

In practice, there is no charge directly related to computation itself, so you can always
estimate the total cost purely based on the amount of data that you need to work with.
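These rules are simple enough to sketch as a small estimator; the function below is an illustration of the billing logic, not an official calculator:

```python
# Illustrative Athena cost estimator: $5 per TB scanned; DDL statements
# and failed queries are free; a cancelled query pays for the data
# scanned before cancellation (pass that amount as tb_scanned).
PRICE_PER_TB = 5.00

def athena_query_cost(tb_scanned, is_ddl=False, failed=False):
    """Estimated charge in USD for a single query."""
    if is_ddl or failed:
        return 0.0
    return tb_scanned * PRICE_PER_TB

print(athena_query_cost(2.5))               # 12.5
print(athena_query_cost(1.0, is_ddl=True))  # 0.0
```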

What is Amazon CloudSearch?


Amazon CloudSearch is a simple, scalable, and reliable managed search service that operates within
the Amazon Web Services cloud. With this service, users can create a search domain, configure how it
indexes searchable data and processes search requests from websites and applications, and have it
automatically scale up or down based on the amount of data being indexed and the amount of search
traffic being handled.
Setting up a search domain using Amazon CloudSearch is a breeze. When users create a domain, the
service also provisions all the resources it needs to operate well, including data, software, and
hardware. They can also configure the domain easily using other Amazon Web Services tools such as the
AWS Management Console, the AWS CLI, and the AWS SDKs. As they configure the search domain, they can
customize indexing options, define text processing options, and specify the number of search instance
replicas and search index partitions that the search engine will use as a basis when it scales the
domain.
In addition, Amazon CloudSearch has a Multi-Availability Zone option. This feature allows users to
deploy their search domain in a second location within the same region, so that when a service
disruption happens in the domain's original location, search requests can still be processed. When the
Multi-AZ option is enabled, the domain spans two Availability Zones, and the required resources,
updates, and instances are provided to the domain in both zones.

Overview of Amazon CloudSearch Benefits


Deliver A Low Latency And High Throughput Search Performance
Amazon CloudSearch is designed to help users improve the search capabilities of their applications and
services. To do this, the managed search service allows them to set up a search domain or solution that
can deliver a search performance with low latency and can handle high throughput.
Set Up And Automatically Provision A Search Domain
The managed search service ensures that any search domain users set up is provisioned with the
resources it needs, such as the data the domain will index and make searchable and the software and
hardware components that make the domain function properly. With Amazon CloudSearch, users no longer
have to worry about provisioning their search domain: once they upload their data, the service
automatically provisions the required resources, taking into account whatever configurations they
defined for the domain.
Handle Indexing And Search Request Processing Using Search Instances
A search domain also uses search instances to index data and process search requests. These are server
instances with a fixed amount of RAM and CPU resources; a domain can have one or more of them,
depending on how much data or how many documents it needs to index and how much search traffic it has
to handle. Search instances are therefore classified by their size, i.e., their capacity to index data
and process search requests.
Scale Up Your Domain As Your Search Index Grows
Search instances are very important because Amazon CloudSearch relies on them every time it
scales a search domain up or down. In fact, this is one of the most powerful capabilities of the
service: Amazon CloudSearch can automatically scale a search domain up or down based on the amount of
data in the search index. If users upload more data, the search index grows, and the service scales
the domain up by switching to a search instance type large enough to accommodate the data and its
indexing. If the index keeps growing and the amount of data exceeds the capacity of the current
search instance, the service again scales the domain up to a larger instance type.
Replicate The Search Instances And Partition The Search Index
But what if the largest available search instance can no longer handle the amount of data in the search
index? Users don’t have to worry about that. Amazon CloudSearch has features called replication and
partitioning. It can replicate search instances when the largest instance type can no longer process
the amount of data in the search index. This feature works in conjunction with partitioning, the
process of splitting the search index into separate pieces, or partitions. The higher the number of
partitions, the higher the number of search instances that need to be replicated and deployed.
Conversely, when the amount of data in the search index shrinks, Amazon CloudSearch scales the domain
down, either by using a smaller search instance type or by reducing the number of partitions.
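The scale-up logic described above can be sketched as follows. The per-instance capacities are invented for the example, as CloudSearch does not publish its real thresholds:

```python
import math

# Illustrative sketch of the scale-up behaviour described above.
# The capacity assigned to each instance type is invented for the
# example; CloudSearch does not publish its real thresholds.
INSTANCE_CAPACITY_GB = {
    "search.m3.medium": 4,
    "search.m3.large": 8,
    "search.m3.xlarge": 16,
    "search.m3.2xlarge": 32,
}

def plan_domain(index_gb):
    """Pick an instance type; partition once the largest type is exceeded."""
    for itype, cap in INSTANCE_CAPACITY_GB.items():
        if index_gb <= cap:
            return itype, 1          # one instance is enough
    largest = "search.m3.2xlarge"
    partitions = math.ceil(index_gb / INSTANCE_CAPACITY_GB[largest])
    return largest, partitions       # split the index across partitions

print(plan_domain(6))    # ('search.m3.large', 1)
print(plan_domain(100))  # ('search.m3.2xlarge', 4)
```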
Manage Search Traffic
In addition, Amazon CloudSearch also scales a search domain up or down based on the amount of
search traffic. If a large volume of search requests increases the traffic, the service automatically
replicates the search instances that process those requests and, when necessary, partitions the search
index. Suppose a domain starts with one search instance and one index partition: when that instance
can no longer handle the traffic, additional replicas (and, if needed, partitions) are added,
increasing the number of instances in the domain.
Search Domain Configuration Made Easy
Amazon CloudSearch’s architecture is comprised of three services, namely configuration service,
document service, and search service. These services define how users can interact with Amazon
CloudSearch. The configuration service permits them to configure their search domain, and one way of
doing that is by defining how data is indexed. They can set up different indexing options that
enable them to map their data and indicate what data can be searched and retrieved from the search
index.
Configure Scaling Options
The configuration service also allows users to customize the scaling of their domain. Here, they can
prescale the search domain by specifying the type of search instance to use, as well as the desired
number of instance replicas and index partitions, whenever they are importing a large volume of data
or expecting a spike in search traffic.
Control The Ranking Of Search Results
With Amazon CloudSearch, the ranking of search results can be controlled. Through the aid of
numerical expressions, users will be able to define factors and even combine them, and associate scores
with them to ensure that the most relevant documents and data are ranked higher in the search results.
For instance, they can set up a numerical expression for calculating rank scores based on how often a
term within a document is being searched and how popular the document is.
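A toy version of such a numeric expression might look like the following; the formula and weights are invented for illustration:

```python
import math

# A toy version of a numeric ranking expression: combine term frequency
# and document popularity into a single score. The formula and the
# weights are invented for illustration.
def rank_score(term_frequency, popularity):
    return popularity * math.log1p(term_frequency)

docs = {"doc-a": rank_score(3, 10), "doc-b": rank_score(1, 50)}
best = max(docs, key=docs.get)
print(best)  # doc-b ranks higher despite the lower term frequency
```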
Document Service
Meanwhile, if users want to modify the data in their search index, they can take advantage of the
document service. This service allows them to import data into their domain. Each piece of data sent
to the domain is stored and represented as a document with a unique ID and index fields. The index
fields organize all the specific data that will be indexed and made accessible from the search results.
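A document batch of the kind described above is plain JSON: a list of add and delete operations, each carrying an ID and, for adds, the index fields. The field names below are hypothetical:

```python
import json

# Sketch of a document batch of the kind the document service accepts:
# a JSON array of add/delete operations, each with a unique ID and,
# for adds, a set of index fields. The field names here are made up.
batch = [
    {"type": "add", "id": "movie-1",
     "fields": {"title": "The Example", "year": 2016, "genres": ["drama"]}},
    {"type": "delete", "id": "movie-2"},   # remove a document by its ID
]

payload = json.dumps(batch).encode("utf-8")
print(len(payload) < 5 * 1024 * 1024)  # each uploaded batch is capped at 5 MB
```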
Innovative Ways To Process Search Queries
Amazon CloudSearch’s architecture makes it possible to process different types and forms of search
queries and return search results in various ways. It can drill down into specific data within index
fields, generate facet information, support complex Boolean searches, and parse search queries.

Overview of Amazon CloudSearch Features
 Create a Search Domain
 Automated Provisioning and Maintenance of Search Domain
 Search Instances
 Scale Based on Search Index Data and Search Traffic
 Replication of Search Instances
 Partitioning of Search Index
 Multi-Availability Zone Option
 Configuration Service
 Indexing Options
 Index Fields
 Text Analysis Schemes
 Availability Options
 Scaling Options
 Suggesters
 Rank Search Results through Numeric Expressions
 Document Service
 Change Searchable Data
 Search Service
 Unique Search HTTP Endpoint for every Domain
 Rich Query Language
 Search Features
 Free Text, Boolean, and Faceted Search
 Field Weighting
 Geospatial Search
 Support for 34 Languages


How Much Does Amazon CloudSearch Cost?


Amazon CloudSearch Pricing Plans:
Free trial
With no set-up fees or upfront commitments, Amazon CloudSearch doesn’t offer an enterprise pricing
plan. Instead, it uses a pay-as-you-go model in which you pay only for: your search instance usage,
calculated on an hourly basis; the total number of document batches you upload to the search domain;
the amount of data stored in the search domain when you explicitly make IndexDocuments requests; and
the amount of data you transfer in and out of Amazon CloudSearch.
All charges are billed on a monthly basis and vary depending on your region. Here are the details:
Search Instances
US East (N. Virginia)
 search.m1.small – $0.059 per hour
 search.m3.medium – $0.094 per hour
 search.m3.large – $0.188 per hour
 search.m3.xlarge – $0.376 per hour
 search.m3.2xlarge – $0.752 per hour
US West (Northern California)
 search.m1.small – $0.063 per hour
 search.m3.medium – $0.104 per hour
 search.m3.large – $0.208 per hour
 search.m3.xlarge – $0.416 per hour
 search.m3.2xlarge – $0.832 per hour
US West (Oregon)
 search.m1.small – $0.059 per hour
 search.m3.medium – $0.094 per hour
 search.m3.large – $0.188 per hour
 search.m3.xlarge – $0.376 per hour

 search.m3.2xlarge – $0.752 per hour
Asia Pacific (Seoul)
 search.m4.large – $0.255 per hour
 search.m4.xlarge – $0.511 per hour
 search.m4.2xlarge – $1.023 per hour
Asia Pacific (Singapore)
 search.m1.small – $0.078 per hour
 search.m3.medium – $0.132 per hour
 search.m3.large – $0.264 per hour
 search.m3.xlarge – $0.528 per hour
 search.m3.2xlarge – $1.056 per hour
Asia Pacific (Sydney)
 search.m1.small – $0.078 per hour
 search.m3.medium – $0.132 per hour
 search.m3.large – $0.264 per hour
 search.m3.xlarge – $0.528 per hour
 search.m3.2xlarge – $1.056 per hour
Asia Pacific (Tokyo)
 search.m1.small – $0.082 per hour
 search.m3.medium – $0.136 per hour
 search.m3.large – $0.272 per hour
 search.m3.xlarge – $0.544 per hour
 search.m3.2xlarge – $1.088 per hour
EU (Frankfurt)
 search.m3.medium – $0.112 per hour
 search.m3.large – $0.224 per hour
 search.m3.xlarge – $0.448 per hour
 search.m3.2xlarge – $0.896 per hour

EU (Ireland)
 search.m1.small – $0.063 per hour
 search.m3.medium – $0.104 per hour
 search.m3.large – $0.208 per hour
 search.m3.xlarge – $0.416 per hour
 search.m3.2xlarge – $0.832 per hour
South America (São Paulo)
 search.m1.small – $0.078 per hour
 search.m3.medium – $0.128 per hour
 search.m3.large – $0.256 per hour
 search.m3.xlarge – $0.512 per hour
 search.m3.2xlarge – $1.024 per hour
Pricing is per instance-hour consumed for each search instance, from the time the instance is launched
until it is terminated. Each partial instance-hour consumed is billed as a full hour.
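This rounding rule is easy to express in code; a minimal sketch:

```python
import math

# Billing rule from above: instance-hours are metered from launch to
# termination, and each partial hour is rounded up to a full hour.
def instance_hours_billed(runtime_hours):
    return math.ceil(runtime_hours)

rate = 0.094  # search.m3.medium in US East, per the table above
hours = instance_hours_billed(10.2)   # 10.2 hours run -> 11 hours billed
print(hours, round(hours * rate, 3))  # 11 1.034
```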
When you enable the Multi-AZ option for enhanced data durability and availability, Amazon
CloudSearch provisions and maintains additional search instances in a different Availability Zone.
Search traffic is distributed across all of the instances and the instances in either zone are capable of
handling the full load in the event of a service disruption. When you enable the Multi-AZ option, you
are charged for the additional search instance hours used at the regular rates for the applicable region.
Previous Generation Search Instances
Below are the prices for previous generation instances, which are only available for existing search
domains. All newly created domains will be provisioned with the higher performance newer generation
instance options.
US East (N. Virginia)
 search.m1.large – $0.236 per hour
 search.m2.xlarge – $0.306 per hour
 search.m2.2xlarge – $0.613 per hour
US West (Northern California)
 search.m1.large – $0.257 per hour
 search.m2.xlarge – $0.344 per hour
 search.m2.2xlarge – $0.688 per hour

US West (Oregon)
 search.m1.large – $0.236 per hour
 search.m2.xlarge – $0.306 per hour
 search.m2.2xlarge – $0.613 per hour
Asia Pacific (Singapore)
 search.m1.large – $0.315 per hour
 search.m2.xlarge – $0.370 per hour
 search.m2.2xlarge – $0.740 per hour
Asia Pacific (Sydney)
 search.m1.large – $0.315 per hour
 search.m2.xlarge – $0.370 per hour
 search.m2.2xlarge – $0.740 per hour
Asia Pacific (Tokyo)
 search.m1.large – $0.328 per hour
 search.m2.xlarge – $0.359 per hour
 search.m2.2xlarge – $0.719 per hour
EU (Ireland)
 search.m1.large – $0.257 per hour
 search.m2.xlarge – $0.344 per hour
 search.m2.2xlarge – $0.688 per hour
South America (São Paulo)
 search.m1.large – $0.315 per hour
 search.m2.xlarge – $0.404 per hour
 search.m2.2xlarge – $0.806 per hour

Batch Uploads
You are billed for the total number of document batches uploaded to your search domain. Uploaded
documents are automatically indexed.
 $0.10 per 1,000 Batch Upload Requests (the maximum size for each batch is 5 MB)
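Since each batch is capped at 5 MB, the total upload size sets a floor on the number of batches and thus on the cost. A small illustrative estimator:

```python
import math

# Batch uploads cost $0.10 per 1,000 batch requests, and each batch is
# capped at 5 MB, so the total upload size sets a floor on the count.
BATCH_LIMIT_MB = 5
PRICE_PER_1000_BATCHES = 0.10

def upload_cost(total_mb):
    batches = math.ceil(total_mb / BATCH_LIMIT_MB)
    return batches / 1000 * PRICE_PER_1000_BATCHES

print(round(upload_cost(50_000), 2))  # 1.0  (10,000 batches)
```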
IndexDocuments Requests
When you make configuration changes to your index, for example by adding a field, you will need to
rebuild the index. To do this, you use the AWS Management Console, command line tools, AWS SDKs,
or APIs to issue an IndexDocuments request. The charge for this request is:
 $0.98 per GB of data stored in your search domain
Amazon CloudSearch may occasionally issue these calls for you. For example, as you add data to your
domain, Amazon CloudSearch may proactively rebuild your index to improve query performance. You
will not be charged in this case, or in any other case where you do not explicitly call IndexDocuments.
Data Transfer
The pricing below is based on data transferred “in” and “out” of Amazon CloudSearch.
US East (N. Virginia)
Data Transfer In
 All Data Transfer In – $0.000 per GB
Data Transfer Out
 First 10 TB / month – $0.090 per GB
 Next 40 TB / month – $0.085 per GB
 Next 100 TB / month – $0.070 per GB
US West (Northern California)
Data Transfer In
 All Data Transfer In – $0.000 per GB
Data Transfer Out
 First 10 TB / month – $0.090 per GB
 Next 40 TB / month – $0.085 per GB
 Next 100 TB / month – $0.070 per GB
US West (Oregon)
Data Transfer In

 All Data Transfer In – $0.000 per GB
Data Transfer Out
 First 10 TB / month – $0.090 per GB
 Next 40 TB / month – $0.085 per GB
 Next 100 TB / month – $0.070 per GB
Asia Pacific (Seoul)
Data Transfer In
 All Data Transfer In – $0.000 per GB
Data Transfer Out
 First 10 TB / month – $0.126 per GB
 Next 40 TB / month – $0.122 per GB
 Next 100 TB / month – $0.117 per GB
Asia Pacific (Singapore)
Data Transfer In
 All Data Transfer In – $0.000 per GB
Data Transfer Out
 First 10 TB / month – $0.120 per GB
 Next 40 TB / month – $0.085 per GB
 Next 100 TB / month – $0.082 per GB
Asia Pacific (Sydney)
Data Transfer In
 All Data Transfer In – $0.000 per GB
Data Transfer Out
 First 10 TB / month – $0.140 per GB
 Next 40 TB / month – $0.135 per GB
 Next 100 TB / month – $0.130 per GB
Asia Pacific (Tokyo)
Data Transfer In

 All Data Transfer In – $0.000 per GB
Data Transfer Out
 First 10 TB / month – $0.140 per GB
 Next 40 TB / month – $0.135 per GB
 Next 100 TB / month – $0.130 per GB
EU (Frankfurt)
Data Transfer In
 All Data Transfer In – $0.000 per GB
Data Transfer Out
 First 10 TB / month – $0.090 per GB
 Next 40 TB / month – $0.085 per GB
 Next 100 TB / month – $0.070 per GB
EU (Ireland)
Data Transfer In
 All Data Transfer In – $0.000 per GB
Data Transfer Out
 First 10 TB / month – $0.090 per GB
 Next 40 TB / month – $0.085 per GB
 Next 100 TB / month – $0.070 per GB
South America (São Paulo)
Data Transfer In
 All Data Transfer In – $0.000 per GB
Data Transfer Out
 First 10 TB / month – $0.250 per GB
 Next 40 TB / month – $0.230 per GB
 Next 100 TB / month – $0.210 per GB
Data transferred between Amazon CloudSearch and AWS services in the same region is free.
Data transferred between Amazon CloudSearch and AWS services in different regions will be charged
as Internet Data Transfer on both sides of the transfer.
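The tiered data-transfer-out rates above can be applied with a simple running-total calculation; here is an illustrative sketch using the US East (N. Virginia) tiers:

```python
# Worked example of the tiered data-transfer-out pricing, using the
# US East (N. Virginia) tiers above: $0.090/GB for the first 10 TB,
# $0.085/GB for the next 40 TB, $0.070/GB for the next 100 TB.
TIERS = [(10 * 1024, 0.090), (40 * 1024, 0.085), (100 * 1024, 0.070)]

def transfer_out_cost(gb):
    cost, remaining = 0.0, gb
    for tier_gb, price_per_gb in TIERS:
        used = min(remaining, tier_gb)   # fill this tier first
        cost += used * price_per_gb
        remaining -= used
        if remaining <= 0:
            break
    return cost

# 15 TB out: 10 TB at $0.090 plus 5 TB at $0.085
print(round(transfer_out_cost(15 * 1024), 2))  # 1356.8
```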

For traffic sent between Amazon CloudSearch and Amazon EC2 instances in the same region, you are
only charged for the Data Transfer in and out of the Amazon EC2 instances, and standard Amazon EC2
Regional Data Transfer charges apply. For additional information, please visit the official website of
AWS and check the EC2 pricing.
You can always see the resources you’re consuming in Amazon CloudSearch via the Account Activity
page on the AWS website, the AWS Management Console, CloudSearch command line tools, or
CloudSearch APIs.

Amazon EMR


Amazon Elastic MapReduce (EMR) is an Amazon Web Services (AWS) tool for big
data processing and analysis. Amazon EMR offers an expandable, low-configuration
service as an easier alternative to running in-house cluster
computing.

Amazon EMR is based on Apache Hadoop, a Java-based programming framework that supports
the processing of large data sets in a distributed computing environment. MapReduce is a
software framework that allows developers to write programs that process massive amounts of
unstructured data in parallel across a distributed cluster of processors or stand-alone computers.
It was developed at Google for indexing web pages and replaced their original indexing algorithms
and heuristics in 2004.
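The map/reduce pattern itself can be sketched in a few lines. This is a deliberately simplified, single-process toy illustrating the idea, not how EMR actually distributes work:

```python
from collections import defaultdict

# Map each record to (key, value) pairs, group by key, reduce each group.
def map_phase(line):
    return [(word, 1) for word in line.split()]   # one pair per word

def reduce_phase(key, values):
    return key, sum(values)                       # total count per word

records = ["big data on aws", "big data tools"]
groups = defaultdict(list)
for line in records:
    for key, value in map_phase(line):
        groups[key].append(value)

counts = dict(reduce_phase(k, v) for k, v in groups.items())
print(counts["big"], counts["data"])  # 2 2
```

In a real cluster, the map and reduce calls run in parallel on different nodes, and the grouping step is the shuffle between them.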

Amazon EMR processes big data across a Hadoop cluster of virtual servers on
Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).

The elastic in EMR's name refers to its dynamic resizing ability, which allows it to
ramp up or reduce resource use depending on the demand at any given time.

Processing big data with Amazon EMR

Amazon EMR is used for data analysis in log analysis, web indexing, data
warehousing, machine learning, financial analysis, scientific simulation,
bioinformatics and more. EMR also supports workloads based on Apache Spark,
Presto and Apache HBase -- the latter of which integrates with Hive and Pig for
additional functionality.

SearchBlox for Amazon Elasticsearch Service is an enterprise search platform for the AWS Cloud that
uses the Amazon Elasticsearch Service, the fully managed and scalable Elasticsearch service available
on Amazon Web Services (AWS). SearchBlox for Amazon Elasticsearch Service can crawl, index and
search content across multiple datasources including file systems, websites, databases and applications.

Architecture
This service consists of two types of SearchBlox servers that are available through the AWS
marketplace. The first is SearchBlox IndexServer. The SearchBlox IndexServer can crawl and index
content in over 40 document formats including PDFs, HTML and Microsoft Word, Excel, Powerpoint
directly into Amazon Elasticsearch Service. The second type of server is the SearchBlox SearchServer,
which provides ready-to-use, fully customizable search front-ends, including faceted search, for the
indexes created by the SearchBlox IndexServer in the Amazon Elasticsearch Service.

Setup
AWS Region
Please make sure to select the same AWS Region in all the steps mentioned below. For example, we
have chosen "us-east-1" for creating the Elasticsearch domain, the SearchBlox IndexServer, the
SearchBlox SearchServer, etc.

1. Create VPC
Create a VPC, which needs to be mentioned while creating a SearchBlox IndexServer at AWS
Marketplace.

2. Create KeyPair
Create a Key Pair, and store it safely to access your AWS instance.

SSH
Use the key pair to SSH to the AWS instance. If you are using Windows, use PuTTYgen to convert the
pem file to a ppk file, then use this ppk file to connect to the instance using PuTTY.

3. Create IAM Role


Create an IAM role called SearchBlox_AmazonES with the AmazonESFullAccess policy. This role has to
be attached after creating the SearchBlox IndexServer (and SearchServer, if available) instance.

4. Create AWS Elasticsearch Domain
1. Give the domain name and select Elasticsearch version 5.1.
Elasticsearch Version
SearchBlox currently supports only Elasticsearch 5.1 on Amazon Elasticsearch Service.

2. Give the number of instances (between 1 and 20) and select the instance type as
c4.xlarge.elasticsearch.

3. The EBS volume size can be set to 150 GB or higher.

4. You can specify the start hour at which AWS takes a snapshot of the cluster. Please specify the
time in UTC.

5. You can specify access to and from a specific domain, i.e., index and search servers, by giving
the private IPs of those servers. Select Allow access to the domain from the specific IP(s).

6. Specify the comma-separated IPs.

7. Review and create Elasticsearch domain.


The Elasticsearch Service dashboard will show the domain as created after 10 to 15 minutes.

8. After configuring and connecting the SearchBlox IndexServer (see the next section), you can:

 View Cluster health
 View Status of Indices
 View the mappings of fields within the indices
 Monitor the status of the Elasticsearch service

5. Start SearchBlox IndexServer via Amazon Marketplace.
Go to the AWS Marketplace: https://aws.amazon.com/marketplace.
Search for SearchBlox and select IndexServer. For cluster setup, create SearchBlox SearchServer after
creating SearchBlox IndexServer.

Check and click continue, which will take you to the page below:

Select the VPC created in the earlier step.

Select the Key Pair created earlier and launch the instance.

Go to EC2 Dashboard.

Integrate with IAM Role
This is an important step where we integrate the IAM role with the SearchBlox IndexServer.
Right-click the server instance, then go to Instance Settings -> Attach/Replace IAM Role.

Select and save the role to the instance.

SSH into SearchBlox IndexServer
 SSH into the SearchBlox IndexServer instance using the user ec2-user and the pem or ppk file.
 Change user to jetty.
 Shell
sudo su - jetty

 Edit /srv/jetty/sb/webapps/searchblox/WEB-INF/elasticsearch.yml to update the properties for
the AWS ES domain as follows:
 YAML
searchblox.aws.region: us-east-1
searchblox.aws.url: https://search-XXXXXX.us-east-1.es.amazonaws.com

The aws.region is the region selected while creating SearchBlox IndexServer and the Elasticsearch
instance, which will also be available in the AWS URL in Elasticsearch. The aws.url is the endpoint
specified in the Elasticsearch instance.

 Restart SearchBlox as follows:
 Shell
service jetty restart

 Access the SearchBlox Admin Console at https://xxxx:8443/searchblox/admin/main.jsp, where
xxxx is the Public DNS of the SearchBlox IndexServer instance.
 Access the SearchBlox Search URLs as follows:
SearchBlox Basic Search URL: https://xxxx:8443/searchblox/search.jsp
SearchBlox Faceted Search url : https://xxxx:8443/searchblox/plugin/index.html
where xxxx is is the Public DNS of the SearchBlox SearchServer instance.

Increase RAM for SearchBlox in AWS.
Log on as the jetty user using the following command:
sudo su - jetty
Then edit the /etc/default/jetty file and set the memory parameters in JAVA_OPTIONS. The content of
the jetty file is given below; 12G means 12 GB of memory has been allocated to SearchBlox.
 Text
JAVA_OPTIONS="-server -Xms12G -Xmx12G -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 -Djetty.http.host=0.0.0.0"
JETTY_HOME=/srv/jetty
JETTY_RUN=/srv/jetty/run
JETTY_USER=jetty
TMPDIR=/srv/jetty/temp
JETTY_BASE=/srv/jetty/sb

Kibana and Amazon Elasticsearch Service
 The indexed data, as well as the logs, are stored in the Elasticsearch domain. To view the logs, you can
map the Elasticsearch index named sbindexlog in Kibana and search for the entries.
The Kibana link will be available in the Domain dashboard. Refer to screenshot below:

 Click the link and access Kibana.

 Adding log indexes in Kibana.
The two logs that can be added in Kibana are sbindexlog and sbstatuslog. You can add both logs
in one index pattern.

Alternatively, you can create a separate index pattern for each log.

You can also query the logs based on URL, timestamp, etc.

 It is also possible to delete indexes via Kibana. Go to Dev Tools in the left-hand menu. To
delete the Elasticsearch indices, click Get to Work.

Amazon Kinesis

Amazon Kinesis is an Amazon Web Service (AWS) for processing big data in real
time.

Kinesis is capable of processing hundreds of terabytes per hour from high volumes
of streaming data from sources such as operating logs, financial transactions and
social media feeds. According to Amazon, Kinesis fills a gap left by Hadoop and
other technologies that process data in batches, but that don't enable real-time
operational decisions about constantly streaming data. That capability, in turn,
simplifies the process of writing apps that rely on data that must be processed in
real time.

Amazon Kinesis integrates with Amazon Redshift, Amazon DynamoDB and
Amazon Simple Storage Service (Amazon S3), as well as with many third-party
products. Customers are billed on the standard AWS pay-as-you-go plan, with
payments based on the amount of data processed and the way in which the
information is packaged.
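As a sketch of what writing to a stream looks like in practice, the snippet below shapes the arguments for a Kinesis put_record call. The stream name and payload are hypothetical, and the actual boto3 call (shown in a comment) requires AWS credentials:

```python
import json

def build_kinesis_record(stream_name: str, payload: dict, partition_key: str) -> dict:
    """Build the keyword arguments for a Kinesis put_record call.
    Records that share a partition key are routed to the same shard,
    which preserves their ordering."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(payload).encode("utf-8"),
        "PartitionKey": partition_key,
    }

# With credentials configured, you would then send it with boto3:
# import boto3
# boto3.client("kinesis").put_record(**build_kinesis_record(
#     "clickstream", {"user": "u-42", "event": "login"}, "u-42"))
```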

When Should I Use Amazon Aurora and When Should I use
RDS MySQL?

Now that Database-as-a-Service (DBaaS) is in high demand, there is one question regarding AWS
services that cannot always be answered easily: when should I use Aurora and when RDS MySQL?

DBaaS cloud services allow users to use databases without configuring physical hardware and
infrastructure, and without installing software. I’m not sure if there is a straightforward answer, but
when trying to find out which solution best fits an organization there are multiple factors that should be
taken into consideration. These may be performance, high availability, operational cost, management,
capacity planning, scalability, security, monitoring, etc.

There are also cases where although the workload and operational needs seem to best fit to one
solution, there are other limiting factors which may be blockers (or at least need special handling).

In this blog post, I will try to provide some general rules of thumb but let’s first try to give a short
description of these products.

What we should really compare is the MySQL and Aurora database engines provided by Amazon RDS.

An introduction to Amazon RDS

Amazon Relational Database Service (Amazon RDS) is a hosted database service which provides
multiple database products to choose from, including Aurora, PostgreSQL, MySQL, MariaDB, Oracle,
and Microsoft SQL Server. We will focus on MySQL and Aurora.

With regards to systems administration, both solutions are time-saving. You get an environment ready
to deploy your application and if there are no dedicated DBAs, RDS gives you great flexibility for
operations like upgrades or backups. For both products, Amazon applies required updates and the latest
patches without any downtime. You can define maintenance windows and automated patching (if
enabled) will occur within them. Data is continuously backed up to S3 in real time, with no
performance impact. This eliminates the need for backup windows and other, complex or not, scripted
procedures. Although this sounds great, the risk of vendor lock-in and the challenges of enforced
updates and client-side optimizations are still there.

So, Aurora or RDS MySQL?

Amazon Aurora is a relational, proprietary, closed-source database engine, with all that that implies.

RDS MySQL is 5.5, 5.6 and 5.7 compatible and offers the option to select among minor releases.
While RDS MySQL supports multiple storage engines with varying capabilities, not all of them are
optimized for crash recovery and data durability. Until recently, it was a limitation that Aurora was only
compatible with MySQL 5.6 but it’s now compatible with both 5.6 and 5.7 too.

So, in most cases, no significant application changes are required for either product. Keep in mind that
certain MySQL features like the MyISAM storage engine are not available with Amazon
Aurora. Migration to RDS can be performed using Percona XtraBackup.

For RDS products, shell access to the underlying operating system is disabled and access to MySQL
user accounts with the "SUPER" privilege isn't allowed. To configure MySQL variables or manage
users, Amazon RDS provides specific parameter groups, APIs and other special system procedures
which can be used. If you need to enable remote access, this article will help you do
so: https://www.percona.com/blog/2018/05/08/how-to-enable-amazon-rds-remote-access/

Performance considerations

Although Amazon RDS uses SSDs to achieve better IO throughput for all its database services,
Amazon claims that Aurora is able to achieve a 5x performance boost over standard MySQL and
provides reliability out of the box. In general, Aurora seems to be faster, but not always.

For example, due to the need to disable the InnoDB change buffer for Aurora (this is one of the keys
for the distributed storage engine), and that updates to secondary indexes must be write through, there
is a big performance penalty in workloads where heavy writes that update secondary indexes are
performed. This is because of the way MySQL relies on the change buffer to defer and merge
secondary index updates. If your application performs a high rate of updates against tables with
secondary indexes, Aurora performance may be poor. In any case, you should always keep in mind that
performance depends on schema design. Before taking the decision to migrate, performance should be
evaluated against an application specific workload. Doing extensive benchmarks will be the subject of
a future blog post.

Capacity Planning

Talking about underlying storage, another important thing to take into consideration is that with Aurora
there is no need for capacity planning. Aurora storage will automatically grow, from the minimum of
10 GB up to 64 TiB, in 10 GB increments, with no impact on database performance. The table size
limit is only constrained by the size of the Aurora cluster volume, which has a maximum of 64
tebibytes (TiB). As a result, the maximum table size for a table in an Aurora database is 64 TiB. For
RDS MySQL, the maximum provisioned storage limit constrains the size of a table to a maximum size
of 16 TB when using InnoDB file-per-table tablespaces.
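The growth rule above is easy to model. Here is a rough sketch (the helper is illustrative, not an AWS API), assuming storage is allocated in 10 GB steps from a 10 GB floor up to the 64 TiB ceiling:

```python
import math

def aurora_allocated_gb(dataset_gb: float) -> int:
    """Return the storage Aurora would have grown to for a given dataset:
    10 GB increments, 10 GB minimum, 64 TiB (modeled as 65536 GB) maximum."""
    MIN_GB, MAX_GB = 10, 64 * 1024
    allocated = max(MIN_GB, math.ceil(dataset_gb / 10) * 10)
    if allocated > MAX_GB:
        raise ValueError("exceeds the 64 TiB Aurora cluster volume limit")
    return allocated
```

So a 3 GB database still occupies the 10 GB minimum, while a 27 GB one is rounded up to the next 30 GB step.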

Replication

Replication is a really powerful feature of MySQL (like) products. With Aurora, you can provision up
to fifteen replicas compared to just five in RDS MySQL. All Aurora replicas share the same underlying

volume with the primary instance, and this means that replication can be performed in milliseconds as
updates made by the primary instance are instantly available to all Aurora replicas. Failover is
automatic with no data loss on Amazon Aurora, and the replicas' failover priority can be set.

An explanatory description of Amazon Aurora's architecture can be found in Vadim's post written a
couple of years ago: https://www.percona.com/blog/2015/11/16/amazon-aurora-looking-deeper/

The architecture used and the way that replication works on both products show a really significant
difference between them. Aurora is a High Availability (HA) solution where you only need to attach a
reader and this automatically becomes Multi-AZ available. Aurora replicates data to six storage nodes
across Multi-AZs to withstand the loss of an entire AZ (Availability Zone) or two storage nodes without
any availability impact to the client's applications.

On the other hand, RDS MySQL allows only up to five replicas and the replication process is slower
than Aurora. Failover is a manual process and may result in last-minute data loss. RDS for MySQL is
not an HA solution, so you have to mark the master as Multi-AZ and attach the endpoints.

Monitoring

Both products can be monitored with a variety of monitoring tools. You can enable automated
monitoring and you can define the log types to publish to Amazon CloudWatch. Percona Monitoring
and Management (PMM) can also be used to gather metrics.

Be aware that for Aurora there is a limitation for the T2 instances such that Performance Schema can
cause the host to run out of memory if enabled.

Costs

Aurora instances will cost you ~20% more than RDS MySQL. If you create Aurora read replicas then
the cost of your Aurora cluster will double. Aurora is only available on certain RDS instance sizes.
Instances pricing details can be found here and here.

Storage pricing may be a bit tricky. Keep in mind that pricing for Aurora differs to that for RDS
MySQL. For RDS MySQL you have to select the type and size for the EBS volume, and you have to be
sure that provisioned EBS IOPs can be supported by your instance type as EBS IOPs are restricted by

the instance type capabilities. Unless you watch for this, you may end up having EBS IOPs that cannot
be really used by your instance.

For Aurora, IOPs are only limited by the instance type. This means that if you want to increase IOPs
performance on Aurora you should proceed with an instance type upgrade. In any case, Amazon will
charge you based on the dataset size and the requests per second.

That said, although for Aurora you pay only for the data you really use, in 10GB increments, if you want
high performance you have to select the correct instance. For Aurora, regardless of the instance type,
you get billed $0.10 per GB-month and $0.20 per 1 million requests, so if you need high performance
the cost may be even more than RDS MySQL. For RDS MySQL, storage costs are based on the EBS
type and size.
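Using the two rates quoted above, a back-of-the-envelope monthly storage-plus-I/O estimate looks like this (an illustrative sketch; instance-hour charges are separate and not included, and the function name is ours):

```python
def aurora_storage_io_cost(dataset_gb: float, requests_per_sec: float) -> float:
    """Rough monthly Aurora storage + I/O bill at the rates quoted in the
    text: $0.10 per GB-month and $0.20 per 1 million requests."""
    SECONDS_PER_MONTH = 30 * 24 * 3600  # ~2.59M seconds in a 30-day month
    storage = dataset_gb * 0.10
    io = requests_per_sec * SECONDS_PER_MONTH / 1_000_000 * 0.20
    return round(storage + io, 2)
```

For example, 100 GB of data with no traffic comes to $10/month, while the same data under a sustained 1,000 requests/s adds roughly $518 of I/O charges.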

Percona provides support for RDS services and you might be interested in these case studies:

Lookout Uses Percona’s Cloud Expertise to Reduce Footprint and Maintain Uptime

Madwire Achieves Performance Assurance for Amazon RDS Aurora Through Percona’s
Database Audit and Consultancy Services

When a more fully customized solution is required, most of our customers usually prefer the use of
AWS EC2 instances supported by our managed services offering.

TL;DR

 If you are looking for a native HA solution, you should use Aurora.
 For a read-intensive workload within an HA environment, Aurora is a perfect match. Combined
with ProxySQL for RDS, you can get high flexibility.
 Aurora performance is great but not as high as expected for write-intensive workloads when
secondary indexes exist. In any case, you should benchmark both RDS MySQL and Aurora
before taking the decision to migrate. Performance depends a lot on workload and schema design.
 By choosing Amazon Aurora you are fully dependent on Amazon for bug fixes or upgrades.
 If you need to use MySQL plugins, you should use RDS MySQL.
 Aurora only supports InnoDB. If you need other engines, e.g. MyISAM, RDS MySQL is the
only option.
 With RDS MySQL you can use specific MySQL releases.
 Aurora is not included in the AWS free tier and costs a bit more than RDS MySQL. If you only
need a managed solution to deploy services in a less expensive way and out-of-the-box
availability is not your main concern, RDS MySQL is what you need.
 If for any reason Performance Schema must be ON, you should not enable it on Amazon Aurora
MySQL T2 instances. With the Performance Schema enabled, the T2 instance may run out of
memory.
 For both products, you should carefully examine the known issues and limitations listed here
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MySQL.KnownIssuesAndLimitations.html
and here https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.AuroraMySQL.html

Drawbacks and Alternatives to DynamoDB
A head-to-head database battle between Scylla and DynamoDB is a real David versus Goliath situation.
It’s Rocky Balboa versus Apollo Creed. Is it possible Scylla could deliver an unexpected knockout
punch against DynamoDB?
To be clear, Scylla is not a competitor to AWS at all. Many of our customers deploy Scylla to AWS, we
ourselves find it to be an outstanding platform, and on more than one occasion we’ve blogged about its
unique bare metal instances. Here’s further validation — our Scylla Cloud service runs on top of AWS.
But we do think we might know a bit more about building a real-time big data database, so we limited
the scope of this competitive challenge solely to Scylla versus DynamoDB, database-to-database.
Scylla is a drop-in replacement for Cassandra, implemented from scratch in C++. Cassandra itself was
a reimplementation of concepts from the Dynamo paper. So, in a way, Scylla is the “granddaughter” of
Dynamo. That means this is a family fight, where a younger generation rises to challenge an older one.
It was inevitable for us to compare ourselves against our “grandfather,” and perfectly in keeping with
the traditions of Greek mythology behind our name.
If you compare Scylla and Dynamo, each has pros and cons, but they share a common class of NoSQL
database: Column family with wide rows and tunable consistency. Dynamo and its Google counterpart,
Bigtable, were the first movers in this market and opened up the field of massively scalable services —
very impressive by all means.
Scylla is a much younger opponent, just 4.5 years in age. Though Scylla is modeled on Cassandra,
Cassandra was never our end goal, only a starting point. While we stand on the shoulders of giants in
terms of existing design, our proven systems programming abilities have come heavily into play and led to
performance at the level of a million operations per second per server. We recently announced feature
parity (minus transactions) with Cassandra, and also our own database-as-a-service offering, Scylla
Cloud.
But for now we’ll focus on the question of the day: Can we take on DynamoDB?

Rules of the Game
With our open source roots, our culture forces us to be as fair as possible. So we picked a reasonable
benchmark scenario that's supposed to mimic the requirements of a real application, and we judged
the two databases from the user perspective. For the benchmark we used the Yahoo! Cloud Serving
Benchmark (YCSB), since it's a cross-platform tool and an industry standard. The goal was to meet a
Service Level Agreement of 120K operations per second with a 50:50 read/write split (YCSB's
workload A) with a latency under 10ms at the 99th percentile. Each database would provision the
minimal amount of resources/money to meet this goal. Each DB should be populated first with 1 billion
rows using the default, 10-column schema of YCSB.
We conducted our tests using Amazon DynamoDB and Amazon Web Services EC2 instances as
loaders. Scylla also used Amazon Web Services EC2 instances for servers, monitoring tools and the
loaders.
These tests were conducted on Scylla Open Source 2.1, which is the code base for Scylla Enterprise
2018.1. Thus performance results for these tests will hold true across both Open Source and Enterprise.
However, we use Scylla Enterprise for comparing Total Cost of Ownership.
DynamoDB is known to be tricky when the data distribution isn't uniform, so we selected a uniform
distribution to test Dynamo within its sweet spot. We set up 3 nodes of i3.8xl for Scylla, with replication
of 3 and quorum consistency level, and loaded the 1 TB dataset (replicated 3 times); after 2.5 hours the
load was done, waiting for the test to begin.
Scylla Enterprise (Scylla cluster):
 i3.8xlarge | 32 vCPU | 244 GiB | 4 x 1.9TB NVMe
 3-node cluster on single DC | RF=3
 Dataset: ~1.1TB (1B partitions / size: ~1.1Kb)
 Total used storage: ~3.3TB

Amazon DynamoDB (provisioned capacity):
 160K write | 80K read (strong consistency)
 Dataset: ~1.1TB (1B partitions / size: ~1.1Kb)
 Storage size: ~1.1TB (DynamoDB table metrics)

Common setup:
 Workload A: 90 min, using 8 YCSB clients, every client runs on its own data range (125M
partitions)
 Loaders: 4 x m4.2xlarge (8 vCPU | 32 GiB RAM), 2 loaders per machine
 Scylla workloads run with Consistency Level = QUORUM for writes and reads.
 Scylla starts with a cold cache in all workloads.
 DynamoDB workloads ran with dynamodb.consistentReads = true.
 Sadly for DynamoDB, each item weighed 1.1Kb (the YCSB default schema), thus each write
resulted in two accesses.

Let the Games Begin!


We started to populate Dynamo with the dataset. However, not so fast..

Turns out the population stage is hard on DynamoDB. We had to slow down the population rate time
and again, despite it being well within the reserved IOPS. Sometimes we managed to populate up to 0.5
billion rows before we started to receive the errors again.
Each time we had to start over to make sure the entire dataset was saved. We believe DynamoDB needs
to split its 10GB partitions during the population and cannot do so in parallel with additional load
without errors. The gory details:
 Started population with Provisioned capacity: 180K WR | 120K RD.
 We hit errors on ~50% of the YCSB threads causing them to die when using ≥50% of
write provisioned capacity.
 For example, it happened when we ran with the following throughputs:
 55 threads per YCSB client = ~140K throughput (78% used capacity)
 45 threads per YCSB client = ~130K throughput (72% used capacity)
 35 threads per YCSB client = ~96K throughput (54% used capacity)
After multiple attempts with various provisioned capacities and throughputs, eventually a
streaming rate was found that permitted a complete database population. Here are the results of
the population stage:

YCSB Workload / Description: Population, 100% Write, Range: 1B partitions (~1.1Kb), Distribution: Uniform

Scylla Open Source 2.1 (3x i3.8xlarge), 8 YCSB clients:
 Overall Throughput (ops/sec): 104K
 Avg Load (scylla-server): ~85%
 INSERT operations (Avg): 125M
 Avg. 95th Percentile Latency (ms): 8.4
 Avg. 99th Percentile Latency (ms): 11.3

DynamoDB (160K WR | 80K RD), 8 YCSB clients:
 Overall Throughput (ops/sec): 51.7K
 Max Consumed Capacity: WR 75%
 INSERT operations (Avg): 125M
 Avg. 95th Percentile Latency (ms): 7.5
 Avg. 99th Percentile Latency (ms): 11.6
Scylla completed the population at twice the speed but more importantly, worked out of the box
without any errors or pitfalls.

YCSB Workload A, Uniform Distribution


Finally, we began the main test, the one that gauges our potential user workload with an SLA of
120,000 operations. This scenario is supposed to be DynamoDB’s sweet spot. The partitions are well
balanced and the load isn’t too high for DynamoDB to handle. Let’s see the results:
YCSB Workload / Description: Workload A, 50% Read / 50% Write, Range: 1B partitions (~1.1Kb), Distribution: Uniform, Duration: 90 min.

Scylla Open Source 2.1 (3x i3.8xlarge), 8 YCSB clients:
 Overall Throughput (ops/sec): 119.1K
 Avg Load (scylla-server): ~58%
 READ operations (Avg): ~39.93M
 Avg. 95th Percentile Latency (ms): 5.0
 Avg. 99th Percentile Latency (ms): 7.2
 UPDATE operations (Avg): ~39.93M
 Avg. 95th Percentile Latency (ms): 3.4
 Avg. 99th Percentile Latency (ms): 5.6

DynamoDB (160K WR | 80K RD), 8 YCSB clients:
 Overall Throughput (ops/sec): 120.1K
 Consumed capacity: ~WR 76% | RD 76%
 READ operations (Avg): ~40.53M
 Avg. 95th Percentile Latency (ms): 12.0
 Avg. 99th Percentile Latency (ms): 18.6
 UPDATE operations (Avg): ~40.53M
 Avg. 95th Percentile Latency (ms): 13.2
 Avg. 99th Percentile Latency (ms): 20.2
After all the effort of loading the data, DynamoDB was finally able to demonstrate its value.
DynamoDB met the throughput SLA (120k OPS). However, it failed to meet the latency SLA of 10ms
for 99%, but after the population difficulties we were happy to get to this point.

Scylla, on the other hand, easily met the throughput SLA, with only 58% load, and latency that was 3x-
4x better than DynamoDB and well below our requested SLA. (Also, what you don't see here is the
huge cost difference, but we'll get to that in a bit.)
We won’t let DynamoDB off easy, however. Now that we’ve seen how DynamoDB performs with its
ideal uniform distribution, let’s have a look at how it behaves with a real life use-case.

Real Life Use-case: Zipfian Distribution


A good schema design goal is to have the perfect, uniform distribution of your primary keys. However,
in real life, some keys are accessed more than others. For example, it’s common practice to use UUID
for the customer or the product ID and to look them up. Some of the customers will be more active than
others and some products will be more popular than others, so the differences in access times can go up
to 10x-1000x. Developers cannot improve the situation in the general case since if you add an
additional column to the primary key in order to improve the distribution, you may improve the
specific access but at the cost of complexity when you retrieve the full information about the
product/customer.
Keep in mind what you store in a database. It’s data such as how many people use Quora or how many
likes NBA teams have:

With that in mind, let’s see how ScyllaDB and DynamoDB behave given a Zipfian distribution access
pattern. We went back to the test case of 1 billion keys spanning 1TB of pre-replicated dataset and

queried it again using YCSB Zipfian accesses. It is possible to define the hot set of partitions in terms
of volume — how much data is in it — and define the percentile of access for this hot set as part from
the overall 1TB set.
We set a variety of parameters for the hot set and the results were pretty consistent – DynamoDB could
not meet the SLA for Zipfian distribution. It performed well below its reserved capacity — only 42%
utilization — but it could not execute 120k OPS. In fact, it could do only 65k OPS. The YCSB client
experienced multiple, recurring ProvisionedThroughputExceededException (code: 400)
errors, and throttling was imposed by DynamoDB.
YCSB Workload / Description: Workload A, 50% Read / 50% Write, Range: 1B partitions, Distribution: Zipfian, Duration: 90 min, Hot set: 10K partitions, Hot set access: 90%.

Scylla 2.1 (3x i3.8xlarge), 8 YCSB clients:
 Overall Throughput (ops/sec): 120.2K
 Avg Load (scylla-server): ~55%
 READ operations (Avg): ~40.56M
 Avg. 95th Percentile Latency (ms): 6.1
 Avg. 99th Percentile Latency (ms): 8.6
 UPDATE operations (Avg): ~40.56M
 Avg. 95th Percentile Latency (ms): 4.4
 Avg. 99th Percentile Latency (ms): 6.6

DynamoDB (160K WR | 80K RD), 8 YCSB clients:
 Overall Throughput (ops/sec): 65K
 Consumed capacity: ~WR 42% | RD 42%
 READ operations (Avg): ~21.95M
 Avg. 95th Percentile Latency (ms): 6.0
 Avg. 99th Percentile Latency (ms): 9.2
 UPDATE operations (Avg): ~21.95M
 Avg. 95th Percentile Latency (ms): 7.3
 Avg. 99th Percentile Latency (ms): 10.8
Why can't DynamoDB meet the SLA in this case? The answer lies within the Dynamo model. The
global reservation is divided across multiple partitions, each no more than 10GB in size.

Thus, when such a partition is accessed more often, it may reach its throttling cap even though overall
you're well within your global reservation. In the example above, when reserving 200 writes, each of
the 10 partitions cannot be queried at more than 20 writes/s.
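The arithmetic in that example can be captured in a one-liner (an illustrative helper, not an AWS API):

```python
def per_partition_limit(provisioned_capacity: float, partitions: int) -> float:
    """DynamoDB splits a table's provisioned throughput evenly across its
    partitions, so a hot partition throttles at its share even while the
    table as a whole is far below its global reservation."""
    return provisioned_capacity / partitions

# 200 reserved writes spread across 10 partitions -> 20 writes/s each
```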

The Dress that Broke DynamoDB


If you asked yourself, “Hmmm, is 42% utilization the worst I’d see from DynamoDB?” we’re afraid
we have some bad news for you. Remember the dress that broke the internet? What if you have an item

in your database that becomes extremely hot? To explore this, we tested a single hot partition access
and compared it.

We ran a single YCSB, working on a single partition on a 110MB dataset (100K partitions). During our
tests, we observed a DynamoDB limitation when a specific partition key exceeded 3000 read capacity
units (RCU) and/or 1000 write capacity units (WCU).
Even when using only ~0.6% of the provisioned capacity (857 OPS), the YCSB client
experienced ProvisionedThroughputExceededException (code: 400) errors, and throttling was imposed
by DynamoDB (see screenshots below).
This isn't to say you shouldn't plan for the best data model. However, there will always be cases
when your plan is far from reality. In the Scylla case, a single partition still performed reasonably well:
20,200 OPS with good 99% latency.

Scylla vs DynamoDB – Single (Hot) Partition


YCSB Workload / Description: Workload A, 50% Read / 50% Write, Range: single partition (~1.1Kb), Distribution: Uniform, Duration: 90 min.

Scylla 2.1 (3x i3.8xlarge), 8 YCSB clients:
 Overall Throughput (ops/sec): 20.2K
 Avg Load (scylla-server): ~5%
 READ operations (Avg): ~50M
 Avg. 95th Percentile Latency (ms): 7.3
 Avg. 99th Percentile Latency (ms): 9.4
 UPDATE operations (Avg): ~50M
 Avg. 95th Percentile Latency (ms): 2.7
 Avg. 99th Percentile Latency (ms): 4.5

DynamoDB (160K WR | 80K RD), 8 YCSB clients:
 Overall Throughput (ops/sec): 857
 Consumed capacity: ~WR 0.6% | RD 0.6%
 READ operations (Avg): ~2.3M
 Avg. 95th Percentile Latency (ms): 5.4
 Avg. 99th Percentile Latency (ms): 10.7
 UPDATE operations (Avg): ~2.3M
 Avg. 95th Percentile Latency (ms): 7.7
 Avg. 99th Percentile Latency (ms): 607.8

Screenshot 1: Single partition. Consumed capacity: ~0.6% -> Throttling imposed by DynamoDB

Additional Factors
Cross-region Replication and Global Tables
We compared the replication speed between datacenters and a simple comparison showed that
DynamoDB replicated in 370ms on average to a remote DC while Scylla’s average was 82ms. Since
the DynamoDB cross-region replication is built on its streaming api, we believe that when congestion
happens, the gap will grow much further into a multi-second gap, though we haven’t yet tested it.
Beyond replication propagation, there is a more burning functional difference — Scylla can easily add
regions on demand at any point in the process with a single command:
ALTER KEYSPACE mykeyspace WITH replication = { 'class' :
'NetworkTopologyStrategy', 'replication_factor' : '3',
'<existing_dc>' : 3, '<new_dc>' : 4 };
In DynamoDB, on the other hand, you must define your global tables ahead of time. This imposes a
serious usability issue and a major cost one as you may need to grow the amount of deployed
datacenters over time.

Explicit Caching is Expensive and Bad for You
DynamoDB performance can improve and its high cost can be reduced in some cases when using
DAX. However, Scylla has a much smarter and more efficient embedded cache (the database nodes
have memory, don’t they?) and the outcome is far better for various reasons we described in a
recent blog post.
Freedom
This is another major advantage of Scylla: DynamoDB locks you into the AWS cloud, significantly
decreasing your chances of ever moving out. Data gravity is significant. No wonder they're going after
Oracle!
Scylla is an open source database. You have the freedom to choose between our community version, an
Enterprise version and our new fully managed service. Scylla runs on all major cloud providers and
opens the opportunity for you to run some datacenters on one provider and others on another provider
within the same cluster. One of our telco customers is a great example of the hybrid model — they
chose to run some of their datacenters on-premise and some on AWS.

Our approach to "locking-in" users is quite different: we do it solely by means of delivering
quality and value such that you won't want to move away from us. As of today, we have experienced
exactly zero customer churn.

No Limits
DynamoDB imposes various limits on the size of each cell: only 400KB. In Scylla you can effectively
store megabytes. One of our customers built a distributed storage system using Scylla, keeping large
blobs in Scylla with single-digit millisecond latency for them too.
Another problematic limit is the amount of data per sort key: DynamoDB cannot hold items of more
than 10GB. While this isn't a recommended pattern in Scylla either, we have customers who keep
130GB items in a single partition. The effect of these higher limits is more freedom in data modeling
and fewer reasons to worry.

Total Cost of Ownership (TCO)


We’re confident the judges would award every round of this battle to Scylla so far, and we haven’t even
gotten to comparing the total cost of ownership. The DynamoDB setup, which didn’t even meet the
required SLA and which caused us to struggle multiple times to even get working, costs 7 times more
than the comparable Scylla setup.

ElastiCache versus self-hosted Redis on EC2
Often, there comes a time when you have to choose between managed services and self-hosted
services, especially in the cloud world. Both of these services have their own set of pros and cons, but
each of them provides an added advantage, provided you know your use case well. This theory
holds true for ElastiCache versus self-hosted Redis on EC2.
This post compares practical and impractical pointers around these two services, so you can choose the
right service for your use case.

The Practical Comparison: ElastiCache Vs. Self-hosted Redis on


EC2
Redis is one of the leading open source, in-memory, key-value stores. It is a good caching tool. If you
are an AWS user, you can leverage this tool via an EC2 instance (by self-hosting) or ElastiCache.
The benefit of using ElastiCache is that AWS manages the servers hosting Redis, whereas the
benefit of using self-hosted Redis on EC2 is the freedom to maneuver between
configurations. There are several other such differences, and knowing them will equip you
to make the right choice.
To start with, let’s walk you through the top differences between ElastiCache and self-hosted Redis on
EC2.

Deep Diving into the Practicalities of ElastiCache and Self-
hosted Redis on EC2
ElastiCache: Supports Fully Managed Redis and Memcached
ElastiCache seamlessly deploys, runs, and scales Redis as well as Memcached in-memory data stores.
It automatically performs management tasks like software patching, setup, configuration, hardware
provisioning, failure recovery, backups etc. There’s no risk of losing workloads, as it continuously
monitors clusters. This makes it ideal for building data-intensive apps for media sharing, social
networking, gaming, Ad-Tech, finance, healthcare, IoT, etc.
ElastiCache: Scales Automatically According to Requirements
One of the most adored features of ElastiCache is its scalability feature. It can scale-out, scale-in, and
scale-up as per application demands. In addition, write and memory scaling is supported with sharding,
while replicas provide read scaling.
ElastiCache: Instances with More Than One vCPU Cannot Utilize All the Cores
Redis uses a single thread of execution for reads and writes: one thread/process handles all
reads and writes in the database. This ensures no deadlocks occur from multiple threads
reading and writing data concurrently. It is an extremely powerful feature of Redis in
terms of performance, as it removes the need to manage locks and latches. However, this one thread
can use only one core, so a single vCPU does all the work and you do not have the freedom to use
multiple CPUs. Consequently, on ElastiCache instances with more than 1 vCPU, the extra vCPUs go to waste.
To provide better visibility into CPU utilization, Amazon introduced the ‘EngineCPUUtilization’ metric
in April 2018.
Self Hosted Redis on EC2: Allows You to Update to the Latest Version ASAP
One of the major advantages of using a self-hosted Redis cluster is that you can always stay updated
with the most recent version. You can utilize the best features of the software even before the rest of the
world starts using them.
Self Hosted Redis on EC2: Provides the Freedom to Modify Configurations
Self hosted Redis on EC2 provides the freedom to understand its underlying functionalities and modify
the configurations as per your requirement. For example, to modify Redis configuration to continually
take snapshots, you can:
save 900 1     # create a snapshot if there is a minimum of 1 change within 900 seconds

save 300 10    # create a snapshot if there are a minimum of 10 changes within 300 seconds

save 60 10000  # create a snapshot if there are a minimum of 10000 changes within 60 seconds
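The semantics of these save rules can be sketched in a few lines of Python (an illustrative helper, not actual Redis code):

```python
def should_snapshot(rules, changes, seconds_elapsed):
    """Return True if any 'save <seconds> <changes>' rule is satisfied."""
    return any(seconds_elapsed >= secs and changes >= min_changes
               for secs, min_changes in rules)

# The configuration above, as (seconds, min_changes) pairs
rules = [(900, 1), (300, 10), (60, 10000)]

print(should_snapshot(rules, changes=12, seconds_elapsed=300))  # → True
print(should_snapshot(rules, changes=5, seconds_elapsed=120))   # → False
```

A snapshot fires as soon as any one rule's thresholds are both met, which is exactly how Redis evaluates multiple save directives.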

Other configuration directives like “stop-writes-on-bgsave-error” and “maxmemory” are also very
useful. If you are looking for more tweaking details, check the links below:
 https://scaleyourcode.com/blog/article/25
 https://scaleyourcode.com/blog/article/15
 https://dzone.com/articles/redis-performance-benchmarks

Self Hosted Redis on EC2: Unavailability of Pertinent Metrics Makes Maintenance Tedious
Even though Redis on EC2 provides the freedom to maneuver in terms of configuration, it is difficult to
maintain, and monitoring metrics is not easy. You either need a third-party tool, like
AppDynamics, or you must call Redis APIs manually to collect metrics. You can automate
scaling, updates, upgrades, security patches, etc. using tools like Ansible, Chef, or Puppet. This is cost-
effective but effort-intensive.
Self Hosted Redis on EC2: Instance Limitations
Amazon recommends using only HVM-based EC2 instances; only a handful of PV-based
instances are available, due to latency issues.

What is AWS EC2?

Amazon Elastic Compute Cloud (EC2) is a web service from Amazon that provides
resizable compute capacity in the cloud.

How is EC2 resizable?


It is resizable because you can quickly scale up or scale down the number of
server instances you are using as your computing requirements change.
That brings us to our next question.

What is an Instance?
An instance is a virtual server for running applications on Amazon’s EC2. It can also be
understood as a tiny part of a larger computer, a tiny part which has its own hard
drive, network connection, OS, etc. But it is actually all virtual. You can have multiple
such “tiny” computers on a single physical machine, and all these tiny machines are called
instances.
Difference between a service and an Instance?
Let’s understand it this way:
 EC2 is a service, along with other Amazon Web Services like S3, etc.
 When we use EC2 or any other service, we use it through an instance, e.g. a
t2.micro instance in EC2.

Why AWS EC2 ?

Why not buy your own stack of servers and work independently? Suppose
you are a developer and, wanting to work independently, you buy some servers;
you estimated the capacity correctly, and the computing power is enough. Now you have
to look after security patch updates every day, troubleshoot any
problem which might occur at the back-end level in the servers, and so on. These are all
extra chores that you will be doing, or maybe you will hire someone else to do these
things for you.
But if you buy an EC2 instance, you don’t have to worry about any of these things, as it
will all be managed by Amazon; you just have to focus on your application. That too at
a fraction of the cost you were incurring earlier! Isn’t that interesting?

Let’s understand Cost Savings using an example.


Suppose instead of taking AWS EC2, we consider taking a dedicated set of servers, so,
what all we might have to face:
 Now for using these servers we have to hire an IT team which can handle them.
 Also, faults in the system are unavoidable, so we have to bear the cost of
getting them fixed; and if you don’t want to compromise on your up-time, you
have to keep your systems redundant with other servers, which might become
more expensive.
 Your own purchased assets will depreciate over time. By contrast, the cost of
an instance has dropped more than 50% over a 3-year period, while processor
type and speed have improved. So moving to the Cloud is all the more
compelling.
 For scaling up we have to add more servers, and if your application is new and
you experience a sudden spike in traffic, scaling up that quickly might become a
problem.
These are just a few problems, and there are many other scenarios which make the
case for EC2 stronger!

Let’s understand the types of EC2 Computing Instances:


Computing is a very broad term; the nature of your task decides what kind of
computing you need.
Therefore, AWS EC2 offers 5 types of instances, which are as follows:
 General Instances
 For applications that require a balance of performance and cost.
 E.g. email-responding systems, where you need a prompt response as
well as cost effectiveness, since not much processing is required.
 Compute Instances
 For applications that require a lot of processing from the CPU.
 E.g. analysis of streaming data, like a Twitter stream.
 Memory Instances
 For memory-heavy applications that require a lot of RAM.
 E.g. when your system needs a lot of applications running in the
background, i.e. multitasking.
 Storage Instances
 For applications that are huge in size or have a data set that occupies a lot
of space.
 E.g. when your application or its data set is huge in size.
 GPU Instances
 For applications that require some heavy graphics rendering.
 E.g. 3D modelling, etc.
Now, every instance type has a set of instances which are optimized for different
workloads:
 General Instances
 t2
 m4
 m3

 Compute Instances
 c4
 c3
 Memory Instances
 r3
 x1
 Storage Instances
 i2
 d2
 GPU Instances
 g2
Now let’s understand the kind of work that each instance is optimized for, in this AWS
EC2 Tutorial:
Burstable Performance Instances
 T2 instances are burstable instances, meaning the CPU performs at a baseline,
say 20% of its capability. When your application needs more than 20% of the
CPU’s performance, the CPU enters a burst mode, giving higher
performance for a limited amount of time, so work happens faster.
 These bursts come at a cost: every time a burst happens, CPU
credits are used.
 You earn credits when your CPU is idle, and each CPU credit gives
the CPU a burst of 1 minute.
 If your CPU credits are not used, they are credited to your account and
stay there for 24 hours.
 Based on your credit balance, you can decide whether the t2 instance
should be scaled up or down.
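The credit mechanics can be modeled with a toy simulation (the earn rate, burst usage, and cap below are illustrative numbers, not actual T2 figures):

```python
def simulate_credits(earn_rate_per_hour, burst_minutes_per_hour, hours, cap):
    """Toy model of T2 CPU-credit accrual: credits accumulate while the CPU
    is idle (up to a cap), and each credit buys one minute of full burst."""
    balance = 0.0
    for _ in range(hours):
        balance = min(balance + earn_rate_per_hour, cap)      # earn while idle
        balance = max(balance - burst_minutes_per_hour, 0.0)  # spend on bursts
    return balance

# Earning 6 credits/hour and bursting 2 minutes/hour nets +4 credits/hour
print(simulate_credits(6, 2, hours=10, cap=144))  # → 40.0
```

A steadily growing balance like this suggests the instance is over-provisioned and could be scaled down; a balance pinned at zero suggests the opposite.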
EBS-optimized Instances
 C4, M4, and D2 instances are EBS-optimized by default. EBS stands for Elastic
Block Storage, a storage option provided by AWS in which the IOPS* rate is
quite high. Therefore, when an EBS volume is attached to an optimized instance,
single-digit millisecond latencies can be achieved.
*IOPS (Input/Output Operations Per Second, pronounced eye-ops) is a performance
measurement used to characterize computer storage devices.
Cluster Networking Instances

 X1, M4, C4, C3, I2, G2 and D2 instances support cluster networking. Instances
launched into a common placement group are put in a logical group that
provides high-bandwidth, low latency between all the instances in the group.
 A placement group is basically a logical cluster where some select EC2
instances which are a part of that group can utilize up to 10Gbps for
single flow and 20Gbps for multi flow traffic in each direction.
 Instances which are not part of that group are limited to 5 Gbps in
multi-flow traffic. Cluster networking is ideal for high-performance
analytics systems.
Dedicated Instances
 They are the instances that run on single-tenant hardware dedicated to a single
customer.
 They are perfect for workloads where a corporate policy or industry regulation
requires that your instance should be isolated from any other customer’s
instance, therefore they go for their own separate machines, and their instances
are isolated at the hardware level.
Let’s understand this through an example. Suppose in our company Edureka, we have the
following tasks:
 Analysis of customer’s data
 Customer’s website activity, etc. should all be monitored in real-time.
There will be times when traffic on the website is minimal, so a very
powerful processor should not be considered, since it would be expensive
for the company while not being used every hour of the day. Hence, for
this task we might take t2 instances, because they give burstable CPU
performance, i.e. when traffic increases, CPU performance is increased
accordingly to meet the requirements.
 Our auto-response emailing system
 It should be quick, therefore we would require systems, where the
response time is as short as possible. This could be achieved by using EBS
optimized instances, as they offer high IOPS and hence, low latencies.
 The search engine on our website
 It should be able to sort the keywords and return relevant results,
therefore we might have 2 servers for this. One is the database and the
other server for processing the keywords. Therefore, the communication
between these servers should be at the maximum possible rate. To achieve

this, we can put them in a placement group and for that we have to use
Cluster Networking Instances.
 Some processes in every organisation are highly confidential
 Because these processes give us an edge over other companies, and no matter
how secure the servers may be, some policies still require isolation to be sure.
Therefore, we might use Dedicated Instances for these kinds of processes.
We now know about instances, so let’s learn how to launch them.

 Login to your AWS account and click on AWS EC2.


 Under create instance, click on launch instance.
Now you have to select an Amazon Machine Image (AMI). AMIs are OS templates,
and they provide the information needed to launch an instance.
When we want to launch an instance, we have to specify which AMI we want to use. It
could be Ubuntu, Windows Server, etc.
 The AMIs could be preconfigured or you can configure it on your own according
to your requirements.
 For preconfigured AMIs you have to select it from AWS marketplace.
 For setting up your own, go to quick-start and select one.
 While configuring you will reach a point where you have to select an EBS
storage option.

Elastic Block Storage (EBS) provides persistent block-level storage volumes which are used
with EC2. Each volume acts as a hard drive.
But why do we need EBS with EC2?
Just like your computer needs a hard drive, an EC2 instance needs a storage volume to
store the OS that it will be running. The options for EBS are:
Provisioned IOPS: This category is for mission-critical workloads; it provides high
IOPS rates.
General Purpose: It is for workloads which need a performance and cost balance.
Magnetic: It is for data which is accessed less frequently, and where retrieval time is
longer.
 After selecting a suitable option in EBS, we give the instance a name and then we
create a security group.
 A security group acts as a firewall to control inbound and outbound traffic. Each
security group has rules according to which the traffic is governed.
 Each instance, can be assigned up to 5 security groups.

 Finally, in the last step the console shows all the settings that you have configured;
you can verify them and launch.
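A security group’s inbound rules behave like a default-deny firewall, which a small sketch makes concrete (the rule format and helper below are hypothetical, not the EC2 API):

```python
import ipaddress

def inbound_allowed(rules, port, source_ip):
    """Toy security-group check: traffic is allowed only if some rule
    matches both the destination port and the source CIDR range."""
    ip = ipaddress.ip_address(source_ip)
    return any(rule["port"] == port and ip in ipaddress.ip_network(rule["cidr"])
               for rule in rules)

rules = [{"port": 22, "cidr": "203.0.113.0/24"},   # SSH only from the office range
         {"port": 443, "cidr": "0.0.0.0/0"}]       # HTTPS from anywhere
print(inbound_allowed(rules, 22, "203.0.113.7"))   # → True
print(inbound_allowed(rules, 22, "198.51.100.1"))  # → False
```

Anything not explicitly matched by a rule is dropped, which mirrors how security groups deny inbound traffic by default.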

Security in AWS EC2


To authenticate users to their instances, AWS employs a key pair method.
What is key pair?
Amazon EC2 uses public–key cryptography to encrypt and decrypt login information.
Public–key cryptography uses a public key to encrypt a piece of data, such as a
password, then the recipient uses the private key to decrypt the data. The public and
private keys are known as a key pair.
Additional Benefits: Every service from Amazon is designed keeping in mind the
customer. They claim to be the earth’s most customer-obsessed company. Having said
that let’s understand some other benefits of EC2.

Auto Scaling
Auto Scaling is an AWS EC2 service which automatically launches or
terminates EC2 instances based on user-defined policies, schedules, and health checks.

Elastic Load Balancing


Elastic Load Balancing (ELB) automatically distributes incoming application traffic
across multiple EC2 instances, in multiple Availability Zones.
Availability Zones are the locations where Amazon has set up its servers. Since
AWS has customers across the globe, it operates multiple Availability Zones
to reduce latency.
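The even distribution of traffic that ELB performs can be sketched with a simple round-robin rotation (a toy model; real ELB also health-checks targets and considers load):

```python
import itertools

def round_robin(instances):
    """Cycle through instances so requests spread evenly across them,
    e.g. across instances in multiple Availability Zones."""
    return itertools.cycle(instances)

targets = round_robin(["i-az1-a", "i-az1-b", "i-az2-a"])
print([next(targets) for _ in range(5)])
# → ['i-az1-a', 'i-az1-b', 'i-az2-a', 'i-az1-a', 'i-az1-b']
```

Because the rotation spans instances in different Availability Zones, the loss of one zone still leaves targets to route to.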
Elastic IP Addresses are static IP addresses which are associated with your AWS
account, they can be used to mask the failure of an instance by automatically
remapping your address to another working instance in your account.

AWS EC2 Pricing


In this AWS EC2 Tutorial, let’s start with the free things first!
AWS EC2 free tier allows 750 hrs of t2.micro instance usage per month!
The free tier for EC2 is valid for 1 year from sign-up of your AWS account.
There are basically 3 pricing options in EC2:
 Spot Instances
 On Demand Instances

 Reserved Instances
Spot Instances is a pricing option which enables you to bid on unused EC2 capacity.
The hourly price for a Spot Instance is set by AWS EC2, and it fluctuates according to
the availability of instances in a specific Availability Zone.
 Essentially, you set the maximum price per hour that you are willing to pay
for an instance.
 The moment the spot price for that instance rises above the maximum you
have set, the instance gets shut down automatically.
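This bidding behavior can be modeled in a few lines (a simplified sketch: the instance runs while the fluctuating spot price stays at or below your bid):

```python
def spot_uptime(bid, hourly_prices):
    """Hours the instance stays up: it runs until the spot price
    first exceeds the bid, then is terminated."""
    hours = 0
    for price in hourly_prices:
        if price > bid:
            break  # spot price rose above the bid: instance is shut down
        hours += 1
    return hours

# Bid $0.05/hour against a fluctuating price series
print(spot_uptime(0.05, [0.03, 0.04, 0.051, 0.02]))  # → 2
```

Note that even though the price drops back below the bid in hour four, the instance was already terminated, which is why Spot suits interruption-tolerant workloads.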
On Demand Instances are used when you want to pay for the hour, with no long term
commitments and upfront payments. They are useful for applications that may have
unpredictable workloads or for test applications that are being deployed for the first
time.
Reserved Instances provide you with significant discounts as compared to On Demand
Instances. With Reserved Instances you reserve instances for a specific period of time
with three payment options:
 No Upfront
 Partial Upfront
 Full Upfront
And two term lengths:
 One Year Term
 Three Year Term
The higher the upfront payment is, the more you save money.
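The trade-off can be seen by amortizing the upfront payment over the term (all dollar figures below are made up for illustration, not actual AWS prices):

```python
def effective_hourly(upfront, hourly_rate, term_hours):
    """Blended hourly cost of a Reserved Instance: the upfront payment
    amortized over the term, plus any recurring hourly charge."""
    return upfront / term_hours + hourly_rate

one_year = 365 * 24  # 8760 hours in a one-year term
no_upfront      = effective_hourly(0,   0.060, one_year)
partial_upfront = effective_hourly(250, 0.030, one_year)
all_upfront     = effective_hourly(480, 0.0,   one_year)
print(no_upfront > partial_upfront > all_upfront)  # → True
```

With these illustrative numbers, the blended hourly cost falls as more is paid upfront, which is the pattern the payment options are designed around.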

AWS EC2 Use Case


Next in this AWS EC2 Tutorial, let’s understand the whole EC2 instance creation
process through a use case in which we’ll be creating an Ubuntu instance for a test
environment.
 Login to AWS Management Console.

 Select your preferred Region. Select a region from the drop down, the selection
of the region can be done on the basis of the criteria discussed earlier in the blog.

 Select EC2 Service Click EC2 under Compute section. This will take you to EC2
dashboard.

 Click Launch Instance.
 Select an AMI: because you require a Linux instance, click Select in the row
for the basic 64-bit Ubuntu AMI.

 Choose an Instance
Select t2.micro instance, which is free tier eligible.

 Configure Instance Details.


Configure all the details and then click on add storage

 Add Storage

 Tag an Instance
Type a name for your AWS EC2 instance in the value box. This name, more correctly
known as a tag, will appear in the console when the instance launches. It makes it easy
to keep track of running machines in a complex environment. Use a name that you can
easily recognize and remember.

 Create a Security Group

 Review and Launch an Instance


Verify the details that you have configured to launch an instance.

 Create a Key Pair & launch an Instance
Next in this AWS EC2 Tutorial, select the option ‘Create a new key pair’ and give the
key pair a name. After that, download it to your system and save it for future use.

 Check the details of a launched instance.

 Converting Your Private Key Using PuTTYgen

PuTTY does not natively support the private key format (.pem) generated by Amazon
EC2. PuTTY has a tool called PuTTYgen, which can convert keys to the required PuTTY
format (.ppk). You must convert your private key into this format (.ppk) before
attempting to connect to your instance using PuTTY.
 Click Load. By default, PuTTYgen displays only files with the extension .ppk. To
locate your .pem file, select the option to display files of all types.

 Select your .pem file for the key pair that you specified when you launched your
instance, and then click Open. Click OK to dismiss the confirmation dialog box.
 Click Save private key to save the key in the format that PuTTY can use.
PuTTYgen displays a warning about saving the key without a passphrase. Click
Yes.
 Specify the same name for the key that you used for the key pair (for example,
my-key-pair). PuTTY automatically adds the .ppk file extension.
 Connect to EC2 instance using SSH and PuTTY
 Open PuTTY.exe

 In the Host Name box, enter Public IP of your instance.
 In the Category list, expand SSH.
 Click Auth (don’t expand it).
 In the Private Key file for authentication box, browse to the PPK file that you
downloaded and double-click it.
 Click Open.

 Type in ubuntu when prompted for the login ID.

Congratulations! You have launched an Ubuntu Instance successfully.

QLDB, a Blockchain Database from Amazon Web Services


One of the most common questions I get about blockchain is “What else is it good
for besides cryptocurrency?”

This is a fair question. Blockchain fulfills a need for distributed, immutable ledgers.
The technology brings trust to environments where falsification and counterfeiting
are especially dangerous. That’s why it was initially used for virtual money. The real
fear of counterfeiting in digital currency led to the development of a system where
change was difficult and trust built in.

Similar Problems, With a Key Difference


Other applications have the same problem, especially supply chains. For example,
when someone buys a part for an airplane, they really need to be sure it is a real
manufacturer part built to certain tolerances and using specific materials. A
counterfeit part can kill people.

There is a big difference between cryptocurrency and supply chains, though.


Unlike the paper and coin money that most everybody uses, cryptocurrency is
extranational and non-sovereign. No country or even bank controls this money.
Non-sovereign currency is an open and uncontrolled environment — anyone can
make the money — and hence the trust mechanism must be distributed. In the
case of cryptocurrency, both the trustworthy ledger and the distributed nature are
important, hence blockchain.

Supply chains, however, are mostly closed. Partners, suppliers, transporters and
consumers are all generally known. The problem still exists of an unscrupulous
member of the supply chain substituting bad products for good, and of needing to be
sure where and when products are, which is why the ledger is important. The
distributed part is not nearly as important, because the participants are known.

The Benefits of Blockchain, But for Closed Systems


The revelation that many blockchain capabilities don't actually apply to many
blockchain applications was what drove the new AWS product, Amazon Quantum
Ledger Database, or QLDB. AWS CEO Andy Jassy introduced the product at its
recent AWS re:Invent conference. QLDB brings the immutable, cryptographically

verifiable aspects of blockchain to a centralized database. It is an append-only
database like blockchain but held centrally like a traditional database.
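The core idea of an append-only, cryptographically verifiable journal can be sketched with a hash chain (an illustrative toy; QLDB’s actual journal and proof format differ):

```python
import hashlib
import json

class ToyLedger:
    """Append-only journal where each entry's hash covers the previous
    entry's hash, so any tampering breaks the chain."""
    def __init__(self):
        self.entries = []

    def append(self, record):
        prev = self.entries[-1]["hash"] if self.entries else ""
        payload = json.dumps(record, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.entries.append({"record": record, "hash": digest})

    def verify(self):
        prev = ""
        for entry in self.entries:
            payload = json.dumps(entry["record"], sort_keys=True)
            if entry["hash"] != hashlib.sha256((prev + payload).encode()).hexdigest():
                return False
            prev = entry["hash"]
        return True

ledger = ToyLedger()
ledger.append({"part": "A-100", "event": "manufactured"})
ledger.append({"part": "A-100", "event": "shipped"})
print(ledger.verify())  # → True
ledger.entries[0]["record"]["event"] = "forged"  # tamper with history
print(ledger.verify())  # → False
```

Because each hash depends on everything before it, rewriting an old record invalidates every later entry, which is what makes the ledger trustworthy without distribution.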

Amazon QLDB will enable many of the same applications as blockchain but with
less effort. For example, financial clearinghouses could use this technology instead
of blockchain because there is a central authority. Peer-to-peer payments is an
example of the type of financial application that could benefit from this technology.
The same holds true for tracking products from manufacturer to consumer.
Ultimately, it is in the retailers’ best interest to know where a product came from
and who it was sold to with confidence. The same could be said of tracking patient
histories in healthcare organizations.

None of these applications need to implement a distributed ledger since they are
closed systems. They do need to establish trust amongst participants. This is what
makes Amazon QLDB such a great idea. It allows companies to ingrain
trustworthiness in their applications without having to manage all the blockchain
overhead and developmental complexity.

Blockchain is a great technology but is too much for many applications. Amazon
QLDB provides just what the developer needs for certain classes of applications.

Amazon S3
Amazon Simple Storage Service (Amazon S3) is a scalable, high-speed, web-based
cloud storage service designed for online backup and archiving of data
and applications on Amazon Web Services. Amazon S3 was designed with a
minimal feature set and created to make web-scale computing easier for
developers.

How Amazon S3 works

Amazon S3 is an object storage service, which differs from block and file cloud storage.
Each object is stored as a file with its metadata included and is given an ID number.
Applications use this ID number to access an object. Unlike file and block cloud storage, a
developer can access an object via a REST API.
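That key-to-object model can be made concrete with a small in-memory sketch (a toy, not the S3 API):

```python
class ToyObjectStore:
    """Minimal object store: each key maps to a blob plus its metadata,
    mirroring how object storage differs from file/block storage."""
    def __init__(self):
        self._objects = {}

    def put_object(self, key, data, metadata=None):
        self._objects[key] = (data, dict(metadata or {}))

    def get_object(self, key):
        data, metadata = self._objects[key]
        return data, metadata

store = ToyObjectStore()
store.put_object("backups/db.dump", b"\x00\x01",
                 {"content-type": "application/octet-stream"})
data, meta = store.get_object("backups/db.dump")
print(meta["content-type"])  # → application/octet-stream
```

The flat key namespace is the point: there is no directory tree or block layout, just identifiers resolving to objects and their metadata, which is why a plain REST API suffices.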

The S3 cloud storage service gives a subscriber access to the same systems that Amazon
uses to run its own websites. S3 enables customers to upload, store and download
practically any file or object that is up to five terabytes (TB) in size, with the largest single
upload capped at five gigabytes (GB).
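Objects larger than the 5 GB single-upload cap are sent in parts via S3’s multipart upload API; the arithmetic can be sketched as follows (the helper is hypothetical):

```python
def plan_multipart(total_bytes, part_size=5 * 1024**3):
    """Number of parts needed when each part is at most `part_size`
    bytes (5 GB here, matching the single-upload cap)."""
    return -(-total_bytes // part_size)  # ceiling division

print(plan_multipart(12 * 1024**3))  # a 12 GB object needs 3 parts
```

In practice the multipart API also lets parts upload in parallel and retry individually, which matters as objects approach the 5 TB limit.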

Amazon S3 features

S3 provides 99.999999999% durability for objects stored in the service and supports
multiple security and compliance certifications. An administrator can also link S3 to other
AWS security and monitoring services, including CloudTrail, CloudWatch and Macie.
There's also an extensive partner network of vendors that link their services directly to S3.
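Eleven nines of durability translates into a very small expected loss, as a quick back-of-the-envelope calculation shows:

```python
durability = 0.99999999999   # eleven nines
loss_prob = 1 - durability   # ≈ 1e-11 per object per year
objects = 10_000_000_000     # ten billion stored objects
expected_lost = objects * loss_prob
print(expected_lost)         # ≈ 0.1 object lost per year
```

In other words, even at ten billion objects you would expect to lose roughly one object per decade on average.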

Data can be transferred to S3 over the public internet via access to S3 APIs. There's also
Amazon S3 Transfer Acceleration for faster movement over long distances, as well as AWS
Direct Connect for a private, consistent connection between S3 and an enterprise's own data
center. An administrator can also use AWS Snowball, a physical transfer device, to ship
large amounts of data from an enterprise data center directly to AWS, which will then
upload it to S3.

In addition, users can integrate other AWS services with S3. For example, an analyst can
query data directly on S3 either with Amazon Athena for ad hoc queries or with Amazon
Redshift Spectrum for more complex analyses.

Amazon S3 storage classes

Amazon S3 comes in three storage classes: S3 Standard, S3 Infrequent Access and Amazon
Glacier. S3 Standard is suitable for frequently accessed data that needs to be delivered with
low latency and high throughput. S3 Standard targets applications, dynamic websites,
content distribution and big data workloads.

S3 Infrequent Access offers a lower storage price for data that's needed less often, but that
must be quickly accessible. This tier can be used for backups, disaster recovery and long-
term data storage.

Amazon Glacier is the least expensive storage option in S3, but it is strictly designed for
archival storage because it takes longer to access the data. Glacier offers variable retrieval
rates that range from minutes to hours.

A user can also implement lifecycle management policies to curate data and move it to the
most appropriate tier over time.
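A lifecycle rule is essentially an age-based tier assignment, which a short sketch makes concrete (the thresholds are illustrative, not S3 defaults):

```python
def storage_class(age_days, ia_after=30, glacier_after=90):
    """Which tier a lifecycle rule would place an object in, based on
    its age: Standard, then Infrequent Access, then Glacier."""
    if age_days >= glacier_after:
        return "GLACIER"
    if age_days >= ia_after:
        return "STANDARD_IA"
    return "STANDARD"

print(storage_class(10), storage_class(45), storage_class(200))
# → STANDARD STANDARD_IA GLACIER
```

Real lifecycle policies attach such rules to a bucket or key prefix, so data migrates to cheaper tiers automatically as it ages.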

Working with buckets

Amazon does not impose a limit on the number of items that a subscriber can store;
however, there are Amazon S3 bucket limitations. An Amazon S3 bucket exists within a
particular region of the cloud. An AWS customer can use an Amazon S3 API to upload
objects to a particular bucket. Customers can configure and manage S3 buckets.

Protecting your data

User data is stored on redundant servers in multiple data centers. S3 uses a simple web-
based interface -- the Amazon S3 console -- and encryption for user authentication.

S3 buckets are kept private by default, but an admin can choose to make them
publicly accessible. A user can also encrypt data prior to storage. Rights may be
specified for individual users, who will then need approved AWS credentials to
download or access a file in S3.

When a user stores data in S3, Amazon tracks the usage for billing purposes, but it
does not otherwise access the data unless required to do so by law.

Comparing AWS vs Azure vs Google Cloud Platforms For
Enterprise App Development
Enterprise companies around the world have made the switch from self-hosted infrastructure
to public cloud configurations. While most enterprises will always need some on-premise
technology, they are developing their applications directly in the cloud. This allows the
development teams to stay product focused, rather than having to work on the infrastructure
to support the application. By moving to the cloud, enterprises have an existing physical
infrastructure that is continuously maintained and updated. This gives them more resources
and time to dedicate to the mobile app development project at hand.
Currently, there are three main cloud platform providers that take up the majority of market
share. They are Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform
(GCP). While Azure and GCP are growing consistently, AWS remains the clear leader in
market share. Each platform has its own features and pricing that could match your mobile
application development requirements. Keep reading to see how each platform compares
against the others.
Amazon Web Services
Features
The Amazon cloud platform offers almost every feature in the cloud computing industry.
Their cloud services allow you to gain easy access to computing power, data storage or other
functionality necessary for app developers. AWS has many products that fall under many
categories. In addition to the features mentioned above, they offer developer tools,
management tools, mobile services and applications services. As you can imagine, the

application services combined with the computing and database infrastructure are critical
components to a successful enterprise mobile app development team.
Pricing
In addition to a wide range of services, the AWS cloud has adjusted the pricing of cloud
computing since inception in 2006. Their prices are very competitive with all of the other
cloud providers. The pricing for their cloud services has continued to decrease due to
competition and pricing structures. AWS offers free tiers of service for startups and
individuals. It’s an easy way to try before you buy. Moreover, development teams can
purchase servers by the second, rather than by the hour. Depending on what services the team
uses, you can certainly find a reasonable AWS price structure that is lower than the cost of all
that infrastructure investment.
You can calculate your pricing here:
 General pricing
 Free tier
 Pricing calculator
 Total Cost of Ownership [TCO]
Advantages
On top of that, the Amazon Web Services cloud platform offers developers over 15 years of
enterprise infrastructure. Since the admin teams at AWS continuously work to improve the
platform, your development team can benefit from their experience. When it comes to
management capabilities and skills, AWS has some of the best talent in the market. Of course,
you would want to choose a platform that has plenty of experience to build on.
Microsoft Azure
Features
Similar to AWS cloud services, Azure offers a full variety of solutions for app developer needs.
The platform gives you the ability to deploy and manage virtual machines at scale. You can
process and compute at whatever capacity you need within just minutes. Moreover, if your
custom software needs to run large-scale parallel batch computing, it can handle it too. This is
actually a feature unique to AWS and Azure over the Google Cloud Platform. The
all-encompassing Azure features integrate into your existing systems and processes, offering
more power and capacity for enterprise development.
Pricing
When considering Azure pricing, you have to keep in mind that the costs will depend on the
types of products the development team needs. The hourly server cost can range from $0.099
per hour to $0.149 per hour. Of course, if you measure the costs by just per instance, the
prices might not seem consistent. However, the prices are pretty comparable to AWS when
you factor in the price per GB of RAM. As the main enterprise cloud service providers
compete for your business, the prices remain competitive across the board.
You can calculate your pricing here:

 General pricing
 Free tier
 Pricing calculator
 Total Cost of Ownership
Advantages
In addition to the full set of features and customizable pricing, the Azure platform is one of
the fastest cloud solutions available. If you are looking for a solution that excels in speed of
deployment, operation or scalability, then you might want to choose the Azure platform. They
are the leader in speed when it comes to cloud computing solutions.
Google Cloud Platform
Features
Once again, the Google Cloud Platform has a myriad of services for developers. As an
enterprise mobile app development team, you might be interested in the App Engine product.
This allows an app developer to create applications without dealing with the server. It’s a fully
managed solution for developing applications in an agile manner. Furthermore, you can
perform high level computing, storage, networking and databases with GCP. These are all
great products to use depending on the type of app development you are working on.
Although Google has somewhat fewer services than the competitors, you can find all the
requirements for mobile application development projects.
Pricing
Where GCP may fall behind in additional features, it makes up for in cost efficiency. The
platform also has pay as you go pricing, billing to the “per second” of usage. Setting GCP
apart, it offers discounts for long term usage that starts after the first month. This is great if
you need to start a new mobile app development project and want to keep costs low. By
contrast, it could take over a year to get long term discounts on the other cloud service
providers. Clearly, Google is putting pressure on the competing cloud providers to keep
market prices lower.
You can calculate your pricing here:
 General pricing
 Free tier
 Pricing calculator
 Total Cost of Ownership
Advantages
As GCP continues to grow in the cloud industry, they offer another level of security. Since
Google is no stranger to enterprise level security, you can rely on their secure solutions. They
have over 500 employees that are dedicated to security protection. You will get data
encryption, multiple layers of authentication and third party validations. For developers who
need an extra buffer of security, the Google Cloud might be the best platform for you.

When comparing AWS vs Azure vs Google Cloud, you have many features and costs to
consider. Rather than trying to pick one solution, use enterprise cloud services that fit your
development needs. This can be a single cloud provider, or you can combine services from
two or three of these providers. Since the costs are relatively comparable, find the right mix of
solutions to fit your enterprise development requirements.

Google Cloud Platform:

Cloud IAM:

What is Cloud IAM? In short, it refers to the ability to manage user identities and their access to IT
resources from the cloud. Why should cloud IAM be a priority? To answer that question, let’s take a
look at the evolution of traditional identity and access management (IAM) solutions and compare them
to cloud alternatives.
Evolution of Identity and Access Management

IAM solutions have been a foundational component of IT infrastructure for many years now. In fact,
the modern era of IAM dates back to 1993, when Tim Howes and his colleagues at the University of
Michigan introduced the Lightweight Directory Access Protocol (LDAP). LDAP was designed as a
lightweight replacement to the Directory Access Protocol (DAP), which was a component of the
forerunner directory services standard known as X.500. LDAP worked so well that LDAPv3 would
become the internet standard for directory services in 1997, and directly influenced two powerful IAM
platforms: OpenLDAP™ and Microsoft® Active Directory® (AD).
Today, we know that Active Directory has been far more dominant than OpenLDAP in the IAM
market. Of course, this is primarily because Microsoft Windows® was effectively the only major

enterprise operating system in use in the late 1990s, when both AD and OpenLDAP were introduced.
At the time, it was common for all of the systems, applications, files, and networks in an enterprise IT
environment to be Windows-based, which gave AD a built-in advantage. In most cases, IT simply
implemented AD, and they could basically manage all of the users and IT resources in their
environment.
The IT landscape started to change when a wide variety of non-Windows resources were introduced in
the mid-2000s. This included Mac® systems, web applications like Google Apps (aka G Suite™),
Linux® servers at AWS®, Samba file servers and NAS appliances, and a lot more. Even the network
itself switched from a wired connection to WiFi. All of these changes and more have rendered legacy
solutions like AD (and OpenLDAP) far less effective in the modern enterprise. As a result, IT
administrators are now looking to cloud IAM solutions as possible alternatives.
Why Cloud IAM?

The advantages of cloud IAM platforms are easy to recognize. For example, while legacy IAM solutions such as AD were

primarily focused on one platform (i.e., Windows), cloud IAM platforms such as JumpCloud®
Directory-as-a-Service® support all three major platforms (Windows, Mac, Linux). In fact, the
JumpCloud platform in particular can securely manage and connect users to virtually any IT resource –
regardless of their platform, provider, protocol, or location. More specifically, that includes systems,
applications, files, and networks, which can all be managed from a single cloud-based directory
services platform that doesn’t require anything on-prem. As a result, IT admins can enjoy a centralized
identity and access management experience delivered as a cloud-based service that spans the breadth of
their IT network.

Google is launching Cloud Pub/Sub today, its backend messaging service that makes it
easier for developers to pass messages between machines and to
gather data from smart devices. It’s basically a scalable messaging
middleware service in the cloud that allows developers to quickly
pass information between applications, no matter where they’re
hosted. Snapchat is already using it for its Discover feature and
Google itself is using it in applications like its Cloud Monitoring
service.
Pub/Sub was in alpha for quite a while. Google first (quietly) introduced it at its I/O
developer conference last year, but never made a big deal about the service. Until now, the
service was in private alpha; starting today, all developers can use it.
Using the Pub/Sub API, developers can create up to 10,000 topics (that’s the entity the
application sends its messages to) and send up to 10,000 messages per second. Google says
notifications should go out in under a second “even when tested at over 1 million messages
per second.”
The typical use cases for this service, Google says, include
balancing workloads in network clusters, implementing

asynchronous workflows, logging to multiple systems, and data
streaming from various devices.
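The topic-and-subscriber model behind those use cases can be sketched with a toy in-memory message bus. This is purely an illustration of the concept (topics, subscriptions, fan-out), not the real Cloud Pub/Sub client API:

```javascript
// Toy in-memory sketch of publish/subscribe: applications publish to a named
// topic, and every subscriber on that topic receives a copy of the message.
// Illustrative only; not the Cloud Pub/Sub API.
class MessageBus {
  constructor() {
    this.topics = new Map(); // topic name -> list of subscriber callbacks
  }
  createTopic(name) {
    if (!this.topics.has(name)) this.topics.set(name, []);
  }
  subscribe(topicName, handler) {
    this.topics.get(topicName).push(handler);
  }
  publish(topicName, message) {
    // Fan out: each subscription gets its own delivery of the message.
    for (const handler of this.topics.get(topicName)) handler(message);
  }
}

const bus = new MessageBus();
bus.createTopic('device-telemetry');

const received = [];
bus.subscribe('device-telemetry', (msg) => received.push(['logger', msg.temp]));
bus.subscribe('device-telemetry', (msg) => received.push(['dashboard', msg.temp]));

bus.publish('device-telemetry', { deviceId: 42, temp: 21.5 });
console.log(received.length); // 2 (one delivery per subscriber)
```

In the real service, subscriptions are durable and deliveries are made over the network; the fan-out semantics, however, are the same idea.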
During the beta period, the service is available for free. Once it
comes out of beta, developers will have to pay $0.40 per million for
the first 100 million API calls each month. Users who need to send
more messages will pay $0.25 per million for the next 2.4 billion
operations (that’s about 1,000 messages per second) and $0.05 per
million for messages above that.
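Those tiers reduce to simple arithmetic. Here is a sketch of the monthly cost for a given call volume, using only the beta-announcement prices quoted above (illustrative, not an official pricing formula):

```javascript
// Cost estimator for the tiered prices quoted above: $0.40 per million for the
// first 100 million calls, $0.25 per million for the next 2.4 billion, and
// $0.05 per million beyond that. Illustrative arithmetic only.
function pubsubMonthlyCost(apiCalls) {
  const M = 1e6;
  const tier1 = Math.min(apiCalls, 100 * M);
  const tier2 = Math.min(Math.max(apiCalls - 100 * M, 0), 2400 * M);
  const tier3 = Math.max(apiCalls - 2500 * M, 0);
  return (0.40 * tier1 + 0.25 * tier2 + 0.05 * tier3) / M;
}

console.log(pubsubMonthlyCost(100e6)); // 40 (all in the first tier)
console.log(pubsubMonthlyCost(500e6)); // 140 (40 + 400M at $0.25/M)
```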
Now that Pub/Sub has hit beta — and Google even announced the
pricing for the final release — chances are we will see a full launch
around Google I/O this summer.

Image Credits: Carlos Luna / Flickr under a CC BY 2.0 license.
Google App Engine is a Platform as a Service (PaaS) product that provides Web
app developers and enterprises with access to Google's scalable hosting and tier 1
Internet service.

The App Engine requires that apps be written in Java or Python, store data in Google BigTable
and use the Google query language. Non-compliant applications require modification to use App
Engine.

Google App Engine provides more infrastructure than other scalable hosting
services such as Amazon Elastic Compute Cloud (EC2). The App Engine also
eliminates some system administration and developmental tasks to make it easier
to write scalable applications.

Google App Engine is free up to a certain amount of resource usage. Users
exceeding the per-day or per-minute usage rates for CPU resources, storage,
number of API calls or requests and concurrent requests can pay for more of these
resources.

The Fundamentals of Google Compute Engine (GCE)

Google Compute Engine (GCE) is part of Google’s Infrastructure-as-a-Service (IaaS) offering, where you can build
high-performance, fault-tolerant, massively scalable compute nodes to handle your application’s needs. Virtual
Machine instances are provisioned in GCE and can be pre-packed or fully customized. In this article, we’ll cover
some of the fundamentals of VM Instances within GCE.

Virtual Machine (VM) Instances


The terms Virtual Machine and Instance are synonymous in GCP. VM Instances are composed of an Operating
System and infrastructure resources such as CPU, Memory, Disk, and Networking. Creating VM instances can be
done using the GCP console (console.cloud.google.com), using the command line via Cloud Shell or the SDK, or via
the REST API.
When creating a VM instance, you have the option of choosing a predefined machine type or a custom machine type.

Machine Types
Machines types are templates of virtualized hardware that will be available to the VM instance. These resources
include the CPU, Memory, Disk capabilities, and so on.
Predefined machine types are managed by Google, and fall into 4 categories:
Standard machine types
Ideal for typical balanced instances with respect to RAM and CPU
Have 3.75GB of RAM per virtual CPU
High-memory machine types
Ideal for applications that require more memory
Have 6.5GB of RAM per virtual CPU
Shared-core machine types
These machines have one virtual CPU on a single hyper-thread of a single host CPU that is running
the instance. Ideal for non-resource-intensive applications.
Very cost effective
Large machine types
Ideal for resource-intensive workloads
Up to 1TB of memory

Custom machine types


Say that none of these predefined machine types match your requirements. No worries, you can completely customize
the machine type to fit your vCPU and Memory needs. This is ideal if you have a workload that requires more
processing power or memory than what is offered by the Google-provided types, or if you need GPUs. You may pay a
small premium to use custom machine types.
https://cloud.google.com/compute/docs/machine-types

Disks
After choosing a machine type which covers CPU and Memory, it’s time to choose a disk option. You have a few
options when choosing a disk type for your VM instance. The disk you choose will be your single root disk in which
your image is loaded during the boot process. Do you choose a persistent disk or a local disk?

Persistent Disks
Persistent disks are network-based “disks” abstracted to appear as a block device. Data is durable, meaning the data
will remain as you left it after reboots and shutdowns. Available as either a standard hard disk drive or as a solid state
drive (SSD), persistent disks are located independently of the VM instances, which means they can be detached and
reattached to other instances. You have the option to keep your disk when deleting your instance, or to have it
deleted along with the instance.
Standard persistent disks
Ideal for efficient and reliable block storage
Max 64TB per instance
Only available within a single zone
SSD persistent disks
Ideal for fast and reliable block storage
Max 64TB per instance
Only available within a single zone
Other key features:
Redundancy is built-in, protecting your data from unforeseen failures.
GCE automatically encrypts all data on the persistent disk, protecting integrity with cipher keys
You can resize disks and migrate instances with zero downtime
Disks scale in performance as capacity increases
Supports snapshots

Local SSD
Local SSD disks are physically attached to VM instances. These will offer the highest possible IOPS and are used for
seriously intensive workloads.
Local SSD
Ideal for very high-performance local block storage
Available as SCSI or NVMe
Max 3TB (which is a total of eight 375GB disks)
Available only to a single Instance, meaning it cannot be reattached elsewhere
Persistent only as long as you do not stop or terminate your instance
Does not support snapshots
More often than not, you’ll be choosing a Standard persistent disk for your VMs. So, what’s next?

Images
Images contain a bootloader, Operating System, file system structure, and any software customizations needed for
your deployment. The image describes what actually gets loaded onto the root disk. Tons of public images are
available from Google and other authorized third-party vendors. Google Compute Engine (GCE) uses the selected
image to create a persistent boot disk for each instance.
Some public images include:
 CentOS
 Container-Optimized OS from Google
 CoreOS
 Debian
 RHEL
 SLES
 Ubuntu
 FreeBSD
 Windows Server 2008, 2012, 2016
 SQL Server on Windows Server

Alternatively, you can create your own custom images! Custom images can be created from a VM in your
environment that has all the necessary settings and additional software configured. You can even import custom
images from on-prem environments, or another cloud provider such as AWS.

Zones
You actually choose the zone at the very beginning of creating a new instance. We talk more about Regions and Zones
in another article, but what is important to know here is that each zone offers different CPU architectures, which
could be a deciding factor in where you run your applications. The processor families include Broadwell, Sandy
Bridge, Skylake, Ivy Bridge, and Haswell. If you have a requirement for a specific processor, just visit the
documentation to validate zone availability. Otherwise, just pick the zone that is closest to you or your customers.

What if I choose a zone and want it changed afterward?
You can absolutely move VM instances to another zone, but it will require a short outage. To do this manually, you’ll
snapshot the disk on the instance you wish to move. Next, create a new disk in the desired zone from the snapshot.
Create a new VM instance in the new zone and attach the new disk. Update any IPs and references for a clean
migration.
Alternatively, you can do this automatically with the gcloud compute instances move command.

Networking & Firewall


Each VM will have a single interface by default. This interface can be placed in a particular subnet of a particular
network with respect to the zone that you choose. You can add multiple interfaces if desired, but that is more of an
advanced topic.
By default, the primary internal IP will be set to Automatic (DHCP) and the External IP set to Ephemeral (DHCP).

A couple of easy checkboxes can get your VM set up with the proper firewall rules for HTTP and HTTPS traffic as
well.

Availability Policy
There are three categories under the Availability Policy when creating a new VM — Preemptibility, Automatic
Restart, and On host maintenance.

Preemptibility
Off by default, a preemptible VM is an affordable, short-lived instance ideal for batch jobs or fault-tolerant workloads.
They’re up to 80% cheaper than regular instances, so if your application can handle the random termination of VMs at
any time, then give this a look. Some common applications that use preemptible VMs are modeling or simulations,
rendering, media transcoding, big data, continuous integration, and web crawling.

Automatic Restart
On by default, this feature of GCE can automatically restart VM instances if they are terminated due to a
non-preemption, non-user-initiated reason. Some examples are maintenance events or hardware/software failures.

On host maintenance
On by default, this feature of GCE can automatically migrate your VM instances to other hardware during
infrastructure maintenance periods to ensure your VMs operate with no downtime. This is called Live Migration and
is a key differentiator of GCP. Alternatively, you can set this to turn off the VM.

Other Options
There are various other options you can choose from, such as automation with startup scripts, metadata tags, SSH key,
and so on, but these are beyond the scope of fundamentals.

Accessing the VM
After creating the VM, you’ll certainly want to access it somehow – but how? Well, if your VM is running Linux, you
can access it via the console through SSH, from another VM running CloudShell via the Cloud SDK, or from your
computer via SSH. If your VM is running Windows, you can use an RDP client or Powershell terminal.

Pricing
Lastly, let’s look at some GCE fundamentals when it comes to pricing. Google handles charges and discounts
differently than AWS and Azure. With GCP you’re always billed for the first 10 minutes and then for each minute
afterward for the life of the machine, rounded up to the nearest minute. The console will summarize this into costs per
month so you don’t have to do the per-minute math.
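The rounding rule described above fits in a couple of lines. This is a sketch of the billing rule as stated in this article, not an official pricing formula:

```javascript
// Billable minutes under the rule described above: a 10-minute minimum,
// then per-minute billing, rounded up to the nearest minute.
function billedMinutes(runtimeSeconds) {
  return Math.max(10, Math.ceil(runtimeSeconds / 60));
}

console.log(billedMinutes(300));  // 10: a 5-minute run still bills the 10-minute minimum
console.log(billedMinutes(3725)); // 63: 62m5s rounds up to the next whole minute
```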
Discounts are extended for sustained use, meaning if your instances stay running for a significant portion of the
month, a discount is automatically applied.
Another neat discount is what Google refers to as Inferred instance discounts. If you have multiple VMs of the same
machine type in the same zone, they are combined as if they were a single machine, giving you the maximum
available discount.

A sustained usage discount applies to custom machine types as well. Google Compute Engine will perform
calculations to match the best qualifying discount for usage in the month. In the example given in their
documentation, two custom machine type instances are split to provide discounts on vCPU and Memory when
stretched across the whole month instead of just 15 days.

Google Cloud Dataproc — Google’s managed Hadoop, Spark, and Flink offering. In what seems
to be a fully commoditized market at first glance, Dataproc manages to create significant
differentiated value that bodes to transform how folks think about their Hadoop workloads.
Jobs-first Hadoop+Spark, not Clusters-first
The typical mode of operation of Hadoop — on premise or in the cloud — requires you to deploy a cluster,
and then you proceed to fill up said cluster with jobs, be it MapReduce jobs, Hive queries,
SparkSQL, etc. Pretty straightforward stuff.
The standard way of running Hadoop and Spark.
Services like Amazon EMR go a step further and let you run ephemeral clusters, enabled by
separation of storage and compute through EMRFS and S3. This means that you can discard
your cluster while keeping state on S3 after the workload is completed.
Google Cloud Platform has two critical differentiating characteristics:
 Per-minute billing (Azure has this as well)
 Very fast VM boot up times

When your clusters start in well under 90 seconds (under 60 seconds is not unusual), and
when you do not have to worry about wasting that hard-earned cash on your cloud provider’s
pricing inefficiencies, you can flip this cluster->jobs equation on its head. You start with a
job, and you acquire a cluster as a step in job execution.
If you have a MapReduce job, as long as you’re okay with paying the 60 second initial boot-up
tax, rather than submitting the job to an already-deployed cluster, you submit the job to
Dataproc, which creates a cluster on your behalf on-demand. A cluster is now a means to an
end for job execution.
Demonstration of my exquisite art skills, plus illustration of the jobs before clusters concept realized
with Dataproc.
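The jobs-first flow above can be sketched in code. Every name here (createCluster, submit, deleteCluster) is a hypothetical stand-in used to illustrate the lifecycle, not the real Dataproc API:

```javascript
// Sketch of jobs-first execution: the cluster is created as a step of running
// the job and torn down afterwards, so nothing sits idle accruing charges.
async function runJob(job, cloud) {
  const cluster = await cloud.createCluster({ workers: job.workers }); // ~60-90s on Dataproc
  try {
    return await cluster.submit(job);
  } finally {
    await cloud.deleteCluster(cluster); // no long-lived cluster left billing
  }
}

// A stub "cloud" so the flow can run locally:
const cloud = {
  async createCluster(spec) {
    return {
      spec,
      submit: async (job) => `ran ${job.name} on ${spec.workers} workers`,
    };
  },
  async deleteCluster(cluster) { cluster.deleted = true; },
};

runJob({ name: 'wordcount', workers: 4 }, cloud).then(console.log);
// prints "ran wordcount on 4 workers"
```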
Again, this is only possible with Google Dataproc, only because of:
 high granularity of billing (per-minute)
 very low tax on initial boot-up times
 separation of storage and compute (and ditching HDFS as primary store).
Operational and economic benefits are obvious and easily realized:
 Resource segregation through tenancy segregation avoids non-obvious bottlenecks and
resource contention between jobs.
 Simplicity of management — no need to actually manage the cluster or resource
allocation and priorities through things like YARN resource manager. Your
dev/stage/prod workloads are now intrinsically separate — and what a pain that is to
resolve and manage elsewhere!
 Simplicity of pricing — no need to worry about rounding up to nearest hour.
 Simplicity of cluster sizing — to get the job done faster, simply ask Dataproc to deploy
more resources for the job. When you pay per-minute, you can start thinking in terms
of VM-minutes.
 Simplicity of troubleshooting — resources are isolated, so you can’t blame your
problems on other tenants.
I’m sure I’m forgetting others. Feel free to leave a comment here to add color. Best response
gets a collectors’ edition Google Cloud Android figurine!
Dataproc is as close as you can get to serverless and cloud-native pay-per-job with VM-based
architectures — across the entire cloud space. There’s nothing even close to it in that regard.
Dataproc does have a 10-minute minimum for pricing. Add the sub-90 second cluster creation
timer, and you rule out many relatively lightweight ad-hoc workloads. In other words, this
works for big serious batch jobs, not ad-hoc SQL queries that you want to run in under 10
seconds. I write on this topic here. (Do let us know if you have a compelling use case that
leaves you asking for less than a 10-minute minimum.)
The rest of the Dataproc goodies
Google Cloud doesn’t stop there. There’s a few other benefits of Dataproc that truly make your
life easier and your pockets fuller:
 Custom VMs — if you know the typical resource utilization profile of your job in terms
of CPU/RAM, you can tailor-make your own instances with that CPU/RAM profile.
This is really really cool, you guys.

 Preemptible VMs — I wrote on this topic recently. Google’s alternative to Spot
instances is just great. Flat 80% off, and Dataproc is smart enough to repair your jobs
in case instances go away. I beat this topic to death in the blog post, and in my biased
opinion it’s worth a read on its own.
 Best pricing in town. Google Compute Engine is the industry price leader for
comparably-sized VMs. In some cases, up to 40% less than EC2.
 Gobs of ephemeral capacity — Yes, you can run your Spark jobs on thousands of
Preemptible VMs, and we won’t make you sign a big commitment, as this gentleman
found out (TL;DR: running 25,000 Preemptible VMs).
 GCS is fast fast fast — When ditching HDFS in favor of object stores, what matters is
the overall pipe between storage and instances. Mr. Jornson details performance
characteristics of GCS and comparable offerings here.
Dataproc for stateful clusters
Now if you are running a stateful cluster with, say Impala and Hbase on HDFS, Dataproc is a
nice offering here too, if for some reason you don’t want to run Bigtable + BigQuery.
If you are after the biggest baddest disk performance on the market, why not go with
something that resembles RAM more than SSD in terms of performance — Google’s Local
SSD? Mr. Dinesh does a great job comparing Amazon’s and Google’s offerings here. Cliff notes
— Local SSD is really, really, really good — really.
Finally, Google’s Sustained Use Discounts automatically rewards folks who run their VMs for
longer periods of time, up to 30% off. No contracts and no commitments. And, thank
goodness, no managing your Reserved Instance bills.
You win if you use Google’s VMs for short bursts, and you win when you use Google for longer
periods.
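One way to picture how Sustained Use Discounts accrue: historically, GCE billed each successive quarter of the month at a lower rate. The tier values below are an assumption for illustration, chosen to be consistent with the "up to 30% off" figure above; they are not a pricing quote:

```javascript
// Sketch of sustained use discounting: each successive 25% block of the month
// is billed at a lower rate (assumed tiers: 100%, 80%, 60%, 40% of base),
// which works out to 30% off for a full month of usage.
function sustainedUseMultiplier(fractionOfMonth) {
  const tiers = [1.0, 0.8, 0.6, 0.4]; // assumed rate per 25% block of the month
  let billed = 0;
  let remaining = fractionOfMonth;
  for (const rate of tiers) {
    const block = Math.min(remaining, 0.25);
    billed += block * rate;
    remaining -= block;
  }
  return billed / fractionOfMonth; // effective fraction of the base price paid
}

console.log(sustainedUseMultiplier(0.25)); // 1: no discount for a quarter of the month
console.log(sustainedUseMultiplier(1.0));  // about 0.7, i.e. up to 30% off for the full month
```

The key property, as the text notes, is that this happens automatically: no contracts, no commitments, no reserved-instance bookkeeping.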
Economics of Dataproc
We discussed how Google’s VMs are typically much cheaper through Preemptible VMs,
Custom VMs, Sustained Use Discounts, and even lower list pricing. Some folks find the
difference to be 50% cheaper!
Two things that studying Economics taught me (put down your pitchforks, I also did Math) —
the difference between soft and hard sciences, and the ability to tell a story with two-
dimensional charts.
Let’s assume a worst-case scenario, in which EMR and Dataproc VM prices are equal. We get
this chart, which hopefully requires no explanation:
Which line would you rather be on?
If you believe our good friend thehftguy’s claims that Google is 50% cheaper (after things like
Preemptible VMs, Custom VMs, Sustained Use Discounts, etc), you get this compelling chart:
Same chart, but with some more aggressive assumptions.
When you’re dishing out your precious shekels to your cloud provider, think of all this extra
blue area that you’re volunteering to pay for, entirely unnecessarily. This is why many of
Dataproc’s customers don’t mind paying egress from their non-Google cloud vendors to GCS!
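The shape of those charts falls straight out of billing granularity. A numeric sketch of the same comparison for a handful of short jobs, at an assumed equal rate of $1.00 per instance-hour (the worst case for Dataproc described above):

```javascript
// Per-hour rounding (EMR-style at the time) vs per-minute billing with a
// 10-minute minimum (Dataproc-style), at an assumed equal hourly rate.
function perHourCost(jobMinutes)   { return Math.ceil(jobMinutes / 60) * 1.0; }
function perMinuteCost(jobMinutes) { return (Math.max(10, jobMinutes) / 60) * 1.0; }

const jobs = [12, 25, 40, 61]; // four short ad-hoc jobs, in minutes
const hourlyTotal = jobs.reduce((sum, m) => sum + perHourCost(m), 0);
const minuteTotal = jobs.reduce((sum, m) => sum + perMinuteCost(m), 0);

console.log(hourlyTotal); // 5 (each job rounds up to whole instance-hours)
console.log(minuteTotal); // about 2.3 instance-hours
```

Even with identical VM prices, the per-minute model bills less than half as much for this ephemeral, jobs-first workload; that gap is the blue area in the chart.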
Summary
Google Cloud has the advantage of a second-comer. Things are simpler, cheaper, and faster.
Lower-level services like instances (GCE) and storage (GCS) are more powerful and easier to
use. This, in turn, lets higher-level services like Dataproc be more effective:

 Cheaper — per-minute billing, Custom VMs, Preemptible VMs, sustained use
discounts, and cheaper VMs list prices.
 Faster — rapid cluster boot-up times, best-in-class object storage, best-in-class
networking, and RAM-like performance characteristics of Local SSDs.
 Easier — lots of capacity, less fragmented instance type offerings, VPC-by-default, and
images that closely follow Apache releases.
Fundamentally, Dataproc lets you think in terms of jobs, not clusters. You start with a job, and
you get a cluster as just another step in job execution. This is a very different mode of
thinking.

Serverless Showdown: AWS Lambda vs Firebase Google Cloud Functions

If 2016 was the year of microservices, 2017 is shaping up to be the year of serverless
computing, most notably through AWS Lambda and Google Cloud Functions created through
Firebase.
Cloud Functions for Firebase were announced a month ago, bringing them into direct
competition with AWS’s offerings. This, of course, inevitably invites benchmarks and
comparisons between AWS’s and Google’s offerings. Let’s walk through the two.
Wait, what is serverless computing?
Ah, the requisite explanation.
Traditional backends have been created using monolithic servers, where a single server
may have several different responsibilities under a single codebase. Request comes in, server
executes some processing, response comes out. The same server might be responsible for
authentication, handling file uploads, and keeping track of user profiles. The key mechanic is
that if two different requests come in for two different resources, it gets handled by a single
codebase. This server might run on dedicated or virtualized machinery (or several machines!),
and persistently runs over the span of days, weeks, or months.
More recently, we’ve seen the introduction of microservices as a popular architectural
decision. With a microservices approach, there are still distinct servers, but many different
servers, each of which handles a single purpose. A single service might be in charge of user
authentication, and another one may handle file uploads. Microservice architectures are
characterized by many separate codebases and incremental deployments of each individual
service. The idea here is that a service which isn’t modified often is less likely to break, along
with providing a more logical separation of responsibilities. Like monolithic deployments,
microservices are traditionally long-running processes being executed on dedicated or
virtualized machinery.
Finally, serverless architectures. Think of them as a natural evolution or extension to
microservices.

This is a microservice architecture driven to the extreme. A single chunk of code, or ‘function’
is executed anytime a distinct event occurs. This event might be a user requesting to login, or
a user attempting to upload a file. These functions are traditionally very short running in
nature — the function ‘wakes up’, executes some amount of work with a duration of 10 milliseconds
to 10 seconds, and is then terminated automatically by the service provider. No persistence,
no dedicated machinery — in effect, you have no idea where your code is running at any given
time. The benefit to serverless architectures shares some of the benefits of a microservices
based approach, where each function has some distinct responsibility and logical separation.
The Test App
To compare the two services, I wrote a small React Native application with the intent of
providing one-time-password authentication.
Rather than expecting a user to enter a tedious email and password combination, the user is
expected to enter just their phone number. Once we have their phone number in hand, we
generate a short six-digit token, then text it to the user via SMS. We then expect the user to
enter the code back into our app. If they enter the correct code, great, they are now
authenticated.
Given that the code is the key authenticating factor, it’s something that clearly shouldn’t be
generated or stored directly on the user’s mobile device. Instead, we should generate and store
the code somewhere else, somewhere that the user doesn’t have any type of read access to.
Enter our serverless functions!
It’s always important to plan out the different cloud functions that will be created. In this case,
I see three clear phases of the login process where some amount of logic must be executed in a
secure environment:
1. Create a new user (sign up)
2. Generate, save, and text a new login code (sign in)
3. Verify a login code
Each function we create is assigned a unique name, usually to identify its purpose. I followed a
simple nomenclature, opting for ‘createUser’, ‘requestOneTimePassword’, and
‘verifyOneTimePassword’.
With these three functions in mind, let’s walk through the deployment process
Function Creation — Lambda
Creation of functions with Lambda can take two forms, either direct access of the Lambda
Console or through the Serverless framework. I chose to use the Serverless framework, as it
made deployment (later) much easier.
Serverless encourages centralizing all configuration of your functions into a single YML file.
The YML file requires the function name as it will be displayed on the Lambda console, the
name of the function in your code base, and some configuration on when to execute the
function. In our case, we wanted to execute the function on an incoming HTTP request with a
method of POST.
Here’s the relevant snippet of config from the YML file for creating a new user:
functions:
  userCreate:
    handler: handler.userCreate
    events:
      - http:
          path: users
          method: post
          integration: lambda-proxy
          cors: true

One of the interesting aspects of AWS Lambda is that it is truly built assuming that you’ll have
any type of event driving a function invocation, not just an incoming HTTP request issued by
a client device. Other valid triggers might be a file upload to S3, or a deploy to some other
service on AWS. Even though it’s clear to you and me that we only want to run the function
with an incoming HTTP request, we still have to be awfully explicit.
I found writing the actual function to require a little more boilerplate than I’d like:
const firebase = require('./firebase');
const helpers = require('./helpers');
const handleError = helpers.handleError;
const handleSuccess = helpers.handleSuccess;

module.exports = function(event, context, callback) {
  const body = JSON.parse(event.body);

  if (!body.phone) {
    return handleError(context, { error: 'Bad Input' });
  }

  const phone = String(body.phone).replace(/[^\d]/g, "");

  firebase.auth().createUser({ uid: phone })
    .then(user => handleSuccess(context, { uid: phone }))
    .catch((err) => handleError(context, { error: 'Email or phone in use' }));
}

You will notice a reference to firebase in here; I am still using Firebase for user management,
even though the app is hosted on AWS infrastructure.
Yep, the request body has to be manually parsed. You’ll also notice that I made some
‘handleSuccess’ and ‘handleError’ helpers, to avoid some otherwise awful boilerplate. Here’s
‘handleSuccess’:
function handleSuccess(context, data) {
  context.succeed({
    "statusCode": 200,
    "headers": { "Content-Type": "application/json" },
    "body": JSON.stringify(data)
  });
}

Again, don’t expect Lambda to handle JSON encoding or decoding for you; this is all manual.
Function Creation — Google Cloud Functions
Project creation with Cloud Functions was clearly easier. It’s clear that the maintainers of
this project assume that the most common use case is handling incoming HTTP requests, so
there isn’t a tremendous amount of configuration needed to route a particular event to a
particular function.
Generation of the initial project was done by using the firebase CLI, which I hadn’t been
previously familiar with. The CLI generates an entire Firebase project, which allows hosting
important configuration like your security rules in a VCS, rather than relying entirely upon the
console rule editor.
Definition of the functions took place inside of a Javascript file, where each export is
essentially assumed to be a deployable function. For example:
exports.createUser = functions.https.onRequest(createUser);

The actual function creation was far more straightforward.


const admin = require('firebase-admin');

module.exports = function(req, res) {
  if (!req.body.phone) {
    return res.status(422).send({ error: 'Bad Input' });
  }

  const phone = String(req.body.phone).replace(/[^\d]/g, "");

  admin.auth().createUser({ uid: phone })
    .then(user => res.send(user))
    .catch(err => res.status(422).send({ error: err }));
}

Fans of Express JS will immediately be at home with the req, res function signature. The
request and response objects use an identical API to Express’, which makes for a
straightforward learning curve. Also notice no need for complicated boilerplate around
handling responses.
Winner: Google Cloud Functions
Creating functions with Firebase is a clear winner. There’s less upfront configuration required,
along with a far more palatable API. Of course, the caveat is that Firebase’s amount of
configuration is smaller because there are fewer function triggers available on Firebase. No
need to specify that a function should be executed on an incoming HTTP request when there
are only six different ways of triggering them.
Deployment
Certainly not much to say here, as the deployment process is nearly identical on both
platforms. Having set up the initial project with Serverless, deployment on the AWS side was
as easy as a terminal command:
serverless deploy

Firebase deployment was similar, using the Firebase CLI:


firebase deploy

In both cases, the time from initiating the deployment to seeing the function go live was about
forty seconds. Nothing to lose sleep over.
Winner: Tie
Testing — Lambda
If function creation was easier on Firebase, I can confidently say that testing your functions in
a staging environment is far easier on AWS.
For the above project, I spent around two hours from start to finish on AWS, whereas the
exact same project took around five hours, simply because of the atrocious debug cycle. It
all comes down to the presence of a simple tool on the AWS side — the beautiful blue Test
button.
Once your function has been deployed, you can create a ‘test’ event by manually creating a
request to be sent directly to your function. In this case, I wanted to manually test the creation
of a new user by providing a unique phone number. Using one of the sample templates, I
manipulated the body of the request to include a phone number, then saved the test event.
Once your test event is created, that beautiful blue Test button will execute your function
instantaneously and immediately show output from the execution in plain text, including not
only the function’s request response, but also any log output coming from the function.
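For the phone-number example above, such a test event might look like the following (a hypothetical API Gateway-style payload, not taken from the article):

```json
{
  "httpMethod": "POST",
  "headers": { "Content-Type": "application/json" },
  "body": "{\"phone\": \"555 867 5309\"}"
}
```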
Testing — Google Cloud Functions
June 8 update: There is a testing mechanism for Cloud Functions, but it’s not (currently)
available in the Firebase console. If you access the “Cloud Console”
(https://console.cloud.google.com) you’ll see Cloud Functions there with a range of
capabilities, including quick testing. There is also a local emulator which allows you to debug
functions locally, and Cloud Platform also has a (free) Cloud Debugger which actually lets you
put a breakpoint on live code!
Original writeup: Let me be clear: manual testing of Cloud Functions is a pain, stemming
from two aspects:
1. Cloud Functions don’t have a built-in testing solution with a quick feedback
mechanism as AWS does
2. Getting logs to the Firebase console usually involves waiting for about one to five
minutes

To the first point, manual testing of Cloud Functions revolves around your favorite HTTP
request utility, be it curl or Postman. If your function fails to execute due to some hidden typo,
rest assured that you’ll get a 50x status code without much more information, rather than any
helpful debug output.
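For example, a manual test against a deployed endpoint might look like this (the URL is a placeholder for your project’s function URL, not a real address):

```shell
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"phone": "555 867 5309"}' \
  "https://us-central1-<your-project>.cloudfunctions.net/createUser"
```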
If you do want to get information out, you’ll be using Firebase’s Function console.
At the console, you’re limited to seeing only logged information, as opposed to AWS’s console
which shows both log statements and function response bodies.
But the biggest gripe I have is how long it takes to see logs appear here. With a stopwatch in
hand, it would take one to five minutes of waiting to see any log information pop up from a
single request. That terrible feedback loop led to a lot of confusion as I tried to keep the order
in which I’d execute test requests in mind. Let’s face it; when you have a long feedback loop
like that, you may immediately execute one to five manual tests, then try to decipher the
output you receive a few minutes later. Not fun.
Winner: AWS Lambda
Pricing
In general, you can count on paying for function invocations based on two metrics: the
number of invocations, and the amount of time each invocation takes to execute, modified by
the hardware that the function is executed upon.
June 8 Update: I have neglected to include Amazon’s API Gateway price, which is $3.50
per million requests and is necessary if you want to have HTTP invocation of the function.
Cloud Functions includes this for no extra charge. So the 19,193,857 requests quoted below for
AWS would actually cost ~$65, not $1, which is a pretty large difference.
Original: At the time of this writing, Cloud Functions cost $0.40 per million invocations
(after two million that are free), while Lambda clocks in at $0.20 per million invocations
(after one million that are free).
Execution environment refers to the hardware that is used to run the function. More powerful
hardware means more cost. It’s a bit of an exercise in engineering economics, however. If you’re
running a computation-heavy function that takes some non-zero amount of time to execute,
you might think to use a less powerful machine, as it costs less money per millisecond of
execution time. But it’s a double-edged sword; the slower the machine, the more milliseconds
you’re spending! I’d love to do some follow-up work to figure out the sweet spot in machine
size for compute-heavy tasks.
Google Cloud Function’s invocation time pricing is a function of the CPU plus RAM size,
whereas AWS is a function of the RAM size only.
For example, a function that takes 500ms to execute on a machine with 256mb of memory and a
400mhz cpu would cost the following on Google:
(256mb/1024(gb/mb)) * .5s * $0.0000025 per gb-s
+ (400mhz/1000(ghz/mhz)) * .5s * $0.0000100 per ghz-s
= $0.0000003125 + $0.000002
= $0.0000023125 per request

Or, put another way, you’d get 432,432 requests for $1 on Google, not including the free tier
or flat cost of invocation.

On AWS Lambda, a similar setup would cost
(256mb/1024(gb/mb)) * .5s * $0.000000417 per gb-s
= $0.0000000521

Or, put another way, you’d get 19,193,857 invocations for $1, not including the free tier or flat
cost of invocation. A factor of forty-four, really? Someone check my math, please.
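The arithmetic above can be sanity-checked in a few lines, using the unit prices quoted in this section (note the article’s 19,193,857 figure comes from inverting the rounded $0.0000000521):

```javascript
// Re-checking the per-request arithmetic with the unit prices quoted above.
const gb = 256 / 1024;   // memory, in GB
const ghz = 400 / 1000;  // CPU, in GHz
const seconds = 0.5;     // execution time per invocation

// Google bills memory (per GB-second) and CPU (per GHz-second) separately
const gcpPerRequest = gb * seconds * 0.0000025 + ghz * seconds * 0.0000100;

// AWS (as quoted in this article) bills on memory size only
const awsPerRequest = gb * seconds * 0.000000417;

console.log(gcpPerRequest.toFixed(10));     // 0.0000023125
console.log(Math.floor(1 / gcpPerRequest)); // 432432 requests per $1 on Google
console.log(awsPerRequest.toFixed(10));     // ~0.0000000521
```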
Winner: AWS
Conclusion
At this point, AWS Lambda is head and shoulders above Google Cloud Functions. The testing
cycle feels much tighter, and the pricing is currently no-contest. Function creation is a bit
easier with Google Cloud, but as soon as you get that boilerplate down you’re good to go.
Officially, Google Cloud Functions are still in beta, so we might see price reductions at some
point in time, or better tooling, but for now I can’t help but point friends over to AWS
Lambda.

Cloud Shell:

No localhost? No problem! Use Google Cloud Shell as the development environment.
What is Google Cloud Shell?
Cloud Shell is a free terminal that you can use for whatever you want, and it runs 100% in a
web browser.
Click this link to open Cloud Shell: https://console.cloud.google.com/cloudshell
Seeing how most of my work requires the internet, I’m completely fine with being tied to a
browser.
What does it come with?
Cloud Shell comes with most of the tools I use on a daily basis right out of the box. These
include gcloud, node, kubectl, docker, go, python, git, vim, and more.
There is 5GB of persistent storage that is tied to your $HOME directory, and this is also
completely free.
The other really nice thing is that you are automatically authenticated to the current Google
Cloud project you’re working in. This makes setup super easy; everything just works!
Tips and Tricks
1. Running a web server (with auto-HTTPS for FREE!)
Most people developing cloud apps usually run a web server of some sort. Usually, you would
run it locally and just type in “localhost” into your web browser to access it.
This is not possible with Cloud Shell, so the team created a neat “web preview” function that
creates a URL on the fly to point to your local server.
You can open up any port from 2000 to 65000, and your web traffic will come through!
Warning: There is some transparent auth done behind the scenes, so the URL the Cloud
Shell opens might not work if you give it to someone else. I’d recommend a tool like ngrok if
you want to share your “localhost” connection remotely.
2. Get extra power with “boost mode”
By default, Cloud Shell runs on a g1-small VM, which can be under-powered for some tasks.
You can easily upgrade to a n1-standard-1 by enabling “boost mode.” It’s like the TURBO
button on an old PC, but this time it actually works :)

3. Edit your files with a GUI
Yes yes, vim and emacs and nano are great and all. But sometimes you just want a nice,
comfortable GUI to work with.
Cloud Shell ships with a customized version of the Orion editor.
While it’s not as good as VS Code or Eclipse, it’s actually a fully featured editor and I feel quite
productive with it!
4. Upload/Download files
If you have files locally you want to upload to cloud shell, just click the “Upload” button in the
menu and choose your file.
To download files, run this inside Cloud Shell:
$ cloudshell dl <FILENAME>

And it will download the file!


5. Persist binary/program installations
Because Cloud Shell only persists your $HOME directory, if you install things that don’t come
out of the box with Cloud Shell, chances are it will be gone the next time you use it.
If you install things with apt-get, there really is no good solution. However, if you are
downloading the binary directly or are compiling from source, you can create a path in your
$HOME directory (for example, /home/sandeepdinesh/bin) and add that to your PATH. Any
binaries in this folder will run like normal, and they will be persisted between reboots.
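A sketch of that workflow, with an illustrative stand-in for the binary (paths and names are examples, not from the original article):

```shell
# Only $HOME survives Cloud Shell reboots, so install binaries there.
mkdir -p "$HOME/bin"

# Stand-in for a binary you downloaded or compiled from source:
printf '#!/bin/sh\necho it-works\n' > "$HOME/bin/mytool"
chmod +x "$HOME/bin/mytool"

# Persist the PATH change for future sessions (append only once):
grep -qs 'HOME/bin' "$HOME/.bashrc" || echo 'export PATH="$HOME/bin:$PATH"' >> "$HOME/.bashrc"
export PATH="$HOME/bin:$PATH"

mytool   # prints: it-works
```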
Bonus: Open in Cloud Shell
If you have a git repository in a public place (like a public GitHub repo), you can actually
create a link that will open the end user’s Cloud Shell and automatically clone the repo to their
$HOME directory.
Because it is all free, it is a great way to get someone up and running with your project without
having to worry about their local machine!

Firebase Components:

Firebase
Firebase is a mobile and web app development platform that provides developers with a
plethora of tools and services to help them develop high-quality apps, grow their user base,
and earn more profit.
A Brief History
Back in 2011, before Firebase was Firebase, it was a startup called Envolve. As Envolve, it
provided developers with an API that enabled the integration of online chat functionality into
their website.
What’s interesting is that people used Envolve to pass application data that was more than
just chat messages. Developers were using Envolve to sync application data such as a game
state in real time across their users.
This led the founders of Envolve, James Tamplin and Andrew Lee, to separate the chat system
and the real-time architecture. In April 2012, Firebase was created as a separate company that
provided Backend-as-a-Service with real-time functionality.
After it was acquired by Google in 2014, Firebase rapidly evolved into the multifunctional
behemoth of a mobile and web platform that it is today.
Firebase Services
Firebase Services can be divided into two groups:

Develop & test your app
 Realtime Database
 Auth
 Test Lab
 Crashlytics
 Cloud Functions
 Firestore
 Cloud Storage
 Performance Monitoring
 Crash Reporting
 Hosting
Grow & Engage your audience
 Firebase Analytics
 Invites
 Cloud Messaging
 Predictions
 AdMob
 Dynamic Links
 Adwords

 Remote Config
 App Indexing
Realtime Database
The Firebase Realtime Database is a cloud-hosted NoSQL database that lets you store and
sync data between your users in realtime.
The Realtime Database is really just one big JSON object that developers can manage in
realtime.
Realtime Database => A Tree of Values
With just a single API, the Firebase database provides your app with both the current value of
the data and any updates to that data.
Realtime syncing makes it easy for your users to access their data from any device, be it web
or mobile. Realtime Database also helps your users collaborate with one another.
Another amazing benefit of Realtime Database is that it ships with mobile and web SDKs,
allowing you to build your apps without the need for servers.
When your users go offline, the Realtime Database SDKs use a local cache on the device to
serve and store changes. When the device comes back online, the local data is automatically
synchronized.
The Realtime Database can also integrate with Firebase Authentication to provide a simple
and intuitive authentication process.

Authentication
Firebase Authentication provides backend services, easy-to-use SDKs, and ready-made UI
libraries to authenticate users to your app.
Normally, it would take you months to set up your own authentication system. And even after
that, you would need to keep a dedicated team to maintain that system. But if you use
Firebase, you can set up the entire system in under 10 lines of code that will handle
everything for you, including complex operations like account merging.
You can authenticate your app’s users through the following methods:
 Email & Password
 Phone numbers
 Google
 Facebook
 Twitter
 & more!
Using Firebase Authentication makes building secure authentication systems easier, while
also improving the sign-in and onboarding experience for end users.

Firebase Authentication is built by the same people who created Google Sign-in, Smart Lock,
and Chrome Password Manager.
Firebase Cloud Messaging (FCM)
Firebase Cloud Messaging (FCM) provides a reliable and battery-efficient
connection between your server and devices that allows you to deliver and
receive messages and notifications on iOS, Android, and the web at no cost.
You can send notification messages (2KB limit) and data messages (4KB limit).
Using FCM, you can easily target messages using predefined segments or create your own,
using demographics and behavior. You can send messages to a group of devices that are
subscribed to specific topics, or you can get as granular as a single device.
FCM can deliver messages instantly, or at a future time in the user’s local time zone. You can
send custom app data like setting priorities, sounds, and expiration dates, and also track
custom conversion events.
The best thing about FCM is that there is hardly any coding involved! FCM is completely
integrated with Firebase Analytics, giving you detailed engagement and conversion tracking.
You can also use A/B testing to try out different versions of your notification messages, and
then select the one which performs best against your goals.
Firebase Database Query
Firebase has simplified the process of retrieving specific data from the database through
queries. Queries are created by chaining together one or more filter methods.
Firebase has 4 ordering functions:
 orderByKey()
 orderByChild(‘child’)
 orderByValue()
 orderByPriority()
Note that you will only receive data from a query if you have used the on() or once()
method.
You can also use these advanced querying functions to further restrict data:
 startAt(‘value’)
 endAt(‘value’)
 equalTo(‘child_key’)
 limitToFirst(10)
 limitToLast(10)
In SQL, the basics of querying involve two steps. First, you select the columns from your table.
Here I am selecting the Users column. Next, you can apply a restriction to your query using
the WHERE clause. From the below-given query, I will get a list of Users whose name is
GeekyAnts.
You can also use the LIMIT clause, which will restrict the number of results that you will get
back from your query.

In Firebase, querying also involves two steps. First, you create a reference to the parent key
and then you use an ordering function. Optionally, you can also append a querying function
for more advanced restriction.
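Putting those two steps together, a sketch of the chain might look like this (it assumes an initialized Firebase web/Node SDK and illustrative data; it is not runnable on its own):

```javascript
// Sketch only: assumes firebase has been initialized elsewhere.
const usersRef = firebase.database().ref('Users'); // step 1: parent reference

usersRef
  .orderByChild('name')      // step 2: ordering function
  .equalTo('GeekyAnts')      // optional: advanced restriction
  .once('value')             // data only arrives via on()/once()
  .then(snapshot => console.log(snapshot.val()));
```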
How to Store Data? => Firebase Storage
Firebase Storage is a standalone solution for uploading user-generated content like images
and videos from an iOS and Android device, as well as the Web.
Firebase Storage is designed specifically to scale your apps, provide security, and
ensure network resiliency.
Firebase Storage uses a simple folder/file system to structure its data.
Firebase Test Labs
Firebase Test Labs provides a large number of mobile test devices to help you test your apps.
Firebase Test Labs comes with 3 modes of testing:
 Instrumentation Test
These are tests that you’ve written specifically to test your app, using frameworks like Espresso
and UI Automator 2.0.
 Robo Test
This test is for people who just want to relax and let Firebase worry about tests. Firebase Test
Labs can simulate user touch and see how each component of the app functions.
 Game Loop Test
Test Labs supports game app testing. It comes with beta support for a “demo mode” in which
the game app runs while simulating the actions of the player.
Remote Config
Remote Config essentially allows us to publish updates to our users immediately. Whether we
wish to change the color scheme for a screen, the layout for a particular section in our app, or
show promotional/seasonal options — this is completely doable using server-side parameters,
without the need to publish a new version.
Remote Config gives us the power to:
 Quickly and easily update our applications without the need to publish a new build to
the app/play store.
 Effortlessly set how a segment behaves or looks in our application based on the
user/device that is using it.
Firebase App Indexing
To get your app’s content indexed by Google, use the same URLs in your app that you use on
your website and verify that you own both your app and your website. Google Search crawls
the links on your website and serves them in Search results. Then, users who’ve installed your
app on their devices go directly to the content in your app when they click on a link.

Firebase Dynamic Links
Deep links are URLs that take you to specific content. Most web links are deep links.
Firebase can now turn deep links into Dynamic Links! Dynamic Links take the user
directly to a particular location in your app.
There are 3 fundamental uses for Dynamic Links:
 Convert mobile web users to native app users.
 Increase conversion for user-to-user sharing. When your app is shared with other users,
you can skip the generic message shown when a user downloads it from the store and
instead show them a personalised greeting message.
 Drive installs from third parties. Social media networks, email, and SMS can be used to
increase your target audience. When users install the app, they can see the exact
content of your campaigns.

Firestore
Cloud Firestore is a NoSQL document database that lets you easily store, sync, and query data
for your mobile and web apps — at a global scale.
Though this may sound similar to the Realtime Database, Firestore brings many new things
to the platform that make it something completely different from the Realtime Database.
Improved Querying and Data Structure
Where Realtime Database stores data in the form of a giant JSON tree, Cloud Firestore takes a
much more structured approach. Firestore keeps its data inside objects called documents.
These documents consist of key-value pairs and can contain any kind of data, from strings to
binary data to even objects that resemble JSON trees (Firestore calls these maps). The
documents, in turn, are grouped into collections.
A Firestore database can consist of multiple collections that can contain documents pointing
towards sub-collections. These sub-collections can again contain documents that point to
other sub-collections, and so on.
You can build hierarchies to store related data and easily retrieve any data that you need using
queries. All queries can scale with the size of your result set, so your app is ready to scale from
day one.
Firestore’s queries are shallow. By this, I mean that in Firestore you can simply fetch any
document that you want without having to fetch all of the data contained in any of its linked
sub-collections.
You can fetch a single document without having to grab any of its sub-collections

Query with Firestore
Imagine that you have created a collection in Firestore that contains a list of Cities. So, before
you can send out a query, you will have to store a reference to the collection inside a variable.
Here, citiesRef is the variable that contains your collection of cities. Now, if you want to
find a list of capital cities, you would write a query like this:
Here’s another example of queries in Firestore. Say you want to see only 2 of the cities from
your database whose population is more than 100,000.
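As a sketch of the two queries just described (assuming db is an initialized Firestore client; field and collection names are illustrative):

```javascript
// Sketch only: assumes an initialized Firestore client named db.
const citiesRef = db.collection('Cities');

// Capital cities:
citiesRef.where('capital', '==', true).get()
  .then(snap => snap.forEach(doc => console.log(doc.id)));

// Two cities with population over 100,000:
citiesRef.where('population', '>', 100000).limit(2).get()
  .then(snap => snap.forEach(doc => console.log(doc.id)));
```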
But Cloud Firestore can make querying even easier! In some cases, Cloud Firestore can
automatically search your database across multiple fields. Firestore will guide you towards
automatically building an index that will help Firestore to make querying extremely simple.
Better Scalability
Though Firebase’s Realtime Database is capable of scaling, things will start to get crazy when
your app becomes really popular or your database becomes really massive.
Cloud Firestore is based on Google’s Cloud infrastructure. This allows it to scale much more
easily and to a greater capacity than the Realtime Database.
Multi-Region Database
In Firestore, your data is automatically copied to various regions. So if one data center goes
offline due to some unforeseen reason, you can be sure that your app’s data is still safe
somewhere else.
Firestore’s multi-region database also provides strong consistency. Any changes to your data
will be mirrored across every copy of your database.
Different Pricing Model
The Realtime Database charges its users based on the amount of data that you have stored in
the database.
Cloud Firestore also charges you for storage, but the cost is significantly lower than that of
the Realtime Database, and instead of basing the cost mainly on the amount of data stored,
Firestore’s pricing is driven by the number of reads/writes that you perform.

Google Cloud Console:
A console that aims at managing all the elements that come with your application: web
applications, data analysis, virtual machines, databases, networking, developer services, etc.
Increase capacity as needed and diagnose production issues in an easy-to-use web interface.
Search for resources quickly and connect to instances via SSH from within your browser.
Manage your projects wherever you are, with native iOS and Android apps, and master
complex development tasks with Google Cloud.

OpenShift on OpenStack 1-2-3: Bringing IaaS and PaaS Together

Overview
In this article, we will explore why you should consider tackling IaaS and PaaS together. Many
organizations gave up on OpenStack during its hype phase, but in my view, it is time to reconsider the
IaaS strategy. Two main factors are really pushing a re-emergence of interest in OpenStack:
containers and cloud.
Containers require very flexible, software-defined infrastructure and are changing the application
landscape fast. Remember when we had the discussions about pets vs cattle? The issue with OpenStack
during its hype phase was that the workloads simply didn’t exist within most organizations, but now
containers are changing that, from a platform perspective. Containers need to be orchestrated and the
industry has settled on Kubernetes for that purpose. In order to run Kubernetes, you need quite a lot
of flexibility at scale on the infrastructure level. You must be able to provide solid Software Defined
Networking, Compute, Storage, Load Balancing, DNS, Authentication, Orchestration, basically
everything, and do so at the click of a button. Yeah, we can all do that, right?
If we think about IT, there are two types of personas. The first feels IT is generic, that 80% is good
enough, and that it is a light switch: on or off. This persona has no reason whatsoever to deal with IaaS and
should just go to the public cloud, if not already there. In other words, OpenStack makes no sense. The
other persona feels IT adds compelling value to their business, and going beyond 80% provides them
with distinct business advantages. Anyone can go to public cloud but if you can turn IT into a
competitive advantage then there may actually be a purpose for it. Unfortunately, with the way many
organizations go about IT today, it is not really viable, unless something dramatic happens. This brings
me back to OpenStack. It is the only way an organization can provide the capabilities a public cloud
offers while also matching price, performance and providing a competitive advantage. If we cannot
achieve the flexibility of public cloud, the consumption model, the cost effectiveness and provide
compelling business advantage, then we ought to just give up, right?
I also find it interesting that some organizations, even those that started in the public cloud are starting
to see value in build-your-own. Dropbox, for example, originally started using AWS and S3. Over the
last few years they built their own object storage solution, one that provided more value and saved $75
million over two years. They also did so with a fairly small team. I certainly am not advocating for
doing everything yourself, I am just saying that we need to make a decision, does IT provide
compelling business value? Can you do it for your business, better than the generic level playing field
known as public cloud? If so, you really ought to be looking into OpenStack and using momentum
behind containers to bring about real change.

OpenShift and the Case for OpenStack


OpenShift, of course, is infrastructure independent. You can run it on public cloud, virtualization,
baremetal or anything that can boot Red Hat Enterprise Linux. All organizations definitely want and
will use the public cloud but likely will also want to maintain control, avoiding lock-in. OpenShift is
the only way to truly get multi-cloud, enterprise Kubernetes. The idea here with OpenStack is to deliver
the on-premise portion of multi-cloud, with the same capabilities as public cloud. Today organizations
have an incredible investment in their on-premise IT. Even if you don’t see IT as a value generator, it is
clear you most likely won’t want to divest all those resources at once. Growth will most likely be
augmented by public cloud as opposed to a complete migration.
To the next point, what is the right infrastructure to actually run on? Certainly, over the years a vast
majority of applications have moved to virtualization platforms but not all. I expect this also remains.
Why? Well beyond 16 vCPUs, VMs start getting into the law of diminishing returns. You end up
getting less value out of hyperthreading and usually needing to limit vCPUs to the number of cores.
Baremetal may also have advantages in certain container use cases like large-scale computing. With
emergence of AI and also need for large data crunching, baremetal could actually be gaining steam as a
future platform. Regardless the point here is you may want your containers to run in VMs (smaller
OpenShift Nodes) or baremetal (larger OpenShift nodes), and this is highly dependent on the application or
workload. Finally, there are other factors that could make baremetal play an important role that won’t be
covered here, such as cost/performance or isolation/security.
If we stick to virtualization technology, we have one and only one choice. This again is where
OpenStack shines, at least Red Hat OpenStack. One of the components shipped is Ironic (metal-as-a-
service). Ironic allows us to manage baremetal just like a virtual machine; in fact, in OpenStack there is
no difference, which is why OpenStack refers to compute units as instances: it could be either.
OpenStack can provide OpenShift with VM- or baremetal-based nodes and much, much more.

OpenShift integration with OpenStack


OpenShift and OpenStack fit perfectly together. Below is a list of the major integration points.
 Keystone provides identity and can be used to authenticate OpenShift or LDAP users.
 Ceilometer provides telemetry of IaaS allowing correlation using CloudForms between
container, node, and instance.
 Multitenant could help if running many OpenShift clusters.
 Heat provides orchestration enabling dynamic scale-up or scale-down of OpenShift cluster.
 Nova provides OpenShift nodes as a VM or baremetal instance.
 Neutron provides SDN and through Kuryr (starting with Red Hat OpenStack 13) will allow
neutron SDN to be consumed in OpenShift directly allowing single SDN to serve both container
and non-container workloads.
 Cinder provides dynamic storage and provisioning for containers running in OpenShift.
 LBaaS provides load balancer for API across masters and for application traffic across
infrastructure nodes running OpenShift router.
 Designate provides DNS and OpenShift needs either dynamic DNS or to use wildcard for
application domains.
 Ironic plugs into Nova via ironic conductor and allows provisioning of baremetal systems.

OpenShift on OpenStack Architectures
Important to any underlying architecture discussion is how to group OpenShift masters, infrastructure
and application nodes. OpenStack provides two different possibilities.
Resource vs AutoScaling Groups
Resource groups allow us to group instances together and apply affinity or anti-affinity policies via the
OpenStack scheduler. AutoScaling groups allow us to group instances and, based on alarms, scale up or
scale down those instances automatically. At first glance, you would think to use resource groups for
masters and infra nodes, and autoscaling groups for app nodes. While autoscaling sounds great, especially for
app nodes, there are a lot of possibilities that can lead to scaling either happening or not happening
when desired. My experience is this can work well with simple WordPress-type applications but not
something more complex, like a container platform such as OpenShift. Another disadvantage with
autoscaling groups is they don’t support an index. Indexes within groups are used to increment the
instance name: master0, master1 and so on. A final point is that you can easily scale resource groups; it
just needs to be triggered by an update to the Heat stack. The nice thing is you can also control scaling
and if it is to be automated, you have more flexibility than relying on alarms in Ceilometer. For all of
these reasons, I recommend creating three resource groups: masters, infras, and nodes.
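As a sketch, such a resource group might be declared in a Heat template fragment like this (the flavor and image names reuse those created later in this guide, and the count is illustrative):

```yaml
resources:
  nodes:
    type: OS::Heat::ResourceGroup
    properties:
      count: 3                  # scale by updating the stack with a new count
      resource_def:
        type: OS::Nova::Server
        properties:
          name: node%index%     # %index% yields node0, node1, node2
          flavor: ocp.node
          image: rhel74
          key_name: admin
```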
Two common OpenShift architectures on OpenStack are non-HA and HA within a single tenant.
Non-HA
In this architecture, we will have one master, one infra node, and x application nodes. While application
availability can certainly be achieved by deploying across multiple nodes, the master presents a single
point of failure for the control plane. The infra node runs the OpenShift router, and as such a failure
here would mean incoming traffic to applications would be interrupted.

HA
The HA architecture typically has three masters, two infra nodes and x app nodes. There are variations
where you could have 3 infra nodes if you are running metrics and logging services that require a third
node. In addition, you could also split out etcd and run it independently on three additional nodes. If
east/west traffic is not allowed between network zones, then you would likely require two infra nodes
in each zone, to handle incoming traffic for app nodes. There are many variations of course, but for
now let us keep it simple.

Deploying OpenStack
In order to deploy OpenShift on OpenStack we obviously need OpenStack. Here are some guides to
help.
 OpenStack 12 (Pike) RDO Lab Installation and Configuration Guide on Hetzner Root Servers
 OpenStack 11 (Ocata) RDO Lab Installation and Configuration Guide
 Red Hat OpenStack Platform 10 (Newton) Installation and Configuration Guide
Once OpenStack is deployed you need to ensure a few things are in place.
Create Flavors
# openstack flavor create --ram 2048 --disk 30 --ephemeral 0 --vcpus 1 --public ocp.bastion
# openstack flavor create --ram 8192 --disk 30 --ephemeral 0 --vcpus 2 --public ocp.master
# openstack flavor create --ram 8192 --disk 30 --ephemeral 0 --vcpus 1 --public ocp.infra
# openstack flavor create --ram 8192 --disk 30 --ephemeral 0 --vcpus 1 --public ocp.node

Create RHEL Image


Download RHEL 7.4 Cloud (qcow2) Image
# openstack image create --disk-format qcow2 \
  --container-format bare --public \
  --file /root/rhel-server-7.4-x86_64-kvm.qcow2 "rhel74"

Create Private Key


# openstack keypair create admin

Save Private Key


# vi /root/admin.pem
-----BEGIN RSA PRIVATE KEY-----
MIIEowIBAAKCAQEAwTrb+xdbpgY8hVOmftBIShqYUgXXDC/1gggakq8bkEdNnSku
IaNGeJykzksjdksjd9383iejkjsu92wiwsajFLuE2lkh5dvk9s6hpfE/3UvSGk6m
HWIMCf3nJUv8gGCM/XElgwNXS02c8pHWUywBiaQZpOsvjCqFLGW0cNZLAQ+yzrh1
dIWddx/E1Ppto394ejfksjdksjdksdhgu4t39393eodNlVQxWzmK4vrLWNrfioOK
uRxjxY6jnE3q/956ie69BXbbvrZYcs75YeSY7GjwyC5yjWw9qkiBcV1+P1Uqs1jG
1yV0Zvl5xlI1M4b97qw0bgpjTETL5+iuidFPVwIDAQABAoIBAF7rC95m1fVTQO15
buMCa1BDiilYhw+Mi3wJgQwnClIwRHb8IJYTf22F/QptrrBd0LZk/UHJhekINXot
z0jJ+WvtxVAA0038jskdjskdjksjksjkiH9Mh39tAtt2XR2uz/M7XmLiBEKQaJVb
gD2w8zxqqNIz3438783787387s8s787s8sIAkP3ZMAra1k7+rY1HfCYsRDWxhqTx
R5FFwYueMIldlfPdGxwd8hLrqJnDY7SO85iFWv5Kf1ykyi3PRA6r2Vr/0PMkVsKV
XfxhYPkAOb2hNKRDhkvZPmmxXu5wy8WkGeq+uTWRY3DoyciuC4xMS0NMd6Y20pfp
x50AhJkCgYEA8M2OfUan1ghV3V8WsQ94vguHqe8jzLcV+1PV2iTwWFBZDZEQPokY
JkMCAtHFvUlcJ49yAjrRH6O+EGT+niW8xIhZBiu6whOd4H0xDoQvaAAZyIFoSmbX
2WpS74Ms5YSzVip70hbcXb4goDhdW9YxvTVqJlFrsGNCEa3L4kr2qFMCgYEAzWy0
5cSHkCWaygeYhFc79xoTnPxKZH+QI32dAeud7oyZtZeZDRyjnm2fEtDCEn6RtFTH
NlI3W6xFkXcp1u0wbmYJJVZdn1u9aRsLzVmfGwEWGHYEfZ+ZtQH+H9XHXsi1nPpr
Uy7Msd,sl,.swdko393j495u4efdjkfjdkjfhflCgYEA7VO6Xo/XdKPMdJx2EdXM
y4kzkPFHGElN2eE7gskjdksjdksjkasnw33a23433434wk0P8VCksQlBlojjRVyu
GgjDrMhGjWamEA1y3vq6eka3Ip0f+0w26mnXCYYAJslNstu2I04yrBVptF846/1E
ElXlo5RVjYeWIzRmIEZ/qU8CgYB91kOSJKuuX3rMm46QMyfmnLC7D8k6evH+66nM
238493ijsfkjalsdjcws9fheoihg80eWDSAFDOASDF=OIA=FSLoiidsiisiisNDo
ACh40FeKsHDby3LK8OeM9NXmeCjYeoZYNGimHForiCCT+rIniiu2vy0Z/q+/t3cM
BgmAmQKBgCwCTX5kbLEUcX5IE6Nzh+1n/lkvIqlblOG7v0Y9sxKVxx4R9uXi3dNK
6pbclskdksdjdk22k2jkj2kalksx2koUeLzwHuRUpMavRhoTLP0YsdbQrjgHIA+p
kDNrgFz+JYKF2K08oe72x1083RtiEr8n71kjSA+5Ua1eNwGI6AVl
-----END RSA PRIVATE KEY-----

# chmod 400 /root/admin.pem

Create Public Floating IP Network


# openstack network create --provider-network-type flat \
--provider-physical-network extnet2 --external public

Create Public Floating IP Subnet


# openstack subnet create --network public \
  --allocation-pool start=144.76.132.226,end=144.76.132.230 \
  --no-dhcp --subnet-range 144.76.132.224/29 public_subnet
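Since a /29 leaves very little room, it is worth sanity-checking the pool. A small Python sketch (values copied from the command above) using the standard ipaddress module:

```python
import ipaddress

# The public subnet above is a /29: 8 addresses, of which 6 are usable hosts.
net = ipaddress.ip_network("144.76.132.224/29")
hosts = list(net.hosts())

# The allocation pool start=.226,end=.230 must fall inside the usable range
# (.225-.230); .225 is left out of the pool here, e.g. for the router gateway.
pool = [ip for ip in hosts if 226 <= int(str(ip).rsplit(".", 1)[1]) <= 230]
print(f"usable: {hosts[0]}-{hosts[-1]}, pool size: {len(pool)}")
```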

Create Router
# openstack router create --no-ha router1

Set Router Gateway


# openstack router set --external-gateway public router1

That is it! Everything else will be created automatically by the deployment of the OpenShift
infrastructure. If you want to include more or less, you can easily update the provided Heat
templates.

Deploying OpenShift on OpenStack 1-2-3


Once you have an OpenStack environment configured, OpenShift is deployed using a simple
three-step phased approach.
 Step 1: Deploy the OpenShift infrastructure using Heat and Ansible.
 Step 2: Install OpenShift using Ansible.
 Step 3: Configure OpenShift and additional services using Ansible.
The Heat templates, all playbooks, and a README are provided in the following GitHub repository:
https://github.com/ktenzer/openshift-on-openstack-123
Step 1
This step is responsible for deploying the OpenShift infrastructure. Ansible is used to call Heat, which
deploys the infrastructure in OpenStack. The Heat templates create a private network, load
balancers and Cinder storage, connect to the existing public network, boot all instances and prepare the
bastion host. The bastion host is used to deploy and manage the OpenShift deployment.
[OpenStack Controller]
Clone Git Repository
# git clone https://github.com/ktenzer/openshift-on-openstack-123.git

Change dir to repository


# cd openshift-on-openstack-123

Checkout release branch 1.0


# git checkout release-1.0

Configure Parameters
# cp sample-vars.yml vars.yml

# vi vars.yml
---

### OpenStack Setting ###
domain_name: ocp3.lab
dns_forwarders: [213.133.98.98, 213.133.98.99]
external_network: public
service_subnet_cidr: 192.168.1.0/24
router_id:
image: rhel74
ssh_user: cloud-user
ssh_key_name: admin
stack_name: openshift
openstack_version: 12
contact: admin@ocp3.lab
heat_template_path: /root/openshift-on-openstack-123/heat/openshift.yaml

### OpenShift Settings ###


openshift_version: 3.7
docker_version: 1.12.6
openshift_ha: true
registry_replicas: 2
openshift_user: admin
openshift_passwd:

### Red Hat Subscription ###


rhn_username:
rhn_password:
rhn_pool:

### OpenStack Instance Count ###


master_count: 3
infra_count: 2
node_count: 2

### OpenStack Instance Group Policies ###


### Set to 'affinity' if only one compute node ###
master_server_group_policies: "['anti-affinity']"
infra_server_group_policies: "['anti-affinity']"
node_server_group_policies: "['anti-affinity']"

### OpenStack Instance Flavors ###


bastion_flavor: ocp.bastion
master_flavor: ocp.master
infra_flavor: ocp.infra
node_flavor: ocp.node

Authenticate OpenStack Credentials
# source /root/keystonerc_admin

Disable host key checking


# export ANSIBLE_HOST_KEY_CHECKING=False

Deploy OpenStack Infrastructure for OpenShift


# ansible-playbook deploy-openstack-infra.yml \
  --private-key=/root/admin.pem -e @vars.yml

Step 2
This step is responsible for preparing the OpenShift environment. Hostnames are set, the OpenShift
inventory file is generated dynamically, systems are registered with RHN, required packages are installed
and Docker, among other things, is properly configured.
Get IP address of the Bastion Host
# openstack stack output show -f value -c output_value openshift ip_address

{
"masters": [
{
"name": "master0",
"address": "192.168.1.19"
},
{
"name": "master1",
"address": "192.168.1.16"
},
{
"name": "master2",
"address": "192.168.1.15"
}
],
"lb_master": {
"name": "lb_master",
"address": "144.76.134.230"
},
"infras": [
{
"name": "infra0",
"address": "192.168.1.10"
},
{

"name": "infra1",
"address": "192.168.1.11"
}
],
"lb_infra": {
"name": "lb_infra",
"address": "144.76.134.229"
},
"bastion": {
"name": "bastion",
"address": "144.76.134.228"
},
"nodes": [
{
"name": "node0",
"address": "192.168.1.6"
},
{
"name": "node1",
"address": "192.168.1.13"
}
]
}
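Because the output is JSON, it is easy to consume programmatically. A minimal Python sketch, using an abridged copy of the data above, that extracts the bastion’s floating IP and builds the ssh command:

```python
import json

# Abridged sample of the stack output shown above; in practice you would
# capture the stdout of `openstack stack output show ... ip_address`.
stack_output = """
{
  "bastion": {"name": "bastion", "address": "144.76.134.228"},
  "lb_master": {"name": "lb_master", "address": "144.76.134.230"},
  "masters": [
    {"name": "master0", "address": "192.168.1.19"},
    {"name": "master1", "address": "192.168.1.16"}
  ]
}
"""

outputs = json.loads(stack_output)
bastion_ip = outputs["bastion"]["address"]
master_ips = {m["name"]: m["address"] for m in outputs["masters"]}
print(f"ssh -i /root/admin.pem cloud-user@{bastion_ip}")
```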

SSH to the Bastion Host using cloud-user and Private Key


# ssh -i /root/admin.pem cloud-user@144.76.134.228

[Bastion Host]
Change Directory to Cloned Git Repository
# cd openshift-on-openstack-123

Authenticate OpenStack Credentials


[cloud-user@bastion ~]$ source /home/cloud-user/keystonerc_admin

Disable Host Key Checking


[cloud-user@bastion ~]$ export ANSIBLE_HOST_KEY_CHECKING=False

Prepare the Nodes for Deployment of OpenShift


[cloud-user@bastion ~]$ ansible-playbook prepare-openshift.yml \
  --private-key=/home/cloud-user/admin.pem -e @vars.yml

PLAY RECAP
***************************************************************************
**************
bastion : ok=15 changed=7 unreachable=0 failed=0
infra0 : ok=18 changed=13 unreachable=0 failed=0
infra1 : ok=18 changed=13 unreachable=0 failed=0
localhost : ok=7 changed=6 unreachable=0 failed=0
master0 : ok=18 changed=13 unreachable=0 failed=0
master1 : ok=18 changed=13 unreachable=0 failed=0
master2 : ok=18 changed=13 unreachable=0 failed=0
node0 : ok=18 changed=13 unreachable=0 failed=0
node1 : ok=18 changed=13 unreachable=0 failed=0

Step 3
This step is responsible for configuring a vanilla OpenShift environment. By default, only the
OpenShift router and registry are configured. OpenShift is deployed from the inventory file generated
dynamically in Step 2; you can certainly edit the inventory file and make changes. After the
OpenShift deployment, a small post-deployment playbook configures dynamic
storage to use OpenStack Cinder. Optional steps are also defined to configure metrics and logging if
desired.
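The Cinder wiring done by the post-deployment playbook amounts to registering a default StorageClass backed by the Cinder provisioner. A rough sketch of such an object (the name and availability zone are illustrative, not taken from the repository):

```yaml
# Illustrative default StorageClass for OpenStack Cinder dynamic provisioning.
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: standard
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/cinder   # in-tree Cinder provisioner (OpenShift 3.x era)
parameters:
  availability: nova                # Cinder availability zone
```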
[Bastion Host]
Deploy OpenShift
[cloud-user@bastion ~]$ ansible-playbook -i /home/cloud-user/openshift-inventory \
  --private-key=/home/cloud-user/admin.pem -vv \
  /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
PLAY RECAP
***************************************************************************
**************
infra0.ocp3.lab : ok=183 changed=59 unreachable=0 failed=0
infra1.ocp3.lab : ok=183 changed=59 unreachable=0 failed=0
localhost : ok=12 changed=0 unreachable=0 failed=0
master0.ocp3.lab : ok=635 changed=265 unreachable=0 failed=0
master1.ocp3.lab : ok=635 changed=265 unreachable=0 failed=0
master2.ocp3.lab : ok=635 changed=265 unreachable=0 failed=0
node0.ocp3.lab : ok=183 changed=59 unreachable=0 failed=0
node1.ocp3.lab : ok=183 changed=59 unreachable=0 failed=0

INSTALLER STATUS
***************************************************************************
********
Initialization : Complete

Health Check : Complete
etcd Install : Complete
Master Install : Complete
Master Additional Install : Complete
Node Install : Complete
Hosted Install : Complete
Service Catalog Install : Complete

Run Post Install Playbook


[cloud-user@bastion ~]$ ansible-playbook post-openshift.yml \
  --private-key=/home/cloud-user/admin.pem -e @vars.yml

PLAY RECAP
***************************************************************************
***********************************************
infra0 : ok=4 changed=2 unreachable=0 failed=0
infra1 : ok=4 changed=2 unreachable=0 failed=0
localhost : ok=7 changed=6 unreachable=0 failed=0
master0 : ok=6 changed=4 unreachable=0 failed=0
master1 : ok=6 changed=4 unreachable=0 failed=0
master2 : ok=6 changed=4 unreachable=0 failed=0
node0 : ok=4 changed=2 unreachable=0 failed=0
node1 : ok=4 changed=2 unreachable=0 failed=0
Log in to the UI

https://openshift.144.76.134.226.xip.io:8443

Optional
Configure Admin User
[cloud-user@bastion ~]$ ssh -i /home/cloud-user/admin.pem cloud-user@master0

Authenticate as system:admin User


[cloud-user@master0 ~]$ oc login -u system:admin -n default

Make User OpenShift Cluster Administrator


[cloud-user@master0 ~]$ oadm policy add-cluster-role-to-user cluster-admin admin

Install Metrics
Set Metrics to true in Inventory
[cloud-user@bastion ~]$ vi openshift_inventory

...
openshift_hosted_metrics_deploy=true
...

Run Playbook for Metrics


[cloud-user@bastion ~]$ ansible-playbook -i /home/cloud-user/openshift-inventory \
  --private-key=/home/cloud-user/admin.pem -vv \
  /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/openshift-metrics.yml
PLAY RECAP
***************************************************************************
***********************************************
infra0.ocp3.lab : ok=45 changed=4 unreachable=0 failed=0
infra1.ocp3.lab : ok=45 changed=4 unreachable=0 failed=0
localhost : ok=11 changed=0 unreachable=0 failed=0
master0.ocp3.lab : ok=48 changed=4 unreachable=0 failed=0
master1.ocp3.lab : ok=48 changed=4 unreachable=0 failed=0
master2.ocp3.lab : ok=205 changed=48 unreachable=0 failed=0
node0.ocp3.lab : ok=45 changed=4 unreachable=0 failed=0
node1.ocp3.lab : ok=45 changed=4 unreachable=0 failed=0

INSTALLER STATUS
***************************************************************************
*****************************************
Initialization : Complete
Metrics Install : Complete
Install Logging

Set logging to true in Inventory


[cloud-user@bastion ~]$ vi openshift_inventory
...
openshift_hosted_logging_deploy=true
...

Run Playbook for Logging


[cloud-user@bastion ~]$ ansible-playbook -i /home/cloud-user/openshift-inventory \
  --private-key=/home/cloud-user/admin.pem

There’s a lot of temptation to compare Pivotal’s Cloud Foundry (PCF) and Kubernetes (K8s) to each
other, and we get it: they’re both platform services for deploying cloud-native apps, they both deal
with containers, and the list goes on. There’s a lot of functional overlap between PCF and K8s, but it’s
important to understand how they differ from each other, when it’s best to use one rather than the
other, and when it’s best to use them together.

Pivotal Cloud Foundry vs Kubernetes: choose the best way to deploy cloud-native applications

Nowadays, more than 10 years after the introduction of AWS, there are three levels of cloud-service
abstraction: Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service
(SaaS).
Amazon Elastic Compute Cloud (EC2) falls into the IaaS category, giving users the basic infrastructure
needed to build and deploy an application. The next level up is PaaS products, which are the topic of
this post. PaaS products offer a higher level of abstraction, so the user is not exposed to the OS,
middleware or runtime and only needs to concern him or herself with the application and data.
Lastly, SaaS products are applications built and hosted by a third party and made available to
users via the internet.

Cloud-Native Service Models Comparison
A PaaS is a platform upon which developers can build and deploy applications. These products offer a
higher level of abstraction than we get from IaaS products meaning that, beyond networking, storage
and servers, the application’s O/S, middleware and runtime are all managed by the PaaS.
Within the PaaS market, two of the major players are Pivotal Cloud Foundry and Kubernetes. They are
both open source cloud PaaS products for building, deploying and scaling applications. Despite having
large areas of functional overlap, these two systems offer very different capabilities to their users and
are each better suited for different circumstances.

“Application” PaaS vs. “Container” PaaS


Pivotal Cloud Foundry and Kubernetes share many similar features like containerization, namespacing,
and authentication but their overall approaches to the deployment of cloud-native applications differ
greatly.
The difference between an “application” PaaS and a “container” PaaS isn’t so difficult to figure out.
When we talk about abstractions, we can simply break the PaaS level into two pieces. On the one hand,
we have the platform abstraction at the application level, building and deploying a fully configured
application; on the other hand, we have the platform abstraction at the container level, building and
deploying containers as parts of a complete application.
PCF is one example of an “application” PaaS, also called the Cloud Foundry Application Runtime, and
Kubernetes is a “container” PaaS (sometimes called CaaS).
The bottom line is that it doesn’t have to be an ‘OR’; it can be an ‘AND’. The question isn’t necessarily
whether you should use Cloud Foundry OR Kubernetes, but when you may need one AND/OR
the other. Because of a few key differentiators, they can be used together, as demonstrated by the way
they complement each other in the Cloud Foundry Container Runtime, an open-source collaboration
between Pivotal and Google (more on this later).

Pivotal Cloud Foundry


Pivotal Cloud Foundry is a high-level abstraction for cloud-native application development. You give
PCF an application, and the platform does the rest: everything from understanding application
dependencies to building and scaling containers and wiring up networking and routing.

(source: pivotal.io) Pivotal Cloud Foundry architecture – open source and enterprise

Features
Applications running on Cloud Foundry are deployed, scaled and maintained by BOSH (PCF’s
infrastructure management component). BOSH deploys versioned software and the VMs it runs on, and
then monitors the application after deployment. Although the learning curve for BOSH is considered
fairly steep, once mastered it adds considerable value by boosting team productivity.
More basic features of Pivotal Cloud Foundry include:
 Cloud Controller to direct application deployment
 Deploy using Docker Images and Buildpacks
 Automated routing of all incoming traffic to appropriate component
 Instant (vertical or horizontal) application scaling
 cf CLI (the PCF command-line interface)
 Cluster scheduler
 Load balancer and DNS
 “Loggregator” – Logging and metrics aggregation
Installation and Usability
Before beginning the installation process for Pivotal Cloud Foundry, Pivotal documentation directs
users to configure their firewall for PCF and establish IaaS user role guidelines. After that, installation
is guided by a web user interface.
Best Use Cases
Cloud Foundry’s platform is a higher-level abstraction and so it offers a higher level of productivity to
its users. With productivity, though, comes certain limitations in what can be customized in the
runtime.
PCF is ideal for new applications, cloud-native apps and apps that run fine out of a buildpack. For
teams working with short lifecycles and frequent releases, PCF offers an excellent product.

Kubernetes
Kubernetes is a container scheduler or orchestrator. With container orchestration tools, the user creates
and maintains the container themselves. For many teams, having this flexibility and control over the
application is preferred.
Instead of focusing only on the app, the developer needs to create the container and then maintain it in
the future, for example, when anything on the stack has an update (a new JVM version, etc.).

(source: x-team.com) Kubernetes architecture
Features
Kubernetes is a mature container orchestrator that runs in the same market as Docker Swarm and
Apache Mesos. In Kubernetes, containers are grouped together into pods based on logical dependencies
which can then be easily scaled at runtime.
More basic features of Kubernetes include:
 Master node for global control (scheduling, API server, data center)
 Worker nodes (VM or physical machine) with services needed to run container pods
 Auto-scaling of containers and volume management
 Flexible architecture with replaceable components and 3rd party plugins
 Stateful persistence layer
 Kubectl (Kubernetes command line interface)
 Active OSS community
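To make the pod idea concrete, here is a minimal, illustrative Deployment (names and image are arbitrary, not tied to this install); Kubernetes keeps the requested number of pod replicas running and lets you change it at runtime:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
spec:
  replicas: 2                 # desired pod count; adjustable at runtime
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
      - name: web
        image: nginx:alpine
        ports:
        - containerPort: 80
```

Scaling is then a one-liner: kubectl scale deployment/hello --replicas=5.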
Installation and Usability
A common complaint that users have about Kubernetes is the difficulty of the setup process. First of
all, you have to plan ahead when starting to implement K8s, because you have to define your nodes in
advance, which can be very time consuming. On top of that, setting up Kubernetes varies for each OS,
and the documentation isn’t sufficient in cases when users need to build custom environments. Add to
all of that manual integrations that are required, and even thinking about going through the setup
process can give you a headache.
Once it’s ready to use, though, Kubernetes offers the most mature and most popular service on the
market in terms of container orchestration tools. It also has an active community offering support and
resources to users.
Best Use Cases
Kubernetes is a lower-level abstraction in the PaaS world, meaning greater flexibility to implement
customizations and build your containers to run how you want them to run. Unfortunately, this also
means more work for your engineering teams and decreased productivity.
When moving to any new system or product, a good rule of thumb is to use the highest level
abstraction that will solve your problem without putting any unnecessary limitations on the workload.
If you need more flexibility to do customizations, and you’re willing to put in the work, stick with
Kubernetes (or check out Kubo below).

Best of Both Worlds: Cloud Foundry Container Runtime


The Cloud Foundry Container Runtime (CFCR), previously called Kubo, basically takes Kubernetes
and runs it on top of BOSH, Cloud Foundry’s open-source lifecycle management tool. The goal of this
collaboration between Pivotal and Google, the creator of Kubernetes, is to create “a uniform way to
instantiate, deploy, and manage highly available Kubernetes clusters.”
CFCR gives users the customization abilities of Kubernetes with the deployment and management
power of BOSH.

Conclusion:
To conclude, we have seen several methods for deploying cloud-native applications on
cloud platforms via PaaS, from open-source platforms down to Infrastructure as a Service
for configuration and even advanced administration on proprietary cloud platforms.

AWS is rich in tools that can be used in many fields: satellites, robotics, and even
blockchain. For large-scale data storage or large-scale platforms, GCP is attractive for
its costs.

Azure began collaborating with the open-source world in the last few years, trying to
catch up with the competition. IBM Bluemix is still at an experimental stage. Other
platforms are also solid, with the advantage of being reserved for advanced users such as
system administrators and experienced IT engineers.

Bibliography:
https://blog.overops.com/pivotal-cloud-foundry-vs-kubernetes
https://intellipaat.com/blog
https://searchaws.techtarget.com
https://shout.setfive.com
https://developer.searchblox.com
https://www.percona.com/blog
https://read.acloud.guru
http://blog.totalcloud.io
https://www.edureka.co/blog
https://www.cmswire.com/information-management
https://jumpcloud.com/blog
https://www.networkmanagementsoftware.com
https://blog.affini-tech.com
https://blog.overops.com
