Introduction.......................................................................................................................................11
Route53...........................................................................................................................................11
Simple Email Service.....................................................................................................................12
Identity and Access Management...................................................................................................12
Simple Storage Service...................................................................................................................12
Elastic Compute Cloud...................................................................................................................13
Elastic Block Store.........................................................................................................................13
CloudWatch....................................................................................................................................13
AWS components................................................................................................................................14
Amazon Cluster.....................................................................................................................................14
Storage.................................................................................................................................................15
Databases.............................................................................................................................................16
Administration and security................................................................................................................16
Networking..........................................................................................................................................16
Analytics..............................................................................................................................................16
Application services............................................................................................................................16
Deployment and management.............................................................................................................17
Mobile services....................................................................................................................................17
How does AWS work?.........................................................................................................................17
Who uses AWS?..................................................................................................................................17
Why use AWS?....................................................................................................................................18
Scalability and adaptability of AWS....................................................................................................18
AWS’s security and reliability.............................................................................................................18
Analysis...............................................................................................................................................26
Azure Databricks.................................................................................................................................26
Azure Stream Analytics.......................................................................................................................26
SQL Data Warehouse..........................................................................................................................26
HDInsight............................................................................................................................................26
Data Factory........................................................................................................................................26
Data Lake Analytics...............................................................................................................................26
Event Hubs..........................................................................................................................................26
Power BI Embedded...............................................................................................................................26
Azure Analysis Services......................................................................................................................26
R Server for HDInsight.......................................................................................................................27
Data Catalog........................................................................................................................................27
Azure Data Lake Storage.......................................................................................................................27
Azure Data Explorer............................................................................................................................27
Databases...............................................................................................................................................27
SQL Server on virtual machines..........................................................................................................27
Azure SQL Database..............................................................................................................................27
Azure Cosmos DB...............................................................................................................................27
SQL Data Warehouse..........................................................................................................................27
Data Factory........................................................................................................................................27
Azure Cache for Redis...........................................................................................................................27
SQL Server Stretch Database.................................................................................................................28
Table Storage.........................................................................................................................................28
Azure Database for PostgreSQL............................................................................................................28
Azure Database for MariaDB..............................................................................................................28
Azure Database for MySQL................................................................................................................28
Azure Database Migration Service........................................................................................................28
Compute.................................................................................................................................................28
Virtual Machines....................................................................................................................................28
Virtual Machine Scale Sets..................................................................................................................28
Azure Kubernetes Service (AKS).......................................................................................................28
Azure Functions..................................................................................................................................28
Service Fabric........................................................................................................................................29
App Service.........................................................................................................................................29
Container Instances................................................................................................................................29
Batch......................................................................................................................................................29
Azure Batch AI....................................................................................................................................29
SQL Server on virtual machines..........................................................................................................29
Cloud Services.......................................................................................................................................29
SAP HANA on Azure Large Instances..................................................................................................29
Web Apps...............................................................................................................................................29
Mobile Apps...........................................................................................................................................29
API Apps................................................................................................................................................29
Linux virtual machines........................................................................................................................30
Azure CycleCloud...............................................................................................................................30
Containers..............................................................................................................................................30
Azure Kubernetes Service (AKS).......................................................................................................30
Azure Functions..................................................................................................................................30
Service Fabric........................................................................................................................................30
App Service.........................................................................................................................................30
Container Instances................................................................................................................................30
Container Registry.................................................................................................................................30
Web Apps...............................................................................................................................................30
Mobile Apps...........................................................................................................................................30
API Apps................................................................................................................................................31
Web App for Containers......................................................................................................................31
DevOps................................................................................................................................................31
Azure DevOps.....................................................................................................................................31
Azure Pipelines....................................................................................................................................31
Azure Boards.......................................................................................................................................31
Azure Repos...........................................................................................................................................31
Azure Artifacts....................................................................................................................................31
Azure Test Plans..................................................................................................................................31
Azure DevTest Labs...............................................................................................................................31
DevOps Tool Integrations......................................................................................................................31
Media.....................................................................................................................................................32
Content Delivery Network.....................................................................................................................32
Media Services....................................................................................................................................32
Encoding................................................................................................................................................32
Live and On-Demand Streaming...........................................................................................................32
Azure Media Player.............................................................................................................................32
Content Protection.................................................................................................................................32
Media Analytics.....................................................................................................................................32
Video Indexer.........................................................................................................................................32
Management and Governance.............................................................................................................32
Azure Backup.........................................................................................................................................32
Azure Site Recovery............................................................................................................................32
Azure Advisor......................................................................................................................................33
Scheduler.............................................................................................................................................33
Automation..........................................................................................................................................33
Traffic Manager...................................................................................................................................33
Azure Monitor.....................................................................................................................................33
Network Watcher.................................................................................................................................33
Azure Service Health...........................................................................................................................33
Microsoft Azure Portal........................................................................................................................33
Azure Resource Manager....................................................................................................................33
Cloud Shell..........................................................................................................................................33
Azure Mobile App..................................................................................................................................33
Azure Policy...........................................................................................................................................34
Cost Management..................................................................................................................................34
Azure Managed Applications..............................................................................................................34
Azure Migrate......................................................................................................................................34
Azure Blueprints....................................................................................................................................34
AI + Machine Learning.........................................................................................................................34
Azure Batch AI....................................................................................................................................34
Azure Bot Service..................................................................................................................................34
Azure Databricks.................................................................................................................................34
Azure Search..........................................................................................................................................34
Bing Autosuggest...................................................................................................................................34
Bing Custom Search..............................................................................................................................35
Bing Entity Search.................................................................................................................................35
Bing Image Search.................................................................................................................................35
Bing News Search..................................................................................................................................35
Bing Spell Check...................................................................................................................................35
Bing Video Search..................................................................................................................................35
Bing Visual Search.................................................................................................................................35
Bing Web Search....................................................................................................................................35
Cognitive Services.................................................................................................................................35
Computer Vision..................................................................................................................................35
Content Moderator..............................................................................................................................36
Custom Speech.......................................................................................................................................36
Custom Vision........................................................................................................................................36
Data Science Virtual Machines..............................................................................................................36
Emotion...............................................................................................................................................36
Face.....................................................................................................................................................36
Azure Machine Learning Service........................................................................................................36
Machine Learning Studio....................................................................................................................36
Microsoft Genomics............................................................................................................................36
Translator Speech...................................................................................................................................36
Language Understanding.....................................................................................................................36
Linguistic Analysis.................................................................................................................................37
QnA Maker..........................................................................................................................................37
Speaker Recognition..............................................................................................................................37
Speech Translation.................................................................................................................................37
Speech Recognition................................................................................................................................37
Text Analytics........................................................................................................................................37
Text to Speech........................................................................................................................................37
Translator Text.......................................................................................................................................37
Video Indexer.........................................................................................................................................37
Identity.................................................................................................................................................37
Azure Active Directory.......................................................................................................................37
Azure Information Protection..............................................................................................................38
Azure Active Directory Domain Services...........................................................................................38
Azure Active Directory B2C...............................................................................................................38
Integration...........................................................................................................................................38
Event Grid...........................................................................................................................................38
Logic Apps..........................................................................................................................................38
API Management.................................................................................................................................38
Service Bus............................................................................................................................................38
Internet of Things................................................................................................................................38
Azure Functions..................................................................................................................................38
Azure IoT Hub.....................................................................................................................................38
Azure IoT Edge...................................................................................................................................39
Azure IoT Central................................................................................................................................39
Azure IoT Solution Accelerators...........................................................................................................39
Azure Sphere.......................................................................................................................................39
Azure Time Series Insights..................................................................................................................39
Azure Maps.........................................................................................................................................39
Event Grid...........................................................................................................................................39
Windows 10 IoT Core Services...........................................................................................................39
Azure Machine Learning Service........................................................................................................39
Machine Learning Studio....................................................................................................................39
Azure Stream Analytics.......................................................................................................................40
Logic Apps..........................................................................................................................................40
Notification Hubs...................................................................................................................................40
Azure Cosmos DB...............................................................................................................................40
API Management.................................................................................................................................40
Azure Digital Twins............................................................................................................................40
Migration.............................................................................................................................................40
Azure Site Recovery............................................................................................................................40
Cost Management..................................................................................................................................40
Azure Database Migration Service........................................................................................................40
Azure Migrate......................................................................................................................................40
Data Box..............................................................................................................................................41
Networking..........................................................................................................................................41
Content Delivery Network.....................................................................................................................41
ExpressRoute.......................................................................................................................................41
Azure DNS..........................................................................................................................................41
Virtual Network...................................................................................................................................41
Traffic Manager...................................................................................................................................41
Load Balancer......................................................................................................................................41
VPN gateway.......................................................................................................................................41
Application Gateway..............................................................................................................................41
Azure DDoS Protection.......................................................................................................................41
Network Watcher.................................................................................................................................42
Azure Firewall.....................................................................................................................................42
Virtual WAN........................................................................................................................................42
Azure Front Door Service...................................................................................................................42
Mobile.................................................................................................................................................42
App Service.........................................................................................................................................42
Azure Maps.........................................................................................................................................42
Notification Hubs...................................................................................................................................42
Web Apps.............................................................................................................................................42
Mobile Apps...........................................................................................................................................42
API Apps................................................................................................................................................42
Azure Mobile App..................................................................................................................................43
Visual Studio Application Center........................................................................................................43
Xamarin...............................................................................................................................................43
Web App for Containers......................................................................................................................43
Developer Tools.....................................................................................................................................43
Visual Studio.......................................................................................................................................43
Visual Studio Code..............................................................................................................................43
Software Development Kits (SDKs)...................................................................................................43
Azure DevOps.....................................................................................................................................43
CLI.......................................................................................................................................................43
Azure Pipelines....................................................................................................................................43
Azure Lab Services.............................................................................................................................44
Azure DevTest Labs.............................................................................................................................44
Integration of development tools.........................................................................................................44
Security................................................................................................................................................44
Azure Active Directory.......................................................................................................................44
Azure Information Protection..............................................................................................................44
Azure Active Directory Domain Services...........................................................................................44
Key Vault.............................................................................................................................................44
Security Center....................................................................................................................................44
Azure Dedicated HSM.........................................................................................................................44
VPN Gateway.......................................................................................................................................44
Application Gateway............................................................................................................................45
Azure DDoS Protection.......................................................................................................................45
Storage.................................................................................................................................................45
Storage.................................................................................................................................................45
Azure Backup.......................................................................................................................................45
StorSimple...........................................................................................................................................45
Azure Data Lake Storage.....................................................................................................................45
Blob Storage........................................................................................................................................45
Disk Storage........................................................................................................................................45
Managed Disks....................................................................................................................................45
Queue Storage.....................................................................................................................................45
File storage..........................................................................................................................................46
Data Box..............................................................................................................................................46
Avere vFXT for Azure.........................................................................................................................46
Storage Explorer..................................................................................................................................46
Archive Storage...................................................................................................................................46
Azure NetApp Files.............................................................................................................................46
Web......................................................................................................................................................46
App Service.........................................................................................................................................46
Content Delivery Network...................................................................................................................46
Azure Search........................................................................................................................................46
Notification Hubs.................................................................................................................................46
API Management.................................................................................................................................47
Web Apps.............................................................................................................................................47
Mobile Apps.........................................................................................................................................47
API Apps..............................................................................................................................................47
Web App for Containers......................................................................................................................47
Azure SignalR Service........................................................................................................................47
Games...........................................................................................................................................104
What is Amazon Athena: a complete overview............................................................................107
Amazon Athena Features..............................................................................................................108
Amazon Athena Use Cases...........................................................................................................111
What is Amazon Athena: pricing..................................................................................................112
What is Amazon CloudSearch?.........................................................................................................112
Overview of Amazon CloudSearch Benefits.....................................................................................113
Overview of Amazon CloudSearch Features....................................................................................115
How Much Does Amazon CloudSearch Cost?..................................................................................116
Amazon EMR....................................................................................................................................130
Processing big data with Amazon EMR.......................................................................................131
Architecture............................................................................................................................................132
Setup.......................................................................................................................................................132
AWS Region.................................................................................................................................132
1. Create VPC....................................................................................................................................133
2. Create KeyPair..............................................................................................................................133
SSH...............................................................................................................................................134
3. Create IAM Role...........................................................................................................................134
4. Create AWS Elasticsearch Domain...............................................................................................137
Elasticsearch Version....................................................................................................................137
5. Start SearchBlox IndexServer via Amazon Marketplace..............................................................142
Integrate with IAM Role...............................................................................................................147
SSH into SearchBlox IndexServer....................................................................................................148
Increase RAM memory for SearchBlox in AWS..........................................................................151
Kibana and Amazon Elasticsearch Service............................................................................................153
Amazon Kinesis................................................................................................................................161
When Should I Use Amazon Aurora and When Should I use RDS MySQL?..................................163
An introduction to Amazon RDS.............................................................................................163
So, Aurora or RDS MySQL?...................................................................................................164
Performance considerations................................................................................................164
Capacity Planning...............................................................................................................165
Replication..........................................................................................................................165
Monitoring...........................................................................................................................166
Costs....................................................................................................................................166
TL;DR.................................................................................................................................167
Drawbacks and Alternatives to DynamoDB.....................................................................................168
Rules of the Game.............................................................................................................................169
Let the Games Begin!........................................................................................................................170
YCSB Workload A, Uniform Distribution........................................................................................172
Real Life Use-case: Zipfian Distribution..........................................................................................173
The Dress that Broke DynamoDB.....................................................................................................176
Scylla vs DynamoDB – Single (Hot) Partition.................................................................................177
Additional Factors.............................................................................................................................177
Cross-region Replication and Global Tables................................................................................177
Explicit Caching is Expensive and Bad for You...........................................................................180
Freedom........................................................................................................................................180
No Limits......................................................................................................................................181
Total Cost of Ownership (TCO)........................................................................................................181
ElastiCache versus self-hosted Redis on EC2...................................................................................184
The Practical Comparison: ElastiCache Vs. Self-hosted Redis on EC2...........................................185
Deep Diving into the Practicalities of ElastiCache and Self-hosted Redis on EC2..........................187
ElastiCache: Supports Fully Managed Redis and Memcached...............................................187
ElastiCache: Scales Automatically According to Requirements.............................................187
ElastiCache: Instances with More Than One vCPU Cannot Utilize All the Cores..................187
Self Hosted Redis on EC2: Allows You to Update Latest Version ASAP...............................187
Self Hosted Redis on EC2: Provides the Freedom to Modify Configurations........................187
Self Hosted Redis on EC2: Unavailability of Pertinent Metrics Makes Maintenance Tedious...188
Self Hosted Redis on EC2: Instance Limitations.....................................................................188
What is AWS EC2?..............................................................................................................................188
Why AWS EC2?...................................................................................................................................189
Let’s understand the types of EC2 Computing Instances:.................................................................191
EBS-optimized Instances..............................................................................................................193
Security in AWS EC2..........................................................................................................................196
Auto Scaling......................................................................................................................................196
AWS EC2 Pricing..............................................................................................................................197
AWS EC2 Use Case..........................................................................................................................198
QLDB, BlockChain Database with Amazon webservices................................................................209
Similar Problems, With a Key Difference.........................................................................................209
The Benefits of Blockchain, But for Closed Systems.......................................................................210
Amazon S3.............................................................................................................................................211
How Amazon S3 works................................................................................................................211
Amazon S3 features......................................................................................................................212
Amazon S3 storage classes...........................................................................................................212
Working with buckets...................................................................................................................213
Protecting your data......................................................................................................................213
Comparing AWS vs Azure vs Google Cloud Platforms For Enterprise App Development..............215
Amazon Web Services..................................................................................................................215
Features.........................................................................................................................................215
Pricing...........................................................................................................................................215
Advantages...................................................................................................................................216
Microsoft Azure............................................................................................................................216
Features.........................................................................................................................................216
Pricing...........................................................................................................................................216
Advantages...................................................................................................................................217
Google Cloud Platform.................................................................................................................217
Features.........................................................................................................................................217
Pricing...........................................................................................................................................217
Advantages...................................................................................................................................218
Google Cloud Platform:....................................................................................................................218
Cloud IAM:......................................................................................................................................218
Evolution of Identity and Access Management.................................................................................219
Why Cloud IAM?..............................................................................................................................220
The Fundamentals of Google Compute Engine (GCE).........................................................................226
Virtual Machine (VM) Instances..................................................................................................226
Machine Types..............................................................................................................................226
Custom machine types..................................................................................................................227
Disks.............................................................................................................................................227
Persistent Disks.............................................................................................................................228
Local SSD.....................................................................................................................................229
Images...........................................................................................................................................229
Zones............................................................................................................................................230
What if I choose a zone and want it changed afterward?.............................................................231
Networking & Firewall.................................................................................................................231
Availability Policy........................................................................................................................232
Preemptibility...............................................................................................................................232
Automatic Restart.........................................................................................................................233
On host maintenance.....................................................................................................................233
Other Options...............................................................................................................................233
Accessing the VM.........................................................................................................................233
Pricing...........................................................................................................................................234
Serverless Showdown: AWS Lambda vs Firebase Google Cloud Functions....................................238
Function Creation — Lambda........................................................................................................240
Function Creation — Google Cloud Functions.............................................................................241
Deployment...................................................................................................................................243
Testing — Lambda.........................................................................................................................243
Testing — Google Cloud Functions...............................................................................................243
Pricing...........................................................................................................................................244
Conclusion....................................................................................................................................245
Cloud Shell:.......................................................................................................................................247
What is Google Cloud Shell.........................................................................................................247
What does it come with?..........................................................................................................247
Tips and Tricks.............................................................................................................................247
1. Running a web server (with auto-HTTPS for FREE!).........................................................247
2. Get extra power with “boost mode”.....................................................................................248
3. Edit your files with a GUI....................................................................................................248
4. Upload/Download files........................................................................................................248
5. Persist binary/program installations.....................................................................................248
Bonus: Open in Cloud Shell....................................................................................................248
Firebase Components:.......................................................................................................................249
Firebase.........................................................................................................................................249
A Brief History.........................................................................................................................249
Firebase Services.....................................................................................................................249
Realtime Database........................................................................................................................251
Authentication...............................................................................................................................251
Firebase Cloud Messaging (FCM)................................................................................................252
Firebase Database Query..............................................................................................................253
How to Store Data? => Firebase Storage.....................................................................................253
Firebase Test Labs........................................................................................................................254
Instrumentation Test.................................................................................................................254
Remote Config..............................................................................................................................254
Firebase App Indexing..................................................................................................................254
Firebase Dynamic Links...............................................................................................................255
Firestore........................................................................................................................................255
Improved Querying and Data Structure...................................................................................255
Query with Firestore................................................................................................................256
Better Scalability......................................................................................................................256
Multi-Region Database............................................................................................................256
Different Pricing Model...........................................................................................................257
Google Cloud Console:.....................................................................................................................258
OpenShift on OpenStack 1-2-3: Bringing IaaS and PaaS Together..................................................259
Overview...........................................................................................................................................259
OpenShift and the Case for OpenStack.............................................................................................260
OpenShift integration with OpenStack..............................................................................................260
OpenShift on OpenStack Architectures.............................................................................................261
Resource vs AutoScaling Groups.................................................................................................262
Non-HA........................................................................................................................................262
HA................................................................................................................................................263
Deploying OpenStack........................................................................................................................264
Deploying OpenShift on OpenStack 1-2-3.......................................................................................266
Step 1............................................................................................................................................266
Step Two.......................................................................................................................................268
Step Three.....................................................................................................................................270
Optional.............................................................................................................................................271
Pivotal Cloud Foundry vs Kubernetes: Choose the best way to deploy native cloud applications 273
“Application” PaaS vs. “Container” PaaS.........................................................................................274
Pivotal Cloud Foundry......................................................................................................................275
Features.........................................................................................................................................276
Installation and Usability..............................................................................................................277
Best Use Cases..............................................................................................................................277
Kubernetes.........................................................................................................................................277
Features.........................................................................................................................................278
Installation and Usability..............................................................................................................279
Best Use Cases..............................................................................................................................279
Best of Both Worlds: Cloud Foundry Container Runtime................................................................279
Conclusion:........................................................................................................................................280
Bibliography:.....................................................................................................................................281
Introduction:
Since cloud computing was first deployed in the professional world, many platforms have emerged to
serve the needs of end users and other stakeholders. These platforms began with Amazon Web
Services, followed by Google Cloud Platform, Microsoft Azure, IBM Bluemix, and Red Hat-backed
platforms such as OpenShift and OpenStack. The open source world also has its own platform,
Cloud Foundry.
Over the last couple of years, the popularity of cloud computing has grown dramatically, and along
with it the dominance of Amazon Web Services (AWS) in the market. Unfortunately, AWS doesn't do a
great job of explaining exactly what AWS is, how its pieces work together, or what typical use cases
for its components may be. This post is an effort to address this by providing a whip-around overview
of the key AWS components and how they can be effectively used.
Great, so what is AWS? Generally speaking, Amazon Web Services is a loosely coupled collection of
“cloud” infrastructure services that allows customers to “rent” computing resources. What this means is
that using AWS, you as the client are able to flexibly provision various computing resources on a “pay
as you go” pricing model. Expecting a huge traffic spike? AWS has you covered. Need to flexibly store
between 1 GB and 100 GB of photos? AWS has you covered. Additionally, the components that make
up AWS are loosely coupled, meaning that they can work independently or in concert with other AWS
resources.
Since AWS components are loosely coupled, you can mix and match only what you need. Here is an
overview of the key services.
Route53
What is it? Route53 is a highly available, scalable, and feature-rich domain name service (DNS) web
service. What a DNS service does is translate a domain name like “setfive.com” into an IP address like
64.22.80.79 which allows a client’s computer to “find” the correct server for a given domain name. In
addition, Route53 also has several advanced features normally only available in pricey enterprise DNS
solutions. Route53 would typically replace the DNS service provided by your registrar like GoDaddy
or Register.com.
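To make the DNS record idea concrete, here is a minimal sketch of the change-batch structure that Route53's ChangeResourceRecordSets API expects for an A record. The domain, IP, and TTL are illustrative examples only, not a definitive integration.

```python
# Illustrative sketch: the change batch Route53's ChangeResourceRecordSets
# API (e.g. via boto3) takes to point a domain name at an IPv4 address.
# All concrete values below are made-up examples.

def a_record_upsert(domain: str, ip: str, ttl: int = 300) -> dict:
    """Build a Route53-style change batch that points `domain` at `ip`."""
    return {
        "Changes": [
            {
                "Action": "UPSERT",  # create the record, or update it if it exists
                "ResourceRecordSet": {
                    "Name": domain,  # trailing dot marks a fully qualified name
                    "Type": "A",     # A records map names to IPv4 addresses
                    "TTL": ttl,      # seconds resolvers may cache the answer
                    "ResourceRecords": [{"Value": ip}],
                },
            }
        ]
    }

batch = a_record_upsert("setfive.com.", "64.22.80.79")
```

The low TTL here is a common choice for records you expect to change; raise it once a record is stable to reduce lookup traffic.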
Should you use it? Definitely. Although it isn't free, after last year's prolonged GoDaddy outage it's clear
that DNS is a critical component, and using a company that treats it as such is important.
Simple Email Service
What is it? Simple Email Service (SES) is a hosted transactional email service. It allows you to easily
send highly deliverable emails using a RESTful API call or via regular SMTP without running your
own email infrastructure.
Should you use it? Maybe. SES is comparable to services like SendGrid in that it offers a highly
deliverable email service. Although it is missing some of the features that you’ll find on SendGrid, its
pricing is attractive and the integration is straightforward. We normally use SES for application emails
(think “Forgot your password”) but then use MailChimp or SendGrid for marketing blasts and that
seems to work pretty well.
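To illustrate the "application email" use case, here is a hedged sketch of the parameter structure SES's SendEmail API accepts. The sender address, recipient, and wording are hypothetical examples; a real sender must be verified in SES first.

```python
# Illustrative sketch: building the parameters for an SES SendEmail call
# (the addresses, subject, and link here are made-up examples).

def password_reset_email(to_addr: str, reset_link: str) -> dict:
    """Build SES SendEmail parameters for a transactional message."""
    return {
        "Source": "no-reply@example.com",            # must be an SES-verified sender
        "Destination": {"ToAddresses": [to_addr]},
        "Message": {
            "Subject": {"Data": "Forgot your password?"},
            "Body": {"Text": {"Data": f"Reset your password here: {reset_link}"}},
        },
    }

params = password_reset_email("user@example.com", "https://example.com/reset")
```

In practice this dict would be passed to an SES client's send_email call; keeping the builder separate from the send makes the message easy to unit-test without touching the network.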
Identity and Access Management
What is it? Identity and Access Management (IAM) provides enhanced security and identity
management for your AWS account. In addition, it allows you to enable "multi-factor" authentication to
enhance the security of your AWS account.
Should you use it? Definitely. If you have more than one person accessing your AWS account, IAM
will allow everyone to get a separate account with fine-grained permissions. Multi-factor authentication
is also critically important, since a compromise at the infrastructure level would be catastrophic for
most businesses.
Simple Storage Service
What is it? Simple Storage Service (S3) is a flexible, scalable, and highly available storage web
service. Think of S3 like having an infinitely large hard drive where you can store files which are then
accessible via a unique URL. S3 also supports access control, expiration times, and several other useful
features. Additionally, the payment model for S3 is “pay as you go” so you’ll only be billed for the
amount of data you store and how much bandwidth you use to transfer it in and out.
Should you use it? Definitely. S3 is probably the most widely used AWS service because of its
attractive pricing and ease of use. If you’re running a site with lots of static assets (images, CSS assets,
etc.), you’ll probably get a “free” performance boost by hosting those assets on S3. Additionally, S3 is
an ideal solution for incremental backups, both data and code. We use S3 extensively, usually for
hosting static files, frequently backing up MySQL databases, and backing up git repositories. The new
AWS S3 Console also makes administering S3 and using it non-programmatically much easier.
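The pay-as-you-go billing described above can be sketched as simple arithmetic. The per-GB rates below are illustrative assumptions for the estimate, not current S3 prices.

```python
# Back-of-the-envelope S3 bill under the pay-as-you-go model described above.
# Both rates are assumed example figures, not actual AWS pricing.

STORAGE_RATE = 0.03    # assumed $ per GB-month stored
TRANSFER_RATE = 0.09   # assumed $ per GB transferred out

def estimate_s3_bill(stored_gb: float, transfer_out_gb: float) -> float:
    """You pay only for what you store and what you transfer out."""
    return round(stored_gb * STORAGE_RATE + transfer_out_gb * TRANSFER_RATE, 2)

print(estimate_s3_bill(100, 10))  # 100 GB stored + 10 GB out -> 3.9
```

The point of the model is that an idle bucket costs almost nothing; costs track usage directly rather than provisioned capacity.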
Elastic Compute Cloud
What is it? Elastic Compute Cloud (EC2) is the central piece of the AWS ecosystem. EC2 provides
flexible, on-demand computing resources with a “pay as you go” pricing model. Concretely, what this
means is that you can “rent” computing resources for as long as you need them and process any
workload on the machines you’ve provisioned. Because of its flexibility, EC2 is an attractive
alternative to buying traditional servers for unpredictable workloads.
Should you use it? Maybe. Whether or not to use EC2 is always a controversial discussion because the
complexity it introduces doesn’t always justify its benefits. As a rule of thumb, if you have
unpredictable workloads like sporadic traffic using EC2 to run your infrastructure is probably a
worthwhile investment. However, if you’re confident that you can predict the resources you’ll need you
might be better served by a “normal” VPS solution like Linode.
Elastic Block Store
What is it? Elastic Block Store (EBS) provides persistent storage volumes that attach to EC2 instances to
allow you to persist data past the lifespan of a single EC2. Due to the architecture of elastic compute
cloud, all the storage systems on an instance are ephemeral. This means that when an instance is
terminated all the data stored on that instance is lost. EBS addresses this issue by providing persistent
storage that appears on instances as a regular hard drive.
Should you use it? Maybe. If you’re using EC2, you’ll have to weigh the choice between using only
ephemeral instance storage or using EBS to persist data. Beyond that, EBS has well documented
performance issues so you’ll have to be cognizant of that while designing your infrastructure.
CloudWatch
What is it? CloudWatch provides monitoring for AWS resources including EC2 and EBS. CloudWatch
enables administrators to view and collect key metrics and also set a series of alarms to be notified in
case of trouble. In addition, CloudWatch can aggregate metrics across EC2 instances which provides
useful insight into how your entire stack is operating.
Should you use it? Probably. CloudWatch is significantly easier to set up and use than tools
like Nagios, but it's also less feature-rich. We've had some success coupling CloudWatch
with PagerDuty to provide alerts in case of critical service interruptions. You'll probably need
additional monitoring on top of CloudWatch, but it's certainly a good baseline to start with.
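The "collect key metrics" idea above can be made concrete with a sketch of the datum CloudWatch's PutMetricData API accepts for a custom application metric. The metric name, dimension, and value are hypothetical examples.

```python
# Illustrative sketch: one custom metric datum in the shape CloudWatch's
# PutMetricData API accepts (names and values below are made-up examples).
import datetime

def request_latency_datum(instance_id: str, latency_ms: float) -> dict:
    """Build one CloudWatch metric datum, tagged to a specific instance."""
    return {
        "MetricName": "RequestLatency",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Timestamp": datetime.datetime.now(datetime.timezone.utc),
        "Value": latency_ms,
        "Unit": "Milliseconds",
    }

datum = request_latency_datum("i-0123456789abcdef0", 42.0)
```

The InstanceId dimension is what lets CloudWatch both graph a single machine and aggregate the same metric across every instance in the stack, as described above.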
AWS stands for Amazon Web Services. The model resembles the history of electricity: a factory
would once build its own power plant for its purposes, until specialized utilities began managing
plants that supplied reliable power to many factories at once. Electricity could then be generated with
greater efficiency and at a lower price. The AWS cloud follows a similar model: instead of building
large-scale infrastructure themselves, companies can opt for cloud services and get all the
infrastructure they could ever need.
AWS is a growing cloud computing platform that holds a significant share of the cloud computing
market relative to its competitors. AWS is geographically diversified into regions to ensure robustness
and resilience to outages. Regional hubs are in place in Japan, the eastern USA, two locations in the
western USA, Brazil, Ireland, Singapore, and Australia. Over 50 services, spanning application
services, networking, storage, mobile, management, compute, and many others, are readily available
to clients.
Enterprises and start-ups alike can deploy services quickly without needing much capital.
AWS collaborates closely with customers such as GE, Pinterest, and MLB, who pin, power, and play
with the features of the AWS cloud. Let's now dig into the components of AWS.
AWS components
To assess the cloud computing capabilities of AWS we first have to look at the core components of
the cloud. AWS has many components, but we will cover only the key ones.
Amazon cluster
Also known as Amazon compute, AWS offers EC2 (Elastic Compute Cloud) and ELB (Elastic Load
Balancing) as its lead computing services. These instances are what let companies scale up or down
based on need. System admins and developers use EC2 to provision and boot compute instances in the
cloud. Pricing is based on usage. First-time AWS users get around 750 hours of EC2 per month free
for the first year. Beyond that, there are three pricing models: on-demand, spot instances, and
reserved instances.
Depending on location, size, complexity, and storage requirements, on-demand prices range from $0.13
to $4.60.
Reserved instance pricing is where users reserve instances well in advance, for terms in the
range of one to three years. AWS offers up to a 75% discount over on-demand pricing when users
reserve instances.
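The reserved-instance discount above is easy to quantify. The hourly rate below is an illustrative example taken from the low end of the on-demand range quoted earlier; the 75% figure comes straight from the text.

```python
# Worked example of reserved-instance savings: up to 75% off the on-demand
# rate, per the text. The $0.13/hour rate is an illustrative example.

HOURS_PER_YEAR = 24 * 365

def yearly_cost(hourly_rate: float, discount: float = 0.0) -> float:
    """Cost of running one instance all year at a (possibly discounted) rate."""
    return round(HOURS_PER_YEAR * hourly_rate * (1 - discount), 2)

on_demand = yearly_cost(0.13)        # pay-as-you-go, no commitment
reserved = yearly_cost(0.13, 0.75)   # same instance reserved in advance, 75% off
print(on_demand, reserved)  # 1138.8 284.7
```

The trade-off is commitment: the discount only pays off if the instance actually runs for most of the reservation term, which is why reserved pricing suits predictable baseline load.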
Spot instance pricing lets users bid on unused compute capacity. Spot prices vary with usage and
with the time of day, week, or month.
To reduce human intervention and improve fault tolerance, AWS ELB distributes application traffic
across EC2 instances. The ELB service is free for up to 15 GB of data processing and 750 hours
of service per month for a year. Larger loads are charged hourly and per GB transferred.
"When I get into complex customer situations that leverage combinations of AWS services, AWS
Certification has allowed me to immediately add value."
– Ryan Fackett, Director, Foghorn Consulting, Advanced APN Consulting Partner, AWS Certified
Solutions Architect – Professional
Storage
Amazon's Simple Storage Service (S3), Elastic Block Store (EBS), and CloudFront are the three
storage choices on Amazon. Storage in AWS is provided through a pay-as-you-go model. Amazon S3 is
a storage offering that can store any amount of data required. It is used for content storage, backup,
archiving and disaster recovery, and data analytics storage.
Along with a free EC2 instance for the first year, AWS also offers 5 GB of cloud storage, 20,000 GET
requests, and 5,000 PUT requests from S3 free for the first year. After the first year, pricing is $0.03
per GB per month for up to 1 TB. EBS is very helpful in scaling EC2 instances. Pricing varies by
geographic region, the disk technology used, and the GBs of provisioned storage required.
CloudFront is a great option for developers and business organizations, facilitating low
latency and high data-transfer speeds.
Databases
Along with in-memory caching and petabyte-scale data warehousing, AWS also scales relational and
NoSQL databases. DynamoDB is the NoSQL database, offering high scale at low cost. Using EC2 and
EBS, users can also operate their own databases on AWS. Relational Database Service (RDS) and
Amazon Redshift are the two database services from AWS.
Amazon RDS is used to operate and scale MySQL, Oracle, SQL Server, or PostgreSQL servers on
AWS. RDS pricing is based on instance hours and the amount of storage. Redshift is a data warehouse
service that stores data in columns rather than in rows. Pricing is based on instance hours, for example
$0.25 per hour.
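The column-versus-row distinction above is the key design idea behind Redshift, and it can be sketched in a few lines. The sample records here are hypothetical.

```python
# Sketch of row-store vs. column-store layouts: Redshift stores data by
# column, so an aggregate over one column touches far less data.
# The records below are made-up sample data.

rows = [  # row-oriented: each record stored together (typical RDS layout)
    {"user": "alice", "plan": "pro", "spend": 120},
    {"user": "bob", "plan": "free", "spend": 0},
    {"user": "carol", "plan": "pro", "spend": 75},
]

# column-oriented: each column stored contiguously (Redshift-style layout)
columns = {key: [r[key] for r in rows] for key in rows[0]}

# An aggregate over one column reads only that column's values:
print(sum(columns["spend"]))  # 195
```

A row store must read every whole record to total one field; the column store reads just the "spend" values, which is why columnar layouts suit analytical queries over wide tables.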
Networking
Amazon VPC (Virtual Private Cloud) provides versatile networking capability in AWS, with built-in
security for a private cloud. VPC comes free with EC2. AWS Direct Connect lets users connect
directly to the cloud, bypassing the internet; it is priced on an hourly basis.
Analytics
AWS offers services for data analytics on all fronts like Hadoop, orchestration and real-time streaming
and data warehousing. EMR (Elastic MapReduce) is the analytics facilitator which is used by the
Businesses, data analysts, researchers and developers to process data chunks. Pricing is done on an
hourly basis. Redshift also provides some analytics capabilities.
Application services
Amazon SQS (Simple Queue Service) is used to automate workflows between different services.
Messages are stored in a dedicated queue. The service is free up to 1 million messages per month;
after that, $0.50 is charged for every million messages.
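The SQS pricing rule just described (first million messages free, then $0.50 per million) is a simple tiered function; the figures come straight from the text above.

```python
# The SQS pricing tier described above, as a small function:
# the first million messages per month are free, then $0.50 per million.

def sqs_monthly_cost(messages: int) -> float:
    """Cost in dollars for `messages` SQS messages in one month."""
    billable = max(0, messages - 1_000_000)   # first million is free
    return round(billable / 1_000_000 * 0.50, 2)

print(sqs_monthly_cost(500_000))    # 0.0  (within the free tier)
print(sqs_monthly_cost(3_000_000))  # 1.0  (two billable millions)
```

At these rates, queueing is effectively free for small applications; the cost only becomes noticeable at a very high message volume.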
SWF (Simple Workflow Service) is a task management and coordination service for AWS. 10,000
activity tasks, 30,000 workflow-days, and 1,000 initiated executions per year are free for users. Above
that, users pay around $0.0001 per workflow.
Mobile services
Amazon Cognito and Mobile Analytics are two popular AWS mobile services. Cognito identifies users
and syncs data across their mobile devices. Up to 10 GB of cloud sync storage and one million sync
operations per month are free. Beyond that, users pay around $0.15 for every 10,000 operations.
Mobile Analytics tracks applications at scale and delivers usage data within 60 minutes. Up to one
million events per month are free; above that, the pricing is $1 for every million events.
"If we hadn't gone through AWS training, our progress would have been much slower.
There would have been a lot of pitfalls."
– Christian Boehm, Head, Data center infrastructure, Siemens
Small and medium firms and startups make up the majority of that space, with enterprise users
numbering only around 100,000.
No matter how much demand there is, AWS can scale to that level. AWS is also helpful in big data
analytics. On AWS, code can be deployed continuously, as DevOps processes are expertly
supported.
Because AWS services can be used flexibly, customers needn't worry about their computing usage.
Usage can be very high or very low, and AWS will scale whichever way the company needs.
This high adaptability is what AWS truly stands for.
Amazon is so huge partly because of AWS alongside its retail arm. As noted earlier, AWS offers up to
a 75% discount when instances are reserved in advance. AWS is highly profitable and growing
rapidly; that it rakes in such profits despite these discounted rates shows how massively it is used.
Did you know AWS's IaaS cloud is reportedly 10 times larger than its 14 competitors combined?
Stats like this are possible only because of the strong capability that AWS provides. Even so, large
enterprises remain hesitant to transition to AWS, mainly citing the security issues mentioned earlier.
But consider again the case of Netflix, which grew tremendously using AWS; enterprises can leverage
AWS to achieve a similar feat in their own game. If you own an enterprise, use AWS for infrastructure;
if you are a developer, engineer, or data professional, try the AWS free instances at least; and if you
are a stock trader, consider Amazon stock before large enterprises fully realize the potential of AWS.
Analytics
Amazon Athena – query S3 data with SQL
Amazon CloudSearch – managed search service
Amazon EMR – hosted Hadoop framework
Amazon Elasticsearch Service – run and operate Elasticsearch clusters
Amazon Kinesis – work with streaming data in real time
Amazon Managed Streaming for Kafka – fully managed Apache Kafka service
Amazon Redshift – fast, easy, and affordable data warehousing
Amazon QuickSight – fast business analytics service
AWS Data Pipeline – orchestration service for periodic, data-driven workflows
AWS Glue – prepare and load data
AWS Lake Formation – build a secure data lake within days
Application Integration
AR and VR
Amazon Sumerian – create and run AR and VR applications
Blockchain
Business Applications
Alexa for Business – optimize your organization with Alexa
Amazon Chime – meetings, video calls, and chat made easy
Amazon WorkDocs – storage and sharing service for business
Amazon WorkMail – secure, managed business email and calendaring
Compute
Amazon EC2 Auto Scaling – scale compute capacity to meet demand
Amazon Elastic Container Registry – store and retrieve Docker images
Amazon Elastic Container Service – run and manage Docker containers
Amazon Elastic Container Service for Kubernetes – run managed Kubernetes on AWS
Amazon Lightsail – launch and manage virtual private servers
AWS Batch – run batch jobs at any scale
AWS Elastic Beanstalk – run and manage web applications
AWS Fargate – run containers without managing servers or clusters
AWS Lambda – run your code in response to events
AWS Outposts – run AWS services on premises
AWS Serverless Application Repository – discover, deploy, and publish serverless applications
Elastic Load Balancing (ELB) – distribute incoming traffic across multiple targets
VMware Cloud on AWS – build a hybrid cloud without custom hardware
Customer engagement
Databases
Developer Tools
AWS CodeStar – develop and deploy applications on AWS
Amazon Corretto – production-ready distribution of OpenJDK
AWS Cloud9 – write, run, and debug code in a cloud IDE
AWS CodeBuild – build and test code
AWS CodeCommit – store code in private Git repositories
AWS CodeDeploy – automate code deployment
AWS CodePipeline – release software using continuous delivery
AWS Command Line Interface – unified tool to manage AWS services
AWS Tools and SDKs – tools and SDKs for AWS
AWS X-Ray – analyze and debug your applications
Game Tech
IoT
Machine Learning
Amazon Transcribe – automatic speech recognition
AWS Deep Learning AMIs – quickly start deep learning on EC2
AWS DeepLens – deep-learning-enabled video camera
AWS DeepRacer – autonomous 1/18th-scale race car driven by ML
AWS Inferentia – machine learning inference chip
Apache MXNet on AWS – scalable, high-performance deep learning
TensorFlow on AWS – open-source machine intelligence library
Management and Governance
AWS Transfer for SFTP – fully managed SFTP service
Mobile
Satellite
Security, Identity and Compliance
AWS Identity and Access Management (IAM) – manage user access and encryption keys
Amazon Cloud Directory – create flexible cloud-native directories
Amazon Cognito – identity management for your apps
Amazon GuardDuty – managed threat detection service
Amazon Inspector – analyze application security
Amazon Macie – discover, classify, and protect your data
AWS Artifact – on-demand access to AWS compliance reports
AWS Certificate Manager – provision, manage, and deploy SSL/TLS certificates
AWS CloudHSM – hardware-based key storage for regulatory compliance
AWS Directory Service – host and manage Active Directory
AWS Firewall Manager – central management of firewall policies
AWS Key Management Service – create and control encryption keys
AWS Organizations – policy-based management of multiple AWS accounts
AWS Secrets Manager – rotate, manage, and retrieve secrets
AWS Security Hub – unified security and compliance center
AWS Shield – protection against DDoS
AWS Single Sign-On – cloud single sign-on (SSO)
AWS WAF – filter malicious web traffic
Storage
Azure
Analytics
Azure Databricks – fast, easy, and collaborative Apache Spark-based analytics platform
HDInsight – provision cloud Hadoop, Spark, R Server, HBase, and Storm clusters
Data Factory – hybrid data integration across the enterprise, made easy
Event Hubs – receive telemetry from millions of devices
Power BI Embedded – embed fully interactive, stunning data visualizations in your applications
R Server for HDInsight – predictive analytics, machine learning, and statistical modeling for big data
Data Catalog – get more value from your enterprise data assets
Databases
Azure Cosmos DB – globally distributed, multi-model database available at any scale
Data Factory – hybrid data integration across the enterprise, made easy
SQL Server Stretch Database – dynamically stretch on-premises SQL Server databases to Azure
Table Storage – NoSQL key-value store using semi-structured data sets
Compute
Virtual Machines – provision Windows and Linux virtual machines in seconds
Azure Functions – process events with serverless code
Service Fabric – develop microservices and orchestrate containers on Windows or Linux
App Service – quickly create powerful cloud apps for web and mobile
Container Instances – easily run containers on Azure without managing servers
Batch – cloud-scale job scheduling and compute management
Azure Batch AI – easily experiment with and train your deep learning and AI models in parallel,
across cloud services
Build infinitely scalable, highly available cloud applications and APIs
Web Apps – quickly create and deploy mission-critical web apps
Mobile Apps – build and host the backend for any mobile app
API Apps – easily build and consume cloud APIs
Linux Virtual Machines – provision Ubuntu, Red Hat, and other Linux virtual machines
Azure CycleCloud – create, manage, operate, and optimize HPC and big compute clusters at any scale
Containers
Azure Functions – process events with serverless code
Service Fabric – develop microservices and orchestrate containers on Windows or Linux
App Service – quickly create powerful cloud apps for web and mobile
Container Instances – easily run containers on Azure without managing servers
Container Registry – store and manage container images across all types of Azure deployments
Web Apps – quickly create and deploy mission-critical web apps
Mobile Apps – build and host the backend for any mobile app
API Apps – easily build and consume cloud APIs
DevOps
Azure DevOps – services for teams to share code, track work, and ship software
Azure Pipelines – continuously build, test, and deploy to any platform and cloud
Azure Boards – plan, track, and discuss work across your teams
Azure Repos – unlimited, cloud-hosted private Git repositories for your project
Azure Artifacts – create, host, and share packages with your team
Media
Content Delivery Network – ensure secure, reliable content delivery with broad global reach
Media Services – encode, store, and stream audio and video at scale
Encoding – studio-grade encoding at cloud scale
Content Protection – deliver content securely using AES, PlayReady, Widevine, and FairPlay
Media Analytics – gain insights from video files with speech and vision services
Video Indexer – unlock insights from your videos
Management Tools
Azure Advisor – your personalized recommendation engine for Azure best practices
Scheduler – run jobs on simple or complex recurring schedules
Automation – simplify cloud management with process automation
Traffic Manager – route incoming traffic for high performance and availability
Azure Monitor – full observability into your applications, infrastructure, and network
Network Watcher – network performance monitoring and diagnostics solution
Cloud Shell – streamline Azure administration with a browser-based shell
Azure Policy – implement corporate governance and standards at scale for your Azure resources
Cost Management – optimize your cloud spend while maximizing cloud potential
Azure Migrate – easily discover, assess, right-size, and migrate your on-premises virtual machines to Azure
Azure Blueprints – enable quick, repeatable creation of governed environments
AI + Machine Learning
Azure Batch AI – easily experiment with and train your deep learning and AI models in parallel,
across cloud services
Azure Databricks – fast, easy, and collaborative Apache Spark-based analytics platform
Azure Search – search as a fully managed service
Bing Custom Search – easy-to-use, ad-free, commercial-grade search tool that delivers the results
you want
Cognitive Services – add smart API capabilities to enable contextual interactions
Computer Vision – distill actionable information from images
Content Moderator – automated moderation of images, text, and video
Custom Speech – overcome speech recognition barriers such as speaking style, background noise,
and vocabulary
Custom Vision – easily customize your computer vision models to fit your use case
Emotion – personalize user experiences with emotion recognition
Face – detect, analyze, organize, and identify faces in your photos
Microsoft Genomics – power genome sequencing and research insights
Language Understanding – teach your applications to understand commands from your users
Linguistic Analysis – simplify complex language concepts and parse text with the Linguistic
Analysis API
QnA Maker – distill information into conversational, easy-to-navigate answers
Speaker Recognition – identify and verify speakers by their voice
Speech Translation – easily integrate real-time speech translation into your app
Speech Recognition – the speech-to-text API, part of Azure Cognitive Services speech services
Text Analytics – easily evaluate sentiment and topics to understand what customers want
Text to Speech – convert text to speech to create more natural, accessible interfaces
Video Indexer – unlock insights from your videos
Identity
Azure Active Directory B2C – consumer identity and access management in the cloud
Integration
Event Grid – get reliable event delivery at massive scale
Logic Apps – automate the access and use of data across clouds without writing code
API Management – publish APIs to developers, partners, and employees securely and at scale
Service Bus – connect across private and public cloud environments
Internet of Things
Azure Functions – process events with serverless code
Azure IoT Solution Accelerators – create fully customizable solutions with templates for common
IoT scenarios
Azure Sphere – securely connect MCU-powered devices from the silicon to the cloud
Azure Maps – simple and secure location APIs that provide geospatial context to data
Event Grid – get reliable event delivery at massive scale
Logic Apps – automate the access and use of data across clouds without writing code
Notification Hubs – send push notifications to any platform from any backend
Azure Cosmos DB – globally distributed, multi-model database available at any scale
API Management – publish APIs to developers, partners, and employees securely and at scale
Migration
Cost Management – optimize your cloud spend while maximizing cloud potential
Azure Migrate – easily discover, assess, right-size, and migrate your on-premises virtual machines
to Azure
Data Box – secure, consolidated appliance for Azure data transfer
Networking
Content Delivery Network – ensure secure, reliable content delivery with broad global reach
ExpressRoute – dedicated private-fiber network connections to Azure
Azure DNS – host your DNS domain in Azure
Virtual Network – provision private networks, with optional connection to on-premises data centers
Traffic Manager – route incoming traffic for high performance and availability
Load Balancer – deliver high availability and optimal network performance to your applications
VPN Gateway – establish secure, cross-premises connectivity
Application Gateway – build secure, scalable, highly available web front ends in Azure
Network Watcher – network performance monitoring and diagnostics solution
Azure Firewall – native firewall capabilities with built-in high availability, unlimited cloud
scalability, and zero maintenance
Virtual WAN – optimize and automate branch-to-branch connectivity through Azure
Mobile
App Service – quickly create powerful cloud apps for web and mobile
Azure Maps – simple and secure location APIs that provide geospatial context to data
Notification Hubs – send push notifications to any platform from any backend
Web Apps – quickly create and deploy mission-critical web apps
Mobile Apps – build and host the backend for any mobile app
API Apps – easily build and consume cloud APIs
Xamarin – create cloud-powered mobile apps faster
Developer Tools
Visual Studio – a powerful and flexible environment for developing applications in the cloud
Azure DevOps – services for teams to share code, track work, and ship software
CLI – build, deploy, diagnose, and manage multiplatform, scalable apps and services
Azure Pipelines – continuously build, test, and deploy to any platform and cloud
Security
Azure Active Directory – synchronize on-premises directories and enable single sign-on
Azure Information Protection – better protect your sensitive information, anytime, anywhere
Key Vault – safeguard keys and other secrets and keep them under your control
Security Center – unify security management and enable advanced threat protection across hybrid
cloud workloads
VPN Gateway – establish secure, cross-premises connectivity
Application Gateway – build secure, scalable, highly available web front ends in Azure
Storage
Storage – durable, highly available, and massively scalable cloud storage
Azure Backup – simplify data protection and protect against ransomware
StorSimple – lower costs with an enterprise hybrid cloud storage solution
Azure Data Lake Storage – massively scalable, secure data lake functionality built on Azure Blob
Storage
Blob Storage – REST-based object storage for unstructured data
Disk Storage – persistent, secured disk options that support virtual machines
Managed Disks – persistent, secured disk storage for Azure virtual machines
Queue Storage – effectively scale your apps according to traffic
File Storage – file shares that use the standard SMB 3.0 protocol
Data Box – secure, consolidated appliance for Azure data transfer
Storage Explorer – view and interact with Azure storage resources
Archive Storage – industry-leading pricing for storing rarely accessed data
Web
App Service – quickly create powerful cloud apps for web and mobile
Azure Search – search as a fully managed service
Notification Hubs – send push notifications to any platform from any backend
API Management – publish APIs to developers, partners, and employees securely and at scale
Web Apps – quickly create and deploy mission-critical web apps
Mobile Apps – build and host the backend for any mobile app
API Apps – easily build and consume cloud APIs
IBM Bluemix
Databases
mongodb (Experimental, deprecated) – this service is no longer available; please search for Compose
services in the catalog instead.
mysql (Experimental, deprecated) – this service is no longer available; please search for Compose
services in the catalog instead.
postgresql (Experimental, deprecated) – this service is no longer available; please search for Compose
services in the catalog instead.
Create a sample implementation that demonstrates how to use an IBM XPages application connected
to an IBM XPages NoSQL database service.
Web and Mobile
rabbitmq (Experimental, deprecated) – this service is no longer available; please search for Compose
services in the catalog instead.
redis (Experimental, deprecated) – this service is no longer available; please search for Compose
services in the catalog instead.
Web and Application
Cost and Asset Management (Experimental) – hybrid cloud cost and asset management service broker
Instrument Analytics (Experimental) – leverage IBM Algorithmics' sophisticated financial models to
price and compute analytics on financial securities.
Investment Portfolio (Experimental) – maintain a record of your investment portfolios through time.
Portfolio Optimization (Experimental) – construct or rebalance investment portfolios based on
investor goals, mandates, and preferences.
Real-Time Payments (Experimental) – manage participants, tokens, and recipients, and initiate and
receive real-time payments.
Simulated Historical Instrument Analytics (Experimental) – leverage IBM Algorithmics'
sophisticated financial models to price and compute analytics on financial securities for a historical
date, under a scenario.
Google Cloud
With Google Cloud's global infrastructure, build and scale faster than ever on one of the largest and
fastest private networks in the world.
Compute
Adaptable, high-performance virtual machines
App Engine
Kubernetes Engine – run containerized applications
Prepare your applications for the cloud and migrate them at your own pace
Cloud Functions
Knative – components for building modern, Kubernetes-native cloud software
Shielded VMs (BETA)
Container security
Kubernetes Engine – fluid, on-demand scalability
Databases
Cloud SQL
Cloud Bigtable – column-oriented NoSQL database service
Cloud Spanner
Cloud Datastore
Fully managed in-memory data store service
Cloud Spanner – mission-critical, scalable relational database service
Monitoring
Monitoring Service (EARLY ACCESS)
Logging
Error Reporting
Trace
Debugger
Profiler
Transparent Service Level Indicators – monitor Google Cloud services and their effect on your
workloads
Cloud Console
Cloud Shell
Cloud APIs – develop, secure, deploy, and monitor your APIs wherever they run
Accelerate the development of new digital services with FHIR-based APIs
Accelerate open banking and compliance with the PSD2 directive
Apigee Sense
API Analytics – insights into operational and business metrics for APIs
API Monetization
Cloud Endpoints
Developer Portal – a turnkey, self-service platform for developers and API teams
Persistent Disk – block storage for VM instances
Migration
Data Transfer – command-line tools for developers to transfer data over the network
Transfer Appliance – transfer data between cloud storage services such as AWS S3 and Google
Cloud Storage
BigQuery Data Transfer Service
Cloud Armor
Cloud CDN
Cloud Interconnect
Cloud DNS
Network Telemetry
Developer Tools
Cloud SDK
Container Registry
Cloud Build
Cloud Source Repositories – a single place for your team to store, manage, and track code
Cloud Tools for Visual Studio
App Engine Maven Plug-in
Firebase Crashlytics
Kubernetes Monitoring with Stackdriver – aggregate metrics, logs, events, and metadata from
Kubernetes and Prometheus
Internet of Things
74
Edge TPU EARLY ACCESS
Media solutions
Anvato
Stream live and on-demand video on any device
Zync Render
Render directly from your 3D modeling tools, quickly and inexpensively
Apigee
API control and visibility throughout your organization and across multiple clouds
Firebase
Build better mobile applications, improve their quality, and grow your business
Data Analytics
BigQuery
Cloud Dataflow
Cloud Dataproc
Managed Spark and Hadoop service
Cloud Datalab
Cloud Dataprep
Cloud data service to explore, clean, and prepare data for analysis
Ingest event streams from anywhere, at any scale
Cloud Composer
Genomics
Business analytics for an optimized customer experience
BigQuery
Fully managed, highly scalable data warehouse with built-in ML
Cloud TPU
Cloud Machine Learning Engine
Cloud Natural Language
Cloud Speech-to-Text
Cloud Text-to-Speech
Cloud Translation
Cloud Vision
Firebase Predictions BETA
Train high-quality custom ML models, effortlessly and regardless of your level of expertise
Protect your users' identities and help them meet your policy, regulatory, and business goals
using Google Cloud security solutions.
Cloud Identity
Easily manage user identities, devices, and applications from a single console
Security
Cloud IAM
Firebase Authentication
Secure access to applications deployed on GCP based on identity and context
Cloud Data Loss Prevention API
Cloud HSM ALPHA
Protect encryption keys using a fully managed hardware security module service
Define secure access zones for sensitive data in Google Cloud Platform services
Resource Manager
Comprehensive security and data-risk management platform for GCP
Access Transparency
Gain visibility into your cloud provider through near real-time logs
Kubernetes Engine
Reimagine how you work. G Suite tools let teams collaborate, create, and iterate together more quickly.
G Suite
An integrated suite of secure, cloud-native collaboration and productivity applications powered
by Google AI
Communication
Gmail
Calendar
Online calendars designed for teamwork
Hangouts Chat
Hangouts Meet
Google+
Secure enterprise social network
Control
Admin
Vault
Mobile Device Management
Mobile device management for Android, iOS, Windows, and more
Creation
Docs
Sheets
Fast online document creation and advanced spreadsheets
Slides
Forms
Sites
Easy-to-create team sites
Keep
Access
Drive
Cloud Search
Hire by Google
Recruit faster. Hire is a collaborative recruiting application that integrates seamlessly with G Suite.
Create immersive location-based experiences and grow your business with comprehensive real-time
data
Maps
Bring the world to your users with custom maps and Street View
Routes
Help users reach their destination with comprehensive data and real-time traffic information
Places
Help users discover the world with detailed information on over 100 million points of interest
Ride Sharing
Integrate Google Maps into your taxi or ride-sharing application to provide reliable real-time routes
Games
Asset Tracking
Take advantage of precise, real-time global positioning data for your fleet, your
employees, and your devices
Games
Create immersive, true-to-life games with up-to-date global data
Purpose-built for mobile and international workers, Google's browser, meeting-room hardware,
devices, and Chrome OS help your team stay connected.
Chrome Enterprise
Easily manage Chromebooks through Chrome OS and the Chrome Browser
Android Enterprise
Jamboard
A collaborative digital whiteboard to visualize your ideas
An efficient, fast video-conferencing system for your meeting rooms
What is Amazon Athena: a complete overview
Amazon Athena is probably the most promising of the services announced last week in
Las Vegas. In fact, big data was one of the main topics discussed at re:Invent 2016, together
with AI and IoT. We gathered a lot of information on Athena at the special session led by
Rahul Pathak, general manager of Amazon EMR at AWS. In this post, I will cover Athena’s
main features, use cases, and pricing details.
What is Amazon Athena? It is an interactive query service that makes it easy to directly
analyze data on Amazon S3 using standard SQL. It means that you can store structured data
on S3 and query that data as you’d do with an SQL database. Athena is serverless,
meaning that there is no infrastructure to manage, no setup, servers, or data warehouses.
The power of S3 storage is fully unleashed by the new Athena query engine without the need
for maintenance. No infrastructure or administration is required: You can just create a table,
load some data, and start querying.
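The "create a table, load some data, and start querying" flow can be sketched as below. This is a hedged illustration, not taken from the article: the table name, columns, S3 paths, and the JSON SerDe choice are all assumptions for the example.

```python
# Hypothetical sketch of the "create a table, then query" Athena flow.
# Table name, columns, and S3 paths are illustrative assumptions.
create_table = """
CREATE EXTERNAL TABLE IF NOT EXISTS access_logs (
    request_time string,
    status int,
    bytes_sent bigint
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://my-example-bucket/logs/'
"""

query = """
SELECT status, COUNT(*) AS hits
FROM access_logs
GROUP BY status
ORDER BY hits DESC
"""

# With boto3, the query would be submitted like this (not executed here):
# athena = boto3.client("athena")
# athena.start_query_execution(
#     QueryString=query,
#     ResultConfiguration={"OutputLocation": "s3://my-example-bucket/results/"},
# )
print(query.strip().splitlines()[0])
```

Because the table is external, dropping it never touches the underlying S3 objects; only the metadata goes away.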
As mentioned during the session, Athena complements Amazon Redshift and Amazon
EMR.
time custom keys. Of course, you can connect to Athena with your favourite SQL client.
You can store data in the form of objects with several file formats:
JSON
Compressed files
Eventually, you may want to use Hive CTAS or Spark to convert data to ORC and PARQUET
formats.
As soon as you perform a query you will obtain a data stream directly from Amazon S3, just
as if you were querying a real SQL database. Queries can be executed either through APIs or
from the AWS Console. By using the AWS Console, you will also get the query running time
and the amount of data scanned, in bytes.
With Amazon Athena, you won’t have to worry about scaling, performance, and
maintenance. You will have enough compute resources to get fast, interactive query
performance. Athena will automatically execute queries in parallel over petabytes of data.
Therefore, most results will come back within seconds. This is made possible because Athena
uses warm compute pools across multiple Availability Zones.
Finally, the built-in integration with Amazon QuickSight allows you to visualize your data.
In such scenarios, the need to store gigabytes or petabytes of structured data can be a
real problem. Accessing that data in a fast, easy, and secure way is even more difficult,
painful, and time-consuming. Athena is focused on solving these problems by mixing together
the power of Amazon S3 storage and the SQL query language. This allows you to operate on
your data easily and without worrying about scaling. Indeed, you will get results within
seconds, even on very large datasets.
DDL statements (CREATE, ALTER, DROP), partitioning queries, and failed queries are
completely free. If you cancel a query, you will be charged only for the scanned data up to
that point. Of course, you can reduce costs by using compression, columnar formats,
and partitions. With such techniques, Athena will have to scan less data from Amazon S3.
In practice, there is no charge directly related to computation itself, so you can always
estimate the total cost purely based on the amount of data that you need to work with.
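Since cost depends only on data scanned, a back-of-the-envelope estimator is easy to write. The per-TB rate below is an assumption based on Athena's published pricing at the time of writing; check the current price list before relying on it.

```python
# Rough Athena cost estimator based only on data scanned.
# ASSUMPTION: $5 per TB scanned (Athena's published rate at the time of
# writing). DDL statements and failed queries are free.
PRICE_PER_TB = 5.0

def athena_query_cost(tb_scanned: float) -> float:
    """Estimated USD cost for a single query scanning `tb_scanned` TB."""
    return round(tb_scanned * PRICE_PER_TB, 4)

# Compression and columnar formats reduce the bytes scanned, hence the cost:
full_scan = athena_query_cost(1.0)    # 1 TB of raw JSON
columnar = athena_query_cost(0.25)    # same data as partitioned Parquet
print(full_scan, columnar)  # → 5.0 1.25
```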
When the Multi-AZ option is enabled, the domain has two Availability Zones, and the required
resources, updates, and instances are provisioned to the domain in both Availability Zones.
If the amount of data in the search index is reduced, Amazon CloudSearch scales the domain
down, either by using a smaller search instance type or by reducing the number of
partitions.
Manage Search Traffic
In addition, Amazon CloudSearch also scales up or down a search domain based on the amount of
search traffic. So if there is a large volume of search requests or queries, thereby increasing the amount
of search traffic, the search service will automatically replicate the search instance used for processing
those requests and partition the search index. In this example, there is one search instance replica and
one search index partition. When the instance replica can no longer handle the search traffic, the
partition will be replicated as well as the search instance, increasing the number of partitions and
instances in the domain.
Search Domain Configuration Made Easy
Amazon CloudSearch’s architecture is comprised of three services, namely configuration service,
document service, and search service. These services define how users can interact with Amazon
CloudSearch. The configuration service permits them to configure their search domain, and one way of
doing that is by defining how data are being indexed. They can set up different indexing options that
enable them to map their data and indicate what data can be searched and retrieved from the search
index.
Configure Scaling Options
The configuration service also allows users to customize the scaling of their domain. Here, they can
prescale the search domain by specifying the type of search instance that will be used, as well as how
many times instance replication and index partitioning will happen whenever they are importing a large
volume of data or expecting a spike in search traffic.
Control The Ranking Of Search Results
With Amazon CloudSearch, the ranking of search results can be controlled. Through the aid of
numerical expressions, users will be able to define factors and even combine them, and associate scores
with them to ensure that the most relevant documents and data are ranked higher in the search results.
For instance, they can set up a numerical expression for calculating rank scores based on how often a
term within a document is being searched and how popular the document is.
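A plain-Python sketch of that kind of numeric rank expression is shown below. The specific weighting (term frequency scaled by log popularity) and the field names are made up for illustration; CloudSearch expressions use its own expression syntax, not Python.

```python
import math

# Illustrative sketch of a numeric rank expression combining how often a
# term appears with how popular the document is. The weighting scheme and
# field names are assumptions for the example, not CloudSearch syntax.
def rank_score(term_frequency: int, popularity: int) -> float:
    return term_frequency * math.log1p(popularity)

docs = [
    {"id": "a", "tf": 3, "popularity": 10},
    {"id": "b", "tf": 5, "popularity": 1},
]
# Higher scores rank first, so the popular document "a" wins despite
# its lower raw term frequency:
ranked = sorted(docs, key=lambda d: rank_score(d["tf"], d["popularity"]), reverse=True)
print([d["id"] for d in ranked])  # → ['a', 'b']
```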
Document Service
Meanwhile, if users want to modify the data in their search index, they can take advantage of the
document service. This service allows them to import data into their domain. Every data they send to
the domain is stored and represented as a document which has a unique ID and index fields. The index
fields organize all the specific data that will be indexed and made accessible from the search results.
Innovative Ways To Process Search Queries
Amazon CloudSearch’s architecture makes it possible to process different types and forms of search
queries and return search results in various ways. It can drill down into specific data within index
fields, generate facet information, support complex Boolean searches, and parse search queries.
Overview of Amazon CloudSearch Features
Create a Search Domain
Automated Provisioning and Maintenance of Search Domain
Search Instances
Scale Based on Search Index Data and Search Traffic
Replication of Search Instances
Partitioning of Search Index
Multi-Availability Zone Option
Configuration Service
Indexing Options
Index Fields
Text Analysis Schemes
Availability Options
Scaling Options
Suggesters
Rank Search Results through Numeric Expressions
Document Service
Change Searchable Data
Search Service
Unique Search HTTP Endpoint for every Domain
Rich Query Language
Search Features
Free Text, Boolean, and Faceted Search
Field Weighting
Geospatial Search
Support for 34 Languages
If you are considering Amazon CloudSearch it might also be beneficial to check out other
subcategories of Best Site Search Solutions gathered in our base of B2B software reviews.
Organizations have unique needs and requirements and no software application can be ideal in such a
condition. It is useless to try to find an ideal out-of-the-box software app that meets all your business
needs. The wise thing to do would be to customize the solution for your unique needs, employee skill
levels, budget, and other aspects. For these reasons, do not hasten and subscribe to well-publicized
popular solutions. Though these may be widely used, they may not be the perfect fit for your unique
requirements. Do your research, investigate each short-listed platform in detail, read a few Amazon
CloudSearch reviews, call the vendor for explanations, and finally settle for the application that
provides what you need.
Amazon CloudSearch has no set-up fees or upfront commitments and doesn't offer an enterprise pricing
plan. Instead, it uses a pay-as-you-go model: you pay for search instance usage (calculated on an
hourly basis), the total number of document batches you upload to the search domain, the amount of
data stored in the search domain when you explicitly make IndexDocuments requests/calls, and the
amount of data transferred in and out of Amazon CloudSearch.
In addition, all charges are billed on a monthly basis and vary depending on your region.
Here are the details:
Search Instances
US East (N. Virginia)
search.m1.small – $0.059 per hour
search.m3.medium – $0.094 per hour
search.m3.large – $0.188 per hour
search.m3.xlarge – $0.376 per hour
search.m3.2xlarge – $0.752 per hour
US West (Northern California)
search.m1.small – $0.063 per hour
search.m3.medium – $0.104 per hour
search.m3.large – $0.208 per hour
search.m3.xlarge – $0.416 per hour
search.m3.2xlarge – $0.832 per hour
US West (Oregon)
search.m1.small – $0.059 per hour
search.m3.medium – $0.094 per hour
search.m3.large – $0.188 per hour
search.m3.xlarge – $0.376 per hour
search.m3.2xlarge – $0.752 per hour
Asia Pacific (Seoul)
search.m4.large – $0.255 per hour
search.m4.xlarge – $0.511 per hour
search.m4.2xlarge – $1.023 per hour
Asia Pacific (Singapore)
search.m1.small – $0.078 per hour
search.m3.medium – $0.132 per hour
search.m3.large – $0.264 per hour
search.m3.xlarge – $0.528 per hour
search.m3.2xlarge – $1.056 per hour
Asia Pacific (Sydney)
search.m1.small – $0.078 per hour
search.m3.medium – $0.132 per hour
search.m3.large – $0.264 per hour
search.m3.xlarge – $0.528 per hour
search.m3.2xlarge – $1.056 per hour
Asia Pacific (Tokyo)
search.m1.small – $0.082 per hour
search.m3.medium – $0.136 per hour
search.m3.large – $0.272 per hour
search.m3.xlarge – $0.544 per hour
search.m3.2xlarge – $1.088 per hour
EU (Frankfurt)
search.m3.medium – $0.112 per hour
search.m3.large – $0.224 per hour
search.m3.xlarge – $0.448 per hour
search.m3.2xlarge – $0.896 per hour
EU (Ireland)
search.m1.small – $0.063 per hour
search.m3.medium – $0.104 per hour
search.m3.large – $0.208 per hour
search.m3.xlarge – $0.416 per hour
search.m3.2xlarge – $0.832 per hour
South America (São Paulo)
search.m1.small – $0.078 per hour
search.m3.medium – $0.128 per hour
search.m3.large – $0.256 per hour
search.m3.xlarge – $0.512 per hour
search.m3.2xlarge – $1.024 per hour
Pricing is per instance-hour consumed for each search instance, from the time the instance is launched
until it is terminated. Each partial instance-hour consumed is billed as a full hour.
When you enable the Multi-AZ option for enhanced data durability and availability, Amazon
CloudSearch provisions and maintains additional search instances in a different Availability Zone.
Search traffic is distributed across all of the instances and the instances in either zone are capable of
handling the full load in the event of a service disruption. When you enable the Multi-AZ option, you
are charged for the additional search instance hours used at the regular rates for the applicable region.
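The two billing rules above (partial instance-hours round up to full hours; Multi-AZ doubles the instance hours at the same rate) can be sketched in a few lines. The rate table below uses the US East (N. Virginia) prices listed earlier; the function itself is an illustration, not an official calculator.

```python
import math

# Sketch of CloudSearch instance-hour billing: each partial instance-hour
# is billed as a full hour, and the Multi-AZ option doubles the instance
# hours at the regular regional rate. Rates: US East (N. Virginia).
RATES = {"search.m1.small": 0.059, "search.m3.medium": 0.094}

def instance_cost(instance_type: str, hours_running: float, multi_az: bool = False) -> float:
    billed_hours = math.ceil(hours_running)  # partial hours round up
    if multi_az:
        billed_hours *= 2                    # second AZ billed at the same rate
    return round(billed_hours * RATES[instance_type], 3)

print(instance_cost("search.m1.small", 10.5))                 # → 0.649
print(instance_cost("search.m1.small", 10.5, multi_az=True))  # → 1.298
```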
Previous Generation Search Instances
Below are the prices for previous generation instances, which are only available for existing search
domains. All newly created domains will be provisioned with the higher performance newer generation
instance options.
US East (N. Virginia)
search.m1.large – $0.236 per hour
search.m2.xlarge – $0.306 per hour
search.m2.2xlarge – $0.613 per hour
US West (Northern California)
search.m1.large – $0.257 per hour
search.m2.xlarge – $0.344 per hour
search.m2.2xlarge – $0.688 per hour
US West (Oregon)
search.m1.large – $0.236 per hour
search.m2.xlarge – $0.306 per hour
search.m2.2xlarge – $0.613 per hour
Asia Pacific (Singapore)
search.m1.large – $0.315 per hour
search.m2.xlarge – $0.370 per hour
search.m2.2xlarge – $0.740 per hour
Asia Pacific (Sydney)
search.m1.large – $0.315 per hour
search.m2.xlarge – $0.370 per hour
search.m2.2xlarge – $0.740 per hour
Asia Pacific (Tokyo)
search.m1.large – $0.328 per hour
search.m2.xlarge – $0.359 per hour
search.m2.2xlarge – $0.719 per hour
EU (Ireland)
search.m1.large – $0.257 per hour
search.m2.xlarge – $0.344 per hour
search.m2.2xlarge – $0.688 per hour
South America (São Paulo)
search.m1.large – $0.315 per hour
search.m2.xlarge – $0.404 per hour
search.m2.2xlarge – $0.806 per hour
Pricing is per instance-hour consumed for each search instance, from the time the instance is launched
until it is terminated. Each partial instance-hour consumed is billed as a full hour.
When you enable the Multi-AZ option for enhanced data durability and availability, Amazon
CloudSearch provisions and maintains additional search instances in a different Availability Zone.
Search traffic is distributed across all of the instances and the instances in either zone are capable of
handling the full load in the event of a service disruption. When you enable the Multi-AZ option, you
are charged for the additional search instance hours used at the regular rates for the applicable region.
Batch Uploads
You are billed for the total number of document batches uploaded to your search domain. Uploaded
documents are automatically indexed.
$0.10 per 1,000 Batch Upload Requests (the maximum size for each batch is 5 MB)
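Given the $0.10-per-1,000-batches rate and the 5 MB batch limit, an upload cost estimate follows directly. The size-based batch packing below is a simplification for illustration (real batches also split on document count and boundaries).

```python
import math

# Sketch of the batch-upload charge: $0.10 per 1,000 batch upload
# requests, each batch at most 5 MB. Pure size-based packing is a
# simplifying assumption for the example.
BATCH_LIMIT_MB = 5
PRICE_PER_1000_BATCHES = 0.10

def batch_upload_cost(total_mb: float) -> float:
    batches = math.ceil(total_mb / BATCH_LIMIT_MB)
    return round(batches / 1000 * PRICE_PER_1000_BATCHES, 4)

# 50 GB of documents → 10,000 five-MB batches → $1.00
print(batch_upload_cost(50_000))
```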
IndexDocuments Requests
When you make configuration changes to your index, for example by adding a field, you will need to
rebuild the index. To do this, you use the AWS Management Console, command line tools, AWS SDKs,
or APIs to issue an IndexDocuments request. The charge for this request is:
$0.98 per GB of data stored in your search domain
Amazon CloudSearch may occasionally issue these calls for you. For example, as you add data to your
domain, Amazon CloudSearch may proactively rebuild your index to improve query performance. You
will not be charged in this case, and others, where you do not explicitly call IndexDocuments.
Data Transfer
The pricing below is based on data transferred “in” and “out” of Amazon CloudSearch.
US East (N. Virginia)
Data Transfer In
All Data Transfer In – $0.000 per GB
Data Transfer Out
First 10 TB / month – $0.090 per GB
Next 40 TB / month – $0.085 per GB
Next 100 TB / month – $0.070 per GB
US West (Northern California)
Data Transfer In
All Data Transfer In – $0.000 per GB
Data Transfer Out
First 10 TB / month – $0.090 per GB
Next 40 TB / month – $0.085 per GB
Next 100 TB / month – $0.070 per GB
US West (Oregon)
Data Transfer In
All Data Transfer In – $0.000 per GB
Data Transfer Out
First 10 TB / month – $0.090 per GB
Next 40 TB / month – $0.085 per GB
Next 100 TB / month- $0.070 per GB
Asia Pacific (Seoul)
Data Transfer In
All Data Transfer In – $0.000 per GB
Data Transfer Out
First 10 TB / month – $0.126 per GB
Next 40 TB / month – $0.122 per GB
Next 100 TB / month – $0.117 per GB
Asia Pacific (Singapore)
Data Transfer In
All Data Transfer In – $0.000 per GB
Data Transfer Out
First 10 TB / month- $0.120 per GB
Next 40 TB / month – $0.085 per GB
Next 100 TB / month – $0.082 per GB
Asia Pacific (Sydney)
Data Transfer In
All Data Transfer In – $0.000 per GB
Data Transfer Out
First 10 TB / month – $0.140 per GB
Next 40 TB / month – $0.135 per GB
Next 100 TB / month – $0.130 per GB
Asia Pacific (Tokyo)
Data Transfer In
All Data Transfer In – $0.000 per GB
Data Transfer Out
First 10 TB / month – $0.140 per GB
Next 40 TB / month – $0.135 per GB
Next 100 TB / month – $0.130 per GB
EU (Frankfurt)
Data Transfer In
All Data Transfer In – $0.000 per GB
Data Transfer Out
First 10 TB / month – $0.090 per GB
Next 40 TB / month – $0.085 per GB
Next 100 TB / month – $0.070 per GB
EU (Ireland)
Data Transfer In
All Data Transfer In – $0.000 per GB
Data Transfer Out
First 10 TB / month – $0.090 per GB
Next 40 TB / month – $0.085 per GB
Next 100 TB / month – $0.070 per GB
South America (São Paulo)
Data Transfer In
All Data Transfer In – $0.000 per GB
Data Transfer Out
First 10 TB / month – $0.250 per GB
Next 40 TB / month – $0.230 per GB
Next 100 TB / month – $0.210 per GB
Data transferred between Amazon CloudSearch and AWS services in the same region is free.
Data transferred between Amazon CloudSearch and AWS services in different regions will be charged
as Internet Data Transfer on both sides of the transfer.
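The tiered Data Transfer Out rates above can be turned into a small calculator. The tiers below are the US East (N. Virginia) figures from the list; treating 1 TB as 1,024 GB is an assumption for the sketch.

```python
# Sketch of tiered Data Transfer Out pricing, US East (N. Virginia):
# first 10 TB at $0.090/GB, next 40 TB at $0.085/GB, next 100 TB at
# $0.070/GB. ASSUMPTION: 1 TB = 1,024 GB for this example.
TIERS_GB = [(10_240, 0.090), (40_960, 0.085), (102_400, 0.070)]

def transfer_out_cost(gb: float) -> float:
    cost, remaining = 0.0, gb
    for tier_size, rate in TIERS_GB:
        chunk = min(remaining, tier_size)  # fill this tier first
        cost += chunk * rate
        remaining -= chunk
        if remaining <= 0:
            break
    return round(cost, 2)

print(transfer_out_cost(5_120))   # 5 TB, all within the first tier → 460.8
```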
For traffic sent between Amazon CloudSearch and Amazon EC2 instances in the same region, you are
only charged for the Data Transfer in and out of the Amazon EC2 instances, and standard Amazon EC2
Regional Data Transfer charges apply. For additional information, please visit the official website of
AWS and check the EC2 pricing.
You can always see the resources you’re consuming in Amazon CloudSearch via the Account Activity
page on the AWS website, the AWS Management Console, CloudSearch command line tools, or
CloudSearch APIs.
Amazon EMR
Amazon Elastic MapReduce (EMR) is an Amazon Web Services (AWS) tool for big
data processing and analysis. Amazon EMR offers an expandable, low-configuration
service as an easier alternative to running in-house cluster
computing.
Amazon EMR is based on Apache Hadoop, a Java-based programming framework that supports
the processing of large data sets in a distributed computing environment. MapReduce is a
software framework that allows developers to write programs that process massive amounts of
unstructured data in parallel across a distributed cluster of processors or stand-alone computers.
It was developed at Google for indexing web pages and replaced their original indexing algorithms
and heuristics in 2004.
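The map/reduce model described above can be illustrated with a toy word count. This is a single-process sketch of the programming model only; a real EMR/Hadoop job distributes the map and reduce steps across a cluster.

```python
from collections import defaultdict
from itertools import chain

# Minimal illustration of the MapReduce model: the map step emits
# (word, 1) pairs and the reduce step sums the counts per word.
def map_phase(line: str):
    return [(word.lower(), 1) for word in line.split()]

def reduce_phase(pairs):
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

lines = ["big data on EMR", "big clusters process big data"]
counts = reduce_phase(chain.from_iterable(map_phase(l) for l in lines))
print(counts["big"], counts["data"])  # → 3 2
```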
Amazon EMR processes big data across a Hadoop cluster of virtual servers on
Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
The elastic in EMR's name refers to its dynamic resizing ability, which allows it to
ramp up or reduce resource use depending on the demand at any given time.
Amazon EMR is used for data analysis in log analysis, web indexing, data
warehousing, machine learning, financial analysis, scientific simulation,
bioinformatics and more. EMR also supports workloads based on Apache Spark,
Presto and Apache HBase -- the latter of which integrates with Hive and Pig for
additional functionality.
SearchBlox for Amazon Elasticsearch Service is an enterprise search platform for the AWS Cloud that
uses the Amazon Elasticsearch Service, the fully managed and scalable Elasticsearch service available
on Amazon Web Services (AWS). SearchBlox for Amazon Elasticsearch Service can crawl, index and
search content across multiple datasources including file systems, websites, databases and applications.
Architecture
This service consists of two types of SearchBlox servers that are available through the AWS
marketplace. The first is SearchBlox IndexServer. The SearchBlox IndexServer can crawl and index
content in over 40 document formats including PDFs, HTML and Microsoft Word, Excel, Powerpoint
directly into Amazon Elasticsearch Service. The second type of server is the SearchBlox
SearchServer. The SearchBlox SearchServer provides ready-to-use, fully customizable search front-
ends including faceted search for the indexes created by the SearchBlox IndexServer in the Amazon
Elasticsearch Service.
Setup
AWS Region
Please make sure to select the same AWS Region in all the steps mentioned below. For example, we
have chosen "us-east-1" for creating Elasticsearch, the SearchBlox IndexServer, the SearchBlox
SearchServer, etc.
1. Create VPC
Create a VPC, which needs to be mentioned while creating a SearchBlox IndexServer at AWS
Marketplace.
2. Create KeyPair
Create a Key Pair, and store it safely to access your AWS instance.
SSH
Use the key pair to SSH to the AWS instance. If you are using Windows, use puttygen to convert the
pem file to ppk file. Use this ppk file to connect to the instance using putty.
130
4. Create AWS Elasticsearch Domain
1. Give the domain name and select Elasticsearch version 5.1.
Elasticsearch Version
SearchBlox currently supports only Elasticsearch 5.1 on Amazon Elasticsearch Service.
2. Give the number of instances (between 1 and 20) and select the instance type as
c4.xlarge.elasticsearch.
4. You can specify the start hour at which AWS takes a snapshot of the cluster. Specify the
time in UTC in the field.
5. You can restrict access to and from specific servers, i.e., the index and search servers, by giving
the private IPs of those servers. Select Allow access to the domain from the specific IP(s).
6. Specify the comma-separated IPs.
8. After configuring and connecting the SearchBlox IndexServer (see the next section), you can:
View cluster health
View the status of indices
View the mappings of fields within the indices
Monitor the status of the Elasticsearch service
5. Start SearchBlox IndexServer via Amazon Marketplace.
Go to the AWS Marketplace: https://aws.amazon.com/marketplace.
Search for SearchBlox and select IndexServer. For cluster setup, create SearchBlox SearchServer after
creating SearchBlox IndexServer.
Check and click continue, which will take you to the page below:
Select the VPC created in earlier step.
Select the Key Pair created earlier and launch the instance.
Go to EC2 Dashboard.
Integrate with IAM Role
This is an important step where we integrate IAM role with SearchBlox IndexServer.
Right-click the Server Instance, then go to Instance Settings -> Attach/Replace IAM Role.
SSH into SearchBlox IndexServer
SSH into the SearchBlox IndexServer instance using the user ec2-user and the pem or ppk file.
Change user to jetty.
Shell

sudo su - jetty
Edit /srv/jetty/sb/webapps/searchblox/WEB-INF/elasticsearch.yml to update the properties for
AWS ES domain as follows:
YAML

searchblox.aws.region: us-east-1
searchblox.aws.url: https://search-XXXXXX.us-east-1.es.amazonaws.com
The aws.region is the region selected while creating SearchBlox IndexServer and the Elasticsearch
instance, which will also be available in the AWS URL in Elasticsearch. The aws.url is the endpoint
specified in the Elasticsearch instance.
Restart SearchBlox as follows:
Shell

service jetty restart
Increase the RAM memory for SearchBlox in AWS.
First log in as the jetty user using the following command:
sudo su - jetty
Then edit the /etc/default/jetty file and set the memory parameters in JAVA_OPTIONS. The content of
the jetty file is given below; here, 12G means 12 GB of memory has been allocated to SearchBlox.
Text

JAVA_OPTIONS="-server -Xms12G -Xmx12G -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 -Djetty.http.host=0.0.0.0"
JETTY_HOME=/srv/jetty
JETTY_RUN=/srv/jetty/run
JETTY_USER=jetty
TMPDIR=/srv/jetty/temp
JETTY_BASE=/srv/jetty/sb
Kibana and Amazon Elasticsearch Service
Indexed data, as well as logs, are stored in the Elasticsearch domain. To view the logs, you can
map the Elasticsearch index named sbindexlog in Kibana and search for the entries.
The Kibana link is available in the domain dashboard. Refer to the screenshot below:
Click the link and access Kibana.
Adding log indexes in Kibana.
The two logs that can be added in Kibana are sbindexlog and sbstatuslog. You can add both logs
in one index pattern.
Alternatively, you can create a separate index pattern for each log.
You can also query the logs based on URL, timestamp, etc.
It is also possible to delete indexes via Kibana. Go to Dev Tools in the left-hand menu. To
delete the Elasticsearch indices, click Get to Work.
Amazon Kinesis
Amazon Kinesis is an Amazon Web Service (AWS) for processing big data in real
time.
Kinesis is capable of processing hundreds of terabytes per hour from high volumes
of streaming data from sources such as operating logs, financial transactions and
social media feeds. According to Amazon, Kinesis fills a gap left by Hadoop and
other technologies that process data in batches, but that don't enable real-time
operational decisions about constantly streaming data. That capability, in turn,
simplifies the process of writing apps that rely on data that must be processed in
real time.
Amazon Kinesis integrates with Amazon Redshift, Amazon Dynamo Database and
Amazon Simple Storage Service (Amazon S3), as well as with many third-party
products. Customers are billed on the standard AWS pay-as-you-go plan, with
payments based on the amount of data processed and the way in which the
information is packaged.
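As a minimal, hedged sketch of producing to a stream (the stream and field names here are hypothetical; put_record is the real boto3 call), a producer pushes a data blob plus a partition key that determines the target shard:

```python
import json

def make_record(event: dict, key_field: str) -> dict:
    """Shape one event into the arguments that put_record() expects:
    a bytes payload and a partition key taken from the event itself."""
    return {"Data": json.dumps(event).encode("utf-8"),
            "PartitionKey": str(event[key_field])}

def send(stream_name: str, record: dict):
    import boto3  # requires AWS credentials; not executed here
    return boto3.client("kinesis").put_record(StreamName=stream_name, **record)

rec = make_record({"user": "alice", "amount": 42}, key_field="user")
print(rec["PartitionKey"])  # alice
```

Records with the same partition key land on the same shard, which is what preserves per-key ordering in the stream.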
When Should I Use Amazon Aurora and When Should I use
RDS MySQL?
Now that Database-as-a-Service (DBaaS) is in high demand, there is one question regarding AWS
services that cannot always be answered easily: when should I use Aurora and when RDS MySQL?
DBaaS cloud services allow users to use databases without configuring physical hardware and
infrastructure, and without installing software. I’m not sure if there is a straightforward answer, but
when trying to find out which solution best fits an organization there are multiple factors that should be
taken into consideration. These may be performance, high availability, operational cost, management,
capacity planning, scalability, security, monitoring, etc.
There are also cases where although the workload and operational needs seem to best fit to one
solution, there are other limiting factors which may be blockers (or at least need special handling).
In this blog post, I will try to provide some general rules of thumb but let’s first try to give a short
description of these products.
What we should really compare is the MySQL and Aurora database engines provided by Amazon RDS.
Amazon Relational Database Service (Amazon RDS) is a hosted database service which provides
multiple database products to choose from, including Aurora, PostgreSQL, MySQL, MariaDB, Oracle,
and Microsoft SQL Server. We will focus on MySQL and Aurora.
With regards to systems administration, both solutions are time-saving. You get an environment ready
to deploy your application and if there are no dedicated DBAs, RDS gives you great flexibility for
operations like upgrades or backups. For both products, Amazon applies required updates and the latest
patches without any downtime. You can define maintenance windows and automated patching (if
enabled) will occur within them. Data is continuously backed up to S3 in real time, with no
performance impact. This eliminates the need for backup windows and other scripted procedures,
complex or otherwise. Although this sounds great, the risk of vendor lock-in and the challenges of
enforced updates and client-side optimizations are still there.
Amazon Aurora is a relational, proprietary, closed-source database engine, with all that that implies.
RDS MySQL is 5.5, 5.6 and 5.7 compatible and offers the option to select among minor releases.
While RDS MySQL supports multiple storage engines with varying capabilities, not all of them are
optimized for crash recovery and data durability. Until recently, a limitation was that Aurora was only
compatible with MySQL 5.6, but it is now compatible with both 5.6 and 5.7.
So, in most cases, no significant application changes are required for either product. Keep in mind that
certain MySQL features like the MyISAM storage engine are not available with Amazon
Aurora. Migration to RDS can be performed using Percona XtraBackup.
For RDS products, shell access to the underlying operating system is disabled, and access to MySQL
user accounts with the “SUPER” privilege isn’t allowed. To configure MySQL variables or manage
users, Amazon RDS provides specific parameter groups, APIs and other special system procedures
which can be used. If you need to enable remote access, this article will help you do
so: https://www.percona.com/blog/2018/05/08/how-to-enable-amazon-rds-remote-access/
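A hedged sketch of the parameter-group route (the group and parameter names are examples only; modify_db_parameter_group is the real boto3 call) — changing a server variable that SET GLOBAL would normally handle:

```python
def build_parameter_change(name, value, apply_method="immediate"):
    """Shape one entry for the Parameters list of modify_db_parameter_group()."""
    return {"ParameterName": name,
            "ParameterValue": str(value),
            "ApplyMethod": apply_method}  # "immediate" or "pending-reboot"

def apply_changes(group_name, changes):
    import boto3  # requires AWS credentials; not executed here
    rds = boto3.client("rds")
    return rds.modify_db_parameter_group(
        DBParameterGroupName=group_name, Parameters=changes)

print(build_parameter_change("max_connections", 500))
```

Note that dynamic parameters take effect immediately, while static ones always require a reboot regardless of ApplyMethod.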
Performance considerations
Although Amazon RDS uses SSDs to achieve better IO throughput for all its database services,
Amazon claims that Aurora can achieve a 5x performance boost over standard MySQL and
provides reliability out of the box. In general, Aurora seems to be faster, but not always.
For example, due to the need to disable the InnoDB change buffer for Aurora (this is one of the keys
for the distributed storage engine), and that updates to secondary indexes must be write through, there
is a big performance penalty in workloads where heavy writes that update secondary indexes are
performed. This is because of the way MySQL relies on the change buffer to defer and merge
secondary index updates. If your application performs a high rate of updates against tables with
secondary indexes, Aurora performance may be poor. In any case, you should always keep in mind that
performance depends on schema design. Before taking the decision to migrate, performance should be
evaluated against an application specific workload. Doing extensive benchmarks will be the subject of
a future blog post.
Capacity Planning
Talking about underlying storage, another important thing to take into consideration is that with Aurora
there is no need for capacity planning. Aurora storage will automatically grow, from the minimum of
10 GB up to 64 TiB, in 10 GB increments, with no impact on database performance. The table size
limit is only constrained by the size of the Aurora cluster volume, which has a maximum of 64
tebibytes (TiB). As a result, the maximum table size for a table in an Aurora database is 64 TiB. For
RDS MySQL, the maximum provisioned storage limit constrains the size of a table to a maximum size
of 16 TB when using InnoDB file-per-table tablespaces.
Replication
Replication is a really powerful feature of MySQL (like) products. With Aurora, you can provision up
to fifteen replicas compared to just five in RDS MySQL. All Aurora replicas share the same underlying
volume with the primary instance and this means that replication can be performed in milliseconds as
updates made by the primary instance are instantly available to all Aurora replicas. Failover is
automatic with no data loss on Amazon Aurora whereas the replicas failover priority can be set.
An explanatory description of Amazon Aurora’s architecture can be found in Vadim’s post written a
couple of years ago https://www.percona.com/blog/2015/11/16/amazon-aurora-looking-deeper/
The architecture used and the way that replication works on both products show a really significant
difference between them. Aurora is a High Availability (HA) solution where you only need to attach a
reader, and this automatically becomes Multi-AZ available. Aurora replicates data to six storage nodes
in Multi-AZs to withstand the loss of an entire AZ (Availability Zone) or two storage nodes without
any availability impact to the client’s applications.
On the other hand, RDS MySQL allows only up to five replicas, and the replication process is slower
than Aurora's. Failover is a manual process and may result in last-minute data loss. RDS for MySQL is
not natively an HA solution, so you have to mark the master as Multi-AZ and attach the endpoints.
Monitoring
Both products can be monitored with a variety of monitoring tools. You can enable automated
monitoring and you can define the log types to publish to Amazon CloudWatch. Percona Monitoring
and Management (PMM) can also be used to gather metrics.
Be aware that for Aurora there is a limitation on T2 instances: the Performance Schema, if enabled,
can cause the host to run out of memory.
Costs
Aurora instances will cost you ~20% more than RDS MySQL. If you create Aurora read replicas then
the cost of your Aurora cluster will double. Aurora is only available on certain RDS instance sizes.
Instance pricing details can be found here and here.
Storage pricing may be a bit tricky. Keep in mind that pricing for Aurora differs from that for RDS
MySQL. For RDS MySQL you have to select the type and size of the EBS volume, and you have to be
sure that provisioned EBS IOPS can be supported by your instance type, as EBS IOPS are restricted by
the instance type's capabilities. Unless you watch for this, you may end up having EBS IOPS that
cannot really be used by your instance.
For Aurora, IOPS are only limited by the instance type. This means that if you want to increase IOPS
performance on Aurora, you should proceed with an instance type upgrade. In any case, Amazon will
charge you based on the dataset size and the requests per second.
That said, although for Aurora you pay only for the data you really use, in 10GB increments, if you want
high performance you have to select the correct instance. For Aurora, regardless of the instance type,
you get billed $0.10 per GB-month and $0.20 per 1 million requests, so if you need high performance
the cost may be even more than RDS MySQL. For RDS MySQL, storage costs are based on the EBS
type and size.
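The quoted rates make the arithmetic easy to sketch (a simplified model that ignores instance, backup and data-transfer charges):

```python
def aurora_storage_cost(data_gb: float, requests_millions: float) -> float:
    """Monthly Aurora storage + I/O cost at the rates quoted above:
    $0.10 per GB-month and $0.20 per 1 million requests."""
    return 0.10 * data_gb + 0.20 * requests_millions

# e.g. 500 GB of data and 2,000 million I/O requests in a month
print(aurora_storage_cost(500, 2000))  # 450.0 (USD)
```

This illustrates the point in the text: the storage bill is small, but a request-heavy workload can dominate the cost.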
Percona provides support for RDS services and you might be interested in these case studies:
Lookout Uses Percona’s Cloud Expertise to Reduce Footprint and Maintain Uptime
Madwire Achieves Performance Assurance for Amazon RDS Aurora Through Percona’s
Database Audit and Consultancy Services
When a more fully customized solution is required, most of our customers usually prefer the use of
AWS EC2 instances supported by our managed services offering.
TL;DR
- If you are looking for a native HA solution, then you should use Aurora.
- Aurora performance is great, but not as high as expected for write-intensive workloads when
  secondary indexes exist. In any case, you should benchmark both RDS MySQL and Aurora
  before taking the decision to migrate. Performance depends greatly on workload and schema
  design.
- By choosing Amazon Aurora you are fully dependent on Amazon for bug fixes or upgrades.
- If you need to use MySQL plugins, you should use RDS MySQL.
- Aurora only supports InnoDB. If you need other engines, i.e. MyISAM, RDS MySQL is the
  only option.
- Aurora is not included in the AWS free tier and costs a bit more than RDS MySQL. If you only
  need a managed solution to deploy services in a less expensive way, and out-of-the-box
  availability is not your main concern, RDS MySQL is what you need.
- If for any reason Performance Schema must be ON, you should not enable it on Amazon
  Aurora MySQL T2 instances. With the Performance Schema enabled, the T2 instance may run
  out of memory.
- For both products, you should carefully examine the known issues and limitations listed here
  https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MySQL.KnownIssuesAndLimitations.html
  and here https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.AuroraMySQL.html
Drawbacks and Alternatives to DynamoDB
A head-to-head database battle between Scylla and DynamoDB is a real David versus Goliath situation.
It’s Rocky Balboa versus Apollo Creed. Is it possible Scylla could deliver an unexpected knockout
punch against DynamoDB?
To be clear, Scylla is not a competitor to AWS at all. Many of our customers deploy Scylla to AWS, we
ourselves find it to be an outstanding platform, and on more than one occasion we’ve blogged about its
unique bare metal instances. Here’s further validation — our Scylla Cloud service runs on top of AWS.
But we do think we might know a bit more about building a real-time big data database, so we limited
the scope of this competitive challenge solely to Scylla versus DynamoDB, database-to-database.
Scylla is a drop-in replacement for Cassandra, implemented from scratch in C++. Cassandra itself was
a reimplementation of concepts from the Dynamo paper. So, in a way, Scylla is the “granddaughter” of
Dynamo. That means this is a family fight, where a younger generation rises to challenge an older one.
It was inevitable for us to compare ourselves against our “grandfather,” and perfectly in keeping with
the traditions of Greek mythology behind our name.
If you compare Scylla and Dynamo, each has pros and cons, but they share a common class of NoSQL
database: Column family with wide rows and tunable consistency. Dynamo and its Google counterpart,
Bigtable, were the first movers in this market and opened up the field of massively scalable services —
very impressive by all means.
Scylla is a much younger opponent, just 4.5 years in age. Though Scylla is modeled on Cassandra,
Cassandra was never our end goal, only a starting point. While we stand on the shoulders of giants in
terms of existing design, our proven systems programming abilities have come heavily into play and led
to performance at the level of a million operations per second per server. We recently announced feature
parity (minus transactions) with Cassandra, and also our own database-as-a-service offering, Scylla
Cloud.
But for now we’ll focus on the question of the day: Can we take on DynamoDB?
Rules of the Game
With our open source roots, our culture forces us to be as fair as possible. So we picked a reasonable
benchmark scenario that’s supposed to mimic the requirements of a real application and we will judge
the two databases from the user perspective. For the benchmark we used Yahoo! Cloud Serving
Benchmark (YCSB) since it’s a cross-platform tool and an industry standard. The goal was to meet a
Service Level Agreement of 120K operations per second with a 50:50 read/write split (YCSB’s
workload A) with a latency under 10ms at the 99th percentile. Each database would provision the
minimal amount of resources/money to meet this goal. Each DB should be populated first with 1 billion
rows using the default, 10 column schema of YCSB.
We conducted our tests using Amazon DynamoDB and Amazon Web Services EC2 instances as
loaders. Scylla also used Amazon Web Services EC2 instances for servers, monitoring tools and the
loaders.
These tests were conducted on Scylla Open Source 2.1, which is the code base for Scylla Enterprise
2018.1. Thus performance results for these tests will hold true across both Open Source and Enterprise.
However, we use Scylla Enterprise for comparing Total Cost of Ownership.
DynamoDB is known to be tricky when the data distribution isn’t uniform, so we selected uniform
distribution to test Dynamo within its sweet spot. We set up 3 nodes of i3.8xl for Scylla, with replication
of 3 and quorum consistency level, loaded the 1 TB dataset (replicated 3 times), and after 2.5 hours the
load was over, waiting for the test to begin.
Scylla Enterprise:
  Scylla cluster: 3-node cluster on a single DC | RF=3
  Instances: i3.8xlarge | 32 vCPU | 244 GiB | 4 x 1.9TB NVMe
  Dataset: ~1.1TB (1B partitions / size: ~1.1Kb)
  Total used storage: ~3.3TB

Amazon DynamoDB:
  Provisioned capacity: 160K write | 80K read (strong consistency)
  Dataset: ~1.1TB (1B partitions / size: ~1.1Kb)
  Storage size: ~1.1TB (DynamoDB table metrics)
Workload-A: 90 min, using 8 YCSB clients, every client runs on its own data range (125M
partitions)
Loaders: 4 x m4.2xlarge (8 vCPU | 32 GiB RAM), 2 loaders per machine
Scylla workloads runs with Consistency Level = QUORUM for writes and reads.
Scylla starts with a cold cache in all workloads.
DynamoDB workloads ran with dynamodb.consistentReads = true
Sadly for DynamoDB, each item weighed 1.1KB (the YCSB default schema), thus each write
resulted in two accesses
Turns out the population stage is hard on DynamoDB. We had to slow down the population rate time
and again, despite it being well within the reserved IOPS. Sometimes we managed to populate up to 0.5
billion rows before we started to receive the errors again.
Each time we had to start over to make sure the entire dataset was saved. We believe DynamoDB needs
to split its 10GB partitions during population and cannot do so in parallel with additional load
without errors. The gory details:
Started population with Provisioned capacity: 180K WR | 120K RD.
We hit errors on ~50% of the YCSB threads causing them to die when using ≥50% of
write provisioned capacity.
For example, it happened when we ran with the following throughputs:
55 threads per YCSB client = ~140K throughput (78% used capacity)
45 threads per YCSB client = ~130K throughput (72% used capacity)
35 threads per YCSB client = ~96K throughput (54% used capacity)
After multiple attempts with various provisioned capacities and throughputs, eventually a
streaming rate was found that permitted a complete database population. Here are the results of
the population stage:
Population workload (100% write), 8 YCSB clients:

Scylla Open Source 2.1 (3x i3.8xlarge):
  Overall throughput (ops/sec): 104K
  Avg load (scylla-server): ~85%

DynamoDB (160K WR | 80K RD):
  Overall throughput (ops/sec): 51.7K
  Max consumed capacity: WR 75%
Scylla, on the other hand, easily met the throughput SLA at only 58% load, with latency 3x-4x better
than DynamoDB and well below our requested SLA. (Also, what you don't see here is the
huge cost difference, but we'll get to that in a bit.)
We won’t let DynamoDB off easy, however. Now that we’ve seen how DynamoDB performs with its
ideal uniform distribution, let’s have a look at how it behaves with a real life use-case.
With that in mind, let’s see how ScyllaDB and DynamoDB behave given a Zipfian distribution access
pattern. We went back to the test case of 1 billion keys spanning 1TB of pre-replicated dataset and
queried it again using YCSB Zipfian accesses. It is possible to define the hot set of partitions in terms
of volume (how much data is in it) and to define the percentage of accesses that go to this hot set
out of the overall 1TB set.
We set a variety of parameters for the hot set and the results were pretty consistent – DynamoDB could
not meet the SLA for Zipfian distribution. It performed well below its reserved capacity — only 42%
utilization — but it could not execute 120k OPS. In fact, it could do only 65k OPS. The YCSB client
experienced multiple, recurring ProvisionedThroughputExceededException (code: 400)
errors, and throttling was imposed by DynamoDB.
Workload A: 50% read / 50% write | Duration: 90 min | Distribution: Zipfian
Range: 1B partitions | Hot set: 10K partitions | Hot set access: 90%
8 YCSB clients

Scylla 2.1 (3x i3.8xlarge):
  Overall throughput (ops/sec): 120.2K
  Avg load (scylla-server): ~55%
  READ operations (avg): ~40.56M | Avg 95th percentile latency (ms): 6.1 | Avg 99th: 8.6
  UPDATE operations (avg): ~40.56M | Avg 95th percentile latency (ms): 4.4 | Avg 99th: 6.6

DynamoDB (160K WR | 80K RD):
  Overall throughput (ops/sec): 65K
  Consumed capacity: ~WR 42% | RD 42%
  READ operations (avg): ~21.95M | Avg 95th percentile latency (ms): 6.0 | Avg 99th: 9.2
  UPDATE operations (avg): ~21.95M | Avg 95th percentile latency (ms): 7.3 | Avg 99th: 10.8
Why can’t DynamoDB meet the SLA in this case? The answer lies within the Dynamo model. The
global reservation is divided across multiple partitions, each no more than 10GB in size.
Thus, when such a partition is accessed more often, it may reach its throttling cap even though overall
you’re well within your global reservation. In the example above, when reserving 200 writes/s, each of
the 10 partitions cannot be queried at more than 20 writes/s.
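The arithmetic behind this throttling can be sketched simply (a simplified model; real partition splits are more dynamic):

```python
def per_partition_cap(provisioned_per_sec: float, partitions: int) -> float:
    """DynamoDB divides the table's provisioned throughput across its
    partitions, so each one gets only a fraction of the global reservation."""
    return provisioned_per_sec / partitions

print(per_partition_cap(200, 10))  # 20.0 writes/s per partition
```

A hot partition hits that 20 writes/s cap and throttles, even while the table as a whole uses a fraction of its 200 reserved writes.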
But what happens when a single partition in your database becomes extremely hot? To explore this, we
tested single hot partition access and compared the results.
We ran a single YCSB, working on a single partition on a 110MB dataset (100K partitions). During our
tests, we observed a DynamoDB limitation when a specific partition key exceeded 3000 read capacity
units (RCU) and/or 1000 write capacity units (WCU).
Even when using only ~0.6% of the provisioned capacity (857 OPS), the YCSB client
experienced ProvisionedThroughputExceededException (code: 400) errors, and throttling was imposed
by DynamoDB (see screenshots below).
This is not to say you shouldn't plan for the best data model. However, there will always be cases
when your plan is far from reality. In the Scylla case, a single partition still performed reasonably well:
20,200 OPS with good 99th-percentile latency.
Screenshot 1: Single partition. Consumed capacity: ~0.6% -> Throttling imposed by DynamoDB
Additional Factors
Cross-region Replication and Global Tables
We compared the replication speed between datacenters and a simple comparison showed that
DynamoDB replicated in 370ms on average to a remote DC while Scylla’s average was 82ms. Since
the DynamoDB cross-region replication is built on its streaming API, we believe that when congestion
happens, the gap will grow much further into a multi-second gap, though we haven’t yet tested it.
Beyond replication propagation, there is a more burning functional difference — Scylla can easily add
regions on demand at any point in the process with a single command:
ALTER KEYSPACE mykeyspace WITH replication = { 'class' :
'NetworkTopologyStrategy', '<existing_dc>' : 3, '<new_dc>' : 4 };
In DynamoDB, on the other hand, you must define your global tables ahead of time. This imposes a
serious usability issue, and a major cost one, as you may need to grow the number of deployed
datacenters over time.
Explicit Caching is Expensive and Bad for You
DynamoDB performance can improve and its high cost can be reduced in some cases when using
DAX. However, Scylla has a much smarter and more efficient embedded cache (the database nodes
have memory, don’t they?) and the outcome is far better for various reasons we described in a
recent blog post.
Freedom
This is another major advantage of Scylla: DynamoDB locks you into the AWS cloud, significantly
decreasing your chances of ever moving out. Data gravity is significant. No wonder they’re going after
Oracle!
Scylla is an open source database. You have the freedom to choose between our community version, an
Enterprise version and our new fully managed service. Scylla runs on all major cloud providers and
opens the opportunity for you to run some datacenters on one provider and others on another provider
within the same cluster. One of our telco customers is a great example of the hybrid model — they
chose to run some of their datacenters on-premise and some on AWS.
Our approach to “locking in” users is quite different — we do it solely by means of delivering
quality and value, such that you won’t want to move away from us. As of today, we have experienced
exactly zero customer churn.
No Limits
DynamoDB imposes various limits on the size of each cell — only 400KB. In Scylla you can effectively
store megabytes. One of our customers built a distributed storage system using Scylla, keeping large
blobs in Scylla with single-digit millisecond latency for them too.
Another problematic limit is the amount of data per sort key: DynamoDB cannot hold more than 10GB
of items under a single partition key. While this isn’t a recommended pattern in Scylla either, we have
customers who keep 130GB items in a
single partition. The effect of these higher limits is more freedom in data modeling and fewer reasons
to worry.
ElastiCache versus self-hosted Redis on EC2
Often, there comes a time when you have to choose between managed services versus self-hosted
services, especially in the cloud world. Both of these services have their own sets of pros and cons, but
each provides an added advantage, provided you know your use case well. This holds
true for ElastiCache versus self-hosted Redis on EC2.
This post compares practical and impractical pointers around these two services, so you can choose the
right service for your use case.
Deep Diving into the Practicalities of ElastiCache and Self-hosted Redis on EC2
ElastiCache: Supports Fully Managed Redis and Memcached
ElastiCache seamlessly deploys, runs, and scales Redis as well as Memcached in-memory data stores.
It automatically performs management tasks like software patching, setup, configuration, hardware
provisioning, failure recovery, backups etc. There’s no risk of losing workloads, as it continuously
monitors clusters. This makes it ideal for building data-intensive apps for media sharing, social
networking, gaming, Ad-Tech, finance, healthcare, IoT, etc.
ElastiCache: Scales Automatically According to Requirements
One of the most adored features of ElastiCache is its scalability. It can scale out, scale in, and
scale up as per application demands. In addition, write and memory scaling is supported with sharding,
while replicas provide read scaling.
ElastiCache: Instances with More Than One vCPU Cannot Utilize All the Cores
Redis uses a single thread of execution for reads and writes: only one thread/process handles
reads and writes in the database. This ensures that no deadlocks occur due to multiple threads
writing or reading information to disk. This is an extremely powerful feature of Redis in
terms of performance, as it removes the need to manage locks and latches. However, this one thread
can use only one core, so a single vCPU does all the work and you do not have the freedom to use
multiple CPUs. Consequently, ElastiCache instances with more than 1 vCPU waste the extra vCPUs.
To provide better visibility into CPU utilization, Amazon introduced the ‘EngineCPUUtilization’
metric in April 2018.
Self Hosted Redis on EC2: Allows You to Update to the Latest Version ASAP
One of the major advantages of using a self hosted Redis cluster is that you can always stay updated
with the most recent version. You can utilize the best features of the software even before the rest of
the world actually starts using it.
Self Hosted Redis on EC2: Provides the Freedom to Modify Configurations
Self hosted Redis on EC2 provides the freedom to understand its underlying functionality and modify
the configuration as per your requirements. For example, to modify the Redis configuration to
continually take snapshots, you can set:
save 900 1
save 300 10
save 60 10000
Other configurations like “stop-writes-on-bgsave-error” and “maxmemory” are also very useful config
changes. If you are looking for more tweaking details, check the links below:
https://scaleyourcode.com/blog/article/25
https://scaleyourcode.com/blog/article/15
https://dzone.com/articles/redis-performance-benchmarks
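As an illustration, those two directives might look like this in redis.conf (the values are placeholders, not recommendations):

```
# Keep accepting writes even if a background save (RDB snapshot) fails
stop-writes-on-bgsave-error no
# Cap memory use and evict least-recently-used keys once the cap is hit
maxmemory 2gb
maxmemory-policy allkeys-lru
```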
Self Hosted Redis on EC2: Unavailability of Pertinent Metrics Makes Maintenance Tedious
Even though Redis on EC2 provides the freedom to maneuver in terms of configuration, it is difficult to
maintain. Monitoring the metrics is not easy. You either need to use a third party tool, like
AppDynamics, or call APIs manually to monitor the metrics from Redis. You can automate
scaling/updating/upgrading/security patches etc. using tools like Ansible, Chef, or Puppet. This is cost-
effective but effort intensive.
Self Hosted Redis on EC2: Instance Limitations
Amazon recommends that users use only HVM-based EC2 instances. Only a handful of PV-based
instances are available, due to latency issues.
Amazon Elastic Compute Cloud (EC2) is a web service from Amazon that provides resizable
compute capacity in the cloud.
What is an Instance?
An instance is a virtual server for running applications on Amazon’s EC2. It can also be
understood as a tiny part of a larger computer, a part which has its own hard drive, network
connection, OS, etc. — but it is actually all virtual. You can have multiple “tiny” computers on a
single physical machine, and all these tiny machines are called instances.
What is the difference between a service and an instance?
Let’s understand it this way:
EC2 is a service along with other Amazon Web Services like S3 etc.
When we use EC2 or any other service, we use it through an instance, e.g. a
t2.micro instance in EC2.
Why not buy your own stack of servers and work independently? Suppose you are a developer
and, wanting to work independently, you buy some servers; you estimated the correct capacity,
and the computing power is enough. Now you have to look after the updating of security patches
every day, you have to troubleshoot any problem which might occur at the back-end level in the
servers, and so on. These are all extra chores that you will be doing, or maybe you will hire
someone else to do them for you.
But if you buy an EC2 instance, you don’t have to worry about any of these things, as it will all
be managed by Amazon; you just have to focus on your application. That too, at a fraction of the
cost you were incurring earlier! Isn’t that interesting?
Hardware depreciates over a period, while processor types and speeds keep improving. So
eventually, moving to the Cloud is all the more advisable.
For scaling up we have to add more servers, and if your application is new and
you experience a sudden traffic spike, scaling up that quickly might become a
problem.
These are just a few problems and there are many others scenarios which make the
case for EC2 stronger!
Compute Instances
c4
c3
Memory Instances
r3
x1
Storage Instances
i2
d2
GPU Instances
g2
Now let’s understand the kind of work that each instance is optimized for, in this AWS
EC2 Tutorial:
Burstable Performance Instances
T2 instances are burstable instances, meaning the CPU performs at a baseline, say 20% of its
capability. When your application needs more than 20% of the CPU's performance, the CPU
enters a burst mode, giving higher performance for a limited amount of time, so work happens
faster.
- These bursts happen at a cost: every time the CPU bursts, CPU credits are used.
- You earn these credits when your CPU is idle.
- Each CPU credit gives the CPU a burst of 1 minute.
- If your CPU credits are not used, they are credited to your account and stay there for 24 hours.
- Based on your credit balance, you can decide whether the t2 instance should be scaled up or down.
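The credit arithmetic above can be sketched roughly (the earn rate is an assumption for illustration; actual rates vary by T2 size):

```python
def credit_balance(earn_rate_per_hour: float, idle_hours: float,
                   burst_minutes: float) -> float:
    """Estimate remaining CPU credits: credits accrue while the CPU sits at
    or below baseline, and one credit buys one minute of full burst."""
    earned = earn_rate_per_hour * idle_hours
    spent = burst_minutes  # 1 credit per minute of burst
    return earned - spent

# e.g. an instance earning 6 credits/hour, idle 5 hours, then bursting 12 min
print(credit_balance(6, 5, 12))  # 18.0 credits remaining
```

Watching this balance over time is exactly how you would decide whether the t2 instance should be scaled up or down.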
EBS-optimized Instances
C4, M4, and D2 instances are EBS-optimized by default. EBS means Elastic Block
Store, a storage option provided by AWS in which the IOPS* rate is
quite high. Therefore, when an EBS volume is attached to an optimized instance,
single-digit millisecond latencies can be achieved.
*IOPS (Input/Output Operations Per Second, pronounced eye-ops) is a performance
measurement used to characterize computer storage devices.
Cluster Networking Instances
X1, M4, C4, C3, I2, G2 and D2 instances support cluster networking. Instances
launched into a common placement group are put in a logical group that
provides high-bandwidth, low-latency networking between all the instances in the group.
- A placement group is basically a logical cluster where select EC2
  instances which are a part of that group can utilize up to 10Gbps for
  single-flow and 20Gbps for multi-flow traffic in each direction.
- Instances which are not a part of that group are limited to 5Gbps in
  multi-flow traffic. Cluster networking is ideal for high-performance
  analytics systems.
Dedicated Instances
They are the instances that run on single-tenant hardware dedicated to a single
customer.
They are perfect for workloads where a corporate policy or industry regulation
requires that your instance should be isolated from any other customer’s
instance, therefore they go for their own separate machines, and their instances
are isolated at the hardware level.
Let’s understand this through an example. Suppose in our company Edureka, we have the
following tasks:
Analysis of customer’s data
Customer’s website activity, etc. should all be monitored in real-time.
There will be times when the traffic on the website is minimal, so a very
powerful processor should not be used, since it would be expensive for the
company and would not be needed every hour of the day. Hence, for this task we
might take t2 instances, because they give burstable CPU performance, i.e.
when the traffic increases, the CPU performance is increased accordingly to
meet the requirements.
Our auto-response emailing system
It should be quick, therefore we would require systems, where the
response time is as short as possible. This could be achieved by using EBS
optimized instances, as they offer high IOPS and hence, low latencies.
The search engine on our website
It should be able to sort the keywords and return relevant results,
therefore we might have 2 servers for this: one for the database and the
other for processing the keywords. The communication between these
servers should therefore be at the maximum possible rate. To achieve
this, we can put them in a placement group, and for that we have to use
Cluster Networking Instances.
Some processes in every organisation are highly confidential
Because these processes give us an edge over other companies, no matter
how secure the servers may be, some policies still demand isolation.
Therefore, we might use Dedicated Instances for these kinds of processes.
Now that we know about instances, let’s learn how to launch them.
Elastic Block Store (EBS) provides persistent block-level storage volumes which are used
with EC2. Each volume acts as a hard drive.
But why do we need EBS with EC2?
Just like your computer needs a hard drive, AWS EC2 needs a storage volume
to store the OS that your instance will run. The options
for EBS are:
Provisioned IOPS: This category is for mission-critical workloads; it
provides high IOPS rates.
General Purpose: For workloads that need a balance of performance and cost.
Magnetic: For data that is accessed less frequently and where longer retrieval
times are acceptable.
After selecting a suitable option in EBS, we give the instance a name and then we
create a security group.
A security group acts as a firewall to control inbound and outbound traffic. Each
security group has rules according to which the traffic is governed.
Each instance can be assigned up to 5 security groups.
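To make the firewall behavior concrete, here is a toy model of inbound rule evaluation. This is not the EC2 API; the rule format is invented for illustration:

```python
import ipaddress

# Toy model of how a security group admits inbound traffic: a packet is
# allowed only if some rule matches its protocol, port and source CIDR.
def allowed(rules, protocol, port, source_ip):
    ip = ipaddress.ip_address(source_ip)
    for rule in rules:
        if (rule["protocol"] == protocol
                and rule["from_port"] <= port <= rule["to_port"]
                and ip in ipaddress.ip_network(rule["cidr"])):
            return True
    return False  # security groups deny anything not explicitly allowed

# Hypothetical rule set: allow SSH only from one office network.
ssh_from_office = [{"protocol": "tcp", "from_port": 22, "to_port": 22,
                    "cidr": "203.0.113.0/24"}]
print(allowed(ssh_from_office, "tcp", 22, "203.0.113.10"))  # True
print(allowed(ssh_from_office, "tcp", 80, "203.0.113.10"))  # False
```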
Finally, in the last step, the console shows all the settings that you have made;
you can verify them and launch the instance.
Auto Scaling
Auto Scaling is an AWS EC2 service that automatically launches or
terminates EC2 instances based on user-defined policies, schedules and health checks.
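A minimal sketch of the kind of policy decision Auto Scaling makes, assuming a simple CPU-threshold policy. The thresholds, step size, and bounds are made up for illustration:

```python
# Toy auto-scaling decision: scale out when average CPU is above a high
# threshold, scale in below a low one, clamped within min/max bounds.
def desired_capacity(current, avg_cpu, low=30.0, high=70.0,
                     minimum=1, maximum=10):
    if avg_cpu > high:
        current += 1          # launch one more instance
    elif avg_cpu < low:
        current -= 1          # terminate one instance
    return max(minimum, min(maximum, current))

print(desired_capacity(3, 85.0))  # 4  (scale out under load)
print(desired_capacity(3, 10.0))  # 2  (scale in when idle)
```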
Spot Instances
Spot Instances is a pricing option which enables you to bid on unused EC2 instances.
The hourly price for a Spot Instance is set by AWS EC2, and it fluctuates according to
the availability of the instances in a specific Availability zone.
Basically, you set a maximum hourly price for an instance, above which you do not
wish to be charged.
The moment the Spot price for that instance rises above the maximum you have set,
the instance is shut down automatically.
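The Spot rule above boils down to a simple comparison; the prices in this sketch are hypothetical:

```python
# Sketch of the Spot pricing rule: the instance runs only while the
# fluctuating Spot price stays at or below your maximum price.
def instance_running(spot_price_per_hour, max_price_per_hour):
    return spot_price_per_hour <= max_price_per_hour

history = [0.031, 0.035, 0.052, 0.040]   # hypothetical hourly Spot prices
bid = 0.045
print([instance_running(p, bid) for p in history])
# [True, True, False, True] — the instance is interrupted in hour 3
```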
On Demand Instances are used when you want to pay for the hour, with no long term
commitments and upfront payments. They are useful for applications that may have
unpredictable workloads or for test applications that are being deployed for the first
time.
Reserved Instances provide you with significant discounts as compared to On Demand
Instances. With Reserved Instances you reserve instances for a specific period of time
with three payment options:
No Upfront
Partial Upfront
Full Upfront
And two term lengths:
One Year Term
Three Year Term
The higher the upfront payment, the more money you save.
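A quick arithmetic sketch of that trade-off, using entirely hypothetical prices:

```python
# Hypothetical numbers, just to illustrate the trade-off: the larger the
# upfront payment, the lower the total cost over the term.
HOURS_PER_YEAR = 8760

def total_cost(upfront, hourly_rate, years=1):
    return upfront + hourly_rate * HOURS_PER_YEAR * years

no_upfront      = total_cost(0,   0.060)   # pay everything hourly
partial_upfront = total_cost(250, 0.028)   # some upfront, lower hourly rate
full_upfront    = total_cost(480, 0.0)     # one payment, nothing hourly

print(no_upfront > partial_upfront > full_upfront)  # True
```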
Select your preferred Region. Select a region from the drop down, the selection
of the region can be done on the basis of the criteria discussed earlier in the blog.
Select EC2 Service Click EC2 under Compute section. This will take you to EC2
dashboard.
Click Launch Instance.
Select an AMI: because you require a Linux instance, in the row for the basic
64-bit Ubuntu AMI, click Select.
Choose an Instance
Select t2.micro instance, which is free tier eligible.
Add Storage
Tag an Instance
Type a name for your AWS EC2 instance in the value box. This name, more correctly
known as a tag, will appear in the console when the instance launches. It makes it easy
to keep track of running machines in a complex environment. Use a name that you can
easily recognize and remember.
Create a Security Group
Create a Key Pair & launch an Instance
Next in this AWS EC2 Tutorial, select the option ‘Create a new key pair’ and give a
name of a key pair. After that, download it in your system and save it for future use.
Check the details of a launched instance.
PuTTY does not natively support the private key format (.pem) generated by Amazon
EC2. PuTTY has a tool called PuTTYgen, which can convert keys to the required PuTTY
format (.ppk). You must convert your private key into this format (.ppk) before
attempting to connect to your instance using PuTTY.
Click Load. By default, PuTTYgen displays only files with the extension .ppk. To
locate your .pem file, select the option to display files of all types.
Select your .pem file for the key pair that you specified when you launched your
instance, and then click Open. Click OK to dismiss the confirmation dialog box.
Click Save private key to save the key in the format that PuTTY can use.
PuTTYgen displays a warning about saving the key without a passphrase. Click
Yes.
Specify the same name for the key that you used for the key pair (for example,
my-key-pair). PuTTY automatically adds the .ppk file extension.
Connect to EC2 instance using SSH and PuTTY
Open PuTTY.exe
In the Host Name box, enter Public IP of your instance.
In the Category list, expand SSH.
Click Auth (don’t expand it).
In the Private Key file for authentication box, browse to the PPK file that you
downloaded and double-click it.
Click Open.
Congratulations! You have launched an Ubuntu Instance successfully.
This is a fair question. Blockchain fulfills a need for distributed, immutable ledgers.
The technology brings trust to environments where falsification and counterfeiting
are especially dangerous. That’s why it was initially used for virtual money. The real
fear of counterfeiting in digital currency led to the development of a system where
change was difficult and trust built in.
Supply chains, however, are mostly closed. Partners, suppliers, transporters and
consumers are all generally known. The problem still exists of an unscrupulous
member of the supply chain substituting bad products for good, along with the need to
be sure where and when products are, which is why the ledger is important. The
distributed part is not nearly as important because the participants are known.
Amazon QLDB brings the immutable and verifiable aspects of blockchain to a
centralized database. It is an append-only database like blockchain but held
centrally like a traditional database.
Amazon QLDB will enable many of the same applications as blockchain but with
less effort. For example, financial clearinghouses could use this technology instead
of blockchain because there is a central authority. Peer-to-peer payments is an
example of the type of financial application that could benefit from this technology.
The same holds true for tracking products from manufacturer to consumer.
Ultimately, it is in the retailers’ best interest to know where a product came from
and who it was sold to with confidence. The same could be said of tracking patient
histories in healthcare organizations.
None of these applications need to implement a distributed ledger since they are
closed systems. They do need to establish trust amongst participants. This is what
makes Amazon QLDB such a great idea. It allows companies to ingrain
trustworthiness in their applications without having to manage all the blockchain
overhead and developmental complexity.
Blockchain is a great technology but is too much for many applications. Amazon
QLDB provides just what the developer needs for certain classes of applications.
Amazon S3
Amazon Simple Storage Service (Amazon S3) is a scalable, high-speed, web-
based cloud storage service designed for online backup and archiving of data
and applications on Amazon Web Services. Amazon S3 was designed with a
minimal feature set and created to make web-scale computing easier for
developers.
Amazon S3 is an object storage service, which differs from block and file cloud storage.
Each object is stored as a file with its metadata included and is given an ID number.
Applications use this ID number to access an object. Unlike file and block cloud storage, a
developer can access an object via a REST API.
The S3 cloud storage service gives a subscriber access to the same systems that Amazon
uses to run its own websites. S3 enables customers to upload, store and download
practically any file or object that is up to five terabytes (TB) in size, with the largest single
upload capped at five gigabytes (GB).
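Those size limits imply a simple decision rule for uploads; the sketch below is illustrative (in practice, AWS recommends multipart upload for large objects):

```python
# Sketch of the size rules quoted above: objects up to 5 TB, but any single
# upload (one PUT) is capped at 5 GB, so bigger objects need multipart upload.
GB = 1024 ** 3
TB = 1024 ** 4

def upload_strategy(size_bytes):
    if size_bytes > 5 * TB:
        raise ValueError("object exceeds the 5 TB S3 object limit")
    return "single PUT" if size_bytes <= 5 * GB else "multipart upload"

print(upload_strategy(100 * 1024 * 1024))  # single PUT
print(upload_strategy(50 * GB))            # multipart upload
```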
Amazon S3 features
S3 provides 99.999999999% durability for objects stored in the service and supports
multiple security and compliance certifications. An administrator can also link S3 to other
AWS security and monitoring services, including CloudTrail, CloudWatch and Macie.
There's also an extensive partner network of vendors that link their services directly to S3.
Data can be transferred to S3 over the public internet via access to S3 APIs. There's also
Amazon S3 Transfer Acceleration for faster movement over long distances, as well as AWS
Direct Connect for a private, consistent connection between S3 and an enterprise's own data
center. An administrator can also use AWS Snowball, a physical transfer device, to ship
large amounts of data from an enterprise data center directly to AWS, which will then
upload it to S3.
In addition, users can integrate other AWS services with S3. For example, an analyst can
query data directly on S3 either with Amazon Athena for ad hoc queries or with Amazon
Redshift Spectrum for more complex analyses.
Amazon S3 comes in three storage classes: S3 Standard, S3 Infrequent Access and Amazon
Glacier. S3 Standard is suitable for frequently accessed data that needs to be delivered with
low latency and high throughput. S3 Standard targets applications, dynamic websites,
content distribution and big data workloads.
S3 Infrequent Access offers a lower storage price for data that's needed less often, but that
must be quickly accessible. This tier can be used for backups, disaster recovery and long-
term data storage.
Amazon Glacier is the least expensive storage option in S3, but it is strictly designed for
archival storage because it takes longer to access the data. Glacier offers variable retrieval
rates that range from minutes to hours.
A user can also implement lifecycle management policies to curate data and move it to the
most appropriate tier over time.
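A lifecycle policy of that kind boils down to mapping object age to a storage class. The age thresholds below are made-up examples, not AWS defaults:

```python
# Toy lifecycle policy matching the three classes described above: keep
# fresh data in S3 Standard, demote cooler data, archive the rest.
def storage_class(age_days):
    if age_days < 30:
        return "S3 Standard"           # hot, frequently accessed
    if age_days < 365:
        return "S3 Infrequent Access"  # cooler, but must stay quick to fetch
    return "Amazon Glacier"            # archival, slow retrieval is fine

print([storage_class(d) for d in (1, 90, 400)])
# ['S3 Standard', 'S3 Infrequent Access', 'Amazon Glacier']
```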
Amazon does not impose a limit on the number of items that a subscriber can store;
however, there are Amazon S3 bucket limitations. An Amazon S3 bucket exists within a
particular region of the cloud. An AWS customer can use an Amazon S3 API to upload
objects to a particular bucket. Customers can configure and manage S3 buckets.
User data is stored on redundant servers in multiple data centers. S3 uses a simple web-
based interface -- the Amazon S3 console -- and encryption for user authentication.
S3 buckets are kept private by default, but an admin can choose to make them
publicly accessible. A user can also encrypt data prior to storage. Rights may be
specified for individual users, who will then need approved AWS credentials to
download or access a file in S3.
When a user stores data in S3, Amazon tracks the usage for billing purposes, but it
does not otherwise access the data unless required to do so by law.
Comparing AWS vs Azure vs Google Cloud Platforms For
Enterprise App Development
Enterprise companies around the world have made the switch from self-hosted infrastructure
to public cloud configurations. While most enterprises will always need some on-premise
technology, they are developing their applications directly in the cloud. This allows the
development teams to stay product focused, rather than having to work on the infrastructure
to support the application. By moving to the cloud, enterprises have an existing physical
infrastructure that is continuously maintained and updated. This gives them more resources
and time to dedicate to the mobile app development project at hand.
Currently, there are three main cloud platform providers that take up the majority of market
share. They are Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform
(GCP). While Azure and GCP are growing consistently, AWS remains the clear leader in
market share. Each platform has its own features and pricing that could match your mobile
application development requirements. Keep reading to see how each platform compares against
each other.
Amazon Web Services
Features
The Amazon cloud platform offers almost every feature in the cloud computing industry.
Its cloud services allow you to gain easy access to computing power, data storage or other
functionality necessary for app developers. AWS has many products that fall under many
categories. In addition to the features mentioned above, they offer developer tools,
management tools, mobile services and applications services. As you can imagine, the
application services combined with the computing and database infrastructure are critical
components to a successful enterprise mobile app development team.
Pricing
In addition to a wide range of services, the AWS cloud has adjusted the pricing of cloud
computing since inception in 2006. Their prices are very competitive with all of the other
cloud providers. The pricing for their cloud services has continued to decrease due to
competition and pricing structures. AWS offers free tiers of service for startups and
individuals. It’s an easy way to try before you buy. Moreover, development teams can
purchase servers by the second, rather than by the hour. Depending on what services the team
uses, you can certainly find a reasonable AWS price structure that is lower than the cost of all
that infrastructure investment.
You can calculate your pricing here:
General pricing
Free tier
Pricing calculator
Total Cost of Ownership [TCO]
Advantages
On top of that, the Amazon Web Services cloud platform offers developers over 15 years of
enterprise infrastructure. Since the admin teams at AWS continuously work to improve the
platform, your development team can benefit from their experience. When it comes to
management capabilities and skills, AWS has some of the best talent in the market. Of course,
you would want to choose a platform that has plenty of experience to build on.
Microsoft Azure
Features
Similar to AWS cloud services, Azure offers a full variety of solutions for app developer needs.
The platform gives you the ability to deploy and manage virtual machines at scale. You can
process and compute at whatever capacity you need within just minutes. Moreover, if your
custom software needs to run large-scale parallel batch computing, it can handle it too. This is
actually a feature unique to AWS and Azure over the Google Cloud Platform. The
all-encompassing Azure features integrate into your existing systems and processes, offering
more power and capacity for enterprise development.
Pricing
When considering Azure pricing, you have to keep in mind that the costs will depend on the
types of products the development team needs. The hourly server cost can range from $0.099
per hour to $0.149 per hour. Of course, if you measure the costs by just per instance, the
prices might not seem consistent. However, the prices are pretty comparable to AWS when
you factor in the price per GB of RAM. As the main enterprise cloud service providers
compete for your business, the prices remain competitive across the board.
You can calculate your pricing here:
General pricing
Free tier
Pricing calculator
Total Cost of Ownership
Advantages
In addition to the full set of features and customizable pricing, the Azure platform is one of
the fastest cloud solutions available. If you are looking for a solution that excels in speed of
deployment, operation or scalability, then you might want to choose the Azure platform. They
are the leader in speed when it comes to cloud computing solutions.
Google Cloud Platform
Features
Once again, the Google Cloud Platform has a myriad of services for developers. As an
enterprise mobile app development team, you might be interested in the App Engine product.
This allows an app developer to create applications without dealing with the server. It’s a fully
managed solution for developing applications in an agile manner. Furthermore, you can
perform high level computing, storage, networking and databases with GCP. These are all
great products to use depending on the type of app development you are working on.
Although Google has a few less services than the competitors, you can find all the
requirements for mobile application development projects.
Pricing
Where GCP may fall behind in additional features, it makes up for in cost efficiency. The
platform also has pay as you go pricing, billing to the “per second” of usage. Setting GCP
apart, it offers discounts for long term usage that starts after the first month. This is great if
you need to start a new mobile app development project and want to keep costs low. By
contrast, it could take over a year to get long term discounts on the other cloud service
providers. Clearly, Google is putting pressure on the competing cloud providers to keep
market prices lower.
You can calculate your pricing here:
General pricing
Free tier
Pricing calculator
Total Cost of Ownership
Advantages
As GCP continues to grow in the cloud industry, they offer another level of security. Since
Google is no stranger to enterprise level security, you can rely on their secure solutions. They
have over 500 employees that are dedicated to security protection. You will get data
encryption, multiple layers of authentication and third party validations. For developers who
need an extra buffer of security, the Google Cloud might be the best platform for you.
When comparing AWS vs Azure vs Google Cloud, you have many features and costs to
consider. Rather than trying to pick one solution, use enterprise cloud services that fit your
development needs. This can be a single cloud provider. Or, you can combine services from
two or three of these providers. Since the costs are relatively comparable, find the right mix of
solutions to fit your enterprise development requirements.
Cloud IAM:
What is Cloud IAM? In short, it refers to the ability to manage user identities and their access to IT
resources from the cloud. Why should cloud IAM be a priority? To answer that question, let’s take a
look at the evolution of traditional identity and access management (IAM) solutions and compare them
to cloud alternatives.
Evolution of Identity and Access Management
IAM solutions have been a foundational component of IT infrastructure for many years now. In fact,
the modern era of IAM dates back to 1993, when Tim Howes and his colleagues at the University of
Michigan introduced the Lightweight Directory Access Protocol (LDAP). LDAP was designed as a
lightweight replacement to the Directory Access Protocol (DAP), which was a component of the
forerunner directory services standard known as X.500. LDAP worked so well that LDAPv3 would
become the internet standard for directory services in 1997, and directly influenced two powerful IAM
platforms: OpenLDAP™ and Microsoft® Active Directory® (AD).
Today, we know that Active Directory has been far more dominant than OpenLDAP in the IAM
market. Of course, this is primarily because Microsoft Windows® was effectively the only major
enterprise operating system in use in the late 1990s, when both AD and OpenLDAP were introduced.
At the time, it was common for all of the systems, applications, files, and networks in an enterprise IT
environment to be Windows-based, which gave AD a built-in advantage. In most cases, IT simply
implemented AD, and they could basically manage all of the users and IT resources in their
environment.
The IT landscape started to change when a wide variety of non-Windows resources were introduced in
the mid-2000s. This included Mac® systems, web applications like Google Apps (aka G Suite™),
Linux® servers at AWS®, Samba file servers and NAS appliances, and a lot more. Even the network
itself switched from a wired connection to WiFi. All of these changes and more have rendered legacy
solutions like AD (and OpenLDAP) far less effective in the modern enterprise. As a result, IT
administrators are now looking to cloud IAM solutions as possible alternatives.
Why Cloud IAM?
The advantages of
cloud IAM platforms are easy to recognize. For example, while legacy IAM solutions such as AD were
primarily focused on one platform (i.e., Windows), cloud IAM platforms such as JumpCloud®
Directory-as-a-Service® support all three major platforms (Windows, Mac, Linux). In fact, the
JumpCloud platform in particular can securely manage and connect users to virtually any IT resource –
regardless of their platform, provider, protocol, or location. More specifically, that includes systems,
applications, files, and networks, which can all be managed from a single cloud-based directory
services platform that doesn’t require anything on-prem. As a result, IT admins can enjoy a centralized
identity and access management experience delivered as a cloud-based service that spans the breadth of
their IT network.
Google is launching Cloud Pub/Sub today, its backend messaging service that makes it
easier for developers to pass messages between machines and to
gather data from smart devices. It’s basically a scalable messaging
middleware service in the cloud that allows developers to quickly
pass information between applications, no matter where they’re
hosted. Snapchat is already using it for its Discover feature and
Google itself is using it in applications like its Cloud Monitoring
service.
Pub/Sub was in alpha for quite a while. Google first (quietly) introduced it at its
I/O developer conference last year but never made a big deal about the service.
Until now, the service was in private alpha; starting today, all developers
can use it.
Using the Pub/Sub API, developers can create up to 10,000 topics (that’s the
entity the application sends its messages to) and send up to 10,000 messages
per second. Google says notifications should go out in under a second “even
when tested at over 1 million messages per second.”
The typical use cases for this service, Google says, include
balancing workloads in network clusters, implementing
asynchronous workflows, logging to multiple systems, and data
streaming from various devices.
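The pattern itself is easy to model in a few lines; this in-memory toy is illustration only, not the Cloud Pub/Sub client API:

```python
from collections import defaultdict

# Minimal in-memory model of publish/subscribe messaging middleware:
# publishers send to a topic, the broker fans the message out to every
# subscriber of that topic.
class Broker:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self.subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self.subscribers[topic]:  # fan out
            callback(message)

broker = Broker()
received = []
broker.subscribe("logs", received.append)
broker.publish("logs", "disk full on host-7")
print(received)  # ['disk full on host-7']
```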
During the beta period, the service is available for free. Once it
comes out of beta, developers will have to pay $0.40 per million for
the first 100 million API calls each month. Users who need to send
more messages will pay $0.25 per million for the next 2.4 billion
operations (that’s about 1,000 messages per second) and $0.05 per
million for messages above that.
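Those tiers translate into a straightforward tiered bill; a sketch based on the prices quoted above:

```python
# Tiered bill for the prices quoted above: $0.40/M for the first 100M calls,
# $0.25/M for the next 2.4B, $0.05/M beyond that.
def monthly_bill(api_calls):
    tiers = [(100_000_000, 0.40), (2_400_000_000, 0.25), (float("inf"), 0.05)]
    bill, remaining = 0.0, api_calls
    for tier_size, price_per_million in tiers:
        in_tier = min(remaining, tier_size)
        bill += in_tier / 1_000_000 * price_per_million
        remaining -= in_tier
        if remaining <= 0:
            break
    return bill

print(monthly_bill(100_000_000))  # 40.0  (first tier only)
print(monthly_bill(500_000_000))  # 140.0 (40.0 + 400M at $0.25/M)
```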
Now that Pub/Sub has hit beta — and Google even announced the
pricing for the final release — chances are we will see a full launch
around Google I/O this summer.
Image credits: Carlos Luna / Flickr, under a CC BY 2.0 license.
Google App Engine is a Platform as a Service (PaaS) product that provides Web
app developers and enterprises with access to Google's scalable hosting and tier 1
Internet service.
The App Engine requires that apps be written in Java or Python, store data in Google BigTable
and use the Google query language. Non-compliant applications require modification to use App
Engine.
Google App Engine provides more infrastructure than other scalable hosting
services such as Amazon Elastic Compute Cloud (EC2). The App Engine also
eliminates some system administration and developmental tasks to make it easier
to write scalable applications.
The Fundamentals of Google Compute Engine (GCE)
Google Compute Engine (GCE) is part of Google’s Infrastructure-as-a-Service (IaaS) offering, where you can build
high-performance, fault-tolerant, massively scalable compute nodes to handle your application’s needs. Virtual
Machine instances are provisioned in GCE and can be pre-packed or fully customized. In this article, we’ll cover
some of the fundamentals of VM Instances within GCE.
Machine Types
Machine types are templates of virtualized hardware that will be available to the VM instance. These resources
include the CPU, Memory, Disk capabilities, and so on.
Predefined machine types are managed by Google, and are categorized by 4 types:
Standard machine type
Ideal for typical balanced instances with respect to RAM and CPU
Have 3.75GB of RAM per virtual CPU
High-memory machine types
Ideal for applications that require more memory
Have 6.5GB of RAM per virtual CPU
Shared-core machine types
These machines have one virtual CPU on a single hyper-thread of a single host CPU that is running
the instance. Ideal for non-resource intensive applications.
Very cost effective
Large machine types
Ideal for resource-intensive workloads
Up to 1TB of memory
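The RAM-per-vCPU ratios quoted above make sizing easy to compute; a small sketch:

```python
# RAM-per-vCPU ratios quoted above for the predefined families.
RAM_PER_VCPU_GB = {"standard": 3.75, "high-memory": 6.5}

def instance_ram_gb(family, vcpus):
    """Total RAM for a predefined machine type of the given family."""
    return RAM_PER_VCPU_GB[family] * vcpus

print(instance_ram_gb("standard", 4))     # 15.0
print(instance_ram_gb("high-memory", 4))  # 26.0
```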
Disks
After choosing a machine type which covers CPU and Memory, it’s time to choose a disk option. You have a few
options when choosing a disk type for your VM instance. The disk you choose will be your single root disk in which
your image is loaded during the boot process. Do you choose a persistent disk or a local disk?
Persistent Disks
Persistent disks are network-based “disks” abstracted to appear as a block device. Data is durable, meaning the data
will remain as you left it after reboots and shutdowns. Available as either a standard hard disk drive or as a solid state
drive (SSD), persistent disks are located independently of the VM instances, which means they can be detached and
reattached to other instances. You have the option to keep your disk when deleting your instance, or to have it
deleted along with the instance.
Standard persistent disks
Ideal for efficient and reliable block storage
Max 64TB per instance
Only available within a single zone
SSD persistent disks
Ideal for fast and reliable block storage
Max 64TB per instance
Only available within a single zone
Other key features:
Redundancy is built-in, protecting your data from unforeseen failures.
GCE automatically encrypts all data on the persistent disk, protecting integrity with cipher keys
You can resize disks and migrate instances with zero downtime
Disks scale in performance as capacity increases
Supports snapshots
Local SSD
Local SSD disks are physically attached to VM instances. These will offer the highest possible IOPS and are used for
seriously intensive workloads.
Local SSD
Ideal for very high-performance local block storage
Available as SCSI or NVMe
Max 3TB (which is a total of eight 375GB disks)
Available only to a single Instance, meaning it cannot be reattached elsewhere
Data is persistent only if you do not stop or terminate your instance.
Does not support snapshots
More often than not, you’ll be choosing a Standard persistent disk for your VMs. So, what’s next?
Images
Images contain a bootloader, Operating System, file system structure, and any software customizations needed for
your deployment. The image describes what actually gets loaded onto the root disk. Tons of public images are
available from Google and other authorized third-party vendors. Google Compute Engine (GCE) uses the selected
image to create a persistent boot disk for each instance.
Some public images include:
CentOS
Container-Optimized OS from Google
CoreOS
Debian
RHEL
SLES
Ubuntu
FreeBSD
Windows Server 2008, 2012, 2016
SQL Server on Windows Server
Alternatively, you can create your own custom images! Custom images can be created from a VM in your
environment that has all the necessary settings and additional software configured. You can even import custom
images from on-prem environments, or another cloud provider such as AWS.
Zones
You actually choose the zone in the very beginning of creating a new instance. We talk more about Regions and Zones
in another article, but what is important to know here is that the CPU architectures differ between zones, which
could be a deciding factor in where you run your applications. The processor families include Broadwell, Sandy
Bridge, Skylake, Ivy Bridge, and Haswell. If you have a requirement for a specific processor, just check the
documentation to validate zone availability. Otherwise, just pick the zone that is closest to you or your customers.
What if I choose a zone and want it changed afterward?
You can absolutely move VM instances to another zone, but it will require a short outage. To do this manually, you’ll
snapshot the disk on the instance you wish to move. Next, create a new disk in the desired zone from the snapshot.
Create a new VM instance in the new zone and attach the new disk. Update any IPs and references for a clean
migration.
Alternatively, you can do this automatically with the gcloud compute instances move command.
A couple of easy checkboxes can get your VM setup with the proper firewall rules for HTTP and HTTPS traffic as
well.
Availability Policy
There are three categories under the Availability Policy when creating a new VM — Preemptibility, Automatic
Restart, and On host maintenance.
Preemptibility
Off by default, a preemptible VM is an affordable, short-lived instance ideal for batch jobs or fault-tolerant workloads.
They’re up to 80% cheaper than regular instances, so if your application can handle the random termination of VMs
at any time, then give this a look. Some common applications that use preemptible VMs are modeling or simulations,
rendering, media transcoding, big data, continuous integration, and web crawling.
Automatic Restart
On by default, this feature of GCE can automatically restart VM instances if they are terminated due to a non-
preemption, non-user-initiated reason. Some examples are due to maintenance events or hardware/software failures.
On host maintenance
On by default, this feature of GCE can automatically migrate your VM instances to other hardware during
infrastructure maintenance periods to ensure your VMs operate with no downtime. This is called Live Migration and
is a key differentiator of GCP. Alternatively, you can set this to turn off the VM.
Other Options
There are various other options you can choose from, such as automation with startup scripts, metadata tags, SSH key,
and so on, but these are beyond the scope of fundamentals.
Accessing the VM
After creating the VM, you’ll certainly want to access it somehow – but how? Well, if your VM is running Linux, you
can access it via the console through SSH, from another VM running CloudShell via the Cloud SDK, or from your
computer via SSH. If your VM is running Windows, you can use an RDP client or Powershell terminal.
Pricing
Lastly, let’s look at some GCE fundamentals when it comes to pricing. Google handles charges and discounts
differently than AWS and Azure. With GCP you’re always billed for the first 10 minutes and then for each minute
afterward for the life of the machine, rounded up to the nearest minute. The console will summarize this into costs per
month so you don’t have to do the per-minute math.
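That billing rule can be sketched in one line, assuming the 10-minute minimum and per-minute round-up described above:

```python
import math

# Billing rule described above: a 10-minute minimum, then per-minute,
# with runtime rounded up to the nearest minute.
def billed_minutes(runtime_minutes):
    return max(10, math.ceil(runtime_minutes))

print(billed_minutes(3.0))   # 10  (the minimum applies)
print(billed_minutes(42.2))  # 43  (rounded up to the next minute)
```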
Discounts are extended for sustained use, meaning if you keep instances running for a significant portion of the
month, a discount will automatically be applied.
Another neat discount is what Google refers to as Inferred instance discounts. If you have multiple VMs of the same
machine type in the same zone, they are combined as if they were a single machine, giving you the maximum
available discount.
A sustained usage discount applies to custom machine types as well. Google Compute Engine will perform
calculations to match the best qualifying discount for usage in the month. In the example given in their
documentation, two custom machine type instances are split to provide discounts on vCPU and Memory when
stretched across the whole month instead of just 15 days.
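One way to picture a sustained-use schedule: the discount grows with the fraction of the month an instance runs. The tier percentages below are invented for illustration and are not Google's published rates:

```python
# Hypothetical sustained-use schedule: the longer an instance runs within
# the month, the larger the discount on its usage. Tiers are illustrative.
def sustained_use_discount(fraction_of_month):
    if fraction_of_month >= 0.75:
        return 0.30
    if fraction_of_month >= 0.50:
        return 0.20
    if fraction_of_month >= 0.25:
        return 0.10
    return 0.0

print(sustained_use_discount(1.0))  # 0.3  (ran the whole month)
print(sustained_use_discount(0.4))  # 0.1
```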
Google Cloud Dataproc — Google’s managed Hadoop, Spark, and Flink offering. In what seems
to be a fully commoditized market at first glance, Dataproc manages to create significant
differentiated value that promises to transform how folks think about their Hadoop workloads.
Jobs-first Hadoop+Spark, not Clusters-first
The typical mode of operation of Hadoop — on premises or in the cloud — requires you to deploy a cluster,
and then you proceed to fill up said cluster with jobs, be it MapReduce jobs, Hive queries,
SparkSQL, etc. Pretty straightforward stuff.
The standard way of running Hadoop and Spark.
Services like Amazon EMR go a step further and let you run ephemeral clusters, enabled by
separation of storage and compute through EMRFS and S3. This means that you can discard
your cluster while keeping state on S3 after the workload is completed.
Google Cloud Platform has two critical differentiating characteristics:
Per-minute billing (Azure has this as well)
Very fast VM boot up times
When your clusters start in well under 90 seconds (under 60 seconds is not unusual), and
when you do not have to worry about wasting that hard-earned cash on your cloud provider’s
pricing inefficiencies, you can flip this cluster->jobs equation on its head. You start with a
job, and you acquire a cluster as a step in job execution.
If you have a MapReduce job, as long as you’re okay with paying the 60 second initial boot-up
tax, rather than submitting the job to an already-deployed cluster, you submit the job to
Dataproc, which creates a cluster on your behalf on-demand. A cluster is now a means to an
end for job execution.
Demonstration of my exquisite art skills, plus illustration of the jobs before clusters concept realized
with Dataproc.
Again, this is only possible with Google Dataproc, only because of:
high granularity of billing (per-minute)
very low tax on initial boot-up times
separation of storage and compute (and ditching HDFS as primary store).
Operational and economic benefits are obvious and easily realized:
Resource segregation through tenancy segregation avoids non-obvious bottlenecks and
resource contention between jobs.
Simplicity of management — no need to actually manage the cluster or resource
allocation and priorities through things like YARN resource manager. Your
dev/stage/prod workloads are now intrinsically separate — and what a pain that is to
resolve and manage elsewhere!
Simplicity of pricing — no need to worry about rounding up to nearest hour.
Simplicity of cluster sizing — to get the job done faster, simply ask Dataproc to deploy
more resources for the job. When you pay per-minute, you can start thinking in terms
of VM-minutes.
Simplicity of troubleshooting — resources are isolated, so you can’t blame your problems
on other tenants.
I’m sure I’m forgetting others. Feel free to leave a comment here to add color. Best response
gets a collectors’ edition Google Cloud Android figurine!
Dataproc is as close as you can get to serverless and cloud-native pay-per-job with VM-based
architectures — across the entire cloud space. There’s nothing even close to it in that regard.
Dataproc does have a 10-minute minimum for pricing. Add the sub-90 second cluster creation
timer, and you rule out many relatively lightweight ad-hoc workloads. In other words, this
works for big serious batch jobs, not ad-hoc SQL queries that you want to run in under 10
seconds. I write on this topic here. (Do let us know if you have a compelling use case that
leaves you asking for less than a 10-minute minimum.)
The rest of the Dataproc goodies
Google Cloud doesn’t stop there. There are a few other benefits of Dataproc that truly make your
life easier and your pockets fuller:
Custom VMs — if you know the typical resource utilization profile of your job in terms
of CPU/RAM, you can tailor-make your own instances with that CPU/RAM profile.
This is really really cool, you guys.
Preemptible VMs — I wrote on this topic recently. Google’s alternative to Spot
instances is just great. Flat 80% off, and Dataproc is smart enough to repair your jobs
in case instances go away. I beat this topic to death in the blog post, and in my biased
opinion it’s worth a read on its own.
Best pricing in town. Google Compute Engine is the industry price leader for
comparably-sized VMs. In some cases, up to 40% less than EC2.
Gobs of ephemeral capacity — Yes, you can run your Spark jobs on thousands of
Preemptible VMs, and we won’t make you sign a big commitment, as this gentleman
found out (TL;DR: running 25,000 Preemptible VMs).
GCS is fast fast fast — When ditching HDFS in favor of object stores, what matters is
the overall pipe between storage and instances. Mr. Jornson details performance
characteristics of GCS and comparable offerings here.
Dataproc for stateful clusters
Now if you are running a stateful cluster with, say, Impala and HBase on HDFS, Dataproc is a
nice offering here too, if for some reason you don’t want to run Bigtable + BigQuery.
If you are after the biggest baddest disk performance on the market, why not go with
something that resembles RAM more than SSD in terms of performance — Google’s Local
SSD? Mr. Dinesh does a great job comparing Amazon’s and Google’s offerings here. Cliff notes
— Local SSD is really, really, really good — really.
Finally, Google’s Sustained Use Discounts automatically rewards folks who run their VMs for
longer periods of time, up to 30% off. No contracts and no commitments. And, thank
goodness, no managing your Reserved Instance bills.
You win if you use Google’s VMs for short bursts, and you win when you use Google for longer
periods.
Economics of Dataproc
We discussed how Google’s VMs are typically much cheaper through Preemptible VMs,
Custom VMs, Sustained Use Discounts, and even lower list pricing. Some folks find Google to
be 50% cheaper!
Two things that studying Economics taught me (put down your pitchforks, I also did Math) —
the difference between soft and hard sciences, and the ability to tell a story with two-
dimensional charts.
Let’s assume a worst-case scenario, in which EMR and Dataproc VM prices are equal. We get
this chart, which hopefully requires no explanation:
Which line would you rather be on?
If you believe our good friend thehftguy’s claims that Google is 50% cheaper (after things like
Preemptible VMs, Custom VMs, Sustained Use Discounts, etc), you get this compelling chart:
Same chart, but with some more aggressive assumptions.
When you’re dishing out your precious shekels to your cloud provider, think of all this extra
blue area that you’re volunteering to pay that’s entirely spurious. This is why many of
Dataproc’s customers don’t mind paying egress from their non-Google cloud vendors to GCS!
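The gap those charts illustrate can be sketched with a few lines of JavaScript. The flat $1-per-hour rate and the EMR-style hourly rounding are assumptions for illustration, matching the worst-case scenario above where VM prices are equal:

```javascript
// Sketch of the per-minute vs per-hour billing gap the chart illustrates.
const rate = 1.0; // $ per instance-hour (hypothetical, equal for both)

// Dataproc: 10-minute minimum, then per-minute billing
const perMinuteCost = m => rate * Math.max(m, 10) / 60;
// Hourly rounding, as EMR billed at the time of writing
const perHourCost = m => rate * Math.ceil(m / 60);

for (const m of [10, 61, 90, 125]) {
  console.log(`${m} min: $${perMinuteCost(m).toFixed(2)} vs $${perHourCost(m).toFixed(2)}`);
}
// 10 min:  $0.17 vs $1.00
// 61 min:  $1.02 vs $2.00
// 90 min:  $1.50 vs $2.00
// 125 min: $2.08 vs $3.00
```

The difference between the two columns is the “extra blue area” on the chart.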
Summary
Google Cloud has the advantage of a second-comer. Things are simpler, cheaper, and faster.
Lower-level services like instances (GCE) and storage (GCS) are more powerful and easier to
use. This, in turn, lets higher-level services like Dataproc be more effective:
Cheaper — per-minute billing, Custom VMs, Preemptible VMs, sustained use
discounts, and cheaper VMs list prices.
Faster — rapid cluster boot-up times, best-in-class object storage, best-in-class
networking, and RAM-like performance characteristics of Local SSDs.
Easier — lots of capacity, less fragmented instance type offerings, VPC-by-default, and
images that closely follow Apache releases.
Fundamentally, Dataproc lets you think in terms of jobs, not clusters. You start with a job, and
you get a cluster as just another step in job execution. This is a very different mode of
thinking.
If 2016 was the year of microservices, 2017 is shaping up to be the year of serverless
computing, most notably through AWS Lambda and Google Cloud Functions created through
Firebase.
Cloud Functions for Firebase were announced a month ago, bringing them into direct
competition with AWS’s offerings. This, of course, inevitably invites benchmarks and
comparisons between AWS’s and Google’s offerings. Let’s walk through the two.
Wait, what is serverless computing?
Ah, the requisite explanation.
Traditional backends have been created using monolithic servers, where a single server
may have several different responsibilities under a single codebase. Request comes in, server
executes some processing, response comes out. The same server might be responsible for
authentication, handling file uploads, and keeping track of user profiles. The key mechanic is
that if two different requests come in for two different resources, they get handled by a single
codebase. This server might run on dedicated or virtualized machinery (or several machines!),
and persistently runs over the span of days, weeks, or months.
More recently, we’ve seen the introduction of microservices as a popular architectural
decision. With a microservices approach, there are still distinct servers, but many different
servers, each of which handles a single purpose. A single service might be in charge of user
authentication, and another one may handle file uploads. Microservice architectures are
characterized by many separate codebases and incremental deployments of each individual
service. The idea here is that a service which isn’t modified often is less likely to break, along
with providing a more logical separation of responsibilities. Like monolithic deployments,
microservices are traditionally long-running processes being executed on dedicated or
virtualized machinery.
Finally, serverless architectures. Think of them as a natural evolution or extension to
microservices.
This is a microservice architecture driven to the extreme. A single chunk of code, or ‘function’
is executed anytime a distinct event occurs. This event might be a user requesting to login, or
a user attempting to upload a file. These functions are traditionally very short-running in
nature — the function ‘wakes up’, executes some amount of work with a duration of 10 milliseconds
to 10 seconds, and is then terminated automatically by the service provider. No persistence,
no dedicated machinery — in effect, you have no idea where your code is running at any given
time. Serverless architectures share some of the benefits of a microservices-based approach,
where each function has some distinct responsibility and logical separation.
The Test App
To compare the two services, I wrote a small React Native application with the intent of
providing one-time-password authentication.
Rather than expecting a user to enter a tedious email and password combination, the user is
expected to enter just their phone number. Once we have their phone number in hand, we
generate a short six-digit token and text it to the user via SMS. The user then enters the code
back into our app. If they enter the correct code, great, they are now authenticated.
Given that the code is the key authenticating factor, it’s something that clearly shouldn’t be
generated or stored directly on the user’s mobile device. Instead, we should generate and store
the code somewhere else, somewhere that the user doesn’t have any type of read access to.
Enter our serverless functions!
It’s always important to plan out the different cloud functions that will be created. In this case,
I see three clear phases of the login process where some amount of logic must be executed in a
secure environment:
1. Create a new user (sign up)
2. Generate, save, and text a new login code (sign in)
3. Verify a login code
Each function we create is assigned a unique name, usually to identify its purpose. I followed a
simple nomenclature, opting for ‘createUser’, ‘requestOneTimePassword’, and
‘verifyOneTimePassword’.
With these three functions in mind, let’s walk through the deployment process.
Function Creation — Lambda
Creation of functions with Lambda can take two forms: either directly through the Lambda
Console or through the Serverless framework. I chose to use the Serverless framework, as it
made deployment (later) much easier.
Serverless encourages centralizing all configuration of your functions into a single YML file.
The YML file requires the function name as it will be displayed on the Lambda console, the
name of the function in your code base, and some configuration on when to execute the
function. In our case, we wanted to execute the function on an incoming HTTP request with a
method of POST.
Here’s the relevant snippet of config from the YML file for creating a new user:
functions:
  userCreate:
    handler: handler.userCreate
    events:
      - http:
          path: users
          method: post
          integration: lambda-proxy
          cors: true
One of the interesting aspects of AWS Lambda is that it is truly built assuming that you’ll have
any type of event driving a function invocation, not just an incoming HTTP request issued by
a client device. Other valid triggers might be a file upload to S3, or a deploy to some other
service on AWS. Even though it’s clear to you and me that we only want to run the function
with an incoming HTTP request, we still have to be awfully explicit.
I found writing the actual function to require a little more boilerplate than I’d like:
const firebase = require('./firebase');
const helpers = require('./helpers');
const handleError = helpers.handleError;
const handleSuccess = helpers.handleSuccess;

module.exports.userCreate = (event, context) => {
  const body = JSON.parse(event.body);

  if (!body.phone) {
    return handleError(context, { error: 'Bad Input' });
  }

  firebase.auth().createUser({
    uid: body.phone
  })
    .then(user => handleSuccess(context, { uid: body.phone }))
    .catch(err => handleError(context, { error: 'Email or phone in use' }));
};
You will notice a reference to firebase in here; I am still using Firebase for user management,
even though the app is hosted on AWS infrastructure.
Yep, the request body has to be manually parsed. You’ll also notice that I made some
‘handleSuccess’ and ‘handleError’ helpers, to avoid some otherwise awful boilerplate. Here’s
‘handleSuccess’:
function handleSuccess(context, data) {
  context.succeed({
    "statusCode": 200,
    "headers": { "Content-Type": "application/json" },
    "body": JSON.stringify(data)
  });
}
Again, don’t expect Lambda to handle JSON encoding or decoding for you; this is all manual.
Function Creation — Google Cloud Functions
Project creation with Cloud Functions was noticeably easier. It’s clear that the managers around
this project assume that the most common use case is handling incoming HTTP requests, so
there wasn’t a tremendous amount of configuration to route a particular event to a particular
function.
Generation of the initial project was done by using the firebase CLI, which I hadn’t been
previously familiar with. The CLI generates an entire Firebase project, which allows hosting
important configuration like your security rules in a VCS, rather than relying entirely upon the
console rule editor.
Definition of the functions took place inside a JavaScript file, where each export is
essentially assumed to be a deployable function. For example:
exports.createUser = functions.https.onRequest(createUser);
Fans of Express JS will immediately be at home with the req, res function signature. The
request and response objects use an identical API to Express’, which makes for a
straightforward learning curve. Also notice no need for complicated boilerplate around
handling responses.
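To illustrate that signature, here is a hypothetical Express-style handler of the kind Cloud Functions deploys. The handler name and status codes are my own choices, and the minimal req/res stand-ins exist only to exercise it outside the platform:

```javascript
// Hypothetical Express-style handler. In a real project it would be wrapped as:
//   exports.createUser = functions.https.onRequest(createUser);
function createUser(req, res) {
  // req.body arrives pre-parsed -- no JSON.parse boilerplate needed
  if (!req.body.phone) {
    return res.status(422).send({ error: 'Bad Input' });
  }
  res.send({ uid: req.body.phone });
}

// Minimal stand-ins for Express's req/res, just to exercise the handler:
const req = { body: { phone: '+15551234567' } };
const res = {
  status(code) { this.code = code; return this; },
  send(data) { this.data = data; return this; },
};
createUser(req, res);
console.log(res.data); // { uid: '+15551234567' }
```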
Winner: Google Cloud Functions
Creating functions with Firebase is a clear winner. There’s less upfront configuration required,
along with a far more palatable API. Of course, the caveat is that Firebase’s amount of
configuration is smaller because there are fewer function triggers available on Firebase. No
need to specify that a function should be executed on an incoming HTTP request when there
are only six different ways of triggering them.
Deployment
Certainly not much to say here, as the deployment process is nearly identical on both
platforms. Having set up the initial project with Serverless, deployment on the AWS side was
as easy as a terminal command:
serverless deploy
In both cases, the time from initiating the deployment to seeing the function go live was about
forty seconds. Nothing to lose sleep over.
Winner: Tie
Testing — Lambda
If function creation was easier on Firebase, I can confidently say that testing your functions in
a staging environment is far easier on AWS.
For the above project, I spent around two hours from start to finish on AWS, whereas the
same exact project took around five hours, simply because of the atrocious debug cycle. It
all comes down to the presence of a simple tool on the AWS side — the beautiful blue Test
button.
Once your function has been deployed, you can create a ‘test’ event, by manually creating a
request to be sent directly to your function. In this case, I wanted to manually test the creation
of a new user by providing a unique phone number. Using one of the sample templates, I
manipulated the body of the request to include a phone number, then saved the test event.
Once your test event is created, that beautiful blue Test button will execute your function
instantaneously and immediately show output from the execution in plain text, including not
only the function’s request response, but also any log output coming from the function.
Testing — Google Cloud Functions
June 8 update: There is a testing mechanism for Cloud Functions, but it’s not (currently)
available in the Firebase console. If you access the “Cloud Console”
(https://console.cloud.google.com) you’ll see Cloud Functions there with a range of
capabilities, including quick testing. There is also a local emulator which allows you to debug
functions locally, and Cloud Platform also has a (free) Cloud Debugger which actually lets you
put a breakpoint on live code!
Original writeup: Let me be clear: manual testing of Cloud Functions is a pain, stemming
from two aspects:
1. Cloud Functions don’t have a built-in testing solution with a quick feedback
mechanism as AWS does
2. Getting logs to the Firebase console usually involves waiting for about one to five
minutes
To the first point, manual testing of Cloud Functions revolves around your favorite HTTP
request utility, be it curl or Postman. If your function fails to execute due to some hidden typo,
rest assured that you’ll get a 50x status code without much more information, rather than any
helpful debug output.
If you do want to get information out, you’ll be using Firebase’s Function console.
At the console, you’re limited to seeing only logged information, as opposed to AWS’s console
which shows both log statements and function response bodies.
But the biggest gripe I have is how long it takes to see logs appear here. With stopwatch in
hand, it would take one to five minutes of waiting to see any log information pop up from a
single request. That terrible feedback loop lead to a lot of confusion as I tried to keep the order
in which I’d execute test requests in mind. Let’s face it; when you have a long feedback loop
like that, you may immediately execute one to five manual tests, then try to decipher the
output you receive a few minutes later. Not fun.
Winner: AWS Lambda
Pricing
In general, you can count on paying for function invocations based on two metrics: the
number of invocations, and the amount of time each invocation takes to execute, modified by
the hardware that the function is executed upon.
June 8 Update: I have neglected to include Amazon’s API Gateway price, which is $3.50
per million requests and is necessary if you want to have HTTP invocation of the function.
Cloud Functions includes this for no extra charge. So the 19,193,857 requests you quoted for
AWS would actually cost ~$65, not $1, which is a pretty large difference.
Original: At the time of this writing, Cloud Functions cost $0.40 per million invocations
(after two million that are free), while Lambda clocks in at $0.20 per million invocations
(after one million that are free).
Execution environment refers to the hardware that is used to run the function. More powerful
hardware, more cost. It’s a bit of an exercise in engineering economics, however. If you’re
running a computation heavy function that takes some non-zero amount of time to execute,
you might think to use a less powerful machine, as it costs less money per millisecond of
execution time. But it’s a double-edged sword; the slower the machine, the more milliseconds
you’re spending! I’d love to do some followup work to figure out the sweet spot in machine
size for compute-heavy tasks.
Google Cloud Functions’ invocation-time pricing is a function of the CPU plus RAM size,
whereas AWS’s is a function of the RAM size only.
For example, a function that takes 500ms to execute on a machine with 256MB of memory and a
400MHz CPU would cost the following on Google:
(256MB / 1024 MB/GB) × 0.5s × $0.0000025 per GB-s
+ (400MHz / 1000 MHz/GHz) × 0.5s × $0.0000100 per GHz-s
= $0.0000003125 + $0.0000020
= $0.0000023125 per request
Or, put another way, you’d get 432,432 requests for $1 on Google, not including the free tier
or flat cost of invocation.
On AWS Lambda, a similar setup would cost
(256MB / 1024 MB/GB) × 0.5s × $0.000000417 per GB-s
= $0.0000000521 per request
Or, put another way, you’d get 19,193,857 invocations for $1, not including the free tier or flat
cost of invocation. A factor of forty-four, really? Someone check my math, please.
Winner: AWS
Conclusion
At this point, AWS Lambda is head and shoulders above Google Cloud Functions. The testing
cycle feels much tighter, and the pricing is currently no-contest. Function creation is a bit
easier with Google Cloud, but as soon as you get that boilerplate down you’re good to go.
Officially, Google Cloud Functions are still in beta, so we might see price reductions at some
point in time, or better tooling, but for now I can’t help but point friends over to AWS
Lambda.
Cloud Shell:
3. Edit your files with a GUI
Yes yes, vim and emacs and nano are great and all. But sometimes you just want a nice,
comfortable GUI to work with.
Cloud Shell ships with a customized version of the Orion editor.
While it’s not as good as VS Code or Eclipse, it’s actually a fully featured editor and I feel quite
productive with it!
4. Upload/Download files
If you have files locally that you want to upload to Cloud Shell, just click the “Upload” button in
the menu and choose your file.
To download files, run this inside Cloud Shell:
$ cloudshell dl <FILENAME>
Firebase Components:
Firebase
Firebase is a mobile and web app development platform that provides developers with a
plethora of tools and services to help them develop high-quality apps, grow their user base,
and earn more profit.
A Brief History
Back in 2011, before Firebase was Firebase, it was a startup called Envolve. As Envolve, it
provided developers with an API that enabled the integration of online chat functionality into
their website.
What’s interesting is that people used Envolve to pass application data that was more than
just chat messages. Developers were using Envolve to sync application data such as a game
state in real time across their users.
This led the founders of Envolve, James Tamplin and Andrew Lee, to separate the chat system
and the real-time architecture. In April 2012, Firebase was created as a separate company that
provided Backend-as-a-Service with real-time functionality.
After it was acquired by Google in 2014, Firebase rapidly evolved into the multifunctional
behemoth of a mobile and web platform that it is today.
Firebase Services
Firebase Services can be divided into two groups:
Develop & test your app
Realtime Database
Auth
Test Lab
Crashlytics
Cloud Functions
Firestore
Cloud Storage
Performance Monitoring
Crash Reporting
Hosting
Grow & Engage your audience
Firebase Analytics
Invites
Cloud Messaging
Predictions
AdMob
Dynamic Links
Adwords
Remote Config
App Indexing
Realtime Database
The Firebase Realtime Database is a cloud-hosted NoSQL database that lets you store and
sync data between your users in realtime.
The Realtime Database is really just one big JSON object that the developers can manage in
realtime.
Realtime Database => A Tree of Values
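As a toy picture of that tree, with invented keys and values:

```javascript
// The whole Realtime Database is conceptually one big JSON object:
const db = {
  users: {
    u1: { name: 'Ada', online: true },
    u2: { name: 'Alan', online: false },
  },
  messages: {
    m1: { from: 'u1', text: 'hello' },
  },
};

// Every piece of data is addressable by a path into the tree:
console.log(db.users.u1.name); // 'Ada'
```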
With just a single API, the Firebase database provides your app with both the current value of
the data and any updates to that data.
Realtime syncing makes it easy for your users to access their data from any device, be it web or
mobile. Realtime Database also helps your users collaborate with one another.
Another amazing benefit of Realtime Database is that it ships with mobile and web SDKs,
allowing you to build your apps without the need for servers.
When your users go offline, the Realtime Database SDKs use local cache on the device to serve
and store changes. When the device comes online, the local data is automatically
synchronized.
The Realtime Database can also integrate with Firebase Authentication to provide a simple
and intuitive authentication process.
Authentication
Firebase Authentication provides backend services, easy-to-use SDKs, and ready-made UI
libraries to authenticate users to your app.
Normally, it would take you months to set up your own authentication system. And even after
that, you would need to keep a dedicated team to maintain that system. But if you use
Firebase, you can set up the entire system in under 10 lines of code that will handle
everything for you, including complex operations like account merging.
You can authenticate your app’s users through the following methods:
Email & Password
Phone numbers
Google
Facebook
Twitter
& more!
Using Firebase Authentication makes building secure authentication systems easier, while
also improving the sign-in and onboarding experience for end users.
Firebase Authentication is built by the same people who created Google Sign-in, Smart Lock,
and Chrome Password Manager.
Firebase Cloud Messaging (FCM)
Firebase Cloud Messaging (FCM) provides a reliable and battery-efficient
connection between your server and devices that allows you to deliver and
receive messages and notifications on iOS, Android, and the web at no cost.
You can send notification messages (2KB limit) and data messages (4KB limit).
Using FCM, you can easily target messages using predefined segments or create your own,
using demographics and behavior. You can send messages to a group of devices that are
subscribed to specific topics, or you can get as granular as a single device.
FCM can deliver messages instantly, or at a future time in the user’s local time zone. You can
send custom app data like setting priorities, sounds, and expiration dates, and also track
custom conversion events.
The best thing about FCM is that there is hardly any coding involved! FCM is completely
integrated with Firebase Analytics, giving you detailed engagement and conversion tracking.
You can also use A/B testing to try out different versions of your notification messages, and
then select the one which performs best against your goals.
Firebase Database Query
Firebase has simplified the process of retrieving specific data from the database through
queries. Queries are created by chaining together one or more filter methods.
Firebase has 4 ordering functions:
orderByKey()
orderByChild(‘child’)
orderByValue()
orderByPriority()
Note that you will only receive data from a query if you have used the on() or once()
method.
You can also use these advanced querying functions to further restrict data:
startAt(‘value’)
endAt(‘value’)
equalTo(‘child_key’)
limitToFirst(10)
limitToLast(10)
In SQL, the basics of querying involve two steps. First, you select the columns from your table.
Here I am selecting the Users column. Next, you can apply a restriction to your query using
the WHERE clause. From the below-given query, I will get a list of Users whose name is
GeekyAnts.
You can also use the LIMIT clause, which will restrict the number of results that you will get
back from your query.
In Firebase, querying also involves two steps. First, you create a reference to the parent key,
and then you use an ordering function. Optionally, you can also append a querying function
for more advanced restriction.
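The real SDK call for those two steps would be something like `ref.orderByChild('name').equalTo('GeekyAnts').limitToFirst(10).once('value')`. As a conceptual sketch, the same filtering can be modeled over a plain JSON tree (data invented):

```javascript
// Conceptual model of the query chain -- not the Firebase SDK itself.
const users = {
  u1: { name: 'GeekyAnts', city: 'Bangalore' },
  u2: { name: 'Alice', city: 'London' },
  u3: { name: 'GeekyAnts', city: 'Delhi' },
};

function orderByChildEqualTo(tree, child, value, limit) {
  return Object.entries(tree)
    .filter(([, v]) => v[child] === value) // equalTo('GeekyAnts')
    .slice(0, limit);                      // limitToFirst(10)
}

const matches = orderByChildEqualTo(users, 'name', 'GeekyAnts', 10);
console.log(matches.map(([key]) => key)); // [ 'u1', 'u3' ]
```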
How to Store Data? => Firebase Storage
Firebase Storage is a standalone solution for uploading user-generated content like images
and videos from iOS and Android devices, as well as the web.
Firebase Storage is designed specifically to scale your apps, provide security, and
ensure network resiliency.
Firebase Storage uses a simple folder/file system to structure its data.
Firebase Test Labs
Firebase Test Labs provides a large number of mobile test devices to help you test your apps.
Firebase Test Labs comes with 3 modes of testing:
Instrumentation Test
These are tests that you have written specifically to test your app, using frameworks like Espresso
and UI Automator 2.0.
Robo Test
This test is for people who just want to relax and let Firebase worry about tests. Firebase Test
Labs can simulate user touch and see how each component of the app functions.
Game Loop Test
Test Labs supports game app testing. It comes with beta support for a “demo mode”
where the game app runs while simulating the actions of a player.
Remote Config
Remote Config essentially allows us to publish updates to our users immediately. Whether we
wish to change the color scheme for a screen, the layout for a particular section in our app, or
show promotional/seasonal options — this is completely doable using the server-side
parameters without the need to publish a new version.
Remote Config gives us the power to:
Quickly and easily update our applications without the need to publish a new build to
the app/play store.
Effortlessly set how a segment behaves or looks in our application based on the
user/device that is using it.
Firebase App Indexing
To get your app’s content indexed by Google, use the same URLs in your app that you use on
your website and verify that you own both your app and your website. Google Search crawls
the links on your website and serves them in Search results. Then, users who’ve installed your
app on their devices go directly to the content in your app when they click on a link.
Firebase Dynamic Links
Deep links are URLs that take you to specific content. Most web links are deep links.
Firebase can now turn deep links into Dynamic Links! Dynamic Links allow the user to
come directly to a particular location in your app.
There are 3 fundamental uses for Dynamic Links:
Convert Mobile Web Users to Native App Users.
Increase conversion for user-to-user sharing. When a user shares the app with other
users, you can skip the generic message shown when they download it from the store
and instead show them a personalised greeting message.
Drive installs from third-party channels. Social media networks, email, and SMS can all
be used to increase your target audience. When users install the app, they can see the
exact content of your campaigns.
Firestore
Cloud Firestore is a NoSQL document database that lets you easily store, sync, and query data
for your mobile and web apps — at a global scale.
Though this may sound like something similar to the Realtime Database, Firestore brings
many new things to the platform that make it completely different from the
Realtime Database.
Improved Querying and Data Structure
Where Realtime Database stores data in the form of a giant JSON tree, Cloud Firestore takes a
much more structured approach. Firestore keeps its data inside objects called documents.
These documents consist of key-value pairs and can contain any kind of data, from strings to
binary data to even objects that resemble JSON trees (Firestore calls these maps). The
documents, in turn, are grouped into collections.
Firestore database can consist of multiple collections that can contain
documents pointing towards sub-collections. These sub-collections can
again contain documents that point to other sub-collections, and so on.
You can build hierarchies to store related data and easily retrieve any data that you need using
queries. All queries can scale with the size of your result set, so your app is ready to scale from
day one.
Firestore’s queries are shallow. By this, I mean that in Firestore you can simply fetch
any document that you want without having to fetch all of the data that is contained in any of
its linked sub-collections.
You can fetch a single document without having to grab any of its sub-collections
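To make the document/collection hierarchy and the shallowness of reads concrete, here is a small in-memory sketch; the collection names and fields are illustrative and this is not the Firestore API itself:

```python
# Collections hold documents (key-value maps), and a document can have
# sub-collections of its own. All names here are illustrative.
db = {
    "cities": {                              # collection
        "SF": {                              # document
            "fields": {"name": "San Francisco", "population": 884_000},
            "landmarks": {                   # sub-collection
                "golden_gate": {"fields": {"type": "bridge"}},
            },
        },
    },
}

# A path like cities/SF identifies one document; a shallow read returns
# only that document's fields, never its sub-collections.
shallow = db["cities"]["SF"]["fields"]
print(shallow["name"])  # San Francisco

# Reaching a sub-collection document requires its own, longer path:
landmark = db["cities"]["SF"]["landmarks"]["golden_gate"]["fields"]
print(landmark["type"])  # bridge
```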
Query with Firestore
Imagine that you have created a collection in Firestore that contains a list of cities. Before
you can send a query, you store a reference to the collection in a variable; here, citiesRef is
the variable that holds your collection of cities. If you want to find the capital cities, you
would write a query that filters on the capital field. As another example, say you want to see
only two cities from your database whose population is more than 100,000: you would filter
on the population field and limit the result to two documents.
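The two queries above can be sketched against sample data. In the JavaScript client they would be written roughly as citiesRef.where("capital", "==", true) and citiesRef.where("population", ">", 100000).limit(2); the plain-Python stand-in below shows what they return, with made-up city data:

```python
# Made-up stand-in for the cities collection.
cities = [
    {"name": "Tokyo", "capital": True, "population": 13_960_000},
    {"name": "Osaka", "capital": False, "population": 2_691_000},
    {"name": "Paris", "capital": True, "population": 2_148_000},
    {"name": "Lyon", "capital": False, "population": 513_000},
]

# Firestore equivalent: citiesRef.where("capital", "==", true)
capitals = [c for c in cities if c["capital"]]

# Firestore equivalent: citiesRef.where("population", ">", 100000).limit(2)
big_cities = [c for c in cities if c["population"] > 100_000][:2]

print([c["name"] for c in capitals])    # ['Tokyo', 'Paris']
print([c["name"] for c in big_cities])  # ['Tokyo', 'Osaka']
```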
Cloud Firestore can make querying even easier. In some cases, it can automatically search
your database across multiple fields, and it will guide you towards building an index that
makes such queries extremely simple.
Better Scalability
Though Firebase's Realtime Database is capable of scaling, things start to get difficult when
your app becomes really popular or your database becomes really massive.
Cloud Firestore is based on Google's Cloud infrastructure, which allows it to scale much
more easily, and to a greater capacity, than the Realtime Database.
Multi-Region Database
In Firestore, your data is automatically copied to various regions. So if one data center goes
offline due to some unforeseen reason, you can be sure that your app’s data is still safe
somewhere else.
Firestore’s multi-region database also provides strong consistency. Any changes to your data
will be mirrored across every copy of your database.
Different Pricing Model
The Realtime Database charges based on the amount of data stored in the database.
Cloud Firestore also charges for storage, but at a significantly lower rate; instead of basing
the cost on the amount of data stored, Firestore's pricing is driven primarily by the number
of reads and writes that you perform.
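To see why the two models differ, here is a toy comparison. The rates below are deliberately invented for illustration only; the real prices are on the Firebase pricing page and change over time:

```python
# Invented rates -- NOT real Firebase prices, illustration only.
RTDB_PER_GB_STORED = 5.00   # storage-driven billing (Realtime Database style)
FS_PER_100K_READS = 0.06    # operation-driven billing (Firestore style)
FS_PER_100K_WRITES = 0.18

def rtdb_monthly(gb_stored):
    # Cost grows with how much data you keep.
    return gb_stored * RTDB_PER_GB_STORED

def firestore_monthly(reads, writes):
    # Cost grows with how often you read and write.
    return (reads / 100_000) * FS_PER_100K_READS \
         + (writes / 100_000) * FS_PER_100K_WRITES

# A data-heavy but low-traffic app illustrates the difference:
print(round(rtdb_monthly(20), 2))                     # 100.0
print(round(firestore_monthly(500_000, 100_000), 2))  # 0.48
```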
Google Cloud Console:
The Google Cloud Console is a web interface for managing everything your application is
built on: web applications, data analysis, virtual machines, databases, networking, developer
services, and more. You can increase capacity as needed and diagnose production issues in
an easy-to-use web interface, search for resources quickly, and connect to instances via SSH
from within your browser. Native iOS and Android apps let you manage your projects
wherever you are, making Google Cloud your cloud-based console for complex
development tasks.
Overview
In this article, we will explore why you should consider tackling IaaS and PaaS together. Many
organizations gave up on OpenStack during its hype phase, but in my view it is time to reconsider the
IaaS strategy. Two main factors are driving a re-emergence of interest in OpenStack: containers and
cloud.
Containers require very flexible, software-defined infrastructure and are changing the application
landscape fast. Remember when we had the discussions about pets vs cattle? The issue with OpenStack
during its hype phase was that the workloads simply didn’t exist within most organizations, but now
containers are changing that, from a platform perspective. Containers need to be orchestrated and the
industry has settled on Kubernetes for that purpose. In order to run Kubernetes, you need quite a lot
of flexibility at scale on the infrastructure level. You must be able to provide solid Software Defined
Networking, Compute, Storage, Load Balancing, DNS, Authentication, Orchestration, basically
everything, and do so at the click of a button. Yeah, we can all do that, right?
If we think about IT, there are two types of personas. The first feels IT is generic: 80% is good
enough, and for them it is a light switch, on or off. This persona has no reason whatsoever to deal with
IaaS and should just go to the public cloud, if not already there; in other words, OpenStack makes no
sense. The other persona feels IT adds compelling value to their business, and that going beyond 80%
provides distinct business advantages. Anyone can go to the public cloud, but if you can turn IT into a
competitive advantage then there may actually be a purpose for it. Unfortunately, with the way many
organizations go about IT today, that is not really viable unless something dramatic happens. This
brings me back to OpenStack. It is the only way an organization can provide the capabilities a public
cloud offers while also matching price and performance and providing a competitive advantage. If we
cannot achieve the flexibility of the public cloud, its consumption model, and its cost effectiveness
while providing compelling business advantage, then we ought to just give up, right?
I also find it interesting that some organizations, even those that started in the public cloud, are starting
to see value in build-your-own. Dropbox, for example, originally started on AWS and S3. Over the last
few years they built their own object storage solution, one that provided more value and saved 75
million dollars over two years, and they did so with a fairly small team. I certainly am not advocating
doing everything yourself; I am just saying that we need to make a decision: does IT provide
compelling business value? Can you do it for your business better than the generic level playing field
known as the public cloud? If so, you really ought to be looking into OpenStack and using the
momentum behind containers to bring about real change.
Finally, there are other factors that could make bare metal play an important role that won't be
covered here, such as cost/performance or isolation/security.
If we stick to virtualization technology, we have one and only one choice. This again is where
OpenStack shines, at least Red Hat OpenStack. One of the components shipped is Ironic (metal-as-a-
service). Ironic allows us to manage bare metal just like a virtual machine; in fact, in OpenStack there
is no difference, which is why OpenStack refers to compute units as instances: an instance could be
either. OpenStack can provide OpenShift with VM- or bare-metal-based nodes, and much, much more.
OpenShift on OpenStack Architectures
Important to any underlying architecture discussion is how to group OpenShift masters, infrastructure
and application nodes. OpenStack provides two different possibilities.
Resource vs AutoScaling Groups
Resource groups allow us to group instances together and apply affinity or anti-affinity policies via the
OpenStack scheduler. AutoScaling groups allow us to group instances and based on alarms, scale-up or
scale-down those instances automatically. At first glance, you would think resource groups for masters
and infra nodes and autoscaling groups for app nodes. While autoscaling sounds great, especially for
app nodes, many factors can lead to scaling happening, or not happening, when desired. My experience
is that this works well for simple WordPress-type applications but not for something more complex,
like a container platform such as OpenShift. Another disadvantage of autoscaling groups is that they
don't support an index; indexes within groups are used to increment the instance name: master0,
master1, and so on. A final point is that you can easily scale resource groups; it just needs to be
triggered by an update to the Heat stack. The nice thing is you can also control scaling, and if it is
automated, you have more flexibility than relying on alarms in Ceilometer. For all of these reasons, I
recommend creating three resource groups: masters, infras, and nodes.
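As a sketch of what such a resource group might look like in a Heat template (the resource and property values here are illustrative, not the exact templates from the repository):

```yaml
masters:
  type: OS::Heat::ResourceGroup
  properties:
    count: 3                       # scale by changing this and updating the stack
    resource_def:
      type: OS::Nova::Server
      properties:
        name: master%index%        # the index yields master0, master1, master2
        flavor: ocp.master
        image: rhel74
```

Scaling is then just a stack update with a larger count, rather than an alarm-driven event.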
Two common OpenShift architectures for OpenStack are non-HA and HA within a single tenant.
Non-HA
In this architecture, we will have one master, one infra node, and x application nodes. While certainly
application availability can be achieved by deploying across multiple nodes, the master presents a
single point of failure for the control plane. The infra node runs the OpenShift router, and as such a
failure here would mean incoming traffic to applications would be interrupted.
HA
The HA architecture typically has three masters, two infra nodes and x app nodes. There are variations
where you could have 3 infra nodes if you are running metrics and logging services that require a third
node. In addition you could also split etcd and run it independently, on three additional nodes. If
east/west traffic is not allowed between network zones, then you would likely require two infra nodes
in each zone, to handle incoming traffic for app nodes. There are many variations of course, but for
now let us keep it simple.
Deploying OpenStack
In order to deploy OpenShift on OpenStack we obviously need OpenStack. Here are some guides to
help.
OpenStack 12 (Pike) RDO Lab Installation and Configuration Guide on Hetzner Root Servers
OpenStack 11 (Ocata) RDO Lab Installation and Configuration Guide
Red Hat OpenStack Platform 10 (Newton) Installation and Configuration Guide
Once OpenStack is deployed you need to ensure a few things are in place.
Create Flavors
# openstack flavor create --ram 2048 --disk 30 --ephemeral 0 --vcpus 1 --public ocp.bastion
# openstack flavor create --ram 8192 --disk 30 --ephemeral 0 --vcpus 2 --public ocp.master
# openstack flavor create --ram 8192 --disk 30 --ephemeral 0 --vcpus 1 --public ocp.infra
# openstack flavor create --ram 8192 --disk 30 --ephemeral 0 --vcpus 1 --public ocp.node
Create Image
# openstack image create --disk-format qcow2 --container-format bare --file /root/rhel-server-7.4-x86_64-kvm.qcow2 "rhel74"
Create Public Subnet
# openstack subnet create --network public --allocation-pool start=144.76.132.226,end=144.76.132.230 --no-dhcp --subnet-range 144.76.132.224/29 public_subnet
Create Router
# openstack router create --no-ha router1
That is it! Everything else will be created automatically by the deployment of the OpenShift
infrastructure. If you want to include more or less you can also easily update the Heat templates
provided.
Configure Parameters
# cp sample-vars.yml vars.yml
# vi vars.yml
---
### OpenStack Setting ###
domain_name: ocp3.lab
dns_forwarders: [213.133.98.98, 213.133.98.99]
external_network: public
service_subnet_cidr: 192.168.1.0/24
router_id:
image: rhel74
ssh_user: cloud-user
ssh_key_name: admin
stack_name: openshift
openstack_version: 12
contact: admin@ocp3.lab
heat_template_path: /root/openshift-on-openstack-123/heat/openshift.yaml
Authenticate OpenStack Credentials
# source /root/keystonerc_admin
Step Two
This step is responsible for preparing the OpenShift environment: hostnames are set, the OpenShift
inventory file is dynamically generated, systems are registered to RHN, required packages are
installed, and Docker, among other things, is properly configured.
Get IP address of the Bastion Host
# openstack stack output show -f value -c output_value openshift ip_address
{
"masters": [
{
"name": "master0",
"address": "192.168.1.19"
},
{
"name": "master1",
"address": "192.168.1.16"
},
{
"name": "master2",
"address": "192.168.1.15"
}
],
"lb_master": {
"name": "lb_master",
"address": "144.76.134.230"
},
"infras": [
{
"name": "infra0",
"address": "192.168.1.10"
},
{
"name": "infra1",
"address": "192.168.1.11"
}
],
"lb_infra": {
"name": "lb_infra",
"address": "144.76.134.229"
},
"bastion": {
"name": "bastion",
"address": "144.76.134.228"
},
"nodes": [
{
"name": "node0",
"address": "192.168.1.6"
},
{
"name": "node1",
"address": "192.168.1.13"
}
]
}
[Bastion Host]
Change Directory to Cloned Git Repository
# cd openshift-on-openstack-123
PLAY RECAP
***************************************************************************
**************
bastion : ok=15 changed=7 unreachable=0 failed=0
infra0 : ok=18 changed=13 unreachable=0 failed=0
infra1 : ok=18 changed=13 unreachable=0 failed=0
localhost : ok=7 changed=6 unreachable=0 failed=0
master0 : ok=18 changed=13 unreachable=0 failed=0
master1 : ok=18 changed=13 unreachable=0 failed=0
master2 : ok=18 changed=13 unreachable=0 failed=0
node0 : ok=18 changed=13 unreachable=0 failed=0
node1 : ok=18 changed=13 unreachable=0 failed=0
Step Three
This step is responsible for configuring a vanilla OpenShift environment. By default, only the
OpenShift router and registry will be configured. OpenShift will be deployed based on the dynamically
generated inventory file in step 2. You can certainly edit the inventory file and make any changes. After
deployment of OpenShift, there is a small post-deployment playbook which will configure dynamic
storage to use OpenStack Cinder. Optional steps are defined as well to configure metrics and logging if
that is desired.
[Bastion Host]
Deploy OpenShift
[cloud-user@bastion ~]$ ansible-playbook -i /home/cloud-user/openshift-inventory --private-key=/home/cloud-user/admin.pem -vv /usr/share/ansible/openshift-ansible/playbooks/byo/config.yml
PLAY RECAP
***************************************************************************
**************
infra0.ocp3.lab : ok=183 changed=59 unreachable=0 failed=0
infra1.ocp3.lab : ok=183 changed=59 unreachable=0 failed=0
localhost : ok=12 changed=0 unreachable=0 failed=0
master0.ocp3.lab : ok=635 changed=265 unreachable=0 failed=0
master1.ocp3.lab : ok=635 changed=265 unreachable=0 failed=0
master2.ocp3.lab : ok=635 changed=265 unreachable=0 failed=0
node0.ocp3.lab : ok=183 changed=59 unreachable=0 failed=0
node1.ocp3.lab : ok=183 changed=59 unreachable=0 failed=0
INSTALLER STATUS
***************************************************************************
********
Initialization : Complete
Health Check : Complete
etcd Install : Complete
Master Install : Complete
Master Additional Install : Complete
Node Install : Complete
Hosted Install : Complete
Service Catalog Install : Complete
PLAY RECAP
***************************************************************************
***********************************************
infra0 : ok=4 changed=2 unreachable=0 failed=0
infra1 : ok=4 changed=2 unreachable=0 failed=0
localhost : ok=7 changed=6 unreachable=0 failed=0
master0 : ok=6 changed=4 unreachable=0 failed=0
master1 : ok=6 changed=4 unreachable=0 failed=0
master2 : ok=6 changed=4 unreachable=0 failed=0
node0 : ok=4 changed=2 unreachable=0 failed=0
node1 : ok=4 changed=2 unreachable=0 failed=0
Log in to the UI
https://openshift.144.76.134.226.xip.io:8443
Optional
Configure Admin User
[cloud-user@bastion ~]$ ssh -i /home/cloud-user/admin.pem cloud-user@master0
Install Metrics
Set Metrics to true in Inventory
[cloud-user@bastion ~]$ vi openshift_inventory
...
openshift_hosted_metrics_deploy=true
...
INSTALLER STATUS
***************************************************************************
*****************************************
Initialization : Complete
Metrics Install : Complete
Install Logging
Pivotal Cloud Foundry vs. Kubernetes
There's a lot of temptation to compare Pivotal's Cloud Foundry (PCF) and Kubernetes (K8s) to each
other, and we get it. They're both platform services for deploying cloud-native apps, they both deal
with containers, and the list goes on. There's a lot of functional overlap between PCF and K8s, but it's
important to understand how they differ from each other, when it's best to use one rather than the
other, and when it's best to use them together.
Nowadays, more than ten years after the introduction of AWS, there are three levels of cloud-service
abstraction: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service
(SaaS).
Amazon Elastic Compute Cloud (EC2) falls into the IaaS category, giving users the basic infrastructure
needed to build and deploy an application. The next level is PaaS products, which will be the topic of
this post. PaaS products offer a higher level of abstraction, so the user is not exposed to the OS,
middleware, or runtime and needs only to be concerned with the application and data. And
lastly, SaaS products are applications built and hosted by a third-party platform and made available to
users via the internet.
Cloud-Native Service Models Comparison
A PaaS is a platform upon which developers can build and deploy applications. These products offer a
higher level of abstraction than we get from IaaS products meaning that, beyond networking, storage
and servers, the application’s O/S, middleware and runtime are all managed by the PaaS.
Within the PaaS market, two of the major players are Pivotal Cloud Foundry and Kubernetes. They are
both open source cloud PaaS products for building, deploying and scaling applications. Despite having
large areas of functional overlap, these two systems offer very different capabilities to their users and
are each better suited for different circumstances.
On one hand, we have the platform abstraction at the application level, deploying code as a complete
application; on the other hand, we have the platform abstraction at the container level, building and
deploying containers as part of a complete application.
PCF is an example of an "application" PaaS (also called the Cloud Foundry Application Runtime), and
Kubernetes is a "container" PaaS (sometimes called a CaaS).
The bottom line is that it doesn't have to be an 'OR'; it can be an 'AND'. The question isn't necessarily
whether you should use Cloud Foundry OR Kubernetes; the question is when you may need one AND/
OR the other. Because of a few key differentiators, they can be used together, as demonstrated by the
way they complement each other in the Cloud Foundry Container Runtime, an open-source
collaboration between Pivotal and Google (more on this later).
(source: pivotal.io) Pivotal Cloud Foundry architecture – open source and enterprise
Features
Applications running on Cloud Foundry are deployed, scaled, and maintained by BOSH (PCF's
infrastructure management component), which deploys versioned software along with the VMs it runs
on and then monitors the application after deployment. Although the learning curve for BOSH is
considered fairly steep, once mastered it adds considerable value by boosting team productivity.
More basic features of Pivotal Cloud Foundry include:
Cloud Controller to direct application deployment
Deploy using Docker Images and Buildpacks
Automated routing of all incoming traffic to appropriate component
Instant (vertical or horizontal) application scaling
cf CLI (the PCF command-line interface)
Cluster scheduler
Load balancer and DNS
“Loggregator” – Logging and metrics aggregation
Installation and Usability
Before beginning the installation process for Pivotal Cloud Foundry, Pivotal documentation directs
users to configure their firewall for PCF and establish IaaS user role guidelines. After that, installation
is guided by a web user interface.
Best Use Cases
Cloud Foundry’s platform is a higher-level abstraction and so it offers a higher level of productivity to
its users. With productivity, though, come certain limitations in what can be customized in the
runtime.
PCF is ideal for new applications, cloud-native apps and apps that run fine out of a buildpack. For
teams working with short lifecycles and frequent releases, PCF offers an excellent product.
Kubernetes
Kubernetes is a container scheduler or orchestrator. With container orchestration tools, the user creates
and maintains the container themselves. For many teams, having this flexibility and control over the
application is preferred.
Instead of focusing only on the app, the developer needs to create the container and then maintain it in
the future, for example, when anything on the stack has an update (a new JVM version, etc.).
(source: x-team.com) Kubernetes architecture
Features
Kubernetes is a mature container orchestrator that competes in the same market as Docker Swarm and
Apache Mesos. In Kubernetes, containers are grouped into pods based on logical dependencies, and
pods can then be easily scaled at runtime.
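A minimal pod manifest sketches this grouping; the names and images below are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:              # co-scheduled containers with a logical dependency
  - name: app
    image: example/app:1.0
  - name: log-forwarder
    image: example/log-forwarder:1.0
```

In practice pods are rarely created directly; a Deployment manages replicas of the pod template and is what you scale at runtime.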
More basic features of Kubernetes include:
Master node for global control (scheduling, API server, data store)
Worker nodes (VM or physical machine) with services needed to run container pods
Auto-scaling of containers and volume management
Flexible architecture with replaceable components and 3rd party plugins
Stateful persistence layer
Kubectl (Kubernetes command line interface)
Active OSS community
Installation and Usability
A common complaint that users have about Kubernetes is the difficulty of the setup process. First of
all, you have to plan ahead when starting to implement K8s because you have to define your nodes in
advance, which can be very time consuming. On top of that, setting up Kubernetes varies for each OS,
and the documentation isn’t sufficient in cases when users need to build custom environments. Add to
all of that manual integrations that are required, and even thinking about going through the setup
process can give you a headache.
Once it’s ready to use, though, Kubernetes offers the most mature and most popular service on the
market in terms of container orchestration tools. It also has an active community offering support and
resources to users.
Best Use Cases
Kubernetes is a lower-level abstraction in the PaaS world meaning greater flexibility to implement
customizations and build your containers to run how you want them to run. Unfortunately, this also
means more work for your engineering teams and decreased productivity.
When moving to any new system or product, a good rule of thumb is to use the highest level
abstraction that will solve your problem without putting any unnecessary limitations on the workload.
If you need more flexibility to do customizations, and you’re willing to put in the work, stick with
Kubernetes (or check out Kubo below).
Conclusion:
To conclude, we saw that there are several methods for deploying cloud-native applications
on cloud platforms: via PaaS on open-source platforms, and via Infrastructure as a Service
for configuration and even advanced administration on proprietary cloud platforms.
AWS is rich in tools that can be used in many fields, from satellites and robotics to
blockchain.
For large-scale data storage or large-scale platforms, GCP is attractive for its costs.
Azure began collaborating with the open-source world in the last few years in an effort to
stay competitive.
IBM Bluemix is still at an experimental stage. Other platforms are also solid, with the
advantage of being reserved for advanced users such as system administrators and
experienced IT engineers.
Bibliography:
https://blog.overops.com/pivotal-cloud-foundry-vs-kubernetes
https://intellipaat.com/blog
https://searchaws.techtarget.com
https://shout.setfive.com
https://developer.searchblox.com
https://www.percona.com/blog
https://read.acloud.guru
http://blog.totalcloud.io
https://www.edureka.co/blog
https://www.cmswire.com/information-management
https://jumpcloud.com/blog
https://www.networkmanagementsoftware.com
https://blog.affini-tech.com
https://blog.overops.com