© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Amazon Kinesis: Managed Service for Streaming Data Ingestion & Processing
Aggregator → Continuous Processing → Storage → Analytics + Reporting
A Customer View
1. Accelerated Ingest-Transform-Load
Data sources: IT infrastructure, application logs, social media, financial market data, web clickstreams, sensors, geo/location data
Verticals: Software/Technology, Digital Ad Tech./Marketing, Financial Services, Consumer Online/E-Commerce
Storm: Continuous Processing Framework
Highly Scalable · Durable · Elastic · Replay-able Reads
[Architecture diagram: data sources in multiple Availability Zones send records to the AWS endpoint; the stream is partitioned into Shard 1, Shard 2, … Shard N; consumer applications read the shards in parallel — [Aggregate & De-Duplicate] → S3, App.2 [Metric Extraction] → DynamoDB, App.3 [Sliding Window Analysis] → Redshift, App.4 [Machine Learning].]
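The routing step in the diagram — a record's partition key decides which shard it lands on — can be sketched without the AWS SDK. The `FakeStream` class below is an illustrative stand-in, not boto3; real Kinesis maps the 128-bit MD5 hash of the partition key into each shard's hash-key range, which a modulo over equal-sized shards approximates:

```python
import hashlib

def shard_for_key(partition_key: str, num_shards: int) -> int:
    """Pick a shard from the MD5 hash of the partition key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_shards

class FakeStream:
    """Tiny in-memory stand-in for a Kinesis stream (not the AWS SDK)."""
    def __init__(self, num_shards: int):
        self.shards = {i: [] for i in range(num_shards)}

    def put_record(self, data: bytes, partition_key: str) -> int:
        shard = shard_for_key(partition_key, len(self.shards))
        self.shards[shard].append((partition_key, data))
        return shard

stream = FakeStream(num_shards=3)
for device in ("sensor-1", "sensor-2", "sensor-3", "sensor-1"):
    stream.put_record(b'{"temp": 21}', partition_key=device)

# Records with the same partition key always land in the same shard,
# which is what preserves per-key ordering across producers.
assert stream.put_record(b"x", "sensor-1") == stream.put_record(b"y", "sensor-1")
```

Because the hash is deterministic, per-key ordering holds no matter which producer (or Availability Zone) sends the record.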
Use the Get APIs for raw reads of Kinesis data streams
GetRecords {Limit, ShardIterator}
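A minimal read loop against that API shape might look like the following. `StubKinesis` is an in-memory stand-in (not the AWS SDK), but the call names and response fields mirror the GetRecords {Limit, ShardIterator} contract on the slide:

```python
class StubKinesis:
    """In-memory stand-in mimicking the GetShardIterator/GetRecords
    request shape (boto3-style names; not the real AWS SDK)."""
    def __init__(self, records):
        self._records = records

    def get_shard_iterator(self, StreamName, ShardId, ShardIteratorType):
        # TRIM_HORIZON: start from the oldest record in the shard.
        return {"ShardIterator": "0"}

    def get_records(self, ShardIterator, Limit):
        pos = int(ShardIterator)
        batch = self._records[pos:pos + Limit]
        return {"Records": batch, "NextShardIterator": str(pos + len(batch))}

client = StubKinesis([{"Data": f"event-{i}".encode()} for i in range(5)])

it = client.get_shard_iterator(StreamName="clicks", ShardId="shardId-0",
                               ShardIteratorType="TRIM_HORIZON")["ShardIterator"]
seen = []
while True:
    resp = client.get_records(ShardIterator=it, Limit=2)  # GetRecords {Limit, ShardIterator}
    seen.extend(r["Data"] for r in resp["Records"])
    if not resp["Records"]:
        break  # caught up; a real consumer would keep polling
    it = resp["NextShardIterator"]

assert len(seen) == 5
```

Each response hands back a `NextShardIterator`, so the consumer advances through the shard batch by batch and can replay from any saved position.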
Kinesis Connectors
Enabling Data Movement
ITransformer — defines the transformation of records from the Amazon Kinesis stream to suit the user-defined data model.
IFilter — excludes irrelevant records from processing.
IBuffer — buffers the set of records to be processed, specifying a size limit (number of records) and a total byte count.
IEmitter — makes client calls to other AWS services and persists the records stored in the buffer.
Emitter destinations: S3, DynamoDB, Redshift
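One way to picture how the four interfaces cooperate is a plain-Python sketch. The real connector library is Java; only the stage names come from the slide, and the signatures here are illustrative:

```python
from dataclasses import dataclass, field

def transform(record: bytes) -> dict:          # ITransformer
    """Map a raw stream record onto the user-defined data model."""
    user, clicks = record.decode().split(",")
    return {"user": user, "clicks": int(clicks)}

def keep(item: dict) -> bool:                  # IFilter
    """Exclude irrelevant records from processing."""
    return item["clicks"] > 0

@dataclass
class Buffer:                                  # IBuffer
    """Buffer records until a record-count or byte-count limit is hit."""
    record_limit: int
    byte_limit: int
    items: list = field(default_factory=list)
    nbytes: int = 0

    def add(self, item: dict, size: int) -> None:
        self.items.append(item)
        self.nbytes += size

    def full(self) -> bool:
        return len(self.items) >= self.record_limit or self.nbytes >= self.byte_limit

def emit(buffer: Buffer, sink: list) -> None:  # IEmitter
    """Persist the buffered records (S3/DynamoDB/Redshift in practice)."""
    sink.extend(buffer.items)
    buffer.items, buffer.nbytes = [], 0

raw = [b"alice,3", b"bob,0", b"carol,7"]
sink, buf = [], Buffer(record_limit=2, byte_limit=1024)
for rec in raw:
    item = transform(rec)
    if not keep(item):
        continue
    buf.add(item, len(rec))
    if buf.full():
        emit(buf, sink)
emit(buf, sink)  # flush whatever is left in the buffer

assert sink == [{"user": "alice", "clicks": 3}, {"user": "carol", "clicks": 7}]
```

Buffering before emitting is what lets the emitter batch its writes to the destination service instead of issuing one call per record.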
Example use cases: clickstream archive, aggregate statistics, in-game engagement trend analysis, clickstream processing apps, ad analytics management, billing auditors
Kinesis Pricing
Simple, pay-as-you-go, and no up-front costs

Pricing Dimension               Value
Shard hour                      $0.015
1,000,000 PUT transactions      $0.028
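Plugging the two listed prices into a back-of-envelope calculation (the shard count, PUT volume, and dimension labels below are assumed for illustration):

```python
# 2014 figures from the slide; dimension labels are assumed.
SHARD_HOUR = 0.015          # $ per shard per hour
PER_MILLION_PUTS = 0.028    # $ per 1,000,000 PUT transactions

shards = 10                 # hypothetical stream size
puts_per_day = 50_000_000   # hypothetical ingest volume
days = 31

shard_cost = shards * 24 * days * SHARD_HOUR
put_cost = puts_per_day * days / 1_000_000 * PER_MILLION_PUTS
total = shard_cost + put_cost
print(f"{total:.2f}")       # → 155.00  ($111.60 shards + $43.40 PUTs)
```

At this scale the shard-hour charge dominates, which is why right-sizing the shard count matters more than trimming PUT volume.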
Real-time Performance
Perform continual processing on streaming big data. Processing latencies fall to a few seconds, compared with the minutes or hours associated with batch processing.
Low Cost
http://aws.amazon.com/documentation/kinesis/
https://forums.aws.amazon.com/forum.jspa?forumID=169#
@adityak
Thank You
Each spout task reads from the shards it is responsible for (e.g. Spout Task A ← Shards 1–2, Spout Task B ← Shards 3–4, Spout Task C ← Shard n).

Fetching data from Kinesis:
- Fetches a batch of records via GetRecords API calls
- Buffers the records in memory
- A spout task will fetch data from all shards it is responsible for

Emitting tuples and checkpointing:
- Converts each Kinesis data record into a tuple
- Emits the tuple for processing
- Checkpoints periodically (checkpoints stored in Zookeeper)
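The fetch/buffer/emit/checkpoint cycle can be sketched as follows. This is a conceptual stand-in, not the real kinesis-storm-spout API; a plain dict plays the role of Zookeeper:

```python
import time

class KinesisSpoutSketch:
    """Sketch of the spout loop: fetch a batch, buffer it, emit one
    tuple per next_tuple() call, checkpoint periodically."""

    def __init__(self, fetch_batch, checkpoint_store, interval_s=60.0):
        self._fetch = fetch_batch          # callable -> list of (seq, data)
        self._store = checkpoint_store     # dict standing in for Zookeeper
        self._interval = interval_s
        self._buffer = []
        self._last_seq = None
        self._last_ckpt = time.monotonic()

    def next_tuple(self):
        if not self._buffer:               # refill via a GetRecords batch
            self._buffer = list(self._fetch())
        if not self._buffer:
            return None                    # nothing to emit right now
        seq, data = self._buffer.pop(0)
        self._last_seq = seq
        self._maybe_checkpoint()
        return (data,)                     # the emitted tuple

    def _maybe_checkpoint(self):
        now = time.monotonic()
        if now - self._last_ckpt >= self._interval and self._last_seq is not None:
            self._store["shard-0"] = self._last_seq   # survives task restarts
            self._last_ckpt = now

batches = [[(1, "a"), (2, "b")], [(3, "c")], []]
store = {}
spout = KinesisSpoutSketch(lambda: batches.pop(0) if batches else [], store,
                           interval_s=0)  # checkpoint on every emit
out = [spout.next_tuple() for _ in range(3)]
print(out)   # [('a',), ('b',), ('c',)]
```

Checkpointing only the last emitted sequence number keeps restart cost low: a restarted task resumes from the stored position rather than replaying the whole shard.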
// Deploy the topology to the Storm cluster
StormSubmitter.submitTopology(
    topologyName,              // name under which the topology runs
    topoConf,                  // topology configuration
    builder.createTopology()); // spouts and bolts wired on the builder
[Diagram: import/export paths among AWS services — S3, EMR, EC2, DynamoDB, Redshift, Data Pipeline, Glacier, Kinesis.]