
BIG DATA ANALYTICS

REFERENCE ARCHITECTURES AND CASE STUDIES

Relational vs. Non-Relational Architecture

Relational: Rational, Predictable, Traditional
Non-Relational: Agile, Flexible, Modern

Agenda

  Big Data Challenges
  Big Data Reference Architectures
  Case Studies
  Tips for Designing Big Data Solutions

Big Data Challenges

[Figure: chart of data sources from structured to unstructured, with Low / Medium / High levels of Complexity, Velocity, Variety and Volume]

Data sources:
  Archives: scanned documents, statements, medical records, e-mails, etc.
  Docs: XLS, PDF, CSV, HTML, JSON, etc.
  Business Apps: CRM, ERP systems, HR, project management, etc.
  Media: images, video, audio, etc.
  Social Networks: Twitter, Facebook, Google+, LinkedIn, etc.
  Public Web: Wikipedia, news, weather, public finance, etc.
  Data Storages: RDBMS, NoSQL, Hadoop, file systems, etc.
  Machine Log Data: application logs, event logs, server data, CDRs, clickstream data, etc.
  Sensor Data: smart electric meters, medical devices, car sensors, road cameras, etc.

Big Data Analytics

Traditional Analytics (BI) vs. Big Data Analytics

Focus on:
  Traditional Analytics (BI): descriptive analytics, diagnostic analytics
  Big Data Analytics: predictive analytics, data science
Data Sets:
  Traditional Analytics (BI): limited data sets, cleansed data, simple models
  Big Data Analytics: large scale data sets, more types of data, raw data, complex data models
Supports:
  Traditional Analytics (BI): causation (what happened, and why?)
  Big Data Analytics: correlation (new insight, more accurate answers)

Big Data Analytics Use Cases

  Real-Time Intelligence (Consumers, Intelligent Agents): Low Latency, Reliability
  Data Discovery (Data Scientists / Analysts): Volume, Performance
  Business Reporting (Business Users): Data Quality, Self-Service

Big Data Analytics Reference Architectures

Architecture Drivers:
  Volume
  Sources
  Throughput
  Latency
  Extensibility
  Data Quality
  Reliability
  Security
  Self-Service
  Cost

Reference Architectures:
  Extended Relational
  Non-Relational
  Hybrid

Relational Reference Architecture

Data Sources: Structured, Semi-Structured, Unstructured
Integration: ETL, Messaging, API/ODBC, Replication
Data Storages: Data Warehouses, Data Marts, Operational Data Stores
Analytics: Query & Reporting, OLAP Cubes, Advanced Analytics
Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services

Extended Relational Reference Architecture

Data Sources: Structured, Semi-Structured, Unstructured
Integration: ETL, Messaging, API/ODBC, Replication
Data Storages: Data Warehouses, Data Marts, Operational Data Stores
Analytics: Query & Reporting, OLAP Cubes, Advanced Analytics
Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services

[Highlighted on the slide: key components affected by Big Data challenges]


Non-Relational Reference Architecture

Data Sources: Structured, Semi-Structured, Unstructured
Integration: ETL, Messaging, API
Data Storages: NoSQL Databases, Distributed File Systems
Analytics: Query & Reporting, Map Reduce, Search Engines, Advanced Analytics
Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services

[Highlighted on the slide: key components introduced with the non-relational movement]
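
Map Reduce is one of the analytics components listed for the non-relational architecture. As a rough, single-process illustration of the programming model (not of Hadoop itself), the sketch below counts words in a few made-up log lines. The function names and the sample data are assumptions for illustration only.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map step: emit one (word, 1) pair per word in a log line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values of one key into a single count."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle and reduce in sequence, all in one process."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical sample input.
logs = ["error timeout", "login ok", "error disk"]
counts = map_reduce(logs)
```

In a distributed file system the map and reduce steps would run on many nodes in parallel; here they run sequentially only to show the data flow.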


Extended Relational vs. Non-Relational Architecture

[Table: each architecture driver rated for the Extended Relational and Non-Relational architectures; the ratings were graphical and are not preserved]

Architecture Drivers:
  Large data volume
  Self-service (ad hoc reporting)
  Unstructured data processing
  High data model extensibility
  High data quality and consistency
  Extensive security
  Reliability and fault tolerance
  Low latency (near real time)
  Low cost
  Skills availability

Big Data Analytics Use Cases

  Real-Time Intelligence (Consumers, Intelligent Agents)
  Data Discovery (Data Scientists): Volume, Performance
  Business Reporting (Business Users)

Data Discovery: Non-Relational Architecture

Data Sources: Structured, Semi-Structured, Unstructured
Integration: ETL, Messaging, API
Data Storages: NoSQL Databases, Distributed File Systems
Analytics: Query & Reporting, Map Reduce, Search Engines, Advanced Analytics
Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services

Big Data Analytics Use Cases

  Real-Time Intelligence (Consumers, Intelligent Agents)
  Data Discovery (Data Scientists)
  Business Reporting (Business Users): Data Quality, Self-Service

Business Reporting: Hybrid Architecture

Data Sources: Structured, Semi-Structured, Unstructured
Integration: ETL, Messaging, API
Data Storages: Relational DWH/DM, Distributed File Systems
Analytics: SQL Query & Reporting, Map Reduce, Search Engines, Advanced Analytics
Presentation: Web Browsers, Native Desktop, Mobile Devices, Web Services

[Legend on the slide: Extended Relational components, Non-relational components]


Big Data Analytics Use Cases

  Real-Time Intelligence (Consumers, Intelligent Agents): Low Latency, Reliability
  Data Discovery (Data Scientists)
  Business Reporting (Business Users)

Lambda Architecture

Source:
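
The diagram for this slide is not preserved. As commonly described, a Lambda Architecture has three parts: a batch layer that recomputes views from an append-only master dataset, a speed layer that covers data not yet included in a batch run, and a serving layer that merges both at query time. The sketch below is a minimal in-memory illustration of that idea. The class and method names are hypothetical, and the batch run is treated as instantaneous, which a real system cannot assume.

```python
from collections import Counter


class LambdaCounter:
    """Toy event counter with batch, speed and serving layers."""

    def __init__(self):
        self.master = []                # append-only master dataset
        self.batch_view = Counter()     # precomputed by the batch layer
        self.realtime_view = Counter()  # maintained by the speed layer

    def ingest(self, event_key):
        """New data goes to the master dataset and to the speed layer."""
        self.master.append(event_key)
        self.realtime_view[event_key] += 1

    def run_batch(self):
        """Batch layer: recompute the view from all raw data, then
        discard the real-time increments the new batch view now covers."""
        self.batch_view = Counter(self.master)
        self.realtime_view = Counter()

    def query(self, event_key):
        """Serving layer: merge batch and real-time views."""
        return self.batch_view[event_key] + self.realtime_view[event_key]


store = LambdaCounter()
store.ingest("page_view")
store.ingest("page_view")
store.ingest("click")
before = store.query("page_view")   # answered by the speed layer only
store.run_batch()
store.ingest("page_view")
after = store.query("page_view")    # batch view plus one recent event
```

The design point is that queries stay correct whether or not a batch run has happened yet; only the split between the two views changes.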

Case Study #1: Usage & Billing Analysis

Business Area:
  Cloud-based platform for building, deploying, hosting and managing mobile applications

Business Goals:
  Provide a visual environment for building custom mobile applications
  Charge customers based on the platform they are using, the number of consumers, applications, etc.

Architectural Decisions

Architecture Drivers:
  Volume (> 10 TB)
  Sources (Semi-structured - JSON)
  Throughput (> 10K/sec)
  Latency (2 min)
  Extensibility (Custom metrics)
  Data Quality (Consistency)
  Reliability (24/7)
  Security (Multitenancy)
  Self-Service (Ad-Hoc reports)
  Cost (The less the better)
  Constraints (Public Cloud)

Trade-off: Extended Relational vs. Non-Relational
  Extensibility: +
  Data Quality: +
  Self-Service: +

Decision:
  Extended Relational Architecture
  Extensibility via the Preallocated Fields pattern

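
The slide names the Preallocated Fields pattern but does not show how it was applied. One common reading of the pattern is that the fact table carries a fixed set of generic columns, and a per-tenant mapping table records which custom metric lives in which column. The sketch below illustrates that reading with SQLite as a stand-in for the warehouse. The table names, column names, field limit and sample values are all hypothetical.

```python
import sqlite3

MAX_CUSTOM_FIELDS = 3  # hypothetical number of preallocated columns


def create_schema(conn):
    """Fact table with generic columns plus a per-tenant field map."""
    conn.execute(
        "CREATE TABLE usage_events ("
        " tenant_id TEXT, event_time TEXT,"
        " custom1 REAL, custom2 REAL, custom3 REAL)"
    )
    conn.execute(
        "CREATE TABLE field_map ("
        " tenant_id TEXT, metric_name TEXT, column_name TEXT,"
        " PRIMARY KEY (tenant_id, metric_name))"
    )


def column_for(conn, tenant_id, metric_name):
    """Return the column assigned to a metric, assigning one if needed."""
    row = conn.execute(
        "SELECT column_name FROM field_map"
        " WHERE tenant_id = ? AND metric_name = ?",
        (tenant_id, metric_name),
    ).fetchone()
    if row:
        return row[0]
    used = conn.execute(
        "SELECT COUNT(*) FROM field_map WHERE tenant_id = ?", (tenant_id,)
    ).fetchone()[0]
    if used >= MAX_CUSTOM_FIELDS:
        raise ValueError("no preallocated fields left for this tenant")
    column = f"custom{used + 1}"
    conn.execute(
        "INSERT INTO field_map VALUES (?, ?, ?)",
        (tenant_id, metric_name, column),
    )
    return column


def record_event(conn, tenant_id, event_time, metrics):
    """Store custom metrics in the tenant's preallocated columns."""
    mapped = {column_for(conn, tenant_id, name): value
              for name, value in metrics.items()}
    columns = ["tenant_id", "event_time", *mapped]
    marks = ", ".join(["?"] * len(columns))
    conn.execute(
        f"INSERT INTO usage_events ({', '.join(columns)}) VALUES ({marks})",
        (tenant_id, event_time, *mapped.values()),
    )


conn = sqlite3.connect(":memory:")
create_schema(conn)
# Hypothetical sample events for two tenants.
record_event(conn, "tenant_a", "2014-01-01T00:00:00",
             {"api_calls": 120, "push_messages": 7})
record_event(conn, "tenant_b", "2014-01-01T00:00:05", {"storage_mb": 42.5})

# An ad hoc report resolves the metric name to its column first.
col = column_for(conn, "tenant_a", "push_messages")
total = conn.execute(
    f"SELECT SUM({col}) FROM usage_events WHERE tenant_id = ?",
    ("tenant_a",),
).fetchone()[0]
```

The schema never changes when a tenant adds a metric, which keeps the relational model and its reporting tools intact. The cost is a hard cap on custom metrics per tenant.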
Solution Architecture

Technologies:
  Amazon Redshift
  Amazon SQS
  Amazon S3
  Elastic Beanstalk
  Jaspersoft BI Professional
  Python

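
The slide lists the technologies but not how they are wired together. One plausible arrangement, stated here as an assumption, is a worker that drains a queue, buffers events, and hands off batches for bulk loading often enough to stay within the 2 minute latency driver. The sketch below shows only the batching logic. It makes no AWS calls, `flush` is a placeholder for the hand-off, and the batch size and age limits are made up.

```python
import time


class MicroBatcher:
    """Buffer events and flush when the batch is full or too old."""

    def __init__(self, flush, max_events, max_age_seconds,
                 clock=time.monotonic):
        self.flush = flush              # callback receiving a list of events
        self.max_events = max_events
        self.max_age = max_age_seconds
        self.clock = clock              # injectable for testing
        self.buffer = []
        self.opened_at = None

    def add(self, event):
        """Add one event, starting the age timer on the first one."""
        if not self.buffer:
            self.opened_at = self.clock()
        self.buffer.append(event)
        self.maybe_flush()

    def maybe_flush(self):
        """Flush if either limit is reached; call periodically as well."""
        if not self.buffer:
            return
        too_big = len(self.buffer) >= self.max_events
        too_old = self.clock() - self.opened_at >= self.max_age
        if too_big or too_old:
            batch, self.buffer = self.buffer, []
            self.flush(batch)


# Usage with a fake clock so the example is deterministic.
batches = []
now = [0.0]
batcher = MicroBatcher(batches.append, max_events=3, max_age_seconds=60,
                       clock=lambda: now[0])
for event in ["a", "b", "c", "d"]:
    batcher.add(event)      # the third event triggers a size-based flush
now[0] = 61.0
batcher.maybe_flush()       # the remaining event is flushed by age
```

Bounding batch age rather than only batch size is what ties the loader to a latency target: a quiet period cannot leave events waiting indefinitely.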
Case Study #2: Clickstream for retail website

Business Area:
  Retail. A platform for e-commerce and collecting feedback from customers

Business Goals:
  Build an in-house Analytics Platform for ROI measurement and performance analysis of every product and feature delivered by the e-commerce platform
  Provide the ability to understand how end-users are interacting with service content, products, and features on sites
  Do clickstream analysis
  Perform A/B testing

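
The goals above include A/B testing, but the slides do not say which statistical method was used. A common choice for comparing conversion rates is a two-proportion z-test, sketched below under that assumption. The function name and the visitor and conversion counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B.

    Returns (z, p_value) for a two-sided test using the pooled
    standard error under the hypothesis that both rates are equal.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("rates are all 0 or all 1; test is undefined")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 1000 visitors per variant.
z_same, p_same = two_proportion_z_test(50, 1000, 50, 1000)
z_diff, p_diff = two_proportion_z_test(100, 1000, 150, 1000)
```

With identical rates the statistic is zero and the p-value is one. The second call, with a 10% versus 15% conversion rate, yields a small p-value.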
Architectural Decisions

Architecture Drivers:
  Volume (45 TB)
  Sources (Semi-structured - JSON)
  Throughput (> 20K/sec)
  Latency (1 hour)
  Extensibility (Custom tags)
  Data Quality (Not critical)
  Reliability (24/7)
  Security (Multitenancy)
  Self-Service (Canned reports, Data science)
  Cost (The less the better)
  Constraints (Public Cloud)

Trade-off: Extended Relational vs. Non-Relational
  Volume/Scalability: Extended Relational +/, Non-Relational +
  Throughput: Extended Relational +, Non-Relational +
  Self-Service: Extended Relational +, Non-Relational +/
  Extensibility: +

Decision:
  Non-Relational Architecture
  Reporting via the Materialized View pattern

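
The decision above relies on the Materialized View pattern for reporting. The general idea is to precompute aggregates from raw events on a schedule, so that canned reports read a small prepared view instead of scanning the raw data. The sketch below illustrates this in memory. The event fields, the hourly granularity and the sample data are assumptions, not details from the case study.

```python
from collections import defaultdict


def build_hourly_view(raw_events):
    """Batch job: aggregate raw clicks into per-page, per-hour stats."""
    cells = defaultdict(lambda: {"views": 0, "visitors": set()})
    for event in raw_events:
        hour = event["ts"][:13]  # "YYYY-MM-DDTHH" prefix of the timestamp
        cell = cells[(event["page"], hour)]
        cell["views"] += 1
        cell["visitors"].add(event["user"])
    return {
        key: {"views": c["views"], "unique_visitors": len(c["visitors"])}
        for key, c in cells.items()
    }


def report(view, page):
    """Canned report: read only the precomputed view, ordered by hour."""
    rows = [(hour, stats) for (p, hour), stats in view.items() if p == page]
    return sorted(rows, key=lambda row: row[0])


# Hypothetical raw clickstream events.
raw = [
    {"ts": "2014-05-01T10:05:00", "page": "/cart", "user": "u1"},
    {"ts": "2014-05-01T10:40:00", "page": "/cart", "user": "u1"},
    {"ts": "2014-05-01T10:59:00", "page": "/cart", "user": "u2"},
    {"ts": "2014-05-01T11:01:00", "page": "/cart", "user": "u2"},
    {"ts": "2014-05-01T11:02:00", "page": "/home", "user": "u3"},
]
view = build_hourly_view(raw)
cart_report = report(view, "/cart")
```

The report is only as fresh as the last rebuild of the view, which is acceptable when the latency driver is measured in hours rather than seconds.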
Solution Architecture

Technologies:
  Amazon S3
  Flume
  Hadoop/HDFS, MapReduce
  HBase
  Oozie
  Hive

[Figure: solution architecture diagram with cluster nodes Node1, Node2, ... NodeN]

Tips for Designing Big Data Solutions

  Understand data users and sources
  Discover architecture drivers
  Select proper reference architecture
  Do trade-off analysis, address cons
  Map reference architecture to technology stack
  Prototype, re-evaluate architecture
  Estimate implementation efforts
  Set up DevOps practices from the very beginning
  Advance in solution development through small wins
  Be ready for changes: big data technologies are evolving rapidly

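
The "Do trade-off analysis, address cons" tip can be made concrete with a weighted scoring matrix. The sketch below is one simple way to do it, not the method used in the case studies. The driver weights and the ratings (+1, 0, -1) are invented for illustration.

```python
def score_architectures(weights, ratings):
    """Weighted sum of driver ratings for each candidate architecture."""
    return {
        arch: sum(weights[d] * rated.get(d, 0) for d in weights)
        for arch, rated in ratings.items()
    }


def cons_to_address(ratings, arch):
    """Drivers on which the chosen architecture is rated negatively."""
    return [d for d, value in ratings[arch].items() if value < 0]


# Hypothetical weights: how much each driver matters to the project.
weights = {"extensibility": 1, "data_quality": 3, "self_service": 2}

# Hypothetical ratings: +1 strength, 0 neutral, -1 weakness.
ratings = {
    "extended_relational": {
        "extensibility": -1, "data_quality": 1, "self_service": 1,
    },
    "non_relational": {
        "extensibility": 1, "data_quality": -1, "self_service": 0,
    },
}

scores = score_architectures(weights, ratings)
best = max(scores, key=scores.get)
cons = cons_to_address(ratings, best)
```

The list of cons for the winning candidate is the input to the "address cons" step: each negative rating needs a mitigating pattern or an accepted risk.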
Leading global Product and Application Development partner founded in 1993
3,300+ employees across North America, Ukraine and Western Europe
Thousands of successful outsourcing projects!

SaaS/Cloud Solutions . Mobility Solutions . UX/UI . BI/Analytics/Big Data . Software Architecture . Security

Clients include: [client logos shown on the slide]

Thank You!

SoftServe US Office
One Congress Plaza, 111 Congress Avenue, Suite 2700, Austin, TX 78701
Tel: 512.516.8880

Contacts
Serhiy Haziyev: shaziyev@softserveinc.com
Olha Hrytsay: ohrytsay@softserveinc.com
