Impala Case Study

Uploaded by

Pappu Khan

Impala Case Study: Web Traffic

Problem Statement
The volume of data generated over the network is growing exponentially, but
existing data warehouse systems do not scale well at low cost while keeping
performance high.

Proposed Solution
Instead of relying on costly warehouse systems, we can use commodity hardware and
distributed processing to serve customers at any scale. Even if the volume of
generated data grows tenfold, Hadoop can absorb it: we only need to add a few more
nodes to increase the size of the cluster, because storage is cheaper than
processing power.

Technical Prerequisites
Impala is a distributed query engine that runs on top of HDFS. It requires:

A running Hadoop cluster with all of its services up.
An active Hive installation.
MySQL or PostgreSQL, used as the metastore database for both Hive and Impala.
A Java Virtual Machine (JVM). The Oracle JVM is suggested.
Suitable hardware. For hardware requirements, see Impala H/W.
Multiple components installed on various nodes of the cluster. The suggested way
of installing them is Cloudera Manager, which automates the installation of
components through a graphical user interface.

Solution Design
The existing system runs on top of a DB2 database. Data needs to be copied to HDFS
on a daily basis, and the tables in Impala need to be updated accordingly (the
schedule is configurable).
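
The daily copy-and-update cycle described above could be scripted roughly as follows. This is a minimal sketch, not the author's actual job: it only builds the two command lines and prints them. The connection details and paths are taken from the Sqoop command in the Code section. Two things are assumptions added here: the `--delete-target-dir` flag, used so that each daily run replaces the previous import, and the use of Impala's `REFRESH` statement through `impala-shell -q` to make the new files visible.

```python
import shlex

# Values copied from the Sqoop command in this case study.
JDBC_URL = "jdbc:db2://networkio.com:50001/testdb"
TARGET_DIR = "/user/dezyre/internetio.db/webtraffic"


def sqoop_import_args(username: str, password: str) -> list[str]:
    """Build the argv for one full re-import of the DB2 table into HDFS."""
    return [
        "sqoop", "import",
        "--connect", JDBC_URL,
        "--username", username,
        "--password", password,
        "--table", "webtraffic_server",
        "-m", "1",
        # Assumption: replace yesterday's files instead of failing on an
        # existing target directory.
        "--delete-target-dir",
        "--target-dir", TARGET_DIR,
        "--fields-terminated-by", ",",
    ]


def impala_refresh_args(table: str = "webtraffic") -> list[str]:
    """Build the argv that asks Impala to pick up the new HDFS files."""
    return ["impala-shell", "-q", f"REFRESH {table}"]


def daily_job(username: str, password: str) -> list[list[str]]:
    """Return the commands in the order a daily scheduler would run them."""
    return [sqoop_import_args(username, password), impala_refresh_args()]


if __name__ == "__main__":
    # Print the commands rather than executing them; a real job would pass
    # each argv to subprocess.run(argv, check=True).
    for argv in daily_job("dezyre", "password"):
        print(shlex.join(argv))
```

A scheduler such as cron could run this once a day. Passing the password on the command line mirrors the case study; a production job would normally avoid that.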

Code

Sqoop Command

sqoop import --connect jdbc:db2://networkio.com:50001/testdb --username dezyre \
  --password password --table webtraffic_server -m 1 \
  --target-dir /user/dezyre/internetio.db/webtraffic --fields-terminated-by ','

Impala

The data is now available in the Hadoop file system. Next we need to create a
table on top of it so that analysts can query it and suggest ways to improve
network bandwidth or take action on performance. Create the table with the
following command:

create table webtraffic (count bigint, timestamp_server string, from_host string,
to_host string) row format delimited fields terminated by ',' location
'/user/dezyre/internetio.db/webtraffic';

The query above creates the table in the current database, which is the default
database unless you have switched to another one. The location points at the
directory that Sqoop imported into. The table's columns describe the source and
target host of each network link and the request count.
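
To make the mapping from a comma-delimited line to the four columns concrete, here is a small sketch of how one line of the imported file lines up with the table definition. It is an illustration only, not how Impala parses files internally. The `WebTrafficRow` class and `parse_line` function are hypothetical names, and treating an empty or missing trailing field as `None` is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WebTrafficRow:
    """One record, with the same column names and order as the table."""
    count: int
    timestamp_server: str
    from_host: str
    to_host: Optional[str]


def parse_line(line: str, delimiter: str = ",") -> WebTrafficRow:
    """Split one delimited line into the four webtraffic columns."""
    fields = line.rstrip("\n").split(delimiter)
    # Assumption: a missing trailing field is treated like an empty one.
    fields += [""] * (4 - len(fields))
    count, timestamp_server, from_host, to_host = fields
    return WebTrafficRow(int(count), timestamp_server, from_host, to_host or None)


if __name__ == "__main__":
    print(parse_line("3,1257033601,twistysdownload.com,adserving.com"))
    print(parse_line("1,1257033601,agohq.org,"))
```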

Now let us fetch a few sample records from the newly created table to check the
format and the data.

select * from webtraffic limit 5;

1 1257033601 theybf.com w.sharethis.com

1 1257033601 agohq.org

3 1257033601 twistysdownload.com adserving.com

1 1257033601 459.cim.meebo.com 459.cim.meebo.com

1 1257033601 boards.nbc.com change.menelgame.pl

Next we perform some analytics to find the host that most customers use, that is,
the host that receives the most requests. A host with that much traffic can serve
as an advertising platform, which is how most companies turn traffic into
business.

select to_host,SUM(count) as tot from webtraffic where to_host is not null group by
to_host order by tot desc limit 1;

| to_host      | tot      |
| facebook.com | 21055155 |
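
The same aggregation can be tried locally on the five sample rows shown earlier. The sketch below uses Python's built-in `sqlite3` module as a stand-in for Impala, so it checks only the logic of the query, not Impala's behavior or performance. Storing the sample row that has no `to_host` as NULL is an assumption.

```python
import sqlite3

# The five sample rows from the case study; the missing to_host is stored
# as NULL here (an assumption about how that row is represented).
SAMPLE_ROWS = [
    (1, "1257033601", "theybf.com", "w.sharethis.com"),
    (1, "1257033601", "agohq.org", None),
    (3, "1257033601", "twistysdownload.com", "adserving.com"),
    (1, "1257033601", "459.cim.meebo.com", "459.cim.meebo.com"),
    (1, "1257033601", "boards.nbc.com", "change.menelgame.pl"),
]

# Same query as in the case study, laid out one clause per line.
TOP_HOST_SQL = """
    SELECT to_host, SUM(count) AS tot
    FROM webtraffic
    WHERE to_host IS NOT NULL
    GROUP BY to_host
    ORDER BY tot DESC
    LIMIT 1
"""


def top_host(rows):
    """Load rows into an in-memory table and return (to_host, tot)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE webtraffic (count INTEGER, timestamp_server TEXT, "
            "from_host TEXT, to_host TEXT)"
        )
        conn.executemany("INSERT INTO webtraffic VALUES (?, ?, ?, ?)", rows)
        return conn.execute(TOP_HOST_SQL).fetchone()
    finally:
        conn.close()


if __name__ == "__main__":
    print(top_host(SAMPLE_ROWS))  # ('adserving.com', 3)
```

On the sample, `adserving.com` wins with a total of 3. The `facebook.com` result above comes from the full data set, which is not reproduced here.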

With interactive results like these, the business can make quick decisions and
provide faster solutions in real time. The main advantage is that if statistics
show that the network crashes after x requests, we can check for x in real time
and take preventive measures.
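
The check against x could look something like the sketch below. It assumes the per-host totals are computed from `(count, to_host)` pairs already fetched from the table. The function name `hosts_over_limit` and the "at or above the limit" rule are illustrative choices, not part of the case study.

```python
from collections import Counter


def hosts_over_limit(rows, limit):
    """Return {to_host: total} for hosts whose request total reaches limit.

    rows is an iterable of (count, to_host) pairs; rows without a
    to_host are skipped, as in the query's IS NOT NULL filter.
    """
    totals = Counter()
    for count, to_host in rows:
        if to_host is not None:
            totals[to_host] += count
    return {host: total for host, total in totals.items() if total >= limit}


if __name__ == "__main__":
    sample = [(1, "w.sharethis.com"), (3, "adserving.com"), (1, None)]
    # With a limit of 3 requests, only adserving.com is flagged.
    print(hosts_over_limit(sample, limit=3))  # {'adserving.com': 3}
```

Run on a schedule, a check like this would flag hosts approaching the crash threshold before the limit is actually hit.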

Conclusion
Impala can be a strong choice for analytics, giving the business quick insight
into customer behavior and network traffic. This information could be plotted as
a graph, with domains/hosts as nodes and the routes traversed between them as
edges, which would clearly explain the traffic through a pictorial representation.
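
One way to prepare the data for such a graph is sketched below. It assumes hosts are treated as nodes and each from_host to to_host pair as a directed edge weighted by the request count. The function `build_traffic_graph` is a hypothetical helper; drawing the result would need a plotting or graph library, which is left out here.

```python
from collections import defaultdict


def build_traffic_graph(rows):
    """Build a weighted adjacency map {from_host: {to_host: total_count}}.

    rows is an iterable of (count, from_host, to_host) tuples; rows with
    no to_host have no edge to draw and are skipped.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for count, from_host, to_host in rows:
        if to_host is None:
            continue
        graph[from_host][to_host] += count
    # Convert to plain dicts so the result is easy to print or serialize.
    return {src: dict(dsts) for src, dsts in graph.items()}


if __name__ == "__main__":
    rows = [
        (1, "theybf.com", "w.sharethis.com"),
        (1, "agohq.org", None),
        (3, "twistysdownload.com", "adserving.com"),
    ]
    for src, dsts in build_traffic_graph(rows).items():
        print(src, "->", dsts)
```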
