Académique Documents
Professionnel Documents
Culture Documents
Problem Statement
Data that has been generating over the network is increasing exponentially. But the
existing data warehouse systems does not provide much scalability at less cost with
higher performance.
Proposed Solution
Instead of using costly warehouse systems, with the help of commodity hardware and
distribution process we can serve the customers at any scale. Even if the Data
generated is exponential to 10, it could be scalable simply by using Hadoop. In
this case we just need to add few more nodes to increase the Size of the cluster.
Because, storage is cheaper than processor.
Technical Prerequisites
Impala is a distributed process runs on top of HDFS. It requires
Solution Design
Existing system runs on top of DB2 system. Data needs to be copied to HDFS on daily
basis and tables in Impala needs to be updated(Configurable).
Code
Sqoop Command
Now we have readily available data in hadoop file system, and we need to create
table on top of it so that analysts can perform some operations and provide some
sort of suggestions which could help in improving the network bandwidth and to take
any performance action. Create a table using below command:-
Now let us fetch some sample records from the above created table just to see the
format and data.
select to_host,SUM(count) as tot from webtraffic where to_host is not null group by
to_host order by tot desc limit 1;
| to_host| tot|
| facebook.com | 21055155 |
Using these interactive results business can take quick decisions and can provide
faster solutions in real time. The main advantage is, if statistics says that the
network crashes after x requests, we can have a check in real time for x and we can
preventive measures.
Conclusion
Impala could be the best for analytics and for business to have quick insights on
customer behavior and network traffic. These information could be plotted as a
graph with nodes as routes traversed and edges as domains/hosts which will clearly
explain the traffic with a pictorial representation.