FUNDAMENTALS
OF BIG DATA
ENGINEERING
WHAT IS BIG DATA?
UPGRAD 1
Velocity
Data is generated by individuals and organizations around the world at a speed that is
difficult to comprehend. Tracking and monitoring this data in a timely fashion
requires advanced techniques such as smart sensors and RFID tags, which can handle
large quantities of data in real time.
Variety
Big Data differs from traditional data in that it is not limited to one kind of data: it
includes all structured, semi-structured, and unstructured data. Unlike traditional data,
which is essentially structured and can be stored neatly in a relational database, Big Data
requires an infrastructure that enables the storing, processing, and analyzing of huge
volumes of varied data.
Now that we’ve established what Big Data is, it’s time to talk about the
business connotations of Big Data.
With data piling up by the minute, Big Data has opened up new vistas
of possibilities for the business world. Its pull has become so strong
that every institution wants to use it, from a small startup to a
Fortune 500 company. According to market estimates, in 2018 alone the
global Big Data market was expected to generate annual revenue of over
$42 billion, with the biggest share coming from services spending
(which was 40% of the total market in 2017).
Every company, however small, produces data. This data can come from
multiple sources, from social media comments and mentions to credit
card payments and customer feedback. This is where businesses strike
gold. By diving into this data, businesses can uncover valuable
information about the latest market trends, consumer behavior towards
their products and services, taste and preference patterns, and much
more. Once businesses have this information in hand, they can use it
to their advantage. For instance, by knowing the preference patterns of
consumers, a business can focus on developing products and services
that address consumers’ pain points. Likewise, a company can use the
latest market data to learn about its potential competitors and the
kind of products and services they offer, and consequently come up with
better products or services of its own.
As noted earlier, Big Data isn’t your conventional data: it requires
specialized infrastructure and tools to be stored, processed, and analyzed.
Traditional approaches don’t suffice, and this is where Big Data
platforms come in.
Cloudera
One of the earliest commercial Hadoop-based platforms, Cloudera is a
fully optimized platform for the cloud. It integrates Machine Learning
and advanced analytics within its infrastructure to enable businesses
to convert complex data into data-driven, actionable insights.
Hortonworks
Hortonworks is one of the few Hadoop-based platforms that offer 100% open-source
global data management services, allowing businesses to seamlessly manage and monitor
the complete lifecycle of their data. It comes without the restrictions of proprietary software,
enabling you to store, manage, and scale data both in the cloud and on premises. It
was the first platform to integrate Apache HCatalog support for creating metadata, thereby
simplifying the process of data sharing across multiple layers.
IBM
IBM collaborated with Hortonworks to offer a platform built on Apache Hadoop, the
open-source framework (an Apache Software Foundation project) specifically designed to
enable distributed processing of Big Data. Apache Hadoop has garnered a huge following
owing to the fact that it offers a highly reliable and scalable environment for the
distributed processing of huge datasets. The platform also comes with tools for data
governance, security, data federation, and advanced query and data management.
Microsoft
Microsoft’s own brand of Hadoop platform, Azure HDInsight, is an enterprise-grade,
open-source analytics service. It has been designed to allow businesses to easily run
open-source frameworks including Spark, Apache Hadoop, and Kafka. Like AWS, Azure
also offers a wide range of products including AI/ML, analytics, computing,
databases, containers, developer tools, IoT, and Azure Stack, among other things.
NoSQL
Just as structured data can be easily handled and processed using SQL, unstructured
data is typically handled with NoSQL (Not Only SQL) systems. NoSQL has been designed to
function as an alternative to conventional relational databases, in which a data schema is
prepared in advance to carefully arrange the data in tables. Because they do not follow the
established convention of a relational schema, NoSQL systems can accommodate a wide range
of data models, including key-value, document, and graph formats.
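To make the contrast with a fixed relational schema concrete, here is a minimal sketch in plain Python. It does not use any real NoSQL product; the `DocumentStore` class and the sample records are hypothetical and only illustrate how a document-style store can accept records whose fields differ from one another.

```python
import json


class DocumentStore:
    """Toy in-memory document store (hypothetical, for illustration only)."""

    def __init__(self):
        self._docs = {}

    def put(self, key, document):
        # Store a JSON-serialisable dict under a key; no schema is enforced.
        self._docs[key] = json.loads(json.dumps(document))

    def find(self, field, value):
        # Return the keys of documents whose `field` equals `value`;
        # documents that lack the field are simply skipped.
        return sorted(k for k, d in self._docs.items() if d.get(field) == value)


store = DocumentStore()
store.put("u1", {"name": "Asha", "city": "Pune"})
store.put("u2", {"name": "Ravi", "city": "Pune", "tags": ["prime", "mobile"]})
store.put("t1", {"tweet": "Loving the new phone!", "likes": 12})

pune_users = store.find("city", "Pune")
print(pune_users)
```

A relational table would need every row to fit the same columns; here the third record shares no fields with the first two and is still stored and queryable.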
Hive
Apache Hive is a distributed data warehouse and management software that has been
built on top of Apache Hadoop. It can be used for mining, reading, writing, querying,
and analyzing huge datasets stored in distributed storage. Without Hive, executing
SQL-style queries and applications over distributed data would mean implementing them
by hand in the MapReduce API; Hive instead lets you write them in a SQL-like language
(HiveQL) and translates them into distributed jobs for you.
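To give a feel for what a single SQL-style query saves you from writing, the sketch below spells out a `GROUP BY` count as explicit map, shuffle, and reduce steps. It is plain Python rather than the Hadoop MapReduce API, and the `orders` rows and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows of an `orders` table: (order_id, city).
orders = [(1, "Pune"), (2, "Delhi"), (3, "Pune"),
          (4, "Mumbai"), (5, "Delhi"), (6, "Pune")]


def map_phase(rows):
    # Emit one (key, 1) pair per row, keyed by city.
    for _order_id, city in rows:
        yield city, 1


def shuffle(pairs):
    # Group emitted values by key, as a framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    # Sum the values collected for each key.
    return {key: sum(values) for key, values in grouped.items()}


# Equivalent in spirit to: SELECT city, COUNT(*) FROM orders GROUP BY city;
counts = reduce_phase(shuffle(map_phase(orders)))
print(counts)
```

In a real cluster the map and reduce steps run in parallel across many machines; the one-line query in the comment expresses the same computation declaratively.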
360° Customer View and Sentiment Analysis
These conventional models had to be run as complex SQL queries over historical data (bills
and claims), after which one had to wait for weeks for the results. Unlike this time-consuming
process, Big Data analytics, predictive analytics, and Machine Learning can detect
anomalies in real time. These advanced tools can sift through vast loads of historical data,
learn an individual’s transaction behavior from past records, and immediately raise red-flag
alerts when they detect a pattern that matches a recognized fraud scheme or activity.
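As a rough illustration of flagging a transaction against a customer's own history, the sketch below uses a simple z-score rule. This is a deliberately basic stand-in, not the method any particular fraud system uses; the function name, the threshold of 3 standard deviations, and the sample amounts are all assumptions.

```python
from statistics import mean, stdev


def is_anomalous(history, amount, threshold=3.0):
    """Flag `amount` if it lies more than `threshold` standard deviations
    from the mean of the customer's past transaction amounts."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # All past amounts are identical: any different amount stands out.
        return amount != mu
    return abs(amount - mu) / sigma > threshold


past_amounts = [42.0, 38.5, 51.0, 47.25, 40.0, 44.75]  # hypothetical history
print(is_anomalous(past_amounts, 45.0))   # typical amount
print(is_anomalous(past_amounts, 900.0))  # far outside the usual range
```

Production systems combine many more signals (merchant, location, timing) and learned models, but the core idea is the same: learn what is normal for an individual, then alert on departures from it.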
For the smart consumers of today, it’s all about customization and personalization. You
have probably seen your own personalized ‘recommendation list’ on platforms such
as Amazon, eBay, Netflix, and even Facebook. Both e-commerce and social media platforms are
now using Big Data analytics to glean useful insights from their troves of user data and craft
unique, personalized recommendations for individual users based on their distinct taste and
preference patterns.
Thus, using Big Data analytics brings a twofold advantage: first, it allows e-commerce platforms
to enrich their product data for an optimized search experience on both mobile devices and
desktops, and second, it enables them to customize products in a way that ensures maximum
conversion. As for the users, they get to take full advantage of specially curated watch lists or
buy lists, which not only save time but also contribute to an enriched search/buying experience.
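One simple way to derive such recommendations from user data is to suggest items bought by users with overlapping purchase histories. The sketch below shows that idea on a tiny scale; the `purchases` data and the scoring rule are hypothetical and far simpler than what large platforms actually run.

```python
from collections import Counter

# Hypothetical purchase histories: user -> set of items bought.
purchases = {
    "asha": {"phone", "case", "charger"},
    "ravi": {"phone", "case"},
    "meera": {"phone", "charger", "earbuds"},
    "john": {"laptop", "mouse"},
}


def recommend(user, histories, top_n=2):
    """Suggest items bought by users who share at least one item with `user`."""
    own = histories[user]
    scores = Counter()
    for other, items in histories.items():
        if other == user:
            continue
        overlap = len(own & items)
        if overlap == 0:
            continue  # no shared taste, ignore this user
        for item in items - own:
            # Weight each candidate item by how similar the other user is.
            scores[item] += overlap
    # Sort by score (descending), then by name, for a deterministic result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


print(recommend("ravi", purchases))
```

Users with no overlap with anyone else get an empty list here; real systems fall back to popular items or other signals in that case.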
These three popular use cases of Big Data establish a clear fact that while customer/sentiment analysis
is pivotal for businesses in the IT/Tech and E-commerce sector, fraud detection is the dominant use
case in the Telecom, Finance, and Insurance sectors.
Here are some of the top-notch job positions
in Big Data:
Data Scientist
understandable and actionable insights that can add value to a business. Once the relevant data
is gathered, Big Data Engineers must build the basic architecture that’s required for data
analysis and processing. After the data is processed, Big Data Engineers have to integrate it
within the production and management infrastructure to generate data-driven and innovative
business solutions.
proficiency in analytical and statistical tools, along with a basic knowledge of Machine Learning.
Management Policy.’ With this policy, the CAG hopes to expand the capacity of the Indian Audit and
Accounts Department by exploiting data from both the state and union governments. Carrying
the baton of Big Data forward, DISCOMs in India are now gathering data from sensors to
monitor and analyze power consumption and compare it with historical records of power usage
patterns, in order to devise preventive measures against AT&C (Aggregate Technical & Commercial)
losses.
In light of this steadily growing Big Data sector in India, K.S. Viswanathan, VP of NASSCOM,
maintains that the market value of the Big Data sector is expected to reach $16 billion by 2025, with a
CAGR of 26%. If the Big Data market continues to expand at this rate, by 2025 India will
hold a 32% share of the global market.
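For readers unfamiliar with CAGR (compound annual growth rate), the sketch below shows the arithmetic behind such a projection. The $16 billion target and the 26% rate are the figures quoted above; the growth period of 8 years is purely an assumption, since the base year is not stated here.

```python
def project(start_value, cagr, years):
    """Value after compounding `start_value` at rate `cagr` for `years` years."""
    return start_value * (1 + cagr) ** years


def implied_start(end_value, cagr, years):
    """Starting value that compounds to `end_value` after `years` years."""
    return end_value / (1 + cagr) ** years


TARGET_BILLION = 16.0  # figure quoted in the text
CAGR = 0.26            # figure quoted in the text
ASSUMED_YEARS = 8      # assumption: the base year is not given in the text

base = implied_start(TARGET_BILLION, CAGR, ASSUMED_YEARS)
print(f"Implied base-year market size: ${base:.2f} billion")
```

Changing `ASSUMED_YEARS` shows how sensitive the implied starting point is to the length of the growth period.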
Big Data might be the most talked-about buzzword these days, but what are
the thoughts of experts on Big Data and its capabilities? What does the future
of Big Data look like, according to these experts? Let’s talk a bit about that!
Daniel looks at Big Data as a large and careful collection of multiple languages
brought together in a complex environment. He says,
“In order for us to truly deal with speed and velocity, we need to look at the
process of querying big data with proper metadata management tools to
extract value, meaning the orchestration tools need to simplify the process of
managing complex datasets needed to speed up productionising BI and
analysts tasks through process improvements.”
While there are clear winners when it comes to managing Big Data - the likes of Google, Facebook,
Amazon, and such - Daniel believes that “it would be interesting to see other leverage data into
meaningful strategic assets and business models.”
Another well-known expert in the field of Big Data and Analytics, Andrew Chen, Head of Rider
Growth at Uber, says,
“It’s important to leverage data the same way, whether it’s a strategic or tactical issue: Have a vision
for what you are trying to do. Use data to validate and help you navigate that vision, and map it
down into small enough pieces where you can begin to execute in a data-informed way. Don’t let
shallow analysis of data that happens to be cheap/easy/fast to collect nudge you off-course in your
entrepreneurial pursuits.”
While these two professionals focus on the applicability aspect of Big Data, yet another expert has
something to say about the monetary aspect of it.
Kirk Borne, Ph.D., is the Principal Data Scientist at Booz Allen Hamilton. He is also ranked among
the top 10 Data Science and Big Data influencers.
According to Kirk, we’ll see much more focus on Big Data and Machine Learning - from the
standpoint of ROI - in the coming years. To quote him,
“The marketing hype on these topics has been intense for a few years, and I believe that the data
community (and its observers) have developed ‘hype fatigue.’”
Kirk feels that it’s time for Big Data to demonstrate value. According to him, the most important
“V” of Big Data is “value”. He adds, “That [value] refers to value creation and
innovation across all data and information assets. Our stakeholders will demand to see more
discussion, demonstration, and proofs of value and ROI from big data in the coming year.”
All in all, when it comes to the future of Big Data, there is a consensus amongst the experts:
the only way from here is up. And while that means the path will be tougher than it has been
(because of the ever-expanding quantity of data), it also means that tools and techniques will
only get better and more advanced from here on.
By this point, it’s abundantly clear that Big Data is indeed the present and the
future. The market has also shaped up well in this field, opening up an
increasing number of job roles and responsibilities. What, then, is the need of
the hour?
The world that we’re in today demands Big Data experts. And if you have
even a slight inclination towards numbers, computer science, data, statistics,
and basically working on the most modern technology, you should definitely
look to explore what this field has to offer!
To help you with just that, BITS Pilani and UpGrad offer a PG Program in Big
Data Engineering, the first of its kind. With a comprehensive
curriculum curated by industry experts together with Birla Institute of
Technology and Science, Pilani, this PG program will help you upskill in Big
Data.
Big Data is a field where theoretical knowledge alone isn’t enough. With that in mind,
UpGrad has included five real-life projects across a number of industries in this program.
Sponsored by Saavn, these projects will not only build a solid foundation for the rest
of your career but will also make the whole learning experience enjoyable. In addition,
you get 24x7 cloud lab access on AWS. The instructors for this course are experts and
leaders from academia and industry, drawn from BITS, Impetus, American Express, and more.
This program is aimed at everyone from people currently working in the IT industry to Big Data
enthusiasts. The curriculum is structured so that you can start from scratch and work your way
to the very top. So, no matter what your background is, if you’re
interested in building a career in Big Data and are looking for a way to become an expert in the
field, this is where your search should stop!
Get a PG Certification in
Big Data Engineering
FIND US HERE:
Have Questions?
Please feel free to drop us a line at info@upgrad.com
and we will be there to help you.
COPYRIGHT @ UPGRAD EDUCATION PRIVATE LIMITED