Data analytics projects involve a fixed set of tasks that should be followed to get the
expected output. Here we build a data analytics project life cycle: a set of standard
data-driven processes that lead from data to insights effectively. The processes defined
in the life cycle should be followed in sequence to achieve the goal effectively using the
input datasets. The process may include identifying the data analytics problem, designing
and collecting datasets, performing data analytics, and data visualization.
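The sequence described above can be sketched as an ordered list of stage functions, each of which receives the state left by the previous stage. This is a minimal illustration, not the authors' implementation: the function names, the example problem statement, and the sample records are all assumptions made up for the sketch.

```python
from typing import Any, Callable, Dict, List, Tuple

Context = Dict[str, Any]


def identify_problem(ctx: Context) -> Context:
    # Hypothetical problem statement, for illustration only.
    return {**ctx, "problem": "which pages are most popular?"}


def design_data_requirement(ctx: Context) -> Context:
    # Decide which fields the analysis will need.
    return {**ctx, "required_fields": ["page", "visits"]}


def collect_data(ctx: Context) -> Context:
    # Made-up records standing in for a real dataset.
    records = [{"page": "/home", "visits": 120}, {"page": "/about", "visits": 30}]
    return {**ctx, "records": records}


def analyze(ctx: Context) -> Context:
    total = sum(record["visits"] for record in ctx["records"])
    return {**ctx, "total_visits": total}


def visualize(ctx: Context) -> Context:
    # A text summary stands in for a chart.
    return {**ctx, "report": f"total visits: {ctx['total_visits']}"}


# The stages run strictly in this order, as the life cycle prescribes.
STAGES: List[Tuple[str, Callable[[Context], Context]]] = [
    ("identify_problem", identify_problem),
    ("design_data_requirement", design_data_requirement),
    ("collect_data", collect_data),
    ("analyze", analyze),
    ("visualize", visualize),
]


def run_life_cycle() -> Context:
    ctx: Context = {"completed": []}
    for name, stage in STAGES:
        ctx = stage(ctx)
        ctx["completed"] = ctx["completed"] + [name]
    return ctx


result = run_life_cycle()
print(result["report"])
```

Keeping the stages in a single ordered list makes the required sequence explicit: a later stage such as `analyze` can only run after `collect_data` has populated the records it depends on.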
The data analytics project life cycle stages are seen in the following diagram:
Let's get some perspective on these stages for performing data analytics.
Today, business analytics trends are changing as businesses perform data analytics over web
datasets to grow. Since their data size is increasing gradually day by day, their analytical
With the help of web analytics, we can solve the business analytics...
5th chapter –content-topic
In this section, we have included three practical data analytics problems with various stages of
data-driven activity with R and Hadoop technologies. These data analytics problem definitions
are designed such that readers can understand how Big Data analytics can be done with the
Predicting the sale price of a blue book for bulldozers (case study)
which may be categorized popularity-wise as high, medium, or low (regular), based on the visit
count of the pages. While designing the data requirement stage of the data analytics life cycle,
we will see how to collect these types of data from Google Analytics.
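A categorization of this kind might look like the following sketch. The source does not state where the boundaries between high, medium, and low lie, so the thresholds (1000 and 100 visits) and the function name `categorize_popularity` are assumptions chosen only for illustration.

```python
def categorize_popularity(visits: int, high: int = 1000, medium: int = 100) -> str:
    """Label a page by its visit count.

    The default thresholds are hypothetical; a real project would derive
    them from the observed distribution of visit counts.
    """
    if visits < 0:
        raise ValueError("visit count cannot be negative")
    if visits >= high:
        return "high"
    if visits >= medium:
        return "medium"
    return "low"


# Made-up visit counts, for illustration only.
page_visits = {"/home": 1500, "/pricing": 250, "/terms": 12}
labels = {page: categorize_popularity(count) for page, count in page_visits.items()}
print(labels)
```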
This is one attempt to describe the Data Life Cycle. It takes the
position that a life cycle consists of phases, and each phase has its
own characteristics. Einstein, when he was a teenager, tried to think
what it would be like to ride a beam of light. There is no chance that
we can emulate Einstein, but perhaps we can put his idea to use.
What would happen if we could ride on a piece of data as it moved
through the enterprise? What new experiences would the piece of
data have? What phases would it pass through?
1. Data Capture
The first experience that an item of data must have is to pass within
the firewalls of the enterprise. This is Data Capture, which can be
defined as:
The act of creating data values that do not yet exist and have
never existed within the enterprise.
There are three main ways that data can be captured, and these
are very important:
There may well be other ways, but the three identified above have
significant Data Governance challenges. For instance, Data
Acquisition often involves contracts that govern how the enterprise
is allowed to use the data it obtains in this way.
2. Data Maintenance
3. Data Synthesis
4. Data Usage
So far we have seen how our single data value has entered the
enterprise via Data Capture, and has been moved around the
enterprise, perhaps being transformed and enriched in Data
Maintenance, and possibly being an input to Data Synthesis. Next,
it reaches a point where it is used in support of the enterprise. This
is Data Usage, which can be defined as:
5. Data Publication
In being used, it is possible that our single data value may be sent
outside of the enterprise. This is Data Publication, which can be
defined as:
6. Data Archival
Our single data value may experience many rounds of usage and
publication, but eventually the end of its life begins to loom
large. The first part of this is to archive the data value. Data
Archival is:
7. Data Purging
Critique
The terms we have used may be disputed. “Life Cycle” is not really
accurate because data does not reproduce or recycle itself, which
happens in real life cycles. “Data Life History” might be closer to the
truth, but is not a familiar term. “Life History” is used to describe the
phases of growth in an organism like a butterfly, but again data is
different. Therefore, “Data Life Cycle” might as well be used.
Finally, data does not have to pass through all phases. Early
mainframe systems had nothing more than Data Capture and Data
Usage. Today, the full Data Life Cycle is more common.
What is important is that we define the Data Life Cycle because
each phase has distinct Data Governance Needs. Greater clarity
about the Data Life Cycle will help the mission of Data Governance.
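As a rough illustration of the point that a data value need not pass through every phase, the seven phases named above can be modeled as an enumeration and a given value's history checked against it. This is only a sketch: the names `Phase` and `skipped_phases` are hypothetical, and the only rule enforced is the one the text states, namely that a value's first experience is Data Capture.

```python
from enum import Enum
from typing import List


class Phase(Enum):
    CAPTURE = 1
    MAINTENANCE = 2
    SYNTHESIS = 3
    USAGE = 4
    PUBLICATION = 5
    ARCHIVAL = 6
    PURGING = 7


def skipped_phases(history: List[Phase]) -> List[Phase]:
    """Return the phases a data value never passed through."""
    if not history or history[0] is not Phase.CAPTURE:
        raise ValueError("a data value's history must begin with Data Capture")
    visited = set(history)
    # Enum iteration follows definition order, so the result is ordered too.
    return [phase for phase in Phase if phase not in visited]


# The early-mainframe case from the text: only Data Capture and Data Usage.
mainframe_history = [Phase.CAPTURE, Phase.USAGE]
print([phase.name for phase in skipped_phases(mainframe_history)])
```

Because a history is a plain list, repeated rounds of usage and publication can be recorded simply by letting a phase appear more than once.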
Remember: Managing data in a research project is a process that runs throughout the
project. Good data management is one of the foundations for reproducible research.
Good management is essential to ensure that data can be preserved and remain
accessible in the long-term, so it can be re-used and understood by future researchers.
Begin thinking about how you’ll manage your data before you start collecting it.
http://data.library.virginia.edu/data-management/lifecycle/
Other Life Cycles
Data Curation Centre: Curation Lifecycle Model
DataONE Best Practices Through the Data Life Cycle
DDI Alliance: Research Life Cycle
Life Cycle of Research, by Charles Humphrey
ICPSR Data Life Cycle
Research Lifecycle at UCF
https://www.capgemini.com/service/digital-services/insights-data/insights-data-strategy/insights-data-architecture/
How do you make the big promise of the new data landscape a reality – and ensure
your strategy can be executed? Answer: Make it an architected journey. Our
pragmatic approach to Insights & Data architecture provides you with a solid, yet
agile foundation for change, renewal and innovation. So that your solutions are
designed for digital right from the start.
Create real, sustainable value from the new data landscape
With the emergence of the new data landscape, the change potential for
organizations is bigger than ever before. There are no filters on what data can be
acquired and stored, no restrictions on what can be analyzed, and no waiting time
for presenting real-time, tailor-made and highly actionable insights. However,
advances in technology are moving at lightning speed and it seems more difficult
than ever to select the right technology components, while preserving the crucial
assets of the existing data estate.
How do you reap the benefits of the new data landscape right now, while being
prepared for new insights and data opportunities that may be unknown today?
How do you create an architecture for change that is fresh, pragmatic and does
justice to the new ways of thinking of the digital enterprise? This asks for a new
approach to Insights & Data architecture, one that enables new, unexplored
opportunities for the insights-driven enterprise.
But above all, architecture is a tool to bring business and technology together. The organization's change
objectives always play the central role, and our architecture visualizations are compelling
and understandable for all stakeholders. We believe architecture should tell stories and bring
simplicity, rather than introducing piles of documentation and layers of additional complexity.
Focus on business outcomes: We turn your digital vision into an Insights & Data
architecture that is geared towards one thing and one thing only: turning data into tangible,
measurable business benefits
Pragmatic and compelling: We produce architectural assets that are exactly to the point,
precisely what is needed for the change, and convincing in their visualization
Leveraging open standards: Capgemini is a leader in both using and developing
architectural open standards. We have the world’s largest community of certified TOGAF
architects, have donated major contributions to TOGAF 9, and are actively involved in the
Open Platform 3.0 forum of The Open Group, particularly focusing on open standards for big
data and the Business Data Lake
Accelerated: We use tools such as our Accelerated Solution Environments (ASEs),
TechnoVision innovation framework, the WARP industrialized assessment approach, and our
powerful big data architecture reference models to have a flying start and deliver quick
results
Full lifecycle, full landscape: We take all aspects of the Insights & Data lifecycle into the
scope of architecture, all the way from acquiring and marshaling data, to analytics,
visualization and action. But also all the way from infrastructure to business services
Delivers step-by-step: With an architected digital platform vision at the heart of your
Insights & Data strategy, you are not only equipped to deliver quick, compelling results right
now but also to address any future business opportunities.
https://blogs.sas.com/content/hiddeninsights/2013/10/11/how-well-are-you-managing-the-analytical-life-cycle/
Keep track of model versioning. Analysts don’t just develop one model to solve a
business problem. They develop a set of competing models and use different
techniques to address complex problems. They will have models at various stages of
development and models tailored for different product lines and business units. As a
result, your organization can quickly find itself managing thousands of models.
Do structured and rapid deployment of new models and data. The model and data
environment is anything but static. Models will be continually re-deployed as they are
tested and as new results and data sources become available. As such, the model
deployment process is a much more iterative process than the traditional IT process
of building applications.
Embed models into decision processes and decisions into model development. Models and
data are not of any use if they only feed reports and dashboards. The results of
analytics should guide business decisions, and the results of those business decisions
should be fed back into models and model development. In a distributed and loosely
managed modeling environment, this is hard to achieve. When different data sets and
variables are used to create the models and there is little validation or back testing,
results become inconsistent.
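The first point above, keeping track of model versions, can be illustrated with a small in-memory registry. This is a sketch under stated assumptions rather than a description of any particular product: the class names `ModelVersion` and `ModelRegistry`, the `stage` labels, and the example model names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ModelVersion:
    name: str       # business problem the model addresses
    version: int    # increases by one with each registration
    technique: str  # modeling technique used
    stage: str      # e.g. "development" or "production" (assumed labels)


class ModelRegistry:
    """Hypothetical in-memory registry of competing model versions."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[ModelVersion]] = {}

    def register(self, name: str, technique: str, stage: str = "development") -> ModelVersion:
        versions = self._versions.setdefault(name, [])
        model = ModelVersion(name, len(versions) + 1, technique, stage)
        versions.append(model)
        return model

    def latest(self, name: str) -> ModelVersion:
        return self._versions[name][-1]

    def count(self) -> int:
        return sum(len(versions) for versions in self._versions.values())


registry = ModelRegistry()
registry.register("churn", "logistic regression")
registry.register("churn", "decision tree")
registry.register("credit_risk", "logistic regression", stage="production")
print(registry.latest("churn"), registry.count())
```

Even a simple record like this makes it possible to say which technique and which version produced a given result, which is a precondition for the validation and back testing the third point calls for.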
As a consequence of the above, managers must make decisions based on the
model results they receive, and everyone hopes for the best. To solve these and
similar challenges in a systematic fashion, you will need to establish proper
governance of the analytics life cycle, i.e. the processes and technology support to
ensure that your organisation's operational use of analytical models sustains and
extends strategies and objectives.
In practice, you should consider establishing:
Sources: