Data Curation 101: The What, Why, and How
Risks of poor or absent Data Curation include factually inaccurate information, incorrect guidelines,
and knowledge gaps. This scenario has played out before and continues to do so. For example, out of 401
items sent by 101 organizations for a child passenger safety study, only about 25 percent of the
evaluated items contained complete and accurate information. Each item could be thought of
as a data collection. Less than 1 percent of the items seemed to have been developed for other relatives or
audiences transporting children, indicating knowledge gaps.
The resulting electronic collection, together with insights into the curated data provided by individual
institutions, remained in use long after the study ended. A siloed collection of about 400 materials
that leads to inappropriate selection and installation of child seats may seem small
compared with using Big Data to make inaccurate financial decisions that impact millions of
customers. Either way, good Data Curation is a must.
Data Curation is a means of managing data that makes it more useful for users engaged in
data discovery and analysis. Data curators collect data from diverse sources and integrate it
into repositories that are many times more valuable than the independent parts. Data Curation
includes data authentication, archiving, management, preservation, retrieval, and
representation.
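As a rough illustration of collecting data from several sources and integrating it into one repository, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `CuratedRecord` type, the `curate` and `retrieve` helpers, the source names, and the sample values do not come from the article or from any particular curation tool.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CuratedRecord:
    """One item in the curated repository, with simple contextual metadata."""
    key: str
    value: str
    sources: list = field(default_factory=list)  # provenance: who supplied a value
    curated_on: date = field(default_factory=date.today)


def curate(*sources):
    """Merge (source_name, {key: value}) pairs into one repository (hypothetical helper).

    Keys are trimmed and lower-cased, values are trimmed, and empty values are
    skipped. The first non-empty value seen for a key is kept, and every source
    that supplied a non-empty value for that key is recorded.
    """
    repository = {}
    for source_name, records in sources:
        for key, raw in records.items():
            value = raw.strip()
            if not value:
                continue  # cleaning step: ignore blank entries
            norm_key = key.strip().lower()
            record = repository.setdefault(norm_key, CuratedRecord(norm_key, value))
            record.sources.append(source_name)
    return repository


def retrieve(repository, key):
    """Look up a record using the same key normalization as curate()."""
    return repository.get(key.strip().lower())


# Made-up sample data from two hypothetical organizations.
repo = curate(
    ("agency_a", {"Seat-Type": " rear-facing ", "Age": ""}),
    ("agency_b", {"seat-type": "rear-facing", "age": "0-2 years"}),
)
seat = retrieve(repo, " Seat-Type ")
print(seat.value, seat.sources)
```

The sketch covers only a small part of what curators do (cleaning, integration, retrieval, and provenance tracking); authentication, archiving, and preservation would need considerably more machinery.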
Social Signals: Data’s usefulness depends on human interaction. Aaron Kalb, the Head
of Product at Alation, calls these social signals or behavioral interactions. Just as Amazon
presents recommendations based on what users choose, Data Curation leverages
human responses to produce customized knowledge. Data Analysts bring their own
methodology to interpreting and manipulating data. Data Curation provides access to this
kind of human knowledge, which can be valuable for understanding how others do their work. As
Stephanie McReynolds, VP of marketing at Alation, says:
“The process of ideating around data and having it be an open communication around all
the aspects of data brings the entire organization up to another level of data literacy so
that we can really find useful solutions rather than get stuck in our own little silo.”
As well as reducing duplication of effort in research data creation, Data Curation enhances
the long-term value of existing data by making it available for further high-quality research.
Data Curation does the following for the Data Industry:
Making Machine Learning More Effective: Machine Learning algorithms have made
great strides towards understanding the consumer space. AI systems consisting of
“neural networks” work together and can use Deep Learning to recognize patterns.
However, humans need to intervene, at least initially, to direct algorithmic behavior
towards effective learning. Stephanie McReynolds, VP of marketing at Alation, says,
“Curations are about where the humans can actually add their knowledge to what the
machine has automated.” This human input prepares the ground for intelligent self-service processes,
setting organizations up for insights. Forrester research shows that insights-driven
firms are 69 percent more likely to report year-over-year revenue growth of 15 percent or
more.
Dealing with Data Swamps: A Data Lake strategy allows users to easily access raw
data, to consider multiple data attributes at once, and to ask ambiguous,
business-driven questions. But Data Lakes can end up as Data Swamps, where finding
business value becomes like a quest for the Holy Grail. Such Data Swamps might as
well be Data graveyards. The Geological Survey of Alabama (GSA) has first-hand
experience with this. The GSA has been reviving decades of dark (dead) data that could
provide value. As part of that effort, the GSA has undertaken Data Curation to discover
which of this data has locked-in value, even if it is old, that can be redirected to the
benefit of users. This has led to a new GSA website with customized Data Collections.
Educating Audiences: Data Curation provides intrinsic value in educating users. Take
the legal profession. “Ultimately, the goal of any attorney is to get the jury to
understand the case facts as they see them, so anything you can do to educate the jury
to the forensics is extremely helpful,” says Jason Fries, CEO of 3D-Forensic. Using
the curated information provided by 3D-Forensic, the jury learns how forensics
created the analysis and receives explanations of the experts’ opinions involved in the case.
Ensuring Data Quality: Data Curators clean data and undertake actions to ensure the
long-term preservation and retention of the authoritative
nature of digital objects.
“Through the curation process, data are organized, described, cleaned, enhanced, and
preserved for use, much like the work done on paintings or rare books to make the works
accessible now and in the future,” according to ICPSR.
These Data Curation activities, and the resulting attention to quality, improve Data
Research and Management. For example, Data Curation tasks pertaining to Biodiversity
have led to a framework for assessing data’s fitness for use and to increased data value. As a result,
two Global Biodiversity Information Facility (GBIF) task groups have more useful data on
Species Distribution Modeling and Agro-biodiversity for collaboration.
Speeding Innovation: Organizations are looking for ways to manage
data most effectively while establishing the collaborative ecosystem that enables this
efficiency. Data Curation enhances collaboration by opening up and socializing how data is
used. This results in innovation, as described in a Harvard Business Review article.
The article describes how the head of the U.S. Army’s Rapid Equipping Force built a curation
process, including internal and external collaboration, to help technology solutions be
deployed rapidly. In this case, Data Curation helped the U.S. Army identify who the
customers for possible solutions would be, who the internal stakeholders would be, and
even what initial minimum viable products might look like.
Shacklett notes, “Data Curation is just now starting to enter corporate vocabulary because of
Big Data and the need to aggregate data from diverse sources to form a unique picture of a
business situation.” Why now? Industry prognosticators and companies are beginning to think
about their data as a corporate asset. Companies are beginning to understand that they can’t
just continue to blindly “store up” the vast piles of data streaming into them without developing
a way to value this data and to determine which data has present or potential value, and which
will virtually always remain useless. Data Curation gives organizations the means to get
useful data by leveraging expertise and knowledge of their own data assets.
However, Data Curation requires a huge investment, as noted by Dianne Esbar, associate partner and
brand leader at Digital McKinsey in San Francisco. It requires companies to find the right
people to curate data and give them the right tools. This presents a challenge to many
companies. “Either they overinvest in tools that don’t work with each other or don’t give them
what they need, or they have an army of people who in ten years’ time won’t be as valuable.”
To help establish successful Data Curation, Kathy Rondon explained that
Data Curation is about “contextual Metadata” and presented four primary requirements for
setting up a successful Data Curation program at the DATAVERSITY®
Enterprise Data World 2017 Conference in Atlanta, Georgia. By staying educated and
informed on Data Curation best practices, including data reviews with end users, companies
can reap its benefits.