Data Curation 101: The What, Why, and How


By Michelle Knight on November 30, 2017

Humans have an imperative to practice Data Curation. People have always gathered, maintained,
and archived data, and they continue to do so at ever greater volumes, driven by the need for
useful data today and tomorrow.

As Mike Schmoker elegantly states, “Things get done only if the data we gather can inform and
inspire those in a position to make difference.” But organizations struggle to get things done
and to operationalize Big Data well. Of 150 executives at large companies, 41 percent said that
their data was too siloed. Without access to good Data Curation, business effectiveness
decreases.

Risks of poor or no Data Curation include factually inaccurate information, incorrect guidelines,
and knowledge gaps. This scenario has played out before and continues to do so. For example,
out of 401 items sent by 101 organizations for a child passenger safety study, only about 25
percent of the evaluated items contained complete and accurate information. Each item could be
thought of as a data collection. Less than 1 percent of the items seemed developed for other
relatives or audiences transporting children, indicating knowledge gaps.

The resulting electronic collection, and the insights into the curated data provided by
individual institutions, remained in use long after the study ended. A siloed collection of about
400 materials that leads to inappropriate selection and installation of child seats may seem
small compared to using Big Data to make inaccurate financial decisions that affect millions of
customers. In both cases, good Data Curation is a must.

What is Data Curation?

Data Curation is a means of managing data that makes it more useful for users engaging in
data discovery and analysis. Data curators collect data from diverse sources, integrating it
into repositories that are many times more valuable than the independent parts. Data Curation
includes data authentication, archiving, management, preservation, retrieval, and
representation.
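
As a rough illustration of the integration step described above, the following Python sketch merges records from two hypothetical sources into a single repository and records which source supplied each value. The source names, field names, and the `curate` function are assumptions made for this example and do not come from any particular curation tool.

```python
from collections import defaultdict


def curate(sources):
    """Merge records from several sources into one repository keyed by record id.

    `sources` maps a source name to a list of dict records, each with an "id".
    The first non-empty value seen for a field is kept, and its origin is
    stored under "provenance" so users can see where the value came from.
    """
    repository = defaultdict(lambda: {"fields": {}, "provenance": {}})
    for source_name, records in sources.items():
        for record in records:
            entry = repository[record["id"]]
            for field, value in record.items():
                if field == "id" or value in (None, ""):
                    continue  # skip the key itself and empty values
                if field not in entry["fields"]:
                    entry["fields"][field] = value
                    entry["provenance"][field] = source_name
    return dict(repository)


# Hypothetical example data from two siloed systems
sources = {
    "crm": [{"id": 1, "name": "Acme Corp", "city": ""}],
    "billing": [{"id": 1, "name": "ACME", "city": "Chicago"}],
}
repo = curate(sources)
```

Here the combined record takes its name from the first source and its city from the second, which is the sense in which the integrated repository can be worth more than either part alone. A real program would also need rules for resolving conflicting values, which this sketch deliberately leaves out.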

Characteristics of Data Curation include:

Social Signals: Data’s usefulness depends on human interaction. Aaron Kalb, Head of Product
at Alation, calls these social signals, or behavioral interactions. Just as Amazon presents
recommendations based on what users choose, Data Curation leverages human responses to
build customized knowledge. Data Analysts apply their own methodologies when interpreting
and manipulating data. Data Curation provides access to this kind of human knowledge, which
can be valuable for showing how others do their work. As Stephanie McReynolds, VP of
marketing at Alation, says:

“The process of ideating around data and having it be an open communication around all
the aspects of data brings the entire organization up to another level of data literacy so
that we can really find useful solutions rather than get stuck in our own little silo.”

Active Management throughout the Data Lifecycle: The University of Illinois’ Graduate School
of Library and Information Science defines Data Curation as “the active and ongoing
management of data through its life cycle of interest and usefulness.” This lifecycle comprises
the steps of conceptualizing, creating, accessing, using, appraising, selecting, disposing,
ingesting, reappraising, storing, reusing, and transforming data. During this process, data
might be annotated, tagged, presented, and published for various purposes. Data Curation
means actively managing data to reduce threats to its long-term value and to mitigate digital
obsolescence.

Complementary Work with Data Governance: Data Curation complements Data Governance but
does not replace it. According to DAMA International’s Data Management Body of Knowledge,
“Data Governance is defined as the exercise of authority and control (planning, monitoring,
and enforcement) over the management of data assets.” Implementing a Data Governance
program results in policies on how to handle data. Data Curation may draw on Data
Governance when customizing information. However, Data Curation produces customized
business data, like a modern corporate library. The resulting Data Collections offer more
relevant information that is easier to search, not just a set of policies.
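
The lifecycle steps listed above, together with the tagging and annotating that happen along the way, can be sketched as a small data structure. This is only one possible model: the `Stage` enum mirrors the steps named in the text, while the `CuratedDataset` class, its methods, and the example dataset name are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Lifecycle steps as named in the text above
    CONCEPTUALIZING = 1
    CREATING = 2
    ACCESSING = 3
    USING = 4
    APPRAISING = 5
    SELECTING = 6
    DISPOSING = 7
    INGESTING = 8
    REAPPRAISING = 9
    STORING = 10
    REUSING = 11
    TRANSFORMING = 12


@dataclass
class CuratedDataset:
    name: str
    stage: Stage = Stage.CONCEPTUALIZING
    tags: set = field(default_factory=set)
    history: list = field(default_factory=list)  # (stage, note) pairs

    def move_to(self, stage, note=""):
        """Enter a new lifecycle stage and keep the curator's annotation."""
        self.stage = stage
        self.history.append((stage, note))

    def tag(self, *labels):
        """Attach lower-cased tags so the dataset is easier to find later."""
        self.tags.update(label.lower() for label in labels)


# Hypothetical usage
ds = CuratedDataset("well_logs_1985")
ds.tag("Geology", "legacy")
ds.move_to(Stage.APPRAISING, "checked units and coverage")
ds.move_to(Stage.STORING, "kept: still useful for users")
```

Keeping the notes alongside each stage change is one way to preserve the human knowledge that the article describes as a defining feature of curation.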

What is Data Curation Doing for the Data Industry?

As well as reducing duplication of effort in research data creation, Data Curation enhances
the long-term value of existing data by making it available for further high-quality research.
Data Curation does the following for the Data Industry:

Making Machine Learning More Effective: Machine Learning algorithms have made great
strides towards understanding the consumer space. AI consisting of collaborating “neural
networks” can use Deep Learning to recognize patterns. However, humans need to intervene,
at least initially, to direct algorithmic behavior towards effective learning. Stephanie
McReynolds, VP of marketing at Alation, says, “Curations are about where the humans can
actually add their knowledge to what the machine has automated.” This prepares the ground
for intelligent self-service processes, setting organizations up for insights. Forrester
research shows that insights-driven firms are 69 percent more likely to report year-over-year
revenue growth of 15 percent or more.

Dealing with Data Swamps: A Data Lake strategy allows users to easily access raw data and
to consider multiple data attributes at once, and it gives them the flexibility to ask
ambiguous, business-driven questions. But Data Lakes can end up as Data Swamps, where
finding business value becomes like a quest for the Holy Grail. Such Data Swamps might as
well be Data graveyards. The Geological Survey of Alabama (GSA) has first-hand experience
with this. The GSA has been reviving decades of dark (dead) data that could provide value. As
part of that effort, the GSA has undertaken Data Curation to discover which of this data has
locked-in value, even if it is old, that can be redirected to the benefit of users. This has led
to a new GSA website with customized Data Collections.

Educating Audiences: Data Curation provides intrinsic value in educating users. Take the
legal profession. “Ultimately, the goal of any attorney is to get the jury to understand the
case facts as they see them, so anything you can do to educate the jury to the forensics is
extremely helpful,” says Jason Fries, CEO of 3D-Forensic. Using the curated information
provided by 3D-Forensic, the jury learns how forensics produced the analysis and receives
explanations of the expert opinions involved in the case.

Ensuring Data Quality: Data Curators clean data and undertake actions to ensure the
long-term preservation and retention of the authoritative nature of digital objects.

“Through the curation process, data are organized, described, cleaned, enhanced, and
preserved for use, much like the work done on paintings or rare books to make the works
accessible now and in the future,” according to ICPSR.

These Data Curation activities, and the resulting attention to quality, improve Data Research
and Management. For example, Data Curation tasks pertaining to biodiversity have led to a
framework for assessing data’s fitness for use and to increased data value. As a result, two
Global Biodiversity Information Facility (GBIF) task groups have more useful data on Species
Distribution Modeling and Agro-biodiversity for collaboration.

Speeding Innovation: Organizations are looking for ways to manage data most effectively
while establishing the collaborative ecosystem that enables this efficiency. Data Curation
enhances collaboration by opening up and socializing how data is used. This results in
innovation, as described in Harvard Business Review. The article describes how the head of
the U.S. Army’s Rapid Equipping Force built a curation process, including internal and
external collaboration, to help technology solutions be deployed rapidly. In this case, Data
Curation helped the U.S. Army identify who the customers for possible solutions would be,
who the internal stakeholders would be, and even what initial minimum viable products might
look like.
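
The data quality point above mentions a framework for assessing data’s fitness for use. As a very simplified illustration of that idea, and not a reproduction of the GBIF framework, the Python sketch below scores each record by how many required fields are filled in and keeps only those above a threshold. The field names, the sample records, and the 0.75 threshold are assumptions chosen for the example.

```python
def completeness(record, required_fields):
    """Return the fraction of required fields that are present and non-empty."""
    filled = sum(1 for f in required_fields if record.get(f) not in (None, ""))
    return filled / len(required_fields)


def fit_for_use(records, required_fields, threshold=0.75):
    """Keep only the records whose completeness meets the threshold."""
    return [r for r in records if completeness(r, required_fields) >= threshold]


# Hypothetical occurrence records
required = ["species", "latitude", "longitude", "date"]
records = [
    {"species": "Quercus alba", "latitude": 33.2, "longitude": -87.5, "date": "2016-05-01"},
    {"species": "Quercus alba", "latitude": None, "longitude": None, "date": ""},
]
usable = fit_for_use(records, required)
```

Completeness is only one dimension of quality. Accuracy, such as whether the coordinates are actually correct, needs separate checks and usually human review.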

Data Curation: Advantages and Challenges

Shacklett notes, “Data Curation is just now starting to enter corporate vocabulary because of
Big Data and the need to aggregate data from diverse sources to form a unique picture of a
business situation.” Why now? Industry prognosticators and companies are beginning to think
about their data as a corporate asset. Companies are beginning to understand that they can’t
just continue to blindly “store up” the vast piles of data streaming into them without developing
a way to value this data and to determine which data has present or potential value, and which
will virtually always remain useless. Data Curation gives organizations the means to get
useful data by leveraging expertise and knowledge of their own data assets.

However, Data Curation requires a huge investment, as noted by Dianne Esbar, associate
partner and brand leader at Digital McKinsey in San Francisco. It requires companies to find
the right people to curate data and give them the right tools. This presents a challenge to many
companies. “Either they overinvest in tools that don’t work with each other or don’t give them
what they need, or they have an army of people who in ten years’ time won’t be as valuable.”

At the DATAVERSITY® Enterprise Data World 2017 Conference in Atlanta, Georgia, Kathy
Rondon explained that Data Curation is about “contextual Metadata” and presented four
primary requirements for setting up a successful Data Curation program. By staying educated
and informed on Data Curation best practices, including data reviews with end users,
companies can reap its benefits.

Photo Credit: Casezy idea/Shutterstock.com


2 Comments

Serge Gelalian • 2 months ago


Personally, I think that Data Curation is another term for Knowledge Management. It's a
prettified expression at the age of big data and AI.

Pat Hennel • 2 years ago


It’s true that there are both advantages and disadvantages to data curation. Although it can
take a considerable investment, it is also beneficial because it comes from a need to combine
data from a variety of sources. Curation should at least be considered.