How Big Data Can Be Used in The Public Sector

2017-8
∣Case Report∣
NEAR & Future INSIGHT ∣Case Report∣ How Big Data Can Be Used In the Public Sector
is published by the National Information Society Agency (NIA) for the purpose of
identifying the social issues that will emerge in the intelligent information
society and presenting data-driven policy directions.
∙ The publication of any part of this report must be accompanied by an express indication that
this report was produced under the Broadcasting and Communication Development Fund
Project of the Ministry of Science and ICT.
∙ Any unauthorized reproduction of this report is prohibited. If it is reprocessed or cited, the
source must be indicated as the National Information Society Agency (NIA).
∙ The content of this report does not necessarily represent the official views of the NIA.
∙ This report is available on the NIA website (eng.nia.or.kr).
Written by
Song Gyeong-bin, Executive Principal, Future Strategy Center, Department of ICT Policy, National Information Society Agency (NIA)
Editorial Board
Kwon Eui-jeong, Associate Research Fellow, Healthcare Data Convergence Department, Health Insurance Review & Assessment Service
Kwon Ki-young, Producer, Traffic Broadcasting Network Ulsan, Road Traffic Authority
Son Ki-jun, Institute Director of Research, Data Science Research, The IMC
Kim Seung-hyeon, Manager, Future Strategy Center, Department of ICT Policy
Koo Hyun-mo, Deputy Manager, Small and Medium Enterprise Support HQ, Korea Trade-Investment Promotion Agency
An Sung-hee, Vice-Manager of Big Data Center Consulting Department, Shinhan Card
This report is an English translation of the report
‘공공부문 빅데이터 활용 우수사례: 빅데이터, 이렇게 쓸 수 있다’
(NEAR&Future INSIGHT Vol.1)
Case 1. Preventing Epidemics with Prescription Data
Case 2. Forecasting Car Accidents
Case 3. Facilitating International Trade
Case 4. Understanding the Business Cycle with Credit Card Transaction Data
Case 1
Preventing Epidemics with Prescription Data
Project title: Development of the Early Patient Detection System
Period: June to November 2016
Description: Establishing a real-time monitoring system, based on drug utilization
review (DUR) data, for early detection of epidemics
Data used: 5.1 billion prescriptions (since 2010)
and 2.9 trillion diagnostic records
Developers: HIRA (Health Insurance Review & Assessment Service),
DB Discover, and Open Mate
※ Conducted as part of the Data-Based Future Strategy Policy Support
Program of the NIA’s Future Strategy Center.
1. Background
No one is safe in the age of global epidemics
Active international exchange and global transportation have increased the
prevalence of epidemics to an unprecedented level. Once they break out,
epidemics can no longer be confined within national borders and quickly
spread to affect millions.
[Figure: Timeline of major epidemics: SARS (2002), H1N1 influenza (2009), Ebola (2014), MERS (2015), Zika virus (2016). Callout for MERS: 186 diagnosed, 38 dead in Korea; KRW 178.1 billion paid out in compensation; estimated to have caused KRW 20 trillion in society-wide losses in just three months.]
The Middle East Respiratory Syndrome (MERS) outbreak, which
acutely affected Korea in 2015, led to the quarantine of a total of 16,693 persons
suspected of infection: 186 were diagnosed and 38 died.
The epidemic also caused financial losses of KRW 6.3 trillion in Korea at
the time.
Environmental changes, including particulate matter and heat waves caused
by climate change, are also increasing the risk of new epidemics. A rise
of one degree Celsius in the average temperature raises the likelihood of
epidemics by 4.27 percent (KIHASA). The unprecedented heat wave that
swept across Korea in 2016, the first of its kind in 22 years, sharply increased
the number of people with heat stroke and even brought back cholera, which
had been considered all but eradicated for 14 years.
Increasing concern over the safe handling of drugs and treatments
In 2016, in the absence of a nationwide system for monitoring medical treatments
not covered by the National Health Insurance (NHI) or for overseeing patient hygiene and
safety, the reuse of injection needles and unauthorized platelet-rich plasma
(PRP) treatment led to the hepatitis C infection of 97 persons and over 200 patients,
respectively.
Demand is growing for a timely and accurate disease monitoring system
With the increasing frequency of epidemics and public health crises, there
is growing demand for a more effective system for monitoring and
countering the spread of disease. Korea now needs a proactive system
capable of monitoring and predicting disease in a timely and accurate
manner. The Project for the Development of an Early Patient Detection
System (DEPDS) was launched with a view to developing such a monitoring
and prevention system.
2. What Was Produced
A system capable of monitoring epidemics in real time
In order to deal effectively with the outbreak and spread of epidemics, it
is critical to monitor the situation, paths of spread, and damage associated
with each epidemic in real time, and also to prevent spread to the regions
or populations likely to be affected next. The DEPDS Project thus sought
to create a system capable of monitoring the paths of epidemics and
preventing their spread, based on real-time analysis of the massive amounts of public
health data provided by the Health Insurance Review & Assessment
Service (HIRA) and other related organizations.
Inferring epidemics from prescription drug data
Promptness is the foremost virtue of such an epidemic monitoring
system. HIRA collects a wide range of data on prescriptions, diagnoses,
and other medical activities from hospitals and clinics nationwide with
no delay, and is the nationally designated institution for managing the drug
utilization review (DUR) system. The DUR system in Korea, the only one
of its kind in the world, lets doctors transmit patient and prescription data
to HIRA and access a patient’s past prescriptions and drug safety data
in less than 0.5 seconds. While most public health data is gathered and
compiled after patients are diagnosed and issued prescriptions, the DUR
system is the only public health database that gathers data in real time.
[Figure: The DUR system. Elements shown: hospital (prescription issued), pharmacy (changes and details of prescriptions checked), and HIRA (database on patients’ past treatment records). Functions: prevents patient exposure to inappropriate drugs; provides real-time safety information upon prescription; provides a system for proactive monitoring of inappropriate drugs.]
The data provided by the DUR system, however, lacks
diagnostic codes. Prescription data contain only information on provisional
diagnoses. In order to use the DUR data in epidemic monitoring, it is
therefore necessary to infer which disease
each prescription was targeting. The DEPDS Project thus involved analyzing
past billing data and creating a table of possible diagnoses indicated by
prescription data. The project developers then applied the patterns of
prescriptions with confirmed diagnoses to the real-time data gathered by
the DUR system to identify currently prevailing diseases.
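As a rough illustration of the table-building step described above, the sketch below counts how often each drug combination co-occurs with each confirmed diagnosis in historical billing records, then labels an undiagnosed prescription with the dominant diagnosis for its combination. The record layout, the drug names, and the `min_share` threshold are assumptions made for illustration; the report does not describe the project's actual schema or matching rules.

```python
from collections import Counter, defaultdict

def build_pattern_table(billing_records, min_share=0.8):
    """Map each drug combination to its dominant diagnosis.

    billing_records: iterable of (drugs, diagnosis) pairs (hypothetical layout).
    min_share: assumed cut-off; a combination is kept only if one diagnosis
    accounts for at least this share of its historical prescriptions.
    """
    counts = defaultdict(Counter)
    for drugs, diagnosis in billing_records:
        counts[frozenset(drugs)][diagnosis] += 1

    table = {}
    for combo, diag_counts in counts.items():
        diagnosis, n = diag_counts.most_common(1)[0]
        share = n / sum(diag_counts.values())
        if share >= min_share:
            table[combo] = (diagnosis, share)
    return table

def infer_diagnosis(table, drugs):
    """Return the probable diagnosis for a prescription, or None if unknown."""
    entry = table.get(frozenset(drugs))
    return entry[0] if entry else None

# Synthetic billing records, for illustration only.
billing = [
    (["oseltamivir", "acetaminophen"], "influenza"),
    (["oseltamivir", "acetaminophen"], "influenza"),
    (["acetaminophen", "oseltamivir"], "influenza"),
    (["acetaminophen"], "influenza"),
    (["acetaminophen"], "common cold"),
    (["acetaminophen"], "headache"),
]
table = build_pattern_table(billing)
print(infer_diagnosis(table, ["acetaminophen", "oseltamivir"]))
print(infer_diagnosis(table, ["acetaminophen"]))
```

In this toy data the two-drug combination is always tied to one diagnosis, so it is kept, while the single common drug is spread across several diagnoses and is dropped as too ambiguous to label a real-time record.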
A GIS solution-based spatial visualization service
Information on the real-time status of infectious diseases was then mapped
by region using a geographic information system (GIS) so as to allow for
visual monitoring. The GIS also allowed for intensive monitoring of
suspected regions, tracing of possible origins, and prediction of possible
paths of spread based on information on patient locations.
[Figure: Real-Time Epidemic Monitoring System]
Outcome: The system traces the locations of hospitals used by patients in a given region
to support analysis of their possible movements.
The system provides monitoring results in real time to the Korea
Centers for Disease Control (KCDC) for early action against
possible epidemics.
Due to time and budget constraints, the DEPDS Project system was
first developed as a pilot instrument for tracing and monitoring influenza,
foot-and-mouth disease, and scrub typhus only. The system, however, can
be expanded to monitor and help prevent all legally designated communicable diseases in the
future. It is also suited to the early detection of anomalies in certain regions,
such as the infection of multiple people with hepatitis C, thereby allowing
for early intervention.
3. Data Analysis
Probabilistic analysis based on historical data
Based on the information we now have, we can estimate what is to come.
Past behavior is often the most important predictor of future behavior.
By identifying and analyzing the patterns in past data concerning
certain phenomena, we can estimate what we have not yet experienced with
considerable accuracy.
The same probabilistic process was used in the DEPDS Project. Information
on diagnoses rendered in the past was used to determine the diseases that
specific combinations of prescribed drugs target. The resulting mappings
were then applied to prescription data without confirmed diagnoses.
Data Analysis for Monitoring of Epidemics
[Step 1] Determining target diseases: Prioritize data to be monitored
[Step 2] Analyzing prescription details: Analyze billing data on given
target diseases over the past three to five years
[Step 3] Identifying prescription patterns: Determine sets of drugs
commonly prescribed together
[Step 4] Validating prescription patterns: Reverse-verify diseases that
have been identified based on prescription patterns
[Step 5] Applying to DUR system: Detect suspected epidemics based
on prescription patterns noted
The first disease targeted in developing this new system was influenza,
which occurs quite frequently and also spreads quickly in Korea. In order
to identify the patterns of influenza-related prescriptions, data was extracted
only from records with a diagnosis of influenza. In the case of influenza
alone, hundreds of thousands of different drug combinations were
prescribed. However, some of these prescriptions were also used to treat
other similar diseases. It was thus necessary to verify that the prescription
patterns identified targeted influenza exclusively. The combinations of drugs
targeting influenza were then matched with the DUR data to find all
corresponding patterns so the resulting system could monitor influenza
outbreaks in real time. The same process was used with respect to
foot-and-mouth disease and scrub typhus.
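The reverse-verification idea described above can be sketched as a precision and recall check: of all historical prescriptions matching the candidate patterns, how many actually carried the target diagnosis, and how many of the target's records do the patterns cover. This is a minimal sketch under assumed inputs; the function name, record layout, and sample data are hypothetical and not taken from the project.

```python
def validate_patterns(patterns, labelled_records, target):
    """Return (precision, recall) of drug-combination patterns for `target`.

    patterns: iterable of drug-name collections (candidate patterns).
    labelled_records: iterable of (drugs, diagnosis) pairs with confirmed
    diagnoses (hypothetical layout).
    """
    pattern_set = {frozenset(p) for p in patterns}
    matched = matched_target = total_target = 0
    for drugs, diagnosis in labelled_records:
        is_target = diagnosis == target
        total_target += is_target
        if frozenset(drugs) in pattern_set:
            matched += 1
            matched_target += is_target
    precision = matched_target / matched if matched else 0.0
    recall = matched_target / total_target if total_target else 0.0
    return precision, recall

# Synthetic records, for illustration only.
records = [
    ({"oseltamivir", "acetaminophen"}, "influenza"),
    ({"oseltamivir", "acetaminophen"}, "influenza"),
    ({"oseltamivir"}, "influenza"),
    ({"acetaminophen"}, "influenza"),
    ({"acetaminophen"}, "common cold"),
]
precision, recall = validate_patterns(
    [{"oseltamivir", "acetaminophen"}, {"acetaminophen"}], records, "influenza"
)
print(precision, recall)
```

A pattern with low precision also matches prescriptions for other illnesses, which is the situation the text describes for drugs shared with similar diseases; such a pattern would be dropped before being applied to the DUR data.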
[Figure: Regional Prescription Patterns for Influenza (Seoul, Busan)]
An analysis process that was more complex than it looked
The developers launched the DEPDS Project with the initial goal of
identifying and monitoring at least 10 of the major epidemic diseases in
Korea, but had to end the project after designing the system to target only three
illnesses. The quantity of data the project was given was far greater than
any of the developers had imagined, so the analysis process itself
took up the bulk of the time. It took more than half of the project’s schedule
to identify the prescription patterns for a single disease. Moreover,
the statistical analysts in charge of the project frequently had to seek
the advice of medical experts while analyzing and interpreting medical
data.
There were initial plans to apply machine learning to the project so that
prescription patterns could be identified without human intervention at every
turn. For an algorithm to identify patterns on its own,
however, the data had to converge on recurring patterns. The actual
data, on the other hand, showed that doctors’ prescriptions were
far more diverse and inconsistent than expected. Machine learning was
ultimately abandoned. Nevertheless, given the emergence of new drugs and
changing trends in diagnoses and treatments, it will be necessary to
automate the pattern identification task at some point in the future.
Additional research will be needed to achieve that automation.
Data Used in Analysis
Type: Diagnosis records
Information provided: billing details; main diagnoses and volume of diagnosis per illness or injury
Databases (capacity): Billing Details, 47 TB (584 billion records); Review DW, 80 TB (1.4688 trillion records); Illness Statistics, 478 MB (1.86 million records)
Type: Drug information
Information provided: information on drug safety (contraindications, age restrictions, etc.)
Databases (capacity): DUR, 6.5 TB (11.6 billion records)
Type: Medical resources
Information provided: information on medical institutions (number of beds, intensive care units, operating rooms, etc.); personnel information (number and qualifications of doctors, nurses, medical technicians, etc.)
Databases (capacity): Recovery Institutions, 0.1 GB (80,000 records)
4. Expected Outcomes
Prompt response to epidemics, almost a week earlier than currently possible
The number of persons suspected of having influenza, as estimated by the
DEPDS Project based on past data analysis and pattern matching, and the
actual number of persons with influenza identified by the Korea Centers
for Disease Control (KCDC) are compared below.
[Figure: Comparison with KCDC Monitoring Results]
Notwithstanding the slight lag, the graphs closely overlap each other. Note
that the DEPDS Project’s information on suspected cases predates the actual
confirmed cases of KCDC by one week or even two. This is because KCDC
relies on post-diagnostic information reported by doctors themselves, while
the DEPDS Project estimates the likely number of patients based on
analysis of real-time data. Because many doctors do not confirm diagnosis
until their second or third examination of a person, KCDC data is bound
to lag a week or two. The system developed by the DEPDS Project
therefore can identify and enable response to epidemics at least one week
before KCDC can.
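One simple way to quantify such a lead is to shift one weekly series against the other and see which shift gives the highest correlation. The sketch below does this with a plain Pearson correlation; the weekly counts are synthetic, and the function names and `max_lag` parameter are assumptions for illustration. The report does not state how the one-to-two-week lead was measured.

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5

def best_lead(early, late, max_lag=4):
    """Find the lag (in weeks) at which `early` best predicts `late`."""
    scores = {}
    for lag in range(max_lag + 1):
        a = early[: len(early) - lag]  # early series, truncated at the end
        b = late[lag:]                 # late series, shifted back by `lag`
        scores[lag] = pearson(a, b)
    return max(scores, key=scores.get), scores

# Synthetic weekly counts: `confirmed` repeats `suspected` one week later.
suspected = [10, 20, 45, 90, 140, 110, 60, 30, 15, 10]
confirmed = [5, 10, 20, 45, 90, 140, 110, 60, 30, 15]
lag, scores = best_lead(suspected, confirmed)
print(lag)
```

With real surveillance data the peak would be less sharp, and the two series would first need to be aligned to the same weekly reporting calendar.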
Early prevention of epidemics necessary to safeguard public health
KCDC intends to use the system developed by this project as part of its
Integrated Epidemic Control System. The majority of infectious diseases
whose spread and prevalence cannot be easily predicted require that the
government secure and distribute sufficient quantities of vaccine in a timely
manner. The fact that this project’s system can identify epidemic outbreaks,
and so enable a response, at least a week earlier than the system in use at present
will significantly help the Korean government secure sufficient supplies of
needed vaccines and treatments.
The value of this project has already been well recognized, and beginning
in 2017, the system will be applied to a significantly expanded range of
infectious and other types of diseases that threaten public health.
5. Future Research Areas
PIPA allows only probabilistic analysis
Korea’s Personal Information Protection Act (PIPA) presents a major obstacle
to the sharing and use of big data. Medical records, such as NHI billing
data, diagnosis records, and prescription details, are particularly difficult to
access as they contain information of a personal and sensitive nature.
A recent amendment to the Act has made it possible to trace the medical
records of patients who have been diagnosed with epidemic diseases. Those
who are only suspected of having contracted such diseases, however, are
still out of reach. The PIPA regime thus makes it impossible to ensure the
precise prevention of epidemics.
Accordingly, this DEPDS Project was confined to monitoring the frequency
of prescriptions issued for such persons and probabilistically estimating the
spread of those diseases around those persons’ locations. This project did
not and could not monitor or track the actual movements of such persons.
Many a good policy and business initiative that could significantly serve the
public interest has been thwarted by the taboo against violating privacy,
backed up by the PIPA. The HIRA did develop a mobile application that would
have allowed patients themselves to check their medical records on their
smart devices at any time, but ultimately, the application could not be
launched because of the PIPA, which, in fact, was one of the major factors
behind the Korean government’s inability to contain the spread of MERS.
However, the public’s interest in preventing health crises and in protecting
their personal information should be weighed carefully on a case-by-case
basis. It is important to reform the Korean law and policy system so as
to ensure rational and flexible responses to major health crises rather than
banning any sharing of information whatsoever.
Success of the pilot project expected to expedite the system’s expansion
It is better for big data projects to start small, with short schedules
in mind, than to begin as full-scale projects. Keeping them small is crucial to
verifying their potential. The experience of these small successes is
needed to form the basis for the effective use of big data.
This project was run as a small-scale, short-run pilot project targeting only
a few diseases. The success of the project, nonetheless, was enough to
affirm the potential for success of a full-scale project. This project is also
significant in that it established a rational, stage-by-stage model through
which big data projects should be undertaken.
[Figure: Inputs: medical records on entire population; DUR data; meteorological information on climate change; resident registration data. Services: real-time disease prediction service; public signal service. Functions: provides basic information for conducting pathological research on suspected organizations and regions; provides early alarms in the event of outbreaks and spread; supports policy measures to prevent epidemics, including a priori quarantining of regions likely to develop diseases due to environmental factors. Recipient: Government (Ministry of Health/KCDC/Ministry of Food & Drug Safety, etc.).]
Laying the cornerstone for data-based epidemic prevention systems
MERS was a public health disaster in Korea, with a fatality rate of 20.4
percent, mainly because the Korean government was unable to intervene
in the early stages of the outbreak due to the inaccessibility of information.
It is critical to establish an effective system of collaboration between
departments and a nationwide system to counter epidemics in their early
phases promptly and effectively. The public health authorities in Korea at
first could not trace the movements of persons diagnosed with MERS, and
had to rely on private telecommunication service providers for information.
Prompt responses are of the utmost importance in dealing with
unprecedented, rapidly spreading, and/or highly contagious diseases.
Data-based national epidemic prevention systems befit this day and age
of intelligent information and will provide powerful bulwarks against such
epidemics. The DEPDS Project is significant as the first step toward
establishing a thoroughgoing nationwide and public epidemic prevention
system. Through effective collaboration with various public agencies over
securing and analyzing data, we can develop a far more sophisticated
national prevention system than we have now. Active cooperation from all
involved agencies and organizations is crucial to keeping Korea safe and
healthy.
Case 2
Forecasting Car Accidents
Project title: Development of a Data-Based Car Accident Forecasting System
Description:
․ Designing a system capable of assessing the risk of car accidents
by location and hour based on analysis of accident reports and
records and meteorological information
․ Providing location-based forecasting broadcasts on traffic channels
in Daegu and Busan
Data used:
․ Data on reported accidents (approx. 320,000 between 2010 and
2016) from the Traffic Broadcasting Network (TBN) of the Road
Traffic Authority (RTA);
․ Data on car accidents (approx. 80,000 between 2010 and 2015)
and traffic warnings (approx. 260,000 between 2010 and 2015)
from the RTA;
․ Meteorological data (approx. 2.86 million between 2010 and 2015)
from the National Meteorological Administration (NMA).
Developers: The IMC and TBN Daegu of the RTA
※ Conducted as part of the Data-Based Future Strategy Policy Support
Program of the NIA’s Future Strategy Center.
1. Background
The serious socioeconomic cost of car accidents
According to statistics from the National Police Agency (NPA), over 230,000
car accidents take place in Korea every year. The Korea Transport Institute
(KOTI) estimates that these accidents cost Korean society KRW 42 trillion
a year.
Road: approx. KRW 41.8415 trillion (number of accidents: 1,119,280; fatalities: 5,092)
Railroad: approx. KRW 52.6 billion (number of accidents: 152; fatalities: 37)
Shipping: approx. KRW 143 billion (number of accidents: 1,093; fatalities: 101)
Airline: approx. KRW 277 billion (number of accidents: 13; fatalities: 10)
Source: KOTI (2016), Monthly Transport (May)
Car accidents cause both massive losses at the individual level and serious
socioeconomic costs for the entire nation. Multiple organizations have
created and expanded transportation safety infrastructure to prevent car
accidents. Nevertheless, the number of car accidents has been growing
steadily by three to four percent every year, calling for innovative
and radical solutions.
Why should car accidents get reported only after the fact?
The Traffic Broadcasting Network (TBN) and its channels nationwide provide
useful information on road conditions around the clock, reporting on major
accidents, ongoing construction, and other planned and sudden conditions
interfering with traffic. However, the network broadcasts these events only
after or while they take place, expecting listeners to grasp the risks involved
themselves. Of course, the current structure of information broadcasting
is useful as it is, since it helps drivers anticipate road conditions, decide on
alternative routes where possible, and take care not to get into an accident.
Nevertheless, the current structure provides information in a passive,
after-the-fact manner.
TBN Daegu of the RTA has decided to tackle this problem from a novel
perspective. It raised a bold question: Why should car accidents get reported
only after the fact? With the massive amounts of data at our disposal today,
why don’t we try to predict likely accidents beforehand? Certainly, giving
drivers information on likely risks and accidents would do far more to prevent
accidents than simply reporting those that have already occurred.
Broadcasting likely accident spots on the radio
The experiment may sound all too futuristic, like something that could be
realized only in the world of the movie Minority Report. The vision, however,
is not such an outlandish idea. It is simply a service system capable of
analyzing accumulated data to predict the risk of accidents by location and
time using a probabilistic approach. The concept is quite simple, but it could
make a significant difference in people’s everyday lives. After all, the system
is designed to warn against, and thereby prevent, car accidents.
Assessing current risk based on historical data
TBN as an institution possesses a far greater volume of data on reported
car accidents than other official and governmental sources. TBN channels
nationwide each hire hundreds of correspondents and reporters. The
real-time updates they provide on local car accidents throughout the day
easily outnumber the car accidents registered with official authorities. These
channels also receive significant numbers of reports from viewers and
listeners, running to hundreds of thousands a year.
[Figure: Web Page for Real-Time Reports on Car Accidents Nationwide]
The data on traffic situations TBN channels have accumulated over time formed
the primary source of information for the system, and was combined with
secondary data on car accidents and weather conditions provided by the Road
Traffic Authority (RTA) and the National Meteorological Administration (NMA)
to assess the likelihood of accidents by location, hour, and weather conditions.
The resulting index of car accident risks was used to provide comprehensible
information for the public, such as in the form of the Daily Risk Index (for
the morning and the afternoon), the 10 Riskiest Spots for Car Accidents, the
risky hours, and daily updates on the spots where accidents were likely.
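To make the idea of a location-and-hour risk index concrete, the sketch below counts historical reports per (spot, hour) pair, scales the counts to a 0 to 100 index, and ranks the spots with the most reports. The scaling rule, the function names, and the sample reports are assumptions for illustration only; the report does not give the formula behind the Daily Risk Index or the 10 Riskiest Spots list.

```python
from collections import Counter

def risk_index(reports):
    """Scale report counts per (spot, hour) to a 0-100 index.

    reports: list of (spot, hour) tuples from historical accident reports
    (hypothetical layout). The busiest pair is assigned 100.
    """
    counts = Counter(reports)
    if not counts:
        return {}
    peak = max(counts.values())
    return {key: round(100 * n / peak) for key, n in counts.items()}

def riskiest_spots(reports, top_n=10):
    """Return the spots with the most reports, most frequent first."""
    by_spot = Counter(spot for spot, _ in reports)
    return [spot for spot, _ in by_spot.most_common(top_n)]

# Synthetic reports, for illustration only.
reports = (
    [("Junction A", 8)] * 6
    + [("Junction A", 18)] * 3
    + [("Bridge B", 8)] * 2
    + [("Tunnel C", 23)] * 1
)
index = risk_index(reports)
print(index[("Junction A", 8)], index[("Bridge B", 8)])
print(riskiest_spots(reports, top_n=2))
```

A production index would also need to normalize for traffic volume, since a busy junction generates more reports than a quiet road at the same underlying risk.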
Infographic-based intuitive information service
Much of the information generated by the system was processed into
various visual aids, such as graphs, icons, and maps, so that broadcasters
and viewers alike could grasp it intuitively. An exclusive car accident forecast
website was also launched as part of the broadcast content.
[Figure: Car Accident Forecast Service]
This project is remarkable for establishing a system that immediately uses
the results of big data analysis to provide objective and convincing
information for the public, with the people in Daegu and Busan being the
first such audiences. This pilot operation proved the potential for success
of the service model, which will be expanded into the Seoul-Gyeonggi
region before the rest of the nation.
3. Data Analysis
Defining the variables in car accidents and their correlations
The concept of data analysis modeling involves identifying the variables that
influence a given problem, assigning appropriate weights to the variables,
and developing a formula to add up the relevant terms. This process sounds
deceptively simple, but it is not so easy to define relevant factors and
accurately assess their respective importance.
Machine learning can be used to automate the assignment of weights to
diverse variables during this modeling process. The machine adjusts the
weight of each variable and finds the weighting that yields the greatest
accuracy. Most data reflects real-world phenomena and changes
constantly in content. A data analysis model therefore needs to be updated
and calibrated accordingly. Machine learning allows the model to update and
correct itself when its prediction accuracy falls outside the given
parameters. An analysis model created this way is far more durable than
models created without machine learning.
The machine learning technique applied to the analysis model in this project
is the Bayesian network, which is a complex technique that first identifies
the core variables influencing a given problem, and then defines the
correlations between them before deciding the weight to be assigned to
each variable. Despite its complexity, the Bayesian network technique was
used because it works relatively well with incomplete data and can reflect
the causal relationship between variables. The variables used in car accident
analysis included the number of accidents by month, day of the month,
day of the week, temperature, precipitation, humidity level, and the
discomfort index.
[Figure: Data Analysis for Forecasting Car Accidents]
First, the past data on car accidents accumulated by TBN, the NPA, and
other such authorities was used to analyze the frequency of accidents by
location, hour, and day of the week. The accident data was also mapped
with corresponding meteorological data, such as temperature and humidity
level, in order to analyze the correlation between car accidents and weather
conditions. The weighting of each variable was then determined. Moreover,
the “variability” of the risk index at each given spot was used as an additional
variable to make it possible to predict unlikely spots for car accidents as
well. To this model were added time factors, locations, and weather data
so as to estimate the probability of car accidents.
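The sketch below illustrates the basic quantity such a model estimates: the probability of an accident given a combination of conditions. It fits a single conditional probability table with Laplace smoothing, which is only the simplest building block of a Bayesian network and not the project's actual model. The choice of variables (weekday, time band, rain), the smoothing constant `alpha`, and the observations are all assumptions for illustration.

```python
from collections import Counter

def fit_cpt(observations, alpha=1.0):
    """Estimate P(accident | conditions) from historical observations.

    observations: iterable of (conditions, accident) pairs, where conditions
    is a hashable tuple such as (weekday, time_band, raining) and accident
    is a bool (hypothetical layout).
    alpha: Laplace smoothing constant (assumed); keeps estimates away from
    0 and 1 when a combination has few observations.
    """
    totals, hits = Counter(), Counter()
    for conditions, accident in observations:
        totals[conditions] += 1
        hits[conditions] += int(accident)

    def probability(conditions):
        # Unseen combinations fall back to alpha / (2 * alpha) = 0.5.
        return (hits[conditions] + alpha) / (totals[conditions] + 2 * alpha)

    return probability

# Synthetic observations, for illustration only.
observations = (
    [(("Fri", "evening", True), True)] * 3
    + [(("Fri", "evening", True), False)] * 1
    + [(("Tue", "morning", False), False)] * 8
)
p = fit_cpt(observations)
print(p(("Fri", "evening", True)))
print(p(("Tue", "morning", False)))
```

A full Bayesian network would factor the conditions into separate nodes with their own tables, which is what lets it cope with incomplete data better than one joint table like this.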
[Figure: Car Accident Report Data (spot of accident, type of accident)]
Processing the data
Like many other big data projects, this project required extensive
processing and refinement of the given data. The text-type information on
the locations of car accidents, such as “From Dongsingyo Bridge on
Gukchaebosang-ro toward the Jonggak Negeori Junction, in front of the
Jung-gu District Office,” had to be translated into coordinates. The
morphemes and entity names included in the text-based natural language
also had to be processed further. Because the TBN channels in Daegu and
Busan did not use consistent data formats, all relevant data also had to
be refined.
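A much simplified version of the text-to-coordinates step is a gazetteer lookup: scan the free-text location for known landmark names and return their coordinates in order of appearance. The sketch below is a stand-in for the morpheme and entity-name processing the project describes, not a reproduction of it. The gazetteer contents are hypothetical, and the coordinate values are placeholders rather than the real positions of these landmarks.

```python
# Hypothetical gazetteer; coordinates are placeholders, not surveyed values.
GAZETTEER = {
    "Dongsingyo Bridge": (35.80, 128.60),
    "Jung-gu District Office": (35.81, 128.61),
}

def find_landmarks(text, gazetteer=GAZETTEER):
    """Return (name, coordinates) for each known landmark found in `text`,
    ordered by where the name first appears."""
    hits = [(text.find(name), name) for name in gazetteer if name in text]
    return [(name, gazetteer[name]) for _, name in sorted(hits)]

report = (
    "From Dongsingyo Bridge on Gukchaebosang-ro toward the Jonggak Negeori "
    "Junction, in front of the Jung-gu District Office"
)
landmarks = find_landmarks(report)
print([name for name, _ in landmarks])
```

Exact string matching breaks on spelling variants and abbreviations, which is one reason free-text reports like these need linguistic processing before they can be geocoded reliably.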
Type: TAAS car accident data
Source: RTA
Time span: 1 year
Quantity: Daegu: 42,427; Busan: 38,056
Information provided: Time, day, coordinates, location, type, and fatalities from each accident

Type: UTIS traffic updates
Source: RTA
Time span: 5 minutes
Quantity: Daegu: 104,087; Busan: 156,753
Information provided: Time, day, severity (number of fatalities), type, coordinates, and location of each accident

Type: Local weather forecasts
Source: NMA
Time span: 3 hours
Quantity: Daegu: 164,544; Busan: 117,135
Information provided: Time of day (morning/afternoon), temperature, precipitation, humidity, and wind velocity at each location/hour

Type: Weather updates
Source: NMA
Time span: 1 hour
Quantity: Daegu: 1,118,546; Busan: 1,749,105
Information provided: Time of day (morning/afternoon), temperature, precipitation, humidity, and wind velocity at each location/hour

Type: Breaking traffic news
Source: RTA-TBN
Time span: 1 minute
Quantity: Daegu: 161,065; Busan: 157,199
Information provided: Time of report, day of the week, night or day, spot, coordinates, starting and end points

Type: Web and social media
Source: Collected by project developers
Time span: 1 day
Quantity: Daegu: 4,238,782; Busan: 522,343
Information provided: Reports of car accidents shared by the public on social media and other places on the web.
Preventing car accidents by issuing risk warnings
The system developed by this project is already providing services in Daegu
and Busan. TBN channels in these cities air the provided information eight
times a day, under the title “Today’s Big Data Analysis for Transportation
Safety” (TBN Daegu: FM 103.9MHz, http://daegu.tbnbp.or.kr, TBN Busan:
FM 94.9MHz, http://busan.tbnbp.or.kr).
The broadcast information may be simple in structure, merely telling
listeners about the likely spots for car accidents on the basis of past data.
Listeners, however, are attentive as the program provides scientific and
objective information based on analysis of millions of data items. Drivers
who are near or passing through the forecasted accident spots, in particular,
will naturally take extra care. This, in turn, will help reduce car accidents.
Actual decrease in the number of car accidents
The project has been running for only a short period of time so far, and
requires continued monitoring in the future. Nevertheless, since the program
started airing in 2016, the car accident death toll has dropped by six
percent and the number of injured persons by 10 percent, while
the total number of accidents in downtown Daegu has also decreased by
nine percent. This is significant, given that the number of car
accidents is increasing by three or four percent annually
elsewhere in Korea.
Car Accident Statistics in Daegu (Daegu Police Agency)
Year No. of Accidents Fatalities No. of Injured Persons
2014 14,519 185 20,814
2015 14,228 (-2%) 161 (-13%) 20,433 (-2%)
2016 12,925 (-9%) 151 (-6%) 18,363 (-10%)
※ Figures in parentheses represent the percentage increase or decrease over the previous year.
The data for 2016 span the months January through October only.
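The parenthesized figures in the table can be reproduced from the raw counts as year-over-year percentage changes, rounded to the nearest whole percent. The short check below uses only the numbers in the table; the helper name `pct_change` is introduced here for illustration, and the 2016 injury count is read as 18,363.

```python
def pct_change(previous, current):
    """Year-over-year change as a whole-number percentage."""
    return round(100 * (current - previous) / previous)

# Counts from the Daegu Police Agency table above.
accidents = {2014: 14519, 2015: 14228, 2016: 12925}
fatalities = {2014: 185, 2015: 161, 2016: 151}
injured = {2014: 20814, 2015: 20433, 2016: 18363}  # 2016 read as 18,363

for name, series in [("accidents", accidents),
                     ("fatalities", fatalities),
                     ("injured", injured)]:
    changes = {year: pct_change(series[year - 1], series[year])
               for year in (2015, 2016)}
    print(name, changes)
```

These are relative changes in counts (percent), which is why the surrounding figures are best read as percentages rather than percentage points.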
The RTA expects that the system developed by this project can significantly
help reduce the number of car accidents in Korea when it starts services
nationwide.
5. Future Research Areas
Success factors
Two main factors led to the success of this project.
First is the clarity of the objective. Neither the quality of the data used nor
the sophistication of the analysis model was as important. The project was
conducted with the sole and very specific objective of generating forecasts
on likely car accidents and providing them in less than a minute on radio.
The clarity of this objective guided the analysis and assessment of car
accident risk in certain cities by hour and location.
As a result, data engineers were able to stay focused while collecting,
processing, and converting data, and analysts were able to narrow down
the scope of variables to be considered. Service developers were able to
conceptualize and plan, with clarity, the specific types of services that they
needed to develop. The project also involved the participation of radio
producers who were the intended users of the system.
Second was the applicability of the information generated. Accessibility and
comprehensibility are crucial in delivering traffic-related information.
Providing the resulting information to actual drivers in a timely manner via
appropriate channels is as important as processing and analyzing the data
accurately. Because this project targeted radio broadcasting from the very
beginning, the parties involved knew what information to provide and in what
way. Even in big data projects, it is just as important to use the data
generated as to produce such data.
The dilemma of accuracy
Slight errors in forecast would not land a pilot project like this one in a heap
of criticism. After all, it is better to be safe than sorry. Car accident forecasts
are fundamentally intended to warn drivers so they can avoid accidents.
The accuracy of the forecasts is of secondary concern. The most important
indicator of performance should therefore be how effective it has been in
reducing car accidents.
Nevertheless, as this is a big data project making short-term forecasts, we
are naturally interested in how powerful its forecasting performance is.
Therein lies the inevitable dilemma: If the model used often provides
erroneous forecasts, people would doubt its performance. If the model
forecasts too accurately, however, people would doubt whether it is actually
being heard by the intended audience. The ideal outcome would be for the
model to provide information to the general public so that drivers would
take caution and thereby significantly reduce the number of car accidents.
In other words, the more people heed what they hear on radio, the less
accurate the forecasts will seem. The “inaccuracy” of the forecasts attests
to how well and widely they are being used.
The performance of projects like this one, which seek to forecast and
prevent negative outcomes, should therefore be evaluated by simulating or
testing the models for certain periods of time without distributing the
information they generate to outside parties. This project evaluated system
performance by simulating past periods of time (see the graph on the next
page for the results). Measuring forecast accuracy while the service is on
air would require a more complex approach, one that distinguishes between
periods when the radio audience ratings are high and periods when they are low.
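Replaying past periods without broadcasting the forecasts amounts to a simple backtest. The sketch below illustrates the idea in Python under stated assumptions: the frequency-based forecaster, the record layout, and the sample records are all hypothetical and stand in for the project's actual model and data, which the report does not describe in detail.

```python
from collections import Counter


def forecast_top_spots(history, k):
    """Hypothetical forecaster: the k spots with the most past accidents."""
    counts = Counter(spot for _, spot in history)
    return [spot for spot, _ in counts.most_common(k)]


def backtest(records, split_day, k):
    """Fit on days before split_day, then score on the held-out days.

    records: list of (day, spot_id) accident records.
    Returns the share of held-out accidents that fell on a forecast spot.
    """
    train = [r for r in records if r[0] < split_day]
    test = [r for r in records if r[0] >= split_day]
    if not test:
        raise ValueError("no held-out records to evaluate")
    predicted = set(forecast_top_spots(train, k))
    hits = sum(1 for _, spot in test if spot in predicted)
    return hits / len(test)


# Made-up records, for illustration only.
records = [(1, "A"), (1, "B"), (2, "A"), (3, "A"), (3, "C"),
           (4, "B"), (5, "A"), (6, "C"), (6, "A"), (7, "B")]
hit_rate = backtest(records, split_day=5, k=2)
print(f"hit rate on held-out days: {hit_rate:.0%}")
```

Because the held-out period was never influenced by a broadcast, the hit rate is not distorted by drivers reacting to the forecast.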
Limits inherent to raw data
The limits inherent to raw data often impose great constraints on the data
analysis process itself. Even with extra handling, such as deleting erroneous
data and replacing omissions with estimates, it is nearly impossible to
overcome these limitations.
This project was able to forecast car accidents only at spots where they
occur with a certain frequency. Forecasting accidents at spots where they
occur less frequently would have required far more data. As the project
developers sought to provide forecasts on likely accidents throughout the
given area, this insufficiency presented a major obstacle. The results of
simulating the forecasting accuracy of this project’s data analysis model
are listed below.
Top Ranking Accuracy
This project targeted at least 85 percent accuracy in its forecasts,
succeeding with respect to the top 30 spots in Daegu and the top 80 spots
in Busan. Because car accidents tend to recur more at certain spots than
others, forecast accuracy increases for spots with a record of many
accidents (and hence more data). However, forecast accuracy declines for
spots with a record of few accidents due to the insufficiency of data. Busan,
which has far more accidents per location than Daegu, afforded relatively
accurate forecasts regarding the 100 most frequent accident spots. In Daegu,
however, forecast accuracy fell below the reliable level from the 50th most
frequent spot onward because less data was available. Of course, spots with
a record of few accidents are unlikely to figure at the high end of the list
of likely accident spots. Nevertheless, in order to ensure wide use of the
model in the future, it is important to devise a finer-tuned and more
accurate analysis technique to enhance accuracy for these spots as well.
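One way to see why sparsely recorded spots are harder to forecast is to treat the accident count at a spot as Poisson distributed. This is an assumption made here for illustration; the report does not state the project's statistical model. Under that assumption, a rate estimated from n observed accidents has a relative standard error of 1/sqrt(n), as the short Python sketch shows.

```python
import math


def relative_standard_error(observed_accidents: int) -> float:
    """Relative standard error of a Poisson rate estimated from n events."""
    if observed_accidents <= 0:
        raise ValueError("need at least one observed accident")
    return 1 / math.sqrt(observed_accidents)


# Illustrative counts only, not project data.
for n in (4, 25, 100, 400):
    error = relative_standard_error(n)
    print(f"{n:>4} accidents -> rate uncertain by about ±{error:.0%}")
```

Halving the uncertainty requires roughly four times as many recorded accidents, which is consistent with the observation that low-frequency spots would need far more data.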
Policy measures needed to improve the quality of data
The quality of data is central to the success of big data projects. It is not
uncommon for such projects to falter and fail in their early stages due to
the unsatisfactory quality of their data. Although big data analysis has emerged
as a pivotal paradigm in technology today, data used to be accorded only
secondary importance until recently, necessary insofar as it supported
application services. Policymakers neglected its importance as a resource,
and failed to prepare and develop systemic infrastructure in advance.
Numerous types of data generated today are of such inferior quality that
they cannot be used in proper analysis. This reflects the absence of a
standard system or rules concerning their collection and generation. Much
data today is generated according to arbitrary rules established by past
system designers or administrators. Even within a single system, different
devices generate data of different formats. The absence of a well-organized
system for managing collected data also leads to its loss or the creation
of unsorted data that cannot be restored successfully. The lack of
consistency in data significantly undermines its usability, whether in the
public or private sector. It is therefore critical for organizations to prepare
and enforce systems that will ensure the systematic generation and
management of quality data.
Standard systems for data generation and management need to be
established across all domains, and additional software systems
(logical systems or data use networks) need to be developed to ensure the
uninterrupted transmission and collection of data across diverse fields. The
Korean government needs to undertake major projects for developing and
improving the nationwide system of data generation.
Facilitating the use of big data analysis in traffic safety
The data analysis model developed in this project will evolve and expand
into a better one capable of reflecting diverse local and regional
characteristics in car accidents. The Korean Ministry of Science, ICT, and
Future Planning (MSIP) has recognized the utility and potential of this service
model, and begun to consider supporting its development into a national
version. Negotiations are currently under way to expand the scope of radio
broadcasting services out of Daegu and Busan into Daejeon, Gwangju,
Incheon, and other such major cities, while making the car accident
forecasts available on personal GPS devices in the Seoul-Gyeonggi region.
The RTA intends to complete the evolution of the model into a national
system by 2019, while also extending the reach of forecasts from downtown
areas onto major expressways.
Numerous attempts have been made to diversify the applications of big data
analysis, but few have yet attained tangible success. This project provides
an inspiring example of how big data analysis can be used to promote public
safety. The radio broadcasts will raise public awareness of the value of big
data, and also catalyze the use of its analysis in many diverse fields over
and beyond traffic safety.
Case 3
Facilitating International Trade
Project title: Development of a Data-Based All-In-One Export Service
Description: Developing a consulting program capable of providing personalized
advice on products and target countries or regions for medium-sized
exporting businesses
Data used: Global capability evaluation records of businesses (approx. 20,000
from 2013 to 2016) from KOTRA; records on export consulting
provided for businesses (approx. 90,000 from 2013 to 2016) from
KOTRA; data on export volumes by Korean businesses (approx.
34,000 items from 2013 to 2016) from the KCS; and data on the
trends in international markets by country (approx. 600,000 items
from 2013 to 2016) from the International Trade Center.
Developers: KOTRA, Encore, Forcewin, and Tobesoft.
※ Conducted as part of the Data-Based Future Strategy Policy Support
Program of the NIA Future Strategy Center.
1. Background
Supporting small and medium-sized export businesses
Small and medium-sized enterprises (SMEs) often lack the benefit of the
economic and business analysis institutes to which major corporations have
access. Major corporations with well-established channels and markets
abroad often use the analyses provided by their own research institutes in
devising their export strategies, and employ wide networks of informants
and correspondents to design fail-proof export policies. These are luxuries
that SMEs cannot even imagine having. For most SME owners, who operate
their businesses based on firsthand experience and rumor-like information
from personal connections, maintaining the status quo is the most
important task. Misaeng, a Korean web comic series that has become a
sensation among working people, is popular because it realistically portrays
the daily struggles of small businesses and their employees.
“We no longer have the analyses and subscription research data provided by
numerous research institutions that we used to have when we were with a
bigger company.”
“According to a recent report, mixers and grinders are enjoying growing
popularity at trade centers in Bangkok (Thailand), Hanoi (Vietnam), and
Chungking (China). The WTO’s International Trade Center (ITC) provides more
accurate statistics on international trade under the title, ‘Trade Map’.”
“KOTRA provides the Trade Map on its Korean website, through partnership
with the ITC.”
Excerpt from Misaeng, Season 2
Information is all the more important to SMEs, whose entire future prospects
can be ruined by a few simple mistakes. The reality, however, is not so easy
to deal with. Jang Gu-rae, the hero of Misaeng who has recently left a
big trading company to join a small struggling one, realizes the scale and
depth of the obstacles facing his new employer. He has a hard time
accepting the fact that quality information, even in this age of
informatization, is so hard to obtain.
Korean government sets out to provide assistance to 100,000 exporting SMEs
The South Korean economy is extremely dependent on international trade.
In an effort to break through the current stagnation in exports and achieve
KRW 1 trillion in total export value, the Korean government has launched
an ambitious program to support 100,000 exporting SMEs in the country.
Yet the poor access these SMEs have to quality information on overseas
markets continues to hinder their expansion on the international market.
According to a business survey from 2015, SMEs most needed personalized
market information (49.8 percent) and business information (21.6 percent).
This suggests that the majority of them struggle even to decide which
products to export and which countries to target.
Providing personalized export information for SMEs
The information gap inevitably translates into a competitiveness gap in the
world of exports. With the goal of narrowing this competitiveness
gap between large corporations and SMEs in Korea, the government set out
to develop an export information infrastructure. The Data-Based
All-In-One Export Service (DBAES) is the result. The Korea
Trade-Investment Promotion Agency (KOTRA), a public corporation, combines its
own data with that collected from a variety of other sources, including the
ITC and the Korea Customs Service (KCS), to generate and adapt export
information to different types of businesses. The service system then
provides SMEs with this adapted export information via consultants and
online channels.
Comprehensive information for successful export strategies
All exporting businesses want to know which products they should sell and
to which countries. A myriad of factors should be considered to answer
these questions, including which countries have demand for which
products, how large the markets are, what the tariff and exchange conditions
are, whether there are any political issues between those countries and
Korea, what the cultural and customary characteristics of those countries
are, and how far those countries are, geographically, from Korea. The same
questions arise again in deciding which specific products should be
exported to those countries.
KOTRA DBAES
- Tariff rates: Businesses can access and find information kept by KOTRA on tariff rates
- Economic indicators by country: Economic performance of 152 countries at a glance
- Business trip information: Key information before making overseas business trips
- Industrial database: Easy access to information on the industrial structure of each country
- Product database: Easy access to information on overseas markets for each product
KOTRA possesses extensive data on all consultations and advice it has
provided for SMEs in Korea on matters of export since 2013. It also employs
correspondents around the world who monitor new developments and
trends in international trade. The reports relayed by these correspondents
provide detailed and up-to-date information on trade situations, which is
published and distributed via the KOTRA website. The KCS also provides
information on changes in exports from Korean companies around the world,
while the ITC provides trade information country by country.
The data and information provided by these sources are of immense value
to exporting SMEs. At present, however, they remain fragmented and
compartmentalized. It is difficult for SMEs to find the best sources of
information they need. Even if they succeed in finding that information,
accurately understanding and analyzing it would still require the skills of
expert analysts that most SMEs do not have the financial wherewithal to
employ.
If this data were integrated and analyzed by a well-functioning analysis
model, it would yield quality information that could significantly help
struggling smaller companies devise effective export strategies. KOTRA
sought to produce such a system through this DBAES Project.
DBAES at Work: Sample Page
Through this project, KOTRA sought to establish an extensive framework
of data infrastructure, generating and providing a comprehensive range of
important information to support exporting SMEs. The infrastructure involved
the creation of an index for assessing the potential market appeal of each
product on each national market. The index is meant to help SMEs decide
which products to export and to which markets.
An SME intent on using this service can enter the name of the product
it wishes to export into the website. The website then automatically displays
a list of recommended target countries for the product, as well as detailed
information on why those countries are recommended.
How the DBAES actually helped an SME
• Company: a medium-sized manufacturer of door-lock devices
Before the DBAES was available: The company decided to target Taiwan and
attempted to export its products there, but to no fruitful end.
➡
After the DBAES system was launched: The company sought the GCL consultation
service of KOTRA and accordingly changed its target country to India,
successfully exporting USD 20,000 worth of goods as of the latter half of 2016.
3. Data Analysis
Developing a comprehensive index
In order for KOTRA to provide information and consultation specific for each
business, it first needs to know the business in detail. KOTRA already
possesses data on the global competence level (GCL) of 20,000 or so of
the 90,000 exporting SMEs in Korea. The GCL data includes details on each
company’s management status, personnel capacity, annual revenue, share
of revenue made up by exports, and the types of products it exports. KOTRA
uses this data to assess the export capacity of each business first.
GCL Analysis Page
Next, KOTRA also needs to ascertain the trade conditions of all the countries
with which Korea trades, and make detailed assessments of each. This
process requires consideration of a wide range of information. Accordingly,
KOTRA has developed the Country-Specific Market Appeal Index (CMAI)
to evaluate the potential market appeal of various export products on each
country’s market. This index shows accessibility, attractiveness, growth
potential, and competitiveness of each market abroad. Accessibility is
measured in terms of the tariff rate, trade barriers (including import-
restricting policies) and physical distance from Korea. Attractiveness is
measured in terms of market size and popularity of the given product.
Growth potential is measured using the rates of increase or decrease in
the quantities of the given product a market imports and the rates of change
in the market’s imports from Korea in general. Finally, competitiveness is
measured using the trade balance associated with the given product as well
as its recent performance on the market.
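The four CMAI dimensions described above can be combined into a single score in many ways. A minimal Python sketch of one option, a weighted average, follows. The 0-100 scale, the equal weights, and all names are assumptions made for illustration, since the report does not publish the actual formula or weights.

```python
from dataclasses import dataclass


@dataclass
class MarketScores:
    """Dimension scores for one (country, product) pair, each on a 0-100 scale."""
    accessibility: float     # tariff rate, trade barriers, distance from Korea
    attractiveness: float    # market size, popularity of the product
    growth_potential: float  # import growth for the product and from Korea
    competitiveness: float   # trade balance and recent market performance


# Hypothetical equal weights; the actual CMAI weights are not given in the report.
WEIGHTS = {
    "accessibility": 0.25,
    "attractiveness": 0.25,
    "growth_potential": 0.25,
    "competitiveness": 0.25,
}


def cmai(scores: MarketScores, weights=WEIGHTS) -> float:
    """Weighted average of the four dimension scores."""
    total_weight = sum(weights.values())
    weighted = sum(getattr(scores, name) * w for name, w in weights.items())
    return weighted / total_weight


example = MarketScores(accessibility=80, attractiveness=60,
                       growth_potential=40, competitiveness=20)
print(cmai(example))
```

Dividing by the total weight keeps the score on the same 0-100 scale even if the weights are later changed and no longer sum to one.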
Once the CMAI score is obtained, the GCL scores of the given business
and of the product it wishes to export are added to identify the
recommended target countries. As the DBAES system already stores most
of the information needed to assess each business’ export potential, the
user can simply enter the given business ID and the product code into the
website to generate information well-suited to the needs and characteristics
of that business. For instance, the system recommends markets with
relatively high entry barriers, but that offer great potential, such as Canada
and China, for a lipstick manufacturer with a strong export record. The
system recommends other markets that are smaller and more accessible,
such as Malaysia, for another lipstick manufacturer with a smaller export
record. The system also provides guidance on other support programs and
resources available from the Korean government for small exporters.
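The lookup-and-rank step can be sketched as follows. The report says only that GCL scores are added to the CMAI score, so the combination rule below (a lower GCL gives more credit to accessible markets) is an assumption chosen to reproduce the qualitative behavior in the lipstick example. The business IDs and every score are made up for illustration.

```python
# Made-up scores on a 0-100 scale for one product; not actual CMAI values.
MARKETS = {
    "Canada": {"cmai": 80.0, "accessibility": 30.0},
    "China": {"cmai": 78.0, "accessibility": 35.0},
    "Malaysia": {"cmai": 65.0, "accessibility": 85.0},
}

# Hypothetical business IDs mapped to stored GCL scores (0-100).
GCL_BY_BUSINESS = {"BIZ-STRONG": 90.0, "BIZ-NEW": 30.0}


def recommend(business_id, markets=MARKETS, top_n=2):
    """Rank target countries for one business and return the best top_n."""
    gcl = GCL_BY_BUSINESS[business_id]

    def combined(country):
        market = markets[country]
        # Assumed rule: the lower the GCL, the more accessibility is added
        # on top of the market's CMAI score.
        return market["cmai"] + (1 - gcl / 100) * market["accessibility"]

    return sorted(markets, key=combined, reverse=True)[:top_n]


print(recommend("BIZ-STRONG"))
print(recommend("BIZ-NEW"))
```

With these illustrative numbers, the business with the strong export record is pointed toward the high-potential markets, while the newer exporter is pointed first toward the more accessible one.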
Linking the GCL & Business Data for Comprehensive Analysis
Time-consuming and costly data processing
Collecting the wide range of data and information necessary for
comprehensive analysis was difficult, and determining and assigning accurate
weights to all the variables involved was more difficult still. The four
dimensions of the CMAI (accessibility, attractiveness, growth potential, and
competitiveness of the given market) were crucial to the accuracy of analysis,
and required extensive consideration. Not only were numerous variables
involved, but the correlations among them were also significant and
complex. In order to design a reliable analysis model,
KOTRA sought the participation of academic experts through a separate
research project. The analysis model thus developed was subjected to rigorous
verification with a view to minimizing errors and distortion.
The project had to overcome considerable internal resistance early on,
including skepticism over the reliability and validity of the analysis model
and its results. Now that the DBAES system has successfully been
launched, favorable reactions are on the rise. Many agree that the system
has helped to improve and standardize the quality of expert consultations
that KOTRA had been providing long before its creation.
As with most other big data projects, inconsistency and poor data quality
also made things time-consuming and costly. Integrating data from multiple
sources often raises problems over inconsistency. Organization A, for
example, possesses information on manufacturers of certain products, while
Organization B possesses information on the demand for the same products
in each major country. The two organizations, however, use different product
codes or versions of data. In the meantime, Organization C also claims to
possess information on the demand for given products, but its data do not
match Organization B’s demand data. Data analysts spend extensive amounts
of time processing and refining this inconsistent data. The refining process
alone took up more than a month of the given schedule in this project.
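The product-code mismatch described above is typically handled by translating each source's codes to a shared code before joining, and by setting aside records that cannot be translated. The Python sketch below illustrates this; the codes, field names, and mapping tables are placeholders, not the project's actual data.

```python
def harmonize(records, code_map):
    """Translate each record's source-specific product code to a shared code.

    Returns (matched, unmatched) so that unmatched rows can be reviewed by hand.
    """
    matched, unmatched = [], []
    for record in records:
        shared = code_map.get(record["product_code"])
        if shared is None:
            unmatched.append(record)
        else:
            matched.append({**record, "product_code": shared})
    return matched, unmatched


# Placeholder records and mappings, for illustration only.
org_a = [{"product_code": "A-001", "manufacturer": "Maker 1"},
         {"product_code": "A-999", "manufacturer": "Maker 2"}]
org_b = [{"product_code": "B-17", "country": "X", "demand": 500}]
a_to_shared = {"A-001": "P1"}
b_to_shared = {"B-17": "P1"}

a_ok, a_review = harmonize(org_a, a_to_shared)
b_ok, b_review = harmonize(org_b, b_to_shared)

# Join manufacturers to demand figures on the shared product code.
demand_by_code = {}
for row in b_ok:
    demand_by_code.setdefault(row["product_code"], []).append(row)
linked = [(a["manufacturer"], b["country"], b["demand"])
          for a in a_ok
          for b in demand_by_code.get(a["product_code"], [])]
print(linked, a_review)
```

Keeping the unmatched records in a separate list makes the manual refining effort visible instead of silently dropping data.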
Type                   Name                            Source             Description
Client information     Exporting SME Information       KOTRA              80,000 SMEs (overviews, export records, records of services provided)
Client information     Export Capacity Evaluation      KOTRA              GCL analysis on 16,000 SMEs (across eight dimensions)
Client information     Customer Consultation Records   KOTRA              90,000 consultations provided annually (by telephone or online)
Client information     Business Assessment             KED, NICE, etc.    Data on 6 million+ businesses in Korea (across 154 categories, including overviews)
Market information     Overseas Market Information     KOTRA              Information on performance of export products
Market information     Worldwide Export Statistics     UN Comtrade, etc.  Global trade statistics provided by UN Comtrade, ITC, etc.
Resource information   KOTRA Projects Database         KOTRA              Information on over 2,000 support projects a year
Resource information   Project information             Other              Information on other support projects available
Improving access to quality information for SMEs’ export business
Not every segment of Korean society possesses a system for making timely
and informed decisions on the basis of big data analysis. DBAES is thus
expected to play a pioneering role in building and expanding the information
infrastructure for exporting SMEs in Korea that struggle to find and access
quality information on trade and exports, and to help them cut the costs
and time required for data analysis. KOTRA intends to undertake similar
projects in the future to help exporting SMEs in Korea grow into strong
players on the international market.
Figure: KOTRA’s service roadmap across three stages (Present, 2nd Generation,
3rd Generation). Stage labels: services tailored to GCL; enhancing export
capability; enhancing global competence; fostering businesses capable of
setting global standards; establishing new global standards.
Goal: To foster SMEs into strong, sustainable players on the international market
by providing them with GCL-tailored market information and services.
Paving the way for business innovation with big data analysis
KOTRA already possesses immense quantities of export-related data, which
it continues to collect and generate today. Prior to DBAES, however, it did
not possess a system that could ensure the timely and effective use of
this data. In the past, the agency’s expert consultants had to advise their
client businesses according to the information that they had individually
collected and kept on their personal computers. The wide variety of data
accumulated by KOTRA was dispersed across multiple departments without
a centralized and overarching management system, while KOTRA itself did
not have any plans for systematizing this dispersed data into useful
resources.
The DBAES Project therefore helped to bring the quality of this information
infrastructure up a notch, greatly boosting KOTRA’s confidence in its ability
to make use of big data, and inspiring it to launch plans for future
collaborative projects in this field with various other agencies and
organizations on matters not limited to exports, but also including the
economy and trade in general.
5. Future Prospects
Improving performance through collaboration with human experts
With the public’s improved understanding and awareness of big data, the
inconsistency in the sizes of data or the types of data-collecting platforms
has been on the decline. Few today think of big data projects as something
to be handled exclusively by ICT experts. Organizations intent on enhancing
their analysis capacity consider increasing the number of statisticians, data
engineers, and data analysts they hire. It is important to bring such
talent in-house in order to build the organization’s big data capacity
on a continued basis.
However, equally important to ensure the effective use of big data analysis
is making continued use of existing experts who continue to apply traditional
approaches to problem-solving. These human experts have accumulated
professional experience and knowledge in their respective fields over
decades, and are now seasoned veterans at making reliable decisions using
their unique and traditional methods. Big data may seem to present radically
innovative solutions and even appear capable of catching up with these
human experts’ performance in short spans of time. No big data projects,
however, can ultimately succeed without the active participation of these
human experts. Such projects can maximize their prospects for success
when human experts take on a leading role, with data analysts providing
help as needed.
The DBAES Project, too, made effective use of the consulting records and
data that multiple expert consultants had accumulated over the years. The
data analysis model itself was designed by taking into account their opinions.
These experts were also the first to try the pilot system and suggest
improvements. Although some expressed doubts regarding the utility of the
DBAES system at first, continued interaction and discussion with them did
much to abate their initial resistance. The collaboration
between these expert consultants and the data specialists went a long way
toward maximizing effectiveness of the resulting system and improving the
quality of advice provided to exporting SMEs.
Copyright issues interfering with the wide use of the DBAES system
The information provided by the ITC on the changing volumes of goods
traded worldwide was made accessible to KOTRA personnel only, allowing
outside businesses to obtain the information only indirectly via consultants.
This was because of the copyright issues involved. The ITC specifically
prohibits the public disclosure of the data unless it has been modified and
processed. The problem is the ambiguity of the concept of “data
modification”. The project team sought legal advice on the matter, but could
not arrive at a clear-cut answer. Consulting the ITC would have delayed the
completion of the project, while disclosing the data publicly could have
invited legal disputes. As a result, the team decided to make the ITC data
available in a limited and passive manner only.
Other sources also restrict the public disclosure of their data, “except for
public purpose and in the public interest.” This, too, invites controversy over
what the abstract idea of “the public interest” actually encompasses, and
therefore encourages developers to settle for a very conservative and narrow
reading of the restriction.
The use of diverse forms of data often raises such copyright issues. The
lack of consistency among data providers’ policies on the scope of public
disclosure often presents an obstacle to big data projects. As the data
industry grows, copyright disputes among data providers will likely increase.
It is therefore important to prevent these disputes by enacting proper
legislative and policy guidelines.
Promoting the use of big data in exports
Supporting global businesses and contributing to national happiness and
human progress
The DBAES Project, while still in its nascent stage, has allowed KOTRA to
bring together fragmented data from diverse sources and thereby generate
new analyses and insights necessary to provide quality and trustworthy
information for exporting SMEs. KOTRA intends to accumulate more data
for the system and expand its functions and services so that businesses
themselves can conduct the analysis they need of diverse export conditions.
DBAES is expected to help SMEs increase their exports and explore new
markets overseas, while also prompting policymakers to develop new
measures to support their global expansion. Quality information is key to
saving the 90,000 exporting SMEs in Korea from additional unnecessary
hardship.
Case 4
Understanding the Business Cycle
with Credit Card Transaction Data
Project title: Development of a Data-Based Monitoring & Early Warning System
for the Business Cycle
Period: October to December 2015 (first phase); June to November 2016
(second phase)
Description: Developing a system and consumption index capable of analyzing
and monitoring the business cycle using statistics of credit card
transactions.
Data used: Weekly and monthly statistics on the amounts of approved payments
made with Shinhan credit and debit cards (from 2008 to today) and
Statistics Korea’s Consumer Price Index
Developers: Shinhan Card, Korea Environmental Economics Association, Statistics
Korea, and Gaion
※ Conducted as part of the Data-Based Future Strategy Policy Support
Program of the NIA Future Strategy Center.
1. Background
Need for a timely index on updates in the business cycle
Statistics Korea, the government organization in charge of compiling
various statistics for policy and public purposes, continues to analyze and
manage a variety of indicators on the Korean economy, including the
Consumer Price Index (CPI) and spending propensity. However, the
organization updates its statistics monthly or at even longer intervals.
The information it provides therefore often fails to support urgent government
intervention in unforeseen crises, such as the Middle East Respiratory
Syndrome (MERS) outbreak and earthquakes. Nor does it allow for prompt
confirmation of the effects of newly implemented laws and policies, such
as the Improper Solicitation and Graft Act.
When Policymakers Require Prompt Updates on Business Cycle Statistics
National emergencies and disasters
- How did last week’s disaster/emergency affect domestic consumption?
- What are the industries and regions in need of emergency aid to stimulate consumption?
- How quickly or slowly is the pace of consumption recovering?
Policy events
- How do special national policy events stimulate consumption?
- What changes are required to improve the effectiveness of such policy events?
Although Statistics Korea and related agencies have been struggling to
overcome these limitations by diversifying the range of data they collect,
it is still difficult to find a database that affords prompt analysis of the
business cycle.
MasterCard has developed an index known as the Spending Pulse, based
on its card transactions data, to provide updates on spending propensity
by industry and region in the United States. The index is not only consistent
with the US government’s official statistics, but also provides information
at least a week ahead of the government’s announcements. This project
was thus undertaken to develop a similar system that can quickly monitor
and analyze spending propensity in Korea.
A system providing weekly updates on domestic consumption
Credit card transaction data forms an optimal source of timely and accurate
information on changes in the business cycle. Almost 73 percent of all
transactions in 2016 in Korea were made with credit cards. This project
accessed and analyzed the credit card transaction data of Shinhan Card,
which holds the largest market share among credit card companies in Korea,
in order to develop a
domestic consumption index that provides business cycle information by
region, industry, and income quantile on a weekly and monthly basis.
Shinhan Card’s Big Data
- 22 million members in total
- 2.7 million affiliated stores
- Average of 200 million transactions approved monthly
- 5.8 million mobile application members
- 3 million customer calls handled by customer service center
- 6.4 million visitors to website
- More consistent data than available from other credit card companies
Total volume: 93 TB (1.6 trillion records), amounting to 130 million books or 3,000 iPads
Economy Scanner: A system monitoring the business cycle 24/7
The first phase of the project involved the Korea Environmental Economics
Association (KEEA) and Shinhan Card co-developing a business cycle index
on the basis of card transaction data. The index was based on six monthly
indicators (overall, regional, industry, income quantile, affiliate store size, and
age) and two weekly indicators (overall and income quantile). In the second
phase of the project, the weekly indicators were expanded to six, matching
the monthly indicators. In addition, 35 service pages were developed to
enable the user to calculate the index scores automatically and visualize
them.
Economy Scanner Service Page
3. Data Analysis
Comparative analysis of past and current retail consumption
The consumption index at the heart of this project was developed by a team
of expert economists associated with the KEEA, using data provided by
Shinhan Card, the Credit Finance Association, Statistics Korea, and the Bank
of Korea. The team tested a range of models and approaches in the process.
The resulting consumption index compares the current volume of retail
consumption in Korea to that of 2010, and adjusts it by taking into account
the rate of inflation and seasonal conditions. The raw data on credit card
transactions was modified and processed into data sets by age, income
quantile, affiliate store size, and the like. For accuracy, data on non-retail
transactions, such as utility bill payments and wholesale transactions, were
excluded from analysis.
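The calculation described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the field names, category labels, and figures are hypothetical and do not come from the project, and seasonal adjustment is left out. Non-retail transactions are dropped, monthly totals are deflated by a price index, and the result is scaled so that the 2010 average equals 100.

```python
from collections import defaultdict

# Hypothetical category labels for transactions excluded as non-retail.
NON_RETAIL = {"utility_bill", "wholesale"}


def monthly_retail_totals(transactions):
    """Sum approved amounts per (year, month), skipping non-retail categories."""
    totals = defaultdict(float)
    for tx in transactions:
        if tx["category"] in NON_RETAIL:
            continue
        totals[(tx["year"], tx["month"])] += tx["amount"]
    return dict(totals)


def consumption_index(totals, cpi, base_year=2010):
    """Deflate each monthly total by CPI, then scale so the base-year mean is 100."""
    real = {period: amount / (cpi[period] / 100.0) for period, amount in totals.items()}
    base_values = [value for (year, _), value in real.items() if year == base_year]
    if not base_values:
        raise ValueError("no data for the base year")
    base_mean = sum(base_values) / len(base_values)
    return {period: 100.0 * value / base_mean for period, value in real.items()}


# Illustrative data only (not taken from the report).
transactions = [
    {"year": 2010, "month": 1, "category": "grocery", "amount": 100.0},
    {"year": 2010, "month": 1, "category": "utility_bill", "amount": 40.0},
    {"year": 2016, "month": 1, "category": "grocery", "amount": 132.0},
]
cpi = {(2010, 1): 100.0, (2016, 1): 110.0}
index = consumption_index(monthly_retail_totals(transactions), cpi)
```

In this toy data the utility bill payment is ignored, so the January 2016 score reflects retail spending only, expressed in base-year prices.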
Service Description
Overall (1)
• Overall index
- Information on ordinary, actual, and seasonal adjustment indices of previous month
- Weekly MMM (marketing mix modeling) index and monthly index forecast
Monthly index (18)
• Overall monthly index
• Detailed monthly indices
- Graphs on current ordinary, actual, and seasonal adjustment indices and recent trends
- Rates of monthly change in ordinary, actual, and seasonal adjustment indices along local, industrial, store size, income quantile, and age indicators
Weekly index (8)
• Overall weekly index
• Detailed weekly indices
- Graphs on earlier weekly indices and weekly MMM indices and recent trends
- Recent trends in weekly MMM index by region, industry, store size, income quantile, and age
Trend analysis (4)
• Detailed analysis of overall and detailed monthly/weekly indices and trends
- Long-term time-series trends in ordinary, actual, and seasonal adjustment indices by region, industry, store size, income quantile, and age
- Long-term time-series trends in weekly MMM indices by region, industry, store size, income quantile, and age
Comparative analysis of official indices (4)
• Comparative analysis of detailed monthly indices and official ones
- Comparative analysis on trends in ordinary indices by region, industry, and income quantile and in official indices used by Statistics Korea and Bank of Korea
The automation of index scoring proceeded in four main steps. First, the
necessary data was collected, including the value of approved card payments
and other information provided by the Credit Finance Association and
Statistics Korea. Second, the data was entered into the index model to
calculate the index scores. Third, the scores were adjusted for the
inflation rate and seasonal effects. Finally, the adjusted scores were sent
to the monitoring system.
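The four steps can be pictured as a small pipeline. The sketch below is not the project's implementation: the function names, the stand-in index model, the multiplicative seasonal factor, and all figures are assumptions made for illustration.

```python
def collect(card_payments, price_index):
    """Step 1: gather the inputs for each period that has both series."""
    return {
        period: {"payments": card_payments[period], "cpi": price_index[period]}
        for period in card_payments
        if period in price_index
    }


def score(inputs, base_payments):
    """Step 2: stand-in index model, payments relative to a base value of 100."""
    return {period: 100.0 * row["payments"] / base_payments for period, row in inputs.items()}


def adjust(scores, inputs, seasonal_factor):
    """Step 3: divide out price changes and a multiplicative seasonal factor."""
    return {
        period: value / (inputs[period]["cpi"] / 100.0) / seasonal_factor.get(period, 1.0)
        for period, value in scores.items()
    }


def publish(scores, sink):
    """Step 4: pass the final scores to the monitoring system (a plain list here)."""
    sink.extend((period, round(scores[period], 1)) for period in sorted(scores))


# Illustrative figures only.
monitoring_feed = []
inputs = collect(
    card_payments={"2016-01": 110.0, "2016-02": 126.5},
    price_index={"2016-01": 100.0, "2016-02": 100.0},
)
raw_scores = score(inputs, base_payments=100.0)
final_scores = adjust(raw_scores, inputs, seasonal_factor={"2016-02": 1.1})
publish(final_scores, monitoring_feed)
```

Keeping each step as a separate function means the index model in step 2 can be swapped out without touching collection or publication.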
Economy Scanner: Weekly MMM Index Page
Using a marketing analysis technique to capture anomalies
The marketing mix modeling (MMM) technique, conventionally used in
marketing to measure performance, was used to capture anomalies in
statistics. This technique treats the revenue earned in the
absence of marketing as baseline revenue, and the additional revenue earned
as a result of marketing as marketing effect revenue. It allows
the analyst to detect and remove the revenue generated by
special occasions, such as holidays.
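As a rough illustration of the baseline-plus-effect idea, the sketch below uses the simplest possible case, a single 0/1 event indicator, for which an ordinary least squares fit reduces to two averages. A real marketing mix model would use many more explanatory variables; the function names and figures here are hypothetical.

```python
def decompose(weekly_amounts, event_flags):
    """Split weekly amounts into a baseline level and an event effect.

    With one 0/1 event indicator, ordinary least squares reduces to:
    baseline = mean of ordinary weeks,
    effect   = mean of event weeks - baseline.
    """
    ordinary = [a for a, flag in zip(weekly_amounts, event_flags) if not flag]
    special = [a for a, flag in zip(weekly_amounts, event_flags) if flag]
    if not ordinary or not special:
        raise ValueError("need both ordinary weeks and event weeks")
    baseline = sum(ordinary) / len(ordinary)
    effect = sum(special) / len(special) - baseline
    return baseline, effect


def remove_event_effect(weekly_amounts, event_flags):
    """Subtract the estimated event effect from the weeks in which the event occurred."""
    _, effect = decompose(weekly_amounts, event_flags)
    return [a - effect if flag else a for a, flag in zip(weekly_amounts, event_flags)]


# Illustrative weekly approved amounts, with two holiday weeks flagged.
amounts = [100.0, 102.0, 130.0, 98.0, 128.0]
holiday = [False, False, True, False, True]
baseline, effect = decompose(amounts, holiday)
cleaned = remove_event_effect(amounts, holiday)
```

After the holiday effect is removed, the remaining week-to-week variation is what an analyst would inspect for unusual movements.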
Using MMM Analysis to Detect Unusual Patterns in Weekly Consumption
>> Any consumption above the critical level is regarded as unusual
(Graph: actually approved amount versus approved amount estimate, with the MERS outbreak of May 20, 2015, and onward marked)
Gap = Approved amount estimate - Actually approved amount
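The gap rule in the figure can be expressed in a few lines. This is a sketch only: how the project set its critical level is not stated, so the threshold is simply a parameter here, and the weekly figures are invented for illustration.

```python
def flag_unusual_weeks(estimates, actuals, critical_level):
    """Return (week index, gap) for weeks whose gap exceeds the critical level in size."""
    flagged = []
    for week, (estimate, actual) in enumerate(zip(estimates, actuals)):
        # Gap = approved amount estimate - actually approved amount
        gap = estimate - actual
        if abs(gap) > critical_level:
            flagged.append((week, gap))
    return flagged


# Illustrative data: spending falls well below the estimate in the last two weeks.
estimates = [100.0, 101.0, 102.0, 103.0]
actuals = [99.0, 102.0, 90.0, 88.0]
unusual = flag_unusual_weeks(estimates, actuals, critical_level=5.0)
```

A positive gap means that actual spending fell short of the estimate, as would be expected during a shock such as an epidemic outbreak.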
Enhancing fitness using a flexible data analysis model
Although the quality of data provided by Shinhan Card was impeccable, it
was not formatted according to the standard industrial classification system
used by the Korean government. The recorded locations of numerous
affiliate stores also did not match their actual locations. This problem is
quite common in credit card transaction records provided by various sources.
In order to improve the accuracy of analysis, it was necessary to process and
refine the data. The Bank of Korea’s standard industrial classification system
was applied and the errors in store locations were corrected.
Index scores are generally calculated by entering data for a specific
past period into a fixed statistical analysis model. The weekly
index at the core of this project, however, was calculated by varying the
statistical analysis techniques on the basis of a weekly review of the data.
While this approach required human intervention and judgment, it ultimately
helped to maintain a high degree of fit in the resulting scores. This
approach should be considered in the development of other indices that
need to be updated in a timely manner.
Source (Data Span): Information Provided
Shinhan Card (weekly and monthly)
• Overall value of approved credit/debit card payments
• Value of utility bills paid with credit/debit cards
• Value of approved credit/debit card payments by income quantile
• Value of approved credit/debit card payments by region
• Value of approved credit/debit card payments by age
• Value of approved credit/debit card payments by industry
• Value of approved credit/debit card payments by store size
Statistics Korea (monthly)
• Overall ordinary value of retail spending
• Overall fixed value of retail spending
• Overall CPI
• CPI by category
• CPI by spending purpose
Statistics Korea (quarterly)
• Ordinary retail spending by region
• Consumption spending by fifth income quantile (according to household trend survey)
• Consumption spending by fifth income quantile by product
Bank of Korea (quarterly)
• Household consumption spending by industry
Reduced statistics production lead-time
The graph below shows a comparison of the accuracy of the newly
developed consumption index and the official statistical information.
Comparison with Statistics Korea’s Ordinary Retail Consumption Index
(Graph: monthly index versus retail consumption index, 2013 to 2016)
Accuracy (direction of change, increase over previous and following month)
- The last two years of the analysis period (2013 & 2014): 92% (22nd month and 24th month)
- Test period (2015 onward): 83% (19th month and 23rd month)
This shows that the project index has a 92 percent match with Statistics
Korea’s in comparison to the previous month, and an 83 percent match with
the same month the previous year.
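One plausible way to compute a direction-of-change match rate of this kind is sketched below. It is an assumption about the method, not a description of what the project team did, and the two series are made up for illustration.

```python
def direction(change):
    """Return 1 for a rise, -1 for a fall, and 0 for no change."""
    return (change > 0) - (change < 0)


def direction_match_rate(series_a, series_b, lag=1):
    """Share of periods in which both series move the same way versus `lag` periods earlier."""
    if len(series_a) != len(series_b):
        raise ValueError("series must have equal length")
    pairs = [
        (series_a[i] - series_a[i - lag], series_b[i] - series_b[i - lag])
        for i in range(lag, len(series_a))
    ]
    if not pairs:
        raise ValueError("series too short for this lag")
    matches = sum(1 for change_a, change_b in pairs if direction(change_a) == direction(change_b))
    return matches / len(pairs)


# Illustrative monthly values only.
project_index = [100.0, 103.0, 101.0, 104.0, 106.0]
official_index = [100.0, 102.0, 101.5, 103.0, 102.0]
rate = direction_match_rate(project_index, official_index)
```

With monthly data, `lag=1` compares each month with the previous month, while `lag=12` compares it with the same month of the previous year.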
Moreover, the index developed by this system has the added advantage
of timeliness. It monitors and provides updates on the economic impacts
of diverse national events and emergencies in a prompt manner and also
enables policymakers and businesses to monitor the stimulus effect of
short-term economic measures, such as the Black Friday bargain sale drive.
This will help policymakers make prompt interventions in various economic
situations.
Using private-sector data for the public good
This project provides information that is more detailed than comparable
indices and even covers consumption by income quantile, which is
extremely useful to statistics organizations. Statistics Korea, in fact,
plans to use the information generated by this project in measuring official
indicators of the national economy. Furthermore, it intends to expand the
project model into an official system in the future. The Bank
of Korea, too, plans to use the timely updates to supplement its Consumer
Sentiment Index, as discussed and decided at its meeting on setting
the base interest rate.
5. Future Prospects
Concerns over biases in data
Using only credit card transaction data to monitor changes in the business
cycle makes it impossible to account for the characteristics
of consumers who prefer to pay in cash only. Using only a single
credit card company’s data, moreover, could introduce biases reflecting the
particular characteristics and tendencies of that company’s customers. The
data used in this project were thus submitted to additional checks for
possible biases, which found no statistically significant errors.
As the graph below shows, Shinhan Card’s credit card transaction data have
a correlation coefficient of 0.96 with the corresponding data of other credit
card companies, and of 0.92 with Statistics Korea’s data on retail
consumption (which includes cash transactions). Shinhan Card’s data thus
posed no significant concerns over its reliability.
Monthly Trends in Statistics Korea’s Retail Consumption Index and Shinhan Card’s Credit Card Transaction Data
Statistics Korea’s Retail Consumption Index since 2010 and the monthly trends in Shinhan
Card’s credit card transaction data are correlated by a factor of 0.92. → The data from the
two sources on mid- to long-term trends are highly correlated.
(Graph: Shinhan Card data versus Retail Consumption Index (Statistics Korea), January 2010 to December 2015; correlation factor: 0.92)
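A correlation factor of this kind is commonly computed as the Pearson correlation coefficient between two monthly series. The report does not say which measure was used, so the sketch below is an assumption, and the sample figures are illustrative only.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / math.sqrt(spread_x * spread_y)


# Illustrative monthly series, not actual card or retail data.
card_series = [1.0, 2.0, 3.0, 4.0]
retail_series = [2.0, 4.1, 5.9, 8.0]
r = pearson(card_series, retail_series)
```

A value close to 1 means the two series rise and fall together, which is the sense in which the card data and the retail index are said to be highly correlated.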
Limited expandability due to the particular nature of data used
The index developed by this project is intended to assess overall changes
in the business cycle across Korea. The raw data used was thus processed
on a national scale. It is therefore impossible at present to produce analyses
by region, as this would require data refinement at the regional and local
levels, and development of additional services and features accordingly.
However, as multiple regional and local governments in Korea have also
expressed an interest in this project, an expanded version of the service
system should be developed.
The system developed by this project only calculates index scores and does
not generate analysis reports. While it visualizes index scores in an
intuitive manner, the information it provides still caters to experts rather
than the public. In order to increase the public’s interest in using it, an
additional system is needed that analyzes and explains the economic and
financial data indicated.
Promoting public-private partnership
Shinhan Card intends to use the system to publish analysis reports on the
business cycle on a regular basis. The company is also considering further
cooperation with the Bank of Korea to research future business
cycle changes, develop new indices, and expand the established
system. It also plans to track and improve the reliability and accuracy of
the indicators it uses, and to extend its services to local
governments and research institutions. The data and analysis model
used in this project can also be used to produce other timely indices,
including those on gross domestic product (GDP), the inflation
rate, and tourism.
This project has paved the way for further cooperation between the public
and private sectors on the development and management of major official
indices.
Epilogue
Analysis of data has been taking place in many different fields for centuries,
but this historical fact should not blind us to the novelty of today’s
phenomenon, in which big data analysis is quickly becoming integral to
almost all areas of human activity.
In order to facilitate and promote big data analysis in Korea, the developers
of pioneering big data projects ought to design their projects carefully from
the very beginning so as to minimize trial and error and ensure that each
attempt helps enhance the capability to utilize such data.
NEAR & Future INSIGHT 2017-8
Published on October 20, 2017
Publisher Suh Byung-jo

President of National Information Society Agency
Published by Future Strategy Center, Department of ICT Policy,

Editor Park Jeong-eun Director/Ph.D. (pje@nia.or.kr)

Lee Jung-a Director (leeja@nia.or.kr)
Lee Young-joo Executive Principal/Ph.D. (lyj@nia.or.kr)
Park Ji-young Senior Manager (jiyoung.park@nia.or.kr)
Sung Yeon-ji Researcher (flow56@nia.or.kr)
Address (Daegu Headquarters) 53, Cheomdan-ro, Dong-gu, Daegu,

Republic of Korea
(Seoul Office) 14, Cheonggyecheon-ro, Jung-gu, Seoul,
Republic of Korea
(Jeju NIA Global Center) 68-11, Seohojungang-ro,
Seogwipo-si, Jeju-do, Republic of Korea
Tel +82 53-230-1114

URL http://eng.nia.or.kr
