∣Case Report∣
How Big Data Can Be Used
In the Public Sector
NEAR & Future INSIGHT∣Case Report∣How Big Data Can Be Used In the Public Sector
∙ The publication of any part of this report must be accompanied by an express indication that
this report was produced under the Broadcasting and Communication Development Fund
Project of the Ministry of Science and ICT.
∙ Any unauthorized reproduction of this report is prohibited. If it is reprocessed or cited, the
source must be indicated as the National Information Society Agency (NIA).
∙ The content of this report does not necessarily represent the official views of the NIA.
∙ This report is available on the NIA website (eng.nia.or.kr).
NEAR & Future INSIGHT 2017-8
Written by Song Gyeong-bin, Executive Principal
Future Strategy Center, Department of ICT Policy,
National Information Society Agency (NIA)
Case 1
1. Background
Middle East Respiratory Syndrome (MERS), which broke out in Korea in 2015
and hit the country hard, led to the quarantine of a total of 16,693 persons
suspected of infection: 186 were diagnosed and 38 died. The epidemic also
caused financial losses of KRW 6.3 trillion in Korea at the time.
the number of people with heat stroke, and even brought back cholera, which
had been absent for 14 years and was believed to be all but eradicated.
With the increasing frequency of epidemics and public health crises, there
is growing demand for a more effective system for monitoring and
countering the spread of disease. Korea now needs a proactive system
capable of monitoring and predicting disease in a timely and accurate
manner. The Project for the Development of an Early Patient Detection
System (DEPDS) was launched with a view to developing such a monitoring
and prevention system.
[Figure: DUR system (HIRA). When a prescription is issued, the system
prevents patient exposure to inappropriate drugs, provides real-time safety
information upon prescription, provides a system for proactive monitoring of
inappropriate drugs, and draws on a database of patients’ past treatment
records.]
The data provided by the DUR system, however, lack diagnostic codes.
Prescription data contain only information on provisional diagnoses. To use
the DUR data in epidemic monitoring, it is therefore necessary to infer which
disease each prescription was targeting. The DEPDS Project thus involved
analyzing past billing data and creating a table of the possible diagnoses
indicated by prescription data. The project developers then applied the
patterns of prescriptions with confirmed diagnoses to the real-time data
gathered by the DUR system to identify currently prevailing diseases.
Outcome: The system traces the locations of hospitals used by patients in a given region
to support analysis of their possible movements.
Due to time and financial constraints, the DEPDS Project system was first
developed as a pilot instrument for tracing and monitoring influenza,
foot-and-mouth disease, and scrub typhus only. The system, however, can be
expanded in the future to monitor and help prevent all legally designated
communicable diseases. It is also suited to the early detection of anomalies
in certain regions, such as the infection of multiple people with hepatitis C,
thereby allowing for early intervention.
3. Data Analysis
The same probabilistic process was used in the DEPDS Project. Information
on past diagnoses was used to determine which diseases specific combinations
of prescribed drugs target. The mappings so identified were then applied to
prescription data without confirmed diagnoses.
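The probabilistic step described above can be sketched as a conditional-frequency table. This is only a minimal illustration, not the project's actual implementation: the record layout, the drug names, and the `min_prob` threshold are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical past billing records: (set of prescribed drugs, confirmed diagnosis).
billing_records = [
    (frozenset({"oseltamivir", "acetaminophen"}), "influenza"),
    (frozenset({"oseltamivir", "acetaminophen"}), "influenza"),
    (frozenset({"oseltamivir", "acetaminophen"}), "common cold"),
    (frozenset({"doxycycline"}), "scrub typhus"),
    (frozenset({"acetaminophen"}), "common cold"),
]


def build_diagnosis_table(records):
    """Estimate P(diagnosis | drug combination) from past confirmed cases."""
    counts = defaultdict(Counter)
    for drugs, diagnosis in records:
        counts[drugs][diagnosis] += 1
    table = {}
    for drugs, diag_counts in counts.items():
        total = sum(diag_counts.values())
        table[drugs] = {d: n / total for d, n in diag_counts.items()}
    return table


def infer_diagnosis(table, drugs, min_prob=0.6):
    """Return the most likely diagnosis for a prescription, or None if unsure."""
    probs = table.get(frozenset(drugs))
    if not probs:
        return None  # combination never seen in the past data
    diagnosis, prob = max(probs.items(), key=lambda item: item[1])
    return diagnosis if prob >= min_prob else None


table = build_diagnosis_table(billing_records)
```

A prescription whose combination was never seen, or whose best candidate falls below the assumed threshold, is left without a diagnosis rather than forced into one.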
Data Analysis for Monitoring of Epidemics
The first disease targeted in developing this new system was influenza,
which occurs quite frequently and spreads quickly in Korea. To identify the
patterns of influenza-related prescriptions, only the records carrying a
diagnosis of influenza were extracted. For influenza alone, hundreds of
thousands of different drug combinations had been prescribed. However, some
of these combinations were also used to treat other, similar diseases. It was
thus necessary to verify that the prescription patterns identified targeted
influenza exclusively. The combinations of drugs targeting influenza were
then matched against the DUR data to find all corresponding patterns, so that
the resulting system could monitor influenza outbreaks in real time. The same
process was used for foot-and-mouth disease and scrub typhus.
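The verification and matching steps might look like the following sketch. It is an assumption-laden illustration: the drug names, the record layout, and the `min_share` cutoff for "exclusively" are hypothetical, and drug combinations are assumed to be stored as consistently ordered tuples.

```python
from collections import Counter

# Hypothetical past records: (drug combination, confirmed diagnosis).
past_records = [
    (("antiviral_A", "antipyretic_B"), "influenza"),
    (("antiviral_A", "antipyretic_B"), "influenza"),
    (("antiviral_A", "antipyretic_B"), "influenza"),
    (("antipyretic_B", "cough_syrup_C"), "influenza"),
    (("antipyretic_B", "cough_syrup_C"), "common cold"),
    (("antipyretic_B", "cough_syrup_C"), "common cold"),
]


def exclusive_patterns(records, disease, min_share=0.9):
    """Keep combinations whose past use was almost always for `disease`."""
    total = Counter(combo for combo, _ in records)
    hits = Counter(combo for combo, diag in records if diag == disease)
    return {combo for combo in hits if hits[combo] / total[combo] >= min_share}


def count_suspected_cases(stream, patterns):
    """Count prescriptions matching a pattern, per (region, day)."""
    counts = Counter()
    for region, day, combo in stream:
        if combo in patterns:
            counts[(region, day)] += 1
    return counts


flu_patterns = exclusive_patterns(past_records, "influenza")

# Hypothetical real-time prescriptions: (region, day, drug combination).
dur_stream = [
    ("Seoul", "2016-12-01", ("antiviral_A", "antipyretic_B")),
    ("Seoul", "2016-12-01", ("antipyretic_B", "cough_syrup_C")),
    ("Busan", "2016-12-01", ("antiviral_A", "antipyretic_B")),
    ("Seoul", "2016-12-01", ("antiviral_A", "antipyretic_B")),
]
suspected = count_suspected_cases(dur_stream, flu_patterns)
```

The second combination is dropped because it was prescribed for the common cold more often than for influenza, so counting it would inflate the influenza estimate.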
The developers of the DEPDS Project launched it with the initial goal of
identifying and monitoring at least 10 of the major epidemic diseases in
Korea, but had to end the project after designing it to target only three
illnesses. The quantity of data the project was given was far greater than
any of the developers had imagined, so the analysis itself took up the bulk
of the time. Identifying the prescription patterns for a single disease took
more than half of the project’s schedule. Moreover, the statistical analysts
in charge of the project frequently had to seek the advice of medical experts
while analyzing and interpreting medical data.
There were initial plans to apply machine learning to the project so that
prescription patterns could be identified without human intervention at every
turn. For a machine learning model to identify patterns on its own, however,
the data had to converge on a limited number of consistent patterns. The
actual data instead showed that doctors’ prescriptions were far more diverse
and inconsistent than expected, and machine learning was ultimately
abandoned. Nevertheless, given the emergence of new drugs and changing trends
in diagnoses and treatments, it will be necessary to automate the pattern
identification task at some point in the future. Additional research will be
needed to achieve that automation.
4. Expected Outcomes
Notwithstanding the slight lag, the graphs closely overlap each other. Note
that the DEPDS Project’s information on suspected cases predates the
confirmed cases reported by KCDC by one to two weeks. This is because KCDC
relies on post-diagnostic information reported by doctors themselves, while
the DEPDS Project estimates the likely number of patients from real-time
data. Because many doctors do not confirm a diagnosis until their second or
third examination of a patient, KCDC data are bound to lag by a week or two.
The system developed by the DEPDS Project can therefore identify epidemics,
and enable a response to them, at least one week before KCDC can.
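One way to quantify such a lead time is to shift one weekly series against the other and keep the shift with the highest correlation. The sketch below does this with made-up weekly counts. The function names and the numbers are hypothetical, and the report does not state that the project measured the lag this way.

```python
def pearson(xs, ys):
    """Pearson correlation of two equally long sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def best_lead(early, late, max_lag=4):
    """Return (lag, correlation) for the shift at which `early` best matches `late`."""
    best_lag, best_r = 0, -2.0
    for lag in range(max_lag + 1):
        # Compare early[t] with late[t + lag].
        r = pearson(early[: len(early) - lag], late[lag:])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r


# Hypothetical weekly counts: the confirmed series repeats the estimate one week later.
estimated = [5, 12, 30, 55, 40, 20, 10, 6, 4, 3]
confirmed = [2, 5, 12, 30, 55, 40, 20, 10, 6, 4]
lag, r = best_lead(estimated, confirmed)
```

With real surveillance data the two curves would not be identical copies, so the best correlation would fall below 1 and the estimated lag should be read together with that value.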
KCDC intends to use the system developed by this project as part of its
Integrated Epidemic Control System. The majority of infectious diseases
whose spread and prevalence cannot be easily predicted require that the
government secure and distribute sufficient quantities of vaccine in a timely
manner. The fact that this project’s system can identify and respond to
epidemic outbreaks at least a week earlier than the system in use at present
will significantly help the Korean government secure sufficient supplies of
needed vaccines and treatments.
The value of this project has already been well recognized, and beginning
in 2017, the system will be applied to a significantly expanded range of
infectious and other types of diseases that threaten public health.
to ensure rational and flexible responses to major health crises before
banning any sharing of information whatsoever.
It is better for big data projects to start small, with short schedules in
mind, than to begin as full-scale projects. Keeping them small is crucial to
verifying their potential. The experience of such small successes is needed to
form the basis for the effective use of big data.
This project was run as a small-scale, short-run pilot project targeting only
a few diseases. Its success, nonetheless, was enough to affirm the potential
of a full-scale project. This project is also significant in that it
established a rational, stage-by-stage model through which big data projects
should be undertaken.
[Figure: diagram of data sources, services, and recipients.
Data sources: medical records on the entire population; DUR data;
meteorological information on climate change; resident registration data.
Services: real-time disease prediction service; public signal service.
• Provides basic information for conducting pathological research on
  suspected organizations and regions
• Provides early alarms in the event of outbreaks and spread
• Supports policy measures to prevent epidemics, including a priori
  quarantining of regions likely to develop diseases due to environmental
  factors
Recipient: Government (Ministry of Health/KCDC/Ministry of Food & Drug
Safety, etc.)]
MERS was a public health disaster in Korea, with a fatality rate of 20.4
percent, mainly because the Korean government was unable to intervene
in the early stages of the outbreak due to the inaccessibility of information.
It is critical to establish an effective system of collaboration between
departments and a nationwide system to counter epidemics in their early
phases promptly and effectively. The public health authorities in Korea at
first could not trace the movements of persons diagnosed with MERS, and
had to rely on private telecommunication service providers for information.
Case 2
1. Background
According to statistics from the National Police Agency (NPA), over 230,000
car accidents take place in Korea every year. The Korea Transport Institute
(KOTI) estimates that these accidents cost Korean society KRW 42 trillion
a year.
Car accidents cause massive losses at the individual level and impose serious
socioeconomic costs on the entire nation. Multiple organizations have
created and expanded transportation safety infrastructure to prevent car
accidents. Nevertheless, the number of car accidents has been growing
steadily by three to four percent every year, calling for innovative
and radical solutions.
Why should car accidents get reported only after the fact?
The Traffic Broadcasting Network (TBN) and its channels nationwide provide
useful information on road conditions around the clock, reporting on major
accidents, ongoing construction, and other planned and sudden conditions
interfering with traffic. However, the network broadcasts these events only
during or after the fact, leaving listeners to grasp the risks involved
themselves. Of course, the current structure of information broadcasting
is useful as it is: it helps drivers anticipate road conditions, choose
alternative routes where possible, and take care not to get into an accident.
Nevertheless, it provides information in a passive, ex post manner.
TBN Daegu of the RTA decided to tackle this problem from a novel
perspective. It raised a bold question: Why should car accidents get reported
only after the fact? With the massive amounts of data at our disposal today,
why not try to predict likely accidents beforehand? Certainly, giving drivers
information on likely risks and accidents would do far more to prevent them
than simply reporting accidents that have already occurred.
The experiment may sound all too futuristic, like something that could be
realized only in the world of the movie Minority Report. The vision, however,
is not so outlandish. It is simply a service system capable of analyzing
accumulated data to predict the risk of accidents by location and time using
a probabilistic approach. The concept is quite simple, but it could make a
significant difference in people’s everyday lives. After all, the system is
designed to warn against, and thereby prevent, car accidents.
Web Page for Real-Time Reports on Car Accidents Nationwide
The data on traffic situations that TBN channels have accumulated over time
formed the primary source of information for the system. These data were
combined with secondary data on car accidents and weather conditions provided
by the Road Traffic Authority (RTA) and the National Meteorological
Administration (NMA) to assess the likelihood of accidents by location, hour,
and weather conditions. The resulting index of car accident risk was used to
provide easily understood information for the public, such as the Daily Risk
Index (for the morning and the afternoon), the 10 Riskiest Spots for Car
Accidents, the riskiest hours, and daily updates on the spots where accidents
were likely.
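A simplified version of such an index can be built from plain frequencies: accidents at each spot under a given condition, divided by how often that condition was observed. The sketch below is an illustration only. The spot names, the counts, and the `exposure` table are hypothetical, and the project's actual index came from a more elaborate model.

```python
from collections import Counter

# Hypothetical accident log: (spot, hour band, weather).
accidents = [
    ("Junction A", "morning", "rain"),
    ("Junction A", "morning", "rain"),
    ("Junction A", "afternoon", "clear"),
    ("Bridge B", "morning", "clear"),
    ("Bridge B", "afternoon", "rain"),
    ("Tunnel C", "afternoon", "clear"),
]

# Hypothetical number of observed periods for each (hour band, weather) condition.
exposure = {
    ("morning", "rain"): 10,
    ("morning", "clear"): 40,
    ("afternoon", "rain"): 10,
    ("afternoon", "clear"): 40,
}


def risk_by_spot(accident_log, periods_observed, hour_band, weather):
    """Accidents per observed period at each spot under the given condition."""
    counts = Counter(
        spot for spot, h, w in accident_log if (h, w) == (hour_band, weather)
    )
    periods = periods_observed[(hour_band, weather)]
    return {spot: n / periods for spot, n in counts.items()}


def riskiest_spots(risks, top_n=10):
    """Spots ordered from highest to lowest risk; ties are broken alphabetically."""
    return sorted(risks, key=lambda spot: (-risks[spot], spot))[:top_n]


morning_rain = risk_by_spot(accidents, exposure, "morning", "rain")
afternoon_clear = risk_by_spot(accidents, exposure, "afternoon", "clear")
```

Dividing by exposure matters: rainy mornings are rare in this made-up data, so two accidents in ten rainy mornings signal more risk than one accident in forty clear afternoons.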
This project is remarkable for establishing a system that immediately uses
the results of big data analysis to provide objective and convincing
information for the public, with the people in Daegu and Busan being the
first such audiences. This pilot operation proved the potential for success
of the service model, which will be expanded into the Seoul-Gyeonggi
region before the rest of the nation.
3. Data Analysis
The concept of data analysis modeling involves identifying the variables that
influence a given problem, assigning appropriate weights to the variables,
and developing a formula to add up the relevant terms. This process sounds
deceptively simple, but it is not so easy to define relevant factors and
accurately assess their respective importance.
The machine learning technique applied to the analysis model in this project
is the Bayesian network, which is a complex technique that first identifies
the core variables influencing a given problem, and then defines the
correlations between them before deciding the weight to be assigned to
each variable. Despite its complexity, the Bayesian network technique was
used because it works relatively well with incomplete data and can reflect
the causal relationship between variables. The variables used in car accident
analysis included the number of accidents by month, day of the month,
day of the week, temperature, precipitation, humidity level, and the
discomfort index.
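To make the idea concrete, here is a toy Bayesian network with two parent variables (rain and rush hour) and one child (accident), queried by exhaustive enumeration. The structure and every probability are invented for illustration and are far simpler than the variables listed above. The point is only that unobserved variables can be summed out, which is how such a model copes with incomplete data.

```python
from itertools import product

# Hypothetical network: Rain -> Accident <- RushHour, with made-up probabilities.
P_RAIN = {True: 0.2, False: 0.8}
P_RUSH = {True: 0.3, False: 0.7}

# P(Accident = True | Rain, RushHour)
P_ACCIDENT = {
    (True, True): 0.10,
    (True, False): 0.05,
    (False, True): 0.04,
    (False, False): 0.01,
}


def joint(rain, rush, accident):
    """Joint probability of one full assignment of the three variables."""
    p_acc = P_ACCIDENT[(rain, rush)]
    return P_RAIN[rain] * P_RUSH[rush] * (p_acc if accident else 1 - p_acc)


def prob_accident(evidence):
    """P(Accident = True | evidence), summing over any unobserved parents.

    `evidence` may contain the keys "rain" and/or "rush" with boolean values.
    """
    numerator = denominator = 0.0
    for rain, rush in product([True, False], repeat=2):
        if evidence.get("rain", rain) != rain or evidence.get("rush", rush) != rush:
            continue  # assignment contradicts what was observed
        numerator += joint(rain, rush, True)
        denominator += joint(rain, rush, True) + joint(rain, rush, False)
    return numerator / denominator
```

Calling `prob_accident({"rain": True})` works even though rush hour is unknown, because the function averages over both rush-hour states weighted by their prior probabilities.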
First, the past data on car accidents accumulated by TBN, the NPA, and
other such authorities were used to analyze the frequency of accidents by
location, hour, and day of the week. The accident data were also matched
with corresponding meteorological data, such as temperature and humidity
level, in order to analyze the correlation between car accidents and weather
conditions. The weighting of each variable was then determined. Moreover,
the “variability” of the risk index at each given spot was used as an
additional variable so that accidents could also be predicted at spots where
they are usually unlikely. Time factors, locations, and weather data were
then added to this model to estimate the probability of car accidents.
Like many other big data projects, this project required extensive
processing and refinement of the given data. Free-text descriptions of
accident locations, such as “From Dongsingyo Bridge on Gukchaebosang-ro toward
the Jonggak Negeori Junction, in front of the Jung-gu District Office,” had to
be converted into coordinates. The morphemes and entity names in this natural
language text also had to be processed further. Because the TBN channels in
Daegu and Busan did not use consistent data formats, all relevant data also
had to be refined.
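A much simpler stand-in for that conversion is a lookup against a table of known landmarks. The sketch below uses a regular expression and a small gazetteer instead of the morpheme and entity analysis the project describes. The gazetteer and its coordinates are placeholders, not real positions.

```python
import re

# Hypothetical gazetteer; the coordinates are placeholders, not real positions.
GAZETTEER = {
    "jung-gu district office": (35.00, 128.00),
    "dongsingyo bridge": (35.01, 128.01),
}


def normalize(text):
    """Lowercase and collapse whitespace so formatting variants compare equal."""
    return re.sub(r"\s+", " ", text.strip().lower())


def locate(report):
    """Return (landmark, coordinates) for a free-text location, or None.

    A landmark named after "in front of" is preferred; otherwise the first
    known landmark mentioned anywhere in the text is used.
    """
    text = normalize(report)
    match = re.search(r"in front of (?:the )?([^,.]+)", text)
    if match:
        name = match.group(1).strip()
        if name in GAZETTEER:
            return name, GAZETTEER[name]
    for name, coords in GAZETTEER.items():
        if name in text:
            return name, coords
    return None


example = (
    "From Dongsingyo Bridge on Gukchaebosang-ro toward the Jonggak Negeori "
    "Junction, in front of the Jung-gu District Office"
)
```

Normalizing before matching is what lets reports from different stations, with different spacing and capitalization, resolve to the same landmark.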
[Table: Data Used in Analysis. Columns: Type, Source, Quantity, Time Span,
Information Provided.]
4. Expected Outcomes
The broadcast information may be simple in structure, merely telling
listeners about the likely spots for car accidents on the basis of past data.
Listeners, however, pay attention because the program provides scientific and
objective information based on analysis of millions of data items. Drivers
who are near or passing through the forecast accident spots, in particular,
will naturally exercise extra caution. This, in turn, will help reduce car
accidents.
The project has been running for only a short period of time so far, and
requires continued monitoring in the future. Nevertheless, since the program
started airing in 2016, the car accident death toll has dropped by six
percent and the number of injured persons by 10 percent, while the total
number of accidents in downtown Daegu has decreased by nine percent. This is
significant, given that the number of car accidents is increasing by three
or four percent annually elsewhere in Korea.
※ Figures in parentheses represent the percentage increase or decrease over the previous year.
The data for 2016 span the months January through October only.
The RTA expects that the system developed by this project can significantly
help reduce the number of car accidents in Korea when it starts services
nationwide.
Success factors
Two main factors led to the success of this project. The first is the
clarity of the objective. Neither the quality of the data used nor the
sophistication of the analysis model was as important. The project was
conducted with the sole and very specific objective of generating forecasts
of likely car accidents and delivering them on the radio in less than a
minute. The clarity of this objective guided the analysis and assessment of
car accident risk in certain cities by hour and location.
The dilemma of accuracy
Slight errors in forecast would not land a pilot project like this one in a heap
of criticism. After all, it is better to be safe than sorry. Car accident forecasts
are fundamentally intended to warn drivers so they can avoid accidents.
The accuracy of the forecasts is of secondary concern. The most important
indicator of performance should therefore be how effective it has been in
reducing car accidents.
The limits inherent to raw data often impose great constraints on the data
analysis process itself. Even with extra handling, such as deleting erroneous
data and replacing omissions with estimates, it is nearly impossible to
overcome these limitations.
This project was able to forecast car accidents only at spots where accidents
occur with a certain frequency. Forecasting accidents at less accident-prone
spots would have required far more data. As the project developers sought to
provide forecasts on likely accidents throughout the given area, this
insufficiency presented a major obstacle. The results of simulating the
forecasting accuracy of this project’s data analysis model are listed below.
many accidents (data). However, forecast accuracy declines for spots with a
record of few accidents due to the insufficiency of data. Busan, which has
far more accidents per location than Daegu, afforded relatively accurate
forecasts for the 100 most frequent accident spots. In Daegu, however,
forecast accuracy fell below the reliable level from the 50th most frequent
spot onward because less data was available. Of course, spots with a record
of few accidents are unlikely to figure at the high end of the list of likely
accident spots. Nevertheless, to ensure wide use of the model in the future,
it is important to devise a finer-tuned analysis technique that improves
accuracy for these spots as well.
The quality of data is central to the success of big data projects. It is not
uncommon for such projects to falter and fail in their early stages due to
less-than-satisfactory data quality. Although big data analysis has emerged
as a pivotal paradigm in technology today, data used to be accorded only
secondary importance until recently, necessary insofar as it supported
application services. Policymakers neglected its importance as a resource,
and failed to prepare and develop systemic infrastructure in advance.
Numerous types of data generated today are of such inferior quality that
they cannot be used in proper analysis. This reflects the absence of a
standard system or rules concerning their collection and generation. Much
data today is generated according to arbitrary rules established by past
system designers or administrators. Even within a single system, different
devices generate data of different formats. The absence of a well-organized
system for managing collected data also leads to its loss or the creation
of unsorted data that cannot be restored successfully. The lack of
consistency in data significantly undermines its usability, whether in the
public or private sector. It is therefore critical for organizations to prepare
and enforce systems that will ensure the systematic generation and
management of quality data.
The data analysis model developed in this project will evolve and expand
into a better one capable of reflecting diverse local and regional
characteristics in car accidents. The Korean Ministry of Science, ICT, and
Future Planning (MSIP) has recognized the utility and potential of this service
model, and begun to consider supporting its development into a national
version. Negotiations are currently under way to expand the scope of radio
broadcasting services out of Daegu and Busan into Daejeon, Gwangju,
Incheon, and other such major cities, while making the car accident
forecasts available on personal GPS devices in the Seoul-Gyeonggi region.
The RTA intends to complete the evolution of the model into a national
system by 2019, while also extending the reach of forecasts from downtown
areas onto major expressways.
Numerous attempts have been made to diversify the applications of big data
analysis, but few have yet attained tangible success. This project provides
an inspiring example of how big data analysis can be used to promote public
safety. The radio broadcasts will raise public awareness of the value of big
data, and also catalyze the use of its analysis in many diverse fields
beyond traffic safety.
Case 3
“KOTRA provides the Trade Map on its Korean website, through partnership
with the ITC.”
“According to a recent report, mixers and grinders are enjoying growing
popularity at trade centers in Bangkok (Thailand), Hanoi (Vietnam), and
Chungking (China).
2. What Was Produced
All exporting businesses want to know which products they should sell and
to which countries. A myriad of factors should be considered to answer
these questions, including which countries have the demand for what
products, how large the markets are, what the tariff and exchange conditions
are, whether there are any political issues between those countries and
Korea, what the cultural and customary characteristics of those countries
are, and how far those countries are, geographically, from Korea. These
KOTRA DBAES
The data and information provided by these sources are of immense value
to exporting SMEs. At present, however, they remain fragmented and
compartmentalized. It is difficult for SMEs to find the best sources of
information they need. Even if they succeed in finding that information,
accurately understanding and analyzing it would still require the skills of
expert analysts that most SMEs do not have the financial wherewithal to
employ.
DBAES at Work: Sample Page
An SME intent on using this service can enter the name of the product
it wishes to export into the website. The website then automatically displays
a list of recommended target countries for the product, as well as detailed
information on why those countries are recommended.
The company decided to target Taiwan and attempted to export its products
there, but to no fruitful end.
➡
The company sought the GCL consultation service of KOTRA and accordingly
changed its target country to India, successfully exporting USD 20,000 worth
of goods as of the latter half of 2016.
3. Data Analysis
In order for KOTRA to provide information and consultation specific to each
business, it first needs to know the business in detail. KOTRA already
possesses data on the global competence level (GCL) of 20,000 or so of
the 90,000 exporting SMEs in Korea. The GCL data includes details on each
company’s management status, personnel capacity, annual revenue, share
of revenue made up by exports, and the types of products it exports. KOTRA
uses this data to assess the export capacity of each business first.
Next, KOTRA also needs to ascertain the trade conditions of all the countries
with which Korea trades, and make detailed assessments of each. This
process requires consideration of a wide range of information. Accordingly,
KOTRA has developed the Country-Specific Market Appeal Index (CMAI)
to evaluate the potential market appeal of various export products on each
country’s market. This index shows accessibility, attractiveness, growth
potential, and competitiveness of each market abroad. Accessibility is
measured in terms of the tariff rate, trade barriers (including import-
restricting policies) and physical distance from Korea. Attractiveness is
measured in terms of market size and popularity of the given product.
Growth potential is measured using the rates of increase or decrease in
the quantities of the given product a market imports and the rates of change
in the market’s imports from Korea in general. Finally, competitiveness is
measured using the trade balance associated with the given product as well
as its recent performance on the market.
Once the CMAI score is obtained, the GCL scores of the given business
and of the product it wishes to export are added to identify the
recommended target countries. As the DBAES system already stores most
of the information needed to assess each business’ export potential, the
user can simply enter the given business ID and the product code into the
website to generate information well-suited to the needs and characteristics
of that business. For instance, the system recommends markets with
relatively high entry barriers, but that offer great potential, such as Canada
and China, for a lipstick manufacturer with a strong export record. The
system recommends other markets that are smaller and more accessible,
such as Malaysia, for another lipstick manufacturer with a smaller export
record. The system also provides guidance on other support programs and
resources available from the Korean government for small exporters.
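The interplay between market scores and a firm's export capacity can be illustrated with a weighted sum. Everything in the sketch below is assumed for illustration: the dimension scores, the 0-to-1 scale, and the rule that weaker exporters weight accessibility more heavily. The report does not disclose the actual CMAI weights or how GCL scores are combined with them.

```python
# Hypothetical dimension scores (0 to 1) for one product in three markets.
MARKETS = {
    "Canada": {"accessibility": 0.3, "attractiveness": 0.9,
               "growth": 0.7, "competitiveness": 0.6},
    "China": {"accessibility": 0.2, "attractiveness": 1.0,
              "growth": 0.8, "competitiveness": 0.5},
    "Malaysia": {"accessibility": 0.9, "attractiveness": 0.4,
                 "growth": 0.5, "competitiveness": 0.6},
}


def weights_for(gcl_score):
    """Assumed rule: the weaker the exporter (gcl_score near 0), the more
    accessibility matters; the remaining weight is split evenly."""
    access = 0.7 - 0.6 * gcl_score
    rest = (1.0 - access) / 3
    return {"accessibility": access, "attractiveness": rest,
            "growth": rest, "competitiveness": rest}


def recommend(markets, gcl_score, top_n=2):
    """Rank markets by weighted score for a firm with the given GCL score."""
    weights = weights_for(gcl_score)
    scores = {
        country: sum(weights[dim] * dims[dim] for dim in weights)
        for country, dims in markets.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

With these made-up numbers, `recommend(MARKETS, 1.0)` favors the large but hard-to-enter markets, while `recommend(MARKETS, 0.0, top_n=1)` points a firm with a weak export record to the most accessible one, mirroring the lipstick example in the text.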
Linking the GCL & Business Data for Comprehensive Analysis
Not only was it difficult to collect a wide range of diverse data and information
necessary for comprehensive analysis, it was even more difficult to determine
and assign accurate weights to all the variables involved. The four dimensions
of the CMAI―accessibility, attractiveness, growth potential, and
competitiveness of the given market―were crucial to the accuracy of analysis,
and required extensive consideration. There were numerous variables involved,
but the correlations between these variables were also significant, as was the
complexity of those correlations. In order to design a reliable analysis model,
KOTRA sought the participation of academic experts through a separate
research project. The analysis model thus developed was submitted to rigorous
verification with a view to minimizing errors and distortion.
As with most other big data projects, inconsistency and poor data quality
also made things time-consuming and costly. Integrating data from multiple
sources often raises problems over inconsistency. Organization A, for
example, possesses information on manufacturers of certain products, while
Organization B possesses information on the demand for the same products
in each major country. The two organizations, however, use different product
codes or versions of data. In the meantime, Organization C also claims to
possess information on the demand for given products, but its data do not
match Organization B’s demand data. Data analysts spend extensive amounts
of time processing and refining this inconsistent data. The refining process
alone took up more than a month of the given schedule in this project.
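The code mismatch between Organization A and Organization B is typically bridged with a concordance table. The sketch below shows the idea with hypothetical product codes, maker names, and demand figures. Records that cannot be matched are reported rather than silently dropped, since they are the ones that need manual review.

```python
# Hypothetical data: A lists manufacturers and B lists demand, under different codes.
manufacturers_a = {
    "A-1001": ["Maker X", "Maker Y"],
    "A-1002": ["Maker Z"],
    "A-1004": ["Maker W"],
}
demand_b = {
    "330410": {"Canada": 120, "Malaysia": 40},
    "850940": {"Thailand": 75},
}

# Hypothetical concordance table from A's codes to B's codes.
code_map = {"A-1001": "330410", "A-1002": "850940", "A-1003": "999999"}


def link_supply_to_demand(manufacturers, demand, concordance):
    """Join A's manufacturer lists to B's demand data via the concordance.

    Returns (linked, unmatched): `linked` is keyed by B's code, and
    `unmatched` lists A codes with no mapping or no demand record.
    """
    linked, unmatched = {}, []
    for code_a, makers in manufacturers.items():
        code_b = concordance.get(code_a)
        if code_b in demand:
            linked[code_b] = {"manufacturers": makers, "demand": demand[code_b]}
        else:
            unmatched.append(code_a)
    return linked, unmatched


linked, unmatched = link_supply_to_demand(manufacturers_a, demand_b, code_map)
```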
Type: Market information
• Overseas Market Information (source: KOTRA): information on the
  performance of export products
• Worldwide Export Statistics (source: UN Comtrade, etc.): global trade
  statistics provided by UN Comtrade, ITC, etc.
4. Expected Outcomes
Not every segment of Korean society possesses a system for making timely
and informed decisions on the basis of big data analysis. DBAES is thus
expected to play a pioneering role in building and expanding the information
infrastructure for exporting SMEs in Korea that struggle to find and access
quality information on trade and exports, and to help them cut the costs
and time required for data analysis. KOTRA intends to undertake similar
projects in the future to help exporting SMEs in Korea grow into strong
players on the international market.
[Figure: Services tailored to GCL. Labels: enhancing export capability;
enhancing global competence; establishing new global standards (fostering
businesses capable of setting global standards).]
Paving the way for business innovation with big data analysis
The DBAES Project therefore helped to bring the quality of this information
infrastructure up a notch, greatly boosting KOTRA’s confidence in its ability
to make use of big data, and inspiring it to launch plans for future
collaborative projects in this field with various other agencies and
organizations on matters not limited to exports, but also including the
economy and trade in general.
5. Future Prospects Are
With the public’s improved understanding and awareness of big data, the
inconsistency in the sizes of data or the types of data-collecting platforms
has been on the decline. Few today think of big data projects as something
to be handled exclusively by ICT experts. Organizations intent on enhancing
their analysis capacity consider increasing the number of statisticians, data
engineers, and data analysts they hire. It is important to bring these
human resources in-house in order to enhance an organization’s big data
capacity on a continuing basis.
However, equally important to ensure the effective use of big data analysis
is making continued use of existing experts who continue to apply traditional
approaches to problem-solving. These human experts have accumulated
professional experience and knowledge in their respective fields over
decades, and are now seasoned veterans at making reliable decisions using
their unique and traditional methods. Big data may seem to present radically
innovative solutions and even appear capable of catching up with these
human experts’ performance in short spans of time. No big data projects,
however, can ultimately succeed without the active participation of these
human experts. Such projects can maximize their prospects for success
when human experts take on a leading role, with data analysts providing
help as needed.
The DBAES Project, too, made effective use of the consulting records and
data that multiple expert consultants had accumulated over the years. The
data analysis model itself was designed by taking into account their opinions.
These experts were also the first to try the pilot system and suggest
improvements. Although some expressed doubts regarding the utility of the
DBAES system at first, continued interaction and discussion with them did
much to abate their initial resistance. The collaboration
between these expert consultants and the data specialists went a long way
toward maximizing the effectiveness of the resulting system and improving the
quality of advice provided to exporting SMEs.
Copyright issues interfering with the wide use of the DBAES system
Other sources also restrict the public disclosure of their data, “except for
public purpose and in the public interest.” This, too, invites controversy over
what the abstract idea of “the public interest” actually encompasses, and
therefore encourages developers to settle for a very conservative and narrow
reading of the restriction.
The use of diverse forms of data often raises such copyright issues. The
lack of consistency among data providers’ policies on the scope of public
disclosure often presents an obstacle to big data projects. As the data
industry grows, copyright disputes among data providers will likely increase.
It is therefore important to prevent these disputes by enacting proper
legislative and policy guidelines.
The DBAES Project, while still in its nascent stage, has allowed KOTRA to
bring together fragmented data from diverse sources and thereby generate
new analyses and insights necessary to provide quality and trustworthy
information for exporting SMEs. KOTRA intends to accumulate more data
for the system and expand its functions and services so that businesses
themselves can conduct the analysis they need of diverse export conditions.
DBAES is expected to help SMEs increase their exports and explore new
markets overseas, while also prompting policymakers to develop new
measures to support their global expansion. Quality information is key to
saving the 90,000 exporting SMEs in Korea from additional unnecessary
hardship.
Case 4
1. Background
Although Statistics Korea and related agencies have been struggling to
overcome these limitations by diversifying the range of data they collect,
it is still difficult to find a database that affords prompt analysis of the
business cycle.
Credit card transaction data are well suited as a source of timely and accurate
information on changes in the business cycle. Almost 73 percent of all
transactions in 2016 in Korea were made with credit cards. This project
accessed and analyzed the credit card transaction data of Shinhan Card,
which holds the largest market share among credit card companies in Korea,
in order to develop a
domestic consumption index that provides business cycle information by
region, industry, and income quantile on a weekly and monthly basis.
Economy Scanner: A system monitoring the business cycle 24/7
The first phase of the project involved the Korea Environmental Economics
Association (KEEA) and Shinhan Card co-developing a business cycle index
on the basis of card transaction data. The index was based on six monthly
indicators (overall, local, industrial, income quantile, affiliate store size, and
age) and two weekly indicators (overall and income quantile). In the second
phase of the project, the weekly indicators were expanded to six, matching
the monthly indicators. In addition, 35 service pages were developed to
enable the user to calculate the index scores automatically and visualize
them.
3. Data Analysis
The consumption index at the heart of this project was developed by a team
of expert economists associated with the KEEA, using data provided by
Shinhan Card, the Credit Finance Association, Statistics Korea, and the Bank
of Korea, and testing the diverse models and approaches in the process.
The resulting consumption index compares the current volume of retail
consumption in Korea to that of 2010, and adjusts it by taking into account
the rate of inflation and seasonal conditions. The raw data on credit card
transactions was modified and processed into data sets by age, income
quantile, affiliate store size, and the like. For accuracy, data on non-retail
transactions, such as utility bill payments and wholesale transactions, were
excluded from analysis.
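The calculation described above can be sketched roughly as follows. This is only an illustrative sketch, not the model the KEEA team actually built: the function names, the category labels, the use of a price index to deflate spending, and the multiplicative seasonal factor are all assumptions made for the example.

```python
# Hypothetical category labels for the non-retail transactions that the
# report says were excluded (utility bill payments, wholesale transactions).
NON_RETAIL = {"utility_bill", "wholesale"}


def retail_total(transactions):
    """Sum approved amounts, skipping non-retail categories."""
    return sum(t["amount"] for t in transactions
               if t["category"] not in NON_RETAIL)


def consumption_index(nominal, price_index, seasonal_factor, base_real):
    """Express current retail spending relative to the base year (2010 = 100).

    nominal         -- retail spending in current prices
    price_index     -- price level with the base year set to 100 (assumed)
    seasonal_factor -- multiplicative seasonal effect for the period (assumed)
    base_real       -- comparable retail spending in the base year
    """
    real = nominal / (price_index / 100.0)   # remove the effect of inflation
    adjusted = real / seasonal_factor        # remove the seasonal effect
    return 100.0 * adjusted / base_real


sample = [
    {"amount": 1000, "category": "grocery"},
    {"amount": 320, "category": "restaurant"},
    {"amount": 500, "category": "utility_bill"},  # excluded as non-retail
]
spend = retail_total(sample)
index = consumption_index(spend, price_index=110, seasonal_factor=1.2,
                          base_real=1000)
```

With these made-up figures, the inflation and seasonal adjustments cancel out the apparent growth in nominal spending, so the index stays at roughly the base-year level of 100.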
Service Description
The automation of index scoring proceeded in four main steps. First, the
necessary data was collected, including the value of approved card payments
and other information provided by the Credit Finance Association and
Statistics Korea. Then it was entered into the index model to calculate the
index scores. The scores were then adjusted for the inflation rate and
seasonal effects. The final scores were then sent to the monitoring
system.
Using MMM Analysis to Detect Unusual Patterns in Weekly Consumption
>> Any consumption above the critical level is regarded as unusual
Gap = Approved amount estimate - Actually approved amount
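The gap rule in the figure can be illustrated with a short sketch. The report does not describe how the approved-amount estimates are produced or how the critical level is set, so both are simply taken as inputs here. The function name and the choice to compare the absolute gap against the critical level are assumptions.

```python
def flag_unusual_weeks(estimated, actual, critical_level):
    """Return (week, gap) pairs whose gap exceeds the critical level.

    gap = approved amount estimate - actually approved amount
    Comparing the absolute gap is an assumption; the report only says that
    consumption beyond the critical level is regarded as unusual.
    """
    flagged = []
    for week, (est, act) in enumerate(zip(estimated, actual)):
        gap = est - act
        if abs(gap) > critical_level:
            flagged.append((week, gap))
    return flagged


# Made-up weekly figures: week 2 shows far more spending than estimated.
estimated = [100, 105, 110, 108]
actual = [98, 104, 140, 109]
unusual = flag_unusual_weeks(estimated, actual, critical_level=10)
```

In this toy series only week 2 is flagged, with a gap of -30, meaning that actual approvals exceeded the estimate by 30.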
4. Expected Outcomes
[Figure: Monthly index and retail consumption index, covering the last two years of the analysis period (2013 & 2014) and the test period (2015 onward)]
This shows that the project index has a 92 percent match with Statistics
Korea’s index on a month-on-month basis, and an 83 percent match on a
year-on-year basis.
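The report does not define how the "match" between the two indices is computed. One plausible reading, assumed here purely for illustration, is directional agreement: the share of months in which both indices moved the same way relative to the previous month (lag of 1) or to the same month of the previous year (lag of 12). The function name and sample values are hypothetical.

```python
def direction_match_rate(index_a, index_b, lag):
    """Share of periods in which both indices rose, or both did not rise,
    compared with the value `lag` periods earlier."""
    n = min(len(index_a), len(index_b))
    if n <= lag:
        raise ValueError("series too short for the requested lag")
    matches = 0
    for t in range(lag, n):
        rose_a = index_a[t] - index_a[t - lag] > 0
        rose_b = index_b[t] - index_b[t - lag] > 0
        if rose_a == rose_b:
            matches += 1
    return matches / (n - lag)


# Made-up monthly values for the two indices.
project_index = [100, 102, 101, 103, 104]
official_index = [100, 101, 102, 103, 105]
rate = direction_match_rate(project_index, official_index, lag=1)
```

Here the two series agree in three of the four month-on-month comparisons, giving a rate of 0.75. Passing `lag=12` to the same function would give the year-on-year version.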
Moreover, the index developed by this system has the added advantage
of timeliness. It monitors and provides updates on the economic impacts
of diverse national events and emergencies in a prompt manner and also
enables policymakers and businesses to monitor the stimulus effect of
short-term economic measures, such as the Black Friday bargain sale drive.
This will help policymakers make prompt interventions in various economic
situations.
This project provides more detailed information than comparable indices,
including information on consumption by income quantile.
This is extremely useful to statistics organizations. Statistics Korea, in fact,
plans to use the information generated by this project in measuring official
indicators of the national economy. Furthermore, it intends to expand the
project model into an official system in the future. The Bank
of Korea, too, plans to use the timely updates to supplement its Consumer
Psychology Index, as discussed and decided at its meeting on setting
the standard interest rate.
5. Future Prospects
Using only credit card transaction data to monitor changes in the business
cycle makes it impossible to account for consumers who prefer to pay in
cash only. Using only a single credit card company’s data, moreover, could
introduce biases reflecting the particular characteristics and tendencies of
that company’s customers. The data used in this project were thus subjected
to additional checks for possible biases. These checks revealed no
statistically significant errors.
As the graph below shows, Shinhan Card’s credit card transaction data have
a correlation coefficient of 0.96 with the comparable data of other credit
card companies, and of 0.92 with Statistics Korea’s data on retail
consumption (which includes cash transactions). Shinhan Card’s data thus
posed no significant concerns over reliability.
Monthly Trends in Statistics Korea’s Retail Consumption Index and Shinhan Card’s Credit Card Transaction Data
Statistics Korea’s Retail Consumption Index since 2010 and the monthly trends in Shinhan
Card’s credit card transaction data are correlated by a factor of 0.92. → The data from the
two sources on mid- to long-term trends are highly correlated.
[Graph: Shinhan Card data and Retail Consumption Index (Statistics Korea); correlation factor: 0.92]
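The report does not say which correlation measure lies behind the 0.96 and 0.92 figures. A common choice, assumed here, is the Pearson correlation coefficient, which for two monthly series can be computed as in the sketch below. The function name and the sample values are illustrative and are not taken from the project data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Made-up monthly values: card spending versus an official retail index.
card_spending = [10.0, 11.0, 12.5, 12.0, 13.5]
retail_index = [100.0, 104.0, 109.0, 110.0, 115.0]
r = pearson(card_spending, retail_index)
```

A value close to 1 means the two series rise and fall together. It does not by itself show that the levels agree, which is why the report pairs the correlation with separate checks for bias.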
The system developed by this project calculates index scores only without
generating any analysis reports. While it visualizes index scores in an
intuitive manner, the information it provides still caters to experts rather
than the public. In order to increase the public’s interest in using it, an
additional system is needed that analyzes and explains the economic and
financial data indicated.
Shinhan Card intends to use the system to publish analysis reports of the
business cycle on a regular basis. The company is also considering further
cooperation with the Bank of Korea to conduct research on future business
cycle changes, develop new indices, and expand the established
system. It also plans to track and improve the reliability and accuracy of
the indicators it uses, and to extend its services to local
governments and research institutions as well. The data and analysis model
used in this project can also be used to produce other indices updated in
a timely manner, including those on gross domestic product (GDP), inflation
rate, and tourism.
This project has paved the way for further cooperation between the public
and private sectors on the development and management of major official
indices.
Epilogue
Analysis of data has been taking place in many different fields for centuries,
but this historical fact should not blind us to the novelty of today’s
phenomenon, in which big data analysis is quickly becoming integral to
almost all areas of human activity.
In order to facilitate and promote big data analysis in Korea, the developers
of pioneering big data projects ought to design their projects carefully from
the very beginning so as to minimize trial and error and ensure that each
attempt helps to enhance the capability to utilize such data.
NEAR & Future INSIGHT 2017-8
Published on October 20, 2017