A. Background

Digital document storage is growing rapidly along with the increasing use of computers. This growth
makes it difficult to retrieve the desired information accurately and quickly. Most digital documents
are stored as text, and various efficient text-search algorithms have been developed, but searching the
full contents of every stored document is not a practical solution given how fast the volume of stored
data is growing. Information retrieval is the branch of science that addresses this problem; it aims to
help users find information relevant to their needs in a short time. One existing information search
application is web mining, which supports keyword-based searches using clustering techniques. In this
approach, the text documents are processed with text mining and the words in each document are
counted. Clustering is done using the Centroid Linkage Hierarchical Method (CLHM). Because users
generally do not know in advance how many clusters the documents should be grouped into, the Hill
Climbing method is used to track how the variance moves at each stage of cluster formation and to
analyze that pattern, so that the number of clusters is determined automatically. The combination of
text mining, CLHM clustering, and Hill Climbing automatic clustering is convenient for users because it
produces clusters automatically, accurately, and quickly.
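
The approach described above can be sketched in a few lines of Python. This is only an illustration
under stated assumptions: the four sample documents are made up, and the stopping rule (keep the stage
just before the largest jump in variance) is a simple stand-in for the Hill Climbing analysis, whose
exact rule is not given in this paper.

```python
from collections import Counter

# Hypothetical toy documents; the paper itself gives no sample data.
documents = [
    "data mining data",
    "mining data mining",
    "football match goal",
    "goal football goal",
]

# Text mining step: count how often each vocabulary word occurs per document.
vocabulary = sorted({word for doc in documents for word in doc.split()})
vectors = [[Counter(doc.split())[word] for word in vocabulary] for doc in documents]


def centroid(members):
    """Mean word-count vector of the documents whose indices are in members."""
    rows = [vectors[i] for i in members]
    return [sum(column) / len(rows) for column in zip(*rows)]


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def total_variance(clusters):
    """Sum of squared distances from every document to its cluster centroid."""
    total = 0.0
    for members in clusters:
        center = centroid(members)
        total += sum(squared_distance(vectors[i], center) for i in members)
    return total


def centroid_linkage(n_documents):
    """Merge the two clusters with the closest centroids until one remains.

    Returns one (clusters, variance) entry per stage of cluster formation.
    """
    clusters = [[i] for i in range(n_documents)]
    stages = [(clusters, total_variance(clusters))]
    while len(clusters) > 1:
        pairs = [(a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
        a, b = min(
            pairs,
            key=lambda p: squared_distance(centroid(clusters[p[0]]), centroid(clusters[p[1]])),
        )
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
        stages.append((clusters, total_variance(clusters)))
    return stages


def choose_clusters(stages):
    """Stand-in stopping rule: keep the stage just before the largest variance jump."""
    jumps = [stages[k + 1][1] - stages[k][1] for k in range(len(stages) - 1)]
    return stages[jumps.index(max(jumps))][0]


stages = centroid_linkage(len(documents))
chosen = choose_clusters(stages)
print(chosen)  # clusters are lists of document indices
```

With these sample documents the variance rises only slightly while similar documents are merged and
rises sharply when the two topics are forced together, so the sketch stops at two clusters without the
user having to supply that number.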

B. Purpose

1. To fulfill the task of the Database Management System course.

2. To find out more about Data Mining.

3. To increase knowledge.

C. Problem Formulation

1. What is Data Mining?

2. What are the functions and objectives of Data Mining?

3. How is the application of Data Mining in life?



A. Understanding Data Mining

There are many definitions of data mining. In broad terms, data mining is a tool that allows users to
quickly access large amounts of data. More specifically, it is a tool and application that performs
statistical analysis on data. Data mining is the process of extracting previously unknown, yet
understandable and useful, information from large databases and using it to make important business
decisions. It comprises a collection of techniques for finding unknown patterns in collected data,
allowing users to discover knowledge in the database that they would otherwise be unlikely to find.

Data mining is a semi-automatic process that uses statistical, mathematical, artificial intelligence,
and machine learning techniques to extract and identify potentially useful knowledge stored in large
databases (Turban et al., 2005). Data mining is part of the KDD (Knowledge Discovery in Databases)
process, which consists of several stages: data selection, pre-processing, transformation, data mining,
and evaluation of results (Maimon and Last, 2000). The term KDD is also commonly used interchangeably
with data mining.

B. Functions and Objectives of Data Mining

1. Data mining function

Data mining identifies facts or suggests conclusions by sifting through data to find patterns or
anomalies. Data mining has five functions:

a. Classification

Classification infers the defining characteristics of a group. Example: the customers of a company who
have switched to competing companies.

b. Clustering

Clustering identifies groups of goods or products that share particular characteristics. Clustering
differs from classification in that no predefined group characteristics are given in advance, whereas
in classification they are.

c. Association

Association identifies relationships between events that occur at the same time, such as the contents
of a shopping basket.

d. Sequencing

Similar to association, sequencing identifies relationships, but over a period of time, such as
customers who visit the supermarket repeatedly.

e. Forecasting

Forecasting estimates future values based on patterns found in large data sets, such as forecasting
market demand.
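
The association function can be illustrated with a small shopping-basket example. The sketch below is
an assumption-laden illustration, not a method taken from this paper: the baskets are made up, and
"support" and "confidence" are the usual association-rule measures, which the text above does not name.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (one set of items per transaction).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]


def pair_support(baskets):
    """Fraction of baskets that contain each pair of items together."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: count / len(baskets) for pair, count in counts.items()}


def confidence(baskets, antecedent, consequent):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    with_antecedent = [b for b in baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


support = pair_support(baskets)
print(support[("bread", "milk")])
print(confidence(baskets, "butter", "bread"))
```

In this toy data every basket that contains butter also contains bread, which is the kind of
relationship the association function is meant to surface.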

2. Data Mining Objectives

Data mining goals include:

a. Explanatory

To explain an observed event or condition, such as why pickup truck sales are increasing in Colorado.

b. Confirmatory

To confirm a hypothesis, such as the hypothesis that two-income families are more likely to buy family
equipment than single-income families.

c. Exploratory

To analyze data for new, unexpected relationships, such as which patterns are typical of credit card
fraud cases.

C. Application of Data Mining

In what fields can data mining be applied? The following are some examples of data mining applications:

- Market and management analysis

Problems that can be addressed with data mining include: target marketing, tracking customers' buying
patterns over time, cross-market analysis, customer profiling, identifying customer needs, assessing
customer loyalty, and summarizing information.

a. Company Analysis and Risk Management

Problems that can be addressed with data mining include: financial planning and asset evaluation,
resource planning, and competition planning.

b. Telecommunication

A telecommunications company applies data mining to examine the millions of incoming transactions and
determine which of them must still be handled manually.

c. Finance

The Financial Crimes Enforcement Network in the United States has used data mining to sift through
trillions of records on various subjects, such as property, bank accounts, and other financial
transactions, in order to detect suspicious financial transactions (such as money laundering).

d. Insurance

The Australian Health Insurance Commission uses data mining to identify health services that are
actually unnecessary but are still provided to insurance participants.

e. Sports

IBM Advanced Scout uses data mining to analyze NBA game statistics (such as the number of blocked
shots, assists, and fouls) in order to gain a competitive advantage for the New York Knicks and Miami
Heat teams.

D. Data Mining Methodology

As one part of an information system, data mining requires a plan that runs from the initial idea to
the final implementation. The components of a data mining plan are as follows.

1. Problem Analysis (Analyzing the Problem)

The source data must be assessed to see whether it meets the criteria for data mining.

The quality and quantity of the data are the main factors in deciding whether the data is suitable and
whether additional data is available. The results expected from data mining must be clearly
understood, and it must be ensured that the required data carries information that can be extracted.

2. Extracting and Cleaning Data (Extracting and Cleansing The Data)

First, the data is extracted from the original sources, such as OLTP databases, text files, Microsoft
Access databases, and even spreadsheets. The data is then placed in a data warehouse whose structure
matches the data model.

Typically, Data Transformation Services (DTS) is used to extract the data and to clean it of
inconsistencies and of incompatibilities with the required format.
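
As a rough illustration of the extract-and-clean step, the sketch below uses plain Python instead of
DTS. The column names, the sample rows, and the cleaning rules (trim whitespace, normalize
capitalization, drop incomplete rows and duplicates) are all assumptions made for the example.

```python
import csv
import io

# Hypothetical export from a source system, with typical inconsistencies.
RAW = (
    "customer_id,city,amount\n"
    " 101 ,jakarta,150000\n"
    "102,Jakarta ,\n"
    "103,BANDUNG,75000\n"
    "101,Jakarta,150000\n"
)


def extract(text):
    """Read the raw text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Normalize formats, then drop incomplete rows and exact duplicates."""
    cleaned, seen = [], set()
    for row in rows:
        customer_id = row["customer_id"].strip()
        city = row["city"].strip().title()
        amount = row["amount"].strip()
        if not customer_id or not amount:
            continue  # incomplete record
        record = (int(customer_id), city, int(amount))
        if record in seen:
            continue  # duplicate once formats are made consistent
        seen.add(record)
        cleaned.append(record)
    return cleaned


cleaned = clean(extract(RAW))
print(cleaned)
```

Note that the first and last rows only turn out to be duplicates after the formats have been made
consistent, which is why cleaning comes before loading into the warehouse.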

3. Data Validation (Validating the Data)

Once the data has been extracted and cleaned, it is good practice to walk through the model we have
created to ensure that all existing data is current and stable.

4. Creating and Training the Model (Creating and Training the Model)

When the algorithm is applied to the model, a structure is built. At this point it is very important
to examine what has been built to ensure that it reflects the facts in the source data.

5. Querying the Data Mining Model (Querying the Data Model)

Once a suitable model has been created and built, its data is available to support decision making.
This usually involves writing queries from a front-end application, using an application program or a
database program.

6. Maintaining the Validity of the Mining Model (Maintaining the Validity of the Data Mining Model)

After the data mining model has been populated, initial data characteristics such as granularity and
validity may change over time. The data mining model therefore needs to be re-evaluated periodically,
because it can continue to change over time.

E. Data Mining Process

The process starts from raw data and ends with processed knowledge or information, which is obtained
through the following stages:

a. Data Cleaning, also known as data cleansing, is the phase in which incomplete, erroneous, and
inconsistent data is removed from the data collection, so that only relevant, clean data is passed on
for knowledge discovery.

b. Data Integration: at this stage, multiple data sources and multiple files are combined into a
single source.

c. Data Selection: in this step, the data relevant to the analysis is selected and retrieved from the
existing data collection.

d. Data Transformation, also known as data consolidation: at this stage, the selected data is
transformed into forms suitable for the mining procedure by normalizing and aggregating the data.

e. Data Mining: this is the most important stage, in which techniques are applied to extract
potentially useful patterns.

f. Pattern Evaluation: at this stage, the interesting patterns that represent knowledge are identified
based on the given measures.

g. Knowledge Representation: this is the last stage, in which the discovered knowledge is presented
visually to the user. This stage uses visualization techniques to help users understand and interpret
the results of data mining.
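
The stages above can be chained into a small pipeline. The following is a minimal sketch, assuming
made-up sales records from two hypothetical stores; the "mining" step is deliberately trivial (a
threshold on normalized spending) and stands in for whatever technique would be used in practice.

```python
# Hypothetical records from two sources; None marks a missing value.
store_a = [
    {"customer": "C1", "item": "tea", "price": 20},
    {"customer": "C2", "item": "rice", "price": 50},
    {"customer": None, "item": "salt", "price": 5},
]
store_b = [
    {"customer": "C1", "item": "milk", "price": 30},
    {"customer": "C3", "item": "soap", "price": None},
    {"customer": "C3", "item": "oil", "price": 10},
]


def clean(records):
    """a. Remove incomplete records."""
    return [r for r in records if r.get("customer") and r.get("price") is not None]


def integrate(*sources):
    """b. Combine several sources into one."""
    return [record for source in sources for record in source]


def select(records, fields):
    """c. Keep only the fields relevant to the analysis."""
    return [{f: r[f] for f in fields} for r in records]


def transform(records):
    """d. Aggregate spending per customer and normalize it to the range 0..1."""
    totals = {}
    for r in records:
        totals[r["customer"]] = totals.get(r["customer"], 0) + r["price"]
    low, high = min(totals.values()), max(totals.values())
    span = (high - low) or 1
    return {customer: (total - low) / span for customer, total in totals.items()}


def mine(scores, threshold=0.5):
    """e. Trivial stand-in for a mining technique: label customers by spending."""
    return {c: ("high" if s >= threshold else "low") for c, s in scores.items()}


def evaluate(labels, min_size=1):
    """f. Keep only groups with at least min_size members."""
    groups = {}
    for customer, label in labels.items():
        groups.setdefault(label, []).append(customer)
    return {label: sorted(m) for label, m in groups.items() if len(m) >= min_size}


def represent(patterns):
    """g. Present the result in a readable form."""
    return "\n".join(f"{label}: {', '.join(m)}" for label, m in sorted(patterns.items()))


records = integrate(clean(store_a), clean(store_b))
scores = transform(select(records, ["customer", "price"]))
patterns = evaluate(mine(scores))
report = represent(patterns)
print(report)
```

Each function maps to one lettered stage, so replacing the trivial mining step with a real technique
would not affect the surrounding steps.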

F. Data Mining Techniques

Before looking at the techniques that can be used in data mining, note that there are four operations
that can be associated with data mining, as follows.

a. Predictive modeling. Two techniques can be used in predictive modeling, namely:

· Classification

Used to assign each record in the database to a specific class chosen from a set of possible class
values.

· Value Prediction

Used to estimate a continuous numeric value associated with a database record. This technique uses the
classical statistical techniques of linear regression and nonlinear regression.

b. Database segmentation

The purpose of database segmentation is to partition the database into a number of segments, or
clusters, of similar records, where the records in each segment are expected to be homogeneous.

c. Prediction

Analysts use this technique to build an effective predictive model.
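
The value prediction technique mentioned under predictive modeling can be illustrated with a simple
linear regression. The sketch below fits a straight line by ordinary least squares; the variable names
and the sample figures (advertising spend versus sales) are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the x values are not all identical
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Estimate the continuous value for a new record."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical records: advertising spend and the resulting sales.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]

model = fit_line(ad_spend, sales)
estimate = predict(model, 5.0)
print(model, estimate)
```

Because the sample points lie exactly on a line, the fitted slope and intercept reproduce it; with real
data the line would only approximate the records.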

The decision tree has the following advantages:

a. A decision tree is easy to understand and interpret.

b. Data preparation for a decision tree is minimal or not needed at all.

c. A decision tree can handle both numerical and categorical data.

d. A decision tree is a white-box model: the reasoning behind each result can be followed through the
conditions in the tree.

e. A decision tree can be validated with statistical tests, which makes it possible to assess the
reliability of the model.

The decision tree is a powerful technique that can work well with large data sets in a short time.
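
To make the "white-box" property concrete, the following is a minimal sketch of a decision tree
learner for categorical data. It is an illustration only: the choice of Gini impurity as the split
measure, the feature names, and the sample records are all assumptions, not details from this paper.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 when all labels in the group are identical."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_feature(rows, labels, features):
    """Pick the feature whose split gives the lowest weighted impurity."""
    best = None
    for feature in features:
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[feature], []).append(label)
        score = sum(len(g) / len(labels) * gini(g) for g in groups.values())
        if best is None or score < best[0]:
            best = (score, feature)
    return best[1]


def build_tree(rows, labels, features):
    """Return a class label (leaf) or a (feature, branches) tuple (inner node)."""
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    feature = best_feature(rows, labels, features)
    remaining = [f for f in features if f != feature]
    branches = {}
    for value in sorted({row[feature] for row in rows}):
        subset = [(r, l) for r, l in zip(rows, labels) if r[feature] == value]
        branches[value] = build_tree(
            [r for r, _ in subset], [l for _, l in subset], remaining
        )
    return (feature, branches)


def predict(tree, row):
    """Follow the branches until a leaf is reached (values unseen in training raise KeyError)."""
    while isinstance(tree, tuple):
        feature, branches = tree
        tree = branches[row[feature]]
    return tree


# Hypothetical customer records and outcomes.
rows = [
    {"income": "high", "member": "yes"},
    {"income": "high", "member": "no"},
    {"income": "low", "member": "yes"},
    {"income": "low", "member": "no"},
]
labels = ["buy", "buy", "buy", "skip"]

tree = build_tree(rows, labels, ["income", "member"])
print(tree)
print(predict(tree, {"income": "low", "member": "no"}))
```

The printed tree is itself the explanation of the model: each prediction can be traced through the
feature tests that led to it.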


A. Conclusion

In the data mining process, the most important stage is the "Data Mining" stage itself, in which
techniques are applied to extract potentially useful patterns.

B. Suggestions

The following suggestions may be worth pursuing in the future development of data mining applications
that use the clustering method:

· To maximize support for the decisions to be made, for example to facilitate promotional activities,
a facility for sending email to customers could be added.

· In this case study, the data used to form clusters is based on only one item, namely the frequency
with which each customer ID from the customer table appears in the transaction table. For further
development, it is recommended that the process data be based on more than one item, for example the
item ID or the total price paid for each transaction. This would show which items are usually
purchased by the customers in a cluster, or the total amount each customer paid for their
transactions. The result would then not be limited to three clusters, and more information could be
obtained.
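
The second suggestion can be sketched as follows. The table layouts, field names, and sample rows are
hypothetical, since the case study's actual tables are not shown here; the point is only that several
items can be read per customer in a single pass over the transaction table.

```python
# Hypothetical customer table (IDs only) and transaction table.
customers = ["C1", "C2", "C3"]
transactions = [
    {"customer_id": "C1", "item_id": "I1", "total_price": 100},
    {"customer_id": "C1", "item_id": "I2", "total_price": 50},
    {"customer_id": "C2", "item_id": "I1", "total_price": 100},
    {"customer_id": "C3", "item_id": "I3", "total_price": 20},
    {"customer_id": "C1", "item_id": "I1", "total_price": 100},
    {"customer_id": "C9", "item_id": "I2", "total_price": 50},  # not in customer table
]


def customer_features(customers, transactions):
    """Build several clustering inputs per customer instead of frequency alone."""
    features = {
        c: {"frequency": 0, "total_spent": 0, "items": set()} for c in customers
    }
    for t in transactions:
        entry = features.get(t["customer_id"])
        if entry is None:
            continue  # skip transactions whose customer ID is unknown
        entry["frequency"] += 1
        entry["total_spent"] += t["total_price"]
        entry["items"].add(t["item_id"])
    return features


features = customer_features(customers, transactions)
for customer_id, entry in features.items():
    print(customer_id, entry["frequency"], entry["total_spent"], sorted(entry["items"]))
```

The frequency and total spent could then be used together as clustering inputs, while the item sets
show what the customers in each resulting cluster usually buy.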

Indrajani, S.Kom., MM. (2011). Introduction and Database Systems. PT Elex Media Komputindo, Jakarta.