
A Novel Data Mining Technique To Improve Business Intelligence

Muhammad Sheraz Arshad Malik 1, Sadaf Safder 2, Kiran Huma 3, Bakhtawar Jabeen 4

Department of Information Technology, Government College University Faisalabad, Pakistan
sheraz_awan@gcuf.edu.pk 1, sadafgcuf.24@gmail.com 2

Ali Ur Rehman

Institute of Business & Management Sciences University of Agriculture Faisalabad, Pakistan

Ranaalirehman143@gmail.com

Muhammad Awais

Department of Software Engineering Government College University Faisalabad, Pakistan awais_java@yahoo.com

Abstract: In the present era, data exploration in business intelligence has become a major problem. Information plays an important role in the business industry. When data is not classified and segmented in some manner, data exploration becomes very difficult. The theme of this paper is a technique that extracts information and data patterns by using the clustering methodology of data mining. It differs from other tools in that the data are grouped into combinations of similar items and a suitable group is provided for each cluster. In this method, a number of transactions are obtained from the internet by web mining. These transactions are passed through a cluster-based data mining algorithm, and a significant key identifies the priority combination of clusters from which information is extracted. This method is used in the business world to improve business productivity. The main idea of this paper is to extract multiple clusters with different clustering methods in order to improve business intelligence. In the future, we will try to improve the Red box approach by using a forecasting method.

Keywords: Business Intelligence (BI); Data Mining (DM); Analytic Service Model (ASM); Induction method; Association algorithm

I. INTRODUCTION

Business is growing rapidly through the World Wide Web, where data and information are collected and exchanged over the internet [17]. This enables big organizations to complete their functions rapidly and more accurately. All the data is stored in web documents and pages [16]. Web mining is a technique that directly extracts the important data from web documents and pages. This data is used to better understand marketing dynamics and to keep the organization up to date with existing trends. Red Box is a new technique used in business intelligence to make it more efficient. In the Red Box technique, a clustering method is used to extract the information. Clustering is a data mining method that works on large databases. It collects data of the same type or with the same attributes from the information. Web mining is used here to collect the data from web databases.

Business Intelligence (BI) lets an organization act instantly on the basis of a large and dynamic database. Red Box plays a major role in improving business because it minimizes the time needed to take a good decision. In the present era, organizations mostly use Business Intelligence based software. Such software makes the business more reliable and economical: it minimizes expenditure, supports the promotion of the organization's products, and reduces the effort required of employees.

Red box uses the k-means model of the clustering method. The k-means model finds objects of the same priority and organizes data that have the same characteristics into groups. The aim of this paper is to store all the data according to date. The technique identifies the dates on which the organization's position was good or bad and also identifies the cause behind this. The Red box technique is used for decision making [6]. Decision making follows a tree-like structure in which different factors are involved, such as storing data obtained by web mining in the organization's own database, single domains, and different patterns.

II. RELATED WORK

TABLE I. COMPARISON OF DATA MINING TOOLS

Tool | Techniques | Function | Reference
SPSS Clementine | Decision trees, Neural Network, Bayesian Classification, Regression, Support Vector Machine | Classification, Estimation, Prediction, Affinity Grouping, Description, Time series data | [1]
DB-Miner | OLAP and attribute-oriented induction, Statistical Analysis, Progressive deepening for mining multiple-level knowledge, Meta-rule guided mining | Characterization, Comparison, Association, Classification, Prediction, Clustering | [2]
Data Mining Approach | Classification Method, NB, J48, MLP, SVM | Decision making, Classification, Comparison, Data cleaning, Noise Removal, Data scoping | [3]
WEKA | Data preprocessing and visualization, Attribute Selection, ONER, Decision tree, K-means, Cobweb, Association rule, Nearest Neighbor, Model Evaluation | Classification, Prediction, Comparison, Clustering, Association, Decision trees | [4]
XL-Miner | Association rule, Classification, Logistic Regression with best subset selection, Classification trees, Naïve Bayes Classifiers, Neural Network, K-Nearest Neighbor | Discriminant Analysis, Clustering, Prediction, Time series | [5]

 

III. MOTIVATION

The reasons for developing a new tool are as follows.

A. Data representation and storage

Data is found and organized in clusters. The clustering step accesses the data and divides it into parts. The parts are organized so that each part contains data of the same priority and characteristics. When a specific part of the data is needed, the part that has the required characteristics is retrieved, which makes data recovery easy and simple.

B. Business intelligence

It improves business intelligence in such a way that the productivity of the organization grows and its rank increases. It improves business capabilities such as data recovery, data management, and data sharing, and it also manages the cost of data.

C. Data access from the web

It improves the Red box technique by accessing data directly from web pages through web mining. Web mining refines the important data from web pages and documents. It accesses the data directly from the web and then stores it as clusters in the organization's own database by using the k-means model.

D. Cost controlling

It improves on current trends in business technology by minimizing the cost of the product. It manages cost in a way that benefits the organization.

E. Data integration

Data is accessed from different sources, such as the organization's own database, web mining, data warehouses, and different websites. Integration makes it easy to analyze and retrieve the relevant data from all of these sources.

F. Market Share

It improves market share by examining the data model: what the structure of the model is and how it works, and which basic components are necessary for business intelligence. It also examines how these components are used for data analysis to improve market share without any interference.

G. True Decision

It helps in making correct decisions. It focuses on module learning. Module learning develops the good decision making that is really needed to improve business intelligence. For example, to understand a complicated programming problem we must read it thoroughly, but if the problem already exists in an organized module, we can easily understand and solve it.

IV. APPLICATION OF REDBOX

Information is derived from the business, and its patterns are derived by applying data mining methods. These methods derive the structure of the database, which helps the user to understand it easily and increases its usability. The Red box makes the structure of the database user-friendly in such a way that the needs of both the organization's owner and the user are fulfilled, which improves the profit of the organization. It also maintains the data structure of stored databases.

Association rule mining is the second application of the Red box, in which it examines and evaluates the database. Each record comprises the data of a client's transaction, and this data consists of different items. Rule mining identifies the relationships among these items. It collects items of the same priority into groups whose support and confidence values are equal to or greater than the user-defined minimum support and minimum confidence. These two user-defined thresholds measure the interestingness of an association rule, and the rules that satisfy them are called interesting association rules. Each such rule denotes a collection of different items. For example, suppose an organization wants to improve its market strategy [7]. It collects information about products that are very popular in the market, and this information is obtained from interesting rules. The organization can also gain information from non-interesting rules, for example about products that are unpopular in the market, because this is also useful information that the owner can use to increase profit.

Fig. 2 shows a data mining model in which different groups of products are identified by ID. The main concept of the Red box is to derive meaningful information from the existing database by using a data mining algorithm in which each transaction informs the Business Intelligence. This manuscript describes data mining, web mining, and active mining tools as an algorithm. Fig. 2 shows different clusters of products from which the transacted items are derived; the Red box technique is then applied to find the date on which a transaction occurred and the reason why transactions occurred on that specific date. Fig. 1 defines a design model of the information derived from the overall transactions. The basic aim of using a data mining tool is to improve the existing business, which is important for both the business owner and the client in taking better decisions, so that they can grow their business and gain more benefit. The Red box is a predefined prototype that represents both the Business Intelligence and the structure of the data mining algorithm. It is a client-oriented system that helps users to find products online and buy them.
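The support and confidence thresholds described in this section for association rule mining can be illustrated with a minimal sketch. The transactions, the helper names (`count`, `support`, `confidence`, `interesting_rules`), and the threshold values are assumptions chosen for illustration; they are not taken from the Red box implementation, and only single-item rules are considered.

```python
from itertools import permutations

# Hypothetical transactions; each set holds the items bought together.
transactions = [
    {"milk", "tea", "sugar"},
    {"milk", "sugar", "bread"},
    {"tea", "sugar"},
    {"milk", "bread", "jam"},
    {"milk", "tea", "sugar", "bread"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also hold the consequent."""
    base = count(antecedent, transactions)
    if base == 0:
        return 0.0
    return count(set(antecedent) | set(consequent), transactions) / base

def interesting_rules(transactions, min_support, min_confidence):
    """Return single-item rules (a -> b) that meet both user-defined thresholds."""
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        s = support({a, b}, transactions)
        c = confidence({a}, {b}, transactions)
        if s >= min_support and c >= min_confidence:
            rules.append((a, b, s, c))
    return rules

# Thresholds are illustrative; in practice the user defines them.
rules = interesting_rules(transactions, min_support=0.5, min_confidence=0.7)
for a, b, s, c in rules:
    print(f"{a} -> {b}: support={s:.2f}, confidence={c:.2f}")
```

Rules that fall below either threshold are the non-interesting rules mentioned above; lowering the thresholds would surface them, for example to inspect unpopular products.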

V. DATA MINING TECHNIQUE USED FOR BUSINESS INTELLIGENCE MODEL

The data mining techniques are developed to work on existing information inside the Analytic Service Model (ASM). The ASM is customer oriented and is designed to analyze all types of information. All kinds of data first pass through the ASM, and mining tools then select sections of this data as input for the algorithms of the data mining framework, which in turn produce output. The data mining framework can be used only with standard dimension members; attribute dimensions and user-specific dimensions are not used. Thus, the data required for analysis must be present in standard dimensions and is evaluated inside the cube. The next step is to examine which items a customer buys and to find the characteristics of these items in order to identify why they are bought together with other items. Data mining tasks such as prediction, classification, affinity analysis, reduction, exploration, and visualization are all used to analyze the data and identify the characteristics from which clusters are made [8].

The database is introduced to maintain the data in a systematic way. Data in the database is organized in the form of tables, and a table is a collection of rows and columns. Business Intelligence is a method used to access this data. Data mining is an advanced technique that retrieves only the required data from data warehouses and from web mining. Data mining improves Business Intelligence by supporting good decision making [9]. This paper presents an application of Red box technology that implements data mining tools in a business intelligence based model. The new idea is to design a good methodology by using data mining approaches. This methodology works on a collection of data transactions. It starts with identification of the problem and then carries out the transaction process shown in Fig. 1.

[Figure: flow diagram with the blocks Source Systems, OLTP, Data Extraction, Transaction & Loading, Staging, Data Storage, Outlet Sales, BI Data Analysis Model, Data Mining Algorithm, Red Box, Knowledge Building, and Business Decision]

Fig. 1. Working methodology of the Red box [10]

Product clusters (P-Id, item code):

MILK PRODUCT: MP001 GDL, MP002 NBT, MP003 CHS, MP004 KLF, MP005 ICC
OIL: OL001 SFO, G002 MSO, G003 GNO, G004 OLV, G005 DPO
Vegetable: V001 POT, V002 ONI, V003 GIN, V004 LFN, V005 CAR
Grocery: G001 RIC, G002 KDL, G003 SGR, G004 OIL, G005 SOA

Sample transaction (CUST-ID 74251, Trans-ID SD0213):

P-ID | ITEMS
G002 | KDL
G003 | SGR
MP003 | CHS
MP004 | KLF
G004 | OLV

Flow: Tran-ID, Data of Transaction, Report and Analysis

Fig. 2. Example of a data mining model with different groups of clusters [11]
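As a rough illustration of how the product clusters and the sample transaction in the figure above relate, the sketch below groups the transacted items by the cluster they belong to. The dictionary layout and the function name `group_by_cluster` are assumptions made for this example. Items are keyed by item code rather than P-Id because some P-Ids (for example G002) appear in more than one cluster in the figure.

```python
# Product clusters transcribed from the figure: item code -> P-Id.
CLUSTERS = {
    "Milk Product": {"GDL": "MP001", "NBT": "MP002", "CHS": "MP003",
                     "KLF": "MP004", "ICC": "MP005"},
    "Oil": {"SFO": "OL001", "MSO": "G002", "GNO": "G003",
            "OLV": "G004", "DPO": "G005"},
    "Vegetable": {"POT": "V001", "ONI": "V002", "GIN": "V003",
                  "LFN": "V004", "CAR": "V005"},
    "Grocery": {"RIC": "G001", "KDL": "G002", "SGR": "G003",
                "OIL": "G004", "SOA": "G005"},
}

# Sample transaction from the figure.
TRANSACTION = {
    "cust_id": "74251",
    "trans_id": "SD0213",
    "items": ["KDL", "SGR", "CHS", "KLF", "OLV"],
}

def group_by_cluster(items):
    """Map each cluster name to the transacted item codes that belong to it."""
    grouped = {}
    for code in items:
        for cluster, products in CLUSTERS.items():
            if code in products:
                grouped.setdefault(cluster, []).append(code)
                break  # each item code belongs to exactly one cluster
    return grouped

print(group_by_cluster(TRANSACTION["items"]))
```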

A. Web Mining

Web mining [16; 17] is a data mining technique used to mine important data patterns from the web. These patterns represent the behavior and interests of users and show which patterns are popular among them. Web mining identifies the relationships between data items, which improves the structure of the business strategy and helps the owner to make better decisions. In this way web mining improves business intelligence: it mines important information from the web, uses this information as input, stores it in the database, and returns it as output when required.
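The mine, store, and retrieve cycle described for web mining might be sketched as follows using only the Python standard library. The page markup (`<li class="product">`), the table name `products`, and the helper `mine_page` are hypothetical; the paper does not specify how pages are parsed or how the database is laid out.

```python
import sqlite3
from html.parser import HTMLParser

class ProductParser(HTMLParser):
    """Collect the text of every <li class="product"> element (assumed markup)."""

    def __init__(self):
        super().__init__()
        self._in_product = False
        self.products = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "li" and ("class", "product") in attrs:
            self._in_product = True

    def handle_endtag(self, tag):
        if tag == "li":
            self._in_product = False

    def handle_data(self, data):
        if self._in_product and data.strip():
            self.products.append(data.strip())

def mine_page(html, connection):
    """Extract product names from a page and store them in the database."""
    parser = ProductParser()
    parser.feed(html)
    connection.execute("CREATE TABLE IF NOT EXISTS products (name TEXT)")
    connection.executemany(
        "INSERT INTO products (name) VALUES (?)",
        [(name,) for name in parser.products],
    )
    connection.commit()
    return parser.products

# Hypothetical page content standing in for a downloaded web document.
PAGE = """
<ul>
  <li class="product">Milk</li>
  <li class="product">Sugar</li>
  <li class="banner">Sale today</li>
</ul>
"""

conn = sqlite3.connect(":memory:")
mine_page(PAGE, conn)
rows = conn.execute("SELECT name FROM products ORDER BY name").fetchall()
print(rows)
```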

B. Clustering Methodology

Clustering is the main technology of data mining. It gathers items into groups to form clusters, but it differs from classification in that it does not work with predefined groups of data. Instead, it is useful for data extraction because it finds the clusters that the data forms by itself. For example, in a supermarket a cluster is formed by the items that customers purchase most often together with other items. As a further example, consider the sales transactions of a superstore. When the clients are grouped by sale date, different clusters emerge: clients who frequently purchase vegetables and oil, clients who purchase meat every time, clients who buy baby food and milk, and so on [10].
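A minimal sketch of k-means, the clustering model named earlier in the paper, shows how such customer groups can emerge without predefined labels. The feature choice (purchase counts per customer), the sample values, and the seeding rule (the first k points become the initial centroids) are assumptions made to keep the example small and deterministic.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means on numeric tuples; the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(values) / len(members) for values in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return assignment, centroids

# Hypothetical customers: (vegetable and oil purchases, baby food and milk purchases).
customers = [(9, 1), (1, 8), (8, 2), (2, 9), (10, 0), (0, 10)]
assignment, centroids = kmeans(customers, k=2)
print(assignment)
print(centroids)
```

With this data the customers who mostly buy vegetables and oil end up in one cluster and those who mostly buy baby food and milk in the other. Real data would usually call for better seeding and feature scaling.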

C. Decision Trees

Decision trees [11; 12] are especially useful for decision making. They work like a flowchart for reaching a decision, and the decisions create rules for grouping data. Decision trees help to generate an optimized path to a better decision in a minimum number of steps. An example is Classification and Regression Trees (CART). CART gives rules that can be applied to a new dataset to predict which group of data will have a given outcome. Decision trees also help a new user to find a particular target on the web.

Decision trees are well suited to decision making because they group customers and products on the basis of priority and shared attributes, which permits analysis of all kinds of datasets. The classification method is applied to assess a person, object, transaction, or event and to assign it to a group. Suppose a supermarket owner wants to divide customers into three groups:

Faithful

Possible to leave

Likely to leave

If the owner has stored data about customers' characteristics and buying patterns, a classification model can assign each customer to one of these categories [14].
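The flowchart character of a decision tree can be sketched with a small hand-written tree for the three customer groups. This is not CART and nothing here is learned from data: the two attributes (`visits_per_month`, `months_since_last_purchase`) and every threshold are hypothetical values picked for illustration.

```python
def classify_customer(visits_per_month, months_since_last_purchase):
    """Walk a small hand-written decision tree and return one of three groups."""
    # Root test: has the customer bought anything recently?
    if months_since_last_purchase <= 1:
        # Recent buyers are split by how often they visit.
        if visits_per_month >= 4:
            return "Faithful"
        return "Possible to leave"
    # Customers who have been away for a while but not too long.
    if months_since_last_purchase <= 3:
        return "Possible to leave"
    return "Likely to leave"

# Hypothetical customers: (visits per month, months since last purchase).
for customer in [(6, 0), (1, 1), (2, 3), (0, 8)]:
    print(customer, classify_customer(*customer))
```

A learned tree would choose the attributes and split points from the stored customer data rather than from fixed constants.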

D. Rule Induction

Rule induction [15] is the derivation of important if-then rules from datasets on the basis of statistical significance. The rules explain the statistical correlation among the occurrences of items in a dataset. Association rule mining is an important type of data mining that uses the rule induction technique. It is used to derive data and to identify the relationships between different items of the dataset, which helps in identifying the purchasing attitudes and trends of customers.

E. Association Algorithm

The association algorithm is used for recommendation engines, which depend on analysis of products and the market. It helps customers to select items on the basis of what they purchased earlier. The model is constructed from a database in which each individual case has an identifier, as does each item contained in the case. A collection of items in the database is called an itemset. The algorithm scans the data to find the items that are present in each case. MINIMUM-SUPPORT is applied to all itemsets that may exist in the cases. Market analysis of this kind is also called associative analysis. It is used to analyze popular items that are commonly used together. For example, in a supermarket a cluster is formed by the items that customers most often purchase together with other items; a supermarket owner is not surprised to learn that a customer purchases milk, tea, sugar, bread, and jam together.

Associative techniques help the customer to find groups of items that are related to each other and to purchase these things together. They also help the owner to place related items in the right place. The sections above described data mining techniques such as web mining, the clustering method, decision trees, rule induction, and the association algorithm [14]. The aim of describing all of these elements is to cover all aspects of the Red box technique that may affect the trends and behaviors of Business Intelligence, for example elements like input and output forms.
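One way to picture the recommendation use of associative analysis is a co-purchase counter that suggests items frequently bought together with what is already in the basket. In this sketch the purchase history is invented, `minimum_support` is treated as an absolute count of cases (an assumption; the paper does not say whether MINIMUM-SUPPORT is a count or a fraction), and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pair_counts(cases):
    """Count in how many cases each unordered pair of items appears together."""
    counts = Counter()
    for case in cases:
        for a, b in combinations(sorted(set(case)), 2):
            counts[(a, b)] += 1
    return counts

def recommend(basket, cases, minimum_support=2, top_n=3):
    """Suggest items that co-occur with the basket in at least minimum_support cases."""
    scores = Counter()
    for (a, b), n in pair_counts(cases).items():
        if n < minimum_support:
            continue  # pair is too rare to count as an association
        if a in basket and b not in basket:
            scores[b] += n
        elif b in basket and a not in basket:
            scores[a] += n
    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

# Hypothetical purchase history; each list is one case (one customer visit).
history = [
    ["milk", "tea", "sugar", "bread", "jam"],
    ["milk", "bread", "jam"],
    ["tea", "sugar"],
    ["milk", "tea", "sugar"],
    ["bread", "jam"],
]

print(recommend({"bread"}, history))
print(recommend({"milk", "tea"}, history))
```

The same pair counts could also guide shelf placement, since pairs with high counts are the related items an owner would want to place near each other.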

VI. CONCLUSION AND FUTURE WORK

This paper defines the Red box technique, the clustering methodology, and web mining, and explores data mining tools to describe how clusters are made by using clustering technology. It explores how to recognize the data patterns within a dataset or between collections of items. The application of the Red box approach improves Business Intelligence. It highlights current business trends, which helps the user to take better decisions that increase the profit of the business. The approach is suitable for situations in which a customer needs specific items from a collection of items, and it helps to predict the items that customers want on the basis of previously purchased items. It is best suited to databases that share the same data structure and perform multiple business tasks.

In the future, the forecasting technique [18] of data mining will be used to acquire more accurate data. As the ability to predict the data increases, the forecasting algorithm can be used to suggest more accurate items.

ACKNOWLEDGMENT

I would like to acknowledge the Department of Information Technology of Government College University Faisalabad for its support.

REFERENCES

[1] Pushpalata Pujari, Jyoti Bala Gupta, "Exploiting Data Mining Techniques For Improving Efficiency Of Time Series Data Using SPSS-Clementine," Journal of Arts, Science & Commerce, ISSN 2231-4172.

[2] Jiawei Han, Jenny Y. Chiang, Sonny Chee, Jianping Chen, Qing Chen, Shan Cheng, Wan Gong, Micheline Kamber, Krzysztof Koperski, Gang Liu, Yijun Lu, Nebojsa Stefanovic, Lara Winstone, Betty B. Xia, Osmar R. Zaiane, Shuhua Zhang, Hua Zhu, "DBMiner: A System for Data Mining in Relational Databases and Data Warehouses."

[3] Shamini Raja Kumaran, Mohd Shahizan Othman, Lizawati Mi Yusuf, "Data Mining Approaches In Business Intelligence: Postgraduate Data Analytic," Jurnal Teknologi (Sciences & Engineering) 78: 8-2 (2016) 75-79.

[4] Zdravko Markov, Ingrid Russell, "An Introduction to the WEKA Data Mining System."

[5] B. Sangameshwari, P. Uma, "A Survey on Data Mining Techniques In Business Intelligence," International Journal Of Engineering And Computer Science, Volume 3 Issue 10, October 2014, Page No. 8575-8582.

[6] Lior Rokach, Oded Maimon, Clustering Method, Department of Industrial Engineering, Tel-Aviv University.

[7] mari, (2008), Data Mining for Retail Website Design and Enhanced Marketing, PP-2.

[8] Galit Shmueli, Nitin R. Patel, Peter C. Bruce, (2010). Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel® with XLMiner®, 2nd Edition.

[9] Prabhu, N. Anbazhagan (2014), A New Hybrid Algorithm for Business Intelligence Recommender System, International Journal of Network Security & Its Applications (IJNSA), Vol.6, No.2, pp.43-52.

[10] Tapan Nayak, "RedBox-A Data Mining Approach for Improving Business Intelligence," International Journal of Computer Science Trends and Technology (IJCST) Volume 4 Issue 2, Mar-Apr 2016.

[11] Tapan Nayak, "RedBox-A Data Mining Approach for Improving Business Intelligence," International Journal of Computer Science Trends and Technology (IJCST) Volume 4 Issue 2, Mar-Apr 2016.

[12] Pat Langley and Herbert A. Simon. Applications of Machine Learning and Rule Induction. Communications of the ACM, 38(11):54-64, 1995.

[13] Agnar Aamodt and Enric Plaza. "Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches," AICom Artificial Intelligence Communications, 7(1):39-59, 1994.

[14] Dan Sullivan, (2012), Next Generation Business Intelligence: Data Mining [online] http://www.tomsitpro.com/articles/business_intelligence data_mining_tools -data_analytics-spssolap_analysis

[15] Pat Langley and Herbert A. Simon. Applications of Machine Learning and Rule Induction. Communications of the ACM, 38(11):54-64, 1995.

[16] Jaideep Srivastava, Prasanna Desikan, Vipin Kumar, "Web Mining: Concepts, Applications, and Research Directions."

[17] R. Munilatha, K. Venkataramana, "A STUDY ON ISSUES AND TECHNIQUES OF WEB MINING," IJCSMC, Vol. 3, Issue. 5, May 2014, pg. 331-341.

[18] J. Scott Armstrong, Roderick J. Brodie, "Forecasting for Marketing," Reprinted from Quantitative Methods in Marketing, edited by Graham J. Hooley and Michael K. Hussey (London: International Thomson Business Press, 1999), pages 92-120.