Vous êtes sur la page 1sur 4

International Journal on Recent and Innovation Trends in Computing and Communication

Volume: 4 Issue: 4

ISSN: 2321-8169
01 - 04

_______________________________________________________________________________________________

The Survey On: Data Mining Data Warehousing & OLAP


Prof. Palash. M. Gourshettiwar
Asst. Prof.
Comp. Sci. Engg.
D.M.I.E.T.R, Sawangi(M).
palash9477@gmail.com

Prof. Dhiraj Shirbhate


Asst. Prof.
Comp. Sci. Engg.
J.D.I.E.T, Yavatmal
shirbhate.dhiraj@gmail.com

Prof. Rushikesh Shete


Asst. Prof.
Comp. Sci. Engg.
D.M.I.E.T.R, Sawangi(M)
sheterushikesh7@gmail.com

Abstract:-This paper gives a review of the Data mining handle. After the investigation of the way of data mining and its significance in
information warehousing is included. It depicts the CRISP-DM standard now being utilized as a part of industry as the standard for an
innovation impartial data mining prepare display. The paper finishes up with a noteworthy delineation of the data mining handle system and the
unsolved issues that offer open doors for research. The approach is both reasonable and theoretically stable to be valuable to both scholastics and
experts.
Keywords: Data mining, Data Warehousing & OLAP methodology

__________________________________________________*****_________________________________________________
I.

INTRODUCTION

DATA MINING
The goal of data mining is to recognize legitimate novel,
possibly valuable and justifiable relationships and examples
in existing data. Finding helpful examples in data is known
by various names (counting information mining) in various
groups (e.g., learning extraction, data disclosure, data
collecting, information prehistoric studies, and information
design preparing). The expression "data mining" is
essentially utilized by analysts, database specialists, and the
MIS and business groups. The term data Discovery in
Databases (KDD) is for the most part used to allude to the
general procedure of finding helpful information from
information, where information mining is a specific stride in
this procedure. The extra strides in the KDD process, for
example,
information
arrangement,
information
determination, information cleaning, and legitimate
translation of the aftereffects of the data mining handle,
guarantee that valuable learning is gotten from the data.
Data mining is an augmentation of conventional information
examination and measurable methodologies in that it fuses
expository procedures drawn from a scope of trains
including, however not constrained to,

numerical analysis,
pattern matching and areas of artificial intelligence
such as

machine learning,
neural networks and genetic algorithms.

While numerous data mining errands take after a


conventional, speculation driven data investigation
approach, it is regular place to utilize a sharp, data driven
approach that energizes the example discovery calculations
to discover valuable patterns, examples, and connections.
Basically, the two sorts of data mining approaches vary in
whether they look to assemble models or to discover
designs. The main approach, worried with building models

is, aside from the issues inborn from the expansive sizes of
the data sets, like traditional exploratory measurable
strategies. The goal is to deliver a general rundown of an
arrangement of data to distinguish and portray the principle
components of the state of the appropriation. Samples of
such models incorporate a bunch examination parcel of an
arrangement of information, a relapse show for expectation,
and a tree-based characterization run the show. In model
building, a qualification is now and then made in the middle
of exact and unthinking models .The previous looks to
model connections without constructing them in light of any
fundamental hypothesis. The last depend on some
hypothesis or system for the hidden information producing
handle. Data mining, nearly by definition, is basically
worried with the operational.
The second kind of data mining approach, design location,
looks to recognize little takeoffs from the standard, to
identify irregular examples of conduct. Samples incorporate
unordinary spending designs in MasterCard utilization (for
misrepresentation identification), sporadic waveforms in
EEG follows, and protests with examples of attributes
dissimilar to others. It is this class of methodologies that
prompted the thought of data mining as looking for "pieces"
of data among the mass of data. When all is said in done,
business databases represent a one of a kind issue for
example extraction due to their intricacy. Intricacy emerges
from irregularities, for example, intermittence, clamor,
equivocalness, and inadequacy. Keeping in mind most data
mining calculations can isolate the impacts of such
unessential qualities in deciding the real example; the
prescient force of the mining calculations may diminish as
the quantity of these irregularities increment.
DATA MINING AND DATA WAREHOUSING
The development of an data stockroom, which
includes data cleaning and information combination, can be
1

IJRITCC | April 2016, Available @ http://www.ijritcc.org

_______________________________________________________________________________________

International Journal on Recent and Innovation Trends in Computing and Communication


Volume: 4 Issue: 4

ISSN: 2321-8169
01 - 04

_______________________________________________________________________________________________
seen as an essential pre-handling venture for data mining.
Be that as it may, an information stockroom is not a
necessity for data mining. Assembling a substantial data
distribution center that unites data from various sources,
determines data uprightness issues, and loads the data into a
database, can be a colossal assignment, now and again
taking years and costing a large number of dollars. On the
off chance that an data distribution center is not accessible,
the data to be mined can be extricated from one or more
operational or value-based databases, or information shops.
Then again, the data mining database could be a legitimate
or a physical subset of an data distribution center.
Data mining utilizes the data distribution center as
the wellspring of data for learning data disclosure (KDD)
frameworks through an amalgam of manmade brainpower
and insights related strategies to discover affiliations,
successions, characterizations, groups, and forecast.
Almost all data enter the warehouse from the
operational environment. The data are then "cleaned" and
moved into the warehouse.
The data continue to reside in the warehouse until
they reach an age where one of three actions is taken: the
data are purged; the data, together with other information,
are summarized; or the data are archived. An aging process
inside the warehouse moves current data into old detail data.

Data acquisition software (back-end) which


extracts data from legacy systems and external
sources, consolidates and summarizes the data,
and loads them into the data warehouse.
The data warehouse itself contains the data and
associated database software. It is often
referred to as the "target database."
The client (front-end) software, which allows
users and applications (such as DSS and EIS)
to access and analyze data in the warehouse.

These three components may reside on different


platforms, or two or three of them may be on the same
platform. Regardless of the platform combination, all three
components are required.
DATA MINING AND OLAP
The topic of how data warehousing and OLAP
identify with data mining .The relationship can be briefly
caught as takes after: "The ability of OLAP to give
numerous and dynamic perspectives of abridged data in an
data distribution center sets a strong establishment for
fruitful information mining." Therefore, data mining and
OLAP can be seen as instruments than can be utilized to
supplement each other. The term OLAP, remaining for
Online Analytical Processing, is regularly used to depict the
different sorts of question driven examination that are
embraced while investigating the data in a database or an
data stockroom. OLAP accommodates the particular

extraction and review of data from various perspectives;


these perspectives are for the most part alluded to as
measurements. Every measurement can and for the most
part has numerous levels of accumulation, i.e. a period
measurement can be composed into days, weeks, and years.
The fundamental refinement in the middle of
OLAP and information mining is that OLAP is an data
outline/accumulation apparatus, while data mining blossoms
with detail. Data mining permits the robotized disclosure of
understood examples and intriguing learning that is stowing
away in a lot of data. Preceding following up on the
example revealed by data mining, an examiner may utilize
OLAP keeping in mind the end goal to decide the
ramifications of utilizing the found example as a part of
administering a choice. Keeping in mind OLAP is
considered part of the range of choice bolster apparatuses, it
goes above and beyond than the conventional inquiry and
reporting devices. The conventional inquiry and reporting
apparatuses portray "what" is in a database, while OLAP is
utilized to reply "why" certain things are valid in that the
client frames a theory around a relationship and checks it
with a progression of inquiries against the information.
Expressions utilized as a part of OLAP that portray the
different capacities include:

rolling up (producing marginals),


drilling (going down levels of aggregation
the opposite of rolling up),
slicing (conditioning on one variable),
dicing (conditioning on many variables) and
pivoting (rotating the data axes to provide an
alternative presentation of the data.

The fundamental refinement in the middle of OLAP and


data mining is that OLAP is an data outline/accumulation
apparatus, while data mining blossoms with detail. Data
mining permits the robotized disclosure of understood
examples and intriguing learning that is stowing away in a
lot of data. Preceding following up on the example revealed
by data mining, an examiner may utilize OLAP keeping in
mind the end goal to decide the ramifications of utilizing the
found example as a part of administering a choice. Keeping
in mind OLAP is considered part of the range of choice
bolster apparatuses; it goes above and beyond than the
conventional inquiry and reporting devices. The
conventional inquiry and reporting apparatuses portray
"what" is in a database, while OLAP is utilized to reply
"why" certain things are valid in that the client frames a
theory around a relationship and checks it with a progression
of inquiries against the information. Expressions utilized as
a part of OLAP that portray the different capacities include:
2

IJRITCC | April 2016, Available @ http://www.ijritcc.org

_______________________________________________________________________________________

International Journal on Recent and Innovation Trends in Computing and Communication


Volume: 4 Issue: 4

ISSN: 2321-8169
01 - 04

_______________________________________________________________________________________________
DATA MINING IN PERSPECTIVE
While the term data mining is frequently utilized
rather freely, it is by and large a term that is utilized for a
particular arrangement of exercises, all of which include
separating important new data from information.
Notwithstanding, the term data mining is not new to
analysts. It is a term synonymous with information digging
or information snooping and has been utilized to depict the
procedure of trawling through data in the trust of
distinguishing examples. Data snooping happens when a
given dataset is utilized more than once for deduction or
model determination. The implication is defamatory on the
grounds that an adequately comprehensive pursuit will
unquestionably hurl examples or the likeby definition,
information that are not just uniform contain contrasts that
can be translated as examples. The inconvenience is that a
large portion of these "examples" will basically be a result
of irregular changes, and won't speak to any fundamental
structure in the information. The goal of information
investigation is not to show the transitory irregular examples
existing apart from everything else, except to demonstrate
the hidden structures that offer ascent to predictable and
replicable examples.
In rundown, data mining helps associations
concentrate on the most imperative data accessible in their
current databases. Be that as it may, data mining is just
device; it doesn't dispense with the need to know the
business, to comprehend the information, or to comprehend
the diagnostic strategies included. It should be recalled that
the prescient connections discovered by means of data
mining are not as a matter of course reasons for an activity
or a conduct.

engineering is performed in consultation with the data


mining client and the data mining analyst in order to assist
in achieving business ends.
The IT analyst, who provides access to the hardware,
software and data needed to complete the data mining
project successfully. It is important to note that data mining
is a technology that needs to co-exist harmoniously with
other technologies in the organization. In addition, the data
to be mined could be coming from virtually any existing
system, database, or data warehouse in the organization.
Depending on the scale and scope of the project, multiple
individuals may assume each of the various roles. For
example, a large project would likely need several data
mining analysts and data mining engineers.

ACTORS IN DATA MINING


Data mining is performed by people, many of whom will be
discussed in this tutorial. They include:
The project leader , who has the overall responsibility for
planning, coordinating, executing, and deploying the data
mining project.
The data mining client, who is the business domain expert
that requests the project and utilizes the results, but
generally does not possess the technical skills needed to
participate in the execution of the more technical phases of
the data mining project such as data preparation and
modeling.
The data mining analyst, who thoroughly understands, from
a business perspective, what the client wants to accomplish
and assists in translating those business objectives into
technical requirements to be used in the subsequent
development of the data mining model(s).
The data mining engineer, who develops, interprets and
evaluates the data mining model(s) in light of the business
objectives and business success criteria. Data mining

Many organizations use data mining to help manage all


phases of the customer life cycle, including acquiring new
customers, increasing revenue from existing customers, and
retaining good customers. By determining characteristics of
good customers (profiling), a company can target prospects
with similar characteristics. By profiling customers who
bought a particular product a firm can focus attention on
similar customers who have not bought that product (crossselling). Profiling also enables a company to act to retain
customers who are at risk for leaving (reducing churn or
attrition), because it is usually far less expensive to retain a
customer than acquire a new one. However, profiling
introduces issues of privacy.

II.

THE BUSINESS IMPERATIVE

Data mining offers esteem over an expansive range of


commercial ventures and can be utilized as a vehicle to
expand benefits by diminishing expenses and/or raising
income. A couple of the normal routes in which data mining
can finish those goals are
lowering costs at the beginning of the product life
cycle during research and development;
determining the proper bounds for statistical process
control methods in automated manufacturing
processes;
eliminating expensive mailings to customers who are
unlikely to respond to an offer during a marketing
campaign;
facilitating one-to-one marketing and mass
customization opportunities in customer relationship
management.

Examples of other industries where data mining can make a


contribution include:
Telecommunications and credit card companies are two
of the leaders in applying data mining to detect
fraudulent use of their services.
Insurance companies and stock exchanges are
interested in applying data mining to reduce fraud.
3

IJRITCC | April 2016, Available @ http://www.ijritcc.org

_______________________________________________________________________________________

International Journal on Recent and Innovation Trends in Computing and Communication


Volume: 4 Issue: 4

ISSN: 2321-8169
01 - 04

_______________________________________________________________________________________________

Medical applications use data mining to predict the


effectiveness of surgical procedures, medical tests, or
medications.
Financial firms use data mining to determine market
and industry characteristics as well as to predict
individual company and stock performance.
Retailers make use of data mining to decide which
products to stock in particular stores (and even how to
place them within a store), as well as to assess the
effectiveness of promotions and coupons.
Pharmaceutical firms mine large databases for chemical
compounds and genetic material to discover substances
that might be candidates for development as agents for
the treatments of disease.
III.
THE TECHNICAL IMPERATIVE
Data mining uses
the classical statistical procedures such as logistic
regression, discriminant analysis, and cluster analysis,
machine learning techniques such as neural networks,
decision trees, and genetic algorithms.

In the continuum of data analysis techniques, the disciplines


of statistics and of machine learning often overlap.
CONCLUSIONS
Today, most ventures are effectively gathering and putting
away data in huge databases. A large portion of them have
perceived the potential estimation of these data as a data
hotspot for settling on business choices. The drastically
expanding interest for better choice backing is replied by a
developing accessibility of learning disclosure, and data
mining is one stage at the center of the learning revelation
handle. This instructional exercise has shown how data
mining focuses about creating calculations for extricating
structure from data and how that structure can take the type
of factual examples, models, and connections. These
structures give a premise inside of which to foresee and
suspect when certain occasions happen and when seen at
this level, one starts to comprehend the basic significance of
data mining. Open doors for further research proliferate
especially as the Internet gives organizations an operational
stage for association with their clients all day and all night
without geographic or physical limits. Consequently, from a
key viewpoint, the need to explore the quickly developing
universe of advanced information will depend intensely on
the capacity to adequately oversee and mine the crude
information.
REFERENCES
[1] Berry, M. J., Linoff, G. S. (2000), Mastering Data Mining:
The Art and Science of Customer Relationship Management.
Wiley Computer Publishing, New York.
[2] Chung, H. M., Gray, P. (1999), Special Section: Data
Mining. Journal of Management Information Systems,
(16:1),11-17.
[3] Colin, S. (2000), The CRISP-DM Model: The New Blueprint

[4]
[5]

[6]

[7]

[8]
[9]
[10]

[11]
[12]

[13]

[14]

[15]
[16]
[17]
[18]

[19]

[20]

[21]

[22]

for Data Mining, Journal of Data Warehousing, (5:4), Fall,


13-22. Fayyad, U., Piatetsky-Shapiro, G., and Smyth, R
(1996). "The KDD Process for Extracting Useful Knowledge
from Volumes of Data," Communications of the ACM,
(39:11), pp. 27-34.
Fayyad, U., (2001), The Digital Physics of Data Mining,
Communications of the ACM, March, (44:3), 62-65.
Glymour, C., Madigan D., et al (1996), Statistical
Inference and Data Mining.Communications of the ACM,
(39:11), 35-41.
Goebel, M., Gruenwald, L. (1999), A Survey of Data Mining
and Knowledge Discovery Software Tools, ACM SIGKDD
(Special Interest Group on Knowledge Discovery and Data
Mining) Explorations, June, (1:1), 20-28.
Gray
P.,
Watson,
H.J.
(1998a),
Professional
Briefings...Present and Future Directions in Data
Warehousing, Database for Advances in Information
Systems, Summer, (29:3), 83-90.
Gray, P., Watson, H.J. (1998b), Decision Support in the Data
Warehouse, Upper Saddle River, N.J.
Gray, P. (1997) " Mining for Data Warehousing Gems,"
Information Systems Management, Winter, 82-86.
Han, J., Kamber, M. (2001), Data Mining: Concepts and
Techniques, Morgan-Kaufmann Academic Press, San
Francisco.
Hand, D. J. (1998), Data Mining: Statistics and More?, The
American Statistician, May (52:2), 112-118.
Johnson, R. & Wicheren,D.W. (1998). Applied Multivariate
Statistical Analysis. Prentice Hall, New York.Kennedy, R. L.,
Lee, Y. Roy, B. V. Reed, C. D. & Lippman, R. P. (1997).
Solving Data Mining Problems Through Pattern Recognition.
New Jersey: Prentice Hall Professional Technical Reference.
Kosala, R., Blockeel, H. (2000), Web Mining Research: A
Survey, ACM SIGKDD (Special Interest Group on
Knowledge Discovery and Data Mining) Explorations, June,
(2:1), 1-10.
Langley, P., Simon, H. (1995), Applications of Machine
Learning and Rule Induction,Communications of the ACM,
November, 55-65.
Moeller, R. A. (2001), Distributed Data Warehousing Using
Web Technology, AMACOM, New York .
Peacock, P. R. (1998a) Data Mining in Marketing: Part 1,
Marketing Management, Winter, 9-18.
Peacock, P. R. (1998b) Data Mining in Marketing: Part 2,
Marketing Management, Spring, 15-25.
Rajagopalan, B., Krovi, R. (2002), Benchmarking Data
Mining Algorithms, Journal of Database Management, JanMar, 13, 25-36
Ranjit, B., Sugumaran, V. (1999), Application of Intelligent
Agent Technology for Managerial Data Analysis and
Mining, Database for Advances in Information Systems,
(30:1), 77-94. Sharma, S., Applied Multivariate Techniques,
John Wiley & Sons, Inc. (1996).
Srivastava, J., Cooley, R., Deshpande, M., Tan, P., Web
Usage Mining: Discovery and Applications of Usage Patterns
from Web Data, ACM SIGKDD (Special Interest Group on
Knowledge Discovery and Data Mining) Explorations,
January, (1:2) Tegarden, D.J. (1999) Business Information
Visualization Communications of AIS (1)4
Wells, M. T. (1999), Feature Extraction Construction and
Selection: A Data Mining Perspective, Journal of the
American Statistical Association, (94:448), 1390.
White, H.,A Reality Check for Data Snooping (2000),
Econometrica, (68:5), September, 1097-1126. Witten, I. H.
(2000), Data mining : practical machine learning tools and
techniques with Java implementations, Morgan Kaufman, San
Francisco.
4

IJRITCC | April 2016, Available @ http://www.ijritcc.org

_______________________________________________________________________________________

Vous aimerez peut-être aussi