25/11/2014

Data mining - Wikipedia, the free encyclopedia

Data mining
From Wikipedia, the free encyclopedia

Data mining (the analysis step of the "Knowledge Discovery in Databases" process, or KDD),[1] an
interdisciplinary subfield of computer science,[2][3][4] is the computational process of discovering patterns in
large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics,
and database systems.[2] The overall goal of the data mining process is to extract information from a data
set and transform it into an understandable structure for further use.[2] Aside from the raw analysis step, it
involves database and data management aspects, data pre-processing, model and inference considerations,
interestingness metrics, complexity considerations, post-processing of discovered structures, visualization,
and online updating.[2]
The term is a misnomer, because the goal is the extraction of patterns and knowledge from large amounts of
data, not the extraction of data itself.[5] It is also a buzzword,[6] and is frequently applied to any form of
large-scale data or information processing (collection, extraction, warehousing, analysis, and statistics) as
well as any application of computer decision support systems, including artificial intelligence, machine
learning, and business intelligence. The popular book "Data mining: Practical machine learning tools and
techniques with Java"[7] (which covers mostly machine learning material) was originally to be named just
"Practical machine learning", and the term "data mining" was only added for marketing reasons.[8] Often
the more general terms "(large-scale) data analysis" or "analytics" or, when referring to actual methods,
artificial intelligence and machine learning are more appropriate.
The actual data mining task is the automatic or semi-automatic analysis of large quantities of data to extract
previously unknown interesting patterns such as groups of data records (cluster analysis), unusual records
(anomaly detection) and dependencies (association rule mining). This usually involves using database
techniques such as spatial indices. These patterns can then be seen as a kind of summary of the input data,
and may be used in further analysis or, for example, in machine learning and predictive analytics. For
example, the data mining step might identify multiple groups in the data, which can then be used to obtain
more accurate prediction results by a decision support system. Neither the data collection, data preparation,
nor result interpretation and reporting are part of the data mining step, but do belong to the overall KDD
process as additional steps.
The related terms data dredging, data fishing, and data snooping refer to the use of data mining methods to
sample parts of a larger population data set that are (or may be) too small for reliable statistical inferences
to be made about the validity of any patterns discovered. These methods can, however, be used in creating
new hypotheses to test against the larger data populations.

Contents
1 Etymology
2 Background
2.1 Research and evolution
3 Process
http://en.wikipedia.org/wiki/Data_mining
3.1 Pre-processing
3.2 Data mining
3.3 Results validation
4 Standards
5 Notable uses
5.1 Games
5.2 Business
5.3 Science and engineering
5.4 Human rights
5.5 Medical data mining
5.6 Spatial data mining
5.7 Temporal data mining
5.8 Sensor data mining
5.9 Visual data mining
5.10 Music data mining
5.11 Surveillance
5.12 Pattern mining
5.13 Subject-based data mining
5.14 Knowledge grid
6 Privacy concerns and ethics
6.1 Situation in Europe
6.2 Situation in the United States
7 Copyright Law
7.1 Situation in Europe
7.2 Situation in the United States
8 Software
8.1 Free open-source data mining software and applications
8.2 Commercial data-mining software and applications
8.3 Marketplace surveys
9 See also
10 References
11 Further reading
12 External links

Etymology
In the 1960s, statisticians used terms like "Data Fishing" or "Data Dredging" to refer to what they
considered the bad practice of analyzing data without an a priori hypothesis. The term "Data Mining"
appeared around 1990 in the database community. For a short time in the 1980s, the phrase "database mining"
was used, but since it was trademarked by HNC, a San Diego-based company (now merged into FICO), to
pitch their Database Mining Workstation,[9] researchers consequently turned to "data mining". Other terms
used include Data Archaeology, Information Harvesting, Information Discovery, Knowledge Extraction,
etc. Gregory Piatetsky-Shapiro coined the term "Knowledge Discovery in Databases" for the first workshop
on the same topic (KDD-1989) (http://www.kdnuggets.com/meetings/kdd89/) and this term became more
popular in the AI and machine learning community. However, the term data mining became more popular in
the business and press communities.[10] Currently, Data Mining and Knowledge Discovery are used
interchangeably. Since about 2007 the term "Predictive Analytics", and since 2011 the term "Data Science",
have also been used to describe this field.

Background
The manual extraction of patterns from data has occurred for centuries. Early methods of identifying
patterns in data include Bayes' theorem (1700s) and regression analysis (1800s). The proliferation, ubiquity
and increasing power of computer technology have dramatically increased data collection, storage, and
manipulation ability. As data sets have grown in size and complexity, direct "hands-on" data analysis has
increasingly been augmented with indirect, automated data processing, aided by other discoveries in
computer science, such as neural networks, cluster analysis, genetic algorithms (1950s), decision trees and
decision rules (1960s), and support vector machines (1990s). Data mining is the process of applying these
methods with the intention of uncovering hidden patterns[11] in large data sets. It bridges the gap from
applied statistics and artificial intelligence (which usually provide the mathematical background) to
database management by exploiting the way data is stored and indexed in databases to execute the actual
learning and discovery algorithms more efficiently, allowing such methods to be applied to ever larger data
sets.

Research and evolution
The premier professional body in the field is the Association for Computing Machinery's (ACM) Special
Interest Group (SIG) on Knowledge Discovery and Data Mining (SIGKDD).[12][13] Since 1989 this ACM
SIG has hosted an annual international conference and published its proceedings,[14] and since 1999 it has
published a biannual academic journal titled "SIGKDD Explorations".[15]
Computer science conferences on data mining include:
CIKM Conference - ACM Conference on Information and Knowledge Management
DMIN Conference - International Conference on Data Mining
DMKD Conference - Research Issues on Data Mining and Knowledge Discovery
ECDM Conference - European Conference on Data Mining
ECML-PKDD Conference - European Conference on Machine Learning and Principles and Practice
of Knowledge Discovery in Databases
EDM Conference - International Conference on Educational Data Mining
ICDM Conference - IEEE International Conference on Data Mining
KDD Conference - ACM SIGKDD Conference on Knowledge Discovery and Data Mining
MLDM Conference - Machine Learning and Data Mining in Pattern Recognition
PAKDD Conference - The annual Pacific-Asia Conference on Knowledge Discovery and Data
Mining
PAW Conference - Predictive Analytics World
SDM Conference - SIAM International Conference on Data Mining (SIAM)
SSTD Symposium - Symposium on Spatial and Temporal Databases
WSDM Conference - ACM Conference on Web Search and Data Mining
Data mining topics are also present at many data management/database conferences such as the ICDE
Conference, SIGMOD Conference and International Conference on Very Large Data Bases.

Process
The Knowledge Discovery in Databases (KDD) process is commonly defined with the stages:
(1) Selection
(2) Pre-processing
(3) Transformation
(4) Data Mining
(5) Interpretation/Evaluation.[1]
It exists, however, in many variations on this theme, such as the Cross Industry Standard Process for Data
Mining (CRISP-DM) which defines six phases:
(1) Business Understanding
(2) Data Understanding
(3) Data Preparation
(4) Modeling
(5) Evaluation
(6) Deployment
or a simplified process such as (1) pre-processing, (2) data mining, and (3) results validation.
Polls conducted in 2002, 2004, and 2007 show that the CRISP-DM methodology is the leading
methodology used by data miners.[16][17][18] The only other data mining standard named in these polls was
SEMMA. However, 3-4 times as many people reported using CRISP-DM. Several teams of researchers
have published reviews of data mining process models,[19][20] and Azevedo and Santos conducted a
comparison of CRISP-DM and SEMMA in 2008.[21]

Preprocessing
Before data mining algorithms can be used, a target data set must be assembled. As data mining can only
uncover patterns actually present in the data, the target data set must be large enough to contain these
patterns while remaining concise enough to be mined within an acceptable time limit. A common source for
data is a data mart or data warehouse. Pre-processing is essential to analyze the multivariate data sets before
data mining. The target set is then cleaned. Data cleaning removes the observations containing noise and
those with missing data.
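The cleaning step described above can be sketched in a few lines of Python. This is only a minimal illustration: the field names, the sample records, and the idea of treating values outside a fixed plausible range as "noise" are assumptions made for the example, not a method prescribed by the text.

```python
import math

def clean(records, required_fields, valid_range):
    """Keep only records that have every required field and whose
    values fall inside a plausible range (a crude noise filter)."""
    low, high = valid_range
    kept = []
    for rec in records:
        values = [rec.get(field) for field in required_fields]
        # Drop observations with missing data (None or NaN).
        if any(v is None or (isinstance(v, float) and math.isnan(v)) for v in values):
            continue
        # Drop observations with implausible values, treated here as noise.
        if any(not (low <= v <= high) for v in values):
            continue
        kept.append(rec)
    return kept

# Hypothetical target data set: age in years, income in thousands.
raw = [
    {"age": 34, "income": 52.0},
    {"age": None, "income": 48.0},         # missing value
    {"age": 29, "income": float("nan")},   # missing value encoded as NaN
    {"age": 41, "income": -7.0},           # implausible value
    {"age": 57, "income": 61.5},
]
cleaned = clean(raw, ["age", "income"], (0, 150))
print(cleaned)
```

In practice the definition of noise depends on the domain, and imputing missing values is a common alternative to discarding records.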

Data mining
Data mining involves six common classes of tasks:[1]
Anomaly detection (Outlier/change/deviation detection) - The identification of unusual data records
that might be interesting, or data errors that require further investigation.
Association rule learning (Dependency modeling) - Searches for relationships between variables. For
example, a supermarket might gather data on customer purchasing habits. Using association rule
learning, the supermarket can determine which products are frequently bought together and use this
information for marketing purposes. This is sometimes referred to as market basket analysis.
Clustering - is the task of discovering groups and structures in the data that are in some way or
another "similar", without using known structures in the data.
Classification - is the task of generalizing known structure to apply to new data. For example, an
e-mail program might attempt to classify an e-mail as "legitimate" or as "spam".
Regression - attempts to find a function which models the data with the least error.
Summarization - providing a more compact representation of the data set, including visualization and
report generation.
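To make the regression task in the list above concrete, the following sketch fits a straight line by ordinary least squares, one common reading of "least error" (sum of squared residuals). The data points and function names are invented for illustration.

```python
def fit_line(xs, ys):
    """Closed-form ordinary least squares fit of y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def squared_error(xs, ys, a, b):
    """Sum of squared residuals of the line y = a*x + b on the data."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))

# Hypothetical observations that roughly follow y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
a, b = fit_line(xs, ys)
print(a, b, squared_error(xs, ys, a, b))
```

Any other line, such as y = 2x, gives a squared error on these points that is at least as large as that of the fitted line.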

Results validation
Data mining can unintentionally be misused, and can then produce results which appear to be significant
but which do not actually predict future behavior, cannot be reproduced on a new sample of data, and
are of little use. Often this results from investigating too many hypotheses and not performing proper
statistical hypothesis testing. A simple version of this problem in machine learning is known as overfitting,
but the same problem can arise at different phases of the process, and thus a train/test split (when applicable
at all) may not be sufficient to prevent this from happening.
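The effect of investigating too many hypotheses can be illustrated with a small simulation. The setup is an assumption chosen for simplicity: each "hypothesis" claims that a coin is biased, every coin is in fact fair, and a coin is flagged when its head count in 100 flips deviates from 50 by at least 10 (a rule that fires by chance on roughly 5 to 6 percent of fair coins).

```python
import random

def count_false_alarms(num_hypotheses, flips=100, threshold=10, seed=0):
    """Test many fair coins for bias and count how many are flagged.
    Every flag is a false alarm, because no coin is actually biased."""
    rng = random.Random(seed)
    alarms = 0
    for _ in range(num_hypotheses):
        heads = sum(rng.random() < 0.5 for _ in range(flips))
        if abs(heads - flips / 2) >= threshold:
            alarms += 1
    return alarms

alarms = count_false_alarms(1000)
print(f"{alarms} of 1000 fair coins looked 'significantly' biased")
```

With a thousand hypotheses, dozens of apparently significant findings are expected even though none is real, which is why corrections for multiple testing or validation on fresh data are needed.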
The final step of knowledge discovery from data is to verify that the patterns produced by the data mining
algorithms occur in the wider data set. Not all patterns found by the data mining algorithms are necessarily
valid. It is common for the data mining algorithms to find patterns in the training set which are not present
in the general data set. This is called overfitting. To overcome this, the evaluation uses a test set of data on
which the data mining algorithm was not trained. The learned patterns are applied to this test set, and the
resulting output is compared to the desired output. For example, a data mining algorithm trying to
distinguish "spam" from "legitimate" e-mails would be trained on a training set of sample e-mails. Once
trained, the learned patterns would be applied to the test set of e-mails on which it had not been trained. The
accuracy of the patterns can then be measured from how many e-mails they correctly classify. A number of
statistical methods may be used to evaluate the algorithm, such as ROC curves.
If the learned patterns do not meet the desired standards, it is necessary to re-evaluate and
change the pre-processing and data mining steps. If the learned patterns do meet the desired standards, then
the final step is to interpret the learned patterns and turn them into knowledge.
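The spam example above can be sketched as follows. The classifier is a deliberately naive word-vote scheme invented for this illustration (it is not a method named in the text), and the e-mails are made up. The point is only the procedure: learn on a training set, then measure accuracy on e-mails that were not used for training.

```python
from collections import Counter

def train_word_counts(emails):
    """Count in how many spam and legitimate training e-mails each word occurs."""
    spam, legit = Counter(), Counter()
    for text, label in emails:
        target = spam if label == "spam" else legit
        target.update(set(text.lower().split()))
    return spam, legit

def classify(text, spam, legit):
    """Label an e-mail as spam if its words were seen more often in spam."""
    words = set(text.lower().split())
    spam_votes = sum(spam[w] for w in words)
    legit_votes = sum(legit[w] for w in words)
    return "spam" if spam_votes > legit_votes else "legitimate"

def accuracy(emails, spam, legit):
    """Fraction of e-mails whose predicted label matches the desired label."""
    correct = sum(classify(text, spam, legit) == label for text, label in emails)
    return correct / len(emails)

train = [
    ("win cash prize now", "spam"),
    ("cheap pills win big", "spam"),
    ("meeting agenda for monday", "legitimate"),
    ("lunch on monday", "legitimate"),
]
test = [
    ("win a cheap prize", "spam"),
    ("agenda for lunch", "legitimate"),
    ("cash for your old car", "spam"),
]

spam, legit = train_word_counts(train)
train_acc = accuracy(train, spam, legit)
test_acc = accuracy(test, spam, legit)
print(train_acc, test_acc)
```

The learned patterns classify every training e-mail correctly but miss one of the three unseen e-mails, so the training accuracy overstates how well the patterns generalize.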

Standards
There have been some efforts to define standards for the data mining process, for example the 1999
European Cross Industry Standard Process for Data Mining (CRISP-DM 1.0) and the 2004 Java Data
Mining standard (JDM 1.0). Development on successors to these processes (CRISP-DM 2.0 and JDM 2.0)
was active in 2006, but has stalled since. JDM 2.0 was withdrawn without reaching a final draft.
For exchanging the extracted models, in particular for use in predictive analytics, the key standard is the
Predictive Model Markup Language (PMML), which is an XML-based language developed by the Data
Mining Group (DMG) and supported as exchange format by many data mining applications. As the name
suggests, it only covers prediction models, a particular data mining task of high importance to business
applications. However, extensions to cover (for example) subspace clustering have been proposed
independently of the DMG.[22]
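The following sketch gives a rough impression of what exchanging a model as XML looks like. The fragment is a heavily simplified, PMML-style document written for this example: it omits the namespace, header, and mining schema that a real PMML file carries, it has not been validated against the DMG schema, and the field names and coefficients are invented. The reader is a few lines of standard-library Python, not a PMML library.

```python
import xml.etree.ElementTree as ET

# Simplified, PMML-style description of the linear model spend = 10.0 + 0.5 * age.
# Assumption: a real PMML document contains additional required elements.
MODEL_XML = """
<PMML version="4.2">
  <DataDictionary>
    <DataField name="age" optype="continuous" dataType="double"/>
    <DataField name="spend" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <RegressionTable intercept="10.0">
      <NumericPredictor name="age" coefficient="0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

def load_linear_model(text):
    """Read the intercept and coefficients from the simplified fragment."""
    root = ET.fromstring(text)
    table = root.find("./RegressionModel/RegressionTable")
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("NumericPredictor")
    }
    return intercept, coefficients

def predict(row, intercept, coefficients):
    """Apply the linear model to one input record."""
    return intercept + sum(c * row[name] for name, c in coefficients.items())

intercept, coefficients = load_linear_model(MODEL_XML)
prediction = predict({"age": 40.0}, intercept, coefficients)
print(prediction)
```

The appeal of such a format is that the application that trained the model and the application that scores new records only need to agree on the document, not on each other's code.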

Notable uses
See also: Category:Applied data mining.

Games
Since the early 1960s, with the availability of oracles for certain combinatorial games, also called
tablebases (e.g. for 3x3-chess) with any beginning configuration, small-board dots-and-boxes, small-board
hex, and certain endgames in chess, dots-and-boxes, and hex, a new area for data mining has been opened.
This is the extraction of human-usable strategies from these oracles. Current pattern recognition approaches
do not seem to fully acquire the high level of abstraction required to be applied successfully. Instead,
extensive experimentation with the tablebases, combined with an intensive study of tablebase answers to
well-designed problems and with knowledge of prior art (i.e., pre-tablebase knowledge), is used to yield
insightful patterns. Berlekamp (in dots-and-boxes, etc.) and John Nunn (in chess endgames) are notable
examples of researchers doing this work, though they were not, and are not, involved in tablebase
generation.

Business
In business, data mining is the analysis of historical business activities, stored as static data in data
warehouse databases. The goal is to reveal hidden patterns and trends. Data mining software uses advanced
pattern recognition algorithms to sift through large amounts of data to assist in discovering previously
unknown strategic business information. Examples of what businesses use data mining for include
performing market analysis to identify new product bundles, finding the root cause of manufacturing
problems, preventing customer attrition and acquiring new customers, cross-selling to existing customers, and
profiling customers with more accuracy.[23]
In today's world raw data is being collected by companies at an exploding rate. For example,
Walmart processes over 20 million point-of-sale transactions every day. This information is stored in
a centralized database, but would be useless without some type of data mining software to analyze it.
If Walmart analyzed their point-of-sale data with data mining techniques they would be able to
determine sales trends, develop marketing campaigns, and more accurately predict customer
loyalty.[24]
Every time a credit card or a store loyalty card is used, or a warranty card is filled in, data is
collected about the user's behavior. Many people find the amount of information stored about us
by companies such as Google, Facebook, and Amazon disturbing and are concerned about
privacy. Although there is the potential for our personal data to be used in harmful or unwanted
ways, it is also being used to make our lives better. For example, Ford and Audi hope to one day
collect information about customer driving patterns so they can recommend safer routes and warn
drivers about dangerous road conditions.[25]
Data mining in customer relationship management applications can contribute significantly to the
bottom line. Rather than randomly contacting a prospect or customer through a call center or sending
mail, a company can concentrate its efforts on prospects that are predicted to have a high likelihood
of responding to an offer. More sophisticated methods may be used to optimize resources across
campaigns so that one may predict to which channel and to which offer an individual is most likely to
respond (across all potential offers). Additionally, sophisticated applications could be used to
automate mailing. Once the results from data mining (potential prospect/customer and channel/offer)
are determined, this "sophisticated application" can either automatically send an e-mail or a regular
mail. Finally, in cases where many people will take an action without an offer, "uplift modeling" can
be used to determine which people have the greatest increase in response if given an offer. Uplift
modeling thereby enables marketers to focus mailings and offers on persuadable people, and not to
send offers to people who will buy the product without an offer. Data clustering can also be used to
automatically discover the segments or groups within a customer data set.
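The idea behind uplift modeling described above can be shown with a very small calculation: compare the response rate of customers who received an offer with that of comparable customers who did not. The segment names and outcomes below are hypothetical, and real uplift models estimate this difference per individual rather than per hand-picked segment.

```python
def response_rate(outcomes):
    """Fraction of customers who purchased (1 = purchased, 0 = did not)."""
    return sum(outcomes) / len(outcomes)

def uplift_by_segment(data):
    """Map each segment to (rate with offer) minus (rate without offer)."""
    return {
        segment: response_rate(treated) - response_rate(control)
        for segment, (treated, control) in data.items()
    }

# Hypothetical campaign results: (outcomes with offer, outcomes without offer).
campaign = {
    "loyal":       ([1, 1, 1, 0], [1, 1, 1, 0]),   # buys anyway
    "persuadable": ([1, 1, 0, 0], [0, 0, 0, 0]),   # buys only with the offer
}
uplift = uplift_by_segment(campaign)
best_segment = max(uplift, key=uplift.get)
print(uplift, best_segment)
```

Here the "loyal" segment has the higher raw response rate but zero uplift, so the offers are better spent on the "persuadable" segment.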
Businesses employing data mining may see a return on investment, but they also recognize that the
number of predictive models can quickly become very large. For example, rather than using one
model to predict how many customers will churn, a business may choose to build a separate model
for each region and customer type. In situations where a large number of models need to be
maintained, some businesses turn to more automated data mining methodologies.
Data mining can be helpful to human resources (HR) departments in identifying the characteristics of
their most successful employees. Information obtained, such as universities attended by highly
successful employees, can help HR focus recruiting efforts accordingly. Additionally, Strategic
Enterprise Management applications help a company translate corporate-level goals, such as profit
and margin share targets, into operational decisions, such as production plans and workforce
levels.[26]
Market basket analysis relates to data mining use in retail sales. If a clothing store records the
purchases of customers, a data mining system could identify those customers who favor silk shirts
over cotton ones. Although some explanations of relationships may be difficult, taking advantage of
them is easier. The example deals with association rules within transaction-based data. Not all data are
transaction-based, and logical or inexact rules may also be present within a database.
Market basket analysis has been used to identify the purchase patterns of the Alpha Consumer.
Analyzing the data collected on this type of user has allowed companies to predict future buying
trends and forecast supply demands.
Data mining is a highly effective tool in the catalog marketing industry. Catalogers have a rich
database of history of their customer transactions for millions of customers dating back a number of
years. Data mining tools can identify patterns among customers and help identify the most likely
customers to respond to upcoming mailing campaigns.
Data mining for business applications can be integrated into a complex modeling and decision
making process.[27] Reactive business intelligence (RBI) advocates a "holistic" approach that
integrates data mining, modeling, and interactive visualization into an end-to-end discovery and
continuous innovation process powered by human and automated learning.[28]
In the area of decision making, the RBI approach has been used to mine knowledge that is
progressively acquired from the decision maker, and then self-tune the decision method
accordingly.[29] The relation between the quality of a data mining system and the amount of
investment that the decision maker is willing to make was formalized by providing an economic
perspective on the value of extracted knowledge in terms of its payoff to the organization.[27] This
decision-theoretic classification framework[27] was applied to a real-world semiconductor wafer
manufacturing line, where decision rules for effectively monitoring and controlling the
semiconductor wafer fabrication line were developed.[30]
An example of data mining related to an integrated-circuit (IC) production line is described in the
paper "Mining IC Test Data to Optimize VLSI Testing."[31] In this paper, the application of data
mining and decision analysis to the problem of die-level functional testing is described. Experiments
mentioned demonstrate the ability to apply a system of mining historical die-test data to create a
probabilistic model of patterns of die failure. These patterns are then utilized to decide, in real time,
which die to test next and when to stop testing. This system has been shown, based on experiments
with historical test data, to have the potential to improve profits on mature IC products. Other
examples[32][33] of the application of data mining methodologies in semiconductor manufacturing
environments suggest that data mining methodologies may be particularly useful when data is scarce,
and the various physical and chemical parameters that affect the process exhibit highly complex
interactions. Another implication is that on-line monitoring of the semiconductor manufacturing
process using data mining may be highly effective.

Science and engineering
In recent years, data mining has been used widely in the areas of science and engineering, such as
bioinformatics, genetics, medicine, education and electrical power engineering.
In the study of human genetics, sequence mining helps address the important goal of understanding
the mapping relationship between the inter-individual variations in human DNA sequence and the
variability in disease susceptibility. In simple terms, it aims to find out how the changes in an
individual's DNA sequence affect the risks of developing common diseases such as cancer, which is
of great importance to improving methods of diagnosing, preventing, and treating these diseases. One
data mining method that is used to perform this task is known as multifactor dimensionality
reduction.[34]
In the area of electrical power engineering, data mining methods have been widely used for condition
monitoring of high-voltage electrical equipment. The purpose of condition monitoring is to obtain
valuable information on, for example, the status of the insulation (or other important safety-related
parameters). Data clustering techniques such as the self-organizing map (SOM) have been applied
to vibration monitoring and analysis of transformer on-load tap-changers (OLTCS). Using vibration
monitoring, it can be observed that each tap change operation generates a signal that contains
information about the condition of the tap changer contacts and the drive mechanisms. Obviously,
different tap positions will generate different signals. However, there was considerable variability
amongst normal condition signals for exactly the same tap position. SOM has been applied to detect
abnormal conditions and to hypothesize about the nature of the abnormalities.[35]
Data mining methods have been applied to dissolved gas analysis (DGA) in power transformers.
DGA, as a diagnostic for power transformers, has been available for many years. Methods such as
SOM have been applied to analyze generated data and to determine trends which are not obvious to the
standard DGA ratio methods (such as Duval Triangle).[35]
In educational research, data mining has been used to study the factors leading students to
choose to engage in behaviors which reduce their learning,[36] and to understand factors influencing
university student retention.[37] A similar example of social application of data mining is its use in
expertise finding systems, whereby descriptors of human expertise are extracted, normalized, and
classified so as to facilitate the finding of experts, particularly in scientific and technical fields. In this
way, data mining can facilitate institutional memory.
Other examples include data mining of biomedical data facilitated by domain ontologies,[38] mining clinical trial
data,[39] and traffic analysis using SOM.[40]
In adverse drug reaction surveillance, the Uppsala Monitoring Centre has, since 1998, used data
mining methods to routinely screen for reporting patterns indicative of emerging drug safety issues in
the WHO global database of 4.6 million suspected adverse drug reaction incidents.[41] Recently,
similar methodology has been developed to mine large collections of electronic health records for
temporal patterns associating drug prescriptions to medical diagnoses.[42]
Data mining has been applied to software artifacts within the realm of software engineering: Mining
Software Repositories.

Human rights
Data mining of government records, particularly records of the justice system (i.e., courts, prisons),
enables the discovery of systemic human rights violations in connection to generation and publication of
invalid or fraudulent legal records by various government agencies.[43][44]

Medical data mining
In 2011, the case of Sorrell v. IMS Health, Inc., decided by the Supreme Court of the United States, ruled
that pharmacies may share information with outside companies. This practice was authorized under the 1st
Amendment of the Constitution, protecting the "freedom of speech."[45] However, the passage of the Health
Information Technology for Economic and Clinical Health Act (HITECH Act) helped to initiate the
adoption of the electronic health record (EHR) and supporting technology in the United States.[46] The
HITECH Act was signed into law on February 17, 2009 as part of the American Recovery and
Reinvestment Act (ARRA) and helped to open the door to medical data mining.[47] Prior to the signing of
this law, it was estimated that only 20% of United States-based physicians were utilizing electronic patient
records.[46] Søren Brunak notes that the patient record becomes as information-rich as possible and
thereby maximizes the data mining opportunities.[46] Hence, electronic patient records further expand
the possibilities regarding medical data mining, thereby opening the door to a vast source of medical data
analysis.

Spatial data mining
Spatial data mining is the application of data mining methods to spatial data. The end objective of spatial
data mining is to find patterns in data with respect to geography. So far, data mining and Geographic
Information Systems (GIS) have existed as two separate technologies, each with its own methods,
traditions, and approaches to visualization and data analysis. In particular, most contemporary GIS have
only very basic spatial analysis functionality. The immense explosion in geographically referenced data
occasioned by developments in IT, digital mapping, remote sensing, and the global diffusion of GIS
emphasizes the importance of developing data-driven inductive approaches to geographical analysis and
modeling.
Data mining offers great potential benefits for GIS-based applied decision-making. Recently, the task of
integrating these two technologies has become of critical importance, especially as various public and
private sector organizations possessing huge databases with thematic and geographically referenced data
begin to realize the huge potential of the information contained therein. Among those organizations are:
offices requiring analysis or dissemination of geo-referenced statistical data
public health services searching for explanations of disease clustering
environmental agencies assessing the impact of changing land-use patterns on climate change
geo-marketing companies doing customer segmentation based on spatial location.
Challenges in spatial mining: Geospatial data repositories tend to be very large. Moreover, existing GIS
datasets are often splintered into feature and attribute components that are conventionally archived in
hybrid data management systems. Algorithmic requirements differ substantially for relational (attribute)
data management and for topological (feature) data management.[48] Related to this is the range and
diversity of geographic data formats, which present unique challenges. The digital geographic data
revolution is creating new types of data formats beyond the traditional "vector" and "raster" formats.
Geographic data repositories increasingly include ill-structured data, such as imagery and geo-referenced
multimedia.[49]
There are several critical research challenges in geographic knowledge discovery and data mining. Miller
and Han[50] offer the following list of emerging research topics in the field:
Developing and supporting geographic data warehouses (GDWs): Spatial properties are often
reduced to simple aspatial attributes in mainstream data warehouses. Creating an integrated GDW
requires solving issues of spatial and temporal data interoperability, including differences in
semantics, referencing systems, geometry, accuracy, and position.
Better spatio-temporal representations in geographic knowledge discovery: Current geographic
knowledge discovery (GKD) methods generally use very simple representations of geographic
objects and spatial relationships. Geographic data mining methods should recognize more complex
geographic objects (e.g., lines and polygons) and relationships (e.g., non-Euclidean distances,
direction, connectivity, and interaction through attributed geographic space such as terrain).
Furthermore, the time dimension needs to be more fully integrated into these geographic
representations and relationships.
Geographic knowledge discovery using diverse data types: GKD methods should be developed
that can handle diverse data types beyond the traditional raster and vector models, including imagery
and geo-referenced multimedia, as well as dynamic data types (video streams, animation).

Temporal data mining
Data may contain attributes generated and recorded at different times. In this case finding meaningful
relationships in the data may require considering the temporal order of the attributes. A temporal
relationship may indicate a causal relationship, or simply an association.

Sensor data mining
Wireless sensor networks can be used for facilitating the collection of data for spatial data mining for a
variety of applications such as air pollution monitoring.[51] A characteristic of such networks is that nearby
sensor nodes monitoring an environmental feature typically register similar values. This kind of data
redundancy due to the spatial correlation between sensor observations inspires the techniques for in-network
data aggregation and mining. By measuring the spatial correlation between data sampled by
different sensors, a wide class of specialized algorithms can be designed to make spatial
data mining more efficient.[52]
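One simple way to measure the correlation between data sampled by different sensors is the Pearson correlation coefficient, sketched below. The node names, the temperature-like readings, and the 0.9 threshold are assumptions for illustration. The text does not prescribe a particular correlation measure.

```python
import math
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)

def redundant_pairs(readings, threshold):
    """Pairs of nodes whose readings are correlated at or above the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(readings), 2)
        if pearson(readings[a], readings[b]) >= threshold
    ]

# Hypothetical readings taken at the same five time steps.
readings = {
    "node_a": [20.1, 20.4, 21.0, 21.7, 22.3],
    "node_b": [20.0, 20.5, 21.1, 21.6, 22.4],  # placed next to node_a
    "node_c": [25.0, 23.9, 24.8, 23.5, 24.9],  # placed far away
}
r_ab = pearson(readings["node_a"], readings["node_b"])
r_ac = pearson(readings["node_a"], readings["node_c"])
pairs = redundant_pairs(readings, 0.9)
print(r_ab, r_ac, pairs)
```

Nodes that form a highly correlated pair carry largely redundant information, so an in-network scheme could forward one aggregated value for the pair instead of both raw streams.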

Visual data mining
In the process of turning from analog into digital, large data sets have been generated, collected, and
stored, with the aim of discovering statistical patterns, trends and information hidden in the data in order to build
predictive patterns. Studies suggest visual data mining is faster and much more intuitive than traditional
data mining.[53][54][55] See also Computer vision.

Music data mining
Data mining techniques, and in particular co-occurrence analysis, have been used to discover relevant
similarities among music corpora (radio lists, CD databases) for purposes including classifying music into
genres in a more objective manner.[56]

Surveillance
Data mining has been used by the U.S. government. Programs include the Total Information Awareness
(TIA) program, Secure Flight (formerly known as Computer-Assisted Passenger Prescreening System
(CAPPS II)), Analysis, Dissemination, Visualization, Insight, Semantic Enhancement (ADVISE),[57] and
the Multi-state Anti-Terrorism Information Exchange (MATRIX).[58] These programs have been
discontinued due to controversy over whether they violate the 4th Amendment to the United States
Constitution, although many programs that were formed under them continue to be funded by different
organizations or under different names.[59]
In the context of combating terrorism, two particularly plausible methods of data mining are "pattern
mining" and "subject-based data mining".

Pattern mining
"Pattern mining" is a data mining method that involves finding existing patterns in data. In this context
patterns often means association rules. The original motivation for searching association rules came from
the desire to analyze supermarket transaction data, that is, to examine customer behavior in terms of the
purchased products. For example, an association rule "beer ⇒ potato chips (80%)" states that four out of
five customers that bought beer also bought potato chips.
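The 80% in the rule above is what is usually called the confidence of the rule: among the transactions that contain beer, the fraction that also contain potato chips. A minimal sketch, using six made-up shopping baskets:

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical supermarket transactions.
baskets = [
    {"beer", "potato chips"},
    {"beer", "potato chips", "salsa"},
    {"beer", "potato chips"},
    {"beer", "potato chips", "bread"},
    {"beer", "bread"},
    {"milk", "bread"},
]
rule_confidence = confidence(baskets, {"beer"}, {"potato chips"})
print(rule_confidence)
```

Five baskets contain beer and four of those also contain potato chips, so the confidence is four out of five. Association rule miners typically also require a minimum support so that rules based on very few transactions are ignored.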
In the context of pattern mining as a tool to identify terrorist activity, the National Research Council
provides the following definition: "Pattern-based data mining looks for patterns (including anomalous data
patterns) that might be associated with terrorist activity - these patterns might be regarded as small signals
in a large ocean of noise."[60][61][62] Pattern mining includes new areas such as Music Information Retrieval
(MIR), where patterns seen both in the temporal and non-temporal domains are imported to classical
knowledge discovery search methods.

Subject-based data mining
"Subject-based data mining" is a data mining method involving the search for associations between
individuals in data. In the context of combating terrorism, the National Research Council provides the
following definition: "Subject-based data mining uses an initiating individual or other datum that is
considered, based on other information, to be of high interest, and the goal is to determine what other
persons or financial transactions or movements, etc., are related to that initiating datum."[61]

Knowledge grid
Knowledge discovery "On the Grid" generally refers to conducting knowledge discovery in an open
environment using grid computing concepts, allowing users to integrate data from various online data
sources, as well as make use of remote resources, for executing their data mining tasks. The earliest example
was the Discovery Net,[63][64] developed at Imperial College London, which won the "Most Innovative
Data-Intensive Application Award" at the ACM SC02 (Supercomputing 2002) conference and exhibition,
based on a demonstration of a fully interactive distributed knowledge discovery application for a
bioinformatics application. Other examples include work conducted by researchers at the University of
Calabria, who developed a Knowledge Grid architecture for distributed knowledge discovery, based on grid
computing.[65][66]

Privacy concerns and ethics
While the term "data mining" itself has no ethical implications, it is often associated with the mining of
information in relation to people's behavior (ethical and otherwise).[67]
The ways in which data mining can be used can in some cases and contexts raise questions regarding
privacy, legality, and ethics.[68] In particular, data mining government or commercial data sets for national
security or law enforcement purposes, such as in the Total Information Awareness Program or in ADVISE,
has raised privacy concerns.[69][70]
Data mining requires data preparation which can uncover information or patterns which may compromise
confidentiality and privacy obligations. A common way for this to occur is through data aggregation. Data
aggregation involves combining data together (possibly from various sources) in a way that facilitates
analysis (but that also might make identification of private, individual-level data deducible or otherwise
apparent).[71] This is not data mining per se, but a result of the preparation of data before and for the
purposes of the analysis. The threat to an individual's privacy comes into play when the data, once
compiled, cause the data miner, or anyone who has access to the newly compiled data set, to be able to
identify specific individuals, especially when the data were originally anonymous.[72][73][74]
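
One way aggregation can expose individuals is by joining a data set that has no names to a second data set that has them, using attributes the two share. The sketch below illustrates that idea on invented records. The function `link_records`, the choice of ZIP code, birth year and sex as joining attributes, and every row of data are assumptions made for illustration only.

```python
def link_records(anonymous_rows, public_rows, keys):
    """Pair each anonymous row with the single public row that agrees on all `keys`."""
    # Index the named data set by its shared attribute values.
    index = {}
    for row in public_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)

    matches = []
    for row in anonymous_rows:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # only a unique match singles out one person
            matches.append((candidates[0]["name"], row))
    return matches


# Hypothetical toy data: a "de-identified" medical extract and a named public list.
medical = [
    {"zip": "02138", "birth_year": 1950, "sex": "F", "diagnosis": "asthma"},
    {"zip": "02139", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
]
voters = [
    {"name": "Alice Example", "zip": "02138", "birth_year": 1950, "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_year": 1982, "sex": "M"},
    {"name": "Carl Example", "zip": "02139", "birth_year": 1982, "sex": "M"},
]
reidentified = link_records(medical, voters, keys=("zip", "birth_year", "sex"))
```

In this toy case the first medical record matches exactly one named row and is therefore attributed to a person, while the second matches two rows and stays ambiguous.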
It is recommended that an individual is made aware of the following before data are collected:[71]
the purpose of the data collection and any (known) data mining projects
how the data will be used
who will be able to mine the data and use the data and their derivatives
the status of security surrounding access to the data
how collected data can be updated.
Data may also be modified so as to become anonymous, so that individuals may not readily be
identified.[71] However, even "de-identified"/"anonymized" data sets can potentially contain enough
information to allow identification of individuals, as occurred when journalists were able to find several
individuals based on a set of search histories that were inadvertently released by AOL.[75]
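
A common way to reason about whether modified data are still identifying is to count how many records share each combination of indirectly identifying attributes. The sketch below uses the k-anonymity criterion, which the text above does not name and which is brought in here as an illustrative assumption. The helper names, the generalization rules and the sample rows are likewise hypothetical.

```python
from collections import Counter


def smallest_group(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(counts.values())


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every combination of quasi-identifier values occurs at least k times."""
    return smallest_group(rows, quasi_identifiers) >= k


def generalize(row):
    """Coarsen a row: keep a 4-digit ZIP prefix and replace age with its decade."""
    return {"zip": row["zip"][:4] + "*", "age": f"{row['age'] // 10 * 10}s"}


# Hypothetical rows with names already removed.
rows = [
    {"zip": "02138", "age": 34},
    {"zip": "02139", "age": 36},
    {"zip": "02138", "age": 34},
    {"zip": "02139", "age": 38},
]
raw_ok = is_k_anonymous(rows, ("zip", "age"), k=2)
coarse_ok = is_k_anonymous([generalize(r) for r in rows], ("zip", "age"), k=2)
```

Here the raw rows fail the check for k = 2 because two of them are unique, while the coarsened rows pass. Passing such a check does not by itself guarantee that individuals cannot be identified, as the AOL example shows for free-text data.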

Situation in Europe
Europe has rather strong privacy laws, and efforts are underway to further strengthen the rights of
consumers. However, the U.S.-E.U. Safe Harbor Principles currently effectively expose European users to
privacy exploitation by U.S. companies. As a consequence of Edward Snowden's global surveillance
disclosures, there has been increased discussion of revoking this agreement, as in particular the data will be
fully exposed to the National Security Agency, and attempts to reach an agreement have failed.

Situation in the United States
In the United States, privacy concerns have been addressed to some extent by the US Congress via the
passage of regulatory controls such as the Health Insurance Portability and Accountability Act (HIPAA).
The HIPAA requires individuals to give their "informed consent" regarding information they provide and
its intended present and future uses. According to an article in Biotech Business Week, "'[i]n practice,
HIPAA may not offer any greater protection than the longstanding regulations in the research arena,' says
the AAHC. More importantly, the rule's goal of protection through informed consent is undermined by the
complexity of consent forms that are required of patients and participants, which approach a level of
incomprehensibility to average individuals."[76] This underscores the necessity for data anonymity in data
aggregation and mining practices.
U.S. information privacy legislation such as HIPAA and the Family Educational Rights and Privacy Act
(FERPA) applies only to the specific areas that each such law addresses. Use of data mining by the majority
of businesses in the U.S. is not controlled by any legislation.

Copyright Law
Situation in Europe
Due to a lack of flexibilities in European copyright and database law, the mining of in-copyright works,
such as web mining, without the permission of the copyright owner is not legal. Where a database is pure
data in Europe there is likely to be no copyright, but database rights may exist, so data mining becomes
subject to regulations by the Database Directive. On the recommendation of the Hargreaves review, the UK
government amended its copyright law in 2014[77] to allow content mining as a limitation and
exception. It was only the second country in the world to do so after Japan, which introduced an exception
for data mining in 2009. However, due to the restriction of the Copyright Directive, the UK exception only
allows content mining for non-commercial purposes. UK copyright law also does not allow this provision
to be overridden by contractual terms and conditions. The European Commission facilitated stakeholder
discussion on text and data mining in 2013, under the title of Licences for Europe.[78] The focus on
licences, rather than limitations and exceptions, as the solution to this legal issue led representatives of
universities, researchers, libraries, civil society groups and open access publishers to leave the stakeholder
dialogue in May 2013.[79]

Situation in the United States
By contrast to Europe, the flexible nature of US copyright law, and in particular fair use, means that content
mining in America, as well as in other fair use countries such as Israel, Taiwan and South Korea, is viewed as
being legal. As content mining is transformative, that is, it does not supplant the original work, it is viewed
as being lawful under fair use. For example, as part of the Google Book settlement the presiding judge on
the case ruled that Google's digitisation project of in-copyright books was lawful, in part because of the
transformative uses that the digitisation project displayed, one being text and data mining.[80]

Software
See also Category: Data mining and machine learning software.

Free open-source data mining software and applications
Carrot2: Text and search results clustering framework.
Chemicalize.org: A chemical structure miner and web search engine.
ELKI: A university research project with advanced cluster analysis and outlier detection methods written in the Java language.
GATE: A natural language processing and language engineering tool.
KNIME: The Konstanz Information Miner, a user-friendly and comprehensive data analytics framework.
ML-Flex: A software package that enables users to integrate with third-party machine learning packages written in any programming language, execute classification analyses in parallel across multiple computing nodes, and produce HTML reports of classification results.
MLPACK library: A collection of ready-to-use machine learning algorithms written in the C++ language.
Massive Online Analysis (MOA): A real-time big data stream mining tool with concept drift support, written in the Java programming language.
NLTK (Natural Language Toolkit): A suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the Python language.
OpenNN: Open neural networks library.
Orange: A component-based data mining and machine learning software suite written in the Python language.
R: A programming language and software environment for statistical computing, data mining, and graphics. It is part of the GNU Project.
RapidMiner: An environment for machine learning and data mining experiments.
SCaViS: Java cross-platform data analysis framework developed at Argonne National Laboratory.
SenticNet API (http://sentic.net/api): A semantic and affective resource for opinion mining and sentiment analysis.
Tanagra: A visualisation-oriented data mining software, also for teaching.
Torch: An open source deep learning library for the Lua programming language and scientific computing framework with wide support for machine learning algorithms.
UIMA: The UIMA (Unstructured Information Management Architecture) is a component framework for analyzing unstructured content such as text, audio and video, originally developed by IBM.
Weka: A suite of machine learning software applications written in the Java programming language.

Commercial data mining software and applications
Angoss KnowledgeSTUDIO: data mining tool provided by Angoss.
Clarabridge: enterprise class text analytics solution.
HP Vertica Analytics Platform: data mining software provided by HP.
IBM SPSS Modeler: data mining software provided by IBM.
KXEN Modeler: data mining tool provided by KXEN.
LIONsolver: an integrated software application for data mining, business intelligence, and modeling that implements the Learning and Intelligent OptimizatioN (LION) approach.
Microsoft Analysis Services: data mining software provided by Microsoft.
NetOwl: suite of multilingual text and entity analytics products that enable data mining.
Oracle Data Mining: data mining software by Oracle.
SAS Enterprise Miner: data mining software provided by the SAS Institute.
STATISTICA Data Miner: data mining software provided by StatSoft.

Marketplace surveys
Several researchers and organizations have conducted reviews of data mining tools and surveys of data
miners. These identify some of the strengths and weaknesses of the software packages. They also provide
an overview of the behaviors, preferences and views of data miners. Some of these reports include:
2011 Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery[81]
Rexer Analytics Data Miner Surveys (2007–2013)[82]
Forrester Research 2010 Predictive Analytics and Data Mining Solutions report[83]
Gartner 2008 "Magic Quadrant" report[84]
Robert A. Nisbet's 2006 three-part series of articles "Data Mining Tools: Which One is Best For CRM?"[85]
Haughton et al.'s 2003 Review of Data Mining Software Packages in The American Statistician[86]
Goebel & Gruenwald 1999 "A Survey of Data Mining and Knowledge Discovery Software Tools" in SIGKDD Explorations[87]

See also
Methods
Anomaly/outlier/change detection
Association rule learning
Classification
Cluster analysis
Decision tree
Factor analysis
Genetic algorithms
Intention mining
Multilinear subspace learning
Neural networks
Regression analysis
Sequence mining
Structured data analysis
Support vector machines
Text mining
Application domains
Analytics
Bioinformatics
Business intelligence
Data analysis
Data warehouse
Decision support system
Drug discovery
Exploratory data analysis
Predictive analytics
Web mining
Application examples
See also Category: Applied data mining.
Customer analytics
Data mining in agriculture
Data mining in meteorology
Educational data mining
National Security Agency
Police-enforced ANPR in the UK
Quantitative structure-activity relationship
Surveillance / Mass surveillance (e.g., Stellar Wind)
Related topics
Data mining is about analyzing data; for information about extracting information out of data, see:
Data integration
Data transformation
Information extraction
Information integration
Named-entity recognition
Profiling (information science)
Web scraping

References
1. ^ a b c Fayyad, Usama; Piatetsky-Shapiro, Gregory; Smyth, Padhraic (1996). "From Data Mining to Knowledge Discovery in Databases" (http://www.kdnuggets.com/gpspubs/aimagkddoverview1996Fayyad.pdf). Retrieved 17 December 2008.
2. ^ a b c d "Data Mining Curriculum" (http://www.sigkdd.org/curriculum.php). ACM SIGKDD. 2006-04-30. Retrieved 2011-10-28.
3. ^ Clifton, Christopher (2010). "Encyclopædia Britannica: Definition of Data Mining" (http://www.britannica.com/EBchecked/topic/1056150/datamining). Retrieved 2010-12-09.
4. ^ Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2009). "The Elements of Statistical Learning: Data Mining, Inference, and Prediction" (http://wwwstat.stanford.edu/~tibs/ElemStatLearn/). Retrieved 2012-08-07.
5. ^ Han, Jiawei; Kamber, Micheline (2001). Data mining: concepts and techniques. Morgan Kaufmann. p. 5. ISBN 9781558604896. "Thus, data mining should have been more appropriately named "knowledge mining from data," which is unfortunately somewhat long"
6. ^ See e.g. OKAIRP 2005 Fall Conference, Arizona State University (http://www.okairp.org/documents/2005%20Fall/F05_ROMEDataQualityETC.pdf), About.com: Datamining (http://databases.about.com/od/datamining/a/datamining.htm)
7. ^ Witten, Ian H.; Frank, Eibe; Hall, Mark A. (30 January 2011). Data Mining: Practical Machine Learning Tools and Techniques (3 ed.). Elsevier. ISBN 9780123748560.
8. ^ Bouckaert, Remco R.; Frank, Eibe; Hall, Mark A.; Holmes, Geoffrey; Pfahringer, Bernhard; Reutemann, Peter; Witten, Ian H. (2010). "WEKA Experiences with a Java open-source project". Journal of Machine Learning Research 11: 2533–2541. "the original title, "Practical machine learning", was changed... The term "data mining" was [added] primarily for marketing reasons."
9. ^ Mena, Jesús (2011). Machine Learning Forensics for Law Enforcement, Security, and Intelligence. Boca Raton, FL: CRC Press (Taylor & Francis Group). ISBN 9781439860694.
10. ^ Piatetsky-Shapiro, Gregory; Parker, Gary (2011). "Lesson: Data Mining, and Knowledge Discovery: An Introduction" (http://www.kdnuggets.com/data_mining_course/x1introtodataminingnotes.html). Introduction to Data Mining. KD Nuggets. Retrieved 30 August 2012.
11. ^ Kantardzic, Mehmed (2003). Data Mining: Concepts, Models, Methods, and Algorithms. John Wiley & Sons. ISBN 0471228524. OCLC 50055336 (https://www.worldcat.org/oclc/50055336).
12. ^ "Microsoft Academic Search: Top conferences in data mining" (http://academic.research.microsoft.com/?SearchDomain=2&SubDomain=7&entitytype=2). Microsoft Academic Search.
13. ^ "Google Scholar: Top publications - Data Mining & Analysis" (http://scholar.google.de/citations?view_op=top_venues&hl=en&vq=eng_datamininganalysis). Google Scholar.
14. ^ Proceedings (http://www.kdd.org/conferences.php), International Conferences on Knowledge Discovery and Data Mining, ACM, New York.
15. ^ SIGKDD Explorations (http://www.kdd.org/explorations/about.php), ACM, New York.
16. ^ Gregory Piatetsky-Shapiro (2002) KDnuggets Methodology Poll (http://www.kdnuggets.com/polls/2002/methodology.htm)
17. ^ Gregory Piatetsky-Shapiro (2004) KDnuggets Methodology Poll (http://www.kdnuggets.com/polls/2004/data_mining_methodology.htm)
18. ^ Gregory Piatetsky-Shapiro (2007) KDnuggets Methodology Poll (http://www.kdnuggets.com/polls/2007/data_mining_methodology.htm)
19. ^ Óscar Marbán, Gonzalo Mariscal and Javier Segovia (2009) A Data Mining & Knowledge Discovery Process Model (http://cdn.intechopen.com/pdfs/5937/InTechA_data_mining_amp_knowledge_discovery_process_model.pdf). In Data Mining and Knowledge Discovery in Real Life Applications, book edited by Julio Ponce and Adem Karahoca, ISBN 9783902613530, pp. 438–453, February 2009, I-Tech, Vienna, Austria.
20. ^ Lukasz Kurgan and Petr Musilek (2006) A survey of Knowledge Discovery and Data Mining process models (http://journals.cambridge.org/action/displayAbstract?fromPage=online&aid=451120). The Knowledge Engineering Review. Volume 21 Issue 1, March 2006, pp. 1–24, Cambridge University Press, New York, NY, USA. doi:10.1017/S0269888906000737 (http://dx.doi.org/10.1017%2FS0269888906000737)
21. ^ Azevedo, A. and Santos, M. F. KDD, SEMMA and CRISP-DM: a parallel overview (http://www.iadis.net/dl/final_uploads/200812P033.pdf). In Proceedings of the IADIS European Conference on Data Mining 2008, pp. 182–185.
22. ^ Günnemann, Stephan; Kremer, Hardy; Seidl, Thomas (2011). "An extension of the PMML standard to subspace clustering models". "Proceedings of the 2011 workshop on Predictive markup language modeling - PMML '11". p. 48. doi:10.1145/2023598.2023605 (http://dx.doi.org/10.1145%2F2023598.2023605). ISBN 9781450308373.
23. ^ O'Brien, J. A., & Marakas, G. M. (2011). Management Information Systems. New York, NY: McGraw-Hill/Irwin.
24. ^ Alexander, D. (n.d.). Data Mining. Retrieved from The University of Texas at Austin: College of Liberal Arts: http://www.laits.utexas.edu/~anorman/BUS.FOR/course.mat/Alex/
25. ^ Goss, S. (2013, April 10). Data mining and our personal privacy. Retrieved from The Telegraph: http://www.macon.com/2013/04/10/2429775/dataminingandourpersonalprivacy.html
26. ^ Monk, Ellen; Wagner, Bret (2006). Concepts in Enterprise Resource Planning, Second Edition. Boston, MA: Thomson Course Technology. ISBN 0619216638. OCLC 224465825 (https://www.worldcat.org/oclc/224465825).
27. ^ a b c Elovici, Yuval; Braha, Dan (2003). "A Decision-Theoretic Approach to Data Mining" (http://necsi.edu/affiliates/braha/IEEE_Decision_Theoretic.pdf) (PDF). IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans 33 (1).
28. ^ Battiti, Roberto and Brunato, Mauro. Reactive Business Intelligence. From Data to Models to Insight (http://www.reactivebusinessintelligence.com/), Reactive Search Srl, Italy, February 2011. ISBN 9788890579509.
29. ^ Battiti, Roberto; Passerini, Andrea (2010). "Brain-Computer Evolutionary Multi-Objective Optimization (BC-EMO): a genetic algorithm adapting to the decision maker" (http://rtm.science.unitn.it/~battiti/archive/bcemo.pdf). IEEE Transactions on Evolutionary Computation 14 (15): 671–687. doi:10.1109/TEVC.2010.2058118 (http://dx.doi.org/10.1109%2FTEVC.2010.2058118).
30. ^ Braha, Dan; Elovici, Yuval; Last, Mark (2007). "Theory of actionable data mining with application to semiconductor manufacturing control" (http://necsi.edu/affiliates/braha/TPRS_A_165421_O.pdf) (PDF). International Journal of Production Research 45 (13).
31. ^ Fountain, Tony; Dietterich, Thomas; and Sudyka, Bill (2000). Mining IC Test Data to Optimize VLSI Testing (http://web.engr.oregonstate.edu/~tgd/publications/kdd2000dlft.pdf), in Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, ACM Press, pp. 18–25
32. ^ Braha, Dan; Shmilovici, Armin (2002). "Data Mining for Improving a Cleaning Process in the Semiconductor Industry" (http://necsi.edu/affiliates/braha/IEEECleaning_02.pdf) (PDF). IEEE Transactions on Semiconductor Manufacturing 15 (1).
33. ^ Braha, Dan; Shmilovici, Armin (2003). "On the Use of Decision Tree Induction for Discovery of Interactions in a Photolithographic Process" (http://necsi.edu/affiliates/braha/IEEE_Decision_Trees.pdf) (PDF). IEEE Transactions on Semiconductor Manufacturing 16 (4).
34. ^ Zhu, Xingquan; Davidson, Ian (2007). Knowledge Discovery and Data Mining: Challenges and Realities. New York, NY: Hershey. p. 18. ISBN 9781599042527.
35. ^ a b McGrail, Anthony J.; Gulski, Edward; Allan, David; Birtwhistle, David; Blackburn, Trevor R.; Groot, Edwin R. S. "Data Mining Techniques to Assess the Condition of High Voltage Electrical Plant". CIGRÉ WG 15.11 of Study Committee 15.
36. ^ Baker, Ryan S. J. d. "Is Gaming the System State-or-Trait? Educational Data Mining Through the Multi-Contextual Application of a Validated Behavioral Model". Workshop on Data Mining for User Modeling 2007.
37. ^ Superby Aguirre, Juan Francisco; Vandamme, Jean-Philippe; Meskens, Nadine. "Determination of factors influencing the achievement of the first-year university students using data mining methods". Workshop on Educational Data Mining 2006.
38. ^ Zhu, Xingquan; Davidson, Ian (2007). Knowledge Discovery and Data Mining: Challenges and Realities. New York, NY: Hershey. pp. 163–189. ISBN 9781599042527.
39. ^ Zhu, Xingquan; Davidson, Ian (2007). Knowledge Discovery and Data Mining: Challenges and Realities. New York, NY: Hershey. pp. 31–48. ISBN 9781599042527.
40. ^ Chen, Yudong; Zhang, Yi; Hu, Jianming; Li, Xiang (2006). "Traffic Data Analysis Using Kernel PCA and Self-Organizing Map". IEEE Intelligent Vehicles Symposium.
41. ^ Bate, Andrew; Lindquist, Marie; Edwards, I. Ralph; Olsson, Sten; Orre, Roland; Lansner, Anders; and de Freitas, Rogelio Melhado. A Bayesian neural network method for adverse drug reaction signal generation (http://dml.cs.byu.edu/~cgc/docs/atdm/W11/BCPNNADR.pdf), European Journal of Clinical Pharmacology 1998 Jun; 54(4): 315–21. PubMed (https://www.ncbi.nlm.nih.gov/pubmed/9696956)
42. ^ Norén, G. Niklas; Bate, Andrew; Hopstadius, Johan; Star, Kristina; and Edwards, I. Ralph (2008). Temporal Pattern Discovery for Trends and Transient Effects: Its Application to Patient Records. Proceedings of the Fourteenth International Conference on Knowledge Discovery and Data Mining (SIGKDD 2008), Las Vegas, NV, pp. 963–971.
43. ^ Zernik, Joseph. Data Mining as a Civic Duty - Online Public Prisoners' Registration Systems (http://www.scribd.com/doc/38328591/), International Journal on Social Media: Monitoring, Measurement, Mining, 1: 84–96 (2010)
44. ^ Zernik, Joseph. Data Mining of Online Judicial Records of the Networked US Federal Courts (http://www.scribd.com/doc/38328585/), International Journal on Social Media: Monitoring, Measurement, Mining, 1: 69–83 (2010)
45. ^ David G. Savage (2011-06-24). "Pharmaceutical industry: Supreme Court sides with pharmaceutical industry in two decisions" (http://articles.latimes.com/2011/jun/24/nation/lanacourtdrugs20110624). Los Angeles Times. Retrieved 2012-11-07.
46. ^ a b c Analyzing Medical Data. (2012). Communications of the ACM 55(6), 13–15. doi:10.1145/2184319.2184324 (http://dx.doi.org/10.1145%2F2184319.2184324)
47. ^ http://searchhealthit.techtarget.com/definition/HITECHAct
48. ^ Healey, Richard G. (1991). Database Management Systems, in Maguire, David J.; Goodchild, Michael F.; and Rhind, David W. (eds.), Geographic Information Systems: Principles and Applications, London, GB: Longman
49. ^ Camara, Antonio S. and Raper, Jonathan (eds.) (1999). Spatial Multimedia and Virtual Reality, London, GB: Taylor and Francis
50. ^ Miller, Harvey J. and Han, Jiawei (eds.) (2001). Geographic Data Mining and Knowledge Discovery, London, GB: Taylor & Francis
51. ^ Ma, Y.; Richards, M.; Ghanem, M.; Guo, Y.; Hassard, J. (2008). "Air Pollution Monitoring and Mining Based on Sensor Grid in London". Sensors 8 (6): 3601. doi:10.3390/s8063601 (http://dx.doi.org/10.3390%2Fs8063601).
52. ^ Ma, Y.; Guo, Y.; Tian, X.; Ghanem, M. (2011). "Distributed Clustering-Based Aggregation Algorithm for Spatial Correlated Sensor Networks". IEEE Sensors Journal 11 (3): 641. doi:10.1109/JSEN.2010.2056916 (http://dx.doi.org/10.1109%2FJSEN.2010.2056916).
53. ^ Zhao, Kaidi; Liu, Bing; Tirpark, Thomas M.; and Weimin, Xiao. A Visual Data Mining Framework for Convenient Identification of Useful Knowledge (http://dl.acm.org/citation.cfm?id=1106390)
54. ^ Keim, Daniel A. Information Visualization and Visual Data Mining (http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.135.7051)
55. ^ Burch, Michael; Diehl, Stephan; Weißgerber, Peter. Visual Data Mining in Software Archives (http://dl.acm.org/citation.cfm?doid=1056018.1056024)
56. ^ Pachet, François; Westermann, Gert; and Laigre, Damien. Musical Data Mining for Electronic Music Distribution (http://www.csl.sony.fr/downloads/papers/2001/pachet01c.pdf), Proceedings of the 1st WedelMusic Conference, Firenze, Italy, 2001, pp. 101–106.
57. ^ Government Accountability Office, Data Mining: Early Attention to Privacy in Developing a Key DHS Program Could Reduce Risks, GAO-07-293 (February 2007), Washington, DC
58. ^ Secure Flight Program report (http://www.msnbc.msn.com/id/20604775/), MSNBC
59. ^ "Total/Terrorism Information Awareness (TIA): Is It Truly Dead?" (http://w2.eff.org/Privacy/TIA/20031003_comments.php). Electronic Frontier Foundation (official website). 2003. Retrieved 2009-03-15.
60. ^ Agrawal, Rakesh; Mannila, Heikki; Srikant, Ramakrishnan; Toivonen, Hannu; and Verkamo, A. Inkeri. Fast discovery of association rules, in Advances in knowledge discovery and data mining, MIT Press, 1996, pp. 307–328
61. ^ a b National Research Council, Protecting Individual Privacy in the Struggle Against Terrorists: A Framework for Program Assessment, Washington, DC: National Academies Press, 2008
62. ^ Haag, Stephen; Cummings, Maeve; Phillips, Amy (2006). Management Information Systems for the information age. Toronto: McGraw-Hill Ryerson. p. 28. ISBN 0070955697. OCLC 63194770 (https://www.worldcat.org/oclc/63194770).
63. ^ Ghanem, Moustafa; Guo, Yike; Rowe, Anthony; Wendel, Patrick (2002). "Grid-based knowledge discovery services for high throughput informatics". "Proceedings 11th IEEE International Symposium on High Performance Distributed Computing". p. 416. doi:10.1109/HPDC.2002.1029946 (http://dx.doi.org/10.1109%2FHPDC.2002.1029946). ISBN 0769516866.
64. ^ Ghanem, Moustafa; Curcin, Vasa; Wendel, Patrick; Guo, Yike (2009). "Building and Using Analytical Workflows in Discovery Net". "Data Mining Techniques in Grid Computing Environments". p. 119. doi:10.1002/9780470699904.ch8 (http://dx.doi.org/10.1002%2F9780470699904.ch8). ISBN 9780470699904.
65. ^ Cannataro, Mario; Talia, Domenico (January 2003). "The Knowledge Grid: An Architecture for Distributed Knowledge Discovery" (http://grid.deis.unical.it/papers/pdf/CACM2003.pdf). Communications of the ACM 46 (1): 89–93. doi:10.1145/602421.602425 (http://dx.doi.org/10.1145%2F602421.602425). Retrieved 17 October 2011.
66. ^ Talia, Domenico; Trunfio, Paolo (July 2010). "How distributed data mining tasks can thrive as knowledge services" (http://grid.deis.unical.it/papers/pdf/CACM2010.pdf). Communications of the ACM 53 (7): 132–137. doi:10.1145/1785414.1785451 (http://dx.doi.org/10.1145%2F1785414.1785451). Retrieved 17 October 2011.
67. ^ Seltzer, William. "The Promise and Pitfalls of Data Mining: Ethical Issues" (http://www.amstat.org/committees/ethics/linksdir/Jsm2005Seltzer.pdf).
68. ^ Pitts, Chip (15 March 2007). "The End of Illegal Domestic Spying? Don't Count on It" (http://www.washingtonspectator.com/articles/20070315surveillance_1.cfm). Washington Spectator.
69. ^ Taipale, Kim A. (15 December 2003). "Data Mining and Domestic Security: Connecting the Dots to Make Sense of Data" (http://www.stlr.org/cite.cgi?volume=5&article=2). Columbia Science and Technology Law Review 5 (2). OCLC 45263753 (https://www.worldcat.org/oclc/45263753). SSRN 546782 (https://ssrn.com/abstract=546782).
70. ^ Resig, John and Teredesai, Ankur (2004). "A Framework for Mining Instant Messaging Services" (http://citeseer.ist.psu.edu/resig04framework.html). Proceedings of the 2004 SIAM DM Conference.
71. ^ a b c Think Before You Dig: Privacy Implications of Data Mining & Aggregation (http://www.nascio.org/publications/documents/NASCIOdataMining.pdf), NASCIO Research Brief, September 2004
72. ^ Ohm, Paul. "Don't Build a Database of Ruin" (http://blogs.hbr.org/cs/2012/08/dont_build_a_database_of_ruin.html). Harvard Business Review.
73. ^ Darwin Bond-Graham, Iron Cagebook - The Logical End of Facebook's Patents (http://www.counterpunch.org/2013/12/03/ironcagebook/), Counterpunch.org, 2013.12.03
74. ^ Darwin Bond-Graham, Inside the Tech industry's Startup Conference (http://www.counterpunch.org/2013/09/11/insidethetechindustrysstartupconference/), Counterpunch.org, 2013.09.11
75. ^ AOL search data identified individuals (http://www.securityfocus.com/brief/277), SecurityFocus, August 2006
76. ^ Biotech Business Week Editors (June 30, 2008). BIOMEDICINE; HIPAA Privacy Rule Impedes Biomedical Research, Biotech Business Week, retrieved 17 November 2009 from LexisNexis Academic
77. ^ Researchers Given Data Mining Right Under New UK Copyright Laws. (http://www.outlaw.com/en/articles/2014/june/researchersgivendataminingrightundernewukcopyrightlaws/%7CUK) Out-Law.com. Retrieved 14 November 2014
78. ^ "Licences for Europe - Structured Stakeholder Dialogue 2013" (http://ec.europa.eu/licencesforeuropedialogue/en/content/aboutsite). European Commission. Retrieved 14 November 2014.
79. ^ "Text and Data Mining: Its importance and the need for change in Europe" (http://libereurope.eu/news/textanddataminingitsimportanceandtheneedforchangeineurope/). Association of European Research Libraries. Retrieved 14 November 2014.
80. ^ "Judge grants summary judgment in favor of Google Books - a fair use victory" (http://www.lexology.com/library/detail.aspx?g=a18c5b925a204d1da098a3095046a88e). Lexology.com. Antonelli Law Ltd. Retrieved 14 November 2014.
81. ^ Mikut, Ralf; Reischl, Markus (September–October 2011). "Data Mining Tools" (http://onlinelibrary.wiley.com/doi/10.1002/widm.24/abstract). Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 1 (5): 431–445. doi:10.1002/widm.24 (http://dx.doi.org/10.1002%2Fwidm.24). Retrieved October 21, 2011.
82. ^ Karl Rexer, Heather Allen, & Paul Gearan (2011). Understanding Data Miners (http://www.analyticsmagazine.org/mayjune2011/320understandingdataminers), Analytics Magazine, May/June 2011 (INFORMS: Institute for Operations Research and the Management Sciences).
83. ^ Kobielus, James. The Forrester Wave: Predictive Analytics and Data Mining Solutions, Q1 2010 (http://www.forrester.com/rb/Research/wave%26trade%3B_predictive_analytics_and_data_mining_solutions%2C/q/id/56077/t/2), Forrester Research, 1 July 2008
84. ^ Herschel, Gareth. Magic Quadrant for Customer Data-Mining Applications (http://mediaproducts.gartner.com/reprints/sas/vol5/article3/article3.html), Gartner Inc., 1 July 2008
85. ^ Nisbet, Robert A. (2006). Data Mining Tools: Which One is Best for CRM? Part 1 (http://www.informationmanagement.com/specialreports/20060124/10460251.html), Information Management Special Reports, January 2006
86. ^ Haughton, Dominique; Deichmann, Joel; Eshghi, Abdolreza; Sayek, Selin; Teebagy, Nicholas; and Topi, Heikki (2003). A Review of Software Packages for Data Mining (http://www.jstor.org/pss/30037299), The American Statistician, Vol. 57, No. 4, pp. 290–309
87. ^ Goebel, Michael; Gruenwald, Le (1999). A Survey of Data Mining and Knowledge Discovery Software Tools (https://wwwmatthes.in.tum.de/file/1klx69ggd5riv/Enterprise%202.0%20Tool%20Survey/Paper/A%20survey%20of%20data%20mining%20and%20knowledge%20discovery%20software%20tools.pdf), SIGKDD Explorations, Vol. 1, Issue 1, pp. 20–33

Further reading
Cabena, Peter; Hadjnian, Pablo; Stadler, Rolf; Verhees, Jaap and Zanasi, Alessandro (1997). Discovering Data Mining: From Concept to Implementation, Prentice Hall, ISBN 0137439806
M.S. Chen, J. Han, P.S. Yu (1996). "Data mining: an overview from a database perspective (http://cs.nju.edu.cn/zhouzh/zhouzh.files/course/dm/reading/reading01/chen_tkde96.pdf)". Knowledge and Data Engineering, IEEE Transactions on 8 (6), 866–883
Feldman, Ronen and Sanger, James. The Text Mining Handbook, Cambridge University Press, ISBN 9780521836579
Guo, Yike and Grossman, Robert (editors) (1999). High Performance Data Mining: Scaling Algorithms, Applications and Systems, Kluwer Academic Publishers
Han, Jiawei, Micheline Kamber, and Jian Pei. Data mining: concepts and techniques. Morgan Kaufmann, 2006.
Hastie, Trevor, Tibshirani, Robert and Friedman, Jerome (2001). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, ISBN 0387952845
Liu, Bing (2007). Web Data Mining: Exploring Hyperlinks, Contents and Usage Data, Springer, ISBN 3540378812
Murphy, Chris (16 May 2011). "Is Data Mining Free Speech?". InformationWeek (UMB): 12.
Nisbet, Robert; Elder, John; Miner, Gary (2009). Handbook of Statistical Analysis & Data Mining Applications, Academic Press/Elsevier, ISBN 9780123747655
Poncelet, Pascal; Masseglia, Florent and Teisseire, Maguelonne (editors) (October 2007). "Data Mining Patterns: New Methods and Applications", Information Science Reference, ISBN 9781599041629
Tan, Pang-Ning; Steinbach, Michael and Kumar, Vipin (2005). Introduction to Data Mining, ISBN 0321321367
Theodoridis, Sergios and Koutroumbas, Konstantinos (2009). Pattern Recognition, 4th Edition, Academic Press, ISBN 9781597492720
Weiss, Sholom M. and Indurkhya, Nitin (1998). Predictive Data Mining, Morgan Kaufmann
Witten, Ian H.; Frank, Eibe; Hall, Mark A. (30 January 2011). Data Mining: Practical Machine Learning Tools and Techniques (3 ed.). Elsevier. ISBN 9780123748560. (See also Free Weka software)
Ye, Nong (2003). The Handbook of Data Mining, Mahwah, NJ: Lawrence Erlbaum

External links
Wikimedia Commons has media related to Data mining.

Retrieved from "http://en.wikipedia.org/w/index.php?title=Data_mining&oldid=634499498"

Categories: Data mining, Data analysis, Formal sciences

This page was last modified on 19 November 2014 at 07:08.
Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may
apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia is a
registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.
