
N P Singh

BI refers to the tools and processes used by an organization to gain
intelligent insight into its business through the use of a variety
of actionable reports.
Organizations have a lot of data flowing in and out of different
departments, disparate systems, and numerous business
processes. The process of storing, retrieving, cleansing, and
translating this data into meaningful information and
subsequently managing this information is called Enterprise
Information Management (EIM).
BI is a component of EIM which concerns itself with finding
meaning in this information through monitoring, analysis,
dissection (slicing and dicing), and other measurements.
BI components include Online Analytical Processing (OLAP), Ad Hoc
Querying/Reporting, Operational Reporting/Analytics,
Analytical Dashboards/Scorecards, Data Mining, Forecasting,
and Statistical Analysis.

Component: Operational Reporting/Analytics
Functionality: Canned and static reports geared toward reporting needs that change very infrequently, and analytic dashboards which employ visual alerts to present status updates and areas of concern for key performance indicators (KPIs).
Market-leading tools: Actuate, Information Builders, Business Objects

Component: Ad Hoc Querying/Reporting
Functionality: Managed queries which are drawn from an environment with a defined set of query options that can be executed.
Market-leading tools: Business Objects, Cognos

Component: Online Analytical Processing (OLAP)
Functionality: An approach to quickly provide answers to analytical queries that are multidimensional in nature. At the heart of OLAP are cubes, which are arrangements of data in arrays to allow fast analysis. OLAP can be used for data mining (see below).
Market-leading tools: Hyperion, Oracle, MicroStrategy, Cognos

Component: Dashboards
Functionality: Monitor critical business metrics, flag issues that need attention, and manage indicators in order to take action faster. Allows for tracking of performance through scorecards and collaboration with others to follow recommended actions to improve organizational performance.
Market-leading tools: Cognos, Business Objects, Hyperion

Component: Data Mining/Forecasting/Statistical Analysis
Functionality: The concept of predictive analytics, using unmanaged analytical models to exploit data and reveal future business performance, events, and customer behavior. It is the process of sorting through data to identify patterns and relationships.
Market-leading tools: SAS, IBM

Data mining is the process of discovering meaningful new correlations, patterns, and trends
by sifting through large amounts of data stored in repositories, using pattern recognition
technologies as well as statistical and mathematical techniques. Data mining can perform the following tasks:

Task: Explanation
Description: Describe patterns and trends lying within data. High-quality description can often be accomplished by
exploratory data analysis, a graphical method of exploring data in search of patterns and trends.

Task: Classification
Description: In classification, there is a target categorical variable, such as income bracket, which, for example, could be
partitioned into three classes or categories: high income, middle income, and low income. The data mining
model examines a large set of records, each record containing information on the target variable as well as
a set of input or predictor variables.

Task: Estimation
Description: Estimation is similar to classification except that the target variable is numerical rather than categorical.
For example, estimating the amount of money a randomly chosen family of four will spend on back-to-school
shopping this fall.

Task: Prediction
Description: Prediction is similar to classification and estimation, except that for prediction, the results lie in the future.

Task: Clustering
Description: Clustering refers to the grouping of records, observations, or cases into classes of similar objects. A cluster
is a collection of records that are similar to one another, and dissimilar to records in other clusters.
Clustering differs from classification in that there is no target variable for clustering.

Task: Association
Description: The association task for data mining is the job of finding which attributes go together. Most prevalent in
the business world, where it is known as affinity analysis or market basket analysis, the task of association
seeks to uncover rules for quantifying the relationship between two or more attributes. For example, examining
the proportion of children whose parents read to them who are themselves good readers.
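
The association task can be made concrete with the two measures most commonly used to quantify a rule, support and confidence. The sketch below is only an illustration: the baskets and item names are invented, and the helper names `support` and `confidence` are assumptions, not part of any particular tool.

```python
# Hypothetical market-basket data; the items are made up for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Estimated share of baskets with `antecedent` that also hold `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(both, baskets) / support(antecedent, baskets)

# Rule under test: "customers who buy bread also buy milk".
bread_milk_support = support({"bread", "milk"}, baskets)
bread_to_milk_conf = confidence({"bread"}, {"milk"}, baskets)
```

Here two of the four baskets contain both items, and two of the three bread baskets also contain milk, which is the kind of proportion the reading example above describes.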

Data analytics (DA) is the science of examining raw data
with the purpose of drawing conclusions about that
information.
Data analytics is used in many industries to allow
companies and organizations to make better business
decisions and in the sciences to verify or disprove existing
models or theories.
Data analytics is distinguished from data mining by the
scope, purpose and focus of the analysis.
Data miners sort through huge data sets using
sophisticated software to identify undiscovered patterns
and establish hidden relationships.
Data analytics focuses on inference, the process of
deriving a conclusion based solely on what is already
known by the researcher.

The term "analytics" has been used by many business
intelligence (BI) software vendors as a buzzword to describe
quite different functions.
Data analytics is used to describe everything from online
analytical processing (OLAP) to CRM analytics in call centers.
Banks and credit card companies, for instance, analyze
withdrawal and spending patterns to prevent fraud or identity
theft.
E-commerce companies examine website traffic or navigation
patterns to determine which customers are more or less likely
to buy a product or service based upon prior purchases or
viewing trends.
Modern data analytics often uses information dashboards
supported by real-time data streams.
So-called real-time analytics involves dynamic analysis and
reporting, based on data entered into a system less than one
minute before the actual time of use.

Analytics: the science of analysis, using data,
statistical and quantitative analysis,
explanatory and predictive modeling, and
fact-based decision-making.
It is a subset of business intelligence (BI).
Data analytics refers to the process of
organizing and analyzing that data.

Data analytics (DA) involves processes and
activities designed to obtain and evaluate data to
extract useful information.
The results of DA may be used to identify areas of
key risk, fraud, errors or misuse; improve business
efficiencies; verify process effectiveness; and
influence business decisions.
There are many issues to consider when starting a
new DA program, including maximizing the return
on investment (ROI), complying with project
budgets, managing false positives, and ensuring
the protection and confidentiality of the source
data and results.

Once ad hoc DA methods have been employed
and the user understands the basic business rules
within the data, more sophisticated statistical
techniques can be used to uncover more
complex business rules and identify, for further
review, transactions that do not follow these
rules.
One development methodology in common use
is the Cross-Industry Standard Process for Data
Mining (CRISP-DM) reference model.

Data mining is the analysis of large data sets
by a computer program to identify patterns
(business rules) that exist within the data.
This information is then used to flag records that
have a low probability of matching those
rules.
Data mining can be used by the business for root
cause analysis and to identify exceptions in
existing data for correction purposes.
Organizations can use the analysis to validate
business rules, examine data quality and identify
outlying transactions for follow-up.

Predictive DA involves the analysis of large
data sets for the purpose of predicting
future activity patterns based on past
transactions.
Fuzzy logic matching of data helps to
identify potentially fraudulent transactions
or duplicate records for correction
purposes.

Duplicate transactions
Exact duplicates: All fields are identical within a date range.
Fuzzy duplicates: Some fields are identical, with one or more fields that are similar or different.
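
As a rough sketch of the two duplicate tests, the snippet below compares invoice records pairwise. The records, the field layout, and the 0.9 similarity threshold are all assumptions chosen for illustration; `difflib.SequenceMatcher` from the Python standard library stands in for whatever fuzzy-matching engine a real DA tool would use, and the date-range restriction is omitted for brevity.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical invoice records: (vendor, invoice_no, amount)
records = [
    ("Acme Supplies", "INV-1001", 500.00),
    ("Acme Supplies", "INV-1001", 500.00),  # exact duplicate of record 0
    ("Acme Suplies", "INV-1001", 500.00),   # misspelled vendor: fuzzy duplicate
    ("Globex", "INV-2000", 75.25),
]

def similarity(a, b):
    """Case-insensitive similarity ratio between two strings, 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def classify_pair(r1, r2, threshold=0.9):
    """Return 'exact', 'fuzzy', or None for a pair of records."""
    if r1 == r2:
        return "exact"
    same_invoice_and_amount = r1[1] == r2[1] and r1[2] == r2[2]
    if same_invoice_and_amount and similarity(r1[0], r2[0]) >= threshold:
        return "fuzzy"
    return None

# Map each flagged pair of record indexes to its duplicate type.
flags = {}
for (i, r1), (j, r2) in combinations(enumerate(records), 2):
    label = classify_pair(r1, r2)
    if label:
        flags[(i, j)] = label
```

Pairwise comparison is quadratic in the number of records, so on large files one would normally first group by an identical field (here the invoice number) and compare only within groups.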

Data quality
Fields where key data elements are missing or invalid are identified.
Date ranges fall outside of normal values.
There are sequence gaps in key fields, such as the check or payment number.

Transaction limits
Single and multiple accumulated values exceed limits.
Transaction amounts exceed, or are just below, the authorization limit.
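
The limit tests above might be sketched as follows. The authorization limit, the 5% "just below" margin, and the vendor data are hypothetical values chosen for illustration; real thresholds come from the organization's own approval policy.

```python
from collections import defaultdict

AUTH_LIMIT = 10_000.00  # hypothetical authorization limit
NEAR_MARGIN = 0.05      # assumed: "just below" means within 5% of the limit

def limit_flag(amount, limit=AUTH_LIMIT, margin=NEAR_MARGIN):
    """Classify a single amount as 'exceeds', 'just_below', or None."""
    if amount > limit:
        return "exceeds"
    if amount >= limit * (1 - margin):
        return "just_below"  # includes amounts exactly at the limit
    return None

def accumulated_over_limit(transactions, limit=AUTH_LIMIT):
    """Return vendors whose summed amounts exceed the limit."""
    totals = defaultdict(float)
    for vendor, amount in transactions:
        totals[vendor] += amount
    return {vendor: total for vendor, total in totals.items() if total > limit}

# Hypothetical transactions: (vendor, amount)
transactions = [
    ("Vendor A", 9_990.00),
    ("Vendor B", 4_200.00),
    ("Vendor C", 12_500.00),
    ("Vendor B", 6_100.00),
]
single_flags = [limit_flag(amount) for _, amount in transactions]
split_suspects = accumulated_over_limit(transactions)
```

Vendor B never breaches the limit on a single transaction but does so in total, which is the pattern the accumulated-value test is meant to surface.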

File matching
There is a two- or three-way match between related transactions.
Unmatched: Orphaned records occur between related files.

Character pattern matching
Prohibited key words
Prohibited vendors/employees: Percent of names matched against a list of restricted names:
  Matched to the US Office of Foreign Assets Control Specially Designated Nationals (OFAC SDN) list to identify terrorists: www.treasury.gov/resource-center/sanctions/SDNList/Pages/default.aspx
  Matched to the General Services Administration Excluded Parties List System (GSA EPLS) to identify parties that are excluded from receiving federal contracts: https://www.epls.gov/epls/search.do
  Matched to the Office of Inspector General List of Excluded Individuals/Entities (OIG LEIE) to identify individuals and organizations excluded from participation in federally funded healthcare programs: www.oig.hhs.gov/fraud/exclusions/exclusions_list.asp
Phonetic string match: The phonetic name is matched against the list of restricted names:
  SOUNDSLIKE algorithm of the New York State Identification and Intelligence System (NYSIIS) Code: www.dropby.com/NYSIIS.html
Fuzzy address match: A portion of address values are matched against the list of restricted addresses.

Segregation of duties (SoD)
Performed at the security table level to identify potential conflicts
Performed at the transaction level to identify violations that occurred

Aging
Single record age (number of days between Create Date and Approval Date)
Multiple files aging (Invoice Create Date prior to PO Create Date)
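
Both aging tests reduce to simple date arithmetic. A minimal sketch, assuming the dates are already parsed into `datetime.date` values; the function names and sample dates are hypothetical.

```python
from datetime import date

def age_in_days(create_date, approval_date):
    """Single-record age: days elapsed between creation and approval."""
    return (approval_date - create_date).days

def invoice_precedes_po(invoice_create_date, po_create_date):
    """Multiple-file aging: True when the invoice predates its purchase order."""
    return invoice_create_date < po_create_date

# Hypothetical sample dates for illustration.
record_age = age_in_days(date(2024, 1, 10), date(2024, 2, 9))
suspicious = invoice_precedes_po(date(2024, 3, 1), date(2024, 3, 5))
```

An invoice dated before its purchase order suggests the order may have been created after the fact, which is why the second test compares dates across two files.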

Numeric pattern matching
Benford analysis: Transaction amounts fail to follow expected digital frequencies.
Numeric sequence or gaps: Sequences of check numbers
Frequent transactions have even dollar amounts.
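
Benford analysis compares the observed share of each leading digit with the frequency Benford's law expects, log10(1 + 1/d) for digit d. The sketch below is a minimal illustration; the sample amounts are invented and far too few for a meaningful test, and the helper names are assumptions.

```python
import math
from collections import Counter

def benford_expected(digit):
    """Expected frequency of a leading digit (1-9) under Benford's law."""
    return math.log10(1 + 1 / digit)

def first_digit_frequencies(amounts):
    """Observed frequency of each leading digit, ignoring zero amounts."""
    # Scientific notation puts the leading significant digit first.
    digits = [int(f"{abs(a):e}"[0]) for a in amounts if a != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}

def benford_deviation(amounts):
    """Observed minus expected frequency for each leading digit."""
    observed = first_digit_frequencies(amounts)
    return {d: observed[d] - benford_expected(d) for d in range(1, 10)}

# Hypothetical transaction amounts for illustration only.
sample = [120.0, 1450.0, 18.5, 230.0, 3100.0, 99.0]
observed = first_digit_frequencies(sample)
deviation = benford_deviation(sample)
```

Large positive deviations for particular digits are a prompt for further review, not proof of manipulation; a formal check would apply a goodness-of-fit test to a much larger population.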

Date/time matching
Transaction dates occur on a weekend or holiday.
Transactions occur at odd hours.
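
The date/time tests can be sketched with the standard `datetime` module. The holiday list and the 08:00 to 18:00 business-hours window are assumptions for illustration; a real test would use the organization's own calendar.

```python
from datetime import date, datetime, time

HOLIDAYS = {date(2024, 1, 1)}                  # hypothetical holiday calendar
BUSINESS_HOURS = (time(8, 0), time(18, 0))     # assumed normal working hours

def on_weekend_or_holiday(timestamp, holidays=HOLIDAYS):
    """True if the transaction falls on Saturday, Sunday, or a listed holiday."""
    return timestamp.weekday() >= 5 or timestamp.date() in holidays

def at_odd_hours(timestamp, hours=BUSINESS_HOURS):
    """True if the transaction time is outside the business-hours window."""
    start, end = hours
    return not (start <= timestamp.time() < end)

# Hypothetical transaction timestamps.
saturday_afternoon = datetime(2024, 3, 9, 14, 0)
monday_late_night = datetime(2024, 3, 11, 23, 30)
```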


Variance tests
Comparison of the number of and amount of variances to a yearly average:
Is there a product price variance spike?
Is there an excessive spike in vendor invoice counts?
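
One simple way to look for such spikes is to compare each month with the yearly average. This is a sketch under assumptions: the monthly invoice counts are invented, and the rule "more than twice the yearly average" is an arbitrary illustrative threshold, not a standard.

```python
from statistics import mean

def find_spikes(monthly_values, factor=2.0):
    """Return indexes of months whose value exceeds `factor` times the yearly average."""
    yearly_average = mean(monthly_values)
    return [i for i, value in enumerate(monthly_values)
            if value > factor * yearly_average]

# Hypothetical vendor invoice counts for twelve months (index 0 = January).
invoice_counts = [40, 42, 38, 41, 39, 120, 40, 43, 37, 41, 40, 39]
spike_months = find_spikes(invoice_counts)
```

Because the spike itself inflates the average, a more robust variant might compare against the median or against the average of the other months.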

Technology
Cost of storage and computing power has decreased exponentially

Data
Third-party data is becoming increasingly available
Companies are learning to do more with their internal data

Proliferation of analytical techniques and tools
Great analytic ideas keep coming from statistics, economics, machine learning, marketing
Free tools like R

Everyone is entitled to their own opinions,
but not their own facts.
Daniel Patrick Moynihan, US Senator

Analytics is all about making consistent
strategic business decisions based on
facts, not just subjective judgments or
opinions.

Big data analytics is the process of
examining large amounts of different data
types, or big data, in an effort to uncover
hidden patterns, unknown correlations and
other useful information.

Ingredients
Wildly increasing loads of data
Cultural shifts as organizations learn to appreciate, embrace and integrate predictive technology
Improved software solutions

What is predicted?
The kind of behavior (i.e., action, event, or happening) to predict for each individual, stock, or other kind of element.

What is done about it?
The decision driven by prediction.

PA Application
Targeting Direct Marketing

What is predicted:
Which customers will respond to marketing contact.

What is done about it:
Contact customers more likely to respond

The Prediction Effect:
A little prediction goes a long way

Imagine you have a company with a mailing
list of a million customers.
It costs you Rs 2 to mail to each one.
Experience tells you that 1 out of 100 will buy
your product.
That means 10,000 responses.
You take your chance and mail to the entire list.
If your profit is Rs 220 for each positive response,
then
Overall profit = revenue - cost = (220 × 10,000) - (2 × 1,000,000)
= Rs 200,000.
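
The arithmetic above can be checked with a few lines of code. The baseline figures come from the slide; the targeted scenario (mailing only 250,000 customers and getting 7,500 responses) is a purely hypothetical input added to show how the same formula responds to better targeting, not a result from the source.

```python
def campaign_profit(mailed, responders, profit_per_response=220, cost_per_mail=2):
    """Overall profit in Rs: profit from responders minus total mailing cost."""
    return profit_per_response * responders - cost_per_mail * mailed

# Baseline from the slide: mail all 1,000,000 customers, 1 in 100 responds.
baseline = campaign_profit(mailed=1_000_000, responders=10_000)

# Hypothetical targeted mailing (assumed numbers, for illustration only).
targeted = campaign_profit(mailed=250_000, responders=7_500)
```

The baseline evaluates to Rs 200,000, matching the slide. Under the assumed targeted numbers the formula gives Rs 1,150,000, because mailing cost falls much faster than the number of responses.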

How will PA help you to increase your
profitability?

Three-step process
Characteristics of an individual
Predictive model
Predictive score

Predictive Model
A mechanism that predicts a behavior of an individual,
such as a click, buy, die, or lie. It takes characteristics
of the individual as input and provides a predictive score
as output. The higher the score, the more likely it is
that the individual will exhibit the predicted behavior.

If the individual
is still in high school
AND expects to graduate college within three years
AND indicates certain military interest
AND has not been shown this ad yet
THEN the probability of clicking on the ad for the Art Institute is 13.5%,
against the overall probability of 2.7%.

It is interesting to note that those who have indicated a
military interest are more likely to show interest in the
Art Institute.
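
The rule above can be written as a tiny predictive model: characteristics in, score out. This is only a sketch. The field names are hypothetical, and returning the overall 2.7% rate for everyone who does not match the rule is a simplifying assumption, since the slide gives no score for that group.

```python
from dataclasses import dataclass

OVERALL_CLICK_RATE = 0.027  # overall probability from the slide
RULE_CLICK_RATE = 0.135     # probability when the rule matches

@dataclass
class Individual:
    # Hypothetical field names for the characteristics named in the rule.
    in_high_school: bool
    expects_college_graduation_within_3_years: bool
    military_interest: bool
    already_shown_ad: bool

def click_score(person):
    """Predictive score: estimated probability of clicking the ad."""
    rule_matches = (
        person.in_high_school
        and person.expects_college_graduation_within_3_years
        and person.military_interest
        and not person.already_shown_ad
    )
    # Assumption: non-matching individuals get the overall rate.
    return RULE_CLICK_RATE if rule_matches else OVERALL_CLICK_RATE

matching = Individual(True, True, True, False)
already_seen = Individual(True, True, True, True)
lift = click_score(matching) / click_score(already_seen)
```

The ratio of the two scores, about 5, shows how much more likely the matching segment is to click than the population as a whole.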

This is a very simple model.
There is a need to compare various models to
get the most accurate prediction.
That requires advanced mathematics in the form of
complex algorithms.
This means that before using a model, you have to build
it.
Machine learning builds the predictive
models.

Machine learning is a subfield of computer
science and statistics that deals with the
construction and study of systems that can learn
from data, rather than follow only explicitly
programmed instructions.
Besides CS and Statistics, it has strong ties to
artificial intelligence and optimization, which
deliver both methods and theory to the field.
Machine learning is employed in a range of
computing tasks where designing and
programming explicit, rule-based algorithms is
infeasible.

Machine learning focuses on prediction,
based on known properties learned from the
training data.
Data mining focuses on the discovery of
(previously) unknown properties in the data.
This is the analysis step of Knowledge
Discovery in Databases.
