Business Intelligence

Symbiosis Institute of Business Management
Gayathri T - 12137
Answer any four questions briefly.

1. Compare and contrast the process of BA and BI.

Business intelligence (BI) refers to computer-based techniques used in
spotting, digging out, and analyzing business data, such as sales revenue by
products and/or departments, or by associated costs and incomes.
BI technologies provide historical, current, and predictive views of business
operations. Common functions of business intelligence technologies are
reporting, online analytical processing, analytics, data mining, business
performance management, benchmarking, text mining, and predictive
analytics.
Business intelligence aims to support better business decision-making. Thus
a BI system can be called a decision support system (DSS). Though the term
business intelligence is sometimes used as a synonym for competitive
intelligence, because they both support decision making, BI uses
technologies, processes, and applications to analyze mostly internal,
structured data and business processes while competitive intelligence
gathers, analyzes and disseminates information with a topical focus on
company competitors. Business intelligence understood broadly can include
the subset of competitive intelligence.

However, there is a very fine line between BA and BI, much like the difference
between data and information: data is the raw material, and information is the
structured form of that raw material. Similarly, business analytics is what you
use to measure past performance, such as sales figures and their deviations, or
the segmentation of market data for better use. BI, by contrast, is where you
put the results obtained from your analytics system to use.
Analyzing past data is therefore business analytics, and using data for future
predictions is business intelligence.

2. A fresh MBA graduate who analyzed the business sales transactions of a large
business house wrote the following in her report:
"Our sales were $25.5 million."

In your view, what more should she have written about it?

In my view, she should also have mentioned the following points:

1. Net profit
2. Expenses as a percentage of sales
3. Net sales (Sales - Returns)
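
The three measures listed above can be sketched as simple arithmetic. This is only an illustration: the gross sales figure comes from the report, while the returns and expenses figures are hypothetical, and net profit is simplified to net sales minus expenses (taxes and other items are ignored).

```python
def net_sales(gross_sales, returns):
    """Net sales = gross sales minus returns."""
    return gross_sales - returns


def expense_ratio(expenses, sales):
    """Expenses expressed as a percentage of sales."""
    return 100.0 * expenses / sales


def net_profit(sales, expenses):
    """Simplified net profit: sales minus expenses (assumption: no taxes)."""
    return sales - expenses


gross = 25_500_000      # from the report: $25.5 million
returns = 500_000       # hypothetical figure, for illustration only
expenses = 20_000_000   # hypothetical figure, for illustration only

net = net_sales(gross, returns)
print("Net sales:", net)
print("Expenses as % of net sales:", expense_ratio(expenses, net))
print("Net profit:", net_profit(net, expenses))
```

Reporting these alongside the headline figure shows how much of the $25.5 million is actually retained.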

1. Explain the purpose of Data cleaning. What are the tools that you need to
apply for this purpose?

Data cleansing (also known as data scrubbing) is the process of correcting and,
if necessary, eliminating inaccurate records from a particular database. The
purpose of data cleansing is to detect so-called dirty data (incorrect,
irrelevant or incomplete parts of the data) and to either modify or delete it, to
ensure that a given set of data is accurate and consistent with other sets in the
system. This procedure can be performed both within a single set and between
multiple sets of data, manually (where possible in simple cases) or
automatically (in complex operations).

Manual data cleansing is usually done by people who read through a set of
records to verify their accuracy, correct spelling errors and complete missing
entries. During this operation, unnecessary or unwanted data is also removed
in order to increase the efficiency of data processing.
When data arrives from several sources, the following problems can occur:

1. Missing information for a column from one of the data sources;
2. Inconsistent information among different data sources;
3. Orphan records;
4. Outlier data points;
5. Different data types for the same information among various data sources,
leading to improper conversion;
6. Data breaching business rules.

To ensure that the data warehouse is not contaminated by any of these
discrepancies, it is important to cleanse the data using a set of business rules
before it makes its way into the data warehouse.
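
A few of the checks listed above (missing values, orphan records, outliers) can be sketched in plain Python. This is a minimal sketch, not a full cleansing pipeline: the function names, the record layout and the z-score threshold are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def find_missing(rows, column):
    """Return the indices of rows where `column` is absent or empty."""
    return [i for i, row in enumerate(rows) if row.get(column) in (None, "")]


def find_orphans(child_rows, parent_keys, foreign_key):
    """Return child rows whose foreign key has no matching parent key."""
    return [row for row in child_rows if row.get(foreign_key) not in parent_keys]


def find_outliers(values, z=2.0):
    """Return values lying more than `z` standard deviations from the mean."""
    centre, spread = mean(values), pstdev(values)
    if spread == 0:
        return []
    return [v for v in values if abs(v - centre) > z * spread]


# Hypothetical records, for illustration only.
customers = {1, 2}
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 120.0},
    {"order_id": 11, "customer_id": 3, "amount": None},
]

print(find_missing(orders, "amount"))                  # rows lacking an amount
print(find_orphans(orders, customers, "customer_id"))  # orders with no customer
print(find_outliers([10, 11, 9, 10, 12, 10, 11, 9, 10, 200]))
```

Records flagged by such checks would then be corrected or rejected according to the business rules before loading.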
One tool we can apply in this case is the Missing Value Handling tool of XLMiner,
with which we can either delete such entries or substitute a value for the
missing values. We can also use normalization as part of data cleansing.
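
The two options for missing values (delete or substitute) and normalization can be sketched as follows. This is plain Python written for illustration and does not use or reproduce XLMiner; the function names are hypothetical, substitution by the mean is one assumed choice among several, and min-max scaling is one common form of normalization.

```python
from statistics import mean


def drop_missing(values):
    """Option 1: delete entries whose value is missing (None)."""
    return [v for v in values if v is not None]


def fill_missing(values, substitute=None):
    """Option 2: replace missing entries with a substitute value.

    If no substitute is given, the mean of the known values is used.
    """
    known = [v for v in values if v is not None]
    if substitute is None:
        substitute = mean(known)
    return [substitute if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range 0..1."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


raw = [10, None, 20, 30]  # hypothetical column with one missing entry
print(drop_missing(raw))
print(fill_missing(raw))
print(min_max_normalize(drop_missing(raw)))
```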
2. Bring out basic differences between Data warehouse and Data Mining. Give
suitable application from your area of specialization.

Data mining provides the enterprise with intelligence, and data warehousing
provides the enterprise with a memory.

Data warehousing is the process used to integrate and combine data from
multiple sources and formats into a single unified schema. It thus provides the
enterprise with a storage mechanism for its large volume of data. Data mining,
on the other hand, is the process of extracting interesting patterns and
knowledge from large volumes of data. We can therefore apply data mining
techniques to the data warehouse of an enterprise to discover useful patterns.

The primary differences between data mining and data warehousing are the
system design, the methodology used, and the purpose. Data mining is the use of
pattern recognition logic to identify trends within a sample data set and
extrapolate this information to the larger data pool. Data warehousing is the
process of extracting and storing data to allow easier reporting.
Data mining is intended for users who are statistically inclined. These analysts
look for patterns hidden in data, which they extract using statistical
models. Data miners formulate questions based primarily on the "law
of large numbers" to identify potentially useful relationships between data
elements, which can be profitable to companies.

Data warehouse users tend to be data experts who analyze by business
dimensions directly. Data warehousing analysts are concerned with what kinds of
purchases their customers make, and whether the analyst can help the customer by
improving the customer experience.

Thus, if we have data on the pressure readings of a boiler, and we analyze
those readings, identify a trend in them and use this logic to design a
pressure release valve, that is data mining. If, on the other hand, we just use
the data to identify the times at which the pressure was dangerously high for
reporting, that is data warehousing.
In short, if we do not carry out any further analysis on the extracted data, it
is data warehousing; otherwise it is data mining.
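
The boiler example can be sketched in code to make the contrast concrete. The readings and the pressure limit below are hypothetical, and a least-squares line is only one assumed way of identifying a trend: the first function merely reports from stored data (the warehousing side), while the second derives a pattern from it (the mining side).

```python
def dangerous_times(readings, limit):
    """Reporting: list the times at which pressure exceeded `limit`."""
    return [t for t, pressure in readings if pressure > limit]


def linear_trend(readings):
    """Pattern finding: least-squares slope and intercept of pressure vs time."""
    n = len(readings)
    mean_t = sum(t for t, _ in readings) / n
    mean_p = sum(p for _, p in readings) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in readings)
    sxy = sum((t - mean_t) * (p - mean_p) for t, p in readings)
    slope = sxy / sxx
    intercept = mean_p - slope * mean_t
    return slope, intercept


# Hypothetical (time, pressure) readings, for illustration only.
readings = [(0, 100.0), (1, 102.0), (2, 104.0), (3, 106.0), (4, 108.0)]

print(dangerous_times(readings, limit=105.0))  # report of high-pressure times
print(linear_trend(readings))                  # trend usable for valve design
```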

3. Explain the purpose of R-square and Average.


R-square is the degree to which a given regression model explains the variation
in the data, i.e. it is a measure of how well the model fits the data. It ranges
from 0 to 1, with 1 meaning that the model completely explains the variation in
the data.
The average of a variable is its mean value in a data set. The most common
method is the arithmetic mean, but there are other measures of central tendency,
such as the median.
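
Both measures can be computed directly from their definitions. The sketch below uses the usual formula R-square = 1 - SS_res / SS_tot; the sample values are hypothetical. The 0 to 1 range holds for a least-squares model with an intercept evaluated on the data it was fitted to.

```python
from statistics import mean, median


def r_squared(actual, predicted):
    """Share of the variation in `actual` explained by `predicted`."""
    mean_y = mean(actual)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1 - ss_res / ss_tot


actual = [1, 2, 3, 4]  # hypothetical observations
print(r_squared(actual, [1, 2, 3, 4]))          # perfect fit
print(r_squared(actual, [2.5, 2.5, 2.5, 2.5]))  # no better than the mean

values = [1, 2, 3, 100]  # hypothetical data with one extreme value
print(mean(values), median(values))
```

The last line shows why the choice of average matters: a single extreme value pulls the arithmetic mean far more than the median.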

Section 2 [4 x 2.5=10]

Write the interpretations of the output given below:

1.

The chart depicts the data exploration activity in BI, in which we explore the
data at hand. The histogram shows how frequently the variable falls in each
range of values. The mean value centers around 25. We can clearly make out that
the histogram is right skewed (positively skewed), indicating that there are
more values in the lower range than in the higher range.

2.

The snapshot provided indicates the adjusted R2 of the models built from
subsets with different numbers of predictors; a high adjusted R2 indicates a
better fit. In the above case the subsets with 7 and 11 coefficients have the
highest adjusted R2, but the subset with 7 coefficients should be chosen, since
with comparable fit a model with fewer coefficients is simpler and less prone
to multicollinearity.
Multicollinearity refers to redundancy among the predictors, which in turn
leads to more error in the estimates. Hence multicollinearity should be avoided
wherever possible.
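
The preference for the smaller subset can be illustrated with the standard adjusted R2 formula, 1 - (1 - R2)(n - 1)/(n - k - 1), where n is the number of observations and k the number of predictors. The R2 value and sample size below are hypothetical and are not taken from the snapshot.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R2: R2 penalized for the number of predictors used."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical: the same raw R2 achieved with 7 and with 11 predictors.
print(adjusted_r2(0.9, n_obs=101, n_predictors=7))
print(adjusted_r2(0.9, n_obs=101, n_predictors=11))
```

For the same raw R2, the model with fewer predictors scores higher, which is the sense in which adjusted R2 rewards parsimony.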

3.

Rule #  Conf. %  Antecedent (a)  Consequent (c)  Support(a)  Support(c)  Support(a U c)  Lift Ratio
1       100      Green           Red, White      2           4           2               2.5
2       100      Green           Red             2           6           2               1.666667
3       100      Green, White    Red             2           6           2               1.666667
4       100      Green           White           2           7           2               1.428571
5       100      Green, Red      White           2           7           2               1.428571
6       100      Orange          White           2           7           2               1.428571

The table lists the association rules and their measures. Taking Rule 1, a
person buying green will also buy red and white. The number of cases in the
dataset in which a person buys green is two. The number of cases with red
and white is 4, and the number with green, red and white together is two. The
lift ratio for this rule is 2.5. Lift is the ratio of confidence to expected
confidence.
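
The lift values in the table can be reproduced from the support counts. The total number of transactions is not shown in the output; the sketch below assumes it is 10, the value that makes all six lift ratios in the table come out as listed.

```python
def confidence(support_both, support_a):
    """Confidence = cases with antecedent and consequent / cases with antecedent."""
    return support_both / support_a


def lift(support_a, support_c, support_both, n_transactions):
    """Lift = confidence / expected confidence (share of cases with consequent)."""
    expected_confidence = support_c / n_transactions
    return confidence(support_both, support_a) / expected_confidence


N = 10  # assumption: inferred from the table, not stated in the output

print(lift(2, 4, 2, N))  # Rule 1: Green => Red, White
print(lift(2, 6, 2, N))  # Rules 2 and 3
print(lift(2, 7, 2, N))  # Rules 4 to 6
```

A lift above 1 means the consequent is more likely when the antecedent is present than it is in the dataset overall.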

4.
The regression model table indicates that there are 12 predictors used to gauge
whether a person should be given a loan or not. The p-value denotes the
statistical significance of a particular variable in the logistic regression
equation: the smaller the p-value, the stronger the evidence that the variable
matters. The conventional cutoff is 0.05 (0.5 is the usual cutoff on the
predicted probability for classifying a case, not a p-value threshold). Here two
variables, age and experience, have p-values above 0.5, so they are not
statistically significant and are the candidates to drop from the logistic
regression equation; the other variables should be retained if their p-values
fall below the cutoff.