However, there is a very fine line of difference between BA and BI, much like the
difference between data and information: data consists of raw pieces, while
information is the structured form of those raw pieces. Similarly, business
analytics is what you use to measure past performance, such as sales figures and
their deviations, or the segmentation of market data for better use. BI, on the
other hand, is where you use the results obtained from your analytics system.
Analyzing past data is therefore business analytics, and using data for future
predictions is business intelligence.
In your opinion, what more should she have written about it?
In my opinion, she should also have mentioned the following points:
1. Net profit
2. Expenses as a percentage of sales
3. Net sales (Sales – Returns)
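The three metrics are simple arithmetic, so a short calculation makes them concrete. This is a minimal sketch with hypothetical figures, not data from the report. Two definitions are assumptions: "expenses as a percentage of sales" is taken against net sales, and net profit is simplified to net sales minus total expenses, with taxes and interest assumed to be included in expenses.

```python
# Hypothetical figures (not from the report), all in the same currency unit.
gross_sales = 120_000.0
returns = 20_000.0
expenses = 70_000.0


def net_sales(sales: float, returns: float) -> float:
    """Net sales = Sales - Returns (point 3 above)."""
    return sales - returns


def expenses_pct_of_sales(expenses: float, sales: float) -> float:
    """Expenses as a percentage of sales (assumed here: of net sales)."""
    return expenses / sales * 100.0


def net_profit(sales: float, expenses: float) -> float:
    """Simplified net profit: net sales minus total expenses (assumption)."""
    return sales - expenses


ns = net_sales(gross_sales, returns)
print(f"Net sales: {ns:,.0f}")
print(f"Expenses as % of sales: {expenses_pct_of_sales(expenses, ns):.1f}%")
print(f"Net profit: {net_profit(ns, expenses):,.0f}")
```

With these inputs the script reports net sales of 100,000, expenses at 70.0% of sales, and a net profit of 30,000.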
1. Explain the purpose of data cleaning. What tools do you need to apply for this
purpose?
Manual data cleansing is usually done by people who read through a set of records
to verify their accuracy, correct spelling errors and fill in missing entries.
During this operation, unnecessary or unwanted data is also removed to make data
processing more efficient.
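The same manual steps can be automated: verify each record, correct spelling, complete missing entries, and drop unwanted duplicates. The sketch below does this in plain Python on hypothetical records. The spelling lookup table and the "Unknown" placeholder for missing values are assumptions made for illustration.

```python
from typing import Optional

# Hypothetical raw records: one misspelling, one missing entry, one duplicate.
raw_records = [
    {"id": 1, "name": "Asha", "city": "Mumbai"},
    {"id": 2, "name": "Ravi", "city": "Mumbia"},   # spelling error
    {"id": 3, "name": "Meena", "city": None},      # missing entry
    {"id": 1, "name": "Asha", "city": "Mumbai"},   # unwanted duplicate
]

# Assumed lookup of known misspellings -> correct spelling.
SPELLING_FIXES = {"Mumbia": "Mumbai"}
DEFAULT_CITY = "Unknown"  # assumed placeholder for missing values


def clean(records: list[dict]) -> list[dict]:
    """Drop duplicate ids, fill missing cities, and fix known misspellings."""
    seen_ids: set[int] = set()
    cleaned = []
    for rec in records:
        if rec["id"] in seen_ids:
            continue  # remove the duplicate record
        seen_ids.add(rec["id"])
        city: Optional[str] = rec.get("city")
        if city is None or not city.strip():
            city = DEFAULT_CITY  # complete the missing entry
        city = city.strip()
        city = SPELLING_FIXES.get(city, city)  # correct the spelling
        cleaned.append({**rec, "city": city})
    return cleaned


for row in clean(raw_records):
    print(row)
```

In practice, the lookup table and the default value would come from the business's own reference data rather than being hard-coded.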
In such a scenario, the following possibilities arise:
Data Mining provides the Enterprise with intelligence and Data Warehousing
provides the Enterprise with a memory.
Data warehousing is the process of integrating and combining data from multiple
sources and formats into a single unified schema. It therefore provides the
enterprise with a storage mechanism for its huge amounts of data. Data mining, on
the other hand, is the process of extracting interesting patterns and knowledge
from huge amounts of data. We can therefore apply data mining techniques to an
enterprise's data warehouse to discover useful patterns.
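The two roles can be sketched in a few lines. The script below integrates two hypothetical sources with different field names and units into one assumed schema (customer, product, amount in INR), which stands in for the warehouse. It then runs a very simple pattern query on top, standing in for a real mining technique. The source names, fields and values are all invented for illustration.

```python
import csv
import io
from collections import Counter

# Source 1 (hypothetical): a CSV export from a point-of-sale system.
pos_csv = """cust_id,item,amount_inr
C1,Tea,40
C2,Coffee,60
C1,Biscuits,30
"""

# Source 2 (hypothetical): online orders with different fields, amounts in paise.
online_rows = [
    ("C3", "Coffee", 6000),
    ("C2", "Tea", 4000),
    ("C3", "Coffee", 6000),
]


def load_unified() -> list[dict]:
    """Integrate both sources into one schema: customer, product, amount (INR)."""
    unified = []
    for row in csv.DictReader(io.StringIO(pos_csv)):
        unified.append({"customer": row["cust_id"], "product": row["item"],
                        "amount": float(row["amount_inr"])})
    for cust, prod, paise in online_rows:
        unified.append({"customer": cust, "product": prod, "amount": paise / 100})
    return unified


warehouse = load_unified()  # the "memory": one consistent store

# Toy "mining" step on the warehouse: which product is bought most often?
product_counts = Counter(rec["product"] for rec in warehouse)
print(product_counts.most_common(1))
```

Here the query reports Coffee as the most frequent product, with 3 purchases. A real deployment would apply proper techniques, such as association rules or clustering, to far larger data.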
The primary differences between data mining and data warehousing lie in their
system design, methodology and purpose. Data mining uses pattern recognition
logic to identify trends within a sample data set and extrapolate this
information to the larger data pool. Data warehousing is the process of
extracting and storing data to allow easier reporting.
Data mining is intended for users who are statistically inclined. These analysts
look for patterns hidden in data, which they extract using statistical models.
Data miners formulate questions based primarily on the "law of large numbers" to
identify potentially useful relationships between data elements that can be
profitable for companies.
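The idea of finding a pattern in a sample and extrapolating it to the larger data pool rests on the law of large numbers. The larger the sample, the closer the sample estimate tends to be to the value in the full pool. The sketch below simulates a hypothetical pool of 100,000 transactions in which about 30% show the pattern; the 30% rate is an assumption. It then compares estimates from samples of increasing size with the pool value.

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

# Hypothetical data pool: True if a transaction shows the pattern of interest
# (e.g. a cross-purchase); the underlying rate of 30% is an assumption.
POOL_SIZE = 100_000
pool = [random.random() < 0.30 for _ in range(POOL_SIZE)]
true_rate = sum(pool) / POOL_SIZE

# Estimate the pattern on larger and larger samples, then compare with the pool.
for n in (10, 100, 1_000, 10_000):
    sample = random.sample(pool, n)
    estimate = sum(sample) / n
    print(f"n={n:>6}: sample rate={estimate:.3f}, pool rate={true_rate:.3f}")
```

Small samples can miss the pool value by a wide margin. The 10,000-record sample typically lands within about one percentage point of it.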
Section 2 [4 x 2.5=10]
1.
The chart depicts the data exploration activity in BI, in which we explore the data at
hand. The histogram shows how the values of a variable are distributed across different
ranges, and the values center around a mean of about 25. The histogram is clearly
right-skewed (positively skewed): more values lie in the lower range than in the higher
range, with a tail stretching toward the higher values.
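The skew direction can be checked numerically rather than by eye. When most values sit in the lower bins and a few large values form a tail to the right, the moment-based skewness is positive. The sketch uses a hypothetical sample, not the data behind the chart. It prints a text histogram with an assumed bin width of 10, followed by the mean and the skewness.

```python
# Hypothetical sample (not the chart's data): mostly small values, a few large ones.
values = [12, 15, 18, 20, 21, 22, 23, 24, 25, 26, 28, 30, 35, 45, 60]


def mean(xs):
    return sum(xs) / len(xs)


def skewness(xs):
    """Moment-based (population) skewness: m3 / m2 ** 1.5."""
    m = mean(xs)
    n = len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def histogram(xs, bin_width):
    """Count values per bin [lo, lo + bin_width), sorted by bin start."""
    counts = {}
    for x in xs:
        lo = (x // bin_width) * bin_width
        counts[lo] = counts.get(lo, 0) + 1
    return dict(sorted(counts.items()))


for lo, count in histogram(values, 10).items():
    print(f"{lo:>3}-{lo + 10:<3} {'#' * count}")

g = skewness(values)
label = "right (positive) skew" if g > 0 else "left (negative) skew"
print(f"mean={mean(values):.1f}, skewness={g:.2f} -> {label}")
```

A negative result would indicate the opposite shape: most values in the higher range, with the tail pointing toward the lower values.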
2.
The snapshot indicates that adjusted R2 can be used to compare the accuracy of models
built from subsets with different numbers of predictors. Unlike plain R2, it penalizes
each additional predictor, so a higher adjusted R2 means a better fit after accounting
for model size. In this case, the subsets with 7 and 11 coefficients have the highest
adjusted R2, but the subset with 7 coefficients should be chosen: a model with fewer
coefficients is simpler and less prone to multicollinearity.
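Adjusted R2 is computed as 1 − (1 − R2)(n − 1)/(n − p − 1), where n is the number of observations and p the number of predictors. The sketch applies this formula to hypothetical subset results, invented so that the 7- and 11-coefficient subsets score highest as in the snapshot. It then picks the smallest subset whose adjusted R2 is within a small tolerance of the best. The R2 values, the sample size of 200, the tolerance of 0.005, and the assumption that each subset's coefficients include the intercept are all illustrative.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical subset-selection results: number of coefficients -> R^2.
N_OBS = 200
subsets = {5: 0.700, 7: 0.760, 9: 0.763, 11: 0.766}

# Predictors = coefficients - 1, assuming the intercept counts as a coefficient.
scores = {k: adjusted_r2(r2, N_OBS, k - 1) for k, r2 in subsets.items()}
best = max(scores.values())

# Parsimony rule: among subsets close to the best score, take the smallest.
TOLERANCE = 0.005  # assumed threshold
chosen = min(k for k, s in scores.items() if best - s <= TOLERANCE)

for k, s in scores.items():
    print(f"{k:>2} coefficients: adj R2 = {s:.4f}")
print("chosen subset:", chosen, "coefficients")
```

With these numbers, the 11-coefficient subset scores highest by roughly 0.001. The 7-coefficient subset falls within the tolerance, so it is chosen.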
Multicollinearity refers to redundancy among the predictors (predictors that are highly
correlated with one another), which inflates the error in the estimated coefficients.
Hence, multicollinearity should be avoided wherever possible.
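A common way to quantify this redundancy is the variance inflation factor (VIF). With only two predictors, the VIF of each is 1 / (1 − r²), where r is the Pearson correlation between them. A frequently quoted rule of thumb treats VIF above 10 as serious multicollinearity. The sketch below uses hypothetical predictors: x2 is nearly a multiple of x1, so it is redundant, while x3 is only weakly related to x1.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def vif_two_predictors(xs, ys):
    """With exactly two predictors, VIF = 1 / (1 - r^2) for each of them."""
    r = pearson_r(xs, ys)
    return 1 / (1 - r ** 2)


# Hypothetical predictors.
x1 = [1, 2, 3, 4, 5, 6, 7]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9]  # roughly 2 * x1: redundant
x3 = [5, 3, 6, 2, 7, 4, 1]                    # weakly related to x1

print(f"VIF(x1, x2) = {vif_two_predictors(x1, x2):.1f}")
print(f"VIF(x1, x3) = {vif_two_predictors(x1, x3):.1f}")
```

The x1/x2 pair yields a very large VIF, so one of the two should be dropped. The x1/x3 pair stays close to 1.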
3.
Rule #  Conf. %  Antecedent (a)  Consequent (c)  Support(a)  Support(c)  Support(a ∪ c)  Lift Ratio
1       100      Green           Red, White      2           4           2               2.5
2       100      Green           Red             2           6           2               1.666667
3       100      Green, White    Red             2           6           2               1.666667
4       100      Green           White           2           7           2               1.428571
5       100      Green, Red      White           2           7           2               1.428571
6       100      Orange          White           2           7           2               1.428571
The table lists the association rules and their measures. Take Rule 1: a person who buys
green also buys red and white, with 100% confidence. Green appears in two transactions
(Support(a) = 2), red and white together appear in four (Support(c) = 4), and green, red
and white together appear in two (Support(a ∪ c) = 2). The lift ratio for this rule is 2.5.
Lift is the ratio of confidence to expected confidence, where expected confidence is the
fraction of all transactions that contain the consequent.
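The lift values in the table can be reproduced from its support counts. Confidence is Support(a ∪ c) / Support(a), and expected confidence is Support(c) / N. The total number of transactions N is not stated; N = 10 is inferred here because it is the value that makes Rule 1's lift equal 2.5, and it also reproduces the other rows. The sketch recomputes all six rules.

```python
N_TRANSACTIONS = 10  # inferred: lift 2.5 = 1 / (4 / N) implies N = 10


def confidence(sup_a: int, sup_ac: int) -> float:
    """Confidence(a => c) = Support(a U c) / Support(a)."""
    return sup_ac / sup_a


def lift(sup_a: int, sup_c: int, sup_ac: int, n: int) -> float:
    """Lift = confidence / expected confidence, expected = Support(c) / n."""
    return confidence(sup_a, sup_ac) / (sup_c / n)


# (antecedent, consequent, Support(a), Support(c), Support(a U c)) from the table.
rules = [
    ("Green", "Red, White", 2, 4, 2),
    ("Green", "Red", 2, 6, 2),
    ("Green, White", "Red", 2, 6, 2),
    ("Green", "White", 2, 7, 2),
    ("Green, Red", "White", 2, 7, 2),
    ("Orange", "White", 2, 7, 2),
]

for a, c, sa, sc, sac in rules:
    print(f"{a} => {c}: conf={confidence(sa, sac):.0%}, "
          f"lift={lift(sa, sc, sac, N_TRANSACTIONS):.6f}")
```

A lift above 1, as in every rule here, means the consequent appears more often with the antecedent than it does across all transactions.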
4.
The regression output indicates that there are 12 predictors for deciding whether a
person should be given a loan. In logistic regression, 0.5 is the usual cutoff on the
predicted probability for classifying a case; it is not a cutoff for p-values. The
p-value indicates whether a particular variable is statistically significant in the
logistic regression equation, and a variable is conventionally retained only if its
p-value is below a significance level such as 0.05. Here only two variables, Age and
Experience, meet the p-value criterion. Hence all the other variables can be discarded,
and the logistic regression equation can be expressed with these two variables only.
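Logistic regression uses two different thresholds, and the sketch below keeps them apart. The p-value threshold (here α = 0.05) decides which predictors stay in the model. The 0.5 cutoff applies to the predicted probability and decides how each applicant is classified. All p-values and coefficients are hypothetical, as is the convention that class 1 means "approve". The p-values are chosen so that, as in the answer, only Age and Experience are significant.

```python
import math

# Hypothetical p-values for a few predictors (not the values from the output).
p_values = {"Age": 0.001, "Experience": 0.003, "Income": 0.41, "Family": 0.27}
ALPHA = 0.05  # conventional significance level for p-values

# Keep predictors whose p-value is BELOW the significance level.
significant = [name for name, p in p_values.items() if p < ALPHA]

# Hypothetical coefficients for the reduced two-variable model.
B0, B_AGE, B_EXP = -6.0, 0.08, 0.10
CUTOFF = 0.5  # cutoff on the predicted probability, not on p-values


def predicted_probability(age: float, experience: float) -> float:
    """Logistic function applied to the linear predictor."""
    z = B0 + B_AGE * age + B_EXP * experience
    return 1 / (1 + math.exp(-z))


def classify(age: float, experience: float) -> int:
    """1 = approve the loan, 0 = reject (assumed coding)."""
    return int(predicted_probability(age, experience) >= CUTOFF)


print("significant predictors:", significant)
print(classify(60, 30), classify(25, 1))
```

In a real analysis, the reduced model would be refitted on Age and Experience alone, rather than reusing coefficients from the 12-predictor model.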