
1

Data and statistics

Copyright 2014 by McGraw-Hill Education (Asia). All rights

Assessment Methods and Types

Classification    Percentage
Assignments       20 %
Tests             20 %
Quizzes           20 %
Final exam        40 %
Total             100 %

Learning Objectives
Identify data and its applications in business and economics.
Describe the key aspects of data (elements, variables, observations and scales).
Differentiate between data classifications and sources.
Define descriptive statistics and statistical inference.
Understand the role of computers in statistical analysis, and data mining.
Highlight the ethical guidelines for statistical analysis.

Introduction
What does business statistics mean?
How does this type of knowledge help organizations and people in general?
How can we get information about events and use it in making decisions?
What is the meaning of data?
How can data be collected, coded, analysed, and interpreted?
All these questions can be answered by understanding business statistics.

Statistics
The term statistics refers to numerical facts such as averages, medians, percentages, and index numbers that help us understand business and economic situations.
In a broader sense, statistics is defined as the art and science of collecting, coding, analysing, presenting and interpreting data.

Applications of statistics in
business and economics
The most successful managers and decision makers understand information and know how to use it effectively.
Statistics is used in different fields of business:
Accounting
Finance
Marketing
Production
Economics

Data
Data are the facts and figures collected,
analyzed and summarized for presentation
and interpretation.
All the data collected in a particular study
are referred to as the data set.
Elements are the entities on which the data are collected.
Variables are the characteristics of interest for the elements.
An observation is the set of measurements obtained for a particular element.

Data
Element (subject)   Attendance   Collaboration   Presentation
Business            -            Collaborative   Good
Economics           -            Selfish         Very good
Accounting          -            Boring          Excellent
Each column is a variable; each row is the observation for one element.
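As a rough illustration of elements, variables and observations, the sketch below stores a small data set in Python. The values follow the example slide; the dictionary layout and the names `data_set`, `elements`, `variables` and `observation` are assumptions made only for illustration.

```python
# Hypothetical layout: each key is an element (a subject),
# each inner key is a variable, each inner value a measurement.
data_set = {
    "Business":   {"collaboration": "Collaborative", "presentation": "Good"},
    "Economics":  {"collaboration": "Selfish",       "presentation": "Very good"},
    "Accounting": {"collaboration": "Boring",        "presentation": "Excellent"},
}

elements = list(data_set)                 # entities on which data are collected
variables = list(data_set["Business"])    # characteristics of interest
observation = data_set["Accounting"]      # measurements for one element

print(elements)
print(variables)
print(observation)
```

With 3 elements and 2 variables, the data set holds 3 x 2 = 6 data values.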

Scales of measurements
There are four types of measurement scales: nominal, ordinal, interval and ratio.
The scales are distinguished by the relationships assumed to exist between objects having different scale values.
The four scale types are ordered: each later scale has all the properties of the earlier scales plus additional properties.

Scales of measurements (continued)
Nominal scale: not really a scale, because it does not order objects along any dimension; it simply labels them. A nominal variable is categorical but not ordered.
Ordinal scale: numbers are used to place objects in order, but there is no information about the differences (intervals) between points on the scale.
Interval scale: equal intervals between scale values represent equal differences, so differences are meaningful.
Ratio scale: has a true zero point, so ratios are meaningful.

Comparison of scales of measurements
Summary or operation                 Nominal  Ordinal  Interval  Ratio
Frequency distribution               Yes      Yes      Yes       Yes
Median and percentiles               No       Yes      Yes       Yes
Add or subtract                      No       No       Yes       Yes
Mean and standard deviation          No       No       Yes       Yes
Ratio or coefficient of variation    No       No       No        Yes
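The ordering of the four scales can be sketched in Python. The dictionary below simply encodes the comparison table; the names `SCALES`, `WEAKEST_SCALE` and `is_permitted` are illustrative assumptions, not part of any library.

```python
# Scales ordered from weakest to strongest: each later scale keeps
# the properties of the earlier ones and adds more.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Weakest scale at which each summary becomes meaningful (from the table).
WEAKEST_SCALE = {
    "frequency distribution": "nominal",
    "median": "ordinal",
    "add or subtract": "interval",
    "mean and standard deviation": "interval",
    "coefficient of variation": "ratio",
}

def is_permitted(summary: str, scale: str) -> bool:
    """Return True if `summary` is meaningful for data measured on `scale`."""
    return SCALES.index(scale) >= SCALES.index(WEAKEST_SCALE[summary])

print(is_permitted("median", "nominal"))                      # False
print(is_permitted("mean and standard deviation", "ratio"))   # True
```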

Classification of data
Data can be classified from two different angles:
First, the nature of the data: categorical and quantitative data.
Second, the time of collection: cross-sectional and time series data.

First: the nature of data
Categorical data: data that can be grouped into specific categories, usually identified by labels or names. Categorical data use either the nominal or the ordinal scale of measurement.
Quantitative data: data that use numerical values to indicate how much or how many. Quantitative data are obtained using either the interval or the ratio scale of measurement.

Second: the time of collection
Cross-sectional data: data collected at approximately the same point in time. Such data are usually used to describe the current state of a variable or to investigate the relationships between variables.
Time series data: data collected over several time periods, for example the evaluation of a student's progress over three years. Time series data are frequently found in business and economics.

Sources of data
Data are usually obtained from existing sources or from surveys and experimental studies.
First, existing sources: sometimes the data needed for a particular application already exist, which means they have been collected by someone else and are ready to be used (e.g., by universities, companies and governments).

Sources of data (continued)
Second, statistical studies: sometimes the data wanted for a particular application are not found in existing sources. In such cases, the data can be obtained through a statistical study.
1. Experimental study: a variable of interest is first identified. Then one or more other variables are identified and controlled, so that data can be obtained about how they influence the variable of interest.
2. Observational study: no attempt is made to control the variables of interest.

The difference between experimental and observational studies
Experimental:
The researcher undertakes an experiment rather than just making observations.
There is human intervention in experiments.
The Hawthorne studies are a good example of experiments.
Observational:
No experiment is conducted. The researcher relies on the data collected: he or she simply makes observations and arrives at a conclusion.
There is no human intervention; the researcher observes things through various studies.
The study of the relation between smoking and lung cancer is a typical example of an observational study.

Data acquisition errors
Managers should always be aware of the possibility of data errors in statistical studies.
A data error occurs when the value obtained is not equal to the actual value.
Errors can occur in different ways:
Typing errors.
Outliers.
Wrong answers, and so on.
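One simple way to catch some acquisition errors is a range check. The sketch below assumes exam marks that must lie between 0 and 100; the marks and the function name `find_suspect_values` are hypothetical.

```python
def find_suspect_values(values, low=0, high=100):
    """Return (index, value) pairs that fall outside the plausible range."""
    return [(i, v) for i, v in enumerate(values) if not (low <= v <= high)]

marks = [72, 65, 810, 58, -4, 91]   # 810 may be a typing error for 81
print(find_suspect_values(marks))   # [(2, 810), (4, -4)]
```

A flagged value is only a candidate error; it still has to be checked against the original record before it is corrected or removed.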

Descriptive statistics
Descriptive statistics are summaries of data presented in a form that is easy for the reader to understand.
Such summaries can be tabular, graphical, or numerical.
Common summarization methods include tabulation, bar charts, and pie charts.
Such charts and tables help the reader in reading and interpreting the data.

Descriptive statistics (continued)
Frequencies and percent frequencies of presentation
Rating       Frequency   Percent
Good         3           23.1
Very good    5           38.5
Excellent    5           38.5
Total        13          100.0

Statistical inference
Many situations require information or data about a large group of elements (individuals, products, companies, etc.).
However, due to time, cost, and other considerations, data can be collected from only a small portion of the group. The larger group of elements in a particular study is called the population, and the smaller group is called the sample.

Statistical inference (continued)
For example, suppose we want to know the average mark of all Malaysian undergraduate students in business statistics, but there is no time to collect the data from all universities, or collecting the data is too costly.
In that case it is acceptable to choose a sample from the population, provided the sample represents the population. The sample average is obtained by adding the marks in the sample and dividing by the number of students (the sample size); it serves as an estimate of the population average.
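The calculation described for the sample is the sample mean. A minimal sketch, assuming a hypothetical sample of eight marks (the values are invented for illustration):

```python
# Hypothetical sample of marks drawn from the population of students.
sample_marks = [65, 70, 58, 82, 74, 69, 91, 55]

n = len(sample_marks)                 # sample size
sample_mean = sum(sample_marks) / n   # add the marks, divide by the sample size

print(n, sample_mean)                 # 8 70.5
```

The sample mean of 70.5 would then be used as an estimate of the average mark in the whole population.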

Computers and statistical analysis
Statisticians usually use different types of software to perform statistical computations.
This is due to the large amount of data and the complexity of the outputs required, such as frequencies, correlation, regression, etc.
Programs such as Minitab, SAS, Excel, SPSS, and Amos can help in analysing and presenting the data.

Data mining
Data mining can be defined as the automated extraction of predictive information from large databases.
The overall goal of the data mining process is to extract information from a data set and transform it into an understandable structure for further use.
Data mining is sometimes called data or knowledge discovery.

The key properties of data mining
Automatic discovery of patterns: automatic discovery refers to the execution of data mining models.
Prediction of likely outcomes: many forms of data mining are predictive. For example, a model might predict income based on education and other demographic factors.
Creation of actionable information: data mining can derive actionable information from large volumes of data.
Focus on large data sets and databases.

Data mining (continued)
With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual's purchase history.
By mining demographic data from comments, the retailer could develop products that meet the needs of specific customer segments.

Process of data mining


Data Mining Models and Tasks


Ethical guidelines for statistical practices
Ethical issues arise in statistics because of the important role statistics plays in the collection, analysis, presentation and interpretation of data.
In a statistical study, unethical behavior can take many forms:
Improper sampling.
Inappropriate analysis of the data.
Development of misleading graphs.
Use of inappropriate summary statistics.
Biased interpretation of the statistical results.

2
Descriptive statistics: Tabular and Graphical Presentation


Learning Objectives
Identify how categorical data can be summarized.
Understand the meaning of the frequency distribution and the relative and percent frequency distributions of categorical data.
Identify how quantitative data can be summarized.
Discuss the frequency distribution and the relative and percent frequency distributions of quantitative data.
Identify the meaning of the stem-and-leaf display and how it can be constructed.
Identify the meaning of crosstabulation and how it can be implemented.

First: Summarizing Categorical Data
Frequency distribution: a tabular summary of data showing the number (frequency) of items in each of several nonoverlapping classes.
Relative frequency distribution: a tabular summary of data showing the relative frequency of items in each of several nonoverlapping classes. The relative frequency of a class is its frequency divided by the total number of items.

First: Summarizing Categorical Data (continued)
Percent frequency distribution: a tabular summary of data showing the percent frequency of items in each class. The percent frequency of a class is its relative frequency multiplied by 100.
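The three distributions for categorical data can be sketched in Python with the standard library's `collections.Counter`. The 13 ratings below are hypothetical, chosen for illustration only.

```python
from collections import Counter

# Hypothetical presentation ratings for 13 students.
ratings = ["Good"] * 3 + ["Very good"] * 5 + ["Excellent"] * 5

frequency = Counter(ratings)                           # count per class
n = sum(frequency.values())                            # total number of items
relative = {c: f / n for c, f in frequency.items()}    # frequency / n
percent = {c: round(100 * r, 1) for c, r in relative.items()}

for c in frequency:
    print(f"{c:<10} {frequency[c]:>3} {relative[c]:.3f} {percent[c]:>5}")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, up to rounding.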

Bar chart
A bar chart is a graphical device for depicting a frequency, relative frequency or percent frequency distribution.
1- On one axis of the graph (usually the horizontal axis), specify the labels that are used for the classes.
2- Use a frequency, relative frequency or percent frequency scale for the other axis of the chart (usually the vertical axis).
3- Using a bar of fixed width drawn above each class label, extend the length of the bar until it reaches the frequency, relative frequency or percent frequency of the class.
4- For categorical data, the bars should be separated to emphasize the fact that each class is separate.

Pie chart
A pie chart provides another graphical device for presenting categorical data summarized in a frequency, relative frequency, or percent frequency distribution.
To construct a pie chart:
1- Draw a circle to represent all of the data.
2- Use the relative frequencies to subdivide the circle into sectors (parts or segments) that correspond to the frequency, relative frequency, or percent frequency of each class.
3- Since the circle contains 360 degrees, the angle of each sector = the relative frequency of the class × 360 degrees.
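The sector rule can be checked with a short Python sketch. The class frequencies are hypothetical and the function name `sector_angles` is an assumption for illustration.

```python
def sector_angles(frequencies):
    """Map each class to its pie-chart sector angle in degrees."""
    total = sum(frequencies.values())
    # relative frequency of the class (f / total) multiplied by 360 degrees
    return {c: 360 * f / total for c, f in frequencies.items()}

angles = sector_angles({"Good": 3, "Very good": 5, "Excellent": 5})
for c, a in angles.items():
    print(f"{c}: {a:.1f} degrees")
```

For these frequencies the sectors are about 83.1, 138.5 and 138.5 degrees, which together make up the full circle.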

Second: Summarizing Quantitative Data
First: Frequency distribution
As with categorical data, a frequency distribution is a tabular summary of data showing the number (frequency) of items in each of several nonoverlapping classes.
However, with quantitative data, we must be more careful in defining the nonoverlapping classes.

Frequency distribution (continued)
To define the frequency distribution of quantitative data, three steps have to be taken:
Determine the number of nonoverlapping classes.
Determine the width of each class.
Determine the class limits.

Number of classes
Classes are formed by specifying ranges that will be used to group the data. Usually between 5 and 20 classes are used.
It is recommended to use 5 or 6 classes for a small number of data items. For a larger number of data items, a larger number of classes is used.

Width of the classes
In general, it is recommended that all classes have the same width. There is a relationship between the number of classes and the class width: a larger number of classes means a smaller class width, and vice versa.
The class width can be expressed as follows:
Approximate class width = (largest data value - smallest data value) / number of classes

Class limit and midpoint
Class limits
Class limits must be chosen so that each data item belongs to one and only one class.
The lower class limit identifies the smallest possible data value assigned to the class.
The upper class limit identifies the largest possible data value assigned to the class.
Class midpoint
The class midpoint is the value halfway between the lower and upper class limits.
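The three steps (number of classes, class width, class limits) can be put together in a short Python sketch. It assumes integer data, a class width equal to the range divided by the number of classes and rounded up, and a first lower limit chosen by the analyst; the data values and the function name `frequency_distribution` are hypothetical.

```python
import math

def frequency_distribution(data, num_classes, first_lower):
    """Group integer data into equal-width, nonoverlapping classes."""
    # Approximate width: (largest - smallest) / number of classes, rounded up.
    width = math.ceil((max(data) - min(data)) / num_classes)
    table = []
    for k in range(num_classes):
        lower = first_lower + k * width
        upper = lower + width - 1        # integer data, so classes do not overlap
        count = sum(lower <= x <= upper for x in data)
        midpoint = (lower + upper) / 2   # halfway between the class limits
        table.append((lower, upper, midpoint, count))
    return table

data = [12, 14, 19, 18, 15, 15, 18, 17, 20, 27,
        22, 23, 22, 21, 33, 28, 14, 18, 16, 13]

for lower, upper, midpoint, count in frequency_distribution(data, 5, 10):
    print(f"{lower}-{upper}  midpoint {midpoint}  frequency {count}")
```

Here the width is 21 / 5 = 4.2, rounded up to 5, giving the classes 10-14, 15-19, 20-24, 25-29 and 30-34. If the chosen classes do not cover every data value, the first lower limit, the width or the number of classes has to be adjusted.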

Relative frequency and percent frequency distributions for quantitative data

Summarizing Quantitative Data (continued)
Second: graphical charts
1. Dot plot
One of the simplest graphical summaries.
A horizontal axis shows the range of the data, and each data value is represented by a dot placed above the axis.

Summarizing Quantitative Data (continued)
2. Histogram
A histogram is a common graphical presentation of quantitative data, prepared for data previously summarized in a frequency, relative frequency or percent frequency distribution.
It is constructed by placing the variable of interest on the horizontal axis and the frequency, relative frequency or percent frequency on the vertical axis.
The frequency, relative frequency or percent frequency of each class is shown by drawing a rectangle whose base is determined by the class limits on the horizontal axis and whose height is the corresponding frequency, relative frequency or percent frequency.

Cumulative frequency distribution
The cumulative frequency distribution uses the number of classes, class widths, and class limits developed for the frequency distribution.
However, rather than showing the frequency of each class, the cumulative frequency distribution shows the number of data items with values less than or equal to the upper class limit of each class.
Similarly, a cumulative relative frequency distribution shows the proportion of data items, and a cumulative percent frequency distribution shows the percentage of data items, with values less than or equal to the upper limit of each class.
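Cumulative distributions are running totals of the class frequencies. A minimal Python sketch using the standard library's `itertools.accumulate`, with hypothetical upper class limits and frequencies:

```python
from itertools import accumulate

upper_limits = [14, 19, 24, 29, 34]   # hypothetical upper class limits
frequencies = [4, 8, 5, 2, 1]         # hypothetical class frequencies

cumulative = list(accumulate(frequencies))          # running totals
n = cumulative[-1]                                  # total number of items
cumulative_relative = [c / n for c in cumulative]
cumulative_percent = [100 * c / n for c in cumulative]

for u, c, p in zip(upper_limits, cumulative, cumulative_percent):
    print(f"<= {u}: {c} items ({p:.0f}%)")
```

The last cumulative frequency always equals the total number of items, and the last cumulative percent frequency is 100.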

3. Ogive
An ogive is a graph of a cumulative distribution. It shows data values on the horizontal axis and either the cumulative frequency, the cumulative relative frequency, or the cumulative percent frequency on the vertical axis.
The ogive is constructed by plotting a point corresponding to the cumulative frequency of each class.

Crosstabulation
It is a tabular summary of data for two variables, and it is usually used for examining the relationship between the two variables.
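A crosstabulation can be sketched in Python by counting pairs of values. The observations below are hypothetical; the variable names `cells`, `rows` and `cols` are assumptions for illustration.

```python
from collections import Counter

# Hypothetical observations: (collaboration, presentation) for each student.
observations = [
    ("Collaborative", "Good"), ("Collaborative", "Excellent"),
    ("Selfish", "Good"), ("Collaborative", "Excellent"),
    ("Selfish", "Very good"), ("Collaborative", "Very good"),
]

cells = Counter(observations)                   # count per (row, column) pair
rows = sorted({r for r, _ in observations})     # categories of the first variable
cols = sorted({c for _, c in observations})     # categories of the second variable

print(" " * 14 + "".join(f"{c:>10}" for c in cols))
for r in rows:
    # A Counter returns 0 for pairs that never occur.
    print(f"{r:<14}" + "".join(f"{cells[(r, c)]:>10}" for c in cols))
```

Each cell shows how many elements fall into that combination of categories, which makes any relationship between the two variables easier to see.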

Exploratory data analysis
Stem-and-leaf display
It is one of the techniques for summarizing data.
1. Arrange the leading digits of each data value to the left of a vertical line.
2. To the right of the vertical line, record the last digit of each data value.
3. The numbers on the left are the stems and the numbers on the right are the leaves.
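The steps for the stem-and-leaf display can be sketched in Python. The sketch assumes non-negative integers whose last digit is the leaf; the scores and the function name `stem_and_leaf` are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return {stem: sorted leaves} for non-negative integers."""
    display = defaultdict(list)
    for v in sorted(values):            # sorting keeps stems and leaves in order
        stem, leaf = divmod(v, 10)      # leading digits, last digit
        display[stem].append(leaf)
    return dict(display)

scores = [68, 72, 75, 81, 69, 73, 84, 92, 76, 85]   # hypothetical data
for stem, leaves in stem_and_leaf(scores).items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```

For these scores the display has four rows: stem 6 with leaves 8 9, stem 7 with leaves 2 3 5 6, stem 8 with leaves 1 4 5, and stem 9 with leaf 2.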
