
STATISTICAL / DATA PRESENTATION TOOLS

Descriptive statistics enable us to understand data through summary
values and graphical presentations. Summary values include not only
the average (mean) but also the median and mode, and measures of
spread such as the range and standard deviation.

It is important to look at summary statistics along with the data set
to understand the entire picture, as the same summary statistics may
describe very different data sets. Descriptive statistics can be
illustrated in an understandable fashion by presenting them
graphically using statistical and data presentation tools.
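
As a small illustration of this point, the sketch below compares two
made-up data sets (the values are assumptions chosen for the example,
not data from this text) that share the same mean and median but
differ in spread.

```python
import statistics

# Two hypothetical data sets, invented for this illustration.
a = [4, 5, 6]
b = [0, 5, 10]

def summary(data):
    """Return a few common summary values for a list of numbers."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "range": max(data) - min(data),
        "stdev": statistics.stdev(data),  # sample standard deviation
    }

summary_a = summary(a)
summary_b = summary(b)
print(summary_a)
print(summary_b)
```

Looking only at the mean or median would suggest the two sets are
alike; the range and standard deviation show that they are not.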

When creating graphic displays, keep in mind the following questions:

• What am I trying to communicate?
• Who is my audience?
• What might prevent them from understanding this display?
• Does the display tell the entire story?

Several types of statistical/data presentation tools exist, including:

(a) charts displaying frequencies (bar, pie, and Pareto charts),
(b) charts displaying trends (run and control charts),
(c) charts displaying distributions (histograms), and
(d) charts displaying associations (scatter diagrams).

Different types of data require different kinds of statistical tools.
There are two types of data. Attribute data are countable data or
data that can be put into categories, e.g., the number of people
willing to pay, the number of complaints, or the percentages who want
blue, red, or yellow. Variable data are measurement data based on
some continuous scale, e.g., length, time, or cost.
CHOOSING DATA DISPLAY TOOLS

To Show                        Use              Data Needed

Frequency of occurrence:       Bar chart        Tallies by category (data can be
simple percentages or          Pie chart        attribute data or variable data
comparisons of magnitude       Pareto chart     divided into categories)

Trends over time               Line graph       Measurements taken in
                               Run chart        chronological order (attribute
                               Control chart    or variable data can be used)

Distribution: variation not    Histogram        Forty or more measurements (not
related to time                                 necessarily in chronological
(distributions)                                 order; variable data)

Association: looking for a     Scatter diagram  Forty or more paired
correlation between two                         measurements (measures of both
things                                          things of interest; variable data)
BOXPLOT

In descriptive statistics, a box plot is a convenient way of graphically
depicting groups of numerical data through their five-number summaries:

• the smallest observation (sample minimum),
• lower quartile (Q1),
• median (Q2),
• upper quartile (Q3),
• and largest observation (sample maximum).

A boxplot may also indicate which observations, if any, might be
considered outliers. Boxplots display differences between populations
without making any assumptions about the underlying statistical
distribution: they are non-parametric.

The spacings between the different parts of the box help indicate the
degree of dispersion (spread) and skewness in the data, and identify
outliers. Boxplots can be drawn either horizontally or vertically.
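
The five-number summary behind a boxplot can be sketched in a few
lines. This is only an illustration: the sample values are
hypothetical, the quartiles use one of several accepted conventions
(the "inclusive" method of Python's statistics module), and the
1.5 × IQR rule for flagging possible outliers is a common convention
assumed here, not something this text prescribes.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(data)
    # method="inclusive" is one of several quartile conventions; other
    # conventions can give slightly different Q1 and Q3 values.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]

def possible_outliers(data, k=1.5):
    """Flag values more than k * IQR beyond the quartiles (assumed rule)."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]

sample = [2, 4, 4, 5, 7, 8, 9, 11, 30]  # hypothetical values
print(five_number_summary(sample))
print(possible_outliers(sample))
```

For this sample the box would span Q1 = 4 to Q3 = 9 with the median
at 7, and the value 30 would be marked as a possible outlier.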
STEMPLOT
A stemplot (or stem-and-leaf plot), in statistics, is a device for
presenting quantitative data in a graphical format, similar to a
histogram, to assist in visualizing the shape of a distribution.

Unlike histograms, stemplots retain the original data to at least two
significant digits and put the data in order, thereby easing the move
to order-based inference and non-parametric statistics.

A basic stemplot contains two columns separated by a vertical line.
The left column contains the stems and the right column contains the
leaves.

Figure: a stem-and-leaf display used for a Japanese train timetable
CONSTRUCTING A STEMPLOT

To construct a stemplot, the observations must first be sorted in
ascending order. Here is the sorted set of data values that will be
used in the following example:

44 46 47 49 63 64 66 68 68 72 72 75 76 81 84 88 106

Next, we must determine what the stems will represent and what the
leaves will represent.

Typically, the leaf contains the last digit of the number and the stem
contains all of the other digits. In the case of very large numbers, the
data values may be rounded to a particular place value (such as the
hundreds place) that will be used for the leaves. The remaining digits
to the left of the rounded place value are used as the stem.

In this example, the leaf represents the ones place and the stem will
represent the rest of the number (tens place and higher).
The stemplot is drawn with two columns separated by a vertical line.
The stems are listed to the left of the vertical line. It is important
that each stem is listed only once and that no stems are skipped, even
if that means some stems have no leaves. The leaves are listed in
increasing order in a row to the right of each stem.

When a value is repeated in the data, its leaf is repeated in the
plot. For example, if the data contained 44 44 46 47 49, the first row
would read 4 | 4 4 6 7 9.

 4 | 4 6 7 9
 5 |
 6 | 3 4 6 8 8
 7 | 2 2 5 6
 8 | 1 4 8
 9 |
10 | 6
key: 6 | 3 = 63
leaf unit: 1.0
stem unit: 10.0
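
The construction steps above can be sketched as a short function. It
is a minimal illustration that assumes non-negative integers with the
ones digit as the leaf; the function name and the row layout are
choices made for this example.

```python
def stemplot(values):
    """Return stemplot rows for non-negative integers (leaf = ones digit)."""
    data = sorted(values)  # step 1: sort in ascending order
    # List every stem from the smallest to the largest, skipping none.
    stems = range(data[0] // 10, data[-1] // 10 + 1)
    width = len(str(stems[-1]))
    rows = []
    for stem in stems:
        # Leaves come out in increasing order because data is sorted;
        # repeated values produce repeated leaves.
        leaves = " ".join(str(v % 10) for v in data if v // 10 == stem)
        rows.append(f"{stem:>{width}} | {leaves}".rstrip())
    return rows

data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]
for row in stemplot(data):
    print(row)
```

Stems 5 and 9 appear with no leaves, which keeps the gaps in the data
visible.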

Rounding may be needed to create a stemplot. Consider the following
set of data:
-23.678758, -12.45, -3.4, 4.43, 5.5, 5.678, 16.87, 24.7, 56.8

Non-integers are first rounded to the nearest whole number. For
negative numbers, a minus sign is placed in front of the stem, which
is still the tens part of the value (X / 10 without its remainder), so
-3 falls on the stem -0. This allows the stemplot to retain its shape
even for more complicated data sets, as in the example below:
-2 | 4
-1 | 2
-0 | 3
 0 | 4 6 6
 1 | 7
 2 | 5
 3 |
 4 |
 5 | 7
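
A sketch of how rounding and negative stems might be handled in code
follows. The rounding rule (halves rounded away from zero) and the
function names are assumptions made for this example; the text only
says that non-integers are rounded.

```python
import math
from collections import defaultdict

def round_half_up(x):
    """Round to the nearest integer, with halves rounded away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))

def signed_stemplot(values):
    """Return stemplot rows for data that may be negative or non-integer."""
    negative, positive = defaultdict(list), defaultdict(list)
    for r in sorted(round_half_up(v) for v in values):
        stem, leaf = divmod(abs(r), 10)  # stem = tens part, leaf = ones digit
        (negative if r < 0 else positive)[stem].append(leaf)

    rows = []
    if negative:
        # Negative stems run from the most negative down to "-0".
        for stem in range(max(negative), -1, -1):
            rows.append((f"-{stem}", negative[stem]))
    if positive:
        for stem in range(max(positive) + 1):
            rows.append((str(stem), positive[stem]))

    width = max(len(label) for label, _ in rows)
    return [
        f"{label:>{width}} | {' '.join(map(str, leaves))}".rstrip()
        for label, leaves in rows
    ]

data = [-23.678758, -12.45, -3.4, 4.43, 5.5, 5.678, 16.87, 24.7, 56.8]
for row in signed_stemplot(data):
    print(row)
```

Keeping separate "-0" and "0" stems is what lets -3 and 4 land on
different rows even though both have a tens part of zero.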
SCATTER PLOT
A scatter plot or scattergraph is a type of mathematical diagram
using Cartesian coordinates to display values for two variables for a set
of data.

The data are displayed as a collection of points, each having the
value of one variable determining the position on the horizontal axis
and the value of the other variable determining the position on the
vertical axis. This kind of plot is also called a scatter chart,
scatter diagram, or scatter graph.

A scatter plot can be used when one variable is under the control of
the experimenter. A variable that is systematically incremented or
decremented by the experimenter is called the control parameter or
independent variable and is customarily plotted along the horizontal
axis. The measured or dependent variable is customarily plotted along
the vertical axis. If neither variable depends on the other, either
variable can be plotted on either axis, and the scatter plot
illustrates only the degree of correlation (not causation) between the
two variables.

A 3D scatter plot allows the visualization of multivariate data of up
to four dimensions. It takes multiple scalar variables and uses them
as the axes of a phase space. The variables are combined to form
coordinates in that space, and each point is drawn as a glyph colored
according to another scalar variable.
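
The degree of correlation that a scatter plot shows visually is often
summarized with Pearson's correlation coefficient r. The sketch below
computes it for paired measurements; the coefficient is brought in
here as a common companion to the plot, and the sample pairs are
hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired measurements."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical pairs: x is the controlled (independent) variable,
# y is the measured (dependent) variable.
xs = [1, 2, 3, 4, 5]
ys_up = [2, 4, 6, 8, 10]     # points rise along a straight line
ys_down = [10, 8, 6, 4, 2]   # points fall along a straight line

print(pearson_r(xs, ys_up))
print(pearson_r(xs, ys_down))
```

Values of r near +1 or -1 correspond to points lying close to a
straight line, and values near 0 to a shapeless cloud. As with the
plot itself, r measures association only, not causation.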
HISTOGRAM
In statistics, a histogram is a graphical representation, showing a
visual impression of the distribution of experimental data. It is an
estimate of the probability distribution of a continuous variable. A
histogram consists of tabular frequencies, shown as adjacent
rectangles, erected over discrete intervals (bins), with an area equal
to the frequency of the observations in the interval. The height of a
rectangle is also equal to the frequency density of the interval, i.e.,
the frequency divided by the width of the interval. The total area of
the histogram is equal to the number of data points. A histogram may
also be normalized to display relative frequencies. It then shows the
proportion of cases that fall into each of several categories, with
the total area equalling 1.

The categories are usually specified as consecutive, non-overlapping
intervals of a variable. The categories (intervals) must be adjacent,
and often are chosen to be of the same size.

Histograms are used to plot the density of data, and often for density
estimation: estimating the probability density function of the
underlying variable. The total area of a histogram used for
probability density is always normalized to 1. If the lengths of the
intervals on the x-axis are all 1, then a histogram is identical to a
relative frequency plot.
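
The relationship between frequency, frequency density, and area can
be checked numerically. The sketch below reuses the 17 values from the
stemplot example; the bin edges are an arbitrary choice made for this
illustration, with deliberately unequal widths.

```python
def bin_counts(data, edges):
    """Count values per bin: [lo, hi) for each bin, [lo, hi] for the last."""
    counts = [0] * (len(edges) - 1)
    last = len(counts) - 1
    for x in data:
        for i in range(len(counts)):
            in_bin = edges[i] <= x < edges[i + 1]
            if in_bin or (i == last and x == edges[-1]):
                counts[i] += 1
                break
    return counts

data = [44, 46, 47, 49, 63, 64, 66, 68, 68, 72, 72, 75, 76, 81, 84, 88, 106]
edges = [40, 60, 80, 110]  # illustrative bin edges, not from the text

counts = bin_counts(data, edges)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]

# Bar height = frequency density = frequency / bin width.
density = [c / w for c, w in zip(counts, widths)]
total_area = sum(h * w for h, w in zip(density, widths))

# Normalized version: divide each height by the number of data points.
relative = [h / len(data) for h in density]
normalized_area = sum(h * w for h, w in zip(relative, widths))

print(counts, total_area, normalized_area)
```

The total area comes out as the number of data points (17), and the
normalized histogram has area 1, whatever bin widths are chosen.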
BAR CHART

A bar chart or bar graph is a chart with rectangular bars with
lengths proportional to the values that they represent. The bars can
also be plotted horizontally.

Bar charts are used for plotting discrete (or 'discontinuous') data,
i.e., data that take distinct values rather than varying continuously.

Examples of discontinuous data include 'shoe size' and 'eye color',
for which you would use a bar chart.

In contrast, examples of continuous data include 'height' and
'weight'.

Continuous data can still be shown in a bar chart once the
measurements have been divided into categories.
PIE CHART
A pie chart (or a circle graph) is a circular chart divided into
sectors, illustrating proportion. In a pie chart, the arc length of
each sector (and consequently its central angle and area) is
proportional to the quantity it represents. When angles are measured
in turns, a given percentage corresponds to the same number of
centiturns (hundredths of a turn), so 25% is a sector of 25
centiturns. Together, the sectors create a full disk. The chart is
named for its resemblance to a pie which has been sliced.

Figure: pie chart of populations of native English speakers
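
Because each sector's central angle is proportional to its quantity,
the angles follow directly from the data. The sketch below converts
category totals to degrees; the colour preference figures are
hypothetical.

```python
def sector_angles(quantities):
    """Map each category to its central angle in degrees."""
    total = sum(quantities.values())
    # One full turn is 360 degrees, shared out in proportion to each value.
    return {name: value * 360 / total for name, value in quantities.items()}

preferences = {"blue": 50, "red": 30, "yellow": 20}  # hypothetical tallies
angles = sector_angles(preferences)
print(angles)
```

Here the three sectors come to 180, 108, and 72 degrees, which
together make the full 360-degree disk.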
DOT PLOTS

A dot chart or dot plot is a statistical chart consisting of a group
of data points plotted on a simple scale. Dot plots are used for
continuous, quantitative, univariate data. Data points may be labelled
if there are few of them.

Dot plots are one of the simplest statistical plots, and are suitable for
small to moderate sized data sets. They are useful for highlighting
clusters and gaps, as well as outliers.

Their other advantage is the conservation of numerical information.
When dealing with larger data sets (around 20–30 or more data
points), the related stemplot, box plot, or histogram may be more
efficient, as dot plots may become too cluttered after this point.
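
A rough text version of a dot plot can be produced with one mark per
observation. This sketch assumes small non-negative integer values
and uses the letter "o" as the dot; both are simplifications made for
the example.

```python
from collections import Counter

def dot_plot(values):
    """Return one row per integer on the scale, with one 'o' per observation."""
    counts = Counter(values)
    width = len(str(max(values)))
    rows = []
    # Walk the whole scale so that gaps in the data stay visible.
    for v in range(min(values), max(values) + 1):
        rows.append(f"{v:>{width}} | {'o' * counts[v]}".rstrip())
    return rows

sample = [1, 1, 3, 4, 4, 4, 9]  # hypothetical observations
for row in dot_plot(sample):
    print(row)
```

Rows with no marks show gaps, long rows show clusters, and an isolated
mark far from the rest (9 in this sample) stands out as a possible
outlier.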

A dot plot of 50 random values from 0 to 9.


LINE GRAPHS
A line chart or line graph is a type of graph that displays
information as a series of data points connected by straight line
segments. It is a basic type of chart common in many fields. It is an
extension of a scatter graph, and is created by connecting a series of
points that represent individual measurements with line segments. A
line chart is often used to visualize a trend in data over intervals
of time (a time series), so the line is often drawn chronologically.
