INFORMATION PROCESSING
APPLIED MATHEMATICS: DECISION-SUPPORT TOOLS
PRESENTED BY DR KOWIR PAMBO BELLO
FOUNDER OF TCHICOLE & SAGESSE
INTELLIGENCE WITHIN EVERYONE'S REACH
INTRODUCTION
MOTIVATIONS
TABLES AND GRAPHS
colour | count | relative frequency
blue | 2 | 17%
yellow | 1 | 8%
black | 1 | 8%
orange | 2 | 17%
pink | 3 | 25%
red | 1 | 8%
green | 2 | 17%
Grand total | 12 | 100%

sex | count | relative frequency
F | 8 | 67%
M | 4 | 33%
Grand total | 12 | 100%

moment | count | relative frequency
morning | 4 | 33%
noon | 4 | 33%
evening | 4 | 33%
Grand total | 12 | 100%

[Figure: Distribution of colours (bar chart of relative frequencies)]
[Figure: Distribution of purchase moments (horizontal bar chart: morning, noon and evening, 33% each)]
HOW DOES IT WORK?
The variable "moment" has three modalities: "morning" (matin), "noon" (midi) and "evening" (soir).
In the database, 4 people bought loincloths (pagnes) in the morning,
so the count of the modality "morning" is 4, and its relative frequency is $f_1 = n_1 / n = 4/12 \approx 33\%$:
moment | count | relative frequency
morning | 4 | 33%
We write $n_1 = 4$, $n_2 = 4$, etc.
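A minimal Python sketch of the same counting step: `frequency_table` is a hypothetical helper that turns raw observations into counts $n_i$ and relative frequencies $f_i = n_i / n$. Only the totals (4 purchases per moment, $n = 12$) come from the table; the order of the 12 observations is invented for illustration.

```python
from collections import Counter

def frequency_table(observations):
    """Hypothetical helper: map each modality to (count n_i, relative frequency f_i)."""
    counts = Counter(observations)          # keeps first-seen order (Python 3.7+)
    n = len(observations)
    return {modality: (count, count / n) for modality, count in counts.items()}

# Invented ordering of the 12 purchases; only the totals (4 each) match the course table.
moments = ["morning", "noon", "evening"] * 4

table = frequency_table(moments)
for modality, (count, freq) in table.items():
    print(f"{modality:<8} {count:>2} {freq:.0%}")
```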
HOW DOES IT WORK?
From the counts or the relative frequencies we draw a pie chart (here, the distribution of sexes); a bar chart is built the same way (here, the distribution of colours).
[Figure: Distribution of colours (bar chart of relative frequencies)]
[Figure: Distribution of sexes (pie chart: F 67%, M 33%)]
[Figures: number of loincloths sold (values 1 to 4): bar chart of relative frequencies and cumulative frequency curve]
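The arithmetic behind a pie chart can be sketched as follows, assuming the usual convention that each slice's angle is its relative frequency times 360 degrees; the counts are those of the sex variable (8 women, 4 men).

```python
# Counts of the sex variable from the course table (F = women, M = men).
counts = {"F": 8, "M": 4}
n = sum(counts.values())

# Each slice's angle is proportional to its relative frequency: angle = n_i * 360 / n.
angles = {label: count * 360 / n for label, count in counts.items()}
for label, angle in angles.items():
    print(f"{label}: {counts[label] / n:.0%} -> {angle:.0f} degrees")
```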
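HOW DOES IT WORK?
number of loincloths sold | count | relative frequency | ascending cumulative frequency (FCC) | descending cumulative frequency (FCD)
1 | 2 | 17% | 17% | 100%
2 | 4 | 33% | 50% | 83%
3 | 4 | 33% | 83% | 50%
4 | 2 | 17% | 100% | 17%
Grand total | 12 | 100% | |

$Fcc_1 = f_1 = 17\%$
$Fcc_2 = Fcc_1 + f_2 = 17\% + 33\% = 50\%$
In general, $Fcc_k = f_1 + \dots + f_k$ (share of observations at or below $x_k$) and $Fcd_k = f_k + \dots + f_p$ (share at or above $x_k$).
[Figure: ascending (FCC) and descending (FCD) cumulative frequency curves for the number of loincloths sold]
[Figure: Histogram of the measurements (loincloth lengths in metres, classes from 3 to 9)]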
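A sketch of the cumulative columns in Python, using the counts of the number of loincloths sold; cumulating the integer counts before dividing by $n$ keeps the last ascending value at exactly 100%. The variable names are illustrative.

```python
from itertools import accumulate

# Frequency table of the number of loincloths sold (n = 12 customers).
values = [1, 2, 3, 4]
counts = [2, 4, 4, 2]
n = sum(counts)

freqs = [c / n for c in counts]
# Cumulate the integer counts first, then divide, to avoid rounding drift.
fcc = [cum / n for cum in accumulate(counts)]                  # f_1 + ... + f_k
fcd = [cum / n for cum in accumulate(reversed(counts))][::-1]  # f_k + ... + f_p

for x, c, f, up, down in zip(values, counts, freqs, fcc, fcd):
    print(f"{x}  {c}  {f:.0%}  {up:.0%}  {down:.0%}")
```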
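HOW DOES IT WORK?
classes | count | relative frequency | FCC | FCD
[3,4.5[ | 2 | 17% | 17% | 100%
[4.5,6[ | 5 | 42% | 58% | 83%
[6,7.5[ | 3 | 25% | 83% | 42%
[7.5,9[ | 2 | 17% | 100% | 17%
Total | 12 | 100% | |

• $n_2 = 5$ is the number of loincloths measuring between 4.5 m and 6 m, so $f_2 = \frac{n_2}{n} = \frac{5}{12} \approx 0.42$.
• 25% of the loincloths sold measured between 6 m and 7.5 m.
• 58% of the customers bought loincloths measuring less than 6 m (the FCC of [4.5,6[), and 83% bought loincloths measuring less than 7.5 m.
A class is an interval whose lower bound is closed and whose upper bound is open: [4.5,6[ contains 4.5 but not 6.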
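A sketch of the half-open class rule using Python's `bisect` module. It assumes that the 12 lengths listed in the bivariate example at the end of this section are the measurements behind this table (they do reproduce the counts 2, 5, 3, 2); `class_index` is a hypothetical helper.

```python
from bisect import bisect_right

# Class bounds: [3,4.5[, [4.5,6[, [6,7.5[, [7.5,9[
edges = [3, 4.5, 6, 7.5, 9]

# Assumed data: the 12 lengths (metres) from the course's bivariate example.
lengths = [6.5, 7.8, 5.6, 4.6, 6.2, 5.9, 3.8, 8.2, 7.4, 4.06, 4.5, 5.78]

def class_index(x, edges):
    """Index k such that edges[k] <= x < edges[k+1] (closed left, open right)."""
    if not edges[0] <= x < edges[-1]:
        raise ValueError(f"{x} is outside [{edges[0]}, {edges[-1]}[")
    # bisect_right puts a value equal to a bound in the class that starts at that bound
    return bisect_right(edges, x) - 1

counts = [0] * (len(edges) - 1)
for x in lengths:
    counts[class_index(x, edges)] += 1

n = len(lengths)
for k, c in enumerate(counts):
    print(f"[{edges[k]},{edges[k + 1]}[  {c}  {c / n:.0%}")
```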
HOW DOES IT WORK?
[Figure: polygons of the FCC and FCD for the classes [3,4.5[ to [7.5,9[ (lengths from 3 to 9 m)]
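The vertices of the two polygons can be computed as below, assuming the usual convention: the ascending (FCC) polygon plots each cumulative frequency at the upper bound of its class, the descending (FCD) polygon at the lower bound.

```python
from itertools import accumulate

edges = [3, 4.5, 6, 7.5, 9]   # class bounds in metres
counts = [2, 5, 3, 2]         # class counts from the table (n = 12)
n = sum(counts)

# Ascending polygon: FCC at each upper bound, starting from 0 at the first lower bound.
fcc_points = [(edges[0], 0.0)] + [
    (upper, cum / n) for upper, cum in zip(edges[1:], accumulate(counts))
]
# Descending polygon: FCD at each lower bound, ending at 0 at the last upper bound.
desc_cums = list(accumulate(reversed(counts)))[::-1]
fcd_points = [(lower, cum / n) for lower, cum in zip(edges[:-1], desc_cums)] + [(edges[-1], 0.0)]

print("FCC:", [(x, f"{f:.0%}") for x, f in fcc_points])
print("FCD:", [(x, f"{f:.0%}") for x, f in fcd_points])
```

Because the share below a bound plus the share at or above it is always 100%, the two polygons cross at 50%, which gives a graphical estimate of the median.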
TRY IT RIGHT NOW AND SUCCEED
CUSTOMER NAME | CAR COLOUR | CAR TYPE | YEAR OF ARRIVAL | NUMBER OF CHILDREN | AMOUNT (MILLION)
ABAGA ROLAND | RED | 4×4 | 2015 | 0 | 10
PAMBO BELLO KOWIR | WHITE | CITY | 2015 | 1 | 7.4
MBOUMBA JANVIER | WHITE | CITY | 2016 | 1 | 8.3
ALI BONGO SERAPHIN | GREY | 4×4 | 2015 | 2 | 15.4
NZIGOU HERMANCE | GREY | CITY | 2016 | 3 | 5.6
NGUEMA JULIETTE | BLACK | CITY | 2015 | 0 | 7.4
TRAORE ALI | RED | PICK UP | 2018 | 5 | 18.6
OLUBADE KADER | RED | CITY | 2018 | 4 | 9.4
KONE AWA | BLACK | CITY | 2015 | 1 | 4.5
IBIKUNLE SOUAD | WHITE | 4×4 | 2016 | 0 | 14.3
YABA VIVIEN | WHITE | PICK UP | 2016 | 3 | 20.5
NGUIMBI ARNOLD | BLACK | PICK UP | 2016 | 2 | 18.4
NZIENGUI OUSMAN | RED | 4×4 | 2015 | 5 | 17.4
BOUKA STEPHANE | GREY | PICK UP | 2018 | 4 | 12.5
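As a sketch of a first frequency table on this practice data, the snippet below counts the car-type column (VILLE rendered as CITY) with `collections.Counter`; the list simply copies the column row by row.

```python
from collections import Counter

# Car-type column of the practice table, one entry per customer (14 rows).
car_types = ["4×4", "CITY", "CITY", "4×4", "CITY", "CITY", "PICK UP",
             "CITY", "CITY", "4×4", "PICK UP", "PICK UP", "4×4", "PICK UP"]

counts = Counter(car_types)
n = len(car_types)
# most_common() sorts by count; ties keep first-seen order
for car_type, count in counts.most_common():
    print(f"{car_type:<8} {count:>2} {count / n:.1%}")
```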
MEASURES OF CENTRAL TENDENCY
WE WILL FIRST DISCUSS DESCRIPTIVE MEASURES OF QUANTITATIVE DATA, STARTING WITH THE
MOST IMPORTANT CHARACTERISTIC OF A DATA SET: ITS CENTRAL TENDENCY.
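WHAT ARE THEY FOR?
• THE MODE IS THE VALUE THAT OCCURS MOST OFTEN IN THE DATA. IT IS IMPORTANT
TO NOTE THAT THERE MAY BE MORE THAN ONE MODE.
• EXAMPLE (NUMBER OF LOINCLOTHS SOLD): THE VALUES 2 AND 3 BOTH OCCUR MOST OFTEN (4 TIMES EACH, 33%).
[Figure: bar chart of relative frequencies]
• Mo = {2, 3}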
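Python's standard `statistics` module (3.8+) has `multimode`, which returns every most frequent value, matching the remark that a data set can have several modes. The data list is rebuilt from the frequency table of the number of loincloths sold (value 1 twice, 2 four times, 3 four times, 4 twice), which is our assumption about the example behind Mo = {2, 3}.

```python
from statistics import mean, median, multimode

# Assumed data: rebuilt from the frequency table of the number of loincloths sold (n = 12).
sold = [1] * 2 + [2] * 4 + [3] * 4 + [4] * 2

print("modes:", multimode(sold))   # all values sharing the highest count
print("mean:", mean(sold))
print("median:", median(sold))
```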
MEASURES OF POSITION
DESCRIPTION OF MEASURES OF POSITION
• WHILE MEASURES OF CENTRAL TENDENCY ARE IMPORTANT, THEY DO NOT TELL THE WHOLE
STORY. FOR EXAMPLE, SUPPOSE THE MEAN SCORE ON A STATISTICS EXAM IS 80%. FROM THIS
INFORMATION, CAN WE DETERMINE A RANGE IN WHICH MOST PEOPLE SCORED? THE
ANSWER IS NO. TWO OTHER TYPES OF MEASURES, MEASURES OF POSITION AND
VARIABILITY, HELP PAINT A MORE COMPLETE PICTURE OF WHAT IS GOING ON IN THE
DATA. IN THIS SECTION, WE WILL CONSIDER THE MEASURES OF POSITION AND DISCUSS
MEASURES OF VARIABILITY IN THE NEXT ONE.
• MEASURES OF POSITION GIVE A RANGE WHERE A CERTAIN PERCENTAGE OF THE DATA FALL.
THE MEASURES WE CONSIDER HERE ARE PERCENTILES AND QUARTILES.
LOOK AT IT IN A PICTURE
[Figure: percentiles and quartiles]
DEFINITIONS
• THE P-TH PERCENTILE OF A DATA SET IS A VALUE SUCH THAT, ONCE THE DATA ARE
ORDERED FROM SMALLEST TO LARGEST, AT MOST P% OF THE DATA ARE BELOW THIS
VALUE AND AT MOST (100 - P)% ARE ABOVE IT.
• THE MEDIAN IS THE VALUE THAT SPLITS THE ORDERED DATA IN HALF: AT MOST FIFTY PERCENT OF THE DATA VALUES FALL BELOW IT AND AT MOST FIFTY PERCENT ABOVE IT.
THEREFORE, THE MEDIAN IS THE 50TH PERCENTILE.
• WE CAN FIND ANY PERCENTILE WE WISH. TWO OTHER PERCENTILES ARE PARTICULARLY IMPORTANT:
THE 25TH PERCENTILE, TYPICALLY DENOTED Q1, AND THE 75TH PERCENTILE, TYPICALLY
DENOTED Q3. Q1 IS COMMONLY CALLED THE LOWER QUARTILE AND Q3 THE UPPER QUARTILE.
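QUARTILES
Order the n observations, $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$, and write $E[\cdot]$ for the integer part and $D[\cdot]$ for the decimal part.

THE LOWER QUARTILE
FORMULAS: $m = E\left[\frac{n+1}{4}\right]$ and $d = D\left[\frac{n+1}{4}\right]$
$Q_1 = x_{(m)} + d\,(x_{(m+1)} - x_{(m)})$

THE MEDIAN
FORMULAS: $m = E\left[\frac{n+1}{2}\right]$ and $d = D\left[\frac{n+1}{2}\right]$
$Q_2 = x_{(m)} + d\,(x_{(m+1)} - x_{(m)})$

THE UPPER QUARTILE
FORMULAS: $m = E\left[\frac{3(n+1)}{4}\right]$ and $d = D\left[\frac{3(n+1)}{4}\right]$
$Q_3 = x_{(m)} + d\,(x_{(m+1)} - x_{(m)})$

EXAMPLE (n = 12; ordered data 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4)
$\frac{12+1}{4} = 3.25$, so $m = 3$ and $d = 0.25$: $Q_1 = x_{(3)} + 0.25\,(x_{(4)} - x_{(3)}) = 2$
$\frac{12+1}{2} = 6.5$, so $m = 6$ and $d = 0.5$: $Q_2 = x_{(6)} + 0.5\,(x_{(7)} - x_{(6)}) = 2.5$
$\frac{3(12+1)}{4} = 9.75$, so $m = 9$ and $d = 0.75$: $Q_3 = x_{(9)} + 0.75\,(x_{(10)} - x_{(9)}) = 3$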
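A sketch of these formulas in Python. `quantile_np1` is a hypothetical name; it works for any proportion p (0.25, 0.5, 0.75, or p/100 for the p-th percentile). As a cross-check, `statistics.quantiles` with its default `method='exclusive'` uses the same $(n+1)p$ positions, so both should agree.

```python
import math
import statistics

def quantile_np1(sorted_x, p):
    """Quantile at position (n+1)p: x(m) + d*(x(m+1) - x(m)),
    with m = integer part and d = decimal part of (n+1)p (1-based positions)."""
    n = len(sorted_x)
    pos = (n + 1) * p
    m = math.floor(pos)
    d = pos - m
    if m < 1:             # position before the first observation
        return sorted_x[0]
    if m >= n:            # position at or after the last observation
        return sorted_x[-1]
    # sorted_x[m - 1] is x(m) in the 1-based notation of the formulas
    return sorted_x[m - 1] + d * (sorted_x[m] - sorted_x[m - 1])

sold = sorted([1] * 2 + [2] * 4 + [3] * 4 + [4] * 2)
q1, q2, q3 = (quantile_np1(sold, p) for p in (0.25, 0.5, 0.75))
print(q1, q2, q3)
print(statistics.quantiles(sold, n=4))   # same (n+1)p rule by default
```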
THE 5-NUMBER SUMMARY
• A HELPFUL SUMMARY OF THE DATA IS CALLED THE FIVE-NUMBER SUMMARY. IT
CONSISTS OF FIVE VALUES:
• THE MINIMUM: Min = 1
• THE LOWER QUARTILE, Q1: Q1 = 2
• THE MEDIAN (ALSO KNOWN AS Q2): Q2 = 2.5
• THE UPPER QUARTILE, Q3: Q3 = 3
• THE MAXIMUM: Max = 4
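A short sketch that assembles the five-number summary with the standard library, assuming the quartiles are taken with the same $(n+1)p$ rule as the quartile formulas (`statistics.quantiles`, default method); `five_number_summary` is a hypothetical helper.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max); quartiles by the (n+1)p rule."""
    q1, q2, q3 = statistics.quantiles(data, n=4)   # sorts a copy internally
    return min(data), q1, q2, q3, max(data)

sold = [1] * 2 + [2] * 4 + [3] * 4 + [4] * 2
print(five_number_summary(sold))
```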
MEASURES OF VARIABILITY (DISPERSION)
OVERVIEW
• TO INTRODUCE THE IDEA OF VARIABILITY, CONSIDER THIS EXAMPLE. TWO VENDING
MACHINES, A AND B, DROP CANDIES WHEN A QUARTER IS INSERTED. THE NUMBER OF PIECES OF
CANDY ONE GETS IS RANDOM. THE FOLLOWING DATA ARE RECORDED FOR SIX TRIALS AT EACH
VENDING MACHINE:
• PIECES OF CANDY FROM VENDING MACHINE A:
• 1, 2, 3, 3, 5, 4
• MEAN = 3, MEDIAN = 3, MODE = 3
• PIECES OF CANDY FROM VENDING MACHINE B:
• 2, 3, 3, 3, 3, 4
• MEAN = 3, MEDIAN = 3, MODE = 3
• BOTH MACHINES HAVE THE SAME MEAN, MEDIAN AND MODE, YET MACHINE A'S RESULTS ARE MORE SPREAD OUT (FROM 1 TO 5 PIECES, VERSUS 2 TO 4 FOR B). MEASURES OF CENTRAL TENDENCY ALONE CANNOT SHOW THIS DIFFERENCE.
There are many ways to describe variability or spread, including:
Range
Interquartile range (IQR)
Variance and Standard Deviation
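To make the vending-machine comparison concrete, the sketch below computes these three kinds of spread for both machines with Python's `statistics` module (quartiles by its default $(n+1)p$ method, variance and standard deviation with divisor $n - 1$). `spread` is a hypothetical helper.

```python
import statistics

machine_a = [1, 2, 3, 3, 5, 4]
machine_b = [2, 3, 3, 3, 3, 4]

def spread(data):
    """Range, interquartile range, sample variance and sample standard deviation."""
    q1, _, q3 = statistics.quantiles(data, n=4)    # (n+1)p quartiles
    return {
        "range": max(data) - min(data),
        "IQR": q3 - q1,
        "variance": statistics.variance(data),     # divisor n - 1
        "std dev": statistics.stdev(data),
    }

for name, data in (("A", machine_a), ("B", machine_b)):
    print(name, "mean =", statistics.mean(data), spread(data))
```

Machine A comes out more spread out on every measure (range 4 vs 2, IQR 2.5 vs 0.5, sample variance 2 vs 0.4) even though both means equal 3.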
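MEASURES OF VARIABILITY
RANGE: $R = \max - \min = 4 - 1 = 3$
SAMPLE VARIANCE: $s^2 = \frac{1}{n-1}\sum_{i=1}^{p} n_i (x_i - \bar{x})^2$, where $p$ is the number of distinct values $x_i$ and $n_i$ is the count of $x_i$
STANDARD DEVIATION: $s = \sqrt{s^2}$ (the square root of the variance)
COEFFICIENT OF VARIATION: $CV = \frac{\sigma}{\mu} = \frac{\text{standard deviation}}{\text{mean}}$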
EXAMPLE
number of loincloths sold | count | relative frequency | ascending cumulative frequency | descending cumulative frequency
1 | 2 | 17% | 17% | 100%
2 | 4 | 33% | 50% | 83%
3 | 4 | 33% | 83% | 50%
4 | 2 | 17% | 100% | 17%
Grand total | 12 | 100% | |

• $s^2 = \frac{1}{12-1}\sum_{i=1}^{4} n_i (x_i - \bar{x})^2 = \frac{1}{11}\left(2(1-2.5)^2 + 4(2-2.5)^2 + 4(3-2.5)^2 + 2(4-2.5)^2\right) = \frac{4.5 + 1 + 1 + 4.5}{11} = 1$
• $s = \sqrt{s^2} = \sqrt{1} = 1$
• $CV = \frac{s}{\bar{x}} = \frac{1}{2.5} = 0.40$
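The same grouped-data computation in Python, as a sketch: each squared deviation is weighted by its count $n_i$ and the sum is divided by $n - 1$, as in the sample-variance formula. The last lines cross-check against `statistics.variance` on the expanded list of 12 observations.

```python
import math
import statistics

# Frequency table of the number of loincloths sold: value x_i with count n_i
values = [1, 2, 3, 4]
counts = [2, 4, 4, 2]
n = sum(counts)

mean = sum(c * x for x, c in zip(values, counts)) / n
# Sample variance: weight each squared deviation by n_i, divide by n - 1
s2 = sum(c * (x - mean) ** 2 for x, c in zip(values, counts)) / (n - 1)
s = math.sqrt(s2)
cv = s / mean
print(f"mean = {mean}, s^2 = {s2}, s = {s}, CV = {cv:.2f}")

# Cross-check on the raw observations rebuilt from the table
expanded = [x for x, c in zip(values, counts) for _ in range(c)]
print("statistics.variance:", statistics.variance(expanded))
```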
TRY IT AND SUCCEED
• GROUP 1: SUMMARIZE THE MATHS AVERAGE VARIABLE FROM THE STUDENT DATA TABLE.
• GROUP 2: SUMMARIZE THE ENGLISH AVERAGE VARIABLE FROM THE STUDENT DATA TABLE.
• GROUP 3: SUMMARIZE THE NUMBER OF CHILDREN VARIABLE FROM THE CAR SALES DATA TABLE.
• GROUP 4: SUMMARIZE THE CAR PRICE VARIABLE FROM THE CAR SALES DATA TABLE.
BIVARIATE DESCRIPTIVE STATISTICS
INTRODUCTION
HERE WE WANT TO EVALUATE AN ASSOCIATION OR RELATIONSHIP BETWEEN TWO VARIABLES:
ONE IS THE RESPONSE VARIABLE AND THE OTHER IS THE EXPLANATORY VARIABLE.
TYPES OF VARIABLES-1
X = dominant colour | | | | Total
red | 1 | 0 | 0 | 1
green | 0 | 1 | 1 | 2
Grand total | 4 | 4 | 4 | 12
HOW DOES IT WORK?
Joint distribution
• IN THIS LESSON, WE WILL FIRST INTRODUCE THE SIMPLE LINEAR REGRESSION MODEL AND THE
CORRELATION COEFFICIENT. INFERENCES FOR THE SIMPLE LINEAR REGRESSION MODEL WILL BE
DISCUSSED, AND THE CRITICAL DISTINCTION BETWEEN INFERENCE FOR MEAN RESPONSE AND
INFERENCE FOR THE OUTCOME WILL BE CLARIFIED.
• REGRESSION ANALYSIS IS A TOOL TO INVESTIGATE HOW TWO OR MORE VARIABLES ARE
RELATED.
• FOR EXAMPLE, ONE MAY WISH TO USE A PERSON'S HEIGHT, GENDER, RACE, ETC. TO PREDICT A
PERSON'S WEIGHT. LET US FIRST CONSIDER THE SIMPLEST CASE: USING A PERSON'S HEIGHT
TO PREDICT THE PERSON'S WEIGHT.
BUILDING THE MODEL
THE RESPONSE VARIABLE, DENOTED Y, IS THE VARIABLE OF INTEREST, ALSO CALLED THE DEPENDENT
VARIABLE. IN OUR EXAMPLE, Y = LENGTH OF THE LOINCLOTHS (IN METRES).
THE EXPLANATORY VARIABLE, DENOTED X, IS ALSO CALLED THE PREDICTOR VARIABLE OR
INDEPENDENT VARIABLE. IN OUR EXAMPLE, X = NUMBER OF LOINCLOTHS SOLD.
IN STATISTICS, WE CAN DESCRIBE HOW VARIABLES ARE RELATED USING A MATHEMATICAL
FUNCTION. THE FUNCTION, ALONG WITH OTHER ASSUMPTIONS, IS CALLED A MODEL. THERE
ARE MANY MODELS WE CAN CONSIDER. IN THIS CLASS, WE WILL FOCUS ON LINEAR MODELS,
PARTICULARLY WHEN THERE IS ONLY ONE PREDICTOR VARIABLE. WE REFER TO THIS MODEL AS
THE SIMPLE LINEAR REGRESSION MODEL: $Y = \beta_0 + \beta_1 X + \varepsilon$.
STEPS IN A SIMPLE LINEAR REGRESSION ANALYSIS
1. USE PLOTS AND SUMMARY STATISTICS TO DESCRIBE THE RELATIONSHIP BETWEEN THE
RESPONSE VARIABLE AND THE PREDICTOR VARIABLE.
2. PERFORM A HYPOTHESIS TEST FOR THE POPULATION CORRELATION.
3. FIND THE REGRESSION EQUATION AND INTERPRET THE RESULTS.
4. APPLY THE REGRESSION MODEL AND KNOW THE LIMITATIONS.
5. FIND AN INTERVAL ESTIMATE FOR THE POPULATION SLOPE AND INTERPRET THE INTERVAL.
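LINEAR RELATIONSHIPS
• THE CORRELATION COEFFICIENT: $r = \dfrac{\frac{1}{n}\sum_{i=1}^{p} n_i x_i y_i - \bar{x}\,\bar{y}}{\sigma_X \times \sigma_Y}$, WHERE $\sigma_X$ AND $\sigma_Y$ ARE THE STANDARD DEVIATIONS OF X AND Y COMPUTED WITH DIVISOR n.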
PROPERTIES OF THE CORRELATION COEFFICIENT, 𝒓
• −1 ≤ 𝑟 ≤ 1, I.E. 𝑟 TAKES VALUES BETWEEN -1 AND +1, INCLUSIVE.
• THE SIGN OF THE CORRELATION PROVIDES THE DIRECTION OF THE LINEAR RELATIONSHIP. THE SIGN INDICATES
WHETHER THE TWO VARIABLES ARE POSITIVELY OR NEGATIVELY RELATED.
• A CORRELATION OF 0 MEANS THERE IS NO LINEAR RELATIONSHIP.
• THERE ARE NO UNITS ATTACHED TO 𝑟.
• THE CLOSER THE MAGNITUDE OF 𝑟 IS TO 1, THE STRONGER THE LINEAR RELATIONSHIP.
• THE CLOSER THE MAGNITUDE OF 𝑟 IS TO 0, THE WEAKER THE LINEAR RELATIONSHIP.
• IF WE FIT THE SIMPLE LINEAR REGRESSION MODEL BETWEEN Y AND X, THEN 𝑟 HAS THE SAME SIGN AS 𝛽1 , WHICH IS
THE COEFFICIENT OF X IN THE LINEAR REGRESSION EQUATION. -- MORE ON THIS LATER.
• THE CORRELATION VALUE WOULD BE THE SAME REGARDLESS OF WHICH VARIABLE WE DEFINED AS X AND Y.
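EXAMPLE
X = number of loincloths sold | Y = length (m)
4 | 6.5
3 | 7.8
3 | 5.6
2 | 4.6
1 | 6.2
3 | 5.9
2 | 3.8
4 | 8.2
2 | 7.4
3 | 4.06
1 | 4.5
2 | 5.78

$\bar{x} = 2.5$, $\bar{y} \approx 5.86$, $\sigma_X \approx 0.96$, $\sigma_Y \approx 1.39$ (standard deviations with divisor n, to match the $\frac{1}{n}$ in the formula; with divisor n − 1 they would be 1 and 1.45)

$r = \dfrac{\frac{4 \times 6.5 + 3 \times 7.8 + 3 \times 5.6 + \cdots + 2 \times 5.78}{12} - 2.5 \times 5.86}{0.96 \times 1.39} = \dfrac{\frac{182.74}{12} - 14.65}{1.33} \approx 0.43$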
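The example can be reproduced step by step with plain Python, as a sketch of the correlation formula: the covariance term uses the divisor n, so the standard deviations are computed with divisor n too.

```python
import math

# Example data: X = number of loincloths sold, Y = length (m)
x = [4, 3, 3, 2, 1, 3, 2, 4, 2, 3, 1, 2]
y = [6.5, 7.8, 5.6, 4.6, 6.2, 5.9, 3.8, 8.2, 7.4, 4.06, 4.5, 5.78]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
# Standard deviations with divisor n, matching the 1/n in the covariance term
sigma_x = math.sqrt(sum((v - x_bar) ** 2 for v in x) / n)
sigma_y = math.sqrt(sum((v - y_bar) ** 2 for v in y) / n)

cov = sum(a * b for a, b in zip(x, y)) / n - x_bar * y_bar
r = cov / (sigma_x * sigma_y)
print(f"x_bar = {x_bar}, y_bar = {y_bar:.2f}, sigma_X = {sigma_x:.2f}, "
      f"sigma_Y = {sigma_y:.2f}, r = {r:.2f}")
```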