Machine Learning (Apprentissage Automatique)
IGE 42 - 5th year (2021-2022)
Specialty: Telecommunications and Digital Technologies
Dr. FEZZA S.

Objectives
• Two main goals of this course
 learn the theoretical foundations
 know how to put them into practice

• Grasp the basics of classification and machine learning methods.

• Be able to distinguish the different learning methods, as well as their application contexts.

• Put basic learning methods into practice.

• Be able to model a real, complex classification problem with an appropriate statistical learning model.

Related courses
• Links with other courses
• Algebra
• Probability and statistics
• Data analysis
• Processing and analysis of massive data

Lab assignments (TP)

• Homework
• Code (with comments)
• Report (5 pages max)
• One per chapter
• In pairs
• fezza.student@gmail.com

Resources

Outline
Introduction
Regression
Unsupervised learning
Support vector machines
Decision trees
Bayesian learning
Artificial neural networks
Hidden Markov models
Reinforcement learning


Objectives
• Define AI, ML, DL

• Why ML is useful

• Characterizing machine learning techniques

• Performance criteria

• Identify whether or not a problem falls under ML.

• Give concrete examples of the main classes of ML problems.

ML Applications

Artificial intelligence, machine learning, and deep learning

Some quotes

History of ML

Why use ML?

• Two possible situations:
1. We know the computation needed to solve our problem:
In that case, it is easy! We enter this computation into the computer (this is what we call programming) and the computer gives us the result.
Example: determining the structure of a bridge

2. We do not know the computation that solves our problem:
We are stuck. It is impossible to give a computer a computation that we do not know.
Examples: recognizing a face in a photo, predicting the stock market, curing cancer, driving a car

• ML was invented precisely to unblock situation 2 (when we do not know the computation), using a bold idea  let the machine learn from experience.

Why use ML?

• ML can be used to solve problems
• that we do not know how to solve,
• that we know how to solve, but cannot formalize in algorithmic terms (for example, image recognition or natural language understanding),
• that we know how to solve, but only with procedures that are far too demanding in computing resources (for example, predicting interactions between large molecules, for which simulations are very heavy).

• ML is therefore used when data is (relatively) abundant, but knowledge is scarce or hard to formalize.

Definition of ML
• Machine Learning (apprentissage automatique) is concerned with the design, analysis, implementation, and application of computer programs that are able to improve over time, either from their own experience or from past data provided by other programs.

• A more engineering-oriented definition:

A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
—Tom Mitchell, 1997



Definition of ML

[Figure: an image classifier (ML model) takes an image as input and outputs a label such as "Dog" or "Cat"; its performance is measured by accuracy.]

Definition of ML

• How can we learn to perform image classification?

Task: image classification
Experience: data
Performance measure: accuracy

Example 1: Handwritten digit classification

• How can we build an artificial intelligence?
 Example: recognizing handwritten characters

• By enumerating rules?
 If the intensity of the pixel at position (15,24) is greater than 50, and the pixel at position ... then it is a "3"

• Far too tedious, and it is hard to cover every special case.

Example 1: Handwritten digit classification

• How can we build an artificial intelligence?
 Example: recognizing handwritten characters

• By giving the computer the ability to learn to do it!

• Let the computer make attempts and learn from its mistakes

 Apprentissage Automatique / Machine Learning

Example 1: Handwritten digit classification

• E: Experience

MNIST dataset

Example 1: Handwritten digit classification

• P: Performance measure
Classification:

... and the algorithm returns a "program" able to generalize to new data

training data vs. generalization
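To make the T/E/P decomposition concrete, here is a minimal Python sketch (an illustration added here, not code from the slides) that trains a classifier on scikit-learn's small built-in digits dataset, used as a stand-in for MNIST, and reports accuracy on held-out data:

```python
# T = classify handwritten digits, E = a labeled digit dataset, P = accuracy.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)           # E: 8x8 digit images and their labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)      # hold out data to measure generalization

clf = LogisticRegression(max_iter=5000)       # one possible model among many
clf.fit(X_train, y_train)                     # learn from experience E

y_pred = clf.predict(X_test)
print("P (accuracy on unseen data):", accuracy_score(y_test, y_pred))
```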

Example 2: Spam emails

ML: a new programming paradigm

• Could a computer surprise us? Rather than programmers crafting data-processing rules by hand, could a computer automatically learn these rules by looking at data?

• A machine-learning system is trained rather than explicitly programmed. It is presented with many examples relevant to a task, and it finds statistical structure in these examples that eventually allows the system to come up with rules (a model) for automating the task.

• Machine Learning is the science of programming computers so they can learn from data.

Machine Learning vs. Traditional Programming

Ingredients of machine learning

• Machine learning rests on two fundamental pillars:
• The data, i.e. the examples from which the algorithm will learn.
• The learning algorithm, i.e. the procedure that is run on these data to produce a model. Running a learning algorithm on a dataset (the training set) is called training.

• These two pillars are equally important. On the one hand, no learning algorithm can build a good model from irrelevant data; this is the "garbage in, garbage out" principle: a learning algorithm fed poor-quality data can only produce poor-quality predictions. On the other hand, a model trained with an unsuitable algorithm on relevant data will not be of good quality either.

• The work of a machine learner or data scientist is largely engineering work: preparing the data to remove outliers, handle missing values, choose a relevant representation, etc.

Notations
• Training set, input, target

 the algorithm is given training data ...

 this set of examples is called the training set

m = number of training examples
x(i) = i-th "input" variable / features
y(i) = i-th "output" variable / "target" variable
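Spelled out (a standard formalization consistent with the notation above, not copied from the slide image), the training set is:

```latex
% Training set of m labeled examples (supervised setting)
\mathcal{D} = \left\{ \left(x^{(i)},\, y^{(i)}\right) \right\}_{i=1}^{m},
\qquad x^{(i)} \in \mathbb{R}^{n}\ \text{(features)},
\qquad y^{(i)}\ \text{(target: a class label or a real value)}
```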

How can a machine learn?
• But how can it learn?
To give a computer the ability to learn, we use learning methods that are strongly inspired by the way we, human beings, learn to do things. Among these methods are:

 Supervised learning
 Unsupervised learning
 Reinforcement learning

Supervised Learning
• The idea of supervised learning is that the learning system is given inputs and told which specific outputs should be associated with them. We divide up supervised learning based on whether the outputs are drawn from a small finite set (classification) or a large finite or continuous set (regression).

 Supervised learning
 Unsupervised learning
 Reinforcement learning

there is a target to predict
(target value, "right answers" given)

Supervised Learning
• Supervised learning is when there is a target to predict

 classification: the target is a class index
o example: character recognition
x: vector of the intensities of all the pixels in the image
y: identity of the character

 regression: the target is a real number
o example: predicting the value of a share on the stock market
x: vector containing information about the day's economic activity
y: value of the share on the next day

Supervised Learning

[Figure: a regression example (predicting house price from size) and a classification example (breast cancer: malignant vs. benign).]

Supervised Learning

• But can't we just do all this in Excel?

• At this point, you might think that anyone can compute the price of an apartment from its living area in Excel (there is even a Regression function in Excel).

• The strength of Machine Learning is that it is very easy to build very complex models that can analyze thousands of features (x) that a human being could not possibly take into account for such a computation (and neither could Excel).

Supervised Learning
• For example, to predict the price of an apartment (y), an ML model can take into account:
• its area (x(1))
• its location (x(2))
• its quality (x(3))
• its proximity to a school (x(4))
• etc.

• Likewise, to predict whether an email is spam (y), ML can analyze:
• the number of links (x(1))
• the number of spelling mistakes (x(2))
• the presence of prices (x(3))
• etc.

• The more features are available, the more information the model has to make 'intelligent' decisions; this is artificial intelligence.

Unsupervised Learning
• Unsupervised learning doesn't involve learning a function from inputs to outputs based on a set of input-output pairs. Instead, one is given a data set and generally expected to find some patterns or structure inherent in it.

 Supervised learning
 Unsupervised learning
 Reinforcement learning

no target is provided

Supervised vs. Unsupervised

[Figure: labeled data (supervised learning) vs. unlabeled data (unsupervised learning).]

Semi-supervised Learning
• Since labeling data is usually time-consuming and costly, you will often have plenty of unlabeled instances, and few labeled instances. Some algorithms can deal with data that's partially labeled.

Semi-supervised learning with two classes (triangles and squares): the unlabeled examples (circles) help classify a new instance (the cross) into the triangle class rather than the square class, even though it is closer to the labeled squares

Semi-supervised Learning use-case

• Train a supervised model on the labeled data, predict labels for the unlabeled data, use these predicted outputs (pseudo-labels) to retrain the supervised model, and test it on other unlabeled data.
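As an illustration of this use case (a sketch added here, not code from the course), a minimal pseudo-labeling loop could look like this, assuming arrays X_labeled, y_labeled and X_unlabeled have been prepared elsewhere:

```python
# Minimal self-training / pseudo-labeling sketch (illustrative assumptions:
# X_labeled, y_labeled, X_unlabeled are NumPy arrays prepared beforehand).
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_training(X_labeled, y_labeled, X_unlabeled, confidence=0.9):
    # 1) Train a first supervised model on the few labeled examples.
    model = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)

    # 2) Predict on the unlabeled pool and keep only confident predictions.
    proba = model.predict_proba(X_unlabeled)
    confident = proba.max(axis=1) >= confidence
    pseudo_idx = proba.argmax(axis=1)[confident]

    # 3) Retrain on labeled data + confidently pseudo-labeled data.
    X_aug = np.vstack([X_labeled, X_unlabeled[confident]])
    y_aug = np.concatenate([y_labeled, model.classes_[pseudo_idx]])
    return LogisticRegression(max_iter=1000).fit(X_aug, y_aug)
```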

Quick note

Reinforcement Learning
• Reinforcement Learning is a very different beast. The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards). It must then learn by itself what is the best strategy, called a policy, to get the most reward over time. A policy defines what action the agent should choose when it is in a given situation.

 Supervised learning
 Unsupervised learning
 Reinforcement learning

Types of ML
Types of ML problems
Yann LeCun: "Most of human and animal learning is unsupervised learning. If intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake."

Instance-Based vs. Model-Based Learning

• One more way to categorize Machine Learning systems is by how they generalize.

• Most ML tasks are about making predictions. This means that given a number of training examples, the system needs to be able to make good predictions for (generalize to) examples it has never seen before. Having a good performance measure on the training data is good, but insufficient; the true goal is to perform well on new instances.

• There are two main approaches to generalization: instance-based learning and model-based learning.

Instance-based learning
• Possibly the most trivial form of learning is simply to learn by heart. If you were to create a spam
filter this way, it would just flag all emails that are identical to emails that have already been flagged
by users—not the worst solution, but certainly not the best.

• Instead of just flagging emails that are identical to known spam emails, your spam filter could be
programmed to also flag emails that are very similar to known spam emails. This requires a
measure of similarity between two emails. A (very basic) similarity measure between two emails
could be to count the number of words they have in common. The system would flag an email as
spam if it has many words in common with a known spam email. This is called instance‐based
learning: the system learns the examples by heart, then generalizes to new cases by using a
similarity measure to compare them to the learned examples (or a subset of them).
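As a toy illustration of such a similarity measure (a sketch added here, not from the slides), counting the words two emails have in common could look like this:

```python
# Toy instance-based spam check: similarity = number of distinct shared words.
def similarity(email_a: str, email_b: str) -> int:
    return len(set(email_a.lower().split()) & set(email_b.lower().split()))

known_spam = ["win a free prize now", "cheap pills free shipping"]
new_email = "claim your free prize today"

# Flag the new email if it shares many words with any known spam email.
score = max(similarity(new_email, spam) for spam in known_spam)
print("shared words with closest known spam:", score)  # 2 ("free", "prize")
```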

Model-based learning

• Another way to generalize from a set of examples is to build a model of these examples
and then use that model to make predictions. This is called model‐based learning.

• “Fit” a model to the training data



Model-based learning
• For example, suppose you want to know if money makes people happy, so you
download the Better Life Index data from the OECD’s website and stats about gross
domestic product (GDP) per capita from the IMF’s website. Then you join the tables and
sort by GDP per capita.

Does money make people happier?

Model-based learning
• There does seem to be a trend here! Although the data is noisy (i.e., partly random), it
looks like life satisfaction goes up more or less linearly as the country’s GDP per capita
increases. So you decide to model life satisfaction as a linear function of GDP per capita.
This step is called model selection: you selected a linear model of life satisfaction with
just one attribute, GDP per capita

Model-based learning
• A simple linear model
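The model itself is shown as an image on the slide; judging from the worked prediction a few slides later (intercept ≈ 4.85, slope ≈ 4.91 × 10⁻⁵), it is presumably the linear model:

```latex
\text{life\_satisfaction} = \theta_0 + \theta_1 \times \text{GDP\_per\_capita}
```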

Loss minimization (cost, error)
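The figures for this slide are not reproduced here. As a reminder of what is typically minimized (standard material, not transcribed from the deck), a common choice for regression is the mean squared error over the m training examples:

```latex
J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right)^{2},
\qquad \hat{\theta} = \arg\min_{\theta} J(\theta)
```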

Model-based learning
• Model selection consists in choosing the type of model and fully specifying its architecture. Training a model means running an algorithm to find the model parameters that will make it best fit the training data (and hopefully make good predictions on new data).

Model-based learning
• You are finally ready to run the model to make predictions. For example, say you want to know how happy Cypriots are, and the OECD data does not have the answer. Fortunately, you can use your model to make a good prediction: you look up Cyprus's GDP per capita, find $22,587, and then apply your model and find that life satisfaction is likely to be somewhere around 4.85 + 22,587 × 4.91 × 10⁻⁵ ≈ 5.96.
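A small sketch of this workflow (illustrative only; the training points below are made-up placeholders, so the learned coefficients will not match the 4.85 and 4.91 × 10⁻⁵ quoted above, only the Cyprus GDP figure is taken from the slide):

```python
# Fit "life_satisfaction = theta0 + theta1 * GDP_per_capita" and predict for Cyprus.
import numpy as np
from sklearn.linear_model import LinearRegression

gdp = np.array([[9_000], [27_000], [37_000], [50_000], [55_000]])  # GDP per capita (USD), placeholders
life_sat = np.array([5.0, 6.0, 6.5, 7.0, 7.2])                     # life satisfaction, placeholders

model = LinearRegression().fit(gdp, life_sat)    # model selection + training
print("theta0:", model.intercept_, "theta1:", model.coef_[0])

cyprus_gdp = [[22_587]]                          # value quoted on the slide
print("Predicted life satisfaction:", model.predict(cyprus_gdp)[0])
```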

Main Challenges of ML

• Your main task is to select a learning algorithm and train it on some data, so the two things that can go wrong are:
1. bad data
2. bad algorithm

"It's not who has the best algorithm that wins, it's who has the most data."

MLOps: from model-centric to data-centric AI

Main Challenges of ML
• bad data
• Insufficient Quantity of Training Data
• For a toddler to learn what an apple is, all it takes is for you to point to an apple
and say “apple” (possibly repeating this procedure a few times). Now the child is
able to recognize apples in all sorts of colors and shapes. Genius.

• Machine Learning is not quite there yet; it takes a lot of data for most Machine
Learning algorithms to work properly. Even for very simple problems you typically
need thousands of examples, and for complex problems such as image or speech
recognition you may need millions of examples (unless you can reuse parts of an
existing model).

14 million images

[Figure: the importance of data versus algorithms.]

Main Challenges of ML
• bad data
• Nonrepresentative Training Data
• In order to generalize well, it is crucial that your training data be representative of the new cases you want to generalize to.

Main Challenges of ML
• bad data
• Poor‐Quality Data
• Obviously, if your training data is full of errors, outliers, and noise (e.g., due to
poor quality measurements), it will make it harder for the system to detect the
underlying patterns, so your system is less likely to perform well. It is often well
worth the effort to spend time cleaning up your training data. The truth is, most
data scientists spend a significant part of their time doing just that. The following
are a couple of examples of when you’d want to clean up training data:
• If some instances are clearly outliers, it may help to simply discard them or
try to fix the errors manually.
• If some instances are missing a few features (e.g., 5% of your customers did
not specify their age), you must decide whether you want to ignore this
attribute altogether, ignore these instances, fill in the missing values (e.g.,
with the median age), or train one model with the feature and one model
without it.

Main Challenges of ML
• bad data
• Irrelevant Features (garbage in, garbage out)
• Your system will only be capable of learning if the training data contains enough
relevant features and not too many irrelevant ones. A critical part of the success
of a Machine Learning project is coming up with a good set of features to train
on. This process, called feature engineering, involves the following steps:

• Feature selection: selecting the most useful features to train on among existing
features
• Feature extraction: combining existing features to produce a more useful one (dimensionality reduction algorithms can help)
• Creating new features by gathering new data

Main Challenges of ML
• bad algorithm
• Overfitting the Training Data
• It means that the model performs well on the training data, but it does not generalize well.

Main Challenges of ML
• bad algorithm
• Overfitting the Training Data

• Generalization, not memorization
• Our goal is to build a system that can deal with new data.

Main Challenges of ML

[Figure: the available data (experience) is split into training data and test data.]

Main Challenges of ML
• bad algorithm
• Underfitting the Training Data
• It occurs when your model is too simple to learn the underlying structure of the data.

Main Challenges of ML

[Figures: fits of the same data with polynomial models of degree n = 0, 1, 3, and 9. Too low a degree underfits; too high a degree overfits.]

Generalization vs. amount of data

• The more training data there is, the better the trained model will generalize.

[Figure: the same model trained on a small vs. a large training set.]

Model capacity and performance

• Capacity of a model
• the ability of a model to learn "by heart"
• Example: the larger n (the polynomial degree), the more capacity the model has.
• The larger the capacity, the larger the gap between the training error and the test error.
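To see capacity in action, here is a small illustrative sketch (added here, with synthetic data) comparing polynomial fits of increasing degree: the high-capacity model drives the training error down while the test error goes up.

```python
# Illustrative capacity experiment on synthetic data: polynomial degree 1 vs. 3 vs. 9.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=30)   # noisy sine curve
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for degree in (1, 3, 9):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print("degree", degree,
          "train MSE %.3f" % mean_squared_error(y_tr, model.predict(X_tr)),
          "test MSE %.3f" % mean_squared_error(y_te, model.predict(X_te)))
```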

Regularization

[Figure: effect of regularization on the fitted model (two plots).]

Hyperparameter Tuning and Model Selection


Hyperparameter Tuning and Model Selection

• Solution 1: split your data into train/test sets

80% training / 20% test
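A minimal sketch of this split (illustrative; any feature matrix X and target vector y could be used in place of the digits dataset):

```python
# 80/20 train/test split; the test set is set aside and used only once, at the end.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
print(len(X_train), "training examples,", len(X_test), "test examples")
```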


Train/test error and underfitting/overfitting

Split your data

80% training / 20% test

The test set is only used once!

Split your data

60% training / 20% validation / 20% test

If the validation set is too small, then model evaluations will be imprecise: you may end up selecting a suboptimal model by mistake.

Cross Validation (k-folds)


• k-folds cross-validation, example k = 5
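A sketch of 5-fold cross-validation with scikit-learn (illustrative; any estimator and dataset could be substituted):

```python
# 5-fold cross-validation: train on 4 folds, validate on the remaining fold, 5 times.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_digits(return_X_y=True)
scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=5)
print("fold accuracies:", scores)
print("mean accuracy:  ", scores.mean())
```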

How to detect underfitting and overfitting?

Hyperparameter Tuning and Model Selection
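As an illustration of tuning hyperparameters with cross-validation on the training set only (a sketch, not the course's prescribed workflow), one common tool is scikit-learn's GridSearchCV:

```python
# Pick the regularization strength C of a linear SVM by 5-fold CV on the training set,
# then evaluate the selected model once on the held-out test set.
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

search = GridSearchCV(SVC(kernel="linear"), {"C": [0.01, 0.1, 1, 10]}, cv=5)
search.fit(X_train, y_train)
print("best C:", search.best_params_, "CV accuracy:", search.best_score_)
print("test accuracy:", search.score(X_test, y_test))
```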

Performance Measures
• Classification
• We want to assess a given classifier that learns how to predict y from x.

• Classification accuracy is the number of correct predictions made divided by the total number of predictions made, multiplied by 100 to turn it into a percentage.

• Confusion matrix – The confusion matrix is used to have a more complete picture when assessing the performance of a model.
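A short sketch of both measures (the labels and predictions below are illustrative):

```python
# Accuracy and confusion matrix on a toy set of true labels vs. predictions.
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

print("accuracy: %.0f%%" % (100 * accuracy_score(y_true, y_pred)))  # 6/8 = 75%
print(confusion_matrix(y_true, y_pred))  # rows = true class, columns = predicted class
```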

Performance Measures
• Classification
• The following metrics are commonly used to assess the performance of classification models:
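The metric formulas on these slides are in images; for reference (standard definitions, not transcribed from the deck), with TP, FP, FN, TN taken from the confusion matrix:

```latex
\text{Precision} = \frac{TP}{TP + FP},\qquad
\text{Recall (TPR)} = \frac{TP}{TP + FN},\qquad
F_1 = \frac{2\,\text{Precision}\cdot\text{Recall}}{\text{Precision} + \text{Recall}},\qquad
\text{FPR} = \frac{FP}{FP + TN}
```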

Performance Measures
• Classification

• ROC – The receiver operating characteristic curve, also noted ROC, is the plot of TPR versus FPR obtained by varying the decision threshold.

Performance Measures
• Classification

• AUC – The area under the receiver operating characteristic curve, also noted AUC or AUROC, is the area below the ROC.

Performance Measures
• Classification

• AUC – The area under the receiver operating characteristic curve, also noted AUC or AUROC, is the area below the ROC.

Label       |  +   |  -   |  +   |  +   |  -   |  -
Score       | 0.9  | 0.8  | 0.6  | 0.4  | 0.3  | 0.1

Threshold   | >0.9 | 0.8-0.9 | 0.6-0.8 | 0.4-0.6 | 0.3-0.4 | 0.1-0.3 | <0.1
TPR = TP/P  |  0   |  1/3    |  1/3    |  2/3    |  1      |  1      |  1
FPR = FP/N  |  0   |  0      |  1/3    |  1/3    |  1/3    |  2/3    |  1
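The same numbers can be checked in code (a sketch reproducing the table above; the AUC of this toy example works out to 7/9 ≈ 0.78):

```python
# Reproduce the TPR/FPR table above and compute the AUC for the 6 scored examples.
from sklearn.metrics import roc_auc_score, roc_curve

y_true  = [1, 0, 1, 1, 0, 0]                 # labels  + - + + - -
y_score = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]     # classifier scores

fpr, tpr, thr = roc_curve(y_true, y_score, drop_intermediate=False)
print("TPR:", tpr)   # [0, 1/3, 1/3, 2/3, 1, 1, 1]  -> matches the TP/P row
print("FPR:", fpr)   # [0, 0, 1/3, 1/3, 1/3, 2/3, 1] -> matches the FP/N row
print("AUC:", roc_auc_score(y_true, y_score))   # 7/9 ≈ 0.78
```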


Performance Measures
• Regression

• Basic metrics – Given a regression model f, the following metrics are commonly used to assess the performance of the model:
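The metric definitions themselves are in the slide image; for reference (standard formulas, not copied from the deck), with predictions f(x⁽ⁱ⁾) and targets y⁽ⁱ⁾:

```latex
\text{MSE} = \frac{1}{m}\sum_{i=1}^{m}\bigl(f(x^{(i)}) - y^{(i)}\bigr)^{2},\qquad
\text{RMSE} = \sqrt{\text{MSE}},\qquad
\text{MAE} = \frac{1}{m}\sum_{i=1}^{m}\bigl|f(x^{(i)}) - y^{(i)}\bigr|,\qquad
R^{2} = 1 - \frac{\sum_i \bigl(y^{(i)} - f(x^{(i)})\bigr)^{2}}{\sum_i \bigl(y^{(i)} - \bar{y}\bigr)^{2}}
```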
