Quantitative Methods in Management, UNH 2024

Université Nouveaux Horizons
Faculty of Economics and Management

Academic year 2024-2025
Third-year Bachelor's class
Course: Quantitative Methods in Management

LEARNING TO MEASURE LATENT CONCEPTS
Prof. Eddy BALEMBA Kanyurhi

Laboratoire d’Economie Appliquée au Développement
Faculty of Economics and Management
Université Catholique de Bukavu
Education, publications and affiliations
Postdoctoral studies, Centre Européen de Recherche en Microfinance (CERMI), Université de Mons (2017-2018)
PhD in Economics and Management, Warocqué School of Business and Economics, Université de Mons (2015)
Doctoral research certificate, Université Libre de Bruxelles (2012)
Advanced Master in Microfinance, Solvay Brussels School of Economics and Management, Université Libre de Bruxelles (2009)
Licence (degree) in Economics and Management, Université Catholique de Bukavu (2001)
Balemba EBK, Bugandwa DAM, Murhula P and Bitakuya W (2023), Employee Job Satisfaction in Microfinance Institutions: Scale Development and Validation, Revue Finance, Contrôle et Stratégie, 26-2, pp. 1-59
Balemba EBK, Lusheke B, Bugandwa MA, Murhula P, Buchekuderhwa C and Kadundu P (2023), When unethical practices harm relationship outcomes: testing the influence of Consumer Perceived Unethical Behaviour on Trust and Satisfaction in the Banking sector, International Journal of Bank Marketing, forthcoming
Balemba EBK, Murhula P, Mushigo B, Mbantshi H and Bugandwa D (2023), Linking consumers’ perceived barriers towards mobile money, attitudes and continuance usage intention, Journal of Financial Services Marketing, forthcoming
Bugandwa DBM, Kanyurhi EB, Juwa GB and Hongo AM (2022), “Savings Groups in the Democratic Republic of Congo”, in Redford DT and Verhoef G (Eds.), Transforming Africa, Emerald Publishing Limited, Bingley, pp. 97-115. https://doi.org/10.1108/978-1-80262-053-520221009
Bugandwa T, Balemba EBK, Bugandwa DM and Haguma B (2021), Linking Corporate Social Responsibility and Trust in the Banking Sector: Exploring Disaggregated Relations, International Journal of Bank Marketing, Vol. 39 No. 4, pp. 592-617
Chubaka J, Balemba EBK, Bugandwa DM and Chubaka P (2021), Measuring Price Fairness and Its impact of customers’ trust and Switching Intentions in Microfinance Institutions, Journal of Financial Services Marketing, 27(2): 111-135
Balemba EBK, Bucekuderhwa C, Kadundu P, Haguma B, Chubaka N, Kadurha L and Mirindi J (2021), Religiosité, Philanthropie et Performance des entrepreneurs en République Démocratique du Congo, in Gundolf K and Janssen F (Eds.), Entrepreneuriat, Spiritualité et Religion : des sphères antinomiques ou étroitement liées ?, De Boeck Supérieur, Belgium, Méthodes & Recherches Management collection, first edition October 2021, 304 pages, ISBN 9782807330290
Chubaka P, Balemba EK, Bugandwa D and Labie M (2019), Appropriation des coopératives d’épargne et de crédit par leurs membres à Bukavu : mesure et déterminants, Mondes en Développement, 47(188): 127-148
Haguma B, Balemba EK and Bitakuya W (2019), Relation entre la microfinance et la performance perçue des PME à Bukavu : rôles modérateur et médiateur de l’opportunité entrepreneuriale et la prise de risque, Revue Finance, Contrôle et Stratégie, 22(4): 1-41
Balemba EB, Bucekuderhwa C and Kadura L (2018), Les déterminants du crowdfunding hors ligne : une étude empirique sur les entrepreneurs et les investisseurs potentiels à Bukavu, Innovations, Revue de l’Economie et du Management de l’innovation, 2/2018 (56): 187-215
Balemba EK (2017), Customer Satisfaction with Services of Microfinance Institutions: Scale Development and Validation, Strategic Change: Briefings in Entrepreneurial Finance, 26(6): 563-574
Balemba and Bugandwa (2016), Internal Marketing, Employee Job Satisfaction and Perceived Organizational Performance, International Journal of Bank Marketing, 34(5): 773-796
Teaching and assessment methods
Teaching methods
Lectures
Illustrations with the SPSS and LISREL software
Group work
Field research and presentation of results
Assessment methods
Written quizzes and examinations
Field research
Presentations
Competencies
Learning outcomes and competencies
1. Apply analytical methods and techniques
2. Integrate knowledge from different fields to formulate systemic answers
Core books and articles
Malhotra N, Décaudin JM, Bouguerra A and Bories D (2011), Etudes Marketing, sixth edition, Pearson Education
Giannelloni JL and Vernette E (2012), Etudes de Marché, Vuibert Gestion, third edition
DeVellis RF (2012), Scale Development: Theory and Applications, Sage
Byrne B (2009), Structural Equation Modeling with LISREL, PRELIS, and SIMPLIS: Basic Concepts, Applications, and Programming, New York, Psychology Press, Taylor and Francis
Brown T (2006), Confirmatory Factor Analysis for Applied Research, New York, London: The Guilford Press
Churchill G (1979), A Paradigm for Developing Better Measures of Marketing Constructs, Journal of Marketing Research, Vol. 16, No. 1, pp. 64-73
Balemba K (2017), Customer satisfaction with the services of microfinance institutions: Scale development and validation, Strategic Change, 26, 563-574
Bagozzi RP, Yi Y and Phillips LW (2012), Specification, evaluation, and interpretation of structural equation models, Journal of the Academy of Marketing Science, 40, 8-34
Walsh G and Beatty S (2007), Customer-based corporate reputation of a service firm: scale development and validation, Journal of the Academy of Marketing Science, 35, 127-143
Bugandwa T, Balemba EBK, Bugandwa DM and Haguma B (2021), Linking Corporate Social Responsibility and Trust in the Banking Sector: Exploring Disaggregated Relations, International Journal of Bank Marketing, Vol. 39 No. 4, pp. 592-617
Chubaka J, Balemba EBK, Bugandwa DM and Chubaka P (2021), Measuring Price Fairness and Its impact of customers’ trust and Switching Intentions in Microfinance Institutions, Journal of Financial Services Marketing, 27(2): 111-135
Balemba EBK, Bugandwa DAM, Murhula P and Bitakuya W (2022), Employee Job Satisfaction in Microfinance Institutions: Scale Development and Validation, Revue Finance, Contrôle et Stratégie, 26-2, pp. 1-59
Outline
Introduction to measurement in management
Exploratory factor analysis
Confirmatory factor analysis
Reliability and validity of a measurement scale
Introduction to structural equation modelling
Illustrative examples with SPSS and LISREL
Introduction to measurement in management
We use measurement throughout the day.
Physical measures such as the thermometer, the pedometer, rulers, …
Phenomena that are a priori invisible and unmeasurable.
A surprising fact
Researchers in management science advocate quantitative measurement to study invisible phenomena:
affection, attitude, self-esteem, ideology, power or political influence, …
Quantitative research
Measurement begins after the researcher has formulated a research question and determined the variables and units of analysis to be used in the research project.
Process of developing measurement instruments
It does not matter whether a variable is explanatory (independent) or explained (dependent).
Clear definitions are developed so as to create reliable instruments that yield relevant results.
Quantitative measurement is a deductive process.
It involves using concepts (constructs), an idea for developing a measure, or a procedure for the empirical observation of the phenomenon under study.
The process starts with concepts and ends with specific, concrete indicators.
The measures thus developed are then used to produce data in numerical form.
The process is also interactive: concepts become clearer as the researcher develops the instruments to measure them.
Reasons for measurement: extending our senses
Astronomers and biologists use the telescope and the microscope, respectively, to extend natural human vision.
More sensitive scientific measures
They vary less from one observer to another and give more exact quantitative information.
Measures of social phenomena are likewise meant to give precise information about social reality.
Scientific measures bring more precision and objectivity.
They let us observe phenomena that would otherwise be invisible, yet have been theorized.
Invisible phenomena are not unique to the social sciences.
In the natural sciences, for example, magnetism cannot be seen.
Fortunately, the phenomenon has been theorized, and we can observe its effects thanks to a magnet.
We can see, for example, how pieces of metal move when a magnet passes near them.
This instrument lets us "see", or measure, the magnetic field usually taught in physics.
Natural scientists have developed many other tools to measure the infinitely small (molecules, organisms such as microbes, …) or the infinitely large (large geological masses, planets, …) that cannot be observed with the naked eye.
Social sciences
Some of the phenomena researchers try to measure are "visible" (age, sex, race, income, …);
most are hard to observe (attitudes, satisfaction, professionalism, …).
Natural scientists invent indirect measures to approach invisible objects…
Social scientists create measures for aspects of the social world that are hard to observe.
A company wants to assess how satisfied its employees are with their work (job satisfaction).
It is important to create an instrument that measures this satisfaction.
The measurement of satisfaction must be systematic and produce precise quantitative data that others can replicate.
See the various scales for measuring job satisfaction:
Job Satisfaction Survey (Spector, 1985)
Job Satisfaction of Industrial Salesmen (Churchill et al., 1974)
Job Satisfaction in Microfinance Institutions (Balemba et al., 2023)
The measurement process
Measurement always starts from a concept.
The researcher must clearly distinguish what interests him or her from everything else; this is simply a matter of common sense.
The measurement process is not limited to owning a measuring instrument (e.g., a microscope).
Researchers generally need three elements:
◦ A concept (or construct)
◦ A measuring instrument
◦ The ability to recognize the object of study.
I want to measure job satisfaction:
1. What does "job satisfaction" mean?
2. As a construct, this concept takes different values: high or low satisfaction, very high, very low…
Create a measure for this construct.
It may take the form of a series of questions, an examination of working hours, a description of working conditions, …
Distinguish this concept (job satisfaction) from other concepts (customer satisfaction, employee motivation, …).
The work of the social scientist is therefore more difficult than that of the natural scientist:
1. The measures used involve talking with people and observing their behaviour.
2. These people's answers may be ambiguous and influenced by the very fact that they know they are being studied.
3. Interaction with the study subjects can be a source of considerable bias.
Measurement and research design
Researchers need measures to collect their data and, eventually, to test hypotheses. They:
1. Choose a general topic
2. Refine it into a precise research problem (question)
3. Then measurement can begin
Measurement and research design: conceptualization
The measurement process begins with conceptualization.
Conceptualization is the process of refining a concept by giving it a definition and/or theoretical content.
A conceptual definition is a definition in abstract, theoretical terms.
It relies on other concepts.
What is "satisfaction"? There is no magic to defining this concept:
◦ Think carefully
◦ Observe directly, talk with other researchers and with practitioners,
◦ Read what others have said, and then propose tentative definitions.
A good definition must have a precise, clear meaning and avoid any ambiguity.
Clearly defined concepts make it possible to develop better theoretical explanations.
The same concept can have several definitions.
Researchers may disagree on the definitions of the same concept.
Some scientific articles have set out to conceptualize key concepts:
Parasuraman et al. (1985), Huston (2010), Narver and Slater (1990), etc.
Examples: entrepreneurial orientation, market orientation, stakeholder orientation, etc.
Conceptual definitions are tied to theoretical frameworks and to ideological viewpoints (or value judgements).
Conflict theory
defines "social class" as the power and property that a group of individuals possesses or lacks in a society.
Structural-functionalist theory
defines "social class" in terms of individuals who share the same social status, the same lifestyle, or who identify with the same group.
When researchers do not agree on the definitions of concepts,
it is essential always to take an explicit position on the definition used in the research,
or to integrate the definitions of several authors into an original definition that will guide the research.
Some concepts are more abstract and complex (e.g., morale, trust) than others (e.g., income, age).
The researcher must be aware of this level of complexity in order to choose the approach to adopt.
Concrete conceptualization procedure
Teacher morale (Neuman Lawrence, p. 134, second column).
Measurement and research design: operationalization
Once the researcher has "his or her" definition (for this research, insofar as it may still be modified),
the next step is to operationalize this definition.
An operational definition is a definition in terms of specific operations, measuring instruments or procedures.
It is also called an "indicator" or measure of a construct (a construct is an operationalized concept).
There are several ways to measure a concept,
some better than others, or some more practical than others.
What matters for the researcher:
1. Fit the measure to the conceptual definition;
2. Take into account practical constraints (time, money, available subjects, …) and the research techniques known or that can be learned.
It is possible to develop a new measure from scratch or to use existing measures already used by others.
Measurement process for two variables linked within a theory:
three levels must be considered.
1. The most abstract level
The researcher is interested in the causal relationship between two constructs, that is, a conceptual hypothesis.
2. The level of operational definitions
The researcher is interested in testing an empirical hypothesis in order to determine the degree of association between the indicators:
study of correlations, trends, …
3. The third level is that of the empirical world.
1. The operational indicators of a variable (questionnaire or items) are logically linked to a construct;
2. They correctly approximate the empirical reality of the social world in relation to the conceptual level.
The measurement process links the three levels.
It moves, in a deductive logic, from the abstract to the concrete.
Three key steps:
◦ Conceptualize a variable to give it a clear conceptual definition.
◦ Operationalize the conceptual definition through an operational definition, by proposing indicators for the concept (construct).
◦ Use these indicators (items) by applying them empirically.
Required readings
1. Churchill, G. (1979), "A Paradigm for Developing Better Measures of Marketing Constructs", Journal of Marketing Research, Vol. 16 No. 1, pp. 64-73
2. Parasuraman, A., Zeithaml, V.A. and Berry, L.L. (1988), "SERVQUAL: A Multiple-Item Scale for Measuring Consumer Perceptions of Service Quality", Journal of Retailing, Vol. 64 No. 1, pp. 12-40
3. Churchill, G.A., Jr., Ford, N.M. and Walker, O.C. (1974), "Measuring the Job Satisfaction of Industrial Salesmen", Journal of Marketing Research, Vol. XI, pp. 254-260
4. Camus, S. (2004), "Proposition d'échelle de mesure de l'authenticité perçue d'un produit alimentaire", Recherche et Applications en Marketing, Vol. 19 No. 4
5. Chubaka J, Balemba EBK, Bugandwa DM and Chubaka P (2021), "Measuring Price Fairness and Its impact of customers’ trust and Switching Intentions in Microfinance Institutions", Journal of Financial Services Marketing, https://doi.org/10.1057/s41264-021-00102-3
Chapter 2
Analyse factorielle exploratoire
Introduction to factor analysis
Factor analysis is by far the most frequently used multivariate technique in research studies, especially in the social and behavioural sciences.
It is applicable when there is a systematic interdependence among a set of observed (manifest) variables,
and the researcher is interested in finding something more fundamental: the latent variables that create this commonality.
For example, we might have data on an individual's income, education, occupation and dwelling area,
and want to infer from these some factor (such as social class) that summarizes the commonality of all four variables.
The technique used for such a purpose is generally described as factor analysis.
Factor analysis thus seeks to resolve a large set of measured variables into relatively few categories known as factors.
This technique allows the researcher to group variables into factors (based on the correlations between variables).
The factors so derived may be treated as new variables (often termed latent variables).
A simple way to give them a value is to sum the values of the original variables that have been grouped into the factor.
The meaning and name of such a new variable are determined subjectively by the researcher.
Since the factors are linear combinations of the data, the coordinates of each variable on the factors are computed; these are called factor loadings.
Factor loadings represent the correlation between a particular variable and a factor, and are usually placed in a matrix of correlations between the variables and the factors.
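The claim that a loading is a variable-factor correlation can be checked numerically. The sketch below makes two assumptions. First, the data are simulated: 200 hypothetical respondents and 4 variables driven by one latent factor. Second, the factor is extracted with the principal components method, in which a loading is the eigenvector entry multiplied by the square root of its eigenvalue. The names `latent`, `loadings` and `scores` are illustrative only.

```python
import numpy as np

# Hypothetical data: 200 respondents, 4 observed variables driven by one latent factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=200)
X = np.column_stack([latent + rng.normal(scale=0.5, size=200) for _ in range(4)])

# Standardise the variables and compute their correlation matrix R.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = np.corrcoef(X, rowvar=False)

# Principal components extraction: eigen-decomposition of R (eigh returns ascending order).
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Loadings on the first component: eigenvector scaled by sqrt(eigenvalue).
loadings = eigvecs[:, 0] * np.sqrt(eigvals[0])

# Standardised component scores of each respondent.
scores = Z @ eigvecs[:, 0] / np.sqrt(eigvals[0])

# Correlation between each variable and the component scores.
corr_with_component = np.array([np.corrcoef(Z[:, j], scores)[0, 1] for j in range(4)])
print(np.round(loadings, 3))
print(np.round(corr_with_component, 3))
```

With principal components, the two printed vectors coincide. Other extraction methods (for example maximum likelihood) do not compute loadings this way.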
Mathematical basis
The mathematical basis of factor analysis is a data matrix (also termed the score matrix), symbolized as S.
The matrix contains the scores of N persons on k measures.
Thus a1 is the score of person 1 on measure a, a2 is the score of person 2 on measure a, and kN is the score of person N on measure k.
The score matrix (matrix S) then takes the following form:
Score matrix S (rows: persons or objects; columns: variables)

Person   a    b    c    …   k
1        a1   b1   c1   …   k1
2        a2   b2   c2   …   k2
3        a3   b3   c3   …   k3
…        …    …    …    …   …
N        aN   bN   cN   …   kN
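As a minimal sketch, the score matrix S can be stored as an N × k array, with persons as rows and measures as columns. It can then be turned into the k × k correlation matrix R on which factor analysis operates. The five persons and three measures below are made-up numbers, used only for illustration.

```python
import numpy as np

# Hypothetical score matrix S: N = 5 persons (rows) x k = 3 measures a, b, c (columns).
S = np.array([
    [3.0, 4.0, 2.0],
    [5.0, 5.0, 4.0],
    [2.0, 3.0, 2.0],
    [4.0, 4.0, 5.0],
    [1.0, 2.0, 1.0],
])

N, k = S.shape
# Standardise each measure (column) to mean 0 and standard deviation 1.
Z = (S - S.mean(axis=0)) / S.std(axis=0)
# Correlation matrix R (k x k): average cross-product of the standardised scores.
R = Z.T @ Z / N
print(np.round(R, 3))
```

The result is the same matrix that `np.corrcoef(S, rowvar=False)` returns. Writing it out shows that R is just the averaged cross-products of standardised scores.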
For realistic results,
we resort to rotation, because different rotations reveal different structures in the data.
Factor scores are then obtained; they help explain what the factors mean
and facilitate comparisons among groups of items.
With factor scores,
one can also perform several other multivariate analyses, such as multiple regression, cluster analysis, multiple discriminant analysis, etc.
Mathematical basis: data adequacy
Is it possible to obtain a good summary?
Factor analysis can be seen as a compression of information.
It is only possible if the data show some redundancy.
If the variables are perfectly correlated, a single factor axis suffices; it reproduces 100% of the available information.
If they are pairwise independent, the appropriate number of factors to retain equals the number of variables.
The correlation matrix, which is used to compute the solution, is then the unit matrix (identity matrix).
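The two extreme cases above can be illustrated with a small sketch. It assumes p = 4 hypothetical variables and uses the eigenvalues of R to measure the share of information carried by each factor axis.

```python
import numpy as np

p = 4
cases = {
    "perfectly correlated": np.ones((p, p)),  # every r_ij = 1
    "pairwise independent": np.eye(p),        # R = identity matrix
}

shares = {}
for name, R in cases.items():
    eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]  # eigenvalues, largest first
    shares[name] = eigvals / eigvals.sum()          # share of information per axis
    print(name, np.round(shares[name], 3))
```

In the first case the first axis carries all the information (share 1, the others 0). In the second case every axis carries 1/p of it, so no compression is possible.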
Mathematical basis: Bartlett's test of sphericity
It provides a global measure based on a statistical procedure.
It aims to detect how far the correlation matrix R = (r_ij), of size p × p and computed on our data (the observed matrix), diverges significantly from the identity matrix (the theoretical matrix under the null hypothesis H0).
Attempting a summary is futile when the data do not reject the null hypothesis.
We compute the determinant |R| of the correlation matrix.
Under H0, |R| = 1 (identity matrix).
|R| decreases towards 0 as redundancy increases, and |R| = 0 when there are perfect collinearities.
If |R| is below 0.00001, the data are considered to contain very strong redundancies, i.e., they carry only one type of information.
If |R| is close to 1, the analysis will not be of much use, because the variables are almost pairwise orthogonal.
Bartlett's test checks precisely whether we depart significantly from this reference situation |R| = 1.
The test is based on the determinant of an estimate of the correlation matrix.
Hypotheses
H0: the variables are globally independent.
H1: the variables are globally dependent.
With n observations and p variables, the test statistic is:
$$\chi^2 = -\left(n - 1 - \frac{2p + 5}{6}\right)\ln|R|, \qquad df = \frac{p(p-1)}{2}$$
The test is significant when the chi-square statistic exceeds its critical value (p-value below the chosen threshold).
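The statistic above can be computed directly from a correlation matrix. The sketch below is a minimal version: the function name `bartlett_sphericity` and the 2 × 2 example with n = 100 are hypothetical. It uses `numpy.linalg.slogdet` to obtain ln|R| without underflow when |R| is very small.

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Bartlett's chi-square statistic and degrees of freedom for H0: R = identity."""
    p = R.shape[0]
    sign, logdet = np.linalg.slogdet(R)  # ln|R|, numerically stable
    if sign <= 0:
        raise ValueError("R must be positive definite (|R| > 0)")
    chi2 = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return chi2, df

# Hypothetical example: two variables correlated at 0.5, n = 100 observations.
R = np.array([[1.0, 0.5], [0.5, 1.0]])
chi2, df = bartlett_sphericity(R, n=100)
print(round(chi2, 3), df)  # 28.049 1
```

For the p-value one could use `scipy.stats.chi2.sf(chi2, df)`. Here 28.049 is far above 3.841, the 5% critical value with 1 degree of freedom, so H0 is rejected.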
Mathematical basis: Kaiser-Meyer-Olkin (KMO) test
The KMO index follows the same idea: is it possible to find a useful factorization of the data?
The starting point is still the correlation matrix.
We know that the variables in the dataset are more or less related.
The raw correlation between two variables is influenced by the (p-2) others.
We use the partial correlation to measure the (net) relationship between two variables after removing the influence of the others.
The index then compares the raw correlations with the partial correlations.
If the partial correlation is clearly weaker (in absolute value), the link is in fact determined by the other variables.
This supports the idea of redundancy, and therefore the possibility of an efficient reduction of the information.
If the partial correlation is equivalent, or even higher in absolute value,
there is a direct relationship between the two variables, which factor analysis will struggle to capture.
With r_ij the raw correlations and a_ij the partial correlations, the index is:
$$KMO = \frac{\sum_{i \neq j} r_{ij}^2}{\sum_{i \neq j} r_{ij}^2 + \sum_{i \neq j} a_{ij}^2}$$
The KMO index ranges between 0 and 1.
Close to 0: the partial correlations are large relative to the raw correlations, so efficient compression is not possible.
Close to 1: we get an excellent summary of the information on the first factor axes.
Reading grids:
"poor" below 0.5, "good" between 0.8 and 0.9; or alternatively, "unacceptable" below 0.5, "mediocre" between 0.5 and 0.6, "average" between 0.6 and 0.7, "good" between 0.7 and 0.8, "very good" between 0.8 and 0.9, and "excellent" above 0.9.
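Below is a minimal sketch of the overall KMO index. It assumes the usual route of obtaining the partial correlations from the inverse P of R, with a_ij = -P_ij / sqrt(P_ii P_jj). The function name `kmo` and the example matrices are hypothetical.

```python
import numpy as np

def kmo(R):
    """Overall Kaiser-Meyer-Olkin index of a correlation matrix R."""
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    partial = -P / np.outer(d, d)            # partial correlations (off-diagonal cells)
    off = ~np.eye(R.shape[0], dtype=bool)    # mask selecting the cells i != j
    r2 = np.sum(R[off] ** 2)                 # sum of squared raw correlations
    a2 = np.sum(partial[off] ** 2)           # sum of squared partial correlations
    return r2 / (r2 + a2)

# Two variables: the partial correlation equals the raw one, so KMO = 0.5.
R2 = np.array([[1.0, 0.6], [0.6, 1.0]])
# Three variables all correlated at 0.8: strong redundancy.
R3 = np.full((3, 3), 0.8) + 0.2 * np.eye(3)
print(round(kmo(R2), 3), round(kmo(R3), 3))  # 0.5 0.764
```

The two-variable case shows why the index needs redundancy: with no third variable to "explain" the link, the partial and raw correlations are equal and the index stays at 0.5.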
Frequently used terms in factor analysis
Factor
A factor is an underlying dimension that accounts for several observed variables.
There can be one or more factors, depending on the nature of the study and the number of variables involved.
Factor loadings
Factor loadings are the values that express how closely the variables are related to each of the factors discovered.
They are also known as factor-variable correlations.
Factor loadings are the key to understanding what the factors mean.
It is the absolute size of the loadings (rather than their sign, plus or minus) that matters when interpreting a factor.
Communality (h²)
Communality, symbolized as h², shows how much of the variance of each variable is accounted for by the underlying factors taken together.
A high communality means that little of the variable is left over once whatever the factors represent has been taken into account.
It is worked out for each variable i as follows, where a_iA is the loading of variable i on factor A, and so on:
$$h_i^2 = a_{iA}^2 + a_{iB}^2 + \dots + a_{iN}^2$$
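The communality formula is simply a row sum of squared loadings. The sketch below uses a hypothetical loading matrix: 4 variables (rows) on 2 factors A and B (columns). The numbers are invented for illustration.

```python
import numpy as np

# Hypothetical loading matrix: 4 variables (rows) x 2 factors A and B (columns).
loadings = np.array([
    [0.80, 0.10],
    [0.75, 0.20],
    [0.15, 0.85],
    [0.05, 0.70],
])

# h^2 of variable i = sum over factors of (loading of variable i on that factor)^2
communalities = np.sum(loadings ** 2, axis=1)
uniqueness = 1 - communalities  # share of variance left unexplained by the factors
print(np.round(communalities, 4))
```

Variable 4 has the lowest communality (0.4925). More than half of its variance is left over once the two factors are taken into account.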
Eigenvalue (or latent root)
When we take the sum of the squared factor loadings on a factor, that sum is referred to as the eigenvalue or latent root.
The eigenvalue indicates the relative importance of each factor in accounting for the particular set of variables being analysed.
Total sum of squares
When the eigenvalues of all factors are totalled, the resulting value is termed the total sum of squares.
This value, divided by the number of variables in the study, gives an index that shows how well the particular solution accounts for what all the variables taken together represent.
If the variables are all very different from each other, this index will be low.
If they fall into one or more highly redundant groups, and if the extracted factors account for all the groups, the index will approach unity.
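The eigenvalue of each factor and the total sum of squares can both be read off the loading matrix as column sums of squares. The sketch reuses a hypothetical 4 × 2 loading matrix. With principal components loadings these column sums equal the eigenvalues of R; with other extraction methods they are the "sums of squared loadings" that software reports.

```python
import numpy as np

# Hypothetical loadings: 4 variables (rows) x 2 factors A and B (columns).
loadings = np.array([[0.80, 0.10], [0.75, 0.20], [0.15, 0.85], [0.05, 0.70]])

ss_per_factor = np.sum(loadings ** 2, axis=0)  # eigenvalue of each factor (column sums of squares)
total_ss = ss_per_factor.sum()                 # total sum of squares
index = total_ss / loadings.shape[0]           # divided by the number of variables
print(np.round(ss_per_factor, 4), round(total_ss, 4), round(index, 4))
```

Here the index is 0.6225, meaning the two factors account for about 62% of the variables' total variance. The total sum of squares also equals the sum of the communalities, since both add up every squared loading once.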
Rotation
Rotation, in the context of factor analysis, is something like staining a microscope slide.
Just as different stains reveal different structures in the tissue, different rotations reveal different structures in the data.
Different rotations can give results that appear entirely different;
from a statistical point of view, they are all equally valid, none superior or inferior to the others.
From the standpoint of making sense of the results, however, one must select the right rotation:
if the factors are independent, an orthogonal rotation is used; if they are correlated, an oblique rotation is used.
Under an orthogonal rotation, the communality of each variable remains unchanged, but the eigenvalues (sums of squared loadings per factor) change.
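The effect of an orthogonal rotation on communalities and on the sums of squared loadings can be checked with a short sketch. This is not varimax or any specific rotation criterion. It applies an arbitrary 30-degree rotation matrix to a hypothetical 4 × 2 loading matrix, only to show what changes and what stays fixed.

```python
import numpy as np

# Hypothetical loadings: 4 variables x 2 factors.
loadings = np.array([[0.80, 0.10], [0.75, 0.20], [0.15, 0.85], [0.05, 0.70]])

theta = np.deg2rad(30.0)  # arbitrary angle, for illustration only
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # orthogonal rotation matrix (T @ T.T = I)
rotated = loadings @ T

h2_before = np.sum(loadings ** 2, axis=1)  # communalities before rotation
h2_after = np.sum(rotated ** 2, axis=1)    # communalities after rotation
ss_before = np.sum(loadings ** 2, axis=0)  # sum of squared loadings per factor, before
ss_after = np.sum(rotated ** 2, axis=0)    # sum of squared loadings per factor, after
print(np.round(h2_before, 3), np.round(h2_after, 3))
print(np.round(ss_before, 3), np.round(ss_after, 3))
```

The communalities are identical and the per-factor sums differ. Their grand total is unchanged, because the rotation only redistributes the explained variance among the factors.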
Factor scores
A factor score represents the degree to which each respondent obtains high scores on the group of items that load highly on each factor.
Factor scores can help explain what the factors mean.
With such scores, several other multivariate analyses can be performed.
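One common way to compute factor scores is the regression method, with scores F = Z R⁻¹ Λ. The course does not specify which method it uses, so take this as one possible choice. The sketch assumes simulated data for 300 hypothetical respondents on 3 variables, with one factor extracted by principal components.

```python
import numpy as np

# Hypothetical data: 300 respondents, 3 items loading on one latent factor.
rng = np.random.default_rng(1)
f = rng.normal(size=(300, 1))
X = f @ np.array([[0.8, 0.7, 0.6]]) + rng.normal(scale=0.6, size=(300, 3))

Z = (X - X.mean(axis=0)) / X.std(axis=0)  # standardised item scores
R = np.corrcoef(X, rowvar=False)

eigvals, eigvecs = np.linalg.eigh(R)
lam, v = eigvals[-1], eigvecs[:, -1]      # eigh sorts ascending: last = largest
loadings = v * np.sqrt(lam)               # one-factor principal components loadings

W = np.linalg.solve(R, loadings)          # regression weights R^{-1} Lambda
scores = Z @ W                            # one factor score per respondent
print(np.round(scores[:5], 3))
```

Respondents with high scores are those who answered high on the items that load highly on the factor. These scores can then be fed into a regression or a cluster analysis.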
Important methods used in exploratory factor analysis
There are several methods of factor analysis, and they do not necessarily give the same results.
Factor analysis is therefore not a single, unique method but a set of techniques.
The three main methods of factor analysis are:
1. The centroid method;
2. The principal components method;
3. The maximum likelihood method.
Important methods used in exploratory factor analysis: the centroid method
Centroid Method
This method of factor analysis, developed by L.L. Thurstone,
was quite frequently used until about 1950, before the advent of large-capacity, high-speed computers.
The centroid method tends to maximize the sum of loadings, disregarding signs:
it extracts, for each factor in turn, the largest sum of absolute loadings.
Centroid Method
It is defined by linear combinations in which all weights are either +1.0 or -1.0.
The main merit of this method is that it is relatively simple, easily understood, and involves simpler computations.
If one understands this method, it becomes easy to understand the mechanics of the other methods of factor analysis.
Centroid Method
The various steps involved in this method are as follows:
The product-moment formula is used to work out the correlation coefficients.
The method starts with the computation of a matrix of correlations, R, with unities placed in the diagonal cells.
If the correlation matrix so obtained is a positive manifold (all correlations positive), the centroid method requires the weights for all variables to be +1.0:
the variables are not weighted, they are simply summed.
If the correlation matrix is not a positive manifold, reflections must be made before the first centroid factor is obtained.
Centroid Method
The first centroid factor is determined as follows:
(a) The sum of the coefficients (including the diagonal unity) in each column of the correlation matrix is worked out.
(b) The sum of these column sums (T) is obtained.
(c) Each column sum obtained in (a) is divided by the square root of T obtained in (b), giving what are called centroid loadings.
In this way each centroid loading (one loading per variable) is computed.
The full set of loadings so obtained constitutes the first centroid factor (say A).
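Steps (a) to (c) translate almost line by line into code. This is a minimal sketch that assumes R is a positive manifold, so all weights are +1 and no reflection is needed. The function name `first_centroid_factor` is hypothetical.

```python
import numpy as np

def first_centroid_factor(R):
    """Centroid loadings of a positive-manifold correlation matrix R (all weights +1)."""
    column_sums = R.sum(axis=0)      # (a) column sums, diagonal unities included
    T = column_sums.sum()            # (b) sum of the column sums
    return column_sums / np.sqrt(T)  # (c) each column sum divided by sqrt(T)

# Tiny hypothetical example: two variables correlated at 0.5.
R = np.array([[1.0, 0.5], [0.5, 1.0]])
print(np.round(first_centroid_factor(R), 3))  # [0.866 0.866]
```

Each column sum is 1.5 and T = 3, so both loadings equal 1.5 / sqrt(3), about 0.866.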
Centroid Method
To obtain the second centroid factor (say B), one must first obtain a matrix of residual coefficients.
The loadings of each pair of variables on the first centroid factor are multiplied together.
This is done for all possible pairs of variables (each diagonal cell holds the square of the corresponding factor loading).
The resulting matrix of factor cross products may be named Q1.
Q1 is then subtracted element by element from the original correlation matrix R, and the result is the first matrix of residual coefficients, R1.
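The cross-product matrix Q1 is the outer product of the loading vector with itself, so the residual step is one line. The sketch below assumes the hypothetical function name `residual_matrix` and reuses the tiny 2 × 2 example.

```python
import numpy as np

def residual_matrix(R, loadings):
    """Subtract the factor cross products Q from R, element by element."""
    Q = np.outer(loadings, loadings)  # all pairwise products; diagonal = squared loadings
    return R - Q

R = np.array([[1.0, 0.5], [0.5, 1.0]])
A = R.sum(axis=0) / np.sqrt(R.sum())  # first centroid factor of R (all weights +1)
R1 = residual_matrix(R, A)
print(np.round(R1, 3))
```

The diagonal of R1 holds 1 - a_i², the part of each variable's variance not yet explained by factor A.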
Centroid Method
The entries of R1 are related to the partial correlation between variables 1 and 2 with factor A held constant:
$$r_{12 \cdot A} = \frac{r_{12} - a_1 a_2}{\sqrt{1 - a_1^2}\,\sqrt{1 - a_2^2}}$$
The numerator of this formula is the entry of R1 for variables 1 and 2.
In the denominator, the square of the term on the left is exactly the diagonal element for variable 1 in R1.
Likewise, the partial variance of variable 2 is found in the diagonal cell for that variable in the residual matrix.
Centroid Method
Since in R1 the diagonal terms are partial variances and the off-diagonal terms are partial covariances,
it is easy to convert the entire table into a matrix of partial correlations.
To do so, divide the elements in each row by the square root of the diagonal element for that row, and then divide the elements in each column by the square root of the diagonal element for that column.
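Dividing rows and columns by the square roots of the diagonal amounts to dividing each cell by the product of two square roots. The sketch below checks the result against the partial-correlation formula. The 3 × 3 matrix R and the loadings `a` are hypothetical values, not the output of a centroid extraction, and `residual_to_partial` is an illustrative name.

```python
import numpy as np

def residual_to_partial(R1):
    """Convert a residual (partial covariance) matrix into partial correlations."""
    d = np.sqrt(np.diag(R1))     # square roots of the partial variances
    return R1 / np.outer(d, d)   # divide each row and each column by its sqrt(diagonal)

R = np.array([[1.0, 0.6, 0.4],
              [0.6, 1.0, 0.3],
              [0.4, 0.3, 1.0]])
a = np.array([0.8, 0.7, 0.5])     # hypothetical loadings on factor A
R1 = R - np.outer(a, a)           # residual matrix
P = residual_to_partial(R1)
print(np.round(P, 3))
```

The cell P[0, 1] equals (0.6 - 0.8 × 0.7) / sqrt((1 - 0.64)(1 - 0.49)), which is the formula for r₁₂·A, and the diagonal becomes 1.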
Centroid Method
After obtaining R1, one must reflect some of the variables in it, that is, give some variables negative signs in the sum (this is usually done by inspection).
The aim is to obtain a reflected matrix, R'1, with the highest possible sum of coefficients (T).
Centroid Method
For any variable that is reflected,
the signs of all coefficients in that variable's column and row of the residual matrix are changed.
The resulting matrix is called the "reflected matrix", from which the loadings are obtained in the usual way.
The full set of loadings so obtained constitutes the second centroid factor (say B);
thus the loadings on the second centroid factor are obtained from R'1, and the loadings of the reflected variables carry a negative sign.
Centroid Method
For subsequent factors (C, D, etc.) the same process is repeated.
After the second centroid factor is obtained, its cross products form the matrix Q2, which is subtracted from R1 (and not from R'1), giving R2.
To obtain a third factor (C), one operates on R2 in the same way as on R1.
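The reflection step and the extraction of the next factor can be combined into one function. This is a minimal sketch under stated assumptions: the set of variables to reflect is supplied by hand, mirroring the "by inspection" rule, and the function name `next_centroid_factor` is hypothetical. As described above, the cross products are subtracted from the unreflected residual matrix.

```python
import numpy as np

def next_centroid_factor(R_res, reflected):
    """Extract the next centroid factor from residual matrix R_res.

    reflected: 0-based indices of the variables to reflect (chosen by inspection).
    Returns the signed loadings and the next residual matrix.
    """
    signs = np.ones(R_res.shape[0])
    signs[list(reflected)] = -1.0
    R_reflected = R_res * np.outer(signs, signs)  # flip row and column of each reflected variable
    loadings = signs * R_reflected.sum(axis=0) / np.sqrt(R_reflected.sum())
    R_next = R_res - np.outer(loadings, loadings)  # subtract from R_res, not from R_reflected
    return loadings, R_next

# Hypothetical residual matrix with one negative link: reflect variable 2 (index 1).
R1 = np.array([[0.25, -0.25], [-0.25, 0.25]])
B, R2 = next_centroid_factor(R1, reflected=[1])
print(np.round(B, 3))
```

The reflected variable receives a negative loading (B = [0.5, -0.5]). In this tiny example the next residual matrix R2 is all zeros, so no further factor is needed.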
Centroid Method
Correlation matrix R (8 variables)

Variable   1      2      3      4      5      6      7      8
1        1.000  0.709  0.204  0.081  0.626  0.113  0.155  0.774
2        0.709  1.000  0.051  0.089  0.581  0.098  0.083  0.652
3        0.204  0.051  1.000  0.671  0.123  0.689  0.582  0.072
4        0.081  0.089  0.671  1.000  0.022  0.798  0.613  0.111
5        0.626  0.581  0.123  0.022  1.000  0.047  0.201  0.724
6        0.113  0.098  0.689  0.798  0.047  1.000  0.801  0.120
7        0.155  0.083  0.582  0.613  0.201  0.801  1.000  0.152
8        0.774  0.652  0.072  0.111  0.724  0.120  0.152  1.000
Centroid Method
Using the centroid method of factor analysis, work out the first and second centroid factors from the above information.
The given correlation matrix R is a positive manifold, so the weights for all variables are +1.0.
We calculate the first centroid factor (A) as follows:
Centroid Method
Variable   Loading on the first centroid factor A
1          0.693
2          0.618
3          0.642
4          0.641
5          0.629
6          0.694
7          0.679
8          0.683
To obtain the second centroid factor B, we first of all develop the first matrix of factor cross products, Q1, whose elements are the products of the loadings on A (qij = ai × aj):
First Matrix of Factor Cross Products (Q1)
Loadings on A   0.693  0.618  0.642  0.641  0.629  0.694  0.679  0.683
0.693           0.480  0.428  0.445  0.444  0.436  0.481  0.471  0.473
0.618           0.428  0.382  0.397  0.396  0.389  0.429  0.420  0.422
0.642           0.445  0.397  0.412  0.412  0.404  0.446  0.436  0.438
0.641           0.444  0.396  0.412  0.411  0.403  0.445  0.435  0.438
0.629           0.436  0.389  0.404  0.403  0.396  0.437  0.427  0.430
0.694           0.481  0.429  0.446  0.445  0.437  0.482  0.471  0.474
0.679           0.471  0.420  0.436  0.435  0.427  0.471  0.461  0.464
0.683           0.473  0.422  0.438  0.438  0.430  0.474  0.464  0.466
We now obtain the first matrix of residual coefficients (R1) by subtracting Q1 from R:
First Matrix of Residual Coefficients (R1)
Variables    1        2        3        4        5        6        7        8
1          0.520    0.281   -0.241   -0.363    0.190   -0.368   -0.316    0.301
2          0.281    0.618   -0.346   -0.307    0.192   -0.331   -0.337    0.230
3         -0.241   -0.346    0.588    0.259   -0.281    0.243    0.146   -0.366
4         -0.363   -0.307    0.259    0.589   -0.381    0.353    0.178   -0.327
5          0.190    0.192   -0.281   -0.381    0.604   -0.390   -0.226    0.294
6         -0.368   -0.331    0.243    0.353   -0.390    0.518    0.330   -0.354
7         -0.316   -0.337    0.146    0.178   -0.226    0.330    0.539   -0.312
8          0.301    0.230   -0.366   -0.327    0.294   -0.354   -0.312    0.534
Reflecting variables 3, 4, 6 and 7, we obtain the reflected matrix of residual coefficients (R'1), from which the second centroid factor (B) is extracted.
Reflected Matrix of Residual Coefficients (R’1) and Extraction of 2nd
Centroid Factor (B)
Variables    Factor loadings
             Centroid factor A    Centroid factor B
1            0.693                 0.563
2            0.618                 0.577
3            0.642                -0.539
4            0.641                -0.602
5            0.629                 0.558
6            0.694                -0.630
7            0.679                -0.518
8            0.683                 0.593
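The two extraction steps can be checked with a short pure-Python sketch (the helper names `centroid_factor` and `residual` are our own). It reflects variables through a vector of signs, as described above, and is an illustration rather than a reference implementation. Because it works at full precision instead of with three-decimal intermediate tables, the printed loadings may differ from the table in the third decimal.

```python
import math

# Correlation matrix R of the example (eight variables)
R = [
    [1.000, .709, .204, .081, .626, .113, .155, .774],
    [.709, 1.000, .051, .089, .581, .098, .083, .652],
    [.204, .051, 1.000, .671, .123, .689, .582, .072],
    [.081, .089, .671, 1.000, .022, .798, .613, .111],
    [.626, .581, .123, .022, 1.000, .047, .201, .724],
    [.113, .098, .689, .798, .047, 1.000, .801, .120],
    [.155, .083, .582, .613, .201, .801, 1.000, .152],
    [.774, .652, .072, .111, .724, .120, .152, 1.000],
]


def centroid_factor(m, signs):
    """One centroid factor: reflect, sum the columns, divide by the square root of T."""
    n = len(m)
    # Reflected coefficient r'_ij = s_i * s_j * r_ij (sign change in row and column)
    col_sums = [sum(signs[i] * signs[j] * m[i][j] for i in range(n)) for j in range(n)]
    t = sum(col_sums)  # T: grand total of the reflected matrix
    # Reflected variables get their negative sign back on the loading
    return [signs[j] * col_sums[j] / math.sqrt(t) for j in range(n)]


def residual(m, loadings):
    """Subtract the cross-product matrix Q (q_ij = a_i * a_j) from m."""
    n = len(m)
    return [[m[i][j] - loadings[i] * loadings[j] for j in range(n)] for i in range(n)]


# Factor A: R is a positive manifold, so every weight is +1
A = centroid_factor(R, [1] * 8)
R1 = residual(R, A)
# Factor B: reflect variables 3, 4, 6 and 7 (list positions 2, 3, 5 and 6)
B = centroid_factor(R1, [1, 1, -1, -1, 1, -1, -1, 1])

for k, (a, b) in enumerate(zip(A, B), start=1):
    print(f"{k}  A = {a:.3f}  B = {b:.3f}")
```

Note that the residual for a third factor would be taken from R1 (unreflected), which is why `residual` is applied to the original matrix and the signs only enter inside `centroid_factor`.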
Work out the communality and eigenvalues from the final results of the centroid example above (Example 15.1), and explain what they (along with the two factors) indicate.
We work out the communality and eigenvalues for the given problem as follows:
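Variables    Factor loadings                          Communality (h²)
             Centroid factor A   Centroid factor B
1            .693                 .563                (.693)² + (.563)² = .797
2            .618                 .577                (.618)² + (.577)² = .715
3            .642                -.539                (.642)² + (-.539)² = .703
4            .641                -.602                (.641)² + (-.602)² = .773
5            .629                 .558                (.629)² + (.558)² = .707
6            .694                -.630                (.694)² + (-.630)² = .879
7            .679                -.518                (.679)² + (-.518)² = .729
8            .683                 .593                (.683)² + (.593)² = .818
Eigenvalue (variance accounted for,
i.e. common variance)            3.490     2.631     6.121
Proportion of total variance      .44       .33       .77
                                 (44%)     (33%)     (77%)
Proportion of common variance     .57       .43      1.00
The communality shows how much of each variable's variance is accounted for by the two factors together (e.g., 79.7% for variable 1). Factor A accounts for 44% and factor B for 33% of the total variance of the eight standardized variables, so the two factors together account for 77% of it.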
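The arithmetic of this table can be reproduced with a few lines of Python (a sketch; the loadings are typed in from the table of factor loadings above, and the variable names are our own). Since each standardized variable has variance 1, the total variance equals the number of variables.

```python
# Loadings on centroid factors A and B
A = [.693, .618, .642, .641, .629, .694, .679, .683]
B = [.563, .577, -.539, -.602, .558, -.630, -.518, .593]
n_vars = len(A)

# Communality h^2 of each variable: sum of its squared loadings across the factors
h2 = [a ** 2 + b ** 2 for a, b in zip(A, B)]
# Eigenvalue of each factor: sum of its squared loadings across the variables
eig_A = sum(a ** 2 for a in A)
eig_B = sum(b ** 2 for b in B)
common = eig_A + eig_B  # total common variance, equal to sum(h2)

print([round(h, 3) for h in h2])
print(round(eig_A, 3), round(eig_B, 3), round(common, 3))
# Proportions of total variance
print(round(eig_A / n_vars, 2), round(eig_B / n_vars, 2), round(common / n_vars, 2))
# Proportions of common variance
print(round(eig_A / common, 2), round(eig_B / common, 2))
```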
Principal Components Method
Principal components (PC) analysis
A procedure that converts a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
The number of principal components is less than or equal to the number of original variables.
The transformation is defined in such a way that the first principal component accounts for the largest possible variability in the data.
Principal components are always uncorrelated; they are guaranteed to be independent only if the variables are jointly normally distributed.
The PC technique is very popular for factor analysis.
The PC method of factor analysis seeks to maximize the sum of squared loadings of each factor extracted in turn.
The aim of the principal components method is to construct, out of a given set of variables Xj (j = 1, 2, ..., k), new variables pi, called principal components, which are linear combinations of the Xs:
$$
\begin{aligned}
p_1 &= a_{11}X_1 + a_{12}X_2 + \cdots + a_{1k}X_k\\
p_2 &= a_{21}X_1 + a_{22}X_2 + \cdots + a_{2k}X_k\\
&\;\;\vdots\\
p_k &= a_{k1}X_1 + a_{k2}X_2 + \cdots + a_{kk}X_k
\end{aligned}
$$
The method is mostly applied to standardized variables.

The aij's are called loadings and are worked out in such a way that the extracted principal components satisfy two conditions:
1. The principal components are uncorrelated (orthogonal).
2. The first principal component (p1) has the maximum variance, the second principal component (p2) has the next maximum variance, and so on.
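Correlation matrix R of the k variables:
$$
R=\begin{array}{c|ccccc}
 & X_1 & X_2 & X_3 & \cdots & X_k\\ \hline
X_1 & r_{11} & r_{12} & r_{13} & \cdots & r_{1k}\\
X_2 & r_{21} & r_{22} & r_{23} & \cdots & r_{2k}\\
X_3 & r_{31} & r_{32} & r_{33} & \cdots & r_{3k}\\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots\\
X_k & r_{k1} & r_{k2} & r_{k3} & \cdots & r_{kk}
\end{array}
$$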
The first step is to obtain the sum of the coefficients in each column of R, including the diagonal element. The vector of column sums is referred to as Ua1, and when Ua1 is normalized we call it Va1.
Normalization is done by squaring and summing the column sums in Ua1 and then dividing each element of Ua1 by the square root of this sum of squares (which may be termed the normalizing factor).
The elements of Va1 are then accumulatively multiplied by the first row of R to obtain the first element of a new vector Ua2.
For instance, in multiplying Va1 by the first row of R, the first element of Va1 is multiplied by the r11 value, and this is added to the product of the second element of Va1 and the r12 value, which is added to the product of the third element of Va1 and the r13 value, and so on for all the corresponding elements of Va1 and the first row of R.
To obtain the second element of Ua2, the same process is repeated, i.e., the elements of Va1 are accumulatively multiplied by the second row of R.
The same process is repeated for each row of R, and the result is the new vector Ua2 (in matrix notation, Ua2 = R Va1). Ua2 is then normalized to obtain Va2.
One then compares Va1 and Va2; if they are nearly identical, convergence is said to have occurred. If convergence does not occur, one goes on using the new trial vectors again and again until it does.
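The iterative procedure above is a form of the power method. Below is a minimal Python sketch run on the eight-variable correlation matrix of the worked example; the function names and the tolerance `tol` are our own choices. Because it iterates to a strict tolerance, its loadings can differ slightly from the hand-computed table that follows, which stops after the second trial vector.

```python
import math

# Correlation matrix R of the worked example (eight variables)
R = [
    [1.000, .709, .204, .081, .626, .113, .155, .774],
    [.709, 1.000, .051, .089, .581, .098, .083, .652],
    [.204, .051, 1.000, .671, .123, .689, .582, .072],
    [.081, .089, .671, 1.000, .022, .798, .613, .111],
    [.626, .581, .123, .022, 1.000, .047, .201, .724],
    [.113, .098, .689, .798, .047, 1.000, .801, .120],
    [.155, .083, .582, .613, .201, .801, 1.000, .152],
    [.774, .652, .072, .111, .724, .120, .152, 1.000],
]


def normalize(u):
    """Divide u by its normalizing factor, the square root of its sum of squares."""
    factor = math.sqrt(sum(x * x for x in u))
    return [x / factor for x in u], factor


def mat_vec(m, v):
    """Accumulatively multiply v by each row of m, i.e. the product m @ v."""
    return [sum(r_ij * v_j for r_ij, v_j in zip(row, v)) for row in m]


def first_component(m, tol=1e-6, max_iter=500):
    """Iterate trial vectors Va until two successive ones (nearly) coincide."""
    n = len(m)
    ua = [sum(m[i][j] for i in range(n)) for j in range(n)]  # Ua1: column sums
    va, factor = normalize(ua)                                # Va1
    for _ in range(max_iter):
        va_next, factor = normalize(mat_vec(m, va))           # Ua(k+1) -> Va(k+1)
        converged = max(abs(a - b) for a, b in zip(va, va_next)) < tol
        va = va_next
        if converged:
            break
    # Loadings: characteristic vector times the square root of the last normalizing factor
    return factor, [x * math.sqrt(factor) for x in va]


eigenvalue, loadings = first_component(R)
print(round(eigenvalue, 3))
print([round(x, 2) for x in loadings])
```

At convergence the normalizing factor approaches the largest eigenvalue of R, which is why its square root rescales the unit-length characteristic vector into loadings.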
To obtain factor B, one seeks the characteristic vector Vb and the actual factor loadings for the second component factor B.
The same procedure is used as for the first factor, except that one operates on the first residual matrix R1 rather than on the original correlation matrix R (we operate on R1 in just the same way as in the centroid method stated earlier).
This procedure is repeated over and over again to obtain the successive PC factors (C, D, etc.).
Important Methods used in exploratory factor analysis: Principal
Components Method
Variables         1       2       3       4       5       6       7       8
1               1.000    .709    .204    .081    .626    .113    .155    .774
2                .709   1.000    .051    .089    .581    .098    .083    .652
3                .204    .051   1.000    .671    .123    .689    .582    .072
4                .081    .089    .671   1.000    .022    .798    .613    .111
5                .626    .581    .123    .022   1.000    .047    .201    .724
6                .113    .098    .689    .798    .047   1.000    .801    .120
7                .155    .083    .582    .613    .201    .801   1.000    .152
8                .774    .652    .072    .111    .724    .120    .152   1.000
Column sums Ua1 3.662   3.263   3.392   3.385   3.324   3.666   3.587   3.605
Normalizing Ua1, we obtain Va1, i.e., Va1 = Ua1 / normalizing factor (here the normalizing factor is √97.372 ≈ 9.868):
Va1   .371   .331   .344   .343   .337   .372   .364   .365
Normalization of the variables

Then we obtain Ua2 by accumulatively multiplying Va1 row by row into R; the result is:
Ua2: [1.296, 1.143, 1.201, 1.201, 1.165, 1.308, 1.280, 1.275]
Normalizing it (the normalizing factor for Ua2, worked out as above, is 3.493), we obtain:
Va2: [.371, .327, .344, .344, .334, .374, .366, .365]
We compute the loadings on the first principal component by multiplying the characteristic vector Va by the square root of the number that we obtained for normalizing Ua2.
Variables   Characteristic   ×   Square root of the   =   Principal
            vector Va            normalizing factor       Component I
1           .371             ×   1.868                =   .69
2           .331             ×   1.868                =   .62
3           .344             ×   1.868                =   .64
4           .343             ×   1.868                =   .64
5           .337             ×   1.868                =   .63
6           .372             ×   1.868                =   .70
7           .364             ×   1.868                =   .68
8           .365             ×   1.868                =   .68
Variables   Principal Component II
1           +.57
2           +.59
3           -.52
4           -.59
5           +.57
6           -.61
7           -.49
8           +.61
Variables   Principal components        Communality (h²)
            I         II
1           0.69      +0.57             (0.69)² + (0.57)² = 0.801
2           0.62      +0.59             (0.62)² + (0.59)² = 0.733
3           0.64      -0.52             (0.64)² + (-0.52)² = 0.680
4           0.64      -0.59             (0.64)² + (-0.59)² = 0.758
5           0.63      +0.57             (0.63)² + (0.57)² = 0.722
6           0.70      -0.61             (0.70)² + (-0.61)² = 0.862
7           0.68      -0.49             (0.68)² + (-0.49)² = 0.703
8           0.68      +0.61             (0.68)² + (0.61)² = 0.835
Eigenvalue, i.e. common variance     3.4914    2.6007    6.0921
Proportion of total variance         .436      .325      .761
                                     (43.6%)   (32.5%)   (76.1%)
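The second component can be sketched the same way in Python: extract component I, subtract the cross products of its loadings to form R1, and iterate again on R1. Two choices here are our own assumptions: the starting vector for component II reuses the reflection of variables 3, 4, 6 and 7 from the centroid example (following the text's instruction to operate on R1 as in the centroid method), and both components are iterated to a strict tolerance, so the figures can differ slightly from the hand-computed tables above.

```python
import math

R = [
    [1.000, .709, .204, .081, .626, .113, .155, .774],
    [.709, 1.000, .051, .089, .581, .098, .083, .652],
    [.204, .051, 1.000, .671, .123, .689, .582, .072],
    [.081, .089, .671, 1.000, .022, .798, .613, .111],
    [.626, .581, .123, .022, 1.000, .047, .201, .724],
    [.113, .098, .689, .798, .047, 1.000, .801, .120],
    [.155, .083, .582, .613, .201, .801, 1.000, .152],
    [.774, .652, .072, .111, .724, .120, .152, 1.000],
]


def normalize(u):
    factor = math.sqrt(sum(x * x for x in u))
    return [x / factor for x in u], factor


def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def component(m, start, tol=1e-9, max_iter=2000):
    """Power iteration from a starting vector; returns (eigenvalue, loadings)."""
    va, factor = normalize(start)
    for _ in range(max_iter):
        va_next, factor = normalize(mat_vec(m, va))
        converged = max(abs(a - b) for a, b in zip(va, va_next)) < tol
        va = va_next
        if converged:
            break
    return factor, [x * math.sqrt(factor) for x in va]


n = len(R)
# Component I, starting from the column sums of R
lam1, pc1 = component(R, [sum(R[i][j] for i in range(n)) for j in range(n)])
# First residual matrix: R minus the cross products of the component I loadings
R1 = [[R[i][j] - pc1[i] * pc1[j] for j in range(n)] for i in range(n)]
# Component II: sum_i s_i * R1[i][j] is the reflected column sum with its sign restored
s = [1, 1, -1, -1, 1, -1, -1, 1]  # reflect variables 3, 4, 6 and 7
start = [sum(s[i] * R1[i][j] for i in range(n)) for j in range(n)]
lam2, pc2 = component(R1, start)

for k in range(n):
    h2 = pc1[k] ** 2 + pc2[k] ** 2  # communality on the two components
    print(k + 1, round(pc1[k], 2), round(pc2[k], 2), round(h2, 3))
print(round(lam1, 4), round(lam2, 4), round((lam1 + lam2) / n, 3))
```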
Chap.3.
Introduction to Confirmatory Factor Analysis
Outline
- Similarities and differences between exploratory and confirmatory factor analysis
- Objectives and advantages of confirmatory factor analysis
- The parameters of a confirmatory factor analysis model
- The fundamental equations of a confirmatory factor analysis model
- Identification of a confirmatory factor analysis model
- Estimation of the parameters of a confirmatory factor analysis model
- Description of the fit indices of a confirmatory factor analysis model
Similarities and Differences between Exploratory and Confirmatory Factor Analysis: Common Factor Model
The purpose of CFA is to identify factors that account for the variation and covariation among a set of indicators.
EFA and CFA are both based on the common factor model.
Many of the concepts and terms of EFA also apply to CFA (such as factor loadings, unique variances, communalities, and residuals).
EFA is generally a descriptive or exploratory procedure.
In CFA, the researcher must prespecify all aspects of the factor model: the number of factors, the pattern of indicator–factor loadings, and so forth.
CFA requires a strong empirical or conceptual foundation to guide the specification and
evaluation of the factor model.
CFA is typically used in later phases of scale development or construct validation—after the
underlying structure has been tentatively established by prior empirical analyses using EFA, as
well as on theoretical grounds.
EFA and CFA often rely on the same estimation methods (e.g., maximum likelihood, or ML).
When a full-information estimator such as ML is used, the factor models arising from EFA and CFA can be evaluated in terms of how well the solution reproduces the observed variances and covariances among the input indicators (goodness-of-fit evaluation).
The quality of EFA and CFA models is determined in part by the size of the resulting parameter estimates (magnitude of factor loadings and factor intercorrelations) and by how well each factor is represented by observed measures (e.g., number of indicators per factor, size of indicator communalities, factor determinacy).
Similarities and Differences between Exploratory and Confirmatory Factor Analysis: Standardized and Unstandardized Solutions
EFA completely standardizes all variables in the analysis:
A correlation matrix is used as input in EFA.
Factors and indicators are completely standardized.
Factor variances equal 1.0.
Factor loadings are interpreted as correlations or standardized regression coefficients.
In CFA, by contrast, much of the analysis does not standardize the latent or observed variables.
CFA typically analyzes a variance–covariance matrix (needed to produce an unstandardized CFA solution).
The CFA input matrix is composed of indicator variances on the diagonal (a variance equals the indicator's standard deviation squared, i.e., VAR = SD²) and indicator covariances in the off-diagonal (COVxy = rxy × SDx × SDy).
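As a sketch of how such an input matrix is assembled, the Python below applies VAR = SD² and COVxy = rxy × SDx × SDy. The correlations and standard deviations of the three indicators are hypothetical values chosen for illustration.

```python
def covariance_matrix(corr, sds):
    """Variance-covariance matrix from a correlation matrix and standard deviations.

    Off-diagonal: COV_xy = r_xy * SD_x * SD_y; diagonal: VAR = SD^2 (since r_xx = 1).
    """
    k = len(sds)
    return [[corr[i][j] * sds[i] * sds[j] for j in range(k)] for i in range(k)]


# Hypothetical correlations and standard deviations of three indicators
corr = [[1.00, 0.50, 0.40],
        [0.50, 1.00, 0.30],
        [0.40, 0.30, 1.00]]
sds = [2.0, 1.5, 3.0]

cov = covariance_matrix(corr, sds)
for row in cov:
    print([round(x, 2) for x in row])
```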
The results of CFA include an unstandardized solution (parameter estimates expressed in the original metrics of the indicators) and possibly a partially standardized solution (relationships involving unstandardized indicators and standardized latent variables, or vice versa).
Many key aspects of CFA are based on unstandardized estimates, such as the standard errors and significance testing of model parameters.
CFA may entail the analysis of both unstandardized variance–covariance structures and mean structures (as a result of standardization in EFA, indicator means are presumed to be zero).
When indicator means are included as input in CFA, the analysis can estimate the means of the factors and the intercepts of the indicators.
An indicator intercept is interpreted as the predicted value of the indicator when the factor (the predictor) is zero.
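For a single indicator j loading on one factor, these relationships can be written as follows. This is a sketch in notation introduced here, following common CFA usage rather than the source: τj is the intercept, λj the loading, η the factor, εj an error term assumed to have mean zero, and κ the factor mean.

```latex
\begin{aligned}
y_j &= \tau_j + \lambda_j \eta + \varepsilon_j, \qquad E(\varepsilon_j) = 0\\
E(y_j \mid \eta = 0) &= \tau_j\\
E(y_j) &= \tau_j + \lambda_j \kappa, \qquad \kappa = E(\eta)
\end{aligned}
```

The second line restates the interpretation of the intercept; the third shows how indicator means, intercepts and factor means are linked once means are part of the input.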
Completely standardized solutions are most commonly reported in CFA research.
However, SEM methodologists often express a strong preference for reporting unstandardized solutions, because completely standardized values are potentially misleading:
the true nature of the variance and relationships among indicators and factors can be masked when these variables have been standardized;
when the original metric of the variables is expressed in meaningful units, unstandardized estimates more clearly convey the importance or substantive significance of the effects (Willett, Singer, & Martin, 1998).
Similarities and Differences between Exploratory and Confirmatory Factor Analysis: Indicator Cross-Loadings/Model Parsimony
EFA and CFA differ in the manner in which indicator cross-loadings are handled in solutions entailing multiple factors.
In EFA, all indicators freely load on all factors, and the solution is rotated to maximize the magnitude of primary loadings and minimize the magnitude of cross-loadings.
Factor rotation does not apply to CFA.
Rotation is not necessary in CFA, because simple structure is obtained by specifying indicators to load on only one factor.
CFA models are typically more parsimonious than EFA solutions.
Because only primary loadings and factor correlations are freely estimated, and no other relationships are specified between the indicators and factors, CFA attempts to reproduce the observed relationships among input indicators with fewer parameter estimates than EFA.
Table 3.1 presents the factor loading matrices of three analyses of the same data set (N = 1,050
adolescents):
1. CFA (Model A),
2. EFA with oblique rotation (Model B),
3. EFA with orthogonal rotation (Model C).
Eight antisocial behaviors are used as indicators, and each analysis entails two factors: Property Crimes (e.g., shoplifting, vandalism) and Violent Crimes (e.g., fighting, aggravated assault).
Path diagrams of Models A and B in Figure 3.1 correspond to Models A and B in Table 3.1.
The Model B path diagram can be edited to conform to an orthogonal EFA by removing the double-headed curved arrow reflecting the factor correlation.
Each indicator in EFA loads on all factors.
Rotation (either orthogonal or oblique) is used to foster the interpretability of the factor loadings (i.e., to maximize large loadings and minimize small loadings).
Rotation does not affect the fit of the EFA solution.
The CFA model is more parsimonious than the EFA models:
all indicator cross-loadings are prespecified to equal zero (Y1–Y4 on Violent Crimes = 0, Y5–Y8 on Property Crimes = 0).
There are therefore only 8 factor loading estimates in the CFA, compared to 16 factor loading estimates in the EFAs.
Rotation in CFA is not required.
Another consequence of fixing cross-loadings to zero in CFA is that factor correlation estimates in CFA are usually of higher magnitude than in analogous EFA solutions.
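The parameter counting behind this parsimony argument can be sketched in Python. Two assumptions are ours, not the source's: factor variances are fixed to 1.0 to set the metric of the factors, and the two CFA factors are allowed to correlate. The EFA figure uses the standard degrees-of-freedom formula for an m-factor maximum likelihood EFA, [(p - m)² - (p + m)] / 2.

```python
def df_cfa_simple(p, m, n_error_covs=0):
    """Degrees of freedom of a CFA in which each indicator loads on exactly one factor.

    Assumes factor variances fixed to 1.0, so the free parameters are p loadings,
    p unique variances, m(m - 1)/2 factor correlations and any error covariances.
    """
    known = p * (p + 1) // 2  # variances and covariances in the input matrix
    free = p + p + m * (m - 1) // 2 + n_error_covs
    return known - free


def df_efa_ml(p, m):
    """Degrees of freedom of an m-factor ML EFA: [(p - m)^2 - (p + m)] / 2."""
    return ((p - m) ** 2 - (p + m)) // 2


p, m = 8, 2  # eight antisocial behaviors, two factors (Table 3.1)
print("factor loadings estimated: CFA", p, "| EFA", p * m)
print("degrees of freedom: CFA", df_cfa_simple(p, m), "| EFA", df_efa_ml(p, m))
print("CFA with one correlated error:", df_cfa_simple(p, m, n_error_covs=1))
```

More degrees of freedom means that the model reproduces the 36 input variances and covariances with fewer free parameters, which is the sense in which the CFA is the more parsimonious model.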
Similarities and Differences between Exploratory and Confirmatory Factor Analysis: Unique Variances
CFA offers the researcher the ability to specify the nature of relationships among the measurement errors (unique variances) of the indicators.
Within EFA, the relationships among unique variances are not specified.
Although CFA typically entails a more parsimonious solution (it usually attempts to reproduce the observed relationships among indicators with fewer parameter estimates than EFA), it is possible to estimate such relationships when this specification is substantively justified and other identification requirements are met.
Because of EFA's identification restrictions, factor models must be specified under the assumption that measurement error is random.
In contrast, correlated measurement error can be modeled in a CFA solution.
The CFA model presented in Model A of Figure 3.1 (Figure 3.1A) depicts a two-factor measurement model in which all measurement error is presumed to be random.
The underlying assumption of this specification is that the observed relationship between any two indicators loading on the same factor (e.g., Y7 and Y8) is due entirely to the shared influence of the latent dimension; that is, if Factor 2 is partialed out, the intercorrelation between these indicators will be zero.
The model presented in Figure 3.1C depicts the same CFA measurement model, except that a correlated error has been specified between Y2 and Y3.
Although indicators Y2 and Y3 are related in part because of the shared influence of the latent dimension (Factor 1), some of their covariation is due to sources other than the common factor.
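To make the role of the error theory concrete, the Python sketch below computes the model-implied matrix Σ = λφλ' + Θ for one factor measured by four indicators. The loadings and the error covariance of .10 between Y2 and Y3 are hypothetical values, and the completely standardized metric (factor variance 1, unique variances 1 - λ²) is an assumption made for simplicity.

```python
# Hypothetical completely standardized loadings of Y1-Y4 on Factor 1
loadings = [0.8, 0.7, 0.6, 0.75]
# Theta: unique variances 1 - loading^2 on the diagonal, no error covariances
theta = [[0.0] * 4 for _ in range(4)]
for i, lam in enumerate(loadings):
    theta[i][i] = 1 - lam ** 2


def implied_cov(loadings, theta, phi=1.0):
    """Model-implied matrix for one factor: Sigma = lambda * phi * lambda' + Theta."""
    k = len(loadings)
    return [[loadings[i] * phi * loadings[j] + theta[i][j] for j in range(k)]
            for i in range(k)]


# Random measurement error only (as in Figure 3.1A)
sigma_a = implied_cov(loadings, theta)
# Add a correlated error between Y2 and Y3 (list positions 1 and 2), as in Figure 3.1C
theta_c = [row[:] for row in theta]
theta_c[1][2] = theta_c[2][1] = 0.10
sigma_c = implied_cov(loadings, theta_c)

# Covariance of Y2 and Y3 left over once the factor's contribution is removed
print(round(sigma_a[1][2] - loadings[1] * loadings[2], 3))  # 0.0
print(round(sigma_c[1][2] - loadings[1] * loadings[2], 3))  # 0.1
```

With random error only, partialing out the factor leaves nothing; with the correlated error, the leftover covariation is exactly the error covariance.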
The specification of correlated errors may be justified on the basis of:
1. source or method effects that reflect additional indicator covariation resulting from common assessment methods (e.g., observer ratings, questionnaires);
2. reversed or similarly worded test items;
3. differential susceptibility to other influences, such as response set, demand characteristics, acquiescence, reading difficulty, or social desirability (Brown, 2003; Marsh, 1996).
The inability to specify correlated errors (i.e., the nature of the relationships among unique variances) is a very significant limitation of EFA.
A common consequence of this limitation is the tendency to extract and interpret method factors that have little substantive basis (Brown, 2003).
For example, a psychometric literature exists on the Rosenberg (1965) Self-Esteem Scale (SES), a questionnaire that consists of four positively worded items (e.g., "I feel good about myself") and three negatively worded items (e.g., "At times I think I am no good at all").
EFA had produced two SES factors composed of negatively and positively worded items that were interpreted as substantively meaningful (Positive Self-Evaluation vs. Negative Self-Evaluation).
However, a strong conceptual basis did not exist in support of distinct dimensions of positive and negative self-esteem.
Using CFA, Marsh (1996) evaluated various SES measurement models corresponding to previously reported solutions:
one-factor models without error covariances, two-factor models, and correlated uniqueness (residual) models.
The results indicated the superiority of a unidimensional solution (Global Self-Esteem) with method effects (correlated residuals) associated with the negatively worded items.
This supports the existence of a single dimension of self-esteem, but also the need for an error theory to account for the additional covariation among similarly worded items.
Such a model could not be estimated in EFA, because EFA does not allow for the specification of correlated indicator errors.
Similarities and Differences between Exploratory and Confirmatory Factor Analysis: Model Comparison
The CFA framework allows a researcher to impose other restrictions on the factor solution, such as constraining all the factor loadings or all the unique variances to be equal.
The viability of these constraints is evaluated by statistically testing whether the fit of the more restricted solution is worse than that of a comparable solution without these constraints.
Direct statistical comparison of alternative solutions is possible when the models are nested.
A nested model contains a subset of the free parameters of another model (which is often referred to as the parent model).
Consider two models:
1. Model P, a one-factor model composed of six indicators allowed to load freely onto the factor;
2. Model N, a one-factor model identical to Model P, except that the factor loadings are constrained to load equally onto the factor.
The models are structurally the same (i.e., they consist of one factor and the same six indicators) but differ in their number of freely estimated versus constrained parameters.
Freely estimated parameters: the researcher allows the analysis to find the values of the parameters in the CFA solution (e.g., factor loadings, factor correlations, unique variances) that optimally reproduce the variances and covariances of the input matrix.
Fixed parameters: the researcher assigns specific values (e.g., fixes cross-loadings to zero to indicate no relationship between an indicator and a factor).
Constrained parameters: the researcher does not specify the parameters' exact values, but places other restrictions on the magnitude these values can take on.
In the case of Model N, the researcher instructs the analysis to optimally reproduce the input matrix under the condition that all factor loadings are the same.
Model N is nested under Model P because it contains a subset of Model P's free parameters.
The fit of Model N can therefore be statistically compared to the fit of Model P (through methods such as the chi-square difference test) to directly evaluate the viability of the condition of equal factor loadings.
EFA entails only freely estimated parameters: fixed parameters cannot be specified, and comparative model evaluation of this nature is not possible.
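The chi-square difference test for Models P and N can be sketched in Python. The chi-square values are invented for illustration; the degrees of freedom assume the factor variance is fixed to 1.0; and the p-value uses the exact closed form of the chi-square upper tail for integer degrees of freedom, so no statistics library is needed (`scipy.stats.chi2.sf` would serve the same purpose).

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square(df) > x), exact for integer df."""
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    # Recurrence: Q(x; k) = Q(x; k - 2) + (x/2)^(k/2 - 1) * exp(-x/2) / Gamma(k/2)
    while k < df:
        k += 2
        q += (x / 2) ** (k / 2 - 1) * math.exp(-x / 2) / math.gamma(k / 2)
    return q


def cfa_df(p, n_free):
    """Degrees of freedom: number of input variances/covariances minus free parameters."""
    return p * (p + 1) // 2 - n_free


p = 6
df_P = cfa_df(p, 6 + 6)  # Model P: 6 free loadings + 6 unique variances
df_N = cfa_df(p, 1 + 6)  # Model N: 1 equality-constrained loading + 6 unique variances

# Hypothetical chi-square values, for illustration only
chi2_P, chi2_N = 10.5, 25.3
delta_chi2, delta_df = chi2_N - chi2_P, df_N - df_P
p_value = chi2_sf(delta_chi2, delta_df)
print(df_P, df_N, round(delta_chi2, 2), delta_df, round(p_value, 4))
```

A significant difference (here p < .05) would indicate that constraining the loadings to equality significantly worsens fit, so the condition of equal factor loadings would not be tenable.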
CFA can also be used to statistically determine whether the various measurement parameters of a factor model (e.g., factor loadings) are the same in two or more groups (e.g., males and females).
Purposes and Advantages of CFA
The specification of CFA is strongly driven by theory or prior research evidence.
The CFA researcher usually tests a much more parsimonious solution by indicating the number of factors, the pattern of factor loadings (and cross-loadings, which are usually fixed to zero), and an appropriate error theory (random or correlated indicator error).
CFA allows for the specification of relationships among the indicator uniquenesses (error variances), which may have substantive importance (e.g., correlated errors due to method effects).
Every aspect of the CFA model is specified in advance.
The acceptability of the specified model is evaluated by goodness of fit and by the interpretability and strength of the resulting parameter estimates.
CFA is more appropriate than EFA in the later stages of construct validation and test construction, when prior evidence and theory support more "risky" a priori predictions regarding latent structure.
The modeling flexibility and capabilities of CFA (e.g., specification of an error theory) afford sophisticated analyses of construct validity: for example, the convergent and discriminant validity of dimensions can be evaluated in the context of (partialing out the influence of) varying assessment methods.
CFA offers a very strong analytic framework for evaluating the equivalence of measurement models across distinct groups (e.g., demographic groups such as sexes, races, or cultures).
This is accomplished by either multiple-groups solutions (simultaneous CFAs in two or more groups) or "multiple indicators, multiple causes" (MIMIC) models.
The CFA framework is superior in terms of its modeling flexibility and its ability to examine every potential source of invariance in the factor solution, including latent means and indicator intercepts.
These capabilities permit a variety of important analytic opportunities in applied research, such as evaluating whether a scale's measurement properties are invariant across population subgroups (e.g., are the number of factors, factor loadings, item intercepts, etc., that define the latent structure of a questionnaire equivalent in males and females?).
Measurement invariance is an important aspect of scale development, as it determines whether a testing instrument is appropriate for use in various groups.
Multiple-groups CFA can be used to evaluate the generalizability of a variety of important constructs (e.g., are the diagnostic criteria sets used to define mental disorders equivalent across demographic subgroups such as race and gender?).
The approach can also be used to examine group differences in the means of the latent dimensions.
In this respect CFA is analogous to ANOVA, but it is superior to ANOVA because group comparisons are made in the context of measurement invariance, whereas ANOVA simply assumes that a given observed score reflects the same level of the latent construct in all groups.
Another advantage of CFA and SEM is the ability to estimate the relationships among variables while adjusting for measurement error.
A key limitation of ordinary least squares (OLS) approaches such as correlational and multiple regression analysis is the assumption that variables have been measured without error, i.e., that they are perfectly reliable, meaning that all of an observed measure's variance is true-score variance.
This assumption rarely holds in the social and behavioral sciences, which rely heavily on variables that have been assessed by questionnaires, independent observer ratings, and so forth.
Estimates derived from OLS methods (e.g., correlations, regression coefficients) are therefore usually attenuated to an unknown degree by measurement error in the variables used in the analysis.
CFA and SEM allow such relationships to be estimated after adjustment for measurement error and an error theory.
The relationship between two constructs is reflected by their factor intercorrelation (r between Factor 1 and Factor 2), as opposed to the observed relationships among the indicators that load on these factors.
The factor correlation is a better estimate of the population value of this relationship than any two indicator pairings (e.g., r between Y1 and Y4), because it has been adjusted for measurement error; that is, shared variance among the factor's indicators is operationalized as true-score variance, which is passed on to the latent variable.
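A small numerical illustration follows, assuming a completely standardized two-factor CFA with uncorrelated errors and hypothetical values for the loadings of Y1 and Y4 and for the factor correlation. The model implies that the observed correlation between Y1 and Y4 is the product of the two loadings and the factor correlation, so it understates the construct-level relationship. The last step applies the classical correction for attenuation, r / √(reliability1 × reliability2), treating each squared standardized loading as that indicator's reliability.

```python
# Hypothetical completely standardized estimates
lambda_y1 = 0.70  # loading of Y1 on Factor 1
lambda_y4 = 0.80  # loading of Y4 on Factor 2
phi_12 = 0.50     # correlation between Factor 1 and Factor 2

# Model-implied observed correlation between Y1 and Y4 (errors uncorrelated)
r_y1_y4 = lambda_y1 * phi_12 * lambda_y4
print(round(r_y1_y4, 2))  # 0.28: the indicator correlation understates phi_12

# Correction for attenuation with reliabilities lambda^2 recovers the factor correlation
corrected = r_y1_y4 / (lambda_y1 ** 2 * lambda_y4 ** 2) ** 0.5
print(round(corrected, 2))  # 0.5
```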
In EFA, the researcher will often wish to relate the factors revealed by the analysis to other variables.
This requires the researcher to compute factor scores to serve as proxies for the factors in subsequent analyses.
This practice is limited by the issue of factor score indeterminacy: for any given EFA, an infinite number of sets of factor scores can be computed that are equally consistent with the factor loadings.
In CFA and SEM, indeterminacy of factor scores is not a problem, because this analytic framework eliminates the need to compute factor scores: the latent variables themselves are used in the analysis.
CFA and SEM offer the researcher considerable modeling flexibility.
Additional variables can be readily brought into the analysis to serve as correlates, predictors, or outcomes of the latent variables.
CFA is used as a precursor to SEM, which specifies structural relationships (e.g., regressions) among the latent variables.
A structural equation model can be broken down into two major components:
1. The measurement model, which specifies the number of factors, how the various indicators are related to the factors, and the relationships among indicator errors (i.e., the CFA model);
2. The structural model, which specifies how the various factors are related to one another (e.g., direct or indirect effects, no relationship, spurious relationship).
Consider the two basic path diagrams in Figure 3.2.
Both diagrams depict models entailing the same set of indicators and the same factors:
the first diagram (A) represents a measurement model (a CFA model entailing three intercorrelated factors);
the second diagram (B) reflects a structural model indicating that the relationship between Factor X and Factor Y is fully mediated by Factor Z (as with factor loadings, direct effects among latent variables are depicted by unidirectional arrows in Figure 3.2B).
Whereas the latent variables are allowed to intercorrelate freely in the CFA model (analogous to an oblique EFA solution), the exact nature of their relationships is specified in the structural model; that is, Factor X has a direct effect on Factor Z, Factor Z has a direct effect on Factor Y, and Factor X has an indirect effect on Factor Y.
The measurement (CFA) model has three parameters relating the factors to one another: the factor correlations between X and Y, X and Z, and Y and Z (depicted by double-headed, curved arrows in Figure 3.2A).
In the structural model, there are only two structural parameters, X → Z and Z → Y.
The structural portion of this solution is therefore overidentified: there are fewer structural parameters (X → Z and Z → Y) in the model than possible relationships among the factors (correlations between X and Y, X and Z, and Y and Z).
The structural model is more parsimonious than the measurement model, because it attempts to reproduce the relationships among the latent variables with one fewer freely estimated parameter.
Because of the overidentified nature of the structural portion of this model, its goodness of fit may be poorer than that of the measurement model.
The structural portion of this model will result in poor fit if the product of the Factor X → Factor Z path and the Factor Z → Factor Y path does not closely approximate the correlation between Factors X and Y estimated in the measurement model.
For instance, the indirect effects structural model in Figure 3.2B will be poor-fitting because the product of the X → Z and Z → Y direct effects [(.40)(.50) = .20] does not approximate the correlation between Factors X and Y (.60; see Figure 3.2A).
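The fit problem can be verified with the figures quoted above, assuming the paths in Figure 3.2B are completely standardized. Under full mediation, with no direct X → Y path, the model-implied correlation between X and Y is the product of the two direct effects.

```python
# Correlation between Factors X and Y estimated in the measurement model (Figure 3.2A)
corr_xy_measurement = 0.60
# Standardized direct effects in the structural model (Figure 3.2B)
beta_xz, beta_zy = 0.40, 0.50

# Full mediation: implied corr(X, Y) = beta_xz * beta_zy
implied_xy = beta_xz * beta_zy
misfit = corr_xy_measurement - implied_xy
print(round(implied_xy, 2), round(misfit, 2))  # 0.2 0.4
```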
The purpose of this discussion is to illustrate that goodness of model fit is determined by how adequately both the measurement and structural portions of a model are specified.
A key aspect of CFA evaluation is the ability of the parameters from the measurement model (e.g., factor loadings and factor correlations) to reproduce the observed relationships among the indicators.
If the CFA model is misspecified (e.g., failure to specify the correct number of factors or pattern of factor loadings), a poor-fitting solution will result.
Poor fit may also arise from a misspecified structural model which, like the model depicted in Figure 3.2B, often possesses fewer freely estimated parameters than its corresponding measurement model.
Given the various potential sources of poor fit in models involving multiple indicators, the researcher should establish a viable measurement model prior to pursuing a structural solution; otherwise, it is difficult to determine the extent to which poor fit is attributable to the measurement or structural aspects of the solution.
Consider a scenario where the measurement model in Figure 3.2A is well specified (good-fitting, with strong and interpretable factor loadings and factor correlations), but the researcher begins by testing the structural model shown in Figure 3.2B.
Although the poor fit would be due to the misspecified structural model, the researcher may falsely suspect the measurement aspect of the model.
Poor fit cannot arise from the structural portion of a CFA measurement model, because the factors are usually specified as freely intercorrelated.
Thus, CFA solutions are a useful prelude to SEM solutions, which aim to reproduce the relationships among latent variables with a more parsimonious set of structural parameter estimates.
Parameters of a CFA Model
All CFA models contain factor loadings, unique variances, and factor variances.
1. Factor loadings are the regression slopes for predicting the indicators from the latent variable.
2. Unique variance is variance in the indicator that is not accounted for by the latent variables. It is typically presumed to be measurement error and is thus often referred to as such (other synonymous terms include error variance and indicator unreliability).
3. In an unstandardized solution, the factor variance expresses the sample variability or dispersion of the factor, i.e., the extent to which sample participants' relative standing on the latent dimension is similar or different.
A CFA may also include error covariances (referred to as correlated uniquenesses, correlated residuals, or correlated errors), which suggest that two indicators covary for reasons other than the shared influence of the latent variable (e.g., see Figure 3.1C).
Error covariances are often specified on the basis of method effects.
When a CFA solution consists of two or more factors, a factor covariance is usually specified to estimate the relationship between the latent dimensions.
CFA is often confined to the analysis of variance–covariance structures.
The aforementioned parameters (factor loadings, error variances and covariances, factor
variances and covariances) are estimated to reproduce the input variance–covariance matrix.
The analysis of covariance structures is based on the implicit assumption that indicators are
measured as deviations from their means (i.e., all indicator means equal zero).
The CFA model can be expanded to include the analysis of mean structures,
in which case the CFA parameters also strive to reproduce the observed sample means of the indicators
(which are included along with the sample variances and covariances as input data).
Such CFA models also include parameter estimates of the indicator intercepts and the latent
variable means, which are often used in multiple-groups CFA to test whether distinct groups
differ in their relative standing on latent dimensions.
Latent variables in CFA may be either exogenous or endogenous.
An exogenous variable is a variable that is not caused by other variables in the solution (such
as Factor X in Figure 3.2B).
An endogenous variable is caused by one or more variables in the model (i.e., other variables in
the solution exert direct effects on the variable, as in Factor Y in Figure 3.2B).
Exogenous variables can be viewed as synonymous with X, independent, or predictor (causal)
variables.
Endogenous variables are equivalent to Y, dependent, or criterion (outcome) variables.
In structural models, an endogenous variable may be the cause of another endogenous variable, as is
the case of Factor Z in Figure 3.2B.
CFAs are typically considered to be exogenous (latent X) variable solutions (e.g., Figure
3.2A).
CFA is a pure measurement model.
Some researchers (e.g., methodologists using the LISREL software) choose to specify the analysis
as a latent Y solution.
There are various reasons for this, including the ability to accomplish useful specification tricks in
LISREL (e.g., specifications for scale reliability evaluation),
greater simplicity, and the fact that many statistical papers use latent Y specifications to
present information.
Specifying a pure CFA measurement model as a latent X or latent Y solution has no impact
on the fit and parameter estimates of the solution.
The LISREL notation for the parameters and matrices of a CFA solution for latent X and latent
Y specifications is presented in Figures 3.3 and 3.4.
It is not necessary to understand this notation in order to specify CFA models in most
software packages.
However, knowledge of this notational system is useful because most sourcebooks and quantitative
papers rely on it to describe the parameters and equations of CFA and SEM.
Lowercase Greek symbols correspond to specific parameters (i.e., elements of a matrix
such as λ).
Uppercase Greek letters reflect an entire matrix (e.g., the full matrix of factor loadings, Λ).
Factor loadings are symbolized by lambdas (λ) with x and y subscripts in the case of exogenous
and endogenous latent variables, respectively.
The unidirectional arrows (→) from the factors (e.g., ξ1, η1) to the indicators (e.g., X1, Y1) depict direct
effects (regressions) of the latent dimensions onto the observed measures;
the specific regression coefficients are the lambdas (λ).
Thetas (Θ) represent matrices of indicator error variances and covariances: theta-delta
(Θδ) in the case of indicators of latent X variables,
and theta-epsilon (Θε) for indicators of latent Y variables.
Symbols δ and ε are often used in place of Θδ and Θε.
Although unidirectional arrows connect the thetas to the observed measures (X1–X6),
these arrows do not depict regressive paths:
Θδ and Θε are symmetric variance–covariance matrices consisting of error
variances on the diagonal, and error covariances, if any, in the off-diagonal.
Some notational systems do not use directional arrows in the depiction of
error variances in order to avoid this potential source of confusion
(one notational variation is to symbolize error variances with ovals because, like
latent variables, measurement errors are not observed).
Factor variances and covariances are notated by phi (ɸ) and psi (Ψ) in latent X and latent Y
models, respectively.
Bidirectional arrows are used to symbolize covariances (correlations).
Curved arrows indicate the covariance between the factors (ɸ21, Ψ21) and the error covariance
of the X5 and X6 indicators (δ65, ε65).
When relationships are specified as covariances, the researcher is asserting that the variables
are related (e.g., ξ1 and ξ2).
However, this specification makes no claims about the nature of the relationship,
owing either to a lack of knowledge regarding the directionality of the association (e.g., ξ1 → ξ2) or to the
unavailability to the analysis of variables purported to account for this overlap.
λx11 indicates that the first measure (X1) loads on the first exogenous factor (ξ1), and λx21 indicates that X2 also
loads on ξ1.
This numeric notation assumes that the indicators are ordered X1, X2, X3, X4, X5,
and X6 in the input variance–covariance matrix.
If the input matrix is arranged in this fashion, the lambda X matrix (Λx) in Figure 3.3 will
be as follows:
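The matrix itself is missing from this copy. Based on the specification described in the text (X1–X3 load on ξ1, X4–X6 load on ξ2, no cross-loadings), a reconstruction would read:

```latex
\Lambda_x =
\begin{bmatrix}
\lambda_{x11} & 0 \\
\lambda_{x21} & 0 \\
\lambda_{x31} & 0 \\
0 & \lambda_{x42} \\
0 & \lambda_{x52} \\
0 & \lambda_{x62}
\end{bmatrix}
```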
Two subscripts are used:
the first numerical subscript refers to the row of Λx (i.e., the positional order of the X indicator);
the second numerical subscript refers to the column of Λx (i.e., the positional order of the
exogenous factors, ξ).
For example, λx52 conveys that the fifth indicator in the input matrix (X5) loads on the second latent X
variable (ξ2).
Thus Λx and Λy are full matrices whose dimensions are defined by p rows (number of
indicators) and m columns (number of factors).
(Figure 3.4: Latent Y notation for a two-factor CFA model with one error covariance; factor variances, factor means, and indicator intercepts are not depicted in the path diagram.)
Meaning of the elements:
The zero elements of Λx (λx12, λx41) indicate the absence of cross-loadings (e.g., the
relationship between ξ2 and X2 is fixed to zero).
A similar system is used for variances and covariances among factors (ɸ in Figure 3.3, Ψ in
Figure 3.4) and indicator errors (δ and ε in Figures 3.3 and 3.4, respectively).
Because these aspects of the CFA solution reflect variances and covariances, they are represented by symmetric
matrices (m × m for the factors, p × p for the indicator errors) with variances on the diagonal and covariances in the off-diagonal.
The phi matrix (Φ) in Figure 3.3 will look as follows:
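The matrices did not survive extraction. A reconstruction consistent with the surrounding description is sketched below (symmetric matrices shown as lower triangles; δ65 is the only freely estimated error covariance in Figure 3.3):

```latex
\Phi =
\begin{bmatrix}
\phi_{11} & \\
\phi_{21} & \phi_{22}
\end{bmatrix},
\qquad
\Theta_\delta =
\begin{bmatrix}
\delta_{11} &  &  &  &  &  \\
0 & \delta_{22} &  &  &  &  \\
0 & 0 & \delta_{33} &  &  &  \\
0 & 0 & 0 & \delta_{44} &  &  \\
0 & 0 & 0 & 0 & \delta_{55} &  \\
0 & 0 & 0 & 0 & \delta_{65} & \delta_{66}
\end{bmatrix}
```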
In the theta-delta matrix (Θδ),
δ11 through δ66 are the indicator errors and δ65 is the covariance of the measurement errors of
indicators X5 and X6.
For notational ease, diagonal elements are indexed by single digits in Figures 3.3 and 3.4 (e.g., δ6 is the same as δ66).
The zero elements of Θδ (e.g., δ21) indicate the absence of error covariances (these relationships
are fixed to zero).
Indicator intercepts are symbolized by tau (τ), and latent exogenous and endogenous means
are symbolized by kappa (κ) and alpha (α), respectively.
LISREL notation also applies to the structural components of models that entail directional
relationships among exogenous and endogenous variables.
Gamma (γ, matrix: Γ) denotes regressions between latent X and latent Y variables, and beta
(β, matrix: B) symbolizes directional effects among endogenous variables.
Fundamental Equations of a CFA Model
CFA aims to reproduce the sample variance–covariance matrix by the parameter estimates of the
measurement solution (e.g., factor loadings, factor covariances, etc.).
To illustrate, Figure 3.3 has been revised so that
parameter estimates are inserted for all factor loadings, the factor correlation, and indicator
errors (see Figure 3.5).
Completely standardized values are presented, although the same concepts and formulas apply to
unstandardized solutions.
The first three measures (X1, X2, X3) are indicators of one latent construct (ξ1), whereas the
next three measures (X4, X5, X6) are indicators of another latent construct (ξ2).
Indicators X4, X5, and X6 are congeneric (Jöreskog, 1971) because they share a common
factor (ξ2).
An indicator is not considered congeneric if it loads on more than one factor.
Congeneric factor loadings
Variance of an indicator is reproduced by multiplying its squared factor loading by the
variance of the factor, and then summing this product with the indicator’s error variance.
Predicted covariance of two indicators that load on the same factor is computed as the
product of their factor loadings times the variance of the factor.
The model-implied covariance of two indicators that load on separate factors is estimated as
the product of their factor loadings times the factor covariance
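Taken together, these three rules amount to the matrix equation Σ = ΛxΦΛx′ + Θδ. The sketch below, a minimal illustration assuming NumPy, builds Σ for a two-factor model shaped like Figure 3.5. The loadings .90, .80, .80 and .75 and the error covariance δ65 = .20 are the values quoted in the text; the X3 and X6 loadings and the factor correlation ɸ21 are illustrative assumptions, not values read from the figure.

```python
import numpy as np

# Completely standardized loadings (rows X1..X6, columns xi1, xi2).
# X1 (.90), X2 (.80), X4 (.80) and X5 (.75) come from the text;
# X3 (.85) and X6 (.70) are illustrative assumptions.
lam = np.array([
    [0.90, 0.00],
    [0.80, 0.00],
    [0.85, 0.00],
    [0.00, 0.80],
    [0.00, 0.75],
    [0.00, 0.70],
])

# Factor variances fixed to 1.00; the factor correlation (.50) is an assumption.
phi = np.array([
    [1.00, 0.50],
    [0.50, 1.00],
])

# Error variances = 1 - squared loading, so every indicator variance is 1.00.
theta = np.diag(1.0 - (lam ** 2).sum(axis=1))
# Error covariance between X5 and X6 (delta65 = .20, from the text).
theta[5, 4] = theta[4, 5] = 0.20

# Model-implied variance-covariance matrix: Lambda Phi Lambda' + Theta.
sigma = lam @ phi @ lam.T + theta

print(np.round(sigma, 3))
print("VAR(X2)     =", round(sigma[1, 1], 3))  # .80^2 * 1 + .36
print("COV(X4, X5) =", round(sigma[4, 3], 3))  # same factor: .80 * 1 * .75
print("COV(X3, X4) =", round(sigma[3, 2], 3))  # separate factors: .85 * .50 * .80
print("COV(X5, X6) =", round(sigma[5, 4], 3))  # .75 * 1 * .70 + .20
```

Each printed element is one of the hand calculations in this section, so changing a single loading shows directly which variances and covariances it affects.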
Using the parameter estimates in the solution presented in Figure 3.5,
the variance of X2 can be reproduced by the following equation (using latent X
notation): VAR(X2) = σ22 = λx21²ɸ11 + δ2 = (.80²)(1) + .36 = 1.00.
In the case of completely standardized solutions, one can
reproduce the variance of an indicator by simply squaring its factor loading (.80²) and
adding its error (.36), because the factor variance will always equal 1.00.
The squared factor loading represents the proportion of variance in the indicator that is
explained by the factor (its communality).
The communality of X2 is .80² = .64;
that is, ξ1 accounts for 64% of the variance in X2.

In the completely standardized solution presented in Figure 3.5,
errors represent the proportion of variance in the indicators that is not explained by the
factor; for example, δ2 = .36,
indicating that 36% of the variance in X2 is unique variance (e.g., measurement error).
These errors (residual variances) can be readily calculated as 1 minus the squared factor
loading (e.g., δ2 = 1 − .80² = .36).
The predicted covariance (correlation) between X2 and X3 is estimated as follows:
σ32 = λx21ɸ11λx31.
In the case of completely standardized solutions,
the factor variance will always equal 1.00, so the predicted correlation between two
congeneric indicators can be calculated as the product of their factor loadings;
for example, the model-implied correlation of X4 and X5 = .80(.75) = .60.
The predicted covariance (correlation) between X3 and X4 (indicators that load on separate
factors) is estimated as follows: σ43 = λx31ɸ21λx42.
Note that the factor correlation (ɸ21) rather than the factor variance is used in this calculation.
The same rules yield the 6 variances and 15 covariances (completely standardized) that are estimated by the two-factor measurement model.
The model also specifies a correlation between the errors of the X5 and X6 indicators (δ65 = .20),
meaning that the covariation between these indicators is not fully accounted for by the factor (ξ2):
X5 and X6 share additional variance due to influences other than the latent construct.
The equation to calculate the predicted correlation of X5 and X6 therefore includes the correlated
error: σ65 = (λx52ɸ22λx62) + δ65.
CFA model identification
To estimate the parameters in CFA, the measurement model must be identified.
A model is identified if,
on the basis of known information, it is possible to obtain a unique set of parameter estimates
for each parameter in the model whose values are unknown (e.g., factor loadings, factor
correlations).
Model identification pertains in part to the difference between the number of freely estimated
model parameters and the number of pieces of information in the input variance–covariance
matrix.
Before this issue is addressed, an aspect of identification specific to the analysis of latent
variables is discussed: scaling the latent variable.
CFA model identification: Scaling the Latent Variable
For the researcher to conduct a CFA,
every latent variable must have its scale identified.
Latent variables are unobserved and thus have no defined metrics (units of measurement).
These units of measurement must be set by the researcher.
This is most often accomplished in one of two ways.
1. Researcher fixes the metric of the latent variable to be the same as one of its indicators.
The indicator selected to pass its metric on to the factor is often referred to as a marker or
reference indicator
When a marker indicator is specified, a portion of its sample variance is passed on to the
latent variable.
Suppose X1 is selected as the marker indicator for ξ1 and has a sample variance (σ11) of 16.
Because X1 has a completely standardized factor loading on ξ1 of .90, 81% of its variance
is explained by ξ1; .90² = .81 (cf. Eq. 3.5).
Accordingly, 81% of the sample variance in X1 is passed on to ξ1 to represent the factor variance of ξ1 (16 × .81 = 12.96).
2. The variance of the latent variable is fixed to a specific value, usually 1.00.
In this case, a standardized and a completely standardized solution are produced.
The fit of this model is identical to that of the unstandardized model (i.e., models
estimated using marker indicators).
The former (marker indicator) strategy produces an unstandardized solution, which is useful in
tests of measurement invariance across groups and evaluations of scale
reliability.
In many other instances, fixing the factor variance can be considered superior to the marker indicator
approach,
especially when the indicators have been assessed on an arbitrary metric and when the completely
standardized solution is of more interest to the researcher.
Little, Slegers, and Card (2006) have introduced a third method of scaling latent variables that is akin to effects coding in ANOVA.
A priori constraints are placed on the solution
such that the set of factor loadings for a given construct averages to 1.00 and the corresponding indicator
intercepts sum to zero.
As a result, the variance of the latent variable reflects the average of the indicators’ variances explained by
the construct, and the mean of the latent variable is the optimally weighted average of the
means for the indicators of that construct.
This scaling is nonarbitrary because the latent variable will have the same unstandardized metric as the
average of all its manifest indicators.
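Why the choice of scaling does not change fit can be checked numerically: multiplying all loadings of a factor by a constant c and dividing its variance by c² leaves every element of Σ unchanged. The sketch below assumes NumPy and a hypothetical three-indicator, one-factor solution (the X1 values are chosen to match the example above: sample variance 16, standardized loading .90). The effects-coding case only covers the covariance part; the intercept constraint is not modeled here.

```python
import numpy as np

def implied_cov(lam, phi, theta):
    """One-factor implied covariance matrix: phi * lam lam' + diag(theta)."""
    lam = lam.reshape(-1, 1)
    return phi * (lam @ lam.T) + np.diag(theta)

def rescale(lam, phi, c):
    """Multiply loadings by c and divide the factor variance by c^2 (Sigma unchanged)."""
    return lam * c, phi / c ** 2

# Hypothetical solution with the factor variance fixed to 1.00.
lam_fixed = np.array([3.6, 2.0, 1.5])   # X1: 3.6^2 = 12.96 = 81% of 16
phi_fixed = 1.0
theta = np.array([3.04, 1.00, 0.75])    # error variances do not depend on scaling

# Marker indicator: the loading of X1 is fixed to 1.00.
lam_marker, phi_marker = rescale(lam_fixed, phi_fixed, 1.0 / lam_fixed[0])
# Effects coding (covariance part only): loadings average 1.00.
lam_effects, phi_effects = rescale(lam_fixed, phi_fixed, 1.0 / lam_fixed.mean())

for name, lam, phi in [("fixed variance", lam_fixed, phi_fixed),
                       ("marker (X1)", lam_marker, phi_marker),
                       ("effects coding", lam_effects, phi_effects)]:
    sigma = implied_cov(lam, phi, theta)
    print(f"{name:15s} loadings={np.round(lam, 3)} "
          f"factor variance={phi:.3f} VAR(X1)={sigma[0, 0]:.2f}")
```

Under the marker approach, the factor variance comes out as 12.96, the share of X1's variance passed on to ξ1, while all three parameterizations reproduce the same Σ.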
CFA model identification: Statistical Identification
Parameters of a CFA model can be estimated only if the number of freely estimated parameters
does not exceed the number of pieces of information in the input variance–covariance matrix.
A model is underidentified when the number of unknown (freely estimated) parameters
exceeds the number of known information (elements of the input variance–covariance
matrix).
An underidentified model cannot be solved because there are an infinite number of parameter
estimates that result in perfect model fit.
Consider the equation x + y = 7 (Eq. 3.11).
There are 2 unknowns (x and y) and 1 known (x + y = 7).
This equation is underidentified because the number of unknown parameters (x and y)
exceeds the known information;
x and y can take on an infinite number of pairs of values to solve for x + y = 7 (x = 1, y = 6; x = 2, y = 5; etc.).
In CFA, the knowns are usually the sample variances and covariances of the input indicators.
When CFA involves the analysis of mean structures, the sample means of the indicators
are also included in the count of known information,
and the indicator intercepts and latent means are included in the count of parameter
estimates.
In the Figure 3.6A model (one factor measured by two indicators), the input matrix is composed of 3 knowns (pieces of information): the variances of X1
and X2, and the covariance of X1 and X2.
The unknowns of the CFA solution are the freely estimated model parameters.
There are 4 freely estimated parameters: 2 factor loadings (λx11, λx21) and 2 indicator
errors (δ1, δ2).
Here the metric of ξ1 is set by fixing its variance to 1.0;
because the factor variance (ɸ11) is fixed, it is not included in the count of unknowns.
Alternatively, the researcher may opt to define the metric of ξ1 by choosing either X1 or X2 to serve as a marker
indicator.
In that case, the factor variance (ɸ11) contributes to the count of freely estimated parameters, but the
factor loading of the marker indicator is not included in this tally because it is fixed
to pass its metric on to ξ1.
Either way, the CFA model in Figure 3.6A is underidentified:
the number of unknowns (4 freely estimated parameters) exceeds the number of knowns (3
elements of the input matrix = 2 variances, 1 covariance).
This model aims to reproduce the sample covariance of X1 and X2.
Suppose that this sample covariance corresponds to a correlation of .64.
λx11, λx21, δ1, and δ2 can take on an infinite number of sets of values to reproduce an X1–X2
correlation of .64.
Recall that the predicted correlation between two indicators that load on the same factor is the product of
their factor loadings.
Thus there are endless pairs of values that can be estimated for λx11 and λx21 that will produce a perfectly
fitting model (e.g., λx11 = .80, λx21 = .80; λx11 = .90, λx21 = .711; λx11 = .75, λx21 = .853).
It is possible to identify the Figure 3.6A model if additional constraints are imposed on the
solution.
For instance, the researcher can add the restriction of constraining the factor loadings to equality.
In this case, the number of knowns (3) will equal the number of unknowns (3), and the model will be just-identified.
In just-identified models, there exists one unique set of parameter estimates that perfectly fits the
data.
Here, the only factor loading parameter estimate that will reproduce the observed X1–X2 correlation
(.64) is .80; λx11 = .80 and λx21 = .80, solved by imposing the equality constraint.
Although imposing constraints may assist in model identification by reducing the number of freely
estimated parameters,
such restrictions are often unreasonable on the basis of evidence or theory.
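The Figure 3.6A situation can be made concrete in a few lines. This is a minimal sketch assuming NumPy and the standardized metric (factor variance 1.0): without constraints, any λx11 between .64 and 1.00 has a partner λx21 = .64/λx11 that fits perfectly, whereas the equality constraint leaves a single admissible value.

```python
import numpy as np

r12 = 0.64  # observed X1-X2 correlation from the example

# Unconstrained: every lambda11 in [r12, 1] pairs with lambda21 = r12 / lambda11,
# and each pair reproduces r12 exactly (infinitely many perfect solutions).
for lam11 in (0.80, 0.90, 0.75):
    lam21 = r12 / lam11
    print(f"lambda11 = {lam11:.2f}, lambda21 = {lam21:.3f}, implied r12 = {lam11 * lam21:.2f}")

# Equality constraint lambda11 = lambda21 = lambda: lambda^2 = r12 has one
# admissible (positive) root, so the model is just-identified.
lam_equal = np.sqrt(r12)
print(f"equality-constrained loading = {lam_equal:.2f}")
```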
Unlike underidentified models, just-identified models can be “solved.”
Because the number of knowns equals the number of unknowns in just-identified models,
there exists a single set of parameter estimates that perfectly reproduce the
input matrix.
Consider this example from simultaneous equations algebra:
The number of unknowns (x, y) equals the number of knowns (x + y = 7, 3x − y = 1).
Through basic algebraic manipulation, it can be readily determined that x = 2 and y = 5;
that is, there is only one possible pair of values for x and y.
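The two algebra examples can be contrasted numerically. In this sketch (assuming NumPy), the rank of the coefficient matrix counts the independent equations, that is, the usable known information: with rank below the number of unknowns there is no unique solution, and with rank equal to it the system has exactly one.

```python
import numpy as np

# Underidentified: one equation (x + y = 7), two unknowns.
A_under = np.array([[1.0, 1.0]])
print("rank:", np.linalg.matrix_rank(A_under), "unknowns:", A_under.shape[1])

# Just-identified: two independent equations, two unknowns.
A_just = np.array([[1.0, 1.0],    # x + y = 7
                   [3.0, -1.0]])  # 3x - y = 1
b_just = np.array([7.0, 1.0])
x, y = np.linalg.solve(A_just, b_just)
print(f"x = {x:.0f}, y = {y:.0f}")
```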
Consider now the CFA model in Figure 3.6B.
The input matrix consists of 6 knowns (3 variances, 3 covariances).
The model consists of 6 freely estimated parameters: 3 factor loadings and 3 indicator
errors (again assume that the variance of ξ1 is fixed to 1.0).
This CFA model is therefore just-identified and will
produce a unique set of parameter estimates (λx11, λx21, λx31, δ1, δ2, δ3) that perfectly
reproduce the correlations among X1, X2, and X3.
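For a one-factor, three-indicator model in the standardized metric, the unique solution can even be written in closed form, because each correlation is a product of two loadings (r12 = λ1λ2, r13 = λ1λ3, r23 = λ2λ3). The sketch below assumes NumPy and a hypothetical correlation matrix (not data from Figure 3.6B).

```python
import numpy as np

# Hypothetical correlations among X1, X2, X3.
R = np.array([
    [1.00, 0.56, 0.48],
    [0.56, 1.00, 0.42],
    [0.48, 0.42, 1.00],
])
r12, r13, r23 = R[0, 1], R[0, 2], R[1, 2]

# Three equations r_ij = lambda_i * lambda_j in three loadings:
# a single positive solution exists.
lam = np.array([
    np.sqrt(r12 * r13 / r23),
    np.sqrt(r12 * r23 / r13),
    np.sqrt(r13 * r23 / r12),
])
delta = 1.0 - lam ** 2  # standardized error variances

# Reproduce the input matrix: lambda lambda' + diag(delta).
sigma = np.outer(lam, lam) + np.diag(delta)
print("loadings:", np.round(lam, 3))
print("errors:  ", np.round(delta, 3))
print("perfect fit:", np.allclose(sigma, R))
```

If one of the correlations were zero, these formulas would divide by zero: a concrete case of the empirical underidentification mentioned in this section.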
Although just-identified CFA models can be conducted on a sample input matrix,
goodness-of-model-fit evaluation does not apply because, by nature, such solutions
always have perfect fit.
This is also why goodness of fit does not apply to traditional statistical analyses such as
multiple regression:
these models are inherently just-identified.
For instance, in OLS multiple regression, all observed variables are connected to one another either by
direct effects, X1, X2 → Y, or freely estimated correlations, X1 ↔ X2.
A CFA model of a construct consisting of 3 observed measures may meet the conditions of
identification (as in Figure 3.6B),
but this is true only if the errors of the indicators are not correlated with each other.
The model depicted in Figure 3.6C is identical to that in Figure 3.6B, with the exception of a
correlated residual between indicators X2 and X3.
This additional parameter (δ32) brings the count of freely estimated parameters to 7, which
exceeds the number of elements of the input variance–covariance matrix (6).
Thus the Figure 3.6C model is underidentified and cannot be fit to the sample data.
A model is overidentified when the
number of knowns (i.e., number of variances and covariances in the input matrix) exceeds the
number of freely estimated model parameters.
The one-factor model depicted in Figure 3.7 (Model A) is structurally overidentified:
there are 10 elements of the input matrix (4 variances for X1–X4, 6 covariances), but only 8 freely
estimated parameters (4 factor loadings, 4 error variances; the variance of ξ1 is fixed to 1.0).
The difference between the number of knowns (a) and the number of unknowns (b; i.e., freely
estimated parameters) constitutes the model’s degrees of freedom (df).
Three cases can arise:
1. Overidentified solutions have positive df.
2. Just-identified models have 0 df (the number of knowns equals the number of unknowns).
3. Underidentified models have negative df (they cannot be solved or fit to the data).
The second model in Figure 3.7 (Model B) is also overidentified, with df = 1:
there are 10 elements of the input matrix and 9 freely estimated parameters.
A final example of an overidentified solution is the
measurement model presented in Figure 3.5.
There are 21 pieces of information in the input matrix (6 variances, 15 covariances).
Because it becomes cumbersome to count the elements of the input matrix as the number of variables
increases, the following formula readily provides this count:
a = p(p + 1)/2 = 6(7)/2 = 21,
reflecting the 6 variances (p) and the 15 covariances [p(p − 1)/2].
The specification of the Figure 3.5 model entails 14 freely estimated parameters: 4 factor
loadings, 6 error variances, 1 error covariance, 2 factor variances, 1 factor covariance.
The loadings of X1 and X4 are not included in this count because they are fixed in order to
pass their metrics onto ξ1 and ξ2.
The model is therefore overidentified with df = 7 (21 knowns minus 14 unknowns).
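The counting rule used throughout this section can be written as two small functions. This is a minimal sketch in plain Python; the parameter counts b are the ones given in the text for each figure, and the function names are illustrative.

```python
def n_knowns(p: int) -> int:
    """Pieces of information in a p x p variance-covariance matrix: p(p + 1)/2."""
    return p * (p + 1) // 2

def model_df(p: int, n_free: int) -> int:
    """Degrees of freedom = knowns (a) minus freely estimated parameters (b)."""
    return n_knowns(p) - n_free

# (model, number of indicators p, freely estimated parameters b), as counted in the text
examples = [
    ("Figure 3.6A", 2, 4),
    ("Figure 3.6B", 3, 6),
    ("Figure 3.6C", 3, 7),
    ("Figure 3.7A", 4, 8),
    ("Figure 3.7B", 4, 9),
    ("Figure 3.5", 6, 14),
]

for label, p, b in examples:
    df = model_df(p, b)
    if df > 0:
        status = "overidentified"
    elif df == 0:
        status = "just-identified"
    else:
        status = "underidentified"
    print(f"{label}: a = {n_knowns(p)}, b = {b}, df = {df} ({status})")
```

A non-negative df from this count is necessary but not sufficient for identification; it does not detect empirical underidentification.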
Degrees of freedom are used in many descriptive indices of goodness of model fit.
An important aspect of overidentified solutions is that
goodness-of-fit evaluation applies: specifically, evaluation of how well the model is able
to reproduce the input variances and covariances (i.e., the input matrix) with fewer unknowns
(i.e., freely estimated model parameters).
As in just-identified models, the available known information indicates that there is one best value for
each freely estimated parameter in the overidentified solution.
Unlike just-identified models, however, overidentified models rarely fit the data perfectly.
Specification of a model to have at least 0 df is a necessary but not sufficient condition
for identification.
A solution may be empirically underidentified:
the model is statistically just-identified or overidentified, but aspects of the input matrix
or model specification prevent the analysis from obtaining a unique and valid set of
parameter estimates.
The most obvious example of empirical underidentification is the case where all
covariances in the input matrix equal 0.
Empirical underidentification can result from other patterns of (non)relationships in
the input data.
CFA model identification: Guidelines for Model Identification
Regardless of the complexity of the model, latent variables must be scaled,
by either specifying marker indicators or fixing the variance of the factor (usually to a value of
1.00).
The number of pieces of information in the input matrix (e.g., indicator variances and
covariances) must equal or exceed the number of freely estimated model parameters (e.g.,
factor loadings, factor variances–covariances, indicator error variances–covariances).
In the case of one-factor models, a minimum of three indicators is required.
When three indicators are used (and no correlated errors are specified; e.g., Figure 3.6B), the
one-factor solution is just-identified and goodness-of-fit evaluation does not apply.
When four or more indicators are used (and no correlated errors are specified; e.g., Figure
3.7A), the model is overidentified and goodness of fit can be used in evaluating the
acceptability of the solution.
In the case of models that entail two or more factors and two indicators per latent construct,
the solution will be overidentified, provided that every latent variable is correlated with at least one other latent variable and the errors between indicators are uncorrelated.
However, because such solutions are susceptible to empirical underidentification, a minimum of three indicators
per latent variable is recommended.
Estimation of CFA Model Parameters
Objective of CFA
Obtain estimates for each parameter of the measurement model (factor loadings, factor
variances and covariances, indicator error variances and possibly error covariances)
These estimates should produce a predicted variance–covariance matrix (symbolized as Σ) that resembles the sample
variance–covariance matrix (symbolized as S) as closely as possible.
In overidentified models, perfect fit is rarely achieved (Σ ≠ S).
For example, in the one-factor model of Figure 3.7A, the aim is to find a set of factor loadings (λx11, λx21, λx31, λx41) that yield a predicted covariance matrix (Σ)
that best reproduces the input matrix (S);
that is, to find parameter estimates for λx11 and λx21 such that the predicted correlation between X1 and
X2 (λx11ɸ11λx21) closely approximates the sample correlation of these indicators (s21), and likewise for the other pairs.
This is accomplished by a fitting function, a mathematical operation to minimize the difference between Σ and S.
The fitting function most widely used in CFA is maximum likelihood (ML):
FML = ln|Σ| − ln|S| + trace[(S)(Σ⁻¹)] − p.
Meaning of the terms in the ML fitting function:
|S| is the determinant of the input variance–covariance matrix;
|Σ| is the determinant of the predicted variance–covariance matrix;
p is the order of the input matrix (i.e., the number of input indicators);
ln is the natural logarithm.
Determinant and trace summarize important information about matrices such as S and Σ.
The determinant is a single number that reflects a generalized measure of variance for the
entire set of variables contained in the matrix.
The trace of a matrix is the sum of values on the diagonal (e.g., in a variance–covariance
matrix, the trace is the sum of variances).
The objective of ML is to
minimize the differences between these matrix summaries (i.e., the determinant and trace) for
S and Σ.
This is most clearly illustrated in the context of a perfectly fitting model.
In that case, the determinant of S will equal the determinant of Σ,
so the difference between the logs of these determinants will equal 0.
Similarly, (S)(Σ⁻¹) will equal an identity matrix with all ones in the diagonal.
When these diagonal elements are summed (via the trace function),
the result will be the value of p, and subtracting p from this value yields 0.
Thus, in the case of perfect model fit, FML equals 0.
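The ML fitting function can be written directly from its definition. This is a minimal sketch assuming NumPy and positive-definite S and Σ; the matrices are hypothetical. It uses `slogdet` for the log-determinants and the identity trace(SΣ⁻¹) = trace(Σ⁻¹S) to avoid forming an explicit inverse.

```python
import numpy as np

def f_ml(S: np.ndarray, Sigma: np.ndarray) -> float:
    """ML fitting function: ln|Sigma| - ln|S| + trace(S Sigma^-1) - p."""
    p = S.shape[0]
    _, logdet_sigma = np.linalg.slogdet(Sigma)
    _, logdet_s = np.linalg.slogdet(S)
    # solve(Sigma, S) = Sigma^-1 S, whose trace equals trace(S Sigma^-1).
    trace_term = np.trace(np.linalg.solve(Sigma, S))
    return float(logdet_sigma - logdet_s + trace_term - p)

# Hypothetical input correlation matrix.
S = np.array([[1.0, 0.5, 0.4],
              [0.5, 1.0, 0.3],
              [0.4, 0.3, 1.0]])

print(f_ml(S, S))            # perfect fit: essentially 0

sigma_bad = S.copy()         # a model that misreproduces the X1-X2 relationship
sigma_bad[0, 1] = sigma_bad[1, 0] = 0.2
print(f_ml(S, sigma_bad))    # misfit: a positive value
```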
The underlying principle of ML estimation in CFA is to
find the model parameter estimates that maximize the probability of observing the available
data if the data are collected from the same population again.
In other words, ML aims to find the parameter values that make the observed data most likely (or, conversely,
that maximize the likelihood of the parameters, given the data).
Finding the parameter estimates for an overidentified CFA model is an iterative procedure.
The computer program (such as LISREL, Mplus, or EQS) begins with an initial set of parameter
estimates and repeatedly refines these estimates in an effort to reduce the value of FML (to
minimize the difference between S and Σ).
Each refinement of the parameter estimates to minimize FML is an iteration.
The program conducts internal checks to evaluate its progress in obtaining parameter estimates
that best reproduce S (i.e., that result in the lowest FML value).
Convergence of the model is reached when the program arrives at a set of parameter estimates
that cannot be improved upon to further reduce the difference between S and Σ.
For a given model and S, the researcher may encounter minor differences across programs in FML,
goodness-of-fit indices, and so forth, as the result of variations across software packages in
minimization procedures and stopping criteria.
In some instances, a latent variable solution will fail to converge.
Convergence is often related to the quality and complexity of the specified model (e.g., the
number of restrictions imposed on the solution) and the adequacy of the starting values.
Convergence may not be reached because the program has stopped at the maximum number
of iterations (set by either the program’s default or a number specified by the user).
This problem may be rectified by simply increasing the maximum number of iterations or
possibly using the preliminary parameter estimates as starting values.
A program may also cease before the maximum number of iterations has been reached
because its internal checks indicate that progress is not being made in obtaining a solution that
minimizes FML.
This outcome may signal a poorly specified model, but it may also stem from more innocuous issues such as the scaling of the indicators and
the adequacy of the starting values.
Starting values affect the minimization process in a number of ways.
If the initial values are similar to the final model parameter estimates, fewer iterations are
required to reach convergence.
If the starting values are quite different from the final parameter estimates,
there is a greater likelihood that nonconvergence will occur or that the solution will contain
Heywood cases (e.g., communalities > 1.00, negative error variances).
The CFA researcher usually does not need to be concerned about starting values,
because most latent variable software programs have incorporated sophisticated methods for
automatically generating these initial estimates.
The strategy for selecting starting values varies somewhat, depending on the type of model (e.g.,
multiple-groups solution, a solution with nonlinear constraints).
ML has several requirements that render it an unsuitable estimator in some
circumstances.
Compared with some other estimators, ML is more prone to Heywood cases.
ML is also more likely to produce markedly distorted solutions if minor misspecifications
have been made to the model.
Key assumptions of ML:
1. The sample size is large (asymptotic);
2. The indicators have been measured on continuous scales (i.e., approximate interval-level
data);
3. The distribution of the indicators is multivariate normal.
Estimation of CFA Model Parameters: Illustration
A basic path model is tested with single indicators of behavioral inhibition (x), school
refusal (y), and social anxiety (z) in a group of school-age children (N = 200).
The question is whether the relationship between behavioral inhibition (x) and school refusal (y) is
fully mediated by social anxiety (z).
Although the model is somewhat unrealistic (it assumes no measurement error in x, y, and z, and does
not conform to the typical strategy for evaluating mediated effects; cf. MacKinnon,
2008),
its simplified nature will foster the illustration of the concepts and calculations
introduced in the preceding and subsequent sections.
The paths between x and z, and between z and y, must equal their observed relationships:
given the way that the model is specified (e.g., x and z, and z and y, are linked by direct
effects), full reproduction of their observed covariances (correlations) is guaranteed (for
algebraic proof of this fact, see Jaccard & Wan, 1996).
The model also possesses one nontautological (i.e., overidentified) relationship involving x and y.
The model will generate a unique set of parameter estimates,
and, by a simple tracing rule,
the predicted correlation (and covariance) between x and y will be the product of the paths
between x and z and between z and y.
The model-implied relationship between x and y will not necessarily equal the observed
relationship between these variables.
The proximity of S to Σ therefore depends entirely on the ability of the path model to reproduce the
observed zero-order relationship between x and y.
The model is thus overidentified with 1 df corresponding to the nontautological relationship
between x and y.
Another way to determine whether the model has 1 df is to
take the difference between the number of elements of the input matrix (a = 6 = 3 variances,
3 covariances) and the number of freely estimated parameters (b = 5 = 2 regressive paths, the
variance of x, and the 2 residual variances of y and z).
Because the model is overidentified, goodness-of-fit evaluation will apply.
A good-fitting model will be obtained if the model’s parameter estimates reasonably
approximate the observed relationship between x and y.
If the model-implied relationship differs considerably from the observed x–y relationship, a
poor-fitting model will result.
The model-implied correlation between x and y is .30; ryx = (pzx)(pyz) = .6(.5) = .30 (the
predicted covariance of x and y is 1.2).
This predicted relationship differs from the observed correlation between x and y (.70), thus
suggesting a poor-fitting model (as shown in Figure 3.8).
Table 3.2 uses SAS PROC IML to calculate the residual matrix (sample matrix minus the predicted matrix) and
FML.
Because the relationship between x and y is the only nontautological effect in this model, this is the only
element of the residual matrix that can take on a value other than zero.
The residual correlation and covariance for x and y are .40 and 1.6, respectively.
In Table 3.2, FML is calculated on the basis of variance–covariance matrices.
The same FML value would be obtained if correlation matrices were used (variance–covariance
matrices are often preferred in order to obtain unstandardized solutions and valid standard
errors, and to permit other options such as multiple-groups evaluation).
The fitted model results in an FML value of 0.4054651, reflecting the discrepancy between S and Σ.
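The Figure 3.8 calculation can be reproduced without SAS. The sketch below assumes NumPy and works with the correlation matrix (the observed correlations .60, .50 and .70 are the ones given in the text) rather than the variance–covariance matrix of Table 3.2, which the text notes yields the same FML. It builds Σ by the tracing rule, forms the residual matrix, and evaluates FML.

```python
import numpy as np

def f_ml(S, Sigma):
    """ML fitting function: ln|Sigma| - ln|S| + trace(S Sigma^-1) - p."""
    p = S.shape[0]
    return float(np.linalg.slogdet(Sigma)[1] - np.linalg.slogdet(S)[1]
                 + np.trace(np.linalg.solve(Sigma, S)) - p)

# Observed correlations; variable order: x (inhibition), y (school refusal), z (social anxiety).
r_zx, r_yz, r_yx = 0.60, 0.50, 0.70
S = np.array([[1.0,  r_yx, r_zx],
              [r_yx, 1.0,  r_yz],
              [r_zx, r_yz, 1.0]])

# The standardized paths x -> z and z -> y equal their observed correlations;
# by the tracing rule, the implied x-y correlation is their product.
p_zx, p_yz = r_zx, r_yz
Sigma = S.copy()
Sigma[0, 1] = Sigma[1, 0] = p_zx * p_yz   # .60 * .50 = .30

residual = S - Sigma                       # only the x-y element is nonzero
fml = f_ml(S, Sigma)
print(np.round(residual, 2))
print(f"F_ML = {fml:.7f}")
```

Here the determinants are .32 for S and .48 for Σ and the trace term equals p, so FML reduces to ln(.48/.32) = ln 1.5.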
Descriptive Goodness of Fit Indices
The classic goodness-of-fit index is chi-square (χ²).
Under typical ML model estimation, χ² is calculated as χ² = FML(N − 1).
However, latent variable software programs (e.g., Mplus, LISREL starting with Version 9.1) increasingly
calculate χ² by multiplying FML by N instead of N − 1.
Using N, the Figure 3.8 model χ² is 81.093 (0.4054651 × 200).
Because the model is associated with 1 df, the critical χ² value (α = .05) is 3.84 (i.e., χ² = z²
= 1.96² = 3.8416).
The model χ² of 81.093 exceeds the critical value of 3.84, and thus the null hypothesis
that S = Σ is rejected.
A statistically significant χ² (latent variable software programs provide the exact probability value of the model χ²) supports the alternative hypothesis that S ≠ Σ,
meaning that the model estimates do not sufficiently reproduce the sample variances and covariances (i.e., the model does not fit the data well).
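The χ² test just described can be sketched in plain Python using only the standard library. For df = 1 the critical value is z², and the exact upper-tail probability reduces to erfc(√(χ²/2)). For other df, a chi-square distribution function (e.g., from SciPy) would be needed. The sketch also shows the N versus N − 1 conventions and how the same misfit yields a smaller χ² with a smaller sample.

```python
import math
from statistics import NormalDist

F_ML = 0.4054651   # value reported for the Figure 3.8 model
df = 1

chi2_n = F_ML * 200           # convention using N (Mplus, LISREL 9.1+)
chi2_n_minus_1 = F_ML * 199   # traditional convention using N - 1
chi2_n100 = F_ML * 100        # same misfit, half the sample size

# For df = 1, the alpha = .05 critical chi-square equals z^2 with z = 1.96
critical = NormalDist().inv_cdf(0.975) ** 2

# Exact upper-tail p-value for a chi-square with df = 1: P(X > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2_n / 2))

print(f"chi2 (N) = {chi2_n:.3f}, chi2 (N - 1) = {chi2_n_minus_1:.3f}, chi2 (N = 100) = {chi2_n100:.2f}")
print(f"critical value = {critical:.4f}, p = {p_value:.1e}")
print("Reject H0: S = Sigma" if chi2_n > critical else "Retain H0: S = Sigma")
```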
χ² is steeped in the traditions of ML and SEM (e.g., it was the first fit index to be developed).
However, it is rarely used in applied research as the sole index of model fit, for three reasons:
1. In many instances (e.g., small N, non-normal data), its underlying distribution is not χ² distributed, which compromises the statistical significance tests of the model χ²;
2. It is inflated by sample size (e.g., if N were to equal 100 in the Figure 3.8 model, χ² = 40.55), and thus large-N solutions are routinely rejected on the basis of χ²;
3. It is based on the very stringent hypothesis that S = Σ.
Descriptive Goodness of Fit Indices: Absolute Fit
Absolute fit indices assess model fit at an absolute level:
they evaluate the reasonability of the hypothesis that S = Σ without taking into account other aspects (e.g., fit in relation to more restricted solutions).
Standardized root mean square residual (SRMR)
The SRMR can be viewed as the average discrepancy between the correlations observed in the input matrix and the correlations predicted by the model.
The root mean square residual (RMR) reflects the average discrepancy between observed and predicted covariances.
The RMR can be difficult to interpret because its value is affected by the metric of the input variables.
The SRMR can be calculated by (1) summing the squared elements of the residual correlation matrix and dividing this sum by the number of elements in this matrix on and below the diagonal, a = p(p + 1)/2 (Eq. 3.14), and (2) taking the square root of this result.
The SRMR of the Figure 3.8 solution would be computed as follows (p = 3, so a = 6, and the only nonzero residual correlation is .40):
SRMR = SQRT(.40² / 6) = SQRT(.0267) = .163
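The following is a minimal Python sketch of the two SRMR steps. The function name `srmr` is hypothetical. The residual matrix assumes the Figure 3.8 residuals: in correlation metric the diagonal residuals are zero, and only the x–y element (.40) is nonzero.

```python
import math


def srmr(residual_corr):
    """Square root of the mean squared residual correlation, averaged over
    the a = p(p + 1)/2 elements on and below the diagonal."""
    p = len(residual_corr)
    a = p * (p + 1) // 2
    total = sum(residual_corr[i][j] ** 2 for i in range(p) for j in range(i + 1))
    return math.sqrt(total / a)


# Residual correlation matrix, variable order (x, z, y)
resid = [[0.0, 0.0, 0.40],
         [0.0, 0.0, 0.0],
         [0.40, 0.0, 0.0]]

print(round(srmr(resid), 3))
```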
A widely recommended index, which also corrects for model complexity, is the root mean square error of approximation (RMSEA).
The RMSEA is a population-based index that relies on the noncentral χ² distribution,
i.e., the distribution of the fitting function (e.g., FML) when the fit of the model is not perfect.
The noncentral χ² distribution includes a noncentrality parameter (NCP), which expresses the degree of model misspecification.
The NCP is estimated as χ² − df (if the result is a negative number, NCP = 0).
When the fit of a model is perfect, NCP = 0 and a central χ² distribution holds.
When the fit of the model is not perfect, the NCP is greater than 0 and shifts the expected value of the distribution to the right of that of the corresponding central χ².
The RMSEA is an "error of approximation" index:
it assesses the extent to which a model fits reasonably well in the population (as opposed to testing whether the model holds exactly in the population; cf. χ²).
To foster the conceptual basis of the calculation of RMSEA, the NCP is rescaled to the quantity d: $d = (\chi^2 - df)/N$.
The RMSEA is then computed as $\mathrm{RMSEA} = \sqrt{d / df}$.
The RMSEA compensates for the effect of model complexity by conveying the discrepancy in fit (d) per df in the model;
it is thus sensitive to the number of model parameters.
Being a population-based index, the RMSEA is relatively insensitive to sample size.
For the Figure 3.8 model, d = (81.093 − 1)/200 = 0.40, so RMSEA = SQRT(.40/1) = 0.63.
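The RMSEA point estimate can be sketched as below. It uses the N-based rescaling shown on the slide. Some programs divide by N − 1 instead, which changes the third decimal. The helper name `rmsea` is hypothetical. Confidence intervals require the noncentral χ² distribution and are not computed here.

```python
import math


def rmsea(chi2, df, n):
    """RMSEA point estimate: sqrt(d / df), with d = max(chi2 - df, 0) / n."""
    d = max(chi2 - df, 0.0) / n   # rescaled noncentrality parameter (NCP truncated at 0)
    return math.sqrt(d / df)


print(round(rmsea(81.093, 1, 200), 2))   # Figure 3.8 model
print(rmsea(0.5, 1, 200))                # chi2 < df, so NCP = 0 and RMSEA = 0
```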
Although its upper range is unbounded, it is rare to see the RMSEA exceed 1.00.
RMSEA values of 0 indicate perfect fit (and values very close to 0 suggest good model fit).
The noncentral χ² distribution can be used to obtain confidence intervals for the RMSEA (a 90% interval is typically used).
The confidence interval indicates the precision of the RMSEA point estimate.
Methodologists recommend including this confidence interval when reporting the RMSEA.
Unless N is very large, complex models are usually associated with wide RMSEA confidence intervals.
Browne and Cudeck (1993) developed a statistical test of closeness of model fit using the RMSEA.
Close fit (CFit) is operationalized as RMSEA values less than or equal to .05.
This one-sided test appears in the output of most software packages as the probability value that RMSEA ≤ .05.
Nonsignificant probability values (p > .05) are generally taken as consistent with acceptable model fit, although some methodologists have argued for stricter guidelines (p > .50; Jöreskog & Sörbom, 1996a).
Descriptive Goodness of Fit Indices: Comparative Fit
Comparative fit indices (also called incremental fit indices; e.g., Hu & Bentler, 1998) evaluate the fit of a user-specified solution in relation to a more restricted, nested baseline model.
The baseline model is typically a "null" or "independence" model in which the covariances among all input indicators are fixed to zero.
Because such a baseline model fits poorly, comparative fit indices often look more favorable (i.e., more suggestive of acceptable fit) than indices from the other categories.
Nevertheless, some indices from this category have been found to be among the best behaved of the host of indices that have been introduced in the literature.
The comparative fit index (CFI; Bentler, 1990) is computed as follows:
$$\mathrm{CFI} = 1 - \frac{\max[(\chi^2_T - df_T), 0]}{\max[(\chi^2_T - df_T), (\chi^2_B - df_B), 0]}$$
where:
χ²T is the χ² value of the target model (i.e., the model under evaluation);
dfT is the df of the target model;
χ²B is the χ² value of the baseline model (i.e., the "null" model);
dfB is the df of the baseline model;
max indicates to use the largest value (for example, for the numerator, use (χ²T − dfT) or 0, whichever is larger).
The χ²B and dfB of the null model are included as default output in most software programs.
The CFI has a range of possible values between zero and one,
with values closer to one implying good model fit.
The CFI is based on the NCP (i.e., λ = χ²T − dfT):
it uses information from the expected values of χ²T and χ²B under the noncentral χ² distribution associated with S ≠ Σ.
Using the results of the Figure 3.8 model:
CFI = 1 − [(81.093 − 1) / (227.887 − 3)] = .644
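The CFI formula can be sketched in Python as follows. The helper name `cfi` is hypothetical. Returning 1.0 when the denominator is zero (both models fit perfectly) is an assumption about that edge case. With the Figure 3.8 values, the sketch reproduces .644.

```python
def cfi(chi2_t, df_t, chi2_b, df_b):
    """Comparative fit index (Bentler, 1990), using noncentrality estimates."""
    num = max(chi2_t - df_t, 0.0)                    # target-model NCP, truncated at 0
    den = max(chi2_t - df_t, chi2_b - df_b, 0.0)     # largest of the NCPs and 0
    return 1.0 - num / den if den > 0 else 1.0       # assumption: perfect fit -> 1.0


print(round(cfi(81.093, 1, 227.887, 3), 3))
```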
Another popular and generally well-behaved index is the Tucker–Lewis index (TLI; Tucker & Lewis, 1973).
The TLI has features that compensate for the effect of model complexity:
it includes a penalty function for adding freely estimated parameters that do not markedly improve the fit of the model.
The TLI is calculated by the following formula:
$$\mathrm{TLI} = \frac{(\chi^2_B / df_B) - (\chi^2_T / df_T)}{(\chi^2_B / df_B) - 1}$$
where χ²T is the χ² value of the target model (i.e., the model under evaluation), dfT is the df of the target model, χ²B is the χ² value of the baseline model (i.e., the "null" model), and dfB is the df of the baseline model.
The TLI is non-normed: its values can fall outside the range of zero to one.
Values approaching one are interpreted in accord with good model fit.
The TLI for the Figure 3.8 solution is:
TLI = [(227.887 / 3) − (81.093 / 1)] / [(227.887 / 3) − 1] = −0.068
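The following is a companion sketch for the TLI. The helper name `tli` is hypothetical, and it uses the baseline χ² of 227.887 (df = 3) given for the CFI. Unlike the CFI, nothing truncates the result. With the Figure 3.8 values it therefore returns a negative number, which illustrates that the TLI is non-normed.

```python
def tli(chi2_t, df_t, chi2_b, df_b):
    """Tucker-Lewis index: compares chi2/df ratios of the baseline and target models."""
    ratio_b = chi2_b / df_b
    ratio_t = chi2_t / df_t
    return (ratio_b - ratio_t) / (ratio_b - 1.0)


print(round(tli(81.093, 1, 227.887, 3), 3))
```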
Obtaining a Solution for a Just-Identified Factor Model
Factor analysis typically entails matrix algebra and other mathematical operations that are very cumbersome to demonstrate and conduct by hand calculation.
However, the solution for a just-identified factor model, such as the one depicted in Figure 3.6B, can be calculated on the basis of the principles discussed in this chapter, with the help of the algebra of simultaneous equations.
The three factor loadings are denoted a = λx11, b = λx21, and c = λx31.

The number of knowns (6 elements of the input matrix) equals the number of unknowns (6 parameters: 3 factor loadings and 3 errors; the factor variance is fixed to 1.0).
The model is therefore just-identified, and its parameter estimates will perfectly reproduce the input matrix.
When two indicators load on the same factor, multiplying their factor loadings provides the model estimate of their zero-order correlation.
Because the model is just-identified, the products of the loadings will perfectly reproduce the zero-order relationships among X1, X2, and X3.
The system of equations is as follows:
Equation 1. ab = .595
Equation 2. ac = .448
Equation 3. bc = .544
This problem is also just-identified: there are 3 unknowns (a, b, c) and 3 knowns (the 3 equations).
Systems of equations can be solved in various ways (substitution, elimination, matrices); the steps below use substitution.
Step 1: Rearrange Equations 1 and 2 so that b and c are the outputs.
1. Equation 1. ab = .595 → b = .595/a
2. Equation 2. ac = .448 → c = .448/a
3. Equation 3. bc = .544
Step 2: Substitute Equations 1 and 2 into Equation 3.
1. Equation 3. bc = .544
2. (.595/a)(.448/a) = .544
Step 3: Solve Equation 3.
Equation 3. (.595/a)(.448/a) = .544
.26656/a² = .544
.26656/.544 = a²
.49 = a²
.70 = a
Step 4: Now that we know that a = .70, it is straightforward to solve for b and c using the original equations.
1. Equation 1. b = .595/.70 = .85
2. Equation 2. c = .448/.70 = .64
The factor loadings are .70, .85, and .64 for λx11, λx21, and λx31, respectively.
Multiplying these loadings together reproduces the input correlations perfectly: .70(.85) = .595, .70(.64) = .448, .85(.64) = .544.
Step 5: Because this is a completely standardized solution, the errors for X1, X2, and X3 can be obtained by squaring the loadings and subtracting the result from 1.0.
1. δ1 = 1 − .49 = .51
2. δ2 = 1 − .72 = .28
3. δ3 = 1 − .41 = .59
Every unknown parameter in the factor model has now been solved. The solution is as follows:
loadings λx11 = .70, λx21 = .85, λx31 = .64; errors δ1 = .51, δ2 = .28, δ3 = .59; factor variance fixed to 1.0.
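The substitution steps collapse into one closed-form line, a² = (r12 × r13)/r23, which is Step 3 written in general form. The following is a minimal Python sketch. It assumes the three input correlations .595, .448, and .544 from Equations 1–3 and a completely standardized solution.

```python
import math

# Input correlations among X1, X2, X3 (Equations 1-3)
r12, r13, r23 = 0.595, 0.448, 0.544

# ab = r12, ac = r13, bc = r23  =>  a^2 = (r12 * r13) / r23
a = math.sqrt(r12 * r13 / r23)
b = r12 / a
c = r13 / a

loadings = [a, b, c]
errors = [1 - lam ** 2 for lam in loadings]   # completely standardized: error = 1 - loading^2

print([round(x, 2) for x in loadings])
print([round(x, 2) for x in errors])
```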
Chap. 4
Reliability and Validity of a Measurement Scale
Cronbach's Alpha: Reliability of measurement
Coefficient alpha is widely used as a measure of reliability.
However, its connection to the definition of reliability may be less evident than is the case for other measures of reliability.
Alpha may therefore appear more mysterious than other reliability computation methods to those who are not familiar with its internal workings.
An exploration of the logic underlying the computation of alpha provides a sound basis for comparing how other computational methods capture the essence of what we mean by reliability.
We will concentrate on the more general form of alpha, which applies to items having multiple response options.
Reliability of measurement
You can think about all the variability in a set of item scores as due to one of two things:
1. Actual variation across individuals in the phenomenon that the scale measures (i.e., true
variation in the latent variable)
2. Error.
Classical measurement models define
the phenomenon (e.g., patients' desire for control of interactions with a physician) as the source of all shared variation, and
error as any remaining, or unshared, variation in scale scores (e.g., a single item's unintended double meaning).
Another way to think about this is to regard total variation as having two components:
1. Signal (i.e., true differences in patients' desire for control);
2. Noise (i.e., score differences caused by everything but true differences in desire for control).
Computing alpha partitions the total variance among the set of items into signal and noise components.
The proportion of total variation that is signal equals alpha.
Reliability of measurement: the covariance matrix
To understand internal consistency more fully, it helps to examine the covariance matrix of a set of scale items.
A covariance matrix for a set of scale items reveals important information about the scale as a whole.
A covariance matrix is a more general form of a correlation matrix: in a correlation matrix, the data have been standardized, with the variances set to 1.0.
In a covariance matrix, the data entries are unstandardized; thus, it contains the same information as a correlation matrix, in unstandardized form.
The diagonal elements of a covariance matrix are variances (covariances of items with themselves), just as the unities along the main diagonal of a correlation matrix are the variables' variances standardized to 1.0 and also their correlations with themselves.
Its off-diagonal values are covariances, expressing relationships between pairs of unstandardized variables, just as correlation coefficients do for standardized variables.
Conceptually, then, a covariance matrix consists of (a) variances (on the diagonal) for individual variables and (b) covariances (off the diagonal) representing the unstandardized relationship between variable pairs.
A typical covariance matrix for three variables (X1, X2, and X3) is shown in Table 3.1.
        X1        X2        X3
X1      Var1      Cov1,2    Cov1,3
X2      Cov1,2    Var2      Cov2,3
X3      Cov1,3    Cov2,3    Var3
The same matrix is displayed somewhat more compactly using the customary symbols for matrices, variances, and covariances:
$$\begin{bmatrix} \sigma^2_1 & \sigma_{1,2} & \sigma_{1,3} \\ \sigma_{1,2} & \sigma^2_2 & \sigma_{2,3} \\ \sigma_{1,3} & \sigma_{2,3} & \sigma^2_3 \end{bmatrix}$$
Let us focus our attention on the properties of a covariance matrix for a set of
items that, when added together, make up a scale.
The covariance matrix presented above has three variables: X1, X2, and X3.
Assume that these variables are actually scores for three items and that the items (X1, X2, and X3), when added together, make up a scale we will call Y.
What can this matrix tell us about the relationship of the individual items to the
scale as a whole?
A covariance matrix has a number of interesting (well, useful, at least) properties.
Among these is the fact that adding all the elements in the matrix together (i.e., summing the variances, which are along the diagonal, and the covariances, which are off the diagonal) gives a value that is exactly equal to the variance of the scale as a whole, assuming that the items are equally weighted.
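So, if we add all the terms in the symbolic covariance matrix, the variance of a scale (Y) made up of three equally weighted items (X1, X2, X3) has the following relationship to the covariance matrix of the items:
$$\sigma^2_Y = \sigma^2_1 + \sigma^2_2 + \sigma^2_3 + 2(\sigma_{1,2} + \sigma_{1,3} + \sigma_{2,3}) = \sum_i \sigma^2_i + C,$$
where C is the sum of all the off-diagonal (covariance) elements of the matrix.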
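The property that the sum of all covariance-matrix elements equals the variance of an equally weighted scale can be checked numerically. The following is a minimal NumPy sketch. For illustration, it reuses the five-respondent anxiety-scale data from the practical example in this chapter. Both variances use the same n − 1 denominator.

```python
import numpy as np

# Rows = respondents, columns = items X1, X2, X3
items = np.array([[2, 3, 2],
                  [1, 1, 2],
                  [2, 2, 2],
                  [3, 3, 3],
                  [1, 1, 1]], dtype=float)

cov = np.cov(items, rowvar=False)      # 3 x 3 item covariance matrix (n - 1 denominator)
scale = items.sum(axis=1)              # Y = X1 + X2 + X3, equally weighted

var_y = scale.var(ddof=1)              # variance of the scale scores
sum_of_elements = cov.sum()            # all variances + all covariances
sum_of_variances = np.trace(cov)       # diagonal elements only
sum_of_covariances = sum_of_elements - sum_of_variances   # off-diagonal elements

print(var_y, sum_of_elements, sum_of_variances, sum_of_covariances)
```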
Reliability of measurement: Alpha and the covariance matrix
Alpha is defined as the proportion of a scale's total variance that is attributable to a common source, presumably the true score of a latent variable underlying the items.
To compute alpha, it would be useful to have a value for the scale's total variance and a value for the proportion that is "common" variance.
The covariance matrix is just what we need in order to do this.
All the variation in items that is due to the latent variable Y is shared or common (the terms joint and communal are also used to describe this variation).
When Y varies (as it will, for example, across individuals having different levels of the attribute it represents), the scores of all the items will vary with it because it is a cause of those scores.
If Y is high, all the item scores will tend to be high; if Y is low, they will tend to be low.
This means that the items will tend to vary jointly (i.e., be correlated with one another).
So the latent variable affects the items and, thus, they are correlated.
The error terms, in contrast, are the unique variation that each item possesses.
Whereas all items share variability due to Y, no two share any variation from the same error source under our classical measurement assumptions.
The value of a given error term affects only one item.
The total variance of the scale (σ²Y) is defined as the sum of all elements in the matrix.
The sum of the individual item variances (Σσ²i) is computed by summing the entries along the main diagonal.
These two values can be given a conceptual interpretation.
The sum of the whole matrix is, by definition, the variance of Y, the scale made up of the individual items.
All the variances (diagonal elements) are single-variable, or "variable-with-itself," terms.
Each variance contains information about only one item;
each represents information that is based on a single item, not joint variation shared among items.
An item's variance quantifies all of the variation in that item, irrespective of what causes it, without isolating the part shared with other items.
The off-diagonal elements of the covariance matrix all involve pairs of terms and, thus, common (or joint) variation between two of the scale's items (covariation).
The elements of the covariance matrix (and, hence, the total variance of Y) thus consist of covariation (joint variance, if you will) plus "nonjoint" or "noncommunal" variation concerning items considered individually.
Covariances, and only covariances, represent communal variation.
All noncommunal variation must therefore be represented in the variances along the main diagonal of the covariance matrix.
The total variance, σ²Y, is of course expressed by the sum of all the matrix elements.
We can thus express the ratio of nonjoint variation to total variation in Y as
$$\frac{\sum \sigma^2_i}{\sigma^2_Y}$$
The numerator of this ratio corresponds to the sum of the diagonal values in the covariance matrix. It thus follows that we can express the proportion of joint, or communal, variation as what is left over, in other words, the complement of this value:
$$1 - \frac{\sum \sigma^2_i}{\sigma^2_Y}$$
This value corresponds to the sum of all the off-diagonal values of the covariance matrix, expressed as a proportion of σ²Y.
The formula involving subtraction from 1 is a legacy of the days when computers were not available to do calculations:
computing the total variance for Y and the variance for each individual item (i) were probably operations that had already been done for other purposes.
Even if there were no need to calculate these variances for other purposes, consider the computational effort involved in summing every off-diagonal covariance by hand.
A formula that quantifies communal variance as what remains after removing noncommunal from total variance therefore makes more practical sense than it might first appear to.
The value represented by the formula $1 - (\sum \sigma^2_i / \sigma^2_Y)$ or, equivalently, $\sum_{i \neq j} \sigma_{i,j} / \sigma^2_Y$ would at first seem to capture the definition of alpha (i.e., the communal portion of total variance in a scale that can be attributed to the items' common source, the latent variable).
However, we still need one more correction.
Consider, for example, a scale made up of five items that are all perfectly correlated with one another. The correlation matrix in this instance would consist of a 5 × 5 matrix with all values equal to 1.0.
The denominator of the preceding equation, representing the total variance of the scale comprising the five items, would thus equal 25.
The numerator, however, would equal only 20, thus yielding a reliability of 20/25 (or .80) rather than 1.0. Why is this so?
The total number of elements in the covariance matrix is k².
The number of elements in the matrix that are noncommunal (those on the main diagonal) is k.
The number that are communal (all those not on the diagonal) is k² − k.
The fraction in our last formula thus has a numerator based on k² − k values and a denominator based on k² values.
To adjust our calculations so that the ratio expresses the relative magnitudes, rather than the numbers of terms that are summed in the numerator and denominator,
we multiply the entire expression representing the proportion of communal variation by a value that counteracts the difference in the numbers of terms summed.
To do this, we multiply by k²/(k² − k), or, equivalently, k/(k − 1).
This sets the maximum possible value of alpha at 1.0 (in practice, alpha typically falls between 0.0 and 1.0).
It should now be apparent that k/(k − 1) is always the multiplier that will yield an alpha of 1.0 when the items are all perfectly correlated (for the five items above, 5/4 × .80 = 1.0).
$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum \sigma^2_i}{\sigma^2_Y}\right)$$
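The alpha formula can be sketched directly from a covariance matrix. The function name `alpha_from_cov` is hypothetical. The example reproduces the five perfectly correlated items discussed above: the uncorrected ratio is .80, and the k/(k − 1) correction brings alpha to 1.0.

```python
import numpy as np


def alpha_from_cov(cov):
    """Coefficient alpha: k/(k - 1) * (1 - sum of item variances / total variance)."""
    cov = np.asarray(cov, dtype=float)
    k = cov.shape[0]
    total_var = cov.sum()      # sigma^2_Y: sum of all matrix elements
    item_var = np.trace(cov)   # sum of the diagonal (noncommunal) elements
    return k / (k - 1) * (1 - item_var / total_var)


# Five perfectly correlated, standardized items: a 5 x 5 matrix of 1.0s
perfect = np.ones((5, 5))
uncorrected = 1 - np.trace(perfect) / perfect.sum()   # 1 - 5/25 = 20/25 = .80

print(uncorrected, alpha_from_cov(perfect))
```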
To summarize:
A measure's reliability equals the proportion of total variance among its items that is due to the latent variable.
The formula for alpha expresses this by specifying the portion of total variance for the item set that is unique,
subtracting this from 1 to determine the proportion that is communal, and multiplying by a correction factor to adjust for the number of elements contributing to the earlier computations.
Alternate-forms Reliability
If two strictly parallel forms of a scale exist, then the correlation between
them can be computed as long as the same people complete both
parallel forms.
Assume that a researcher
first developed two equivalent sets of items measuring patients' desire for control when interacting with physicians,
administered both sets of items to a group of patients, and
correlated the scores from one set of items with the scores from the other set.
This correlation would be the alternate-forms reliability.
Recall that parallel forms are made up of items, all of which (either within or between forms) do an equally good job of measuring the latent variable.
Both forms of the scale therefore have identical alphas, means, and variances, and measure the same phenomenon.
In essence, parallel forms consist of one set of items that has more or less arbitrarily been divided into two subsets that make up the two parallel, alternate forms of the scale.
Under these conditions, the correlation between one form and the other is
equivalent to correlating either form with itself, as each alternate form is
equivalent to the other.
Split-Half Reliability
We usually do not have two versions of a scale that conform strictly to the assumptions of parallel tests.
However, because alternate forms are essentially made up of a single pool of items divided in two, we can apply the same logic:
take the set of items that make up a single scale (i.e., a scale that does not have any alternate form),
divide that set of items into two subsets, and
correlate the subsets to assess reliability.
A reliability measure of this type is called a split-half reliability.
Split-half reliability is really a class rather than a single type of computational method, because there are a variety of ways in which the scale can be split in half.
One method is to compare the first half of the items with the second half.
However, if the items making up the scale in question were scattered throughout a lengthy questionnaire, the respondents might be more fatigued when completing the second half of the scale.
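Split-half reliability correlates the summed scores of two item subsets. In the Python sketch below, the 4-item, 6-respondent data set is hypothetical. An odd–even split is shown alongside the first-half/second-half split as one of the other ways a scale can be divided. Because it alternates items, it spreads any fatigue effect across both halves.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half(scores, first, second):
    """Correlate the summed scores of two item subsets (lists of column indices)."""
    half1 = [sum(row[i] for i in first) for row in scores]
    half2 = [sum(row[i] for i in second) for row in scores]
    return pearson(half1, half2)


# Hypothetical data: 6 respondents x 4 items
scores = [[4, 5, 4, 4],
          [2, 2, 3, 2],
          [5, 4, 5, 5],
          [3, 3, 2, 3],
          [1, 2, 1, 2],
          [4, 4, 5, 4]]

first_vs_second = split_half(scores, [0, 1], [2, 3])   # first half vs second half
odd_vs_even = split_half(scores, [0, 2], [1, 3])       # items 1 and 3 vs items 2 and 4

print(round(first_vs_second, 3), round(odd_vs_even, 3))
```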
Reliability of measurement: practical example
Here is a fictitious example (kept simple to ease the calculations) of results obtained on an anxiety scale.
Because the score on each item of the scale can range from 1 to 3, the reliability index cannot be calculated with the KR-20 formula, which applies only to dichotomous items.
We must therefore calculate Cronbach's alpha.
Subject   Item1   Item2   Item3   Total X   X²
1         2       3       2       7         49
2         1       1       2       4         16
3         2       2       2       6         36
4         3       3       3       9         81
5         1       1       1       3         9
Sum                               29        191
Sum of item scores (ΣI)            9     10     10
Sum of squared item scores (ΣI²)   19    24     22
Item variance (S²I)                .56   .80    .40
Compute the variance of the total scores and the variance of each item using the following formula:
S²X = (Σx² − (Σx)²/N) / N
For the variance of the test scores:
S²X = (191 − (29)²/5) / 5
S²X = (191 − 168.2) / 5
S²X = 4.56
Compute the variance of each item using the same formula. Here is the calculation for the variance of item 1:
Variance of item 1
S²1 = (19 − (9)²/5) / 5
S²1 = .56
Variance of item 2
S²2 = (24 − (10)²/5) / 5
S²2 = .80
Variance of item 3
S²3 = (22 − (10)²/5) / 5
S²3 = .40
Adding the variances of the 3 items gives:
Σ S²i = .56 + .80 + .40 = 1.76
Let us apply the alpha formula, substituting the values already calculated (j being the number of items):
α = (j/(j − 1)) (S²X − Σ S²i) / S²X
α = (3/(3 − 1)) (4.56 − 1.76) / 4.56
α = (3/2)(.614)
α = .921
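The hand calculation can be reproduced in Python with the population variance from the standard `statistics` module. Like the formula above, it divides by N.

```python
from statistics import pvariance

# Scores of the five subjects (rows) on the three items (columns)
data = [[2, 3, 2],
        [1, 1, 2],
        [2, 2, 2],
        [3, 3, 3],
        [1, 1, 1]]

j = len(data[0])                                   # number of items
totals = [sum(row) for row in data]                # total score X of each subject
s2_total = pvariance(totals)                       # (sum x^2 - (sum x)^2 / N) / N
s2_items = [pvariance(col) for col in zip(*data)]  # variance of each item, same formula

alpha = (j / (j - 1)) * (s2_total - sum(s2_items)) / s2_total
print(round(s2_total, 2), [round(v, 2) for v in s2_items], round(alpha, 3))
```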
The same analysis run in SPSS gives the following output:
Case Processing Summary
                        N     %
Cases   Valid           5     100.0
        Excluded(a)     0     .0
        Total           5     100.0
a. Listwise deletion based on all variables in the procedure.
Reliability Statistics
Cronbach's Alpha   N of Items
.921               3
Item Statistics
         Mean   Std. Deviation   N
ITEM1    1.80   .837             5
ITEM2    2.00   1.000            5
ITEM3    2.00   .707             5
Item-Total Statistics
         Scale Mean if   Scale Variance if   Corrected Item-     Cronbach's Alpha
         Item Deleted    Item Deleted        Total Correlation   if Item Deleted
ITEM1    4.00            2.500               .945                .800
ITEM2    3.80            2.200               .843                .909
ITEM3    3.80            3.200               .791                .938
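The SPSS item-total statistics can be reproduced with a short Python sketch. SPSS divides by N − 1, which is why its standard deviation for ITEM1 (.837) corresponds to a variance of .70 rather than .56. Alpha itself is unaffected, because the same factor appears in the numerator and the denominator. The helper names below are hypothetical.

```python
from statistics import mean, variance

# Same five subjects x three items; variances use the n - 1 denominator, as in SPSS
data = [[2, 3, 2],
        [1, 1, 2],
        [2, 2, 2],
        [3, 3, 3],
        [1, 1, 1]]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(rows):
    """Alpha = k/(k - 1) * (1 - sum of item variances / variance of totals)."""
    k = len(rows[0])
    item_vars = [variance(col) for col in zip(*rows)]
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


item_total = []
for i in range(len(data[0])):
    rest = [[v for j, v in enumerate(row) if j != i] for row in data]   # drop item i
    rest_totals = [sum(r) for r in rest]
    item = [row[i] for row in data]
    item_total.append((
        mean(rest_totals),            # Scale Mean if Item Deleted
        variance(rest_totals),        # Scale Variance if Item Deleted
        pearson(item, rest_totals),   # Corrected Item-Total Correlation
        cronbach_alpha(rest),         # Cronbach's Alpha if Item Deleted
    ))

print(round(cronbach_alpha(data), 3))
for name, row in zip(["ITEM1", "ITEM2", "ITEM3"], item_total):
    print(name, [round(x, 3) for x in row])
```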
Validity
Validity is an overused term.

Sometimes, it is used to mean “true” or “correct.”
There are several general types of validity.
When we say that an indicator is valid, it is valid for a particular purpose and
definition.
The same indicator may be less valid or invalid for other purposes.
The measure of morale discussed above (e.g., questions about feelings toward
school) might be valid for measuring morale among teachers but invalid for
measuring morale among police officers.
Measurement validity tells us how well the conceptual and operational definitions mesh with one another.
The better the fit, the higher is the measurement validity.
Validity is more difficult to achieve than reliability.
We cannot have absolute confidence about validity, but some measures are
more valid than others.
The reason is that constructs are abstract ideas, whereas indicators refer to
concrete observation.
Validity is part of a dynamic process that grows by accumulating evidence over time, and without it, all measurement becomes meaningless.
Some researchers use rules of correspondence (discussed earlier) to reduce the
gap between abstract ideas and specific indicators.
For example, a rule of correspondence might state that a teacher who agrees with the statements "things have gotten worse at this school in the past years" and "there is little hope for improvement" is indicating low morale.
Validity: face validity
Face validity is the most basic and easiest type of validity to achieve.
It is a judgment by the scientific community that the indicator really measures the construct.
It addresses the question:
On the face of it, do people believe that the definition and method of measurement fit?
For example, few people would accept a measure of college student math ability by asking
students what 2 + 2 equals.
This is not a valid measure of college-level math ability on the face of it.
Recall that the principle of organized skepticism in the scientific community means that
others scrutinize aspects of research.
Validity: content validity
Content validity addresses this question: is the full content of a definition represented in a measure?
A conceptual definition holds ideas; it is a "space" containing ideas and concepts.
Measures should sample or represent all ideas or areas in the conceptual space.
Content validity involves three steps:
1. Specify the content in a construct's definition.
2. Sample from all areas of the definition.
3. Develop one or more indicators that tap all of the parts of the definition.
Let us consider an example of content validity.
I define feminism as a person's commitment to a set of beliefs creating full equality between men and women in areas of the arts, intellectual pursuits, family, work, politics, and authority relations.
I create a measure of feminism in which I ask two survey questions:
1. Should men and women get equal pay for equal work?
2. Should men and women share household tasks?
My measure has low content validity because the two questions ask only about pay and household tasks; they ignore the other areas of the definition, such as the arts, intellectual pursuits, politics, and authority relations.
Validity: Criterion validity
Criterion validity uses some standard or criterion to indicate a construct accurately.
The validity of an indicator is verified by comparing it with another measure of the same construct in which a researcher has confidence.
The two subtypes of this type of validity are concurrent validity and predictive validity.
Concurrent validity
To have concurrent validity, an indicator must be associated with a preexisting indicator that we already judge to be valid (i.e., it has face validity).
For example, we create a new test to measure intelligence.
For it to be concurrently valid, it should be highly associated with existing IQ tests (assuming the
same definition of intelligence is used).
This means that most people who score high on the old measure should also
score high on the new one, and vice versa.
The two measures may not be perfectly associated, but if they measure the
same or a similar construct, it is logical for them to yield similar results.
Construct Validity
Construct validity is for measures with multiple indicators.

It addresses this question: If the measure is valid, do the various indicators
operate in a consistent manner?
It requires a definition with clearly specified conceptual boundaries.
The two types of construct validity are convergent and discriminant.
Validity: convergent validity
Convergent validity applies when multiple indicators converge or are associated with one another.
It means that multiple measures of the same construct hang together or operate in similar ways.
For example, I measure the construct "education" by asking people how much education they have completed, looking up school records, and asking the people to complete a test of school knowledge.
If the measures do not converge (e.g., people who claim to have a college degree have no records of attending college),
my measure has weak convergent validity, and I should not combine all three indicators into one measure.
Convergent validity was tested by two methods:
1. Inspecting the average variance extracted (AVE) for each dimension.
The recommended AVE is .50 or higher to provide evidence of convergent validity (Walsh and Beatty, 2007).
The AVE for each of the five dimensions of customer satisfaction is greater than .50 (Fornell and Larcker, 1981), which indicates convergent validity.
2. Comparing the t-values for all items with the standard norm of 1.96.
All 17 items exhibit t-values well above the 1.96 standard (Anderson and Gerbing, 1988).
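The following is a minimal sketch of the two convergent-validity checks described above. It assumes standardized loadings with error variances equal to 1 − λ², so that the AVE is the mean squared loading. The loadings and t-values are hypothetical and are not the study's estimates.

```python
def ave(std_loadings):
    """Average variance extracted: mean of the squared standardized loadings.
    Assumes each indicator's error variance equals 1 - loading**2."""
    return sum(l ** 2 for l in std_loadings) / len(std_loadings)


# Hypothetical standardized loadings and t-values for one dimension
loadings = [0.82, 0.76, 0.71, 0.79]
t_values = [14.2, 12.8, 11.5, 13.6]

dimension_ave = ave(loadings)
convergent_ave = dimension_ave >= 0.50           # check 1: AVE of .50 or higher
convergent_t = all(t > 1.96 for t in t_values)   # check 2: every t-value above 1.96

print(round(dimension_ave, 3), convergent_ave, convergent_t)
```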
Validity: discriminant validity
Discriminant validity is the opposite of convergent validity:
indicators of one construct "hang together," or converge, but are also negatively associated with opposing constructs.
Discriminant validity says that if two constructs A and B are very different, measures of A and B should not be associated.
For example, I have ten items that measure political conservatism, and people answer all ten in similar ways.
I also put five questions that measure political liberalism on the same
questionnaire.
My measure of conservatism has discriminant validity if the ten conservatism
items converge and are negatively associated with the five liberalism ones.
Model   Number of dimensions   CFI   IFI   GFI   AGFI   RMSEA
1       5                      .95   .95   .95   .95    .05
2       4                      .90   .90   .91   .88    .07
3       3                      .88   .88   .90   .86    .08
4       2                      .87   .87   .89   .89    .08
5       1                      .65   .65   .75   .68    1.14
(CFI, IFI, GFI, AGFI, and RMSEA are fit indices.)

Main dimension        AVE   1             2             3             4            5
1. SOCIAL             .63   1
2. ACCESSIBILITY      .76   .06* (.004)   1
3. TANGIBLE           .52   .14** (.02)   .24** (.06)   1
4. PRICE CONDITIONS   .72   .14** (.02)   .50** (.25)   .32** (.10)   1
5. RELIABILITY        .64   .12** (.01)   .32** (.10)   .39** (.15)   .39* (.15)   1
Mean                        2.68          4.73          5.47          4.32         5.57
Standard deviation          1.24          1.67          1.27          1.50
Values in parentheses are squared correlations.
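The table reports each dimension's AVE alongside the squared correlations between dimensions. A common way to read such a table is the Fornell and Larcker (1981) criterion: discriminant validity holds when each AVE exceeds the squared correlations involving that dimension. That this is the intended reading is our assumption, not stated on the slide. Under that assumption, the check can be sketched in Python:

```python
# AVE of each dimension and squared inter-dimension correlations (values in
# parentheses in the table above)
ave_by_dim = {"SOCIAL": 0.63, "ACCESSIBILITY": 0.76, "TANGIBLE": 0.52,
              "PRICE CONDITIONS": 0.72, "RELIABILITY": 0.64}

squared_corr = {
    ("ACCESSIBILITY", "SOCIAL"): 0.004,
    ("TANGIBLE", "SOCIAL"): 0.02,
    ("TANGIBLE", "ACCESSIBILITY"): 0.06,
    ("PRICE CONDITIONS", "SOCIAL"): 0.02,
    ("PRICE CONDITIONS", "ACCESSIBILITY"): 0.25,
    ("PRICE CONDITIONS", "TANGIBLE"): 0.10,
    ("RELIABILITY", "SOCIAL"): 0.01,
    ("RELIABILITY", "ACCESSIBILITY"): 0.10,
    ("RELIABILITY", "TANGIBLE"): 0.15,
    ("RELIABILITY", "PRICE CONDITIONS"): 0.15,
}

# Fornell-Larcker check: both AVEs of a pair must exceed the pair's squared correlation
violations = [pair for pair, r2 in squared_corr.items()
              if r2 >= min(ave_by_dim[pair[0]], ave_by_dim[pair[1]])]
print("violations:", violations)
```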

Validity: nomological validity
To establish nomological validity, we examine how well a scale relates to other variables.
For example, for the CBR scale, four customer outcome scales measuring customer satisfaction, loyalty, trust, and word of mouth are examined.
These four measures are expected to be positively associated with corporate reputation.
To show that a measure has nomological validity, the correlations between the measure and other related constructs should behave as expected in theory (Churchill, 1995).
Nomological validity (Balemba, 2017)
A structural model was estimated linking customer satisfaction, as measured by the average index of the five dimensions, to loyalty (α = .83), customer retention (α = .80), and the intention to leave MFIs (α = .92), with the following expected relationships:
a direct positive relationship between customer satisfaction and loyalty, which in turn leads to retention (Heskett et al., 2008);
a direct positive relationship between customer satisfaction and retention (Bugel, Verhoef, & Buunk, 2011);
a direct negative relationship between retention and the intention to leave the institution (Athanassopoulos et al., 2001).
Another application estimates a structural model that connects employee job satisfaction (EJS) with its antecedents and consequences (Walsh and Beatty, 2007):
internal marketing and perceived organisational performance serve as the main antecedent and consequence of employee job satisfaction, respectively, with reference to both a service-profit chain model (Heskett et al., 2008) and equity theory (Schneider et al., 1985).
There is a positive relationship between internal service quality, EJS, firm profitability, and revenue growth (Gelade and Young, 2005).
Illustrations pratiques
Bugandwa, C. T., K. E Balemba, A. D. Bugandwa Mungu, N. A. and M. B Haguma

(2021), Linking Corporate Social Responsibility to Trust in the Banking sector:
exploring disaggregated relations, International Journal of Bank Marketing, 39 (4):
592-617
Cubaka P, Balemba EK, Bugandwa D et Labie M (2019), Appropriation des
coopératives d’épargne et de crédit par leurs membres à Bukavu: mesure et
déterminants, Mondes en Développement, 47 (188) : 127-148
Haguma B, Balemba EK and Bitakuya W (2019), Relation entre la microfinance et la
performance perçue des PME à Bukavu: rôles modérateur et médiateur de
l’opportunité entrepreneuriale et la prise de risque, Finance, Contrôle et Stratégie, 22
(4), 1-41
1. Balemba EK (2017), Customer Satisfaction with Services of Microfinance
Institutions: Scale Development and Validation, Strategic Change:
Briefings in Entrepreneurial Finance, 26 (6): 563-574
2. Balemba EK and Bugandwa DM (2016), Internal Marketing, Employee Job
Satisfaction and Perceived Organizational performance in microfinance
institutions, International Journal of Bank Marketing, 34 (5): 773-796
Chap 4
Structural Equation Models

Introduction
Structural Equation Modeling (SEM):
Statistical methodology that takes a confirmatory (i.e., hypothesis-testing)
approach to the multivariate analysis of a structural theory bearing on some
phenomenon.
The theory represents “causal” processes that generate observations on
multiple variables (Bentler, 1988).
The term structural equation modeling conveys two important aspects of the
procedure:
1. The causal processes under study are represented by a series of structural (i.e.,
regression) equations.
2. The structural relations can be modeled pictorially to enable a clearer
conceptualization of the theory under study.
The hypothesized model can then be tested statistically in a simultaneous analysis of
the entire system of variables to determine the extent to which it is consistent with
the data.
If goodness-of-fit is adequate, the model argues for the plausibility of the postulated
relations among variables; if it is inadequate, the tenability of such relations is
rejected.
Several aspects of SEM set it apart from the older generation of multivariate procedures (see
Fornell, 1982):
1. It takes a confirmatory, rather than an exploratory, approach to the data analysis (although
aspects of the latter can be addressed).
By demanding that the pattern of intervariable relations be specified a priori, SEM lends
itself well to the analysis of data for inferential purposes.
By contrast, most other multivariate procedures are essentially descriptive by nature (e.g.,
exploratory factor analysis), so that hypothesis testing is difficult, if not impossible.
2. Whereas traditional multivariate procedures are incapable of either assessing or correcting for
measurement error, SEM provides explicit estimates of these parameters.
3. Whereas data analyses using the former methods are based on observed measurements only,
those using SEM procedures can incorporate both unobserved (i.e., latent) and observed
variables.
SEM has become a popular methodology for nonexperimental research, where
methods for testing theories are not well developed and ethical considerations
make experimental design unfeasible (Bentler, 1980).
Structural equation modeling can be utilized very effectively to address
numerous research problems involving nonexperimental research.
Introduction: Latent Versus Observed Variables
In the behavioral sciences, researchers are often interested in studying
theoretical constructs that cannot be observed directly.
These abstract phenomena are termed latent variables, or factors.
Examples of latent variables in psychology, sociology, education, and economics:
Self-concept and motivation;
Powerlessness and anomie;
Verbal ability and teacher expectancy;
Capitalism and social class.
Because latent variables are not observed directly, it follows that they cannot be measured
directly.
The researcher must operationally define the latent variable of interest in terms of
behavior believed to represent it.
The unobserved variable is thus linked to one that is observable, thereby making its
measurement possible.
Assessment of the behavior constitutes the direct measurement of an observed variable,
but only the indirect measurement of an unobserved variable (i.e., the underlying construct).
The term behavior is used here in the very broadest sense to include scores on a
particular measuring instrument.
Observation may include, for example, self-report responses to an attitudinal scale or scores on
an achievement test.
These measured scores (i.e., measurements) are termed observed or manifest
variables;
they serve as indicators of the underlying construct that they are presumed
to represent.
The researcher examines the covariation among a set of observed variables in
order to gather information on their underlying latent constructs (i.e., factors).
There are two basic types of factor analysis:
1. Exploratory factor analysis (EFA)
2. Confirmatory factor analysis (CFA)
EFA is designed for the situation where links between the observed and latent
variables are unknown.
The analysis thus proceeds in an exploratory mode to determine how, and
to what extent, the observed variables are linked to their underlying factors.
The researcher wishes to identify the minimal number of factors that underlie
(or account for) covariation among the observed variables.
This factor analytic approach is considered to be exploratory in the sense that
the researcher has no prior knowledge that the items do, indeed, measure
the intended factors.
CFA is appropriately used when the researcher has some knowledge of the
underlying latent variable structure.
Based on knowledge of the theory, empirical research, or both, he or she
postulates relations between the observed measures and the underlying
factors a priori, and then tests this hypothesized structure statistically.
For example, in a multifactor model of self-concept, a priori specification of the CFA model
would allow all sport competence self-concept items to load freely on the sport competence
factor, but would restrict them to zero loadings on the remaining factors.
The model would then be evaluated by statistical means to determine the
adequacy of its goodness of fit to the sample data.
The factor analytic model (EFA or CFA) focuses solely on how, and the
extent to which, the observed variables are linked to their underlying latent factors.
It is concerned with the extent to which the observed variables are generated
by the underlying latent constructs; thus the strength of the regression paths
from the factors to the observed variables (the factor loadings) is of
primary interest.
Because the CFA model focuses solely on the link between factors and their
measured variables, within the framework of SEM it represents what has
been termed a measurement model.
Introduction: The Full Latent Variable Model
The full latent variable (LV) model allows for the specification of the regression structure
among the latent variables.
The researcher can hypothesize the impact of one latent construct on another in the modeling
of causal direction.
This model is termed full (or complete) because it comprises both a measurement model and
a structural model:
The measurement model
Depicting the links between the latent variables and their observed measures (i.e., the CFA
model)
The structural model
Depicting the links among the latent variables themselves.
Statistical models provide an efficient and convenient way of describing the
latent structure underlying a set of observed variables.
Expressed either diagrammatically or mathematically via a set of equations,
such models explain how the observed and latent variables are related to one
another.
A researcher postulates a statistical model based on his or her knowledge of
the related theory, on empirical research in the area of study, or some
combination of both.
Once the model is specified, the researcher then tests its plausibility based
on sample data that comprise all observed variables in the model.
The primary task in this model-testing procedure is to determine the
goodness of fit between the hypothesized model and the sample data.
The researcher imposes the structure of the hypothesized model on the
sample data,
then tests how well the observed data fit this restricted structure.
Because it is highly unlikely that a perfect fit will exist between the observed data
and the hypothesized model,
there will necessarily be a discrepancy between the two;
this discrepancy is termed the residual.
Data = Model + Residual

Data
Represent the score measurements on the observed variables, obtained from the persons
comprising the sample.
Model
Represents the hypothesized structure linking the observed variables to the latent
variables and, in some models, linking particular latent variables to one another.
Residual
Represents the discrepancy between the hypothesized model and the observed data.
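To make Data = Model + Residual concrete, the sketch below assumes a one-factor model with three indicators. The model-implied covariance matrix is Σ = φλλᵀ + Θ (λ: loadings, φ: factor variance, Θ: diagonal matrix of error variances), and the residual is S − Σ. All numbers are invented for illustration; in practice the parameters are estimated from the data, for example by maximum likelihood, rather than set by hand.

```python
def implied_cov(loadings, factor_var, error_vars):
    """One-factor implied covariance: lambda_i * lambda_j * phi, plus theta_i on the diagonal."""
    p = len(loadings)
    sigma = [[loadings[i] * loadings[j] * factor_var for j in range(p)] for i in range(p)]
    for i in range(p):
        sigma[i][i] += error_vars[i]
    return sigma


def residual(S, sigma):
    """Residual matrix: data (S) minus model (Sigma), element by element."""
    return [[s - m for s, m in zip(row_s, row_m)] for row_s, row_m in zip(S, sigma)]


# Hypothetical sample covariance matrix S for three indicators
S = [
    [1.00, 0.56, 0.48],
    [0.56, 1.00, 0.40],
    [0.48, 0.40, 1.00],
]
# Hypothetical parameter values (not estimates from real data)
lam = [0.8, 0.7, 0.6]
sigma = implied_cov(lam, factor_var=1.0, error_vars=[0.36, 0.51, 0.64])
for row in residual(S, sigma):
    print([round(v, 3) for v in row])
```

With these values the model reproduces every element of S except the covariance between indicators 2 and 3, where the residual is −0.02.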
Within the general strategic framework for testing structural equation models,
Jöreskog (1993) distinguished among three scenarios:
Strictly confirmatory (SC) models;
Alternative models (AM);
Model generating (MG).
Strictly confirmatory (SC)
The researcher postulates a single model based on theory, collects the
appropriate data, and then tests the fit of the hypothesized model to the
sample data.
From the results of this test, the researcher either rejects or fails to reject the
model.
No further modifications to the model are made.
Alternative models (AM)
The researcher proposes several alternative (i.e., competing) models, all of
which are grounded in theory.
Following analysis of a single set of empirical data, he or she selects one model as
most appropriate in representing the sample data.
Model generating (MG).
Represents the case where the researcher, having postulated and rejected a
theoretically derived model on the basis of its poor fit to the sample data,
proceeds in an exploratory (rather than confirmatory) fashion to modify and
reestimate the model.
The primary focus here is to locate the source of misfit in the model and to determine a
model that better describes the sample data.
The MG situation is considered the most common of the three scenarios, and for good
reason.
Given the many costs associated with the collection of data, it would be a
rare researcher indeed who could afford to terminate his or her research on
the basis of a rejected hypothesized model.
The SC scenario is not commonly found in practice.
Fundamental concepts
Marketing researchers often have to answer questions that are related to one
another.
A service company, for example, may face the following questions:
Which variables determine service quality?
How does service quality influence attitude toward, and satisfaction with, the
service?
How does satisfaction encourage the intention to recommend?
How does attitude toward the service combine with other variables
to influence the intention to recommend?
The traditional statistical tools we are familiar with cannot provide a single
analysis that answers all of these questions.
This is why researchers turn to structural equation modeling (SEM).
SEM helps assess the properties of the measures and test the proposed
theoretical relationships within a single technique.
Drawing on theory and previous research, one might, for example,
posit that service quality has five dimensions or
factors: tangibles, reliability, responsiveness, assurance, and empathy.
Service quality can be described as a latent construct that cannot be
observed or measured directly, but that can be represented through
these five observed or measured dimensions.
SEM makes it possible to determine the contribution of each dimension
to the representation of service quality.
It assesses how well the set of observed variables used to measure these
dimensions represents service quality.
SEM establishes construct reliability, and this information then helps in
estimating the relationships between service quality and the other constructs.
Service quality has a direct, positive influence on both attitude toward
and satisfaction with the service.
These two constructs in turn determine the intention to recommend.
Attitude toward and satisfaction with the service are therefore both
dependent and independent variables in this theory.
A hypothesized dependent variable (attitude/satisfaction) can become an
independent variable in a subsequent dependence relationship (explaining
the intention to recommend).
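Written as structural equations, the service-quality example could take the form below. The notation is an assumption borrowed from the usual LISREL convention and is not introduced in the source at this point: ξ₁ is service quality (exogenous); η₁ attitude, η₂ satisfaction and η₃ the intention to recommend (endogenous); γ and β are structural coefficients; ζ are structural errors.

$$
\begin{aligned}
\eta_1 &= \gamma_{11}\,\xi_1 + \zeta_1 && \text{(attitude)}\\
\eta_2 &= \gamma_{21}\,\xi_1 + \zeta_2 && \text{(satisfaction)}\\
\eta_3 &= \beta_{31}\,\eta_1 + \beta_{32}\,\eta_2 + \zeta_3 && \text{(intention to recommend)}
\end{aligned}
$$

η₁ and η₂ appear on the left-hand side of one equation and on the right-hand side of another, which is what it means for attitude and satisfaction to be both dependent and independent variables.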
SEM examines the structure of the interrelationships expressed in a
series of structural equations.
This is comparable to estimating a series of multiple regression equations.
These equations model all the relationships among constructs, whether
dependent or independent.
The constructs are unobservable, or latent, factors represented by
multiple variables.
In this respect the method resembles factor analysis, in which factors are represented
by variables, except that SEM explicitly accounts for measurement
error.
Measurement error is the degree to which the observed variables fail to describe
the latent constructs of interest.
SEM is distinguished by the following features:
Representation of constructs as unobservable, or latent, factors in dependence
relationships.
Estimation of multiple, interrelated relationships in an integrated model.
Explicit incorporation of measurement error.
SEM explicitly accounts for the lack of reliability of the observed variables and provides
an analysis of the attenuation and estimation bias caused by measurement error.
Explanation of the covariances among the observed variables.
SEM seeks to represent hypotheses about the means, variances, and
covariances of the observed data in terms of a smaller number of
structural parameters defined by a hypothesized underlying model.
SEM is also called covariance structure analysis, latent variable
analysis, or causal modeling.
On its own, SEM does not establish causality, although it can support
that process. The technique is most often used in a
confirmatory rather than an exploratory mode.
It generally serves to assess the validity of a given model rather than to
"find" a suitable model. That said, SEM analyses frequently include
an exploratory component.
Statistical concepts associated with SEM
Construct
A construct is a latent, or unobservable, concept that can be defined conceptually
but cannot be measured directly or without error.
Also called a factor, a construct is measured by multiple indicators or observed
variables.
Measurement error
The degree to which the observed variables fail to describe the latent constructs of interest in SEM.
Absolute fit indices.
These indices measure the overall goodness of fit or badness of fit of both the measurement and structural models.
Good fit is indicated by high values of goodness-of-fit indices and low values of badness-of-fit
indices.
Average variance extracted (AVE).
A measure used to assess convergent and discriminant validity, defined as
the variance in the indicators or observed variables that is explained by the latent construct.
Chi-square difference test (Δχ²).
A statistic used to compare two competing, nested models.
It is calculated as the difference between the chi-square values of the two models.
Its degrees of freedom equal the difference between the degrees of freedom of the two
models.
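A minimal sketch of the chi-square difference test, using only the Python standard library. The upper-tail chi-square probability is computed with the closed-form expressions that exist for integer degrees of freedom. The two sets of model results are hypothetical numbers, not output from any SEM program.

```python
from math import erfc, exp, factorial, pi, sqrt


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        half = x / 2
        return exp(-half) * sum(half ** k / factorial(k) for k in range(df // 2))
    # Odd df: normal tail erfc(sqrt(x/2)) plus a finite correction series
    total = erfc(sqrt(x / 2))
    term = sqrt(2 * x / pi) * exp(-x / 2)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


def chi2_difference_test(chi2_restricted, df_restricted, chi2_free, df_free):
    """Nested-model comparison: the restricted model has more degrees of freedom."""
    delta_chi2 = chi2_restricted - chi2_free
    delta_df = df_restricted - df_free
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)


# Hypothetical fit results for two nested models (illustration only)
d_chi2, d_df, p = chi2_difference_test(118.4, 51, 104.1, 49)
print(f"delta chi2 = {d_chi2:.1f}, delta df = {d_df}, p = {p:.4f}")
```

A small p-value suggests that freeing the extra parameters significantly improves fit.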
Construct reliability.
The ratio of true-score variance to total score variance.
Construct reliability thus corresponds to the conventional notion of reliability in classical
test theory.
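Construct reliability and average variance extracted are usually computed from completely standardized loadings. The sketch below assumes the widely used formulas CR = (Σλ)² / ((Σλ)² + Σθ) and AVE = Σλ² / n, with θ = 1 − λ² as each indicator's error variance, and applies the Fornell and Larcker criterion for discriminant validity (AVE greater than the squared correlation between constructs). The loadings, the inter-construct correlation and the benchmarks in the comments are illustrative assumptions.

```python
def construct_reliability(loadings):
    """Composite (construct) reliability from standardized loadings."""
    total = sum(loadings)
    error = sum(1 - lam ** 2 for lam in loadings)  # theta_i = 1 - lambda_i^2
    return total ** 2 / (total ** 2 + error)


def average_variance_extracted(loadings):
    """AVE: mean squared standardized loading (each lambda_i^2 is a squared multiple correlation)."""
    return sum(lam ** 2 for lam in loadings) / len(loadings)


# Hypothetical standardized loadings for two constructs
constructs = {
    "satisfaction": [0.82, 0.78, 0.75, 0.71],
    "loyalty": [0.85, 0.80, 0.77],
}
phi = 0.60  # hypothetical correlation between the two constructs

for name, lams in constructs.items():
    cr = construct_reliability(lams)
    ave = average_variance_extracted(lams)
    # Commonly cited benchmarks: CR >= .70, AVE >= .50, and AVE > phi^2
    print(f"{name}: CR = {cr:.3f}, AVE = {ave:.3f}, AVE > phi^2: {ave > phi ** 2}")
```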
Confirmatory factor analysis (CFA).
A technique used to estimate the measurement model, in order to confirm that the number of factors
(or constructs) and the loadings of the observed variables (indicators) conform
to theoretical expectations.
The variables are selected on a theoretical basis, and CFA is used to see whether they
actually load as predicted on the expected factors.
Endogenous variable.
The latent, multi-item equivalent of a dependent variable.
It is determined by constructs or variables within the model and is therefore dependent on
other constructs.
Estimated covariance matrix. Denoted Σk.
The covariances among all the observed variables implied by the equations of
the SEM.
Exogenous variable.
The latent, multi-item equivalent of an independent variable
in traditional multivariate analysis.
An exogenous variable is determined by factors outside the model and
is not explained by any other variable or construct in the model.
First-order factor model.
The covariances among the observed variables are explained by a single
latent factor or by several latent constructs.
Incremental fit indices.
These measures assess how well a model specified by the researcher fits
relative to a baseline model.
The baseline is usually a null model in which the observed variables are
assumed to be unrelated to one another.
Measurement model.
The first of the two models estimated in SEM.
It represents the theory that specifies the observed variables for each
construct and allows their reliability to be assessed.
Modification index.
An index calculated for each possible relationship that is fixed rather than
freely estimated.
The index shows the improvement in the overall model χ² that would result if
that relationship were freely estimated.
Nested model.
A model is nested within another model if it has the same number of
constructs and variables and can be derived from the other model by
modifying relationships (e.g., by adding or deleting them).
Nonrecursive model.
A structural model that contains feedback loops or dual
dependencies.
Parsimony fit indices.
Parsimony fit indices are designed to adjust measures of fit for
model complexity.
They are particularly useful for evaluating competing models.
They are goodness-of-fit measures.
Parsimony ratio.
Obtained by dividing the degrees of freedom of the model by the total degrees of
freedom available.
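The absolute, incremental and parsimony indices listed in this glossary can all be computed from the chi-square statistics of the proposed model and of the null (baseline) model. The sketch below uses widely used textbook formulas for RMSEA, NFI, CFI and TLI. It takes the parsimony ratio as df_model / df_null, one common reading of "total degrees of freedom available", and forms PNFI as that ratio times NFI. The numbers are hypothetical, and SEM programs may differ in details (for example N versus N − 1 in RMSEA).

```python
from math import sqrt


def fit_indices(chi2_m, df_m, chi2_b, df_b, n):
    """Fit indices from model (m) and baseline/null (b) chi-square values; n is the sample size."""
    rmsea = sqrt(max(chi2_m - df_m, 0) / (df_m * (n - 1)))      # absolute, badness of fit
    nfi = (chi2_b - chi2_m) / chi2_b                             # incremental
    d_m, d_b = max(chi2_m - df_m, 0), max(chi2_b - df_b, 0)
    cfi = 1 - d_m / max(d_m, d_b) if max(d_m, d_b) > 0 else 1.0  # incremental
    tli = (chi2_b / df_b - chi2_m / df_m) / (chi2_b / df_b - 1)  # incremental
    pratio = df_m / df_b                                         # parsimony ratio
    return {"RMSEA": rmsea, "NFI": nfi, "CFI": cfi, "TLI": tli,
            "parsimony_ratio": pratio, "PNFI": pratio * nfi}


# Hypothetical results: 12 indicators, so the null model has 12*11/2 = 66 df; N = 300
indices = fit_indices(chi2_m=98.5, df_m=51, chi2_b=1450.0, df_b=66, n=300)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```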
Path analysis.
A special case of SEM with only single indicators for
each of the variables in the causal model.
Path analysis is SEM with a structural model but
no measurement model.
Path diagram.
A graphical representation of a model showing the complete set
of relationships among the constructs.
Dependence relationships are represented by straight arrows and
correlational relationships by curved arrows.
Sample covariance matrix.
Denoted S.
The variances and covariances of the observed variables.
Second-order factor model.
It has two levels.
A second-order latent construct causes several first-order latent constructs,
which in turn cause the observed variables.
The first-order constructs therefore serve as indicators, or observed
variables, for the second-order factor.
Squared multiple correlations.
These values are similar to communality and indicate the extent
to which a factor or latent construct accounts for the variance of an observed
variable.
Structural error.
The same as an error term in regression analysis.
With completely standardized estimates, the squared multiple correlation
equals 1 minus the structural error.
Structural model.
The second of the two models estimated in SEM.
It represents the theory that specifies how the constructs are related to one
another, often through multiple dependence relationships.
Structural relationship.
A dependence relationship between an endogenous construct and another exogenous or
endogenous construct.
Unidimensionality.
The notion that a set of observed variables represents only one underlying
construct.
All cross-loadings are zero.
Basic principles of SEM
Understanding SEM requires familiarity with:
theory;
models;
path diagrams;
exogenous and endogenous constructs;
dependence and correlational relationships;
model fit and model identification.
Basic principles of SEM: theory, model, and
path diagram
Theory
Defined as a conceptual scheme based on foundational statements,
called axioms, that are assumed to be true.
It serves as the conceptual foundation for developing a model.
An SEM model must be grounded in theory.
All relationships must be specified before the model can be
estimated.
Models are often built to test specific hypotheses derived
from a theory.
An SEM model comprises two models:
1. the measurement model, and
2. the structural model.
Measurement model
Describes how the observed (measured) variables represent the latent
constructs.
It represents the theory that specifies the observed variables for each construct and
allows their reliability to be assessed.
The observed variables are measured by the researcher (unlike the latent
variables, which are derived by estimation).
The construct is therefore linked by straight arrows to the observed variables that
serve as its indicators.
A single indicator cannot represent a construct in its entirety, but it can still
serve as an indication of it.
Structural equation models favor reflective measurement models, that is,
models whose indicators are specified as reflections of the construct
(graphically, the arrow points from the circle to the square).
The measurement model relies on confirmatory factor analysis
(CFA).
The goal is to ensure that the number of factors (or constructs) and the loadings of the
observed variables (indicators) conform to theoretical expectations.
CFA is used to verify the factor structure of a set of observed variables.
It allows the researcher to test the hypothesis that a relationship exists between the observed
variables and their underlying latent constructs.
The researcher first draws on knowledge of the theory, on empirical
research, or on both.
He or she then specifies the pattern of relationships a priori before testing this hypothesis
statistically.
Indicators are selected on the basis of theory, and CFA is used to
determine whether they load as predicted on the expected
number of factors.
The terms construct and factor are used interchangeably.
The researcher has complete control over which indicators describe each
construct.
The structural model, for its part, shows how the constructs are
related to one another,
most often through multiple dependence relationships.
This model makes it possible to support or refute the existence of a
relationship.
If the theory hypothesizes a relationship, an arrow is drawn.
This model is presented graphically (see chap. 3) as a path
diagram.
The conventions used to draw a measurement model diagram are
as follows:
Constructs are represented by ovals or circles, and measured variables
by squares.
Straight arrows connect the constructs to the measured variables [see
figure 20.1 (a)].
Dependence relationships are represented by straight arrows [see
figure 20.1 (a)] and correlational relationships by curved arrows [see
figure 20.1 (b)].
Basic principles of SEM: exogenous
and endogenous constructs
A construct is an unobservable, or latent, variable that can be defined in
conceptual terms but cannot be measured directly, for
example simply by asking about it in a questionnaire.
A construct cannot be measured without error.
A construct is measured approximately and indirectly by examining its
consistency across several observed or measured variables.
An exogenous construct is the latent, multi-item equivalent of an independent
variable in traditional multivariate analysis.
Multiple items, or observed variables, are used to represent an
exogenous construct, which acts as an independent variable in the model.
It is determined by factors outside the model and is not explained
by any other variable or construct in the model.
In a measurement model, the indicators or measured variables of an
exogenous construct are called X variables.
Construct Cx in figure 20.1 (a) is therefore exogenous.
An endogenous construct is the latent, multi-item equivalent of a dependent
variable.
It is determined by other constructs or variables within the model and
is dependent on those constructs.
Single-headed arrows from one or more exogenous constructs or from
other endogenous constructs point to the endogenous construct.
In a measurement model, the indicators or measured variables of an
endogenous construct are called Y variables.
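The rule that separates the two kinds of constructs can be applied mechanically: a construct that receives no single-headed arrow in the structural model is exogenous, and one that receives at least one is endogenous. The sketch below applies this rule to the service-quality example discussed earlier in the chapter. The construct labels are hypothetical identifiers, and curved (correlational) arrows are deliberately left out because they do not make a construct endogenous.

```python
def classify_constructs(paths):
    """Label each construct exogenous (no incoming arrow) or endogenous (at least one).

    `paths` is a list of (cause, effect) pairs, one per straight arrow
    in the structural model.
    """
    constructs = {c for pair in paths for c in pair}
    has_incoming = {effect for _, effect in paths}
    return {c: ("endogenous" if c in has_incoming else "exogenous")
            for c in sorted(constructs)}


# Structural paths of the service-quality example (hypothetical labels)
paths = [
    ("service_quality", "attitude"),
    ("service_quality", "satisfaction"),
    ("attitude", "recommendation_intention"),
    ("satisfaction", "recommendation_intention"),
]
for construct, role in classify_constructs(paths).items():
    print(f"{construct}: {role}")
```

Attitude and satisfaction come out as endogenous even though they are also antecedents of the intention to recommend, as the text notes.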
Basic principles of SEM: dependence and
correlational relationships
A dependence relationship is shown by straight arrows.
The arrows point from the antecedent (independent) to the measured variable or
latent construct that is affected (dependent).
In the measurement model, straight arrows point from the construct to the measured variables.
In a structural model, dependence occurs between constructs,
which are linked to one another by straight arrows.
An endogenous construct can be the antecedent of other endogenous constructs.
The path diagram generally shows the dependence and correlational relationships
among endogenous and exogenous constructs, as specified by the theory.
SEM tests the multiple relationships represented by the system of equations
simultaneously.
Fit, or predictive accuracy, must therefore be assessed for
the model as a whole and not for a single relationship.
Several multivariate techniques decompose variance, for example:
analysis of variance and covariance,
multiple regression, etc.
SEM analyzes correlations or covariances.
It determines whether the proposed theory satisfactorily explains the
observed correlation or covariance matrix among the measured variables.
The data analysis is based, in the first instance, on item-level correlation or
covariance matrices.
A preparatory step is therefore to compute the correlations or
covariances among the items (measured or observed variables).
Most current SEM programs automatically compute the
correlations or covariances before the analysis.
It is nevertheless important to note that SEM analysis is based on a correlation
or covariance matrix rather than on the raw data.
It is recommended to base SEM estimates on covariances rather
than on correlations.
The correlation matrix is a special case of the covariance matrix, used
when the data are standardized.
Covariances contain more information and offer more flexibility than
correlations.
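The relationship between the two matrices can be sketched in a few lines: each correlation is a covariance divided by the product of the two standard deviations, r_ij = s_ij / √(s_ii s_jj). Going from S to the correlation matrix discards the variances (the scale of each item), which is one way to see why covariances carry more information. The matrix values are hypothetical.

```python
from math import sqrt


def cov_to_corr(S):
    """Convert a covariance matrix to a correlation matrix: r_ij = s_ij / sqrt(s_ii * s_jj)."""
    p = len(S)
    sd = [sqrt(S[i][i]) for i in range(p)]
    return [[S[i][j] / (sd[i] * sd[j]) for j in range(p)] for i in range(p)]


# Hypothetical covariance matrix of three items measured on different scales
S = [
    [4.00, 1.20, 0.90],
    [1.20, 1.00, 0.30],
    [0.90, 0.30, 2.25],
]
R = cov_to_corr(S)
for row in R:
    print([round(v, 3) for v in row])
```

The reverse step is impossible without the standard deviations, so an analysis based only on R has lost information that S still contains.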
The proposed measurement and structural models can be used
to estimate the covariance matrix among the observed variables, Σk.
Model fit is then determined by comparing the estimated covariance
matrix Σk with the observed (sample) covariance matrix S.
The fit tests are computed from the residual matrix S − Σk.
A residual is the difference between the observed and the estimated value of a
covariance.
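The maximum likelihood fit function, where p is the number of observed variables, is:
$$F_{ML} = \ln\lvert\Sigma_k\rvert + \operatorname{tr}\left(S\,\Sigma_k^{-1}\right) - \ln\lvert S\rvert - p$$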
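As a sketch, the maximum likelihood discrepancy F_ML = ln|Σk| + tr(SΣk⁻¹) − ln|S| − p (p = number of observed variables) can be evaluated with the standard library alone. F_ML is zero when Σk reproduces S exactly and grows as the two matrices diverge. The model chi-square is commonly computed as (N − 1)·F_ML, although some programs use N. The matrices and N are hypothetical, and a real analysis minimises F_ML over the free parameters instead of evaluating it at fixed values.

```python
from math import log


def det_and_inverse(A):
    """Determinant and inverse of a small square matrix (Gauss-Jordan with partial pivoting)."""
    n = len(A)
    M = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            det = -det  # a row swap flips the sign of the determinant
        pv = M[col][col]
        det *= pv
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return det, [row[n:] for row in M]


def f_ml(S, sigma):
    """ML discrepancy: ln|Sigma| + tr(S Sigma^-1) - ln|S| - p."""
    p = len(S)
    det_sigma, inv_sigma = det_and_inverse(sigma)
    det_s, _ = det_and_inverse(S)
    trace = sum(S[i][k] * inv_sigma[k][i] for i in range(p) for k in range(p))
    return log(det_sigma) + trace - log(det_s) - p


# Hypothetical sample matrix S and model-implied matrix sigma
S = [[1.00, 0.56, 0.48], [0.56, 1.00, 0.40], [0.48, 0.40, 1.00]]
sigma = [[1.00, 0.56, 0.48], [0.56, 1.00, 0.42], [0.48, 0.42, 1.00]]
fml = f_ml(S, sigma)
n = 250  # hypothetical sample size
print(f"F_ML = {fml:.5f}, chi-square = (N - 1) * F_ML = {(n - 1) * fml:.3f}")
```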
Basic principles of SEM: model
identification
Model identification
Indicates whether the covariance matrix contains enough
information to estimate a set of structural equations.
One model parameter can be estimated for each unique variance or covariance
among the observed variables.
With p observed variables, up to p(p + 1)/2 parameters can be
estimated.
This number is the sum of all unique covariances, p(p − 1)/2, and
of all variances, p:
$$\frac{p(p+1)}{2} = \frac{p(p-1)}{2} + p$$
If k, the actual number of estimated parameters, is less than p(p + 1)/2, the
model is overidentified:
its degrees of freedom, p(p + 1)/2 − k, are positive.
If k equals p(p + 1)/2, the model is just-identified and has zero degrees of freedom.
If k is greater than p(p + 1)/2, the model is underidentified and no unique
solution can be found.
Having at least three observed variables for each latent construct
helps make the model overidentified.
This practice is therefore recommended.
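The counting rule can be sketched as a small function. It is a necessary but not a sufficient condition (often called the t-rule): a model with positive degrees of freedom can still be underidentified if, for example, a construct's scale has not been set. The two example models and their parameter counts are illustrative assumptions.

```python
def identification_status(p, k):
    """Counting rule: compare k free parameters with the p(p+1)/2 unique variances and covariances."""
    moments = p * (p + 1) // 2  # = p(p-1)/2 covariances + p variances
    df = moments - k
    if df > 0:
        status = "overidentified"
    elif df == 0:
        status = "just-identified"
    else:
        status = "underidentified"
    return moments, df, status


# One factor, three indicators, one loading fixed to 1:
# 2 loadings + 1 factor variance + 3 error variances = 6 free parameters
print(identification_status(p=3, k=6))   # (6, 0, 'just-identified')
# Two correlated factors, three indicators each:
# 4 loadings + 2 factor variances + 1 covariance + 6 error variances = 13
print(identification_status(p=6, k=13))  # (21, 8, 'overidentified')
```

A single construct with three indicators is only just-identified; it is once several constructs with three indicators each are combined that the model becomes overidentified.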
Steps in SEM
The structural equation modeling process is illustrated in figure 20.2.
The steps in this process are as follows:
Define the individual constructs;
Specify the measurement model;
Assess the reliability and validity of the measurement model;
Specify the structural model if the measurement model is valid;
Assess the validity of the structural model;
Draw conclusions and make appropriate recommendations if the structural model is valid.
Steps in SEM: defining the individual constructs
It is essential that the SEM analysis be grounded in theory.
On the basis of theory, specify the constructs, how each construct will be defined and
measured, and the relationships among the constructs.
SEM is generally used when one wishes to test both measurement theory
and structural theory.
Measurement theory specifies how the constructs are represented.
Structural theory specifies how the constructs are related to one another.
The structural relationships posited by the theory are translated into
hypotheses.
The test of these hypotheses is valid only if the underlying measurement
model specifying how these constructs are represented is itself
valid.
Great care must therefore be taken in operationalizing, measuring, and
scaling the relevant variables identified and defined by the theory.
Steps in SEM: specifying the measurement
model
Specify the measurement model.
Assign the appropriate measured variables to each latent
construct.
The measurement model is usually represented by a diagram, for example a
simple measurement model with two correlated constructs.
Graphically, arrows connect each construct
to the measured variables that represent it.
Only the loadings linking each measured variable to its latent construct by
an arrow are estimated; all other loadings are fixed at zero.
A factor cannot perfectly explain a measured variable,
which is why an error term is added.
Exogenous and endogenous constructs are not distinguished: they are all treated
alike, as in factor analysis.
Constructs are often denoted by Greek letters and
measured variables by Latin letters.
mesure
Les notations les plus courantes sont les suivantes :

ξ, = facteurs latents ;
X = variables mesurées ;
λx = valeurs des facteurs ;
δ = erreurs ;
ϕ = corrélation entre construits.
The measurement model can also be written as a set of equations, for example for X1:
$$X_1 = \lambda_{x_{1,1}}\,\xi_1 + \delta_1$$
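Taken together, the measurement equations, the construct correlation φ and the error variances imply a covariance matrix for the X variables: the estimated matrix (Σk) that SEM compares with the observed matrix S. The minimal pure-Python sketch below computes Σ = ΛΦΛᵀ + Θδ. It assumes a hypothetical two-construct model (X1 to X3 on ξ1, X4 to X6 on ξ2) with made-up standardized loadings. It also sets the scale by fixing the construct variances to 1 rather than one loading per construct, which is an assumption of this sketch.

```python
# Minimal sketch: model-implied covariance of a CFA measurement model,
# Sigma = Lambda * Phi * Lambda^T + Theta_delta.
# All numbers below are hypothetical, for illustration only.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def implied_covariance(loadings, phi, error_var):
    """loadings: p x m matrix of lambda_x (0 = fixed, not estimated);
    phi: m x m covariance matrix of the constructs;
    error_var: p error variances (delta), assumed uncorrelated."""
    sigma = matmul(matmul(loadings, phi), transpose(loadings))
    for i, theta in enumerate(error_var):
        sigma[i][i] += theta  # uncorrelated errors only add to the diagonal
    return sigma

# X1-X3 measure xi_1, X4-X6 measure xi_2; cross-loadings are fixed at zero.
lam = [[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
       [0.0, 0.9], [0.0, 0.7], [0.0, 0.5]]
phi = [[1.0, 0.4],
       [0.4, 1.0]]  # construct variances fixed to 1, correlation phi_21 = 0.4
# Standardized metric: each X has variance 1, so delta_i = 1 - lambda_i^2
# (each row has a single non-zero loading).
delta = [1 - sum(l ** 2 for l in row) for row in lam]

sigma_k = implied_covariance(lam, phi, delta)
for row in sigma_k:
    print(" ".join(f"{v:6.3f}" for v in row))
```

For example, the implied covariance of X1 and X2 is 0.8 × 0.7 = 0.56, and that of X1 and X4 is 0.8 × 0.4 × 0.9 = 0.288.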
The researcher must also specify how each potential parameter of the model is to be treated.
Free parameters are estimated during the analysis.
Fixed parameters are not estimated by SEM, since their value is set by the researcher.
Most often this value is zero, indicating that the specific relationship is not estimated.
The scale of each latent construct must also be set (calibrated) through its observed variables or indicators:
fix one of its loadings to a constant value (usually 1).
Steps in SEM: determining the sample size
The sample size required for SEM depends on several factors:
the complexity of the model,
the estimation technique,
the amount of missing data,
the average error variance of the indicators or measured variables, and
the multivariate distribution of the data.
Complexity:
models with more constructs or more measured variables require larger samples;
the same applies when there are fewer than three measured variables per construct.
Estimation technique:
maximum likelihood estimation (MLE) is the usual method;
with MLE, the sample generally contains between 200 and 400 respondents.
Average error variance of the indicators:
its impact can be explained through communality,
i.e. the share of a measured variable's variance that is explained by the construct on which it loads.
Research shows that the lower the communalities, the larger the samples must be.
This is especially true for communalities below 0.5.
Multivariate distribution:
the further the data depart from the assumed multivariate normality, the larger the samples required.
To minimize problems arising from departures from normality, at least 15 respondents per estimated parameter are recommended.
The recommendations are as follows:

SEMs with five or fewer constructs, each measured by more than three variables, and communalities of at least 0.5 require samples of at least 200 respondents.
With five or fewer constructs, if even some are measured with fewer than three indicators or communalities are below 0.5, the sample should contain at least 300 respondents.
With more than five constructs, several of them measured with fewer than three indicators, and several communalities below 0.5, the sample should contain at least 400 respondents.
Larger samples yield more stable solutions.
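These rules of thumb can be encoded as a small helper. This is a minimal sketch only: the function name and its arguments are hypothetical. Two assumptions fill gaps in the rules. A construct with exactly three indicators is treated as not meeting the "more than three" condition. Any model with more than five constructs gets the 400 floor. The 15-respondents-per-parameter guideline is applied as an additional floor when the number of free parameters is known.

```python
def min_sample_size(n_constructs, min_indicators, min_communality, n_free_params=None):
    """Rule-of-thumb minimum sample size for an SEM (hypothetical helper).

    n_constructs     -- number of latent constructs in the model
    min_indicators   -- smallest number of measured variables on any construct
    min_communality  -- smallest communality among the measured variables
    n_free_params    -- number of estimated parameters, if known
    """
    strong = min_indicators > 3 and min_communality >= 0.5
    if n_constructs <= 5:
        n = 200 if strong else 300
    else:
        n = 400  # assumption: the 400 floor is applied to every model with > 5 constructs
    if n_free_params is not None:
        n = max(n, 15 * n_free_params)  # at least 15 respondents per estimated parameter
    return n


print(min_sample_size(4, 4, 0.6))                    # 200
print(min_sample_size(4, 2, 0.6))                    # 300
print(min_sample_size(7, 2, 0.4))                    # 400
print(min_sample_size(4, 4, 0.6, n_free_params=25))  # 375
```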

Steps in SEM: assessing the reliability and validity of the measurement model
The validity of the measurement model depends on:

goodness of fit,
reliability,
evidence of construct validity, in particular convergent and discriminant validity.
Steps in SEM: assessing the model's goodness of fit
Goodness of fit reflects how closely the specified model reproduces the covariance matrix among the indicators.
The question is whether the estimated covariance matrix of the variables (Σk) equals the observed covariance matrix (S) of the sample data.
The closer the values of the two matrices, the better the fit.
There are several types of measures for assessing goodness of fit:
absolute indices,
incremental indices,
parsimony indices.
Absolute indices
Each model is assessed independently of the others.
These indices directly measure how closely the specified model reproduces the observed (sample) data.
Absolute indices can measure either goodness or badness of fit.
Goodness-of-fit indices
indicate whether the specified model fits the observed data or samples well;
high values are desirable.
The most common measures are the GFI (Goodness-of-Fit Index) and the AGFI (Adjusted Goodness-of-Fit Index).
Badness of fit is usually measured with the following:

chi-square (χ²),
RMSR (Root Mean Square Residual),
SRMR (Standardized Root Mean Square Residual),
RMSEA (Root Mean Square Error of Approximation).
Incremental indices

evaluate how well the specified model fits the sample data relative to an alternative model used as a baseline.
The most commonly used baseline is a null model, which assumes that the observed variables are uncorrelated.
The most common incremental indices are:
NFI (Normed Fit Index),
NNFI (Non-Normed Fit Index),
CFI (Comparative Fit Index),
TLI (Tucker-Lewis Index).
Parsimony indices

assess fit relative to the complexity of the model.
They are especially useful for comparing competing models.
They provide measures of fit that improve either with a better fit or with a simpler model that estimates fewer parameters.
The most common parsimony indices are:
PGFI (Parsimony Goodness-of-Fit Index),
PNFI (Parsimony Normed Fit Index).
Chi-square (χ²)
The chi-square statistic tests the difference between the observed and estimated covariance matrices. Schematically:
$$\chi^2 = (n-1)\,(S - \Sigma_k)$$
where n is the sample size, S the observed sample covariance matrix and Σk the estimated covariance matrix. In practice the difference between S and Σk is summarized by a single number, the minimum value of the fit function, and the χ² is (n − 1) times that minimum.
The associated p-value is the probability of obtaining a χ² at least this large if the observed covariance matrix were in fact equal to the estimated one.
The lower this probability (p < 0.05), the stronger the evidence that the two covariance matrices are not equal, i.e. that the model does not fit the data.
The degrees of freedom (df) are defined by:
$$df = \tfrac{1}{2}\,p(p+1) - k$$
where p is the total number of observed variables and k the number of estimated parameters.
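The degrees-of-freedom formula can be checked with a short Python sketch. The example model (six observed variables, two correlated constructs, three indicators each) is hypothetical. With one loading per construct fixed to 1, it estimates 4 loadings, 2 construct variances, 1 covariance and 6 error variances, so k = 13. The identification labels follow the usual convention: df < 0 means the model is under-identified. A df of 0 or more is necessary but not sufficient for identification.

```python
def degrees_of_freedom(p, k):
    """df = p(p + 1)/2 - k.

    p -- total number of observed variables
    k -- number of estimated (free) parameters
    """
    n_moments = p * (p + 1) // 2  # distinct variances and covariances in S
    return n_moments - k

def identification_label(df):
    # df >= 0 is a necessary, not a sufficient, condition for identification.
    if df < 0:
        return "under-identified"
    return "just-identified" if df == 0 else "over-identified"

# Hypothetical two-construct model: 6 indicators, k = 4 + 2 + 1 + 6 = 13.
df = degrees_of_freedom(p=6, k=13)
print(df, identification_label(df))  # 8 over-identified
```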
Limitations of the chi-square
The chi-square statistic has limitations: it increases in proportion to the sample size and with the number of observed variables.
It is therefore essential to consult other fit indices as well.
Goodness-of-Fit Index (GFI)
The GFI is an absolute index, whereas the AGFI takes account of the degrees of freedom in the model.
If Fk is the minimum fit function value of the estimated model and F0 that of the baseline model with no free parameters, then:
$$GFI = 1 - \frac{F_k}{F_0}$$
As fit improves, the ratio Fk/F0 decreases and the GFI increases.
AGFI
The AGFI adjusts for the degrees of freedom and is used to compare models of differing complexity:
$$AGFI = 1 - \frac{p(p+1)}{2\,df}\,(1 - GFI)$$
where p is the total number of observed variables and df the model's degrees of freedom.
Values of around 0.90 or higher are acceptable.
GFI and AGFI are affected by sample size, and their values can be high even for poorly specified models.
Their usefulness as fit indices is therefore limited.
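The following minimal Python sketch implements the two formulas. The values of Fk, F0, p and df are hypothetical. They serve only to show that the AGFI comes out lower than the GFI because it penalizes the model for the parameters it uses.

```python
def gfi(f_k, f_0):
    """GFI = 1 - Fk/F0 (Fk: minimum fit of the estimated model, F0: baseline model)."""
    return 1 - f_k / f_0

def agfi(gfi_value, p, df):
    """AGFI = 1 - [p(p + 1) / (2 df)] * (1 - GFI)."""
    return 1 - (p * (p + 1) / (2 * df)) * (1 - gfi_value)

g = gfi(f_k=0.05, f_0=1.25)       # hypothetical fit-function values
a = agfi(g, p=6, df=8)            # hypothetical model: 6 variables, 8 df
print(round(g, 3), round(a, 3))   # 0.96 0.895
```

With these hypothetical values, the GFI clears the 0.90 guideline while the AGFI falls just short of it.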
RMSR
The square root of the mean of the squared residuals:
an average residual covariance, which depends on the units used to measure the observed variables.
SRMR
The standardized root mean square residual,
which can be used to compare the fit of different models.
Low SRMR and RMSR values indicate good fit;
values of 0.08 or less are desirable.
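The following minimal pure-Python sketch computes both residual-based indices, assuming one common definition. The residuals s_ij − σ_ij are taken over the p(p + 1)/2 non-redundant elements. For the SRMR, each residual is divided by √(s_ii s_jj) before averaging. SEM programs differ slightly in how they standardize, so values may not match a given program exactly. The 2 × 2 matrices are hypothetical.

```python
import math

def rmsr_srmr(s, sigma):
    """Return (RMSR, SRMR).

    s     -- observed covariance matrix (list of rows)
    sigma -- model-implied (estimated) covariance matrix, same size
    """
    p = len(s)
    raw_sq, std_sq = [], []
    for i in range(p):
        for j in range(i + 1):  # lower triangle including the diagonal
            resid = s[i][j] - sigma[i][j]
            raw_sq.append(resid ** 2)
            std_sq.append((resid / math.sqrt(s[i][i] * s[j][j])) ** 2)
    rmsr = math.sqrt(sum(raw_sq) / len(raw_sq))
    srmr = math.sqrt(sum(std_sq) / len(std_sq))
    return rmsr, srmr

S = [[1.0, 0.5], [0.5, 1.0]]        # hypothetical observed matrix
Sigma_k = [[1.0, 0.4], [0.4, 1.0]]  # hypothetical implied matrix
rmsr, srmr = rmsr_srmr(S, Sigma_k)
print(round(srmr, 3), srmr <= 0.08)  # 0.058 True
```

Because both variables have unit variance in this example, the RMSR and SRMR coincide. With raw covariances in different units, only the SRMR remains comparable across models.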
The specified model is compared with the null model, in which the variables are assumed to be uncorrelated.
The NFI is the difference between the χ² of the null model (χ²null) and the χ² of the proposed model (χ²prop), divided by the χ² of the null model:
$$NFI = \frac{\chi^2_{null} - \chi^2_{prop}}{\chi^2_{null}}$$
As the χ² of the proposed model approaches zero, the NFI approaches 1, i.e. a perfect fit.
The more parameters the model contains, the higher the NFI.
$$NNFI = \frac{\chi^2_{null}/df_{null} - \chi^2_{prop}/df_{prop}}{\chi^2_{null}/df_{null} - 1}$$
where df_prop and df_null are the degrees of freedom of the proposed and null models, respectively.
Values of 0.90 or higher are considered acceptable for both the NFI and the NNFI.
The CFI is related to the NFI but takes the degrees of freedom into account, which matters for complex models:
$$CFI = 1 - \frac{\chi^2_{prop} - df_{prop}}{\chi^2_{null} - df_{null}}$$
where χ²prop and df_prop are the chi-square and degrees of freedom of the proposed theoretical model, and χ²null and df_null those of the null model.
The CFI ranges from 0 to 1, and values of 0.90 or higher are generally associated with good fit.
The TLI (Tucker-Lewis Index) is conceptually identical to the CFI, but it is not normed:
its values may fall outside the 0 to 1 range.
Well-fitting models have TLI values close to 1.
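The incremental indices can be computed directly from the χ² values and degrees of freedom of the proposed and null models. This is a minimal sketch with hypothetical values. Two choices belong to the sketch rather than to the course. First, the CFI is truncated at 0 in its numerator and denominator, as SEM programs commonly do so that it stays between 0 and 1. Second, the TLI is computed with the NNFI formula, since in most SEM programs the two names refer to the same index.

```python
def nfi(chi2_null, chi2_prop):
    """NFI = (chi2_null - chi2_prop) / chi2_null."""
    return (chi2_null - chi2_prop) / chi2_null

def nnfi(chi2_null, df_null, chi2_prop, df_prop):
    """NNFI (= TLI) = (chi2_null/df_null - chi2_prop/df_prop) / (chi2_null/df_null - 1)."""
    ratio_null = chi2_null / df_null
    return (ratio_null - chi2_prop / df_prop) / (ratio_null - 1)

def cfi(chi2_null, df_null, chi2_prop, df_prop):
    """CFI = 1 - (chi2_prop - df_prop) / (chi2_null - df_null), truncated to [0, 1]."""
    num = max(chi2_prop - df_prop, 0.0)
    den = max(chi2_null - df_null, chi2_prop - df_prop, 0.0)
    return 1.0 if den == 0 else 1 - num / den

# Hypothetical results: 6 observed variables, so the null model has 6*5/2 = 15 df.
chi2_null, df_null = 880.0, 15
chi2_prop, df_prop = 12.0, 8
print(round(nfi(chi2_null, chi2_prop), 3))                     # 0.986
print(round(nnfi(chi2_null, df_null, chi2_prop, df_prop), 3))  # 0.991
print(round(cfi(chi2_null, df_null, chi2_prop, df_prop), 3))   # 0.995
```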
The CFI and the RMSEA are among the most popular indices and among the least affected by sample size.
It is advisable to rely on several indices of different types (at least three):
reporting the χ² value together with its degrees of freedom is always good practice;
use at least one absolute goodness-of-fit index, one absolute badness-of-fit index
and one incremental fit index.
Steps in SEM: assessing the reliability and validity of the measurement model (reliability)
As seen earlier, an unreliable construct cannot be valid.
The reliability of the constructs in the measurement model should therefore be assessed first.
Coefficient alpha can be used.
Construct reliability (CR, Composite Reliability):
$$CR = \frac{\left(\sum_{i=1}^{p}\lambda_i\right)^2}{\left(\sum_{i=1}^{p}\lambda_i\right)^2 + \sum_{i=1}^{p}\delta_i}$$
where
CR = construct reliability;
λ = completely standardized factor loading;
δ = error variance;
p = number of indicators or observed variables.
Construct reliability thus corresponds to the conventional notion of reliability in classical test theory.
A construct reliability of 0.7 or higher is considered good.
Between 0.6 and 0.7, it may be judged acceptable provided that the estimates of the model's validity are good.
Steps in SEM: assessing the validity of the measurement model (convergent validity)
The size of the factor loadings provides evidence of convergent validity.
High loadings indicate that the observed variables converge on a common construct.
All loadings should at least be statistically significant, and ideally greater than 0.5, or even 0.7.
Loadings above 0.7 indicate that the construct accounts for at least 50% of the variance of the observed variable, since (0.71)² ≈ 0.5.
The threshold is sometimes set at 0.6.
Average variance extracted (AVE)

The share of the variance of the indicators or observed variables that is explained by the latent construct.
The AVE is computed from (completely) standardized loadings:
$$AVE = \frac{\sum_{i=1}^{p}\lambda_i^2}{\sum_{i=1}^{p}\lambda_i^2 + \sum_{i=1}^{p}\delta_i}$$
where
AVE = average variance extracted;
λ = completely standardized factor loading;
δ = error variance;
p = number of indicators or observed variables.
The AVE ranges from 0 to 1.
It represents the share of total variance attributable to the latent variable.
An AVE of 0.5 or higher indicates satisfactory convergent validity:
on average, the latent construct accounts for at least 50% of the variance of its observed variables.
If the AVE is below 0.5:

the variance due to measurement error is greater than the variance captured by the construct,
so the validity of the individual indicators and of the construct is questionable.
The AVE is a more conservative measure than the CR.
The standardized parameter estimates should also be interpreted, to make sure they are meaningful and consistent with the chosen theory.
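The CR and AVE formulas can be applied to a set of completely standardized loadings with a few lines of Python. This minimal sketch assumes, as is usual with completely standardized solutions, that each error variance equals 1 − λ². The three loadings are hypothetical. The example also shows why the AVE is the more conservative criterion: the same loadings give a CR above 0.7 but an AVE just below 0.5.

```python
def error_variances(loadings):
    # Completely standardized solution: delta_i = 1 - lambda_i^2
    return [1 - l ** 2 for l in loadings]

def composite_reliability(loadings, deltas=None):
    """CR = (sum lambda)^2 / [(sum lambda)^2 + sum delta]."""
    deltas = error_variances(loadings) if deltas is None else deltas
    s = sum(loadings) ** 2
    return s / (s + sum(deltas))

def average_variance_extracted(loadings, deltas=None):
    """AVE = sum lambda^2 / (sum lambda^2 + sum delta)."""
    deltas = error_variances(loadings) if deltas is None else deltas
    s2 = sum(l ** 2 for l in loadings)
    return s2 / (s2 + sum(deltas))

lam = [0.8, 0.7, 0.6]  # hypothetical standardized loadings of one construct
cr = composite_reliability(lam)
ave = average_variance_extracted(lam)
print(round(cr, 3), round(ave, 3))  # 0.745 0.497
```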
Steps in SEM: assessing the validity of the measurement model (discriminant validity)
Show that the construct is distinct from the other constructs and that its contribution is therefore unique.
Individual observed variables should load on only one latent construct.
Cross-loadings reveal a lack of distinctiveness and can make it difficult to establish discriminant validity.
It is generally assumed that a set of observed variables represents only one underlying construct.
This is known as unidimensionality.
A formal way to demonstrate distinctiveness:

fix the correlation between any two constructs to 1,
which specifies that the observed variables measuring the two constructs could just as well be represented by a single construct.
If the fit of the two-construct model is clearly better than that of the one-construct model,
this is evidence of discriminant validity.
Another way to test discriminant validity:

a construct should explain its own observed variables better than any other construct does.
Show that the average variance extracted is greater than the squared correlations with the other constructs.
This is evidence of discriminant validity;
equivalently, the square root of the AVE should be greater than the correlation coefficients.
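This AVE-versus-squared-correlation comparison (often called the Fornell-Larcker criterion) is easy to automate. In the minimal sketch below, the construct names, AVE values and correlations are hypothetical. Comparing the AVE with r² is equivalent to comparing √AVE with |r|.

```python
def discriminant_validity(ave, correlations):
    """Check that each construct's AVE exceeds its squared correlation with the others.

    ave          -- dict: construct name -> AVE
    correlations -- dict: (construct_a, construct_b) -> correlation between them
    Returns a dict: pair -> True if discriminant validity is supported.
    """
    results = {}
    for (a, b), r in correlations.items():
        shared = r ** 2  # variance shared by the two constructs
        results[(a, b)] = ave[a] > shared and ave[b] > shared
    return results

ave = {"C1": 0.62, "C2": 0.55, "C3": 0.58}     # hypothetical AVEs
corr = {("C1", "C2"): 0.48, ("C1", "C3"): 0.35, ("C2", "C3"): 0.81}
for pair, ok in discriminant_validity(ave, corr).items():
    print(pair, "supported" if ok else "not supported")
```

Here the pair C2 and C3 fails the check because 0.81² ≈ 0.66 exceeds the AVE of C2 (0.55).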
Steps in SEM: lack of validity and diagnosing problems
If the validity of the proposed measurement model is not satisfactory,

use the diagnostic tools of confirmatory factor analysis to make the necessary modifications.
The diagnostic tools are:
path estimates,
standardized residuals,
modification indices,
specification search.
Path estimates (loadings)

These link each construct to its indicators or observed variables.
Examine the completely standardized values, because standardization removes the effect of the measurement scales:
different indicators may be measured on different scales, and standardization takes this into account.
Completely standardized loadings should lie between −1.0 and 1.0; a value outside this range signals an estimation problem.
A loading must be statistically significant.
A non-significant loading suggests that the corresponding indicator should be dropped, unless it is retained for important theoretical reasons.
Loadings should preferably be greater than 0.7, or at least 0.5, in absolute value.
Low but significant loadings (below 0.5) suggest that the corresponding indicators are candidates for deletion.
The signs of the loadings should be in the direction predicted by the theory, and their values should make sense from a theoretical point of view.
It can also be useful to examine the squared multiple correlations,
which show how much of the variance of an observed variable is explained by the associated latent construct.
Residuals are the differences between the observed covariances (i.e. those of the sample data) and the estimated covariances.
A standardized residual is a residual divided by its standard error.
Examine the absolute values of the standardized residuals:
values above 4.0 are problematic;
values between 2.5 and 4.0 should be watched closely.
SEM programs also compute a modification index for each potential relationship that is not estimated (fixed parameter).
It shows the overall improvement in model fit (approximately, the decrease in χ²) that would result if that relationship were freely estimated.
The index should be below 4.0.
Higher values indicate that the fit could be improved by freely estimating that relationship or path.
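The thresholds for standardized residuals and modification indices can be applied as simple screening rules. In the minimal sketch below, the parameter labels and values are hypothetical; in practice these numbers come from the output of the SEM program. The 4.0 cut-off for modification indices is close to 3.84, the critical χ² value with 1 degree of freedom at the 5% level, since freeing one parameter uses one degree of freedom.

```python
def screen_residual(z):
    """Classify a standardized residual using the course thresholds."""
    a = abs(z)
    if a > 4.0:
        return "problem"
    if a >= 2.5:
        return "watch"
    return "ok"

def paths_to_consider(mod_indices, threshold=4.0):
    """Fixed parameters whose modification index exceeds the threshold, largest first."""
    flagged = [(name, mi) for name, mi in mod_indices.items() if mi > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)

residuals = {("X1", "X4"): 1.2, ("X2", "X5"): -3.1, ("X3", "X6"): 4.6}     # hypothetical
mod_idx = {"X3 <- xi_2": 9.8, "X5 <- xi_1": 2.1, "delta_2 ~ delta_5": 5.4}  # hypothetical

for pair, z in residuals.items():
    print(pair, screen_residual(z))
print(paths_to_consider(mod_idx))
```

A flagged path is only a candidate: freeing it still has to make theoretical sense.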
Specification search
An empirical approach that uses the model diagnostics together with trial and error to find a better-fitting model.
It is very easy to carry out with SEM software.
Approach it with caution:
finding a better-fitting model purely on the basis of empirical data can be problematic.
This approach is not recommended for non-specialist users.
Adjustments of this kind (based on path estimates, standardized residuals, etc.) run counter to the confirmatory nature of confirmatory factor analysis;
they are in fact closer to exploratory factor analysis (EFA).
For minor modifications (for example, deleting fewer than 10% of the observed variables),
the model and the data can still be used after the proposed changes have been made.
For major modifications,
revise the measurement theory, specify a new measurement model and collect new data to test it.
Steps in SEM: specifying the structural model
Once the measurement model has been validated, one can move on to specifying the structural model.
The focus shifts from the relationships between latent constructs and observed variables to the nature and magnitude of the relationships among the constructs.
The measurement model is therefore modified according to the hypothesized relationships among the latent constructs.
This change also modifies the estimated covariance matrix, which is based on the full set of estimated relationships.
The observed covariance matrix, based on the sample data, remains unchanged:
the same data are used to estimate the structural model.
The fit statistics also change, which reflects the difference between the fit of the structural model and that of the measurement model.
Constructs C1 and C2 now have a dependence relationship: C2 depends on C1.
The curved double-headed arrow of Figure 20.3 has been replaced by a single straight arrow representing the path from C1 to C2.
The notation and symbols have also changed.
Construct C2 is now denoted η1.
This change helps distinguish an endogenous construct (C2) from an exogenous construct (C1).
Only the observed variables of the exogenous construct C1 are denoted X (X1 to X3).
The observed variables of the endogenous construct C2 are denoted Y (Y1 to Y3).
The error variance terms of the Y variables are denoted ε rather than δ.
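Written as equations, the two-construct structural model just described can be sketched as follows. The symbols γ11 (the structural path from C1 to C2) and ζ1 (the structural disturbance of η1) are standard LISREL notation that the course does not introduce; they are an assumption of this sketch.

```latex
\begin{aligned}
X_i    &= \lambda_{x_{i,1}}\,\xi_1 + \delta_i,       && i = 1, 2, 3 \quad \text{(exogenous construct C1)}\\
Y_j    &= \lambda_{y_{j,1}}\,\eta_1 + \varepsilon_j, && j = 1, 2, 3 \quad \text{(endogenous construct C2)}\\
\eta_1 &= \gamma_{11}\,\xi_1 + \zeta_1               && \text{(structural path C1} \rightarrow \text{C2)}
\end{aligned}
```

The first two lines form the measurement part. The last line is the structural relationship that replaces the correlation φ between C1 and C2.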
Assessing the validity of the structural model: assessing fit
The fit of a structural model is assessed with the same criteria as those used for the measurement model.
The structural model can contain at most as many relationships as the measurement model,
so it estimates no more parameters (usually fewer).
Its χ² value therefore cannot be lower than that of the corresponding measurement model:
a structural model cannot fit better than its measurement model.
The fit of the measurement model therefore serves as an upper bound on the goodness of fit of the structural model.
The closer the fit of the structural model is to that of the measurement model, the better the fit is considered to be.
The other rules and statistical tools presented for the measurement model also apply:
they are used to assess the fit of the structural model with the same indices.
Assessing the validity of the structural model: comparing competing models
Show that the model's fit is better than that of competing models that could serve as alternatives.
Good fit alone does not prove that the proposed theory or structural model is the one that best fits the sample data (covariance matrix).
Another model might offer an equivalent, or even better, fit.
Good fit does not guarantee that the proposed structural model is the only valid version.
Defend the model by comparing it with competing models.
The comparison between the proposed model (M1) and a competing model (M2) can be based on the differences between their incremental fit indices or parsimony indices.
The comparison can also rely on the chi-square difference between the two models:
$$\Delta\chi^2_{\Delta df} = \chi^2_{df}(M1) - \chi^2_{df}(M2)$$
$$\Delta df = df(M1) - df(M2)$$
For nested models, the difference between two chi-square statistics is itself chi-square distributed, with Δdf degrees of freedom.
Test whether Δχ² with Δdf degrees of freedom is statistically significant.
The same procedure is used to test whether the fit of the structural model differs significantly from that of the measurement model.
If it does not, i.e. the fit of the structural model is not markedly worse than that of the measurement model, this supports the validity of the structural theory.
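The chi-square difference test can be sketched in pure Python. To stay dependency-free, this minimal sketch computes the upper-tail χ² probability for integer degrees of freedom with the standard recurrence Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(−x/2) / Γ(k/2 + 1); a statistics library would normally be used instead. The fit values are hypothetical. M1 is taken to be the more constrained model (more degrees of freedom), nested in M2.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        k, q = 2, math.exp(-x / 2)           # Q(x; 2)
    else:
        k, q = 1, math.erfc(math.sqrt(x / 2))  # Q(x; 1)
    while k < df:  # climb two degrees of freedom at a time
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q

def chi2_difference_test(chi2_m1, df_m1, chi2_m2, df_m2, alpha=0.05):
    """M1 must be nested in M2, i.e. have more degrees of freedom."""
    d_chi2 = chi2_m1 - chi2_m2
    d_df = df_m1 - df_m2
    if d_df <= 0:
        raise ValueError("M1 must be the more constrained (nested) model.")
    p = chi2_sf(d_chi2, d_df)
    return d_chi2, d_df, p, p < alpha

# Hypothetical: structural model (M1) vs. its measurement model (M2).
d_chi2, d_df, p, sig = chi2_difference_test(52.3, 10, 48.1, 8)
print(f"delta chi2 = {d_chi2:.2f}, delta df = {d_df}, p = {p:.3f}, significant: {sig}")
```

In this hypothetical case the structural model does not fit significantly worse than the measurement model (p ≈ 0.12), which supports the structural theory.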
Assessing the validity of the structural model: testing hypothesized relationships
Theoretical relationships are generally turned into hypotheses that can be tested empirically.
The structural theory is considered valid insofar as the SEM analysis supports the hypotheses.
The estimated parameters for a hypothesized relationship must be statistically significant and have the expected sign.
Also examine the estimates of the variance explained in the endogenous constructs.
This analysis is comparable to η² in analysis of variance or R² in multiple regression.
When SEM is used to examine the nomological validity of a new scale,
the hypotheses are replaced by known relationships, which are examined empirically to support nomological validity.
Assessing the validity of the structural model: diagnosing the structural model
The diagnostic tools for the structural model are the same as those for the measurement model, and the same examination is carried out.
Further analysis can be performed with these diagnostic tools,
for example by specifying one or more additional paths that were not part of the hypotheses of the original theory.
Any relationship that results from such a modification is not supported by theory and should not be treated in the same way as the original relationships based on the structural theory.
Assessing the validity of the structural model: drawing conclusions and making recommendations
If the validity of both the measurement model and the structural model is satisfactory,

draw conclusions and, where appropriate, make recommendations.
Conclusions about the measurement of the key constructs rest on the confirmatory factor analysis;
for example, one may conclude that a new scale is valid and reliable and can be used in future research.
Conclusions about the relationships among constructs result from the hypothesis tests carried out in the structural model.
Relationships with large, meaningful estimated structural parameters can be considered supported.
The theoretical, managerial and/or public policy implications of these relationships can of course be discussed.
Appropriate recommendations can be made to management on the basis of their likely managerial effects.
Assessing the validity of the structural model: higher-order confirmatory factor analysis
First-order factor model:

the covariances among the observed variables X are explained by a single level of latent constructs.
Higher-order factor model:
contains at least two levels of latent constructs.
Two-level (second-order) factor model:
a second-order construct generates several first-order latent constructs, which in turn generate the observed variables.
The first-order constructs therefore act as indicators (observed variables) for the second-order factor.
The IUIPC scale is designed to measure Internet users' concerns about the privacy of their personal information online.
The IUIPC scale has three dimensions:
1. Information collection (COL),
2. Control over information (CON),
3. Awareness (CONN).
These constructs are measured by four, three and three observed variables, respectively.
In the first-order model, the covariances among the three latent constructs COL, CON and CONN are freely estimated.
The second-order model, by contrast, accounts for these covariances by specifying a higher-order construct (IUIPC) that generates the first-order constructs (COL, CON and CONN).
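A short sketch shows what the second-order model imposes on the first-order covariances. It assumes standardized constructs and uncorrelated first-order disturbances. Under these assumptions, a second-order model implies corr(Fi, Fj) = γi γj, where γi is the loading of first-order construct Fi on the second-order construct. The γ values below are hypothetical.

```python
def implied_first_order_correlations(second_order_loadings):
    """Implied correlations between first-order constructs in a second-order model.

    second_order_loadings -- dict: first-order construct -> standardized loading (gamma)
                             on the second-order construct
    Returns a dict: (construct_a, construct_b) -> gamma_a * gamma_b
    """
    names = list(second_order_loadings)
    implied = {}
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            implied[(a, b)] = second_order_loadings[a] * second_order_loadings[b]
    return implied

gammas = {"COL": 0.8, "CON": 0.7, "CONN": 0.6}  # hypothetical loadings on IUIPC
for pair, r in implied_first_order_correlations(gammas).items():
    print(pair, round(r, 2))
```

With exactly three first-order constructs, the three second-order loadings generally reproduce the three freely estimated correlations exactly. That part of the model is then just-identified and fits as well as the first-order model. The second-order structure becomes testable only with four or more first-order constructs.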
One can then move from a measurement model to a structural model.

The structural relationship between information privacy concerns and another latent construct such as trust (CONF) is represented by multiple paths in a first-order model (COL → CONF, CON → CONF and CONN → CONF).
In the second-order model, by contrast, this relationship is represented by a single path (IUIPC → CONF).
A second-order model therefore assumes that all the first-order dimensions (COL, CON and CONN) relate to other theoretically linked latent constructs (for example, CONF) in the same way, through the single second-order construct.
Illustrative examples
Unethical behaviour in banks
Interest rate fairness in microfinance
The relationship between social responsibility and trust in the banking sector
References
Bearden, W., Netemeyer, R. G. and Haws, K. L. (2011). Handbook of Marketing Scales: Multi-Item Measures for Marketing and Consumer Behavior Research, third edition. Sage.
Malhotra, N., Décaudin, J.-M., Bouguerra, A. and Bories, D. (2011). Études marketing, sixth edition. Pearson Education.
Giannelloni, J.-L. and Vernette, E. (2012). Études de marché, third edition. Vuibert Gestion.
DeVellis, R. F. (2012). Scale Development: Theory and Applications. Sage.
Byrne, B. (2009). Structural Equation Modeling with LISREL, PRELIS, and SIMPLIS: Basic Concepts, Applications, and Programming. New York: Psychology Press, Taylor and Francis.
Brown, T. (2006). Confirmatory Factor Analysis for Applied Research. New York, London: The Guilford Press.
Churchill, G. (1979). A Paradigm for Developing Better Measures of Marketing Constructs. Journal of Marketing Research, 16(1), 64-73.
Balemba, K. (2017). Customer satisfaction with the services of microfinance institutions: Scale development and validation. Strategic Change, 26, 563-574.
Bagozzi, R. P., Yi, Y. and Phillips, L. W. (2012). Specification, evaluation, and interpretation of structural equation models. Journal of the Academy of Marketing Science, 40, 8-34.
Walsh, G. and Beatty, S. (2007). Customer-based corporate reputation of a service firm: scale development and validation. Journal of the Academy of Marketing Science, 35, 127-143.
Practical assignment
1. Define the term "notable" (a prominent local figure) properly: what does the literature say? What definitions exist, and what are their underlying dimensions?
2. Conduct interviews and/or focus groups in Bukavu to find out who is a notable in Bukavu and what the main characteristics of a notable are.
3. Identify the items and dimensions of the notable construct from the literature and the interviews.
4. Conduct a quantitative survey of at least 300 people in Bukavu.
5. Analyze the data using exploratory and confirmatory factor analysis.
6. Discuss the results.
