Commit 25942e7

Merge pull request #33 from EducationalTestingService/bugfix/fix-scores
Major updates to CFA, and making the whole package `sklearn` compatible
2 parents f02c13e + 0e61991 commit 25942e7

68 files changed

+7041
-2366
lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -9,4 +9,6 @@
99
*dist/
1010
*build/
1111
*static/
12+
*htmlcov/
13+
*templates/
1214
__pycache__

README.rst

Lines changed: 98 additions & 120 deletions
@@ -12,12 +12,17 @@ FactorAnalyzer
1212
:target: https://anaconda.org/desilinguist/factor_analyzer/
1313

1414

15-
This is a Python module to perform exploratory factor analysis, with several
16-
optional rotations. Estimation can be performed using a minimum residual
17-
(minres) solution (identical to unweighted least squares), or maximum
18-
likelihood estimation (MLE).
19-
20-
Portions of this code are ported from the excellent R library ``psych``.
15+
This is a Python module to perform exploratory factor analysis (EFA), with several
16+
optional rotations. It also includes a class to perform confirmatory factor
17+
analysis (CFA), with certain pre-defined constraints. In exploratory factor analysis,
18+
factor extraction can be performed using a variety of estimation techniques. The
19+
``factor_analyzer`` package allows users to perform EFA using either (1) a minimum
20+
residual (MINRES) solution, (2) a maximum likelihood (ML) solution, or (3) a principal
21+
factor solution. However, CFA can only be performed using an ML solution.
22+
23+
Both the EFA and CFA classes within this package are fully compatible with `scikit-learn`.
24+
Portions of this code are ported from the excellent R library `psych`, and the `sem`
25+
package provided inspiration for the CFA class.
2126

2227
Please see the `official documentation <http://factor-analyzer.readthedocs.io/en/latest/index.html>`__ for additional details.
2328

@@ -36,12 +41,13 @@ variable and the latent factors.
3641
Confirmatory factor analysis (CFA), a closely associated technique, is
3742
used to test an a priori hypothesis about latent relationships among sets
3843
of observed variables. In CFA, the researcher specifies the expected pattern
39-
of factor loadings, and other possible constraints on the model.
44+
of factor loadings (and possibly other constraints), and fits a model according
45+
to this specification.
4046

4147
Typically, a number of factors (K) in an EFA or CFA model is selected
4248
such that it is substantially smaller than the number of variables. The
4349
factor analysis model can be estimated using a variety of standard
44-
estimation methods, including but not limited to OLS, minres, or MLE.
50+
estimation methods, including but not limited to MINRES or ML.
4551

4652
Factor loadings are similar to standardized regression coefficients, and
4753
variables with higher loadings on a particular factor can be interpreted
@@ -61,13 +67,12 @@ Two common types of rotations are:
6167
correlated.
6268

6369
This package includes a ``factor_analyzer`` module with a stand-alone
64-
``FactorAnalyzer``class. The class includes an ``analyze()`` method that
65-
allows users to perform factor analysis using either minres or MLE, with
66-
optional rotations on the factor loading matrices. The package also offers
67-
a stand-alone ``Rotator`` class to perform common rotations on an unrotated
68-
loading matrix.
70+
``FactorAnalyzer`` class. The class includes ``fit()`` and ``transform()``
71+
methods that enable users to perform factor analysis and score new data
72+
using the fitted factor model. Users can also perform optional rotations
73+
on a factor loading matrix using the ``Rotator`` class.
6974

70-
The following rotations options are available in both ``FactorAnalyzer``
75+
The following rotation options are available in both ``FactorAnalyzer``
7176
and ``Rotator``:
7277

7378
(a) varimax (orthogonal rotation)
@@ -80,10 +85,11 @@ and ``Rotator``:
8085

8186
In addition, the package includes a ``confirmatory_factor_analyzer``
8287
module with a stand-alone ``ConfirmatoryFactorAnalyzer`` class. The
83-
class includes an ``analyze()`` method that allows users to perform
84-
confirmatory factor analysis using MLE. Performing CFA requires users
85-
to specify a model with the expected factor loading relationships and
86-
other constraints.
88+
class includes ``fit()`` and ``transform()`` methods that enable users to perform
89+
confirmatory factor analysis and score new data using the fitted model.
90+
Performing CFA requires users to specify in advance a model specification
91+
with the expected factor loading relationships. This can be done using
92+
the ``ModelSpecificationParser`` class.
8793

8894
Examples
8995
--------
@@ -92,111 +98,82 @@ Exploratory factor analysis example.
9298

9399
.. code:: python
94100
95-
In [1]: import pandas as pd
96-
97-
In [2]: from factor_analyzer import FactorAnalyzer
98-
99-
In [3]: df_features = pd.read_csv('test02.csv')
100-
101-
In [4]: fa = FactorAnalyzer()
102-
103-
In [5]: fa.analyze(df_features, 3, rotation=None)
104-
105-
In [6]: fa.loadings
106-
Out[6]:
107-
Factor1 Factor2 Factor3
108-
sex -0.129912 -0.163982 0.738235
109-
zygosity 0.038996 -0.046584 0.011503
110-
moed 0.348741 -0.614523 -0.072557
111-
faed 0.453180 -0.719267 -0.075465
112-
faminc 0.366888 -0.443773 -0.017371
113-
english 0.741414 0.150082 0.299775
114-
math 0.741675 0.161230 -0.207445
115-
socsci 0.829102 0.205194 0.049308
116-
natsci 0.760418 0.237687 -0.120686
117-
vocab 0.815334 0.124947 0.176397
118-
119-
In [7]: fa.get_uniqueness()
120-
Out[7]:
121-
Uniqueness
122-
sex 0.411242
123-
zygosity 0.996177
124-
moed 0.495476
125-
faed 0.271588
126-
faminc 0.668157
127-
english 0.337916
128-
math 0.380890
129-
socsci 0.268054
130-
natsci 0.350704
131-
vocab 0.288503
132-
133-
In [8]: fa.get_factor_variance()
134-
Out[8]:
135-
Factor1 Factor2 Factor3
136-
SS Loadings 3.510189 1.283710 0.737395
137-
Proportion Var 0.351019 0.128371 0.073739
138-
Cumulative Var 0.351019 0.479390 0.553129
101+
In [1]: import pandas as pd
102+
...: from factor_analyzer import FactorAnalyzer
103+
104+
In [2]: df_features = pd.read_csv('tests/data/test02.csv')
105+
106+
In [3]: fa = FactorAnalyzer(rotation=None)
107+
108+
In [4]: fa.fit(df_features)
109+
Out[4]:
110+
FactorAnalyzer(bounds=(0.005, 1), impute='median', is_corr_matrix=False,
111+
method='minres', n_factors=3, rotation=None, rotation_kwargs={},
112+
use_smc=True)
113+
114+
In [5]: fa.loadings_
115+
Out[5]:
116+
array([[-0.12991218, 0.16398151, 0.73823491],
117+
[ 0.03899558, 0.04658425, 0.01150343],
118+
[ 0.34874135, 0.61452341, -0.07255666],
119+
[ 0.45318006, 0.7192668 , -0.0754647 ],
120+
[ 0.36688794, 0.44377343, -0.01737066],
121+
[ 0.74141382, -0.15008235, 0.29977513],
122+
[ 0.741675 , -0.16123009, -0.20744497],
123+
[ 0.82910167, -0.20519428, 0.04930817],
124+
[ 0.76041819, -0.23768727, -0.12068582],
125+
[ 0.81533404, -0.12494695, 0.17639684]])
126+
127+
In [6]: fa.get_communalities()
128+
Out[6]:
129+
array([0.5887579 , 0.00382308, 0.50452402, 0.72841182, 0.33184336,
130+
0.66208429, 0.61911037, 0.73194557, 0.64929612, 0.71149718])
139131
140132
Confirmatory factor analysis example.
141133

142134
.. code:: python
143135
144-
In [1]: import pandas as pd
145-
146-
In [2]: from factor_analyzer import ConfirmatoryFactorAnalyzer
147-
148-
In [3]: data = pd.read_csv('tests/data/test12.csv')
149-
150-
In [4]: model = {'loadings': {"Verbal": ["english", "vocab", "socsci"],
151-
...: "Quant": ["socsci", "math", "natsci"]}}
152-
153-
In [6]: cfa.analyze(data, model, fix_first=False)
154-
155-
In [5]: cfa = ConfirmatoryFactorAnalyzer()
156-
157-
In [7]: cfa.loadings
158-
Out[7]:
159-
Verbal Quant
160-
english 3.532436 0.000000
161-
vocab 4.221969 0.000000
162-
socsci 3.281362 1.099739
163-
math 0.000000 4.888016
164-
natsci 0.000000 4.850257
165-
166-
In [8]: cfa.factor_covs
167-
Out[8]:
168-
Verbal Quant
169-
Verbal 1.000000 0.833013
170-
Quant 0.833013 1.000000
171-
172-
In [9]: cfa.error_vars
173-
Out[9]:
174-
evars
175-
english 9.249541
176-
vocab 5.044325
177-
socsci 5.782677
178-
math 15.519003
179-
natsci 9.406164
180-
181-
In [10]: loadings_se, error_vars_se = cfa.get_standard_errors()
182-
183-
In [11]: loadings_se
184-
Out[11]:
185-
Verbal Quant
186-
english 0.100785 0.000000
187-
vocab 0.098195 0.000000
188-
socsci 0.217784 0.216251
189-
math 0.000000 0.138298
190-
natsci 0.000000 0.123820
191-
192-
In [12]: error_vars_se
193-
Out[12]:
194-
error_vars
195-
english 0.385040
196-
vocab 0.353237
197-
socsci 0.307728
198-
math 0.742082
199-
natsci 0.600859
136+
In [1]: import pandas as pd
137+
138+
In [2]: from factor_analyzer import (ConfirmatoryFactorAnalyzer,
139+
...: ModelSpecificationParser)
140+
141+
In [3]: df_features = pd.read_csv('tests/data/test11.csv')
142+
143+
In [4]: model_dict = {"F1": ["V1", "V2", "V3", "V4"],
144+
...: "F2": ["V5", "V6", "V7", "V8"]}
145+
In [5]: model_spec = ModelSpecificationParser.parse_model_specification_from_dict(df_features,
146+
...: model_dict)
147+
148+
In [6]: cfa = ConfirmatoryFactorAnalyzer(model_spec, disp=False)
149+
150+
In [7]: cfa.fit(df_features.values)
151+
152+
In [8]: cfa.loadings_
153+
Out[8]:
154+
array([[0.99131285, 0. ],
155+
[0.46074919, 0. ],
156+
[0.3502267 , 0. ],
157+
[0.58331488, 0. ],
158+
[0. , 0.98621042],
159+
[0. , 0.73389239],
160+
[0. , 0.37602988],
161+
[0. , 0.50049507]])
162+
163+
In [9]: cfa.factor_varcovs_
164+
Out[9]:
165+
array([[1. , 0.17385704],
166+
[0.17385704, 1. ]])
167+
168+
In [10]: cfa.transform(df_features.values)
169+
Out[10]:
170+
array([[-0.46852166, -1.08708035],
171+
[ 2.59025301, 1.20227783],
172+
[-0.47215977, 2.65697245],
173+
...,
174+
[-1.5930886 , -0.91804114],
175+
[ 0.19430887, 0.88174818],
176+
[-0.27863554, -0.7695101 ]])
200177
201178
Requirements
202179
------------
@@ -205,11 +182,12 @@ Requirements
205182
- ``numpy``
206183
- ``pandas``
207184
- ``scipy``
185+
- ``scikit-learn``
208186

209187
Contributing
210188
------------
211189

212-
Contributions to FactorAnalyzer are very welcome. Please file an issue
190+
Contributions to ``factor_analyzer`` are very welcome. Please file an issue
213191
on GitHub, or contact [email protected] if you would like to contribute.
214192

215193
Installation
@@ -221,7 +199,7 @@ You can install this package via ``pip`` with:
221199

222200
Alternatively, you can install via ``conda`` with:
223201

224-
``$ conda install -c desilinguist factor_analyzer``
202+
``$ conda install -c ets factor_analyzer``
225203

226204
License
227205
-------
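
A note on the EFA example in the README diff above: the old example printed ``fa.get_uniqueness()`` while the new one prints ``fa.get_communalities()``. Under the usual factor analysis definitions, a variable's communality is the sum of its squared loadings and its uniqueness is one minus that, so the two outputs should agree. The sketch below checks this in plain Python for the first two rows of the new ``fa.loadings_`` output. The helper names ``communalities`` and ``uniquenesses`` are made up for illustration and are not part of ``factor_analyzer``.

```python
# First two rows of the new fa.loadings_ output (variables "sex" and "zygosity"
# in the old, labelled output).
loadings = [
    [-0.12991218, 0.16398151, 0.73823491],
    [0.03899558, 0.04658425, 0.01150343],
]


def communalities(rows):
    """Return the sum of squared loadings for each variable (one per row)."""
    return [sum(value ** 2 for value in row) for row in rows]


def uniquenesses(rows):
    """Return 1 - communality for each variable."""
    return [1.0 - c for c in communalities(rows)]


comm = communalities(loadings)
uniq = uniquenesses(loadings)

for c, u in zip(comm, uniq):
    print(f"communality={c:.6f}  uniqueness={u:.6f}")
```

Because the loadings are squared, the sign flip in the second factor column between the old and new outputs does not affect either quantity.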

conda-recipe/factor_analyzer/meta.yaml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
{% set name = "factor_analyzer" %}
2-
{% set version = "0.3.0" %}
2+
{% set version = "0.2.3" %}
33
{% set file_ext = "tar.gz" %}
44
{% set hash_type = "sha256" %}
55
{% set hash_value = "94ea4c7d46e846cc7174787adce47156cf58dc257905c878edc5181b4fa300ed" %}

conda_requirements.txt

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,7 @@
11
python>=3.4
22
pandas
3-
scipy
4-
numpy
3+
scipy=1.2.1
4+
numpy=1.16.2
5+
scikit-learn=0.20.1
56
nose=1.3.7
7+

factor_analyzer/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -11,7 +11,8 @@
1111
calculate_kmo)
1212

1313
from .confirmatory_factor_analyzer import (ConfirmatoryFactorAnalyzer,
14-
ModelParser)
14+
ModelSpecificationParser,
15+
ModelSpecification)
1516

1617
from .utils import (fill_lower_diag,
1718
merge_variance_covariance,
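
The CFA example in the README diff passes a dictionary that maps factor names to variable names. As a rough illustration of what such a specification encodes (not how ``ModelSpecificationParser`` is implemented), the sketch below turns that dictionary into a 0/1 pattern of free loadings. The ``loading_pattern`` helper is hypothetical, and the variable order ``V1`` to ``V8`` is an assumption about the columns of ``tests/data/test11.csv``.

```python
# Model dictionary copied from the README example.
model_dict = {"F1": ["V1", "V2", "V3", "V4"],
              "F2": ["V5", "V6", "V7", "V8"]}

# Assumed column order of the data set; not confirmed by the diff.
variable_names = ["V1", "V2", "V3", "V4", "V5", "V6", "V7", "V8"]


def loading_pattern(model, variables):
    """Build a variables x factors matrix with 1 where a loading is free
    to be estimated and 0 where it is fixed to zero."""
    factors = list(model)  # dict order gives the factor (column) order
    return [[1 if variable in model[factor] else 0 for factor in factors]
            for variable in variables]


pattern = loading_pattern(model_dict, variable_names)

for name, row in zip(variable_names, pattern):
    print(name, row)
```

Under these assumptions, the zeros in the pattern line up with the zero entries in the ``cfa.loadings_`` output shown in the README example.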
