Commit c8830d7

Author: jbiggsets
updating readme, tests, and recipe
1 parent 86bf9f3 commit c8830d7

12 files changed: +174 -65 lines changed

README.rst

Lines changed: 27 additions & 23 deletions
@@ -12,14 +12,17 @@ FactorAnalyzer
    :target: https://anaconda.org/desilinguist/factor_analyzer/


-This is a Python module to perform confirmatory and exploratory and factor
-analysis, with several optional rotations. With exploratory factor analysis,
-estimation can be performed using a minimum residual (minres) solution
-(identitical to unweighted least squares), or maximum likelihood estimation (MLE).
-Confirmatory factor analysis can only be performed using a MLE solution.
-This code is fully compatible with `sklearn`.
-
-Portions of this code are ported from the excellent R library `psych`.
+This is a Python module to perform exploratory and factor analysis (EFA), with several
+optional rotations. It also includes a class to perform confirmatory factor
+analysis (CFA), with certain pre-defined constraints. In expoloratory factor analysis,
+factor extraction can be performed using a variety of estimation techniques. The
+``factor_analyzer`` package allows users to perfrom EFA using either (1) a minimum
+residual (MINRES) solution, (2) a maximum likelihood (ML) solution, or (3) a principal
+factor solution. However, CFA can only be performe using an ML solution.
+
+Both the EFA and CFA classes within this package are fully compatible with `scikit-learn`.
+Portions of this code are ported from the excellent R library `psych`, and the `sem`
+package provided inspiration for the CFA class.

 Please see the `official documentation <http://factor-analyzer.readthedocs.io/en/latest/index.html>`__ for additional details.

@@ -38,12 +41,13 @@ variable and the latent factors.
 Confirmatory factor analysis (CFA), a closely associated technique, is
 used to test an a priori hypothesis about latent relationships among sets
 of observed variables. In CFA, the researcher specifies the expected pattern
-of factor loadings, and other possible constraints on the model.
+of factor loadings (and possibly other constraints), and fits a model according
+to this specification.

 Typically, a number of factors (K) in an EFA or CFA model is selected
 such that it is substantially smaller than the number of variables. The
 factor analysis model can be estimated using a variety of standard
-estimation methods, including but not limited to OLS, minres, or MLE.
+estimation methods, including but not limited MINRES or ML.

 Factor loadings are similar to standardized regression coefficients, and
 variables with higher loadings on a particular factor can be interpreted
@@ -63,14 +67,13 @@ Two common types of rotations are:
 correlated.

 This package includes a ``factor_analyzer`` module with a stand-alone
-``FactorAnalyzer`` class. The class includes a ``fit()`` method that
-allows users to perform factor analysis using either minres or MLE, with
-optional rotations on the factor loading matrices. The package also offers
-a stand-alone ``Rotator`` class to perform common rotations on an unrotated
-loading matrix.
+``FactorAnalyzer`` class. The class includes ``fit()`` and ``transform()``
+methods that enable users to perform factor analysis and score new data
+using the fitted factor model. Users can also perform optional otations
+on a factor loading matrix using the ``Rotator`` class.

-The following rotations options are available in both `FactorAnalyzer`
-and `Rotator`:
+The following `rotations options are available in both ``FactorAnalyzer``
+and ``Rotator``:

 (a) varimax (orthogonal rotation)
 (b) promax (oblique rotation)
@@ -82,10 +85,11 @@ and `Rotator`:

 In adddition, the package includes a ``confirmatory_factor_analyzer``
 module with a stand-alone ``ConfirmatoryFactorAnalyzer`` class. The
-class includes a ``fit()`` method that allows users to perform
-confirmatory factor analysis using MLE. Performing CFA requires users
-to specify a model with the expected factor loading relationships. This
-can be done using the ``ModelSpecificationParser`` class.
+class includes ``fit()`` and ``transform()`` that enable users to perform
+confirmatory factor analysis and score new data using the fitted model.
+Performing CFA requires users to specify in advance a model specification
+with the expected factor loading relationships. This can be done using
+the ``ModelSpecificationParser`` class.

 Examples
 --------
@@ -177,7 +181,7 @@ Requirements
 - ``numpy``
 - ``pandas``
 - ``scipy``
-- ``scikit-learn==0.20.1``
+- ``scikit-learn``

 Contributing
 ------------
@@ -194,7 +198,7 @@ You can install this package via ``pip`` with:

 Alternatively, you can install via ``conda`` with:

-``$ conda install -c desilinguist factor_analyzer``
+``$ conda install -c ets factor_analyzer``

 License
 -------

conda-recipe/factor_analyzer/meta.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 {% set name = "factor_analyzer" %}
-{% set version = "0.3.0" %}
+{% set version = "0.2.3" %}
 {% set file_ext = "tar.gz" %}
 {% set hash_type = "sha256" %}
 {% set hash_value = "94ea4c7d46e846cc7174787adce47156cf58dc257905c878edc5181b4fa300ed" %}

conda_requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-python>=3.6
+python>=3.4
 pandas
 scipy=1.2.1
 numpy=1.16.2

factor_analyzer/confirmatory_factor_analyzer.py

Lines changed: 63 additions & 28 deletions
@@ -44,6 +44,12 @@ class ModelSpecification:
         The error variance specification
     factor_covs : array-like
         The factor covariance specification.
+    factor_names : list of str or None
+        A list of factor names, if available.
+        Defaults to None.
+    variable_names : list of str or None
+        A list of variable names, if available.
+        Defaults to None.

     Attributes
     ----------
@@ -67,12 +73,18 @@ class ModelSpecification:
         The indexes of "free" error variance parameters.
     factor_covs_free : numpy array
         The indexes of "free" factor covariance parameters.
+    factor_names : list of str or None
+        A list of factor names, if available.
+    variable_names : list of str or None
+        A list of variable names, if available.
     """

     def __init__(self,
                  loadings,
                  n_factors,
-                 n_variables):
+                 n_variables,
+                 factor_names=None,
+                 variable_names=None):

         assert isinstance(loadings, np.ndarray)
         assert loadings.shape[0] == n_variables
@@ -81,6 +93,8 @@ def __init__(self,
         self._loadings = loadings
         self._n_factors = n_factors
         self._n_variables = n_variables
+        self._factor_names = factor_names
+        self._variable_names = variable_names

         self._n_lower_diag = get_symmetric_lower_idxs(n_factors, False).shape[0]

@@ -134,14 +148,22 @@ def n_factors(self):
     def n_lower_diag(self):
         return self._n_lower_diag

+    @property
+    def factor_names(self):
+        return self._factor_names
+
+    @property
+    def variable_names(self):
+        return self._variable_names
+
     def get_model_specification_as_dict(self):
         """
         Get the model specification as a dictionary.

         Returns
         -------
         model_specification : dict
-            The model specification parameters,
+            The model specification keys and values,
             as a dictionary.
         """
         return {'loadings': self._loadings.copy(),
@@ -152,7 +174,9 @@ def get_model_specification_as_dict(self):
                 'factor_covs_free': self._factor_covs_free.copy(),
                 'n_variables': self._n_variables,
                 'n_factors': self._n_factors,
-                'n_lower_diag': self._n_lower_diag}
+                'n_lower_diag': self._n_lower_diag,
+                'variable_names': self._variable_names,
+                'factor_names': self._factor_names}


 class ModelSpecificationParser:
@@ -169,7 +193,10 @@ def parse_model_specification_from_dict(X, specification=None):
         Generate the model specification from a
         dictionary. The keys in the dictionary
         should be the factor names, and the values
-        should be the feature names.
+        should be the feature names. If this method
+        is used to create the ``ModelSpecification``,
+        then factor names and variable names will
+        be added as properties to that object.

         Parameters
         ----------
@@ -201,6 +228,7 @@ def parse_model_specification_from_dict(X, specification=None):
         >>> model_spec = ModelSpecificationParser.parse_model_specification_from_dict(X, model_dict)
         """
         if specification is None:
+            factor_names, variable_names = None, None
             n_variables, n_factors = X.shape[1], X.shape[1]
             loadings = np.ones((n_factors, n_factors), dtype=int)
         elif isinstance(specification, dict):
@@ -219,15 +247,20 @@ def parse_model_specification_from_dict(X, specification=None):

         return ModelSpecification(**{'loadings': loadings,
                                      'n_variables': n_variables,
-                                     'n_factors': n_factors})
+                                     'n_factors': n_factors,
+                                     'factor_names': factor_names,
+                                     'variable_names': variable_names})

     @staticmethod
     def parse_model_specification_from_array(X, specification=None):
         """
         Generate the model specification from
         an array. The columns should correspond to
         the factors, and the rows should correspond to
-        the variables.
+        the variables. If this method is used to create
+        the ``ModelSpecification``, then no factor names
+        and variable names will be added as properties
+        to that object.

         Parameters
         ----------
@@ -285,32 +318,32 @@ class ConfirmatoryFactorAnalyzer(BaseEstimator, TransformerMixin):
     Parameters
     ----------
     specification : ModelSpecificaition object or None, optional
-        A model specification. This must be an ModelSpecificaiton object
+        A model specification. This must be a ``ModelSpecificaiton`` object
         or None. If None, the ModelSpecification will be generated assuming
         that n_factors == n_variables, and that all variables load on all
-        factors.
+        factors. Note that this could mean the factor model is not
+        identified, and the optimization could fail.
         Defaults to None.
     n_obs : int or None, optional
         The number of observations in the original
-        data set. If this is not passed and `is_cov=True`,
-        then a reduced form of the objective function will
-        be used.
+        data set. If this is not passed and `is_cov_matrix=True`,
+        then an error will be raised.
         Defaults to None.
     is_cov_matrix : bool, optional
-        Whether the input `data` is a
+        Whether the input `X` is a
         covariance matrix. If False,
-        assume it is the full data set
+        assume it is the full data set.
         Defaults to False.
     bounds : list of tuples or None, optional
         A list of minimum and maximum
         boundaries for each element of the
         input array. This must equal `x0`,
         which is the input array from your
         parsed and combined model specification.
-        The length is ((n_factors * n_variables) +
-        n_variables + n_factors + (((n_factors * n_factors) -
-        n_factors) // 2)
-        If None, noting will be bounded.
+        The length is:
+        ((n_factors * n_variables) + n_variables + n_factors +
+        (((n_factors * n_factors) - n_factors) // 2)
+        If None, nothing will be bounded.
         Defaults to None.
     max_iter : int, optional
         The maximum number of iterations
@@ -328,7 +361,7 @@ class ConfirmatoryFactorAnalyzer(BaseEstimator, TransformerMixin):
     Raises
     ------
     ValueError
-        If is_cov_matrix is True, and n_obs
+        If ``is_cov_matrix`` is True, and n_obs
         is not provided.

     Attributes
@@ -581,7 +614,7 @@ def fit(self, X, y=None):

         Parameters
         ----------
-        X : numpy array
+        X : array-like
             The data to use for confirmatory
             factor analysis. If this is just a
             covariance matrix, make sure `is_cov_matrix`
590623
591624
Raises
592625
------
626+
ValueError
627+
If the specification is not None or a
628+
``ModelSpecification`` object
629+
AssertionError
630+
If ``is_cov_matrix=True`` and the matrix
631+
is not square.
593632
AssertionError
594633
If len(bounds) != len(x0)
595-
If `is_cov=True` and the shame is not square or equal to the
596-
number of variables.
597-
ValueError
598-
If `fix_first=True` and `factor_vars` exists in the model.
599634
600635
Examples
601636
--------
@@ -734,7 +769,7 @@ def transform(self, X):

         Returns
         -------
-        X_new : numpy array, shape (n_samples, n_components)
+        scores : numpy array, shape (n_samples, n_components)
             The latent variables of X.

         Examples
@@ -852,7 +887,7 @@ def get_model_implied_cov(self):
         error = np.diag(self.error_vars_.flatten())
         return self.loadings_.dot(self.factor_varcovs_).dot(self.loadings_.T) + error

-    def get_derivatives_implied_cov(self):
+    def _get_derivatives_implied_cov(self):
         """
         Compute the derivatives for the implied covariance
         matrix (sigma).
@@ -913,7 +948,7 @@ def get_derivatives_implied_cov(self):
                 error_covs_dx[:, self.model.error_vars_free].copy(),
                 intercept_dx)

-    def get_derivatives_implied_mu(self):
+    def _get_derivatives_implied_mu(self):
         """
         Compute the "derivatives" for the implied means.
         Note that the derivatives of the implied means
@@ -1002,12 +1037,12 @@ def get_standard_errors(self):
         (loadings_dx,
          factor_covs_dx,
          error_covs_dx,
-         intercept_dx) = self.get_derivatives_implied_cov()
+         intercept_dx) = self._get_derivatives_implied_cov()

         (loadings_dx_mu,
          factor_covs_dx_mu,
          error_covs_dx_mu,
-         intercept_dx_mu) = self.get_derivatives_implied_mu()
+         intercept_dx_mu) = self._get_derivatives_implied_mu()

         # combine all of our derivatives; below we will merge all of these
         # together in a single matrix, delta, to use the delta rule
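
Note on the CFA hunks above: two quantities in this file's diff are easy to check by hand. These are the length of the flat parameter vector described in the ``bounds`` docstring and the model-implied covariance returned by ``get_model_implied_cov``. The sketch below is standalone NumPy, not the package's code. The helper names ``n_cfa_parameters`` and ``implied_cov`` and all numeric values are hypothetical, and the parameter count assumes the docstring formula is read as the sum of loadings, error variances, factor variances, and lower-triangular factor covariances.

```python
import numpy as np


def n_cfa_parameters(n_factors, n_variables):
    """Length of the flat parameter vector, following the `bounds` docstring.

    Hypothetical helper; assumes the formula sums four groups of parameters.
    """
    n_loadings = n_factors * n_variables
    n_error_vars = n_variables
    n_factor_vars = n_factors
    n_factor_covs = ((n_factors * n_factors) - n_factors) // 2
    return n_loadings + n_error_vars + n_factor_vars + n_factor_covs


def implied_cov(loadings, factor_varcovs, error_vars):
    """Sigma = Lambda . Phi . Lambda^T + diag(error variances).

    Mirrors the expression shown in the diff for get_model_implied_cov,
    but takes plain arrays instead of fitted attributes.
    """
    error = np.diag(np.asarray(error_vars).flatten())
    return loadings.dot(factor_varcovs).dot(loadings.T) + error


# Illustrative two-factor, four-variable model (made-up values).
loadings = np.array([[0.8, 0.0],
                     [0.7, 0.0],
                     [0.0, 0.6],
                     [0.0, 0.9]])
factor_varcovs = np.array([[1.0, 0.3],
                           [0.3, 1.0]])
error_vars = np.array([0.36, 0.51, 0.64, 0.19])

sigma = implied_cov(loadings, factor_varcovs, error_vars)
n_params = n_cfa_parameters(n_factors=2, n_variables=4)
print(n_params)
print(np.round(sigma, 3))
```

With these made-up values the error variances were chosen so that each variable has unit implied variance, which makes the off-diagonal entries of ``sigma`` readable as implied correlations.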
