statsmodels.multivariate.pca.pca

statsmodels.multivariate.pca.pca(data, ncomp=None, standardize=True, demean=True, normalize=True, gls=False, weights=None, method='svd')

Perform Principal Component Analysis (PCA).

Parameters:
data : ndarray

Variables in columns, observations in rows.

ncomp : int, optional

Number of components to return. If None, returns as many components as the smaller of the number of rows and the number of columns of data.

standardize : bool, optional

Flag indicating whether to use standardized data with mean 0 and unit variance. If standardize is True, the data are also demeaned.

demean : bool, optional

Flag indicating whether to demean data before computing principal components. demean is ignored if standardize is True.

normalize : bool, optional

Indicates whether to normalize the factors to have unit inner product. If False, the loadings will have unit inner product.

gls : bool, optional

Flag indicating whether to implement a two-step GLS estimator in which the first step uses principal components to estimate residuals, and the second step uses the inverse residual variance as a set of weights to estimate the final principal components.

weights : ndarray, optional

Series (per-variable) weights to apply when computing the principal components, after the data have been transformed according to standardize or demean.

method : str, optional

Determines the linear algebra routine used. ‘svd’, the default, uses a singular value decomposition. ‘eig’ uses an eigenvalue decomposition.
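The data layout, the default for ncomp, and the standardize/demean flags can be illustrated with a minimal NumPy sketch. The helper name preprocess is hypothetical, and the use of the population standard deviation (ddof=0) is an assumption; this is not the statsmodels implementation.

```python
import numpy as np


def preprocess(data, ncomp=None, standardize=True, demean=True):
    """Hypothetical helper mirroring the documented preprocessing options.

    data holds observations in rows and variables in columns.
    """
    x = np.asarray(data, dtype=float)
    nobs, nvar = x.shape
    # Default: as many components as the smaller of nobs and nvar.
    if ncomp is None:
        ncomp = min(nobs, nvar)
    if standardize:
        # standardize implies demean; ddof=0 (population std) is assumed here.
        x = (x - x.mean(axis=0)) / x.std(axis=0)
    elif demean:
        x = x - x.mean(axis=0)
    return x, ncomp


data = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
z, k = preprocess(data)
print(k)                               # 2
print(np.allclose(z.mean(axis=0), 0))  # True
print(preprocess(data, standardize=False)[0])  # demeaned only, not rescaled
```

Note that demean has no effect once standardize is True, matching the parameter descriptions above.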
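The normalize flag only decides where the scale of the decomposition is placed: on the loadings (factors with unit inner product) or on the factors (loadings with unit inner product). The sketch below shows one consistent way to realize the two conventions from a thin SVD; the helper name scores_and_loadings is hypothetical and the exact scaling statsmodels applies may differ.

```python
import numpy as np


def scores_and_loadings(x, ncomp, normalize=True):
    """Sketch: split the rank-ncomp SVD of x between factors and loadings.

    Assumes x is already demeaned/standardized (nobs x nvar).
    """
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    u, s, v = u[:, :ncomp], s[:ncomp], vt[:ncomp].T
    if normalize:
        factors, loadings = u, v * s   # factors.T @ factors is the identity
    else:
        factors, loadings = u * s, v   # loadings.T @ loadings is the identity
    return factors, loadings


rng = np.random.default_rng(0)
x = rng.standard_normal((20, 5))
x -= x.mean(axis=0)

f1, l1 = scores_and_loadings(x, 2, normalize=True)
f2, l2 = scores_and_loadings(x, 2, normalize=False)
print(np.allclose(f1.T @ f1, np.eye(2)), np.allclose(l2.T @ l2, np.eye(2)))  # True True
print(np.allclose(f1 @ l1.T, f2 @ l2.T))  # True: same projection either way
```

Either choice reproduces the same rank-ncomp projection of the data; only the split of scale between the two matrices changes.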
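A rough sketch of the two-step GLS idea, and of how series weights can enter the computation, follows. It assumes the weights scale each column by its square root (so the decomposition acts on x' W x with W diagonal) and that the GLS weights are rescaled to unit root-mean-square; both choices, and the helper names weighted_pca and two_step_gls_pca, are illustrative assumptions rather than the statsmodels code.

```python
import numpy as np


def weighted_pca(x, ncomp, weights=None):
    """Hypothetical helper: PCA of x (already transformed) with series weights."""
    if weights is not None:
        # Scale each column by sqrt(weight) so the cross-product is x.T @ diag(w) @ x.
        x = x * np.sqrt(weights)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    factors = u[:, :ncomp] * s[:ncomp]   # (nobs, ncomp) scores
    projection = factors @ vt[:ncomp]    # rank-ncomp fit of the (possibly weighted) x
    return factors, projection


def two_step_gls_pca(x, ncomp):
    """Hypothetical two-step GLS PCA: unweighted fit, then inverse residual variances."""
    if ncomp >= x.shape[1]:
        raise ValueError("gls needs ncomp < nvar so residual variances are nonzero")
    # Step 1: ordinary PCA and per-series residual variances.
    _, projection = weighted_pca(x, ncomp)
    resid_var = ((x - projection) ** 2).mean(axis=0)
    weights = 1.0 / resid_var
    weights /= np.sqrt((weights ** 2).mean())  # assumed normalization
    # Step 2: re-estimate the components with the inverse-variance weights.
    factors, _ = weighted_pca(x, ncomp, weights)
    return factors, weights


rng = np.random.default_rng(0)
x = rng.standard_normal((30, 4))
x -= x.mean(axis=0)
factors, weights = two_step_gls_pca(x, ncomp=2)
print(factors.shape)  # (30, 2)
```

Series that the first-step components fit poorly receive small weights, so they have less influence on the second-step components.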
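The two method options reach the same components by different routes: an eigenvalue decomposition of the cross-product matrix x' x, or a singular value decomposition of x itself, whose squared singular values are the eigenvalues of x' x. A small NumPy check of that relationship is below (a sketch, not the library's internals); the SVD route avoids forming x' x explicitly, which is generally better conditioned.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal((40, 3))
x -= x.mean(axis=0)

# 'eig'-style: eigendecomposition of the cross-product matrix x.T @ x.
vals, vecs = np.linalg.eigh(x.T @ x)
order = np.argsort(vals)[::-1]  # eigh returns eigenvalues in ascending order
vals, vecs = vals[order], vecs[:, order]

# 'svd'-style: singular value decomposition of x itself.
u, s, vt = np.linalg.svd(x, full_matrices=False)

print(np.allclose(vals, s ** 2))                # True: eigenvalues are squared singular values
print(np.allclose(np.abs(vecs), np.abs(vt.T)))  # True: same directions, up to sign
```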

Returns:

  • factors ({ndarray, DataFrame}) – Array (nobs, ncomp) of principal components (also known as scores).

  • loadings ({ndarray, DataFrame}) – Array (nvar, ncomp) of principal component loadings for constructing the factors.

  • projection ({ndarray, DataFrame}) – Array (nobs, nvar) containing the projection of the data onto the ncomp estimated factors.

  • rsquare ({ndarray, Series}) – Array (ncomp,) where the element in the ith position is the R-square from including the first i principal components. The values are calculated on the transformed data, not the original data.

  • ic ({ndarray, DataFrame}) – Array (ncomp, 3) containing the Bai and Ng (2003) information criteria. Each column is a different criterion, and each row represents the number of included factors.

  • eigenvals ({ndarray, Series}) – Array (nvar,) of eigenvalues.

  • eigenvecs ({ndarray, DataFrame}) – Array (nvar, nvar) of eigenvectors.
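The meaning of each return value can be checked against a bare NumPy computation. The sketch below assumes standardize=True and the convention in which loadings have unit-norm columns; it stores loadings column-wise as (nvar, ncomp), and its scaling need not match statsmodels exactly. The information criteria use the IC_p1, IC_p2 and IC_p3 penalties of Bai and Ng, with V(k) taken as the mean squared residual after k factors; how statsmodels indexes or scales these is not assumed here.

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.standard_normal((100, 4))
x = (data - data.mean(axis=0)) / data.std(axis=0)  # standardize=True
nobs, nvar = x.shape
ncomp = 2

u, s, vt = np.linalg.svd(x, full_matrices=False)
eigenvals = s ** 2                 # (nvar,) eigenvalues of x.T @ x
eigenvecs = vt.T                   # (nvar, nvar), one eigenvector per column
loadings = eigenvecs[:, :ncomp]    # (nvar, ncomp) in this sketch
factors = x @ loadings             # (nobs, ncomp) scores
projection = factors @ loadings.T  # (nobs, nvar) rank-ncomp fit

# R-square of the first i components, measured on the transformed x.
total_ss = (x ** 2).sum()
rsquare = np.cumsum(eigenvals[:ncomp]) / total_ss

# Bai-Ng style criteria: log mean squared residual plus a penalty per factor.
k = np.arange(1, ncomp + 1)[:, None]
log_v = np.log((total_ss - np.cumsum(eigenvals[:ncomp])) / (nobs * nvar))[:, None]
c = (nobs + nvar) / (nobs * nvar)
m = min(nobs, nvar)
penalties = np.array([c * np.log(1 / c), c * np.log(m), np.log(m) / m])
ic = log_v + k * penalties         # (ncomp, 3): rows = factors, columns = criteria

print(factors.shape, projection.shape, ic.shape)  # (100, 2) (100, 4) (2, 3)
```

The number of factors is usually chosen as the row that minimizes a given column of ic.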

Notes

This is a simple function wrapper around the PCA class. See PCA for more information and additional methods.