statsmodels.datasets.get_rdataset

statsmodels.datasets.get_rdataset(dataname, package='datasets', cache=False)

Download and return an R dataset.

Parameters:
dataname : str

The name of the dataset you want to download.

package : str

The package in which the dataset is found. The default is the core ‘datasets’ package.

cache : bool or str

If True, the data is downloaded into the STATSMODELS_DATA folder. The default location is a folder called statsmodels_data in the user's home folder. Alternatively, you can pass a path to a folder to use for caching the data. If False, the data is not cached.
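
The three forms of the cache argument described above (False, True, or a folder path) can be sketched as follows. The helper names resolve_cache_argument and fetch_rdataset, their parameters, and the choice of the iris dataset are assumptions made for this example and are not part of statsmodels. The import is deferred because the actual fetch needs network access on the first download.

```python
def resolve_cache_argument(use_cache, cache_dir=None):
    """Hypothetical helper: build a value for the ``cache`` parameter.

    Returns False (no caching), True (default statsmodels data folder),
    or a string path (custom cache folder).
    """
    if not use_cache:
        return False
    if cache_dir is None:
        return True
    return str(cache_dir)


def fetch_rdataset(dataname="iris", package="datasets",
                   use_cache=False, cache_dir=None):
    """Hypothetical wrapper around get_rdataset; needs network access
    unless the dataset is already in the cache."""
    # Imported here so that defining the helper does not require statsmodels.
    from statsmodels.datasets import get_rdataset

    cache = resolve_cache_argument(use_cache, cache_dir)
    return get_rdataset(dataname, package=package, cache=cache)


# Example calls (commented out because they download data):
# iris = fetch_rdataset("iris")                         # no caching
# iris = fetch_rdataset("iris", use_cache=True)         # default cache folder
# iris = fetch_rdataset("iris", use_cache=True,
#                       cache_dir="my_rdata_cache")     # custom cache folder
```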

Returns:
dataset : Dataset

A statsmodels.datasets.utils.Dataset instance. This object has the following attributes:

  • data - A pandas DataFrame containing the data

  • title - The dataset title

  • package - The package from which the data came

  • from_cache - Whether or not cached data was retrieved

  • __doc__ - The verbatim R documentation.

Notes

If the R dataset has an integer index, it is reset to be zero-based. Otherwise, the index is preserved. The caching facilities are rudimentary: no download dates, ETags, or other identifying information are checked to decide whether the data should be downloaded again. If the dataset is in the cache, it is used.
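
The index handling described in this note can be pictured with a small pandas sketch. It assumes that an "integer index" means R's default row labels 1..n. The helper reset_r_style_index and the sample frames are hypothetical and illustrate the documented behaviour rather than reproducing the statsmodels implementation.

```python
import pandas as pd


def reset_r_style_index(data):
    """Hypothetical helper: replace R's default 1..n row labels with a
    zero-based index and leave any other index untouched."""
    r_row_labels = pd.RangeIndex(1, len(data) + 1)
    if data.index.equals(r_row_labels):
        return data.reset_index(drop=True)
    return data


# Rows labelled 1..n, as R does when a dataset has no real row names.
numbered = pd.DataFrame({"x": [10, 20, 30]}, index=[1, 2, 3])

# Rows with meaningful labels, which should be preserved.
named = pd.DataFrame({"x": [10, 20]}, index=["Mazda RX4", "Datsun 710"])

zero_based = reset_r_style_index(numbered)
preserved = reset_r_style_index(named)
```

Here zero_based is indexed from 0 while preserved keeps its row labels.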


Last update: Mar 18, 2024