statsmodels.datasets.get_rdataset

statsmodels.datasets.get_rdataset(dataname, package='datasets', cache=False)

Download and return an R dataset.

Parameters:
  • dataname (str) – The name of the dataset you want to download
  • package (str) – The package in which the dataset is found. The default is the core ‘datasets’ package.
  • cache (bool or str) – If True, the data is downloaded into the STATSMODELS_DATA folder, which by default is a folder called statsmodels_data in the user's home folder. If a string, it is used as the path to a folder for caching the data. If False, the data is not cached.
Returns:

dataset – A statsmodels.datasets.utils.Dataset instance. This object has the following attributes:

  • data - A pandas DataFrame containing the data
  • title - The dataset title
  • package - The package from which the data came
  • from_cache - Whether or not cached data was retrieved
  • __doc__ - The verbatim R documentation.

Return type:

Dataset instance
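
The attributes listed above can be read directly from the returned object. The following is a minimal sketch rather than part of the library: the helper name summarize_rdataset and the keys of the dictionary it returns are hypothetical, and it assumes that statsmodels is installed and that network access is available when the helper is called.

```python
def summarize_rdataset(dataname, package="datasets", cache=False):
    """Fetch an R dataset and collect its documented attributes in a dict.

    Hypothetical helper for illustration; it is not part of statsmodels.
    """
    # Imported here so that defining the helper does not itself need statsmodels.
    from statsmodels.datasets import get_rdataset

    dataset = get_rdataset(dataname, package=package, cache=cache)
    return {
        "title": dataset.title,              # the dataset title
        "package": dataset.package,          # the R package the data came from
        "from_cache": dataset.from_cache,    # whether cached data was used
        "shape": dataset.data.shape,         # data is a pandas DataFrame
        "doc_start": dataset.__doc__[:200],  # beginning of the R documentation
    }


# Example call (needs network access; "iris" is assumed to be available
# in the default 'datasets' package):
#     summarize_rdataset("iris")
```

Collecting the values in a plain dictionary keeps the sketch easy to print or log; in interactive use, reading dataset.data and dataset.title directly is usually enough.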

Notes

If the R dataset has an integer index, it is reset to be zero-based. Otherwise, the index is preserved.

The caching facilities are dumb. That is, no download dates, ETags, or other identifying information are checked to decide whether the data should be downloaded again. If the dataset is in the cache, it is used.
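
The caching behaviour described in these notes can be observed through the from_cache attribute. The sketch below is illustrative only: fetch_twice is a hypothetical helper, not part of statsmodels, and it assumes that statsmodels is installed, that network access is available, and that cache_dir is a writable folder path.

```python
def fetch_twice(dataname, cache_dir, package="datasets"):
    """Fetch the same dataset twice with caching enabled.

    Hypothetical helper for illustration. Returns the from_cache flag of
    the first and of the second call as a tuple.
    """
    # Imported here so that defining the helper does not itself need statsmodels.
    from statsmodels.datasets import get_rdataset

    # Passing a string as `cache` selects that folder as the cache location.
    first = get_rdataset(dataname, package=package, cache=cache_dir)
    second = get_rdataset(dataname, package=package, cache=cache_dir)
    return first.from_cache, second.from_cache


# Example call (needs network access), using a throwaway cache folder:
#     import tempfile
#     with tempfile.TemporaryDirectory() as tmp:
#         print(fetch_twice("iris", tmp))
```

With an initially empty cache folder, the second call would be expected to report that cached data was used. Because nothing is checked for freshness, removing the cached files is the way to force a new download.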