.. _add_data: Datasets ======== For a list of currently available datasets and usage instructions, see the :ref:`datasets page `. License ------- To be considered for inclusion in `statsmodels`, a dataset must be in the public domain, distributed under a BSD-compatible license, or we must obtain permission from the original author. Adding a dataset: An example ---------------------------- The Nile River data measures the volume of the discharge of the Nile River at Aswan for the years 1871 to 1970. The data are copied from the paper of Cobb (1978). **Step 1**: Create a directory `datasets/nile/` **Step 2**: Add `datasets/nile/nile.csv` and a new file `datasets/__init__.py` which contains :: from data import * **Step 3**: If `nile.csv` is a transformed/cleaned version of the original data, create a `nile/src` directory and include the original raw data there. In the `nile` case, this step is not necessary. **Step 4**: Copy `datasets/template_data.py` to `nile/data.py`. Edit `nile/data.py` by filling-in strings for COPYRIGHT, TITLE, SOURCE, DESCRSHORT, DESCLONG, and NOTE. :: COPYRIGHT = """This is public domain.""" TITLE = """Nile River Data""" SOURCE = """ Cobb, G.W. 1978. The Problem of the Nile: Conditional Solution to a Changepoint Problem. Biometrika. 65.2, 243-251, """ DESCRSHORT = """Annual Nile River Volume at Aswan, 1871-1970"" DESCRLONG = """Annual Nile River Volume at Aswan, 1871-1970. The units of measurement are 1e9 m^{3}, and there is an apparent changepoint near 1898.""" NOTE = """ Number of observations: 100 Number of variables: 2 Variable name definitions: year - Year of observation volume - Nile River volume at Aswan The data were originally used in Cobb (1987, See SOURCE). The author acknowledges that the data were originally compiled from various sources by Dr. Barbara Bell, Center for Astrophysics, Cambridge, Massachusetts. The data set is also used as an example in many textbooks and software packages. """ **Step 5:** Edit the docstring of the `load` function in `data.py` to specify which dataset will be loaded. Also edit the path and the indices for the `endog` and `exog` attributes. In the `nile` case, there is no `exog`, so everything referencing `exog` is not used. The `year` variable is also not used. **Step 6:** Edit the `datasets/__init__.py` to import the directory. That's it! The result can be found `here `_ for reference.