{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Least squares fitting of models to data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a quick introduction to `statsmodels` for physical scientists (e.g. physicists, astronomers) or engineers.\n", "\n", "Why is this needed?\n", "\n", "Because most of `statsmodels` was written by statisticians, who use different terminology and sometimes different methods. This makes it hard to know which classes and functions are relevant and what their inputs and outputs mean." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2023-02-02T19:17:33.703984Z", "iopub.status.busy": "2023-02-02T19:17:33.702906Z", "iopub.status.idle": "2023-02-02T19:17:34.544420Z", "shell.execute_reply": "2023-02-02T19:17:34.543646Z" }, "jupyter": { "outputs_hidden": false } }, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import statsmodels.api as sm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Linear models" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Assume you have data points with measurements `y` at positions `x` as well as measurement errors `y_err`.\n", "\n", "How can you use `statsmodels` to fit a straight-line model to this data?\n", "\n", "For an extensive discussion, see [Hogg et al. (2010), \"Data analysis recipes: Fitting a model to data\"](https://arxiv.org/abs/1008.4686); we'll use the example data they give in Table 1.\n", "\n", "So the model is `f(x) = a * x + b`, and in Figure 1 they report the result we want to reproduce: the best-fit parameters and their errors for a \"standard weighted least-squares fit\" to this data are:\n", "* `a = 2.24 +- 0.11`\n", "* `b = 34 +- 18`" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2023-02-02T19:17:34.548101Z", "iopub.status.busy": "2023-02-02T19:17:34.547467Z", "iopub.status.idle": "2023-02-02T19:17:34.580872Z", "shell.execute_reply": "2023-02-02T19:17:34.580077Z" }, "jupyter": { "outputs_hidden": false } }, "outputs": [ { "data": { "text/html": [ "
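<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>x</th>\n", "      <th>y</th>\n", "      <th>y_err</th>\n", "    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>201.0</td>\n", "      <td>592.0</td>\n", "      <td>61.0</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>244.0</td>\n", "      <td>401.0</td>\n", "      <td>25.0</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>47.0</td>\n", "      <td>583.0</td>\n", "      <td>38.0</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>287.0</td>\n", "      <td>402.0</td>\n", "      <td>15.0</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>203.0</td>\n", "      <td>495.0</td>\n", "      <td>21.0</td>\n", "    </tr>\n",
"  </tbody>\n",
"</table>\n", "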