statsmodels.stats.descriptivestats.describe

statsmodels.stats.descriptivestats.describe(data: Union[numpy.ndarray, pandas.core.series.Series, pandas.core.frame.DataFrame], stats: Optional[Sequence[str]] = None, *, numeric: bool = True, categorical: bool = True, alpha: float = 0.05, use_t: bool = False, percentiles: Sequence[Union[int, float]] = (1, 5, 10, 25, 50, 75, 90, 95, 99), ntop: int = 5) → pandas.core.frame.DataFrame

Extended descriptive statistics for data.

Parameters

data (array_like): Data to describe. Must be convertible to a pandas DataFrame.
stats (Sequence[str], optional): Statistics to include. If not provided, the full set of statistics is computed. This list may evolve across versions to reflect best practices. Supported options are: “nobs”, “missing”, “mean”, “std_err”, “ci”, “std”, “iqr”, “iqr_normal”, “mad”, “mad_normal”, “coef_var”, “range”, “max”, “min”, “skew”, “kurtosis”, “jarque_bera”, “mode”, “median”, “percentiles”, “distinct”, “top”, and “freq”. See Notes for details.
numeric (bool, default True): Whether to include numeric columns in the descriptive statistics.
categorical (bool, default True): Whether to include categorical columns in the descriptive statistics.
alpha (float, default 0.05): A number between 0 and 1 giving the significance level used to compute the confidence interval, which has coverage 1 - alpha.
use_t (bool, default False): Whether to use the Student’s t distribution to construct confidence intervals.
percentiles (Sequence[float]): A sequence of distinct floating point values, all between 0 and 100. The default percentiles are 1, 5, 10, 25, 50, 75, 90, 95, and 99.
ntop (int, default 5): The number of top categorical labels to report.

Returns

DataFrame: Descriptive statistics
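Because the return value is an ordinary pandas DataFrame, individual statistics can be read with the usual label-based indexing. The sketch below assumes the layout produced by current statsmodels releases, with one row per statistic and one column per input variable; the column names `x` and `y` and their values are made up for the example.

```python
import pandas as pd
from statsmodels.stats.descriptivestats import describe

# Small hypothetical data set with two numeric columns.
data = pd.DataFrame(
    {
        "x": [1.0, 2.0, 3.0, 4.0, 5.0],
        "y": [10.0, 20.0, 30.0, 40.0, 60.0],
    }
)

# Request a handful of statistics by name.
result = describe(data, stats=["nobs", "mean", "std", "min", "max"])

# Rows are statistics, columns are the input variables.
mean_x = float(result.loc["mean", "x"])
max_y = float(result.loc["max", "y"])
print(result)
print(mean_x, max_y)
```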