{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "625c1fc4",
   "metadata": {},
   "source": [
    "# Probability distributions in Python - example with the Binomial distribution\n",
    "\n",
    "In this notebook we will work with the binomial distribution, using the `scipy.stats` subpackage.\n",
    "\n",
    "We will try using the different methods of the distribution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7e04d079",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import pandas as pd\n",
    "\n",
    "# For specific probability distributions (e.g., the binomial distribution) we will use a new library: SciPy (in fact only the subpackage scipy.stats)\n",
    "import scipy.stats as stats"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9ff84b8b",
   "metadata": {},
   "source": [
    "### Compute probabilities (pmf)\n",
    "\n",
    "For a stochastic variable X following a binomial distribution with parameters n=6 and p=0.70, compute P(X = 6):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "96a8c007",
   "metadata": {},
   "outputs": [],
   "source": [
    "stats.binom.pmf(k=6, n=6, p=0.70)"
   ]
  },
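  {
   "cell_type": "markdown",
   "id": "c4a1e7b2",
   "metadata": {},
   "source": [
    "As a cross-check, the same probability can be computed by hand from the binomial pmf, $P(X = k) = \\binom{n}{k} p^k (1-p)^{n-k}$. The cell below is a minimal sketch; the helper name `binom_pmf_manual` is our own and is not part of SciPy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d5b2f8c3",
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "import scipy.stats as stats\n",
    "\n",
    "def binom_pmf_manual(k, n, p):\n",
    "    # hypothetical helper: number of ways to place k successes among n trials,\n",
    "    # times the probability of any one such sequence\n",
    "    return math.comb(n, k) * p**k * (1 - p)**(n - k)\n",
    "\n",
    "# both values should agree up to floating-point rounding\n",
    "print(binom_pmf_manual(6, 6, 0.70))\n",
    "print(stats.binom.pmf(k=6, n=6, p=0.70))"
   ]
  },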
  {
   "cell_type": "markdown",
   "id": "1e3eeb03",
   "metadata": {},
   "source": [
    "We can also compute the probability of every possible outcome (P(X = 0), P(X = 1), P(X = 2), P(X = 3), P(X = 4), P(X = 5) and P(X = 6)):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d98346ae",
   "metadata": {},
   "outputs": [],
   "source": [
    "print(stats.binom.pmf(k=[0,1,2,3,4,5,6], n=6, p=0.70))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b60d8b42",
   "metadata": {},
   "outputs": [],
   "source": [
    "# we can also visualise these probabilities (plot the pmf):\n",
    "plt.bar([0,1,2,3,4,5,6], stats.binom.pmf(k=[0,1,2,3,4,5,6], n=6, p=0.70), width=0.1, color='red')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "62af0e1b",
   "metadata": {},
   "source": [
    "### Compute cdf, inverse cdf, mean, variance, etc.\n",
    "\n",
    "`scipy.stats` provides many other methods for every distribution.\n",
    "\n",
    "We can also compute the cdf (`.cdf`), the inverse cdf (`.ppf`), the expectation value or mean (`.mean`), the variance (`.var`), the standard deviation (`.std`) and the median (`.median`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5cddbbcf",
   "metadata": {},
   "outputs": [],
   "source": [
    "# compute the cdf, P(X <= k):\n",
    "print(stats.binom.cdf(k=[0,1,2,3,4,5,6], n=6, p=0.70))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7338a239",
   "metadata": {},
   "outputs": [],
   "source": [
    "# visualise the cdf:\n",
    "plt.bar([0,1,2,3,4,5,6], stats.binom.cdf(k=[0,1,2,3,4,5,6], n=6, p=0.70), width=0.1, color='red')\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b1f0c293",
   "metadata": {},
   "outputs": [],
   "source": [
    "# compute the inverse cdf, which scipy.stats calls \".ppf\" (percent point function).\n",
    "# For instance, we can compute the quartiles:\n",
    "stats.binom.ppf(q=[0.25, 0.50, 0.75], n=6, p=0.70)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "00f8c35e",
   "metadata": {},
   "source": [
    "Can you visually verify these values from the cdf plot above?\n",
    "\n",
    "Hints:<br>\n",
    "Find 0.25 on the y-axis, then read off the smallest x-value whose cdf bar reaches at least 0.25. This should be Q1.<br>\n",
    "Find 0.50 on the y-axis, then read off the smallest x-value whose cdf bar reaches at least 0.50. This should be Q2, the median.<br>\n",
    "Find 0.75 on the y-axis, then read off the smallest x-value whose cdf bar reaches at least 0.75. This should be Q3."
   ]
  },
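  {
   "cell_type": "markdown",
   "id": "e6c3a9d4",
   "metadata": {},
   "source": [
    "The visual check can also be done numerically. For a discrete distribution, the quantile at level q is the smallest k with P(X <= k) >= q. The cell below is a small sketch of that rule; the variable names (`quantile_levels`, `manual_quartiles`) are our own."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f7d4b0e5",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import scipy.stats as stats\n",
    "\n",
    "n, p = 6, 0.70\n",
    "k = np.arange(n + 1)\n",
    "cdf = stats.binom.cdf(k=k, n=n, p=p)\n",
    "quantile_levels = [0.25, 0.50, 0.75]\n",
    "\n",
    "# the cdf is non-decreasing, so searchsorted (side='left') returns the first index with cdf >= q\n",
    "manual_quartiles = k[np.searchsorted(cdf, quantile_levels, side=\"left\")]\n",
    "\n",
    "print(manual_quartiles)\n",
    "print(stats.binom.ppf(q=quantile_levels, n=n, p=p))"
   ]
  },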
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0392d4fc",
   "metadata": {},
   "outputs": [],
   "source": [
    "# compute the expectation value / the mean:\n",
    "stats.binom.mean(n=6, p=0.70)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9d4ed517",
   "metadata": {},
   "outputs": [],
   "source": [
    "# compute the variance:\n",
    "stats.binom.var(n=6, p=0.70)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "97f0286e",
   "metadata": {},
   "outputs": [],
   "source": [
    "# compute the standard deviation:\n",
    "stats.binom.std(n=6, p=0.70)"
   ]
  },
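  {
   "cell_type": "markdown",
   "id": "a8e5c1f6",
   "metadata": {},
   "source": [
    "For the binomial distribution these quantities have closed forms: $E[X] = np$, $\\mathrm{Var}(X) = np(1-p)$, and the standard deviation is the square root of the variance. The cell below is a minimal sketch comparing them with the SciPy results; the variable names are our own."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b9f6d2a7",
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "import scipy.stats as stats\n",
    "\n",
    "n, p = 6, 0.70\n",
    "mean_manual = n * p                 # E[X] = n p\n",
    "var_manual = n * p * (1 - p)        # Var(X) = n p (1 - p)\n",
    "std_manual = math.sqrt(var_manual)  # standard deviation\n",
    "\n",
    "# each pair should agree up to floating-point rounding\n",
    "print(mean_manual, stats.binom.mean(n=n, p=p))\n",
    "print(var_manual, stats.binom.var(n=n, p=p))\n",
    "print(std_manual, stats.binom.std(n=n, p=p))"
   ]
  },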
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5c125bc0",
   "metadata": {},
   "outputs": [],
   "source": [
    "# compute the median:\n",
    "stats.binom.median(n=6, p=0.70)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "60a2fb07",
   "metadata": {},
   "source": [
    "### Simulating random variates"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "853b377c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# we can simulate a random variate - that is a single observation of the random variable - using .rvs:\n",
    "print(stats.binom.rvs(size=1, n=6, p=0.70))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2b3165c1",
   "metadata": {},
   "source": [
    "Try repeating the code above a few times.\n",
    "\n",
    "What are we simulating?"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b5bdd195",
   "metadata": {},
   "outputs": [],
   "source": [
    "# we can also simulate many observations in one go:\n",
    "print(stats.binom.rvs(size=100, n=6, p=0.70))"
   ]
  }
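  ,
  {
   "cell_type": "markdown",
   "id": "c0a7e3b8",
   "metadata": {},
   "source": [
    "With many simulated observations, the relative frequency of each outcome should lie close to the pmf. The cell below is a rough sketch of that comparison. The sample size of 10,000 and the seed 42 are arbitrary choices of ours, made only so that the run is reproducible."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d1b8f4c9",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import scipy.stats as stats\n",
    "\n",
    "n, p, size = 6, 0.70, 10_000\n",
    "\n",
    "# random_state fixes the seed (42 is an arbitrary choice)\n",
    "sample = stats.binom.rvs(n=n, p=p, size=size, random_state=42)\n",
    "\n",
    "# relative frequency of each outcome 0..n in the simulated sample\n",
    "freq = np.bincount(sample, minlength=n + 1) / size\n",
    "pmf = stats.binom.pmf(k=np.arange(n + 1), n=n, p=p)\n",
    "\n",
    "print(np.round(freq, 3))\n",
    "print(np.round(pmf, 3))"
   ]
  }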
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "pernille",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}