{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# IntroStat Week 4 \n", "\n", "Simulation of sample from a normal distribution" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import scipy.stats as stats" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Simulation: Distribution of the sample mean" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# 'True' values in theoretical population \n", "mu = 178\n", "sigma = 12" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Draw 10 random numbers:\n", "sample = stats.norm.rvs(mu, sigma, size=10)\n", "print(sample)\n", "\n", "# Calculate the sample mean:\n", "print(sample.mean())\n", "\n", "# Plot histogram \n", "plt.hist(sample, density=True)\n", "plt.xlim(140,220)\n", "plt.ylim(0,0.20)\n", "# Plot the sample mean\n", "plt.axvline(sample.mean(), linestyle='--', color=\"black\")\n", "# Plot the true mean of underlying distribution\n", "plt.axvline(mu, linestyle='-', color=\"red\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Repeat the cell above a few times. \n", "\n", "What do you observe? Do you think the sample mean is a good estimate of the true mean?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Now simulate 100 samples and plot histogram of the 100 sample means\n", "\n", "# Draw (10 x 100) random numbers\n", "samples_100 = stats.norm.rvs(mu, sigma, size=(10,100))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\"samples_100\" is now a 2-dimensional array. 
\n", "\n", "Each column (with 10 elements) is one sample (with sample size n=10)\n", "\n", "There are 100 columns (100 samples, each of size n=10)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Calculate sample mean of each sample\n", "xbar = samples_100.mean(axis=0)\n", "\n", "# print the (100 values of) sample means\n", "print(xbar)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot histogram of the mean values\n", "plt.hist(xbar, density=True, color=\"black\")\n", "plt.axvline(mu, linestyle='-', color=\"red\")\n", "plt.xlim(140,220)\n", "plt.show()\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Simulation: Distribution of the sample variance from samples of normally distributed data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Draw 10 random numbers:\n", "sample = stats.norm.rvs(mu, sigma, size=10)\n", "print(sample)\n", "\n", "# Calculate the sample standard deviation:\n", "print(sample.std(ddof=1))\n", "\n", "# Visualise the sample standard deviation with a horizontal bar:\n", "plt.hist(sample, density=True)\n", "plt.xlim(140,220)\n", "plt.ylim(0,0.20)\n", "# Plot the sample mean\n", "plt.axvline(sample.mean(), linestyle='--', color=\"black\")\n", "# Plot the true mean of underlying distribution\n", "plt.axvline(mu, linestyle='-', color=\"red\")\n", "# Plot the sample standard deviation\n", "plt.hlines(y=0.10, xmin=sample.mean()-sample.std(ddof=1), xmax=sample.mean()+sample.std(ddof=1), colors='black', linestyles='--')\n", "# Plot the true standard deviation of underlying distribution\n", "plt.hlines(y=0.11, xmin=mu-sigma, xmax=mu+sigma, colors='red')\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the plot above, each horizontal bar indicates one standard deviation to each side of the mean.\n", "\n", "Both the sample standard deviation and the true sigma 
are visualised. \n", "\n", "Try repeating the cell above a few times. Is the sample standard deviation a good estimate for the true sigma?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Calculate sample variance of each of the 100 samples stored in \"samples_100\":\n", "s2 = samples_100.var(axis=0, ddof=1)\n", "\n", "# print the (100 values of) sample variances\n", "print(s2)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Plot histogram of the sample variance values\n", "plt.hist(s2, density=True, color=\"black\")\n", "plt.axvline(sigma**2, linestyle='-', color=\"red\")\n", "plt.xlim(0,500)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The sample variance cannot be negative and does not follow a normal distribution.
\n", "The distribution of the sample variance is not symmetric; it is skewed to the right. " ] } ], "metadata": { "kernelspec": { "display_name": "pernille", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 2 }