{ "cells": [ { "cell_type": "markdown", "id": "c9b1dda8", "metadata": {}, "source": [ "# Week 4 Example with exam question 12 from 2016" ] }, { "cell_type": "code", "execution_count": null, "id": "5834da31", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import scipy.stats as stats" ] }, { "cell_type": "markdown", "id": "8203344c", "metadata": {}, "source": [ "Enter the data in Python and visualise:" ] }, { "cell_type": "code", "execution_count": null, "id": "007d2712", "metadata": {}, "outputs": [], "source": [ "grades = [2,4,7,10,12]\n", "count = [22,78,84,72,24]\n", "plt.bar(grades,count)\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "id": "58ae5db2", "metadata": {}, "outputs": [], "source": [ "# How many datapoints (observations) in total?\n", "print(np.sum(count))" ] }, { "cell_type": "markdown", "id": "69c6a66b", "metadata": {}, "source": [ "1) What is our Population and what is our Sample?\n", "Population: All possible grades given in the same class over time. \n", "Sample: The grades given in a specific year. Is this a representative sample? (The data is actually not independent - the sample is not a random sample across years.)\n", "\n", "\n", "2) Does the data follow a normal distribution?\n", "No, the distribution is discrete. \n", "But since n=280 (which is large!), we can use the CLT to justify our calculation of the CI for the mean.\n", "Hence we proceed with Method 3.9 to calculate the confidence interval. \n", "\n", "3) What information do we need?\n", "The significance level is given: alpha = 0.05 (95% CI)\n", "n = 280\n", "We need to compute xbar (sample mean) and s (sample standard deviation).\n", "We need to find the t-quantile t_0.975 in a t-distribution with n-1 degrees of freedom. 
" ] }, { "cell_type": "code", "execution_count": null, "id": "2143ecf3", "metadata": {}, "outputs": [], "source": [ "n = 280" ] }, { "cell_type": "code", "execution_count": null, "id": "d0afc8b5", "metadata": {}, "outputs": [], "source": [ "# calculate xbar (each grade weighted by its count):\n", "xbar = 1/n*np.sum(np.array(grades)*np.array(count))\n", "print(xbar)" ] }, { "cell_type": "code", "execution_count": null, "id": "c1b9a18c", "metadata": {}, "outputs": [], "source": [ "# calculate s2 (sample variance) and then s (sample standard deviation)\n", "s2 = 1/(n-1)*np.sum(np.array(count) * (np.array(grades)-xbar)**2)\n", "print(s2)" ] }, { "cell_type": "code", "execution_count": null, "id": "3603f97e", "metadata": {}, "outputs": [], "source": [ "s = s2**.5\n", "print(s)" ] }, { "cell_type": "code", "execution_count": null, "id": "856c25cb", "metadata": {}, "outputs": [], "source": [ "# find t_0.975 in the t-distribution with df = n-1\n", "t0975 = stats.t.ppf(0.975, df=n-1)\n", "print(t0975)" ] }, { "cell_type": "markdown", "id": "b97660de", "metadata": {}, "source": [ "(Note that t0975 is almost equal to 1.96, so the t-distribution is very close to a standard normal distribution, since the degrees of freedom are so large.)" ] }, { "cell_type": "code", "execution_count": null, "id": "99ca6773", "metadata": {}, "outputs": [], "source": [ "# calculate the standard error of the mean (the standard error of \"xbar\")\n", "SE_xbar = s/(n**.5)\n", "print(SE_xbar)" ] }, { "cell_type": "code", "execution_count": null, "id": "7cdc6a98", "metadata": {}, "outputs": [], "source": [ "# calculate the confidence interval:\n", "lower_limit = xbar - t0975*SE_xbar\n", "upper_limit = xbar + t0975*SE_xbar\n", "\n", "print(lower_limit)\n", "print(upper_limit)" ] }, { "cell_type": "markdown", "id": "376f8a32", "metadata": {}, "source": [ "Does this mean that 95% of grades are between 6.6 and 7.3?? \n", "\n", "NO! \n", "\n", "The interval is a 95% confidence interval for the mean grade: it describes the uncertainty in the mean, not the range of the individual grades. In fact, the only possible grade inside the interval is 7, which was given to 84 of the 280 students (30%)." 
] } ], "metadata": { "kernelspec": { "display_name": "pernille", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 5 }