{ "cells": [ { "cell_type": "markdown", "id": "3f7cd02e", "metadata": {}, "source": [ "## Area of plates simulation" ] }, { "cell_type": "code", "execution_count": null, "id": "4535e984", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import scipy.stats as stats" ] }, { "cell_type": "markdown", "id": "604816b9", "metadata": {}, "source": [ "We imagine producing many plates and measuring the width and length of each plate.\n", "\n", "Instead of doing this in real life, we simulate :)\n", "\n", "We store our simulated data as we would store real data (here in a pandas DataFrame)." ] }, { "cell_type": "code", "execution_count": null, "id": "9323a635", "metadata": {}, "outputs": [], "source": [ "np.random.seed(2242)" ] }, { "cell_type": "code", "execution_count": null, "id": "56c3f31a", "metadata": {}, "outputs": [], "source": [ "# number of simulations:\n", "k = 10000" ] }, { "cell_type": "code", "execution_count": null, "id": "ed206437", "metadata": {}, "outputs": [], "source": [ "# simulating width and length from normal distributions:\n", "\n", "simulation_data = pd.DataFrame({\n", " 'length': stats.norm.rvs(size=k, loc=2, scale=0.01),\n", " 'width': stats.norm.rvs(size=k, loc=3, scale=0.02)\n", " })\n", "\n", "print(simulation_data.head())" ] }, { "cell_type": "code", "execution_count": null, "id": "9fffdf58", "metadata": {}, "outputs": [], "source": [ "# for each simulated plate we calculate the area and store this (as a new column) in our DataFrame:\n", "simulation_data['area'] = simulation_data['length'] * simulation_data['width']\n", "\n", "print(simulation_data.head())" ] }, { "cell_type": "code", "execution_count": null, "id": "24e3d643", "metadata": {}, "outputs": [], "source": [ "# Compute mean and standard deviation from simulated plate areas:\n", "print(simulation_data['area'].mean()) # mean area\n", "print(simulation_data['area'].std(ddof=1)) # sample standard deviation of area\n", 
"print(simulation_data['area'].var(ddof=1)) # sample variance of area" ] }, { "cell_type": "code", "execution_count": null, "id": "45af0e55", "metadata": {}, "outputs": [], "source": [ "# Lets plot the distribution of simulated plate areas:\n", "plt.hist(simulation_data['area'], bins=100)\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "eab013f9", "metadata": {}, "source": [ "What do you think about this distribution? (describe in own words)" ] }, { "cell_type": "code", "execution_count": null, "id": "4ce8234f", "metadata": {}, "outputs": [], "source": [ "# how many values deviate by more than 0.10 from 6.00 m2 ?\n", "\n", "# plates that have an area below 5.90 are very \"small\" (indicated by a boolean variable)\n", "simulation_data['small'] = simulation_data['area'] < 5.90\n", "\n", "# plates that have an area above 6.10 are very \"large\" (indicated by a boolean variable)\n", "simulation_data['large'] = simulation_data['area'] > 6.10\n", "\n", "print(simulation_data.head())" ] }, { "cell_type": "code", "execution_count": null, "id": "49ec03b3", "metadata": {}, "outputs": [], "source": [ "# Showing an example of a row that has a \"True\" value:\n", "print(simulation_data.iloc[43:48])" ] }, { "cell_type": "code", "execution_count": null, "id": "9c4b1c0a", "metadata": {}, "outputs": [], "source": [ "# how many values deviate by more than 0.10 from 6m2 ? 
\n", "# these are the very \"small\" plus the very \"large\":\n", "\n", "# In Python we can simply add \"True\" (1) and \"False\" (0) values:\n", "print(np.sum(simulation_data['small']) + np.sum(simulation_data['large']))\n", "\n", "# same result as a fraction of the total number of simulations:\n", "print((np.sum(simulation_data['small']) + np.sum(simulation_data['large']))/k)" ] }, { "cell_type": "markdown", "id": "3b3c9b6d", "metadata": {}, "source": [ "Approximately 4-5% of plates deviate by more than 0.10 from 6 m2." ] }, { "cell_type": "markdown", "id": "c660078d", "metadata": {}, "source": [ "### Other probabilities (from unknown distributions)" ] }, { "cell_type": "code", "execution_count": null, "id": "5585e925", "metadata": {}, "outputs": [], "source": [ "# Imagine the plates have a thickness between 0.95 cm and 1.05 cm (0.0095 m to 0.0105 m)\n", "\n", "# Simulate the thickness of each plate (assume the thickness is independent of width and length)\n", "simulation_data['thickness'] = stats.uniform.rvs(size=k, loc=0.0095, scale=0.001)\n", "\n", "print(simulation_data.head())" ] }, { "cell_type": "markdown", "id": "6d5f5620", "metadata": {}, "source": [ "KAHOOT! (x1)" ] }, { "cell_type": "code", "execution_count": null, "id": "f9ab9360", "metadata": {}, "outputs": [], "source": [ "plt.hist(simulation_data['thickness'])\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "251654e5", "metadata": {}, "source": [ "What is the distribution of plate thicknesses? Is it as you expected?" 
] }, { "cell_type": "code", "execution_count": null, "id": "5434aba1", "metadata": {}, "outputs": [], "source": [ "# As a rule, plates are discarded if value = (thickness^2)/area is less than 0.000015\n", "\n", "# make a boolean (True/False) variable to indicate if each simulated plate will be discarded:\n", "\n", "simulation_data['value'] = simulation_data['thickness']**2/simulation_data['area']\n", "simulation_data['discard'] = simulation_data['value'] < 1.5e-5\n", "\n", "print(simulation_data.head())" ] }, { "cell_type": "code", "execution_count": null, "id": "1f1d98d1", "metadata": {}, "outputs": [], "source": [ "# visualise the distribution of the \"value\" (rule for discarding plates):\n", "plt.hist(simulation_data['value'])\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "90f1a5fe", "metadata": {}, "source": [ "What do you think about this distribution? How would you describe it in your own words?" ] }, { "cell_type": "code", "execution_count": null, "id": "2322a336", "metadata": {}, "outputs": [], "source": [ "# how many plates (out of k) are discarded?\n", "print(simulation_data['discard'].sum())\n", "print(simulation_data['discard'].sum()/k)" ] }, { "cell_type": "markdown", "id": "fb72a611", "metadata": {}, "source": [ "Approximately 1% of plates are discarded." ] }, { "cell_type": "markdown", "id": "c81450c5", "metadata": {}, "source": [ "Back to the slides!" 
] }, { "cell_type": "code", "execution_count": null, "id": "b11783b0", "metadata": {}, "outputs": [], "source": [ "# variance of the \"value\"\n", "print(simulation_data['value'].var(ddof=1))" ] }, { "cell_type": "code", "execution_count": null, "id": "aec67ae5", "metadata": {}, "outputs": [], "source": [ "# variance of the plate thickness:\n", "simulation_data['thickness'].var(ddof=1)" ] }, { "cell_type": "code", "execution_count": null, "id": "5f3a36b3", "metadata": {}, "outputs": [], "source": [ "# theoretical variance of the plate thickness (uniform distribution of width 0.001):\n", "0.001**2 / 12" ] }, { "cell_type": "code", "execution_count": null, "id": "275ddd50", "metadata": {}, "outputs": [], "source": [ "# using error propagation to calculate variance of \"value\" = thickness^2/(length*width):\n", "# each term is (partial derivative at the means: length=2, width=3, thickness=0.01)^2 * variance\n", "(0.01**2/(2**2*3))**2 * 0.01**2 + (0.01**2/(2*3**2))**2 * 0.02**2 + (2*0.01/(2*3))**2 * (0.001**2/12)" ] } ], "metadata": { "kernelspec": { "display_name": "pernille", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.5" } }, "nbformat": 4, "nbformat_minor": 5 }