Distributions

Before we can talk about statistics, we need to talk about distributions. Let's say that we want to make predictions about the world. When sticking to simple problems, this is easy. Suppose you drop a bowling ball from 1 meter and suppose your toes are below it. What is the chance that your toes will start to hurt within a few seconds? If there is nothing special going on then the chance is 100%. Gravity will work 100% of the time and a bowling ball hitting toes will be painful 100% of the time.

Say that you are a particularly stubborn researcher with stubborn friends and you find 40 people who all try this experiment 10 times. At the end of the experiment, all 40 people report that the ball hit their toes 10 times. Let's show this in a plot:

This type of plot is called a histogram. It groups all the people who reported the same number; in this case, it placed everyone in the same group. This is a useful way to visualize large amounts of data and we'll use it below to introduce distributions.
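To make the grouping concrete, here is a minimal sketch of what a histogram computes for the windless experiment. It uses only base Julia and prints a text version of the plot. The variable names (`reports`, `counts`) are made up for this illustration and are not taken from the notebook's own plotting code.

```julia
# Each of the 40 participants reports how many of their 10 drops hit their toes.
# Without wind, every participant reports 10.
n_people = 40
n_drops = 10
reports = fill(n_drops, n_people)

# A histogram groups equal reports together and counts the size of each group.
# counts[k + 1] holds the number of people who reported k hits, for k = 0, ..., 10.
counts = zeros(Int, n_drops + 1)
for r in reports
    counts[r + 1] += 1
end

# Print one bar per possible report.
for k in 0:n_drops
    println(lpad(k, 2), " hits: ", "#"^counts[k + 1], " (", counts[k + 1], ")")
end
```

All 40 reports end up in the group for 10 hits and every other group stays empty, which is the single bar of the histogram.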

Now, suppose that there is a very strong wind; strong enough to make the ball fall next to your toes. What is the chance that your toes will start to hurt within a few seconds? Hmm, it depends. It depends on the speed of the wind at that exact moment, the rotation and angle that the ball starts off with and many more things. In a perfect world, we could take these variables and calculate exactly at which times and with which rotations and angles the ball will hit your toes. Unfortunately, we cannot predict the wind accurately for a specific place and time because it depends on the temperature 1 kilometer away as well as the airflow of the last seconds and that depends on whether it was nice weather yesterday and, well, it could depend on a near infinite number of things. What we can do to say something useful is to repeat the experiment many times and establish probabilities for different events. In this case, we have 11 possible events, namely that the ball hit the toes 0 times, 1 time, ..., 10 times. So, in other words, we can repeat the experiment many times, establish how often each event occurred and use that information to say something about the probability of each event.
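The idea of repeating the experiment and counting how often each event occurs can be sketched as a small simulation. Note that this is only an illustration under a made-up assumption: each drop hits the toes with probability 0.5, independently of the other drops. The real wind would not be this simple. The names `simulate_hits` and `estimate_probabilities` are hypothetical, and the sketch uses the `Random` standard library rather than the packages listed at the bottom of this page.

```julia
using Random

# Hypothetical assumption: with strong wind, each drop hits the toes with
# probability 0.5, independently of the other drops.
p_hit = 0.5
n_drops = 10
rng = MersenneTwister(1)

# One experiment: drop the ball n_drops times and count the hits.
simulate_hits(rng, n_drops, p_hit) = count(_ -> rand(rng) < p_hit, 1:n_drops)

# Repeat the experiment and return the relative frequency of each event.
# The element at index k + 1 is the estimate for "the ball hit the toes k times".
function estimate_probabilities(rng, n_experiments, n_drops, p_hit)
    counts = zeros(Int, n_drops + 1)
    for _ in 1:n_experiments
        hits = simulate_hits(rng, n_drops, p_hit)
        counts[hits + 1] += 1
    end
    return counts ./ n_experiments
end

probabilities = estimate_probabilities(rng, 10_000, n_drops, p_hit)
for k in 0:n_drops
    println(lpad(k, 2), " hits: ", round(probabilities[k + 1]; digits=3))
end
```

The more often the experiment is repeated, the less the estimated probabilities change from one run to the next. With only 40 repetitions, as in the experiment with our friends, the estimates are much noisier.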

Time to put this idea into practice. Let's invite our 40 friends again and ask them to drop the ball 10 times. Let's also gather all the data again and plot it:

This looks

Built with Julia 1.7.2 and

CairoMakie 0.7.2
DataFrames 1.3.2
StableRNGs 1.0.0
StatsBase 0.33.15

To run this page on your own computer, download this file and open it with Pluto.jl.