Typically, when making predictions via a linear model, we fit the model on our data and make predictions from the fitted model. However, this doesn't take much foreknowledge into account. For example, when predicting a person's length given only the weight and gender, we already have an intuition about the effect size and direction. Bayesian analysis should be able to incorporate this prior information.

In this blog post, I aim to figure out whether foreknowledge can, in theory, increase model accuracy. To do this, I generate data and fit a linear model and a Bayesian binary regression. Next, I compare the accuracy of the model parameters from the linear and Bayesian model.

begin using AlgebraOfGraphics using CairoMakie using CategoricalArrays using DataFrames using GLM using MLDataUtils: rescale! using Random: seed! using Statistics using StatsFuns using Turing end

We define the model as \(g_i = a_e * a_i + r_e * r_i + \epsilon_i = 1.1 * a_i + 1.05 * r_i + \epsilon_i\) where \(a_e\) is the coefficient for the age, \(r_e\) is a coefficient for the nationality and \(\epsilon_i\) is some random noise for individual \(i\).

We generate data for \(n\) individuals via:

function generate_data(i::Int) seed!(i) n = 120 I = 1:n P = [i % 2 == 0 for i in I] r_2(x) = round(x; digits=2) A = r_2.([p ? rand(Normal(aₑ * 18, 1)) : rand(Normal(18, 1)) for p in P]) R = r_2.([p ? rand(Normal(rₑ * 6, 3)) : rand(Normal(6, 3)) for p in P]) E = r_2.(rand(Normal(0, 1), n)) G = aₑ .* A + rₑ .* R .+ E G = r_2.(G) df = DataFrame(age=A, recent=R, error=E, grade=G, pass=P) end;

	age	recent	error	grade	pass
1	17.93	7.49	0.48	28.07	false
2	20.33	1.19	-0.8	22.81	true
3	17.19	10.25	-1.16	28.51	false
4	22.26	5.51	-0.47	29.8	true
5	19.16	0.07	-2.05	19.1	false
6	20.07	10.87	-0.42	33.07	true
7	19.75	6.37	-1.43	26.98	false
8	18.97	4.25	-0.46	24.87	true
9	16.96	5.22	-0.51	23.63	false
10	19.47	-0.88	-0.61	19.88	true
...
120	21.3	3.17	-0.4	26.36	true

age

recent

error

grade

pass

17.93

7.49

0.48

28.07

false

20.33

1.19

-0.8

22.81

true

17.19

10.25

-1.16

28.51

false

22.26

5.51

-0.47

29.8

true

19.16

0.07

-2.05

19.1

false

20.07

10.87

-0.42

33.07

true

19.75

6.37

-1.43

26.98

false

18.97

4.25

-0.46

24.87

true

16.96

5.22

-0.51

23.63

false

19.47

-0.88

-0.61

19.88

true

...

120

21.3

3.17

-0.4

26.36

true

Linear regression

First, we fit a linear model and verify that the coefficients are estimated reasonably well. Here, the only prior information that we give the model is the structure of the data, that is, a formula.

StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}} grade ~ 1 + age + recent Coefficients: ──────────────────────────────────────────────────────────────────────── Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95% ──────────────────────────────────────────────────────────────────────── (Intercept) 0.693364 1.23564 0.56 0.5758 -1.75376 3.14049 age 1.05597 0.0638938 16.53 <1e-31 0.929434 1.18251 recent 1.05645 0.0267033 39.56 <1e-68 1.00357 1.10934 ────────────────────────────────────────────────────────────────────────

Notice how these estimated coefficients are close to the coefficients that we set for age and recent, namely a_e = 1.1 ≈ 1.05597 and r_e = 1.05 ≈ 1.05645, as expected.

Bayesian regression

For the Bayesian regression we fit a model via Turing.jl. Now, we give the model information about the structure of the data as well as priors for the size of the coefficients. For demonstration purposes, I've set the priors to the correct values. This is reasonable because I was wondering whether finding a good prior could have a positive effect on the model accuracy.

@model function bayesian_model(ages, recents, grades, n) intercept ~ Normal(0, 5) βₐ ~ Normal(aₑ, 1) βᵣ ~ Normal(rₑ, 3) σ ~ truncated(Cauchy(0, 2), 0, Inf) μ = intercept .+ βₐ * ages .+ βᵣ * recents grades ~ MvNormal(μ, σ) end;

Let's plot the density for the coefficient estimates \(\beta_a\) and \(\beta_r\):

and compare the outputs from both models:

coefficient	true value	linear estimate	linear error	bayesian estimate	bayesian error
aₑ	1.1	1.05597	4.0 %	1.0585228513902123	3.8 %
rₑ	1.05	1.05645	0.6 %	1.056853165251212	0.7 %

Conclusion

After giving the true coefficients to the Bayesian model in the form of priors, it does score better than the linear model. However, the differences aren't very big. This could be due to the particular random noise in this sample E or due to the relatively big sample size. The more samples, the more likely it is that the data will overrule the prior. In any way, there are real-world situations where gathering extra data is more expensive than gathering priors via reading papers. In those cases, the increased accuracy introduced by using priors could have serious benefits.

Built with Julia 1.10.4 and

AlgebraOfGraphics 0.6.19
CairoMakie 0.11.11
CategoricalArrays 0.10.8
DataFrames 1.6.1
GLM 1.9.0
MLDataUtils 0.5.4
Statistics 1.10.0
StatsFuns 1.3.1
Turing 0.30.9