2020-11-04

The logit and logistic functions

Linear regression works on real numbers \mathbb{R}, that is, the input and output are in \mathbb{R}. For probabilities, this is problematic because the linear regression will happily give a probability of -934, while we know that probabilities should always lie between 0 and 1. This is only by definition, but it is a useful definition in practice. Informally, the logistic function converts values from real numbers to probabilities and the logit function does the reverse.
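As a minimal Python sketch (not from the post itself), the two functions and the fact that they are inverses:

```python
import math

def logistic(x: float) -> float:
    """Map a real number to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-x))

def logit(p: float) -> float:
    """Map a probability in (0, 1) back to the real line."""
    return math.log(p / (1 - p))

print(logistic(0))         # 0.5: the midpoint of the real line maps to even odds
print(logit(logistic(2)))  # recovers 2.0, up to floating-point error
```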


2020-09-26

The principle of maximum entropy

Say that you are a statistician and are asked to come up with a probability distribution for the current state of knowledge on some particular topic you know little about. (This, in Bayesian statistics, is known as choosing a suitable prior.) To do this, the safest bet is coming up with the least informative distribution via the principle of maximum entropy.

This principle is clearly explained by Jaynes (1968): consider a die which has been tossed a very large number of times N. We expect the average to be 3.5, that is, we expect a distribution where P_n = \frac{1}{6} for each n, see the figure below.
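The claim that the uniform distribution is the least informative one can be checked numerically with the Shannon entropy H = -\sum_n P_n \log P_n; a small sketch, where the skewed distribution is an arbitrary example:

```python
import math

def entropy(p):
    """Shannon entropy -sum p_n log p_n, in nats."""
    return -sum(x * math.log(x) for x in p if x > 0)

uniform = [1 / 6] * 6
skewed = [0.3, 0.2, 0.15, 0.15, 0.1, 0.1]  # an arbitrary non-uniform die

print(entropy(uniform))  # log(6) ≈ 1.7918, the maximum for six outcomes
print(entropy(skewed))   # strictly smaller
```

Any deviation from P_n = 1/6 lowers the entropy, which is why the uniform distribution is the safest choice absent other information.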


2020-08-12

Writing effectively

According to McEnerney (2014), academics are trained to be poor writers. Eventually, they end up in his office and tell him, in tears, that their careers might end soon. One reason why academics are poor writers is that they are expert writers. Expert writers are not experts in writing but experts who write. An expert writer typically thinks via writing and assumes that this raw output is good enough for readers. However, it isn't. For a start, expert writers have a worldview which differs from the readers' due to the writers' expertise. So, to avoid the crying, McEnerney argues that writers should instead write to be valuable to the community of readers.


2020-07-29

Writing checklist

I keep forgetting lessons about writing. After writing a text, my usual response is to declare it near perfect and never look at it again. In this text, I describe a checklist which I can use to quickly debunk that declaration. I plan to improve this checklist over time. Hopefully, text which passes the checklist a few dozen years from now will, indeed, be near perfect.

The list is roughly ordered by importance. The text should:

  1. Be valuable to the community of readers.

  2. Be simple (Adams, 2015) or be made as simple as possible, but not simpler. This is also known as Occam's razor, kill your darlings or the KISS principle.

  3. Be polite, that is, not contain a career limiting move. For example, do not "write papers proclaiming the superiority of your work and the pathetic inadequacy of the contributions of A, B, C, ..." (Wadge 2020).

  4. Be consistent. For example, either use the Oxford comma in the entire text or do not use it at all.

  5. Avoid misspellings.

  6. Avoid comma splices.

  7. Prefer the active voice, so write "the boy hit the ball" instead of "the ball was hit by the boy".

  8. Flow naturally, just like a normal conversation. For me, this contrasts with the way writing goes when programming.

  9. Provide a high-level overview of the text. This can be a summary, abstract, a few sentences in the introduction or a combination of these.

  10. Prefer common collocations. A list of common collocations is The Academic Collocation List.

  11. Use simple verbs, for example, prefer "stop" over "cease to move on" or "do not continue".

  12. Avoid dying metaphors such as "stand shoulder to shoulder with" (Orwell, 1946). Metaphors aim to "assist thought by evoking a visual image" (Orwell, 1946). Dying metaphors do not evoke such an image anymore due to overuse (Orwell, 1946).

  13. Avoid pretentious diction such as dressing up simple statements, inappropriate adjectives and foreign words and expressions (Orwell, 1946). For example, respectively "effective", "epic" and "status quo" (Orwell, 1946).

  14. Avoid meaningless words, that is, words for which no clear definition exists. For example, "democracy" and "freedom" have "several different meanings which cannot be reconciled with one another" (Orwell, 1946).


2020-06-28

Combinations and permutations

Counting is simple except when there is a lot to be counted. Combinations and permutations are such a case; they are about counting without replacement. Suppose we want to count the number of possible results we can obtain from picking k numbers, without replacement, from an equal or larger set of n numbers, that is, with k \leq n. When the same set of numbers in different orders should be counted separately, the count is called the number of permutations. So, if we have some set of numbers and shuffle some of them around, we say that the numbers are permuted. When the same set of numbers in different orders should be counted only once, the count is called the number of combinations. This makes sense, since a combination is only about which numbers are picked and not about their order.
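Python's standard library provides both counts directly; a small sketch, with n and k chosen arbitrarily:

```python
from math import comb, perm

n, k = 6, 3
print(perm(n, k))  # 120: ordered picks of 3 out of 6 (6 * 5 * 4)
print(comb(n, k))  # 20: the same picks with order ignored

# Each combination of k numbers can be ordered in k! ways,
# so the two counts differ by exactly that factor.
assert perm(n, k) == comb(n, k) * perm(k, k)
```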


2020-06-27

Comparing means and SDs

When comparing different papers, it might be that the papers have numbers about the same thing, but that the numbers are on different scales. For example, many different questionnaires exist that measure the same constructs; the NEO-PI and the BFI, for instance, both measure the Big Five personality traits. Say we want to compare reported means and standard deviations (SDs) for these questionnaires, which both use a Likert scale.

In this post, the equations to rescale reported means and standard deviations (SDs) to another scale are derived. Before that, an example is worked through to get an intuition of the problem.
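Assuming the derivation ends in a linear map (a shift plus a slope), the rescaling might look as follows; the scales and numbers here are hypothetical, a sketch rather than the post's own equations:

```python
def rescale_mean(m, a, b, c, d):
    """Map a mean on scale [a, b] to scale [c, d] via the linear map
    x' = c + (x - a) * (d - c) / (b - a)."""
    return c + (m - a) * (d - c) / (b - a)

def rescale_sd(s, a, b, c, d):
    """SDs are unaffected by the shift, so only the slope matters."""
    return s * (d - c) / (b - a)

# A mean of 4.0 (SD 0.5) on a 1-5 Likert scale, moved to a 1-7 scale:
print(rescale_mean(4.0, 1, 5, 1, 7))  # 5.5
print(rescale_sd(0.5, 1, 5, 1, 7))    # 0.75
```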


2020-05-11

Predicates and reproducibility

While reading texts on statistics and meta-science I kept noticing vagueness. For example, there seems to be half a dozen definitions of replicability in papers since 2016. In this text, I try to formalize the underlying structure.

Edit 2020-11-01: The model below is basically the same as, but poorer than, the causal models presented by, for example, Pearl (2009).

Assume determinism. Assume that for any function f there is a set of predicates, or context, C which needs to hold for the function to hold, that is, to return the correct answer. Let this be denoted by C \Rightarrow f. For example, Bernoulli's equation solved for \rho only holds for a context C_b containing isentropic flows, that is, C_b \Rightarrow \text{Bernoulli's equation}. There have been arguments that such contexts need to contain an (open-ended) list of negative conditions (Hoefer, 2003). Let these contexts and the contexts below also contain this list.


2020-03-05

Simple and binary regression

One of the most famous scientific discoveries was Newton's laws of motion. The laws allowed people to make predictions. For example, the acceleration of an object can be predicted given the applied force and the mass of the object. Making predictions remains a popular endeavor. This post explains the simplest way to predict an outcome for a new value, given a set of points.

To explain the concepts, data on apples and pears is generated. Since the underlying relations for the generated data are known, they can be compared to the results from the regression.
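As a sketch of the idea, with hypothetical generated data (the coefficients 20 and 10 are made up) and plain ordinary least squares standing in for the post's own derivation:

```python
import random

random.seed(1)
# Hypothetical data: apple weight (g) as a noisy linear function of diameter (cm)
xs = [random.uniform(5, 10) for _ in range(50)]
ys = [20 * x + 10 + random.gauss(0, 5) for x in xs]

# Ordinary least squares for y = a + b * x
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx

# Predict the outcome for a new value; close to the known 20 * 7 + 10 = 150
print(a + b * 7.0)
```

Because the data was generated with slope 20 and intercept 10, the fitted b and a should land near those values, which is exactly the comparison the post describes.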


2020-02-02

The greatest sales deck someone else has ever seen

According to Andy Raskin, the greatest sales deck has five elements. In this post, I'll present an adapted version. In line with the rest of this blog, I'll give an example of selling a programming language which is not Blub to a company; the newer language is called Y. Assume that the company is fine with language Blub because, well, everything is written in Blub and all the employees know Blub.


2020-01-24

Correlations

Correlations are ubiquitous. For example, news articles report that a research paper found no correlation between X and Y. Correlation is also related to (in)dependence, which plays an important role in linear regression. This post explains the Pearson correlation coefficient. The explanation is mainly based on the book by Hogg et al. (2018).

In the context of a book on mathematical statistics, certain variable names make sense. However, in this post, some variable names are changed to make the information more coherent. One convention which is adhered to is that single values are lowercase, and multiple values are capitalized. Furthermore, since in most empirical research we only need discrete statistics, the continuous versions of formulas are omitted.
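Following that convention (capitals for multiple values, lowercase for single ones), a direct Python translation of the discrete Pearson formula — a sketch, not the post's code:

```python
import math

def pearson_r(X, Y):
    """Sample Pearson correlation: covariance divided by the product
    of the standard deviations (shared factors of n cancel)."""
    n = len(X)
    mx, my = sum(X) / n, sum(Y) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(X, Y))
    sx = math.sqrt(sum((x - mx) ** 2 for x in X))
    sy = math.sqrt(sum((y - my) ** 2 for y in Y))
    return cov / (sx * sy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0: perfectly linear
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0: perfectly anti-linear
```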

