Building open source software. Contact: email.

2 hours ago

Jan Terlouw about "touwtjes uit de brievenbussen"

In honor of Jan Terlouw, who passed away today. The first part of a famous story he told in the Netherlands, translated from Dutch:

When I was a child, there was a war. The Second World War lasted from my 7th to my 13th year. And when it was over, when I was a student, when I did my military service, when I got my first research job, the Netherlands was in what was called reconstruction. The houses were repaired, the churches, the other buildings. But above all we were building the welfare state. The AOW was established. An old-age pension for everyone. And the ziekenfonds, or the Sickness Benefits Act. Medical care for everyone. That happened. And in those days I lived with my young family in a single-family house in Utrecht. And everywhere, strings hung out of the letterboxes. Children could simply pull the front doors open and walk into each other's homes. Adults too. We trusted each other.

2025-05-14

Why I wrote fx

I used to always run my blog via a static site generator. For people unfamiliar with this, it essentially means that you modify files locally and then send these files to a static site host like GitHub Pages or Cloudflare. This worked fine but it wasn't very fast. Uploading a blog post would take me a few minutes. I would need to modify the files locally, verify that it looked good, and then send them to the server.

The alternative was to set up WordPress, but somehow that didn't feel good to me. WordPress servers always felt too bloated. I just wanted exactly the things I needed and nothing more. No plugins. No styling options. No intricate menus. Social media doesn't have those things either. On social media sites like X/Twitter, you can only make a profile banner and add posts. And it works. Isn't this what writing is about anyway? You write articles/blogs/letters/memos, or whatever you want to call it, and publish it. Social media is widely used so it must work.

2025-05-14

Grading Historians

I've listened to many podcasts with historians over the years and I just realized one thing that my favorite historians all have in common: you can just watch/listen to a talk from years ago and it will still feel accurate.

You basically can't hear which year it currently is. Great historians talk about long-lasting and persistent phenomena instead of ephemeral ones.

2025-05-13

Misinformation

There is this common narrative that social media promotes misinformation since you can end up in a bubble. But non-social media also doesn’t have to be correct, does it? So then it’s about which of the two publishes more misinformation? It probably depends on what bubble you might end up in.

At least you could say bubbles are more diverse. One person can be inside one bubble and another in another so for society overall they might cancel out? Or maybe new ideas might start to arise?

Maybe the argument should be that mass (social) media may lead to misinformation since both correct and incorrect messages would be amplified.

2025-05-13

Stephen Kotkin's Hopeful Future for the West

In 2017, Stephen Kotkin gave a 5-hour-long talk about Spheres of Influence at the IWM Vienna. The talk is long not because he repeats the same ideas, but because he doesn't give conclusions. As I see it, his talks clearly step through both sides of an argument and then leave it up to the audience to decide what to believe, largely in line with the Socratic method. One part of the talk stood out to me: he said that the West does not have a clear hopeful picture for the future anymore.

2025-05-13

Never again

After World War II, people introduced the phrase "never again". Never again should the whole world fight a war against each other. Never again should Germany, Japan, and Italy attempt to get rid of people from other countries. Or as Elie Wiesel wrote: "'Never again' becomes more than a slogan: It's a prayer, a promise, a vow ... never again the glorification of base, ugly, dark violence."

As a thought experiment, how can we ensure that this never happens again? "Territorial integrity" seems to me like a great start. Allowing nations (read: groups of people) to defend their borders seems to me to go a long way in protecting people from each other. Sovereignty also plays an important role. It ensures that people are allowed to rule themselves. What is the point of a fixed border if someone else makes decisions for you?

2025-05-12

Maybe podcasts, YouTubers, and individual producers are more beneficial to society than some institution that says things. An individual can be held accountable and thus needs to show integrity. For example, which companies generally have more integrity, founder-led ones or institutions?

2025-05-08

Harsh Dwivedi on the Windsurf Sale

Harsh Dwivedi:

Windsurf sold for $3 Billion
Cursor now valued at $9 Billion

Windsurf bought by OpenAI
OpenAI is an existing investor of Cursor

Both are VSCode forks
VSCode is owned by Microsoft

bubble.jpg

2025-05-07

Heroes

A few months ago I suddenly wondered where have our heroes gone? Or at least where are the heroes that stand up and do the right thing regardless of the negative effects that might have on them. Most CEOs seem to care more about their own income than integrity, which resulted in the introduction of terms like “enshittification”. I noticed this after reading Investing Between the Lines by L. J. Rittenhouse. In the book, the author argues that integrity is associated with outstanding long-term investment results. So with the book in hand, I have gone through about one hundred shareholder letters and found only three that satisfied the basic criteria from the book. Also TV and movies seem to be full of mostly dull characters that mostly care about not stepping out of line. Or what would be the modern equivalent of Die Hard? It seems like most things become duller, blander, and more average. And from the politicians I’m currently missing a strong positive story for where we should be going. It seems to be mostly against things. Against Israel, against climate change, against capitalism, and even against the West.

2025-05-07

The large Dutch pension fund ABP achieved an annual return of about 6% over the last 20 years, according to https://www.abp.nl/over-abp/duurzaam-en-verantwoord-beleggen/beleggingsresultaten. The S&P 500 had a 10.392% return (dividends invested) over 20 years, according to https://tradethatswing.com/average-historical-stock-market-returns-for-sp-500-5-year-up-to-150-year-averages/.
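To get a feel for what that gap means, the two figures can be compounded over the 20 years. This is only a rough sketch: it assumes that both numbers are annualized returns that compound yearly, and it ignores fees, currency effects, and the different risk profiles of a pension fund and a stock index.

```latex
\[
  1.06^{20} \approx 3.21,
  \qquad
  1.10392^{20} \approx 7.22
\]
```

Under those assumptions, one unit of money would have grown to roughly 3.2 units at the ABP rate, versus roughly 7.2 units at the S&P 500 rate.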

2025-05-07

Publishing a Snap Package

I've been trying to get jas (just an installer) published in the snap package registry for a few weeks now. This is how the process is going so far.

According to the docs, I can just register a new snap and publish it. So in a fresh Ubuntu 24.04, I ran:

$ sudo apt update

$ sudo apt install neovim

$ sudo snap install snapcraft --classic

$ git clone https://github.com/rikhuijzer/jas.git

$ cd jas

$ mv pkg/snapcraft.yaml .

$ snapcraft # installs LXD at first run

$ sudo apt install gnome-keyring

$ snapcraft login

$ snapcraft register jas

$ snapcraft upload --release=edge jas_0.2.0_amd64.snap
Store operation failed:
- resource-not-found: Snap not found for name=jas
Full execution log: '/root/.local/state/snapcraft/log/snapcraft-20250408-093243.551840.log'

2025-05-07

Kyle Chan: "BYD Mexico plant: BYD’s plans for building a new EV plant in Mexico were put on hold after Trump got re-elected. Then suddenly in March, China’s Ministry of Commerce jumped ahead and withheld approval for BYD’s Mexico plant, arguing that BYD’s technology might get “leaked” to the US (which doesn’t make any sense given the many BYD plants popping up all over the world). [...]

One explanation for this pattern of actions is a battle over symbolic control. Rather than getting hit by a ban by the other side, it looks like you have more control when you jump ahead and implement the ban first yourself. It’s like the classic line: 'You can’t fire me—I quit.'"

2025-05-07

Arvind Narayanan: “AI is helpful despite being error-prone if it is faster to verify the output than it is to do the work yourself.”

2025-05-05

Buffett on DOGE and the US deficit. He said the deficit is unsustainable and that reducing it is an extremely difficult thing to do. He concluded with: “It’s a job I don’t want, but it’s a job I think should be done.”

https://www.youtube.com/live/1LWBphTImy4 at 04:50:16.

2025-05-04

Buffett: “You never reach an answer in this business. You reach a point of action […]”

2025-04-30

Done with GitHub Actions Supply Chain Attacks

Recently, there was another security incident with GitHub Actions. This time, an attacker managed to modify the . After the change, the action printed secrets to the logs which the attacker (and anyone else) could then scrape. More specifically, not only the most recent version, but "most versions of " were affected. For example,

2025-04-30

How fast is CeTZ-Plot?

In a recent post, I showed how CeTZ-Plot can be used to plot data from a CSV file. I posted this on Reddit and got some interesting comments. One comment was that CeTZ-Plot was too slow for plotting data with 90k rows to SVG. This could be due to SVG being a vector format, so it will always add all 90k points even if they are on top of each other. It's probably a better idea to plot to PNG in such cases.

But let's still see how fast CeTZ-Plot is. This is actually an interesting question in general because CeTZ-Plot is written in Typst. Typst is a new typesetting system similar to LaTeX. Code written in this system probably runs slower than code written in a more optimized language. But on the other hand, Typst itself was written in Rust, so maybe the performance is not too bad.

2025-04-29

Plotting a CSV file with Typst and CeTZ-Plot

Whenever I need to plot some data, I usually prefer to have a tool that

  • is fast,

  • is easy to install,

  • is reliable,

  • is flexible,

  • is free to use,

  • produces high-quality plots, and

  • doesn't require many dependencies.

gnuplot and matplotlib are popular choices, but I personally don't like the appearance of gnuplot and I'm usually not so happy with Python's large number of dependencies.

For quick plotting, I recently discovered CeTZ-Plot. It's a plotting library inside Typst. Typst is a modern alternative to LaTeX, so it is meant to create full documents, but it's also quite easy to use it to create images.

2025-04-28

File Upload in fx

File uploads for fx are now implemented in pull request #21.

Here is an image that is hosted on this site:

sunset.jpg

2025-04-27

I don't like the way you're running our country

An old joke from Ronald Reagan:

An American and a Russian are arguing about their two countries. The American says, "Look, in my country, I can walk into the Oval Office, pound the president's desk, and say 'Mr. President, I don't like the way you're running our country!'"

And the Russian says, "I can do that." The American says, "You can?" The Russian says, "Yes, I can walk right into the Kremlin, go to the General Secretary's office, slam my fist on his desk, and say 'I don't like the way President Reagan is running his country.'"

2025-04-26

TIL: EV fires can quickly be extinguished via a special lance that punctures the battery and sprays water on the cells directly (https://youtu.be/1FWY3LNuPU4).

2025-04-26

Blogs are still very much alive. This post by Tommy Breslein for example would not have worked as one or more tweets. As a YouTube video maybe it would have worked but even then it’s hard to beat the effectiveness of a clear blog post.

2025-04-25

Set -euxo pipefail

To make working with Bash scripts less problematic, I've switched to these default shebang and settings:

#!/usr/bin/env bash
set -euxo pipefail

The first line tells the system to run the file with bash, as located by /usr/bin/env on the PATH. This /usr/bin/env is one of the most platform-independent locations that I know (it even works on NixOS).

The second line makes it much easier to find problems in the script. The -e option causes the script to fail immediately when a command fails, -o pipefail makes a pipeline fail if any of its commands fails (this could have avoided a Cloudflare outage), -u treats unset variables as errors, and -x prints each command before execution.
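To see what two of these options change, here is a small sketch that runs a few snippets in a child bash and records their exit status. The helper name status_of and the variable names are made up for this illustration, and -x is left out only to keep the output short.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: run a snippet in a child bash and print its exit status.
status_of() {
    local status=0
    bash -c "$1" >/dev/null 2>&1 || status=$?
    echo "$status"
}

# Without pipefail, a pipeline's status is that of its last command,
# so the failing `false` goes unnoticed.
without_pipefail=$(status_of 'false | true')

# With pipefail, the pipeline fails if any command in it fails.
with_pipefail=$(status_of 'set -o pipefail; false | true')

# With -u, expanding an unset variable is an error instead of an empty string.
unset_var=$(status_of 'set -u; echo "$not_defined_anywhere"')

echo "without pipefail: $without_pipefail"
echo "with pipefail:    $with_pipefail"
echo "unset with -u:    $unset_var"
```

The first status should be 0 and the second 1; the third should be non-zero because the child shell aborts on the unset variable.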

2025-04-24

This speech by Oliver Anthony is exactly my social media experience as well. A few people are famous and get all the attention while the rest are "nobodies". I think it's a bit inherent to the massive scale of social media. In a small town, it's hard to be a complete nobody. At least some people know your parents or have seen you in the supermarket, for example.

2025-04-24

I think I’m going to skip the comment feature for fx (https://fx.huijzer.xyz). Niall Ferguson recently argued that people in technology often forget about the past and say that now everything is different even though usually it isn’t (https://youtu.be/giZC4pCqB4o). Most of the time, new things are just old things with slight variations.

So what are old things that are great? Publishing text from one paragraph length to multiple paragraphs. This is what newspapers did for more than a century. Then you still have book authors and authors like Dijkstra (EWD notes) that also published. To write a comment, you have to write your own publication. Not some effortless comment a la modern social media.

2025-04-21

"Why are big tech companies slow? Because they've packed in as many features as possible in order to make more money, and the interaction of existing features adds an unimaginable amount of cognitive load" (https://www.seangoedecke.com/difficulty-in-big-tech/).

2025-04-18

I wish Dijkstra was alive and on Twitter. Quotes like this are perfect for the place: "The use of COBOL cripples the mind; its teaching should, therefore, be regarded as a criminal offense."

2025-04-17

Hello world

2025-04-12

Every programming language has its 'killer' domain

I recently read the article Every programming language needs its killer app to succeed and think the article makes a great point. It made a lot of sense to me. However, the term "killer app" seems to not really work out with the examples. Only Ruby lists Ruby on Rails as its killer app, but that's basically it.

Instead, I think it's about having a killer domain. So, going through the examples from the original article, here is my take:

Statically typed languages:

2025-03-17

The 'They Did It' Fallacy

I was just now listening to an argument that got me more and more interested as the argument went on. The author did a great job of first taking the opposite position, then building a case for their side, and then making a strong counterargument.

Essentially, the argument went something like this:

  1. The Wi-Fi connection is unreliable.

  2. Some people in the household say that the problem is that the router is too old.

  3. However, after testing and reading reviews, it seems that the router is not the problem.

  4. Therefore, the problem must be that the government is corrupt since they send dangerous radio waves to our house.

2025-02-08

On Interface Design

I have spent quite some hours building open source software. With that, a lot of things have gotten easier over time. For example, setting up tests, CI, documentation, and websites has gotten much easier. However, interface design somehow has not. I already asked about it 3 years ago. If anything, it has gotten worse since I nowadays realize more how important it is to get the design right. Especially once libraries become more and more used, there is a real cost to introducing breaking changes.

2025-02-06

The transformrs crate: Interface for AI API providers

Recently, I was thinking again about where AI is going. I was wondering what I as a developer should, or should not, be building. I wrote a post about my thoughts and concluded two things:

Firstly, cloud-based AI gets much cheaper every year, namely about 90% cheaper. If your AI application costs $1 per day to run now, next year it'll cost 10 cents, and just a penny the year after that. And while the price drops, the models keep getting better.
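The arithmetic behind that example, as a small sketch: assuming the price really does fall by a constant 90% every year (the symbols $c_0$ and $c_n$ are introduced here only for illustration), the cost after $n$ years is

```latex
\[
  c_n = c_0 \cdot (1 - 0.9)^n = c_0 \cdot 0.1^n,
  \qquad
  c_0 = \$1 \;\Rightarrow\; c_1 = \$0.10,\quad c_2 = \$0.01 .
\]
```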

Secondly, the best AI tools don't necessarily come from the big technology companies. Take Cursor AI for example. Microsoft had everything needed to make the best AI code editor: the most GPUs, the most popular code editor (Visual Studio Code), and years of AI experience. But a small startup built Cursor, which many developers now prefer. The same happened with DeepSeek. Google, Meta, Microsoft, and Amazon are all spending billions on developing the best models. But DeepSeek came out of nowhere and delivered great results. This isn't new. The same thing happened with Google in the early 2000s. AltaVista was the biggest search engine until Google, a small newcomer, made something better.

2025-01-31

AI learning rate and some thoughts

Now that DeepSeek released their new model, I'm thinking again about where AI is heading. For some technologies such as electric cars or batteries, I think I have a reasonable idea of where they are heading thanks to learning curves. Due to price decreases, we will probably see more electric cars, drones, ships, and trucks. Oh, and of course stoves that can boil water in 40 seconds. For AI, I'm not sure. In this blog post, I'll try to estimate the learning curve for AI and see if I can use that to make predictions.

2025-01-27

Live reloading for any generated website

When generating a website (typically HTML and CSS files), it is often useful to have a live reload feature. This means that your browser will automatically reload the page when you make changes to the files via code. For example, say you write some code that generates a plot on a webpage, or that generates some WebAssembly module that is embedded in the page. In the past, I would use tools like webpack or try to manually establish a socket on the server and inject JavaScript in the page.

I recently found a much simpler solution. Just use Bash together with any server that can serve static files and inject live reloading, such as live-server.

2025-01-25

Running code in blog posts is probably a bad idea

One of my favorite things to do is to automate as much as possible. So when I started writing blog posts, I thought it would be a good idea to run the code in my blog posts automatically. For example, I would add blog posts with cool code and visualizations and then run this code upon each push to the repository via CI. I even made a package for it called PlutoStaticHTML.jl.

What PlutoStaticHTML.jl allows you to do is to write your blog posts in Pluto notebooks (Pluto.jl is like Jupyter notebooks but for the Julia language). Then, you can set up CI such that the code will be executed each time you push to the repository and the output will be embedded in the blog post.

2024-05-30

Battery Learning Curves

A few months ago, I spent some time on trying to predict the changes in batteries over time. My aim was to estimate when electric cars would finally be as cheap or cheaper than internal combustion engine cars. However, my approach in that blog post was extremely naive since I didn't know about learning curves. Simply put, learning curves are a real world phenomenon where a good or service becomes consistently cheaper over time. As shown in the learning curve link, solar modules have gotten 99.6% cheaper since 1976. From $100 to about $0.3, that's a drop of roughly 2.5 orders of magnitude!

2024-05-29

Summary of Same as Ever by Morgan Housel

One of my favorite books this year has been Same as Ever: A Guide to What Never Changes by Morgan Housel. I've read some of his blog posts at Collab Fund before, so I already knew the book would be good. His writing style is simple on the surface, but at the same time surprisingly insightful. (Which is quite a feat, I wouldn't disagree if you say my academic writing has been the opposite.) Morgan also often comes up with new ideas that I've never heard before. For example, he has argued that the German army in the 1930s got so strong because they had their weapons taken away from them by the Treaty of Versailles. Thanks to the lack of old weapons, they built new and more advanced equipment from scratch.

2024-04-29

Some SaaS banned me, again

A few years ago, a good friend of mine happily moved to Microsoft Office 365. It was great, he said, because his files were synced across all his devices and he felt it was making him more productive. He even stored many gigabytes of personal pictures on OneDrive because the storage pricing was fair and it was easy to do. Then, he shared his picture collection with one person and got banned almost immediately. No OneDrive, no Word, no opportunity for backup, no nothing. Contacting customer support didn't help. If I remember correctly, Microsoft claimed he was possibly involved in terrorist activities. In truth, Microsoft probably had a problem with sharing pictures of children. But, he shared the pictures with his own wife. He shared pictures of his children with his wife. The Dutch tech site Tweakers wrote a long article about similar events with software as a service (SaaS) platforms in 2021, so this was not a one-off occurrence.

2024-04-25

The history of the ROC curve

The Receiver Operating Characteristic (ROC) curve is a well-known tool used to evaluate the performance of binary classifiers. Its history is clear. According to Wikipedia, it

"was first developed by electrical engineers and radar engineers during World War II for detecting enemy objects in battlefields, starting in 1941, which led to its name ('receiver operating characteristic')"

Another source also states that the ROC curve was first used during the Second World War by radar receiver operators to distinguish enemy targets from noise (utah.edu, 2007).

2024-04-17

Updating my notes via email

Charles Darwin made it a habit to immediately write down anything that conflicted with his own ideas, so his brain would not forget or ignore it. In his own words:

"I had also, during many years, followed a golden rule, namely that whenever a published fact, a new observation or thought came across me, which was opposed to my general results, to make a memorandum of it without fail and at once; for I had found by experience that such facts and thoughts were far more apt to escape from the memory than favourable ones."

Based on this, I've also made it a habit to quickly write down new ideas or thoughts. Unlike Darwin, however, I don't carry a notebook with me. Instead, I prefer to store my notes in a Git repository. Unlike a notebook, a Git repository is more fireproof, can more easily be searched and edited, and can scale to much larger sizes. Jeff Huang, for example, wrote that he has a single text file with all his notes from 2008 to 2022. At the time of his writing, the file contained 51,690 handwritten lines of text. He wrote that the file has been his "secret weapon".

2024-03-08

Installing Forgejo with a separate runner

On the 15th of February 2024, Forgejo announced that they will be decoupling (hard forking) their project further from Gitea. I think this is great since Forgejo is the only European Git forge that I know of, and a hard fork means that the project can now grow more independently. With Forgejo, it is now possible to self-host a forge on a European cloud provider like Hetzner. This is great because it allows decoupling a bit from American Big Tech. Put differently, a self-hosted Forgejo avoids having all your eggs in one basket.

This post will go through a full step by step guide on how to set things up. This guide is based on my Gitea configuration that I ran for a year, so it works. During the year, I paid about 10 euros per month for two Hetzner servers. The two servers allow separating Forgejo from the runners. This ensures that a heavy job on the runner will not slow down the Forgejo server.

2024-02-19

How much have batteries changed over time?

From time to time, batteries make the headlines because they store too little energy, because they are too expensive, or because the costs have dropped dramatically in the last year. These things could indeed be true at the same time. Or, as Hans Rosling would say, "things can be bad, and getting better." I was curious by how much. To figure out where things are heading, let's not focus on headlines and instead look at data for multiple years.

Phone Batteries

As a first investigation, I wonder whether much is happening in the area of small batteries. Thanks to the rise of smartphones, these batteries may have improved dramatically over time. We could check out the raw battery prices, but consumers would not buy at these prices. So instead let's look at consumer smartphone prices. Smartphones are a mass-produced product, so they should be able to incorporate state-of-the-art battery technology. Let's therefore look at iPhone battery capacity over time.

2024-02-03

Encrypting and decrypting a secret with wasm_bindgen

Doing a round trip of encrypting and decrypting a secret should be pretty easy, right? Well, it turned out to be a bit more involved than I thought. But in the end it worked, so here is the code for anyone who wants to do the same.

I'll be going through the functions step by step first. The full example with imports is shown at the end.

First, we need to generate a key. Here, I've set extractable to false. This aims to prevent the key from being read by other scripts.

fn crypto() -> web_sys::Crypto {
    let window = web_sys::window().expect("no global `window` exists");
    window.crypto().expect("no global `crypto` exists")
}

pub fn generate_key() -> Promise {
    let sc = crypto().subtle();
    // Symmetric encryption is used, so the same key is used for both operations.
    // GCM has good performance and security according to Wikipedia.
    let algo = AesKeyGenParams::new("AES-GCM", 256);
    let extractable = false;
    let usages = js_array(&["encrypt", "decrypt"]);
    sc.generate_key_with_object(
        &algo,
        extractable,
        &usages
    ).expect("failed to generate key")
}

2024-01-28

An old solution to modern OpenAI GPTs problems

Ever since the introduction of ChatGPT, OpenAI has had a compute shortage. This might explain their current focus on GPTs, formerly known as Plugins. Simply put, you can see GPTs as a way to wrap around the base language model. In a wrapping, you can give some instructions (a prompt), 20 files, and enable Web Browsing, DALL·E Image Generation, and/or Code Interpreter. Also, you can define an Action, which allows the GPT to call an API from your own server.

At first sight the possibilities seem limited for developers. The code interpreter will only run Python code inside their sandbox. Furthermore, the interpreter has no internet access, so installing extra tools is not possible. You could spin up your own server and interact via the Actions (API calls), but that has some latency and requires setting up a server. Without spinning up a server, you could define some CLI script in Python and write in the instruction how to interact with that Python script. Unfortunately, this does limit the capabilities. Not all Python packages are installed in the sandbox and there is only so much that can be expressed in the instruction.

2023-11-25

Triggering entr

entr is an extremely useful little tool that can watch files and run a command automatically upon a file change. So, for example, the following can be used to watch all Rust source files and run the tests:

ls src/**/*.rs | entr -s "cargo test"

This works great and I've been using it for years. However, recently I switched to a Mac which restricts the number of files that can be watched to 256. This is a problem for large codebases. Furthermore, it can sometimes be very difficult to figure out which files to watch exactly. For instance when watching LaTeX files, it is important to not watch the log files or entr would go into an infinite loop.
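One way to keep the watch list small and free of generated files is to build it with an explicit allowlist instead of a broad glob. The sketch below is only an illustration of that idea and not necessarily the approach this post goes on to describe; the helper name watch_list, the chosen extensions, and the latexmk command in the final comment are assumptions.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: list only the source files worth watching under a
# directory. Build products such as .log and .aux are never listed, so a
# rebuild cannot retrigger itself.
watch_list() {
    find "$1" -type f \( -name '*.tex' -o -name '*.bib' \) | sort
}

# Demo on a throwaway directory with two sources and two build products.
demo_dir=$(mktemp -d)
touch "$demo_dir/main.tex" "$demo_dir/refs.bib" "$demo_dir/main.log" "$demo_dir/main.aux"

watched=$(watch_list "$demo_dir")
count=$(printf '%s\n' "$watched" | wc -l | tr -d ' ')

echo "$watched"
echo "$count files to watch"

rm -rf "$demo_dir"

# With entr installed, the list would be piped in like this:
#   watch_list . | entr -s "latexmk -pdf main.tex"
```

Counting the list before handing it to entr also makes it easy to check that it stays below a platform's limit on watched files.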

2023-01-24

GPT versus Google

In 2019, I finished my master's thesis on the topic of Natural Language Processing (NLP) and I thought that I understood the basics of Artificial Intelligence (AI) after that. However, I've now finally tried ChatGPT and have to admit that my main conclusion was proven wrong. It is extremely likely that AI will mostly replace search engines as we know them and in this post I document Google's current responses versus the responses from recent GPT models. Google's responses will probably be fun to look back on in 20 years.

First a bit of background. In 2019 when I did my thesis, BERT was just released. Just like OpenAI's newest models, BERT is based on the machine learning architecture called the transformer. In my thesis I applied BERT to the problem of automatically responding to customers. The idea was to feed BERT with lots of data from customers and build a chat bot to automate the company's support center.

2022-06-25

Why I still recommend Julia (for Data Science)

Yuri Vishnevsky wrote that he no longer recommends Julia. This caused lengthy discussions at Hacker News, Reddit and the Julia forum. Yuri argues that Julia shouldn't be used in any context where correctness matters. Based on the amount and the ferocity of the comments, it is natural to conclude that Julia as a whole must produce incorrect results and therefore cannot be a productive environment. However, the scope of the blog post and the discussions are narrow. In general, I still recommend Julia for data science applications because it is fundamentally productive and, with care, correct.

2022-03-19

Optimizing Julia code

I'm lately doing for the first time some optimizations of Julia code and I sort of find it super beautiful.

This is how I started a message on the Julia language Slack in response to a question about why optimising Julia code is so difficult compared to other languages. In the message I argued against that claim. Optimising isn't hard in Julia if you compare it to Python or R where you have to be an expert in Python or R and C/C++. Also, in that message I went through a high-level overview of how I approached optimising. The next day, Frames Catherine White, who is a true Julia veteran, suggested that I write a blog post about my overview, so here we are.

2022-02-16

Static site authentication

More and more companies are starting to provide functionality for static site hosting. For example, GitHub announced Pages in 2008, Netlify was founded in 2014, GitLab announced Pages in 2016 and Cloudflare launched a Pages beta in 2020. Nowadays, even large general cloud providers, such as Digital Ocean, Google, Microsoft or Amazon, have either dedicated products or dedicated tutorials on static site hosting.

In terms of usability, the products score similarly. Setting up hosting usually involves linking a Git account to a hoster which will allow the hoster to detect changes and update the site based on the latest state of the repository. In terms of speed, the services are also pretty similar.

2021-12-01

Bayesian Latent Profile Analysis (mixture modeling)

Updated on 2021-12-15: Include ordered constraint.

This post discusses some latent analysis techniques and runs a Bayesian analysis for example data where the outcome is continuous, also known as latent profile analysis (LPA). My aim will be to clearly visualize the analysis so that it can easily be adjusted to different contexts.

In essence, latent analyses are about finding hidden groups in data (Oberski, 2016). Specifically, they are called mixture models because the underlying distributions are mixed together.
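For a continuous outcome, a mixture model is commonly written in the generic textbook form below. The notation ($K$ groups, weights $\pi_k$, means $\mu_k$, standard deviations $\sigma_k$) is an assumption made for this sketch and is not taken from the post. The ordering of the means is one common way to keep the groups identifiable, and it may or may not be what the "ordered constraint" in the update note refers to.

```latex
\[
  p(y_i) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}(y_i \mid \mu_k, \sigma_k^2),
  \qquad
  \pi_k \ge 0, \quad \sum_{k=1}^{K} \pi_k = 1,
  \qquad
  \mu_1 < \mu_2 < \dots < \mu_K
\]
```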

2021-11-17

Collinear Bayes

In my post on Shapley values and multicollinearity, I looked into what happens when you fit a complex uninterpretable model on collinear or near-collinear data and try to figure out which features (variables) are important. The results were reasonable but not great. Luckily, there are still more things to try. Gelman et al. (2020) say that Bayesian models can do reasonably well on collinear data because they show high uncertainty in the estimated coefficients. Also, Bayesian models have a chance of fitting the data better as is beautifully shown in the Stan documentation. It can be quite tricky to implement though because a good parameterization is necessary (https://statmodeling.stat.columbia.edu/2019/07/07/collinearity-in-bayesian-models/).

2021-10-27

Nested cross-validation

Nested cross-validation is said to be an improvement over cross-validation. Unfortunately, I found most explanations quite confusing, so I decided to simulate some data and see what happens.

In this post, I simulate two models: one linear model which perfectly fits the data and one which overfits the data. Next, cross-validation and nested cross-validation are plotted. To keep the post short, I've hidden the code to produce the plots.

import MLJLinearModels
import MLJDecisionTreeInterface

using DataFrames: DataFrame, select, Not
using Distributions: Normal
using CairoMakie: Axis, Figure, lines, lines!, scatter, scatter!, current_figure, axislegend, help, linkxaxes!, linkyaxes!, xlabel!, density, density!, hidedecorations!, violin!, boxplot!, hidexdecorations!, hideydecorations!
using MLJ: CV, evaluate, models, matching, @load, machine, fit!, predict, predict_mode, rms
using Random: seed!
using Statistics: mean, std, var, median
using MLJTuning: TunedModel, Explicit
using MLJModelInterface: Probabilistic, Deterministic

2021-06-16

Increasing model accuracy by using foreknowledge

Typically, when making predictions via a linear model, we fit the model on our data and make predictions from the fitted model. However, this doesn't take much foreknowledge into account. For example, when predicting a person's height given only their weight and gender, we already have an intuition about the effect size and direction. Bayesian analysis should be able to incorporate this prior information.

In this blog post, I aim to figure out whether foreknowledge can, in theory, increase model accuracy. To do this, I generate data and fit a linear model and a Bayesian binary regression. Next, I compare the accuracy of the model parameters from the linear and Bayesian model.

2021-01-21

Random forest classification in Julia

Below is example code for fitting and evaluating a linear regression and random forest classifier in Julia. I've added the linear regression as a baseline for the random forest. The models are evaluated on a mock variable generated from two distributions, namely

\begin{aligned}
d_1 &= \text{Normal}(10, 2) \: \: \text{and} \\
d_2 &= \text{Normal}(12, 2),
\end{aligned}

The random variable is just noise meant to test the classifier, generated via

2021-01-21

Random forest, Shapley values and multicollinearity

Linear statistical models are great for many use-cases since they are easy to use and easy to interpret. Specifically, linear models can use features (also known as independent variables, predictors or covariates) to predict an outcome (also known as dependent variables).

In a linear model, the higher the coefficient for a feature, the more that feature played a role in making a prediction. However, when variables in a regression model are correlated, these conclusions don't hold anymore.

2020-12-16

GitHub and GitLab commands cheatsheet

Both GitHub and GitLab provide shortcuts for interacting with the layers they have built on top of Git. These shortcuts are a convenient and clean way to interact with things like issues and PRs. For instance, using Fixes #2334 in a commit message will close issue #2334 automatically when the commit is applied to the main branch. However, the layers on top of Git differ between the two, and therefore the commands will differ as well. This document is a cheatsheet for issue closing commands; I plan to add more of these commands over time.

2020-11-29

Design cheatsheet

I like to complain that design can distract from the main topic and is therefore not important. However, design is important. If your site, presentation or article looks ugly, then you already are one step behind in convincing the audience. The cheatsheet below can be used to quickly fix design mistakes.

Colors

Surprisingly, you should Never Use Black. Instead, you can use colors which are near black. For example:

| Tint | HTML color code | Example text |
| --- | --- | --- |
| Pure black | #000000 | Lorem ipsum dolor sit amet |
| Grey | #4D4D4D | Lorem ipsum dolor sit amet |
| Green | #506455 | Lorem ipsum dolor sit amet |
| Blue | #113654 | Lorem ipsum dolor sit amet |
| Pink | #564556 | Lorem ipsum dolor sit amet |

2020-11-14

Frequentist and Bayesian coin flipping

To me, it is still unclear what exactly is the difference between Frequentist and Bayesian statistics. Most explanations involve terms such as "likelihood", "uncertainty" and "prior probabilities". Here, I'm going to show the difference between both statistical paradigms by using a coin flipping example. In the examples, the effect of showing more data to both paradigms will be visualised.

Generating data

Let's start by generating some data from a fair coin flip, that is, the probability of heads is 0.5.

import CairoMakie

using AlgebraOfGraphics: Lines, Scatter, data, draw, visual, mapping
using Distributions
using HypothesisTests: OneSampleTTest, confint
using StableRNGs: StableRNG
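
As a rough sketch of what generating such data could look like, the snippet below simulates fair coin flips with only Julia's standard library. The seed, the number of flips and the variable names are assumptions for illustration, and it uses `MersenneTwister` from `Random` instead of the `StableRNG` imported above so that it runs without extra packages.

```julia
using Random: MersenneTwister

rng = MersenneTwister(42)
p_true = 0.5  # probability of heads for a fair coin
n = 1_000     # hypothetical number of flips

# `true` means heads and `false` means tails.
flips = [rand(rng) < p_true for _ in 1:n]

# Proportion of heads after each flip; it should settle near `p_true` as more data comes in.
running = cumsum(Int.(flips)) ./ (1:n)
```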

2020-11-07

Installing NixOS with encryption on a Lenovo laptop

In this post, I walk through the steps to install NixOS on a Lenovo Yoga 7 with an encrypted root disk. This tutorial is mainly based on the tutorial by Martijn Vermaat and comments by @ahstro and dbwest.

USB preparation

Download NixOS and figure out the location of the USB drive with lsblk. Use the location of the drive and not the partition, so /dev/sdb instead of /dev/sdb1. Then, prepare the USB with

2020-11-04

The logit and logistic functions

Linear regression works on real numbers, that is, the input and output are both real numbers. For probabilities, this is problematic because the linear regression will happily give a probability below 0 or above 1, whereas we know that probabilities should always lie between 0 and 1. This is only by definition, but it is a useful definition in practice. Informally, the logistic function converts values from real numbers to probabilities and the logit function does the reverse.

Logistic

The logistic function converts values from the real numbers to the interval between 0 and 1:

\text{logistic}(x) = \frac{1}{1 + e^{-x}}.
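
A minimal Julia sketch of both functions, written directly from the formula above. The logit is written here as the standard inverse, log(p / (1 - p)); the function names are my own choice for illustration.

```julia
# Logistic: maps any real number to a probability between 0 and 1.
logistic(x) = 1 / (1 + exp(-x))

# Logit: the inverse, mapping a probability between 0 and 1 back to a real number.
logit(p) = log(p / (1 - p))

logistic(0)           # 0.5
logit(logistic(2.0))  # approximately 2.0
```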

2020-09-26

The principle of maximum entropy

Say that you are a statistician and are asked to come up with a probability distribution for the current state of knowledge on some particular topic you know little about. (This, in Bayesian statistics, is known as choosing a suitable prior.) To do this, the safest bet is coming up with the least informative distribution via the principle of maximum entropy.

This principle is clearly explained by Jaynes (1968): consider a die which has been tossed a very large number of times. We expect the average to be 3.5, that is, we expect a distribution where each of the six faces has probability 1/6, see the figure below.
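
The idea can be checked numerically with a small sketch. It compares the uniform distribution against one made-up alternative, `peaked`, which I chose only because it also has an average of 3.5; both the alternative and the function names are assumptions for illustration.

```julia
# Shannon entropy of a discrete distribution (natural logarithm).
entropy(p) = -sum(x * log(x) for x in p if x > 0)

faces = 1:6
uniform = fill(1 / 6, 6)
# Hypothetical alternative with the same average of 3.5, but much more informative.
peaked = [0.0, 0.0, 0.5, 0.5, 0.0, 0.0]

# Expected value of a die roll under distribution `p`.
expected(p) = sum(faces .* p)

expected(uniform), expected(peaked)  # both approximately 3.5
entropy(uniform) > entropy(peaked)   # true
```

Both distributions agree with the observed average, but the uniform one has the higher entropy, so it assumes the least beyond what is known.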

2020-08-12

Writing effectively

According to McEnerney (2014), academics are trained to be poor writers. Eventually, they end up in his office and tell him, while crying, that their careers might end soon. One reason why academics are poor writers is that they are expert writers. Expert writers are not experts in writing but are experts who write. An expert writer typically thinks via writing and they assume that this raw output is good enough for readers. However, it isn't good enough. For a start, expert writers have a worldview which differs from the readers' due to the writers' expertise. So, to avoid crying, McEnerney argues that writers should instead write to be valuable to the community of readers.

2020-07-29

Writing checklist

I keep forgetting lessons about writing. After writing a text, my usual response is to declare it as near perfect and never look at it again. In this text, I will describe a checklist, which I can use to quickly debunk the declaration. I plan to improve this checklist over time. Hopefully, text which passes the checklist in a few dozen years from now will, indeed, be near perfect.

The list is roughly ordered by importance. The text should:

  1. Ensure that the writing is valuable to the community of readers.

  2. Be simple (Adams, 2015) or be made as simple as possible, but not simpler. This is also known as Occam's razor, kill your darlings or the KISS principle.

  3. Be polite, that is, not contain a career limiting move. For example, do not "write papers proclaiming the superiority of your work and the pathetic inadequacy of the contributions of A, B, C, ..." (Wadge 2020).

  4. Be consistent. For example, either use the Oxford comma in the entire text or do not use it at all.

  5. Avoid misspellings.

  6. Avoid comma splices.

  7. Place the actor before the action, so write "the boy hit the ball" instead of "the ball was hit by the boy".

  8. Flow naturally; just like a normal conversation. This is, for me, contradictory to writing when programming.

  9. Provide a high-level overview of the text. This can be a summary, abstract, a few sentences in the introduction or a combination of these.

  10. Prefer common collocations. A list of common collocations is The Academic Collocation List.

  11. Use simple verbs, for example, prefer "stop" over "cease to move on" or "do not continue".

  12. Avoid dying metaphors such as "stand shoulder to shoulder with" (Orwell, 1946). Metaphors aim to "assist thought by evoking a visual image" (Orwell, 1946). Dying metaphors do not evoke such an image anymore due to overuse (Orwell, 1946).

  13. Avoid pretentious diction such as dressing up simple statements, inappropriate adjectives and foreign words and expressions (Orwell, 1946). For example, respectively "effective", "epic" and "status quo" (Orwell, 1946).

  14. Avoid meaningless words, that is, words for which no clear definition exists. For example, "democracy" and "freedom" have "several different meanings which cannot be reconciled with one another" (Orwell, 1946).

2020-06-28

Combinations and permutations

Counting is simple except when there is a lot to be counted. Combinations and permutations are such a case; they are about counting without replacement. Suppose we want to count the number of possible results we can obtain from picking k numbers, without replacement, from an equal or larger set of n numbers, that is, where k ≤ n. When the same set of numbers in different orders should be counted separately, then the count is called the number of permutations. So, if we have some set of numbers and shuffle some numbers around, then we say that the numbers are permuted. When the same set of numbers in different orders should be counted only once, then the count is called the number of combinations. This makes sense since it is only about the combination of numbers and not the order.
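
A small Julia sketch of both counts, using the standard formulas n! / (n - k)! and n! / (k! (n - k)!). The symbols `n` (size of the set) and `k` (how many numbers are picked) and the function names are my own labels for illustration.

```julia
# Number of ordered selections of k items out of n (permutations).
permutations(n, k) = factorial(n) ÷ factorial(n - k)

# Number of unordered selections of k items out of n (combinations).
combinations(n, k) = binomial(n, k)

permutations(5, 3)  # 60
combinations(5, 3)  # 10
```

Each combination of 3 numbers can be ordered in 3! = 6 ways, which is why there are 6 times as many permutations as combinations here.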

2020-06-27

Comparing means and SDs

When comparing different papers, it might be that the papers report numbers about the same thing, but on different scales. For example, many different questionnaires exist that measure the same constructs; the NEO-PI and the BFI both measure the Big Five personality traits. Say we want to compare reported means and standard deviations (SDs) for these questionnaires, which both use a Likert scale.

In this post, the equations to rescale reported means and standard deviations (SDs) to another scale are derived. Before that, an example is worked through to get an intuition of the problem.
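
As a rough sketch of what such a rescaling can look like, the snippet below applies a plain linear transformation from a scale running from `a` to `b` to a scale running from `c` to `d`. This is the standard linear mapping and may differ in notation from the post's derivation; the function names and the example numbers are assumptions for illustration.

```julia
# Linearly map a mean from the scale [a, b] to the scale [c, d].
rescale_mean(m, a, b, c, d) = (m - a) / (b - a) * (d - c) + c

# A standard deviation is only stretched by the change in range, not shifted.
rescale_sd(sd, a, b, c, d) = sd * (d - c) / (b - a)

# Hypothetical numbers: a mean of 3.0 with an SD of 1.0 on a 1-5 scale, mapped to a 1-7 scale.
rescale_mean(3.0, 1, 5, 1, 7)  # 4.0
rescale_sd(1.0, 1, 5, 1, 7)    # 1.5
```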

2020-05-11

Predicates and reproducibility

While reading texts on statistics and meta-science I kept noticing vagueness. For example, there seem to be half a dozen definitions of replicability in papers since 2016. In this text, I try to formalize the underlying structure.

Edit 2020-11-01: The model below is basically the same as, but poorer than, the causal models presented by, for example, Pearl (2009).

Assume determinism. Assume that for any function there is a set of predicates, or context, which need to hold for the function to hold, that is, return the correct answer. Let this be denoted by . For example, Bernoulli's equation solved for only holds for a context containing isentropic flows, that is, , where contains isentropic flows. There have been arguments that such contexts need to contain an (open-ended) list of negative conditions (Hoefer, 2003). Let these contexts and the contexts below also contain this list.

2020-03-05

Simple and binary regression

One of the most famous scientific discoveries was Newton's laws of motion. The laws allowed people to make predictions. For example, the acceleration for an object can be predicted given the applied force and the mass of the object. Making predictions remains a popular endeavor. This post explains the simplest way to predict an outcome for a new value, given a set of points.

To explain the concepts, data on apples and pears is generated. Underlying relations for the generated data are known. The known relations can be compared to the results from the regression.
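
The same idea can be shown in miniature with a sketch that generates data from a known relation and checks that ordinary least squares recovers it. The relation y = 2x + 1, the noise-free data and the names are made up for illustration; they are not the apples and pears data from the post.

```julia
using Statistics: mean, cov, var

# Hypothetical data generated from a known relation y = 2x + 1 (no noise).
x = collect(1.0:10.0)
y = 2 .* x .+ 1

# Ordinary least squares for a single predictor.
slope = cov(x, y) / var(x)
intercept = mean(y) - slope * mean(x)

# Predict the outcome for a new value.
predict(x_new) = intercept + slope * x_new

predict(11.0)  # approximately 23.0
```

Because the data contains no noise, the estimated slope and intercept match the known relation up to floating point error.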

2020-02-02

The greatest sales deck someone else has ever seen

According to Andy Raskin, the greatest sales deck has five elements. In this post, I'll present an adapted version. In line with the rest of this blog, I'll give an example of selling a programming language which is not Blub to a company; the newer language is called Y. Assume that the company is fine with language Blub because, well, everything is written in Blub and all the employees know Blub.

2020-01-24

Correlations

Correlations are ubiquitous. For example, news articles report that a research paper found no correlation between X and Y. Also, correlation is related to (in)dependence, which plays an important role in linear regression. This post will explain the Pearson correlation coefficient. The explanation is mainly based on the book by Hogg et al. (2018).

In the context of a book on mathematical statistics, certain variable names make sense. However, in this post, some variable names are changed to make the information more coherent. One convention which is adhered to is that single values are lowercase, and multiple values are capitalized. Furthermore, since in most empirical research we only need discrete statistics, the continuous versions of formulas are omitted.
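
For readers who prefer code, here is a minimal sketch of the discrete Pearson correlation coefficient, following the convention that multiple values are capitalized. The data is made up for illustration, and `pearson` is a hypothetical helper name; the result can be compared against `cor` from Julia's `Statistics` standard library.

```julia
using Statistics: mean, cor

# Pearson correlation for two equally long vectors of observations.
function pearson(X, Y)
    dx = X .- mean(X)
    dy = Y .- mean(Y)
    return sum(dx .* dy) / sqrt(sum(dx .^ 2) * sum(dy .^ 2))
end

# Hypothetical data.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.0, 4.0, 5.0, 4.0, 5.0]

pearson(X, Y)  # approximately 0.77
```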

2020-01-16

Benefits of writing blog posts

The first step in creating good habits is figuring out why exactly you want the habit. To me, writing blog posts seems like a good habit, but I'm unsure why. This post will attempt to convince the reader and myself of the benefits. I have combined my own ideas with the ideas by Terry Tao [1] and Gregory Gunderson [2], and grouped them.

Pedagogic benefits

  • Writing detailed expository notes is a way to practise research. This allows you to break free from the methods you are used to [2].

  • One can practise writing [1].

  • Writing allows one to test understanding of an idea [1]. It forces you to explain it clearly without hand waving. When aiming your text at colleagues or future employers, you cannot use jargon to hide your lack of knowledge [2].

  • Writing allows figuring out what exactly you do not understand or what you need to learn first [2].

  • Writing aids in structuring knowledge [2].

2019-12-29

Statistical power from scratch

In the 1970s the American government wanted to save fuel by allowing drivers to turn right at a red light (Reinhart, 2020). Many studies found that this Right-Turn-On-Red (RTOR) change caused more accidents. Unfortunately, these studies concluded that the results were not statistically significant. Only years later, when combining the data, it was found that the changes were significant (Preusser et al., 1982). Statisticians nowadays solve these kinds of problems by considering not only significance, but also power. This post aims to explain and demonstrate power from scratch. Specifically, data is simulated to show power and the necessary concepts underlying power. The underlying concepts are the

2019-12-03

Niceties in the Julia programming language

In general I'm quite amazed by the Julia programming language. This blog post aims to be a demonstration of its niceties. The post targets readers who have programming experience. To aid in the rest of the examples we define a struct and its instantiation in a variable.

struct MyStruct
  a::Number
  b::Number
end

structs = [MyStruct(1, 2), MyStruct(3, 4)]

Functions and methods

For object-oriented programmers the distinction between a function and a method is simple. If it is inside a class it is a method, otherwise it is a function. In Julia we can use function overloading. This means that the types of the input parameters, or signatures, are used to determine what should be called. In Julia these are called methods. For example we can define the following methods for the function f.
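
As a hedged illustration of how the argument types select a method, here is a minimal sketch. These three methods are my own made-up examples and are not necessarily the ones defined in the post; the struct is repeated so that the snippet runs on its own.

```julia
struct MyStruct
  a::Number
  b::Number
end

# Hypothetical methods of one function `f`; the type of the argument decides which one is called.
f(x::Number) = "number"
f(x::AbstractString) = "string"
f(s::MyStruct) = s.a + s.b

f(1)               # "number"
f("a")             # "string"
f(MyStruct(1, 2))  # 3
```

Calling `methods(f)` lists all three signatures, which shows that `f` is a single function with multiple methods.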

2019-12-01

NixOS configuration highlights

I have recently started paying attention to the time spent on fine-tuning my Linux installation. My conclusion is that it is a lot. Compared to Windows, I save lots of time by using package managers to install software. Still, I'm pretty sure that most readers spend (too) much time with the package managers as well. This can be observed from the fact that most people know the basic apt-get commands by heart.

At the time of writing, I have been happily running and tweaking NixOS for a few months. NixOS allows me to define my operating system state in one configuration file configuration.nix. I have been pedantic and will try to avoid configuring my system outside the configuration. This way, multiple computers are in the same state and I can set up new computers in no time. The only manual steps have been NixOS installation, Firefox configuration, user password configuration and applying a system specific configuration. (The latter involves creating two symlinks.)

2019-10-29

Entr

Having a compile and run shortcut seems like the most basic requirement for a developer. Most fully fledged IDEs therefore include it. For example, PyCharm will automatically detect the main file for the current Python project and run it with the correct virtual environment. This is all nice and well until it does not work out of the box. For example, when working with LaTeX most text editors can install a plugin which introduces a compile and run shortcut. Or, if you are lucky, you can write down some script at some place in the editor which will execute upon a certain key press. This works as long as your static script is able to infer the correct file to execute. If not, then the editor command needs to be changed for each project.

2015-08-25

QoS setup using Tomato in combination with an Experiabox

This is a copy of my blog post at Blogspot. It is mostly here for myself, so that I can compare my writing in 2015 with newer writings.

Introduction

In this blog my setup for QoS (Quality of Service) will be explained. The QoS is used in a home network with five users. The home network has a maximum download/upload speed of 650 / 35 kB/s. The QoS had to be introduced because of 'slow internet' noticed by the users while browsing websites, gaming or using VoIP. This problem was growing every year because of more clients (i.e. smartphones, tablets and laptops) and data usage (cloud backup, HD stream availability). The key in solving the problem is to avoid using too much bandwidth. When too much bandwidth is used the data will pile up resulting in slow internet packet delivery. To limit the bandwidth it is important to slow down the big users like streaming and cloud backup. Last year a solution using the program cFosSpeed was implemented. This program ran on all the Windows devices and limited data based on the responsible process. Unfortunately the program could not run on Android, meaning that those devices weren't limited at all. This rendered the solution completely useless. The solution now used is based on a router with some advanced firmware. The router knows nothing about the responsible processes, but is only looking at the packets. This results in a completely platform independent system which works without any set-up at the client side.
