GPT versus Google

In 2019, I finished my master's thesis on the topic of Natural Language Processing (NLP) and I thought that I understood the basics of Artificial Intelligence (AI) after that. However, I've now finally tried ChatGPT and have to admit that my main conclusion was proven wrong. It is extremely likely that AI will mostly replace search engines as we know them and in this post I document Google's current responses versus the responses from recent GPT models. Google's responses will probably be fun to look back on in 20 years.

First a bit of background. In 2019 when I did my thesis, BERT was just released. Just like OpenAI's newest models, BERT is based on the idea of the machine learning model called transformers. In my thesis I applied BERT to the problem of automatically responding to customers. The idea was to feed BERT with lot's of data from customers and build a chat bot to automate the company's support center.

In the thesis, the aim focused mostly on classifying the intent of a user. For example, the intent of the sentence "How to install a Brother MFC-5890CN net-work printer?" should be classified as "Install Printer". In the thesis, the result was basically that accuracy was not high enough. Also, there was a lot of work needed to train a model for intent classification, and then move on to interpreting the rest of the sentence. In other words, the most useful strategy at that point in time was to give very specific tasks to the model and handle the plumbing around it yourself. For years, this was my take-away from the thesis: If you want to use AI, then find a suitable problem for it first. For example, email and image classification work great if you have as much data as a big tech company. The problem is that there is not always enough data available and that some tasks are just too complex for the model to understand.

In hindsight, the solution to this was obvious. In hindsight, the solution was to give the model way more data and way more free parameters. This is what happened to image classification too in the famous ImageNet competition when AlexNet severely outperformed the competition in 2012 via training on GPUs. Before AlexNet, participants to the competition used to manually and painstakingly go through all the features of, for example, flowers and try to train statistical models on specific features. Then the outcomes of different models would be combined in a hand-tuned way, and a classification for the flower would roll out of it. For example, researches would spend months to figure out which leafs can be spotted the best by which Sobel filter. However, one day, AlexNet dramatically outperformed the hand-crafted features which lead to the current AI boom. There is a video from Dr. Mike Pound somewhere explaining this transition.

And now suddenly, many people expect that AI will replace Google Search. Google has allegedly declared an internal "code red". And after collecting some samples, it becomes clear why.

Before we continue, it is good to mention that Google's algorithm is famous for being complex and for being a closely guarded secret. Over the years, the algorithm has improved in accuracy. At some point it started to highlight answers in sources and nowadays Google returns the matching text from the query. This was likely not an easy feat. Just like hand-tuning the classification of pictures, hand-tuning the best Google Search results must have required many hours of work!

In the benchmarks here, I will compare Google's highlighted response to that of ChatGPT. Note that these example queries are not a result of some random sampling procedure, I just picked some queries that popped up in my mind (based on what I searched for earlier this week, so it is slightly biased). To see that the bias is not too strong, feel free to try similar queries out for yourself and you'll see that ChatGPT is surprisingly accurate in most cases. Let's ask why:

Query: "how is gpt so accurate?"

Google:

Because GPT-3 is structurally similar to its predecessors, its greater accuracy is attributed to its increased capacity and greater number of parameters. GPT-3's capacity is ten times larger than that of Microsoft's Turing NLG, the next largest NLP model known at the time.

ChatGPT:

GPT (Generative Pre-trained Transformer) is a language model that uses deep learning techniques to generate human-like text. It is trained on a large dataset of text, which allows it to learn patterns and relationships between words and phrases. The model uses these patterns to generate new text that is similar to the text it was trained on. Additionally, GPT uses a technique called "transformer architecture" which allows it to effectively process large amounts of text and generate high-quality output. This architecture is based on self-attention mechanism that enable the model to weigh the importance of different words in a sentence and generate the output accordingly. This combination of large dataset and transformer architecture allows GPT to generate highly accurate and human-like text.

Clearly a win for ChatGPT here. Google's answer is also correct, but for the wrong question. It answers why GPT-3 is more accurate than an earlier GPT version.

And now for something completely different. A small fact about functional programming:

Query: "Which three functions from functional programming are commonly known and which of these functions is used the least?"

Google does not show a suggestion and just links to functional programming on Wikipedia and tutorialspoint.com.

ChatGPT:

The three commonly known functional programming functions are:

map(): This function applies a given function to each element of an iterable (e.g. list, tuple, etc.) and returns an iterable containing the results.

filter(): This function applies a given function to each element of an iterable, and only keeps the elements for which the function returns true.

reduce(): This function applies a given function to an iterable, cumulatively combining the elements in some way.

It's hard to say which function is used the least, since it can vary depending on the specific use case. Some developers may find reduce more difficult to understand and use as it is a bit more complex than the other two but it is also more powerful in some situations.

A win for ChatGPT. The three functions are correct and the last paragraph about reduce is slightly wrong, but still okayish.

Then, let's figure out how that American way of writing titles is called again?

Query: How is that American way of writing capitalized titles called again?

Google does not show a suggestion but does link to a document from APA at position 2 which describes "Title Case Capitalization". The document is full of details and requires some reading to figure out how to do it.

ChatGPT:

The American way of writing capitalized titles is called "Title Case." In title case, the first letter of each word in the title is capitalized, with the exception of certain small words such as "a," "an," "the," "and," "but," "or," etc.

Both are correct, but ChatGPT's answer requires less energy (for humans). Let's quickly continue and see whether we can automate the process.

Query: How to write "This is an example sentence" in title case?

Google links to many sources containing an "example sentence" because the quotation marks are interpreted as a literal.

ChatGPT:

"This is an example sentence" in title case would be written as: "This Is an Example Sentence"

Boom. Mic drop. I don't think that ChatGPT was taught by a human how to convert sentences to title case. It learned it from reading the internet. This is mind blowing.

However, that "Is" looks a bit off to me. Is that correctly capitalized?

Query: Should is be capitalized in title case apa style?

Google links to a nice tool which does it correctly and also gives:

The rules are fairly standard for title case: Capitalize the first and the last word. Capitalize nouns, pronouns, adjectives, verbs (including phrasal verbs such as “play with”), adverbs, and subordinate conjunctions. Lowercase articles (a, an, the), coordinating conjunctions, and prepositions (regardless of length).

ChatGPT:

In APA style, the word "is" should not be capitalized in title case. APA style advises against capitalizing conjunctions, prepositions, and articles unless they are the first or last word in the title.

So "This is an example sentence" in APA style title case would be written as "This is an Example Sentence".

So, to me as a non-native American speaker who doesn't know the meaning of "pronouns" and "adjectives", the Google suggestion is useless again apart from the great link.

The ChatGPT suggestion is worse here because it is wrong. "Is" should be capitalized.

Finally, a last one to save the day for ChatGPT again: In the introduction, I said that it was pretty difficult in 2019 to get the intent for the sentence: "How to install a Brother MFC-5890CN net-work printer?" correct. Nowadays, that is solved:

Query:: How to install a Brother MFC-5890CN net-work printer?

Google links to the webpage from Brother which has a direct link to the right manual.

ChatGPT:

Installing a Brother MFC-5890CN network printer typically involves the following steps:

Connect the printer to your network: Connect the printer to your router or switch using an Ethernet cable.

Install the printer driver: Download the latest printer driver for the Brother MFC-5890CN from the Brother website and install it on the computer you want to use to print from.

Add the printer to your computer: Once the driver is installed, you will need to add the printer to your computer. This can be done by going to the "Control Panel" and then "Devices and Printers" on Windows, or "System Preferences" and then "Print & Fax" on Mac.

Configure the network settings: You will need to configure the network settings for your printer. This can be done by going to the printer's web-based configuration page and accessing the network settings.

Print a test page: Once the printer is set up and connected to your network, you can print a test page to ensure that it is working properly.

Note: The specific steps may vary depending on your operating system and network configuration. Also, you should refer to the printer's manual for detailed instructions on how to install and configure the Brother MFC-5890CN network printer.

According to the manual that Google linked, one should install the MFL-Pro Suite. This is hidden on page 27 after instructions such as "Unpack the machine and check the components". So, one could say that the printer installation instructions by ChatGPT are wrong. I wouldn't say so. After going to the Brother website, the driver installation program is actually returned and, indeed, this will most likely contain enough instructions to get the printer going. Arguably, the instructions in the manual are as wrong as the ChatGPT instructions because no one who wants to achieve results follows the manual step-by-step.

In conclusion, Google and ChatGPT are both sometimes wrong and sometimes right. In general, ChatGPT is way faster as was also observed by someone participating in a class with ChatGPT. I wonder how search (or knowledge work for that matter) is going to look in a few years. Crazy times.