Large language models (LLMs) have taken the world by storm, especially in their “chatbot” manifestations like ChatGPT. Unfortunately, many silly things have been written about their current and likely future capabilities.
At one level, it is understandable that there should be so much excitement, as well as so many confused views about the significance of this development. Certainly, there are many ways in which one can interact with an LLM and have the impression that there is a kind of intelligence involved. In particular, LLMs do very well at the “Turing test,” in which a text-chat conversation is the basis for judging whether the other party is intelligent. Even on a bad day, an LLM can achieve a striking resemblance to a human; when working well, an LLM can seem superhuman. It’s not clear whether we should now declare that LLMs are intelligent (albeit with some glitches), or whether our experience with LLMs means that the Turing test is not a very good test of intelligence.
LLMs reveal a striking and previously unknown empirical fact about the universe, related to the way knowledge is encoded in a large corpus of documents or images. It turns out that large bodies of text and large bodies of images contain sufficient information to generate lots of plausible-seeming new text or art. (This discovery is in some ways the successor to a striking empirical fact revealed by search engines some thirty years ago: that the brute force use of keywords and simple phrases is surprisingly good at finding what’s of interest in the internet, especially when improved by paying attention to inter-page references.)
Notwithstanding their surprising successes, it’s crucial to understand that LLMs work based on statistics, not truth or accuracy. Descriptions of LLM failures refer to so-called “hallucinations”: sometimes the LLM makes up things that are simply not true. Some AI fans claim that future improvements will reduce or eliminate this problem, but I think such a view misunderstands what LLMs are doing and risks misleading people about the applications for which LLMs are suitable.
The crucial insight is that LLMs are always making things up (Technology Review has a short article about this, unfortunately behind a paywall). Much of the time, the LLM’s statistical fictions appear correct to us: they coincide with reality as we understand it. The intriguing empirical discovery is the high frequency with which the made-up stuff is correct. But human users are applying a discriminating function (“is this correct?”) that the LLM itself is unable to perform. We are comparing its statements to some model of truth or reality. When the LLM’s statements align with truth or reality, we think that all is well. When the LLM’s statements don’t align with truth or reality, we are puzzled and call it a hallucination. But from the perspective of the LLM, it’s all the same.
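To make the point concrete, here is a toy sketch in Python (with hand-made, hypothetical token probabilities; this is not how any real model is implemented) of the generation step. The sampler picks the next token purely in proportion to its probability, and nothing in the loop checks whether the result is true:

```python
import random

# Toy sketch (not any real model's code): an LLM-style generator picks the
# next token from a probability distribution conditioned on the prompt.
# Nothing here consults facts; "correct" and "incorrect" continuations
# come out of exactly the same mechanism.

# Hypothetical, hand-made probabilities for the next word after the prompt.
next_token_probs = {
    "Paris": 0.55,      # happens to match reality
    "Lyon": 0.25,       # plausible but wrong
    "Atlantis": 0.20,   # plainly wrong, yet sampled the same way
}

def sample_next_token(probs):
    """Sample one token in proportion to its probability -- no truth check."""
    tokens = list(probs.keys())
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

prompt = "The capital of France is"
completion = sample_next_token(next_token_probs)
print(prompt, completion)
# Whether the output aligns with reality is judged by the reader, not the sampler.
```

In this sketch, the “hallucinated” and the “correct” completions are produced by the same call; the difference exists only in the head of the person reading the output.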
The most accurate characterization I have seen is that large language models are bullshitters (the detailed argument was published in 2024 as Hicks, Humphries, and Slater, “ChatGPT is bullshit”). The term “bullshit” is used here in the technical sense that Harry Frankfurt defined in his 2005 book, On Bullshit. The crucial idea is that a human bullshitter is a person who is unconcerned with truth. They are neither a determined liar nor a determined truth-teller; they just don’t care, picking whatever story best serves them in the moment. As it happens, that same indifference to truth is pretty much exactly the condition of every LLM. A human bullshitter may have complex motivations for their behavior (a personality defect, specific circumstances), but an LLM is unconcerned with truth because its fundamental operating mechanism has no relationship to truth.
And so, we come to the question of what LLMs are good for. The simplest answer is that they work well in situations where you would be happy to have someone who is kind of a bullshitter. If plausible-sounding answers that are occasionally wrong are good enough, an LLM will do a great job. However, for any circumstance involving critical decisions and/or high stakes, an LLM is a terrible idea, just as a human bullshitter would be a terrible idea.
This is a somewhat odd perspective, but one that I think will become more familiar over time. It’s not that LLMs are useless, nor that they are a panacea. Instead, they have an odd, unfamiliar kind of utility. Managers and subject-matter experts have not typically had to grapple with the question “do I have jobs that would be suitable for delegation to an excellent bullshitter?” But that question, or one like it, seems to be at the heart of understanding how best to use LLMs.
In many cases, picking the best available starting point is valuable, even if it turns out to be far from the final answer. This is true of problem solving in general, including engineering, systems / software architecture / development, social engineering (marketing, legal, etc.), science, and art.
A problem is, by definition, something not yet solved, or at least not yet solved by a particular person or group. Estimating the best starting points, even if they are sometimes wrong or suboptimal, is still usually better than starting with a blank page and working from first principles.
But, as you point out, this kind of rough starting point is not always directly usable as a result. It may be fine for fuzzy uses, is never acceptable for important and dangerous cases, and falls somewhere in between for everything else.