AI babbling
LLMs don't know what matters
A part of my current work is to help create responses to customer questions about our uses of AI. The questions typically arise in requests for proposal, requests for information, and the like, which I’ll refer to generically as “RFPs.” When large companies buy complex, expensive products like the ones that we sell, there are many questions to be answered along the way to finally closing the deal.
In the early days of large language models (LLMs), I would just get customer questions and answer them. Lately, I have been receiving initial drafts of responses that our team has produced using AI, typically LLMs themselves. There’s an entertaining circularity to having AI answer questions about AI, but that hasn’t itself been a problem: it’s been mostly an understandable and reasonable evolution. However, as I review these drafts, I have realized that there is a structural problem with using AI this way that I hadn’t noticed before. This problem is adjacent to the bullshit issue that I have written about previously, and similarly rooted in how LLMs work, but it is a little different.
The “new” problem, as I see it, is that the AI version says too much. It offers information that is (mostly) true but is often only marginally related to the question. This problem is not a showstopper in the context of RFP answers, which are often excessively verbose. The responding vendor often supplies information that’s only tangentially related to the question. Nevertheless, it is striking how often the way that I improve one of these AI-generated responses is by deleting one or more sentences entirely.
OK, so maybe the AI is babbling too much; presumably, there’s a way to get it to be less verbose. Although these systems don’t really work this way, we can imagine a “verbosity knob” on the AI that we could turn up or down. Unfortunately, it’s not clear there’s any setting of this imaginary knob that represents the right point. Alongside the unnecessary babble, the AI also occasionally misses some reasonable direct answer to the question at hand. So there’s an argument that it needs to be a little more verbose, not less verbose.
To be clear, the AI is not doing a bad job: it is performing at roughly the same level that people in the organization do when they are first navigating new questions based on a stack of previous answers. Indeed, I suspect the underlying process in both cases is quite similar: a kind of pattern-matching on phrases in the question, followed by a stitching-together of apparently-related sentences from the corresponding previous answers.
The difference with humans is that they (usually) learn enough from the corrections and the phrasings of an expert reviewer so that they get better and can supply those improved answers in the future. Ideally, the human responders are developing some kind of mental model of both the system about which they are replying, and the concerns of the questioners. I’m not sure what the corresponding process would be for the AI that’s generating the answers that I see. I suspect that there must be some way to take my revised answers and use them to improve the quality of the AI’s responses, but that workflow or capability is not well developed, and is certainly not exposed to me in my current reviewing role.
In some ways this is just an AI embodiment of the old slogan that “if you can’t dazzle them with diamonds, you should baffle them with bullshit.” Alas, the LLM primed with previous answers does not seem to be capable of producing diamonds; but, as I’ve previously observed, it is at its core a highly capable bullshitter.
The alarming aspect of this situation is not the production of draft responses via AI. As I’ve noted, those are not necessarily different in quality from the ones produced by humans. Instead, I find myself with two related concerns about the future evolution of RFPs:
First, my role as reviewer might be replaced by some future AI. This concerns me not because I fear the loss of employment, but because my review is based on both knowledge of the actual product and a theory of mind for the questioners, and I don’t see how LLMs would be able to do the kinds of checks that I’m doing.
Second, the customer organizations asking questions might choose to deal with RFP responses by using their own AI systems.
It’s already a somewhat cartoonish process for a big company to buy a complex product, involving what is sometimes a huge pile of text and spreadsheets, covering many weird and arbitrary questions about points of compliance with laws and regulations and best practices of various kinds. It’s worrisome that the RFP answers produced by an AI, then reviewed by an AI, might then be checked by an AI after being returned to the company that asked for them.
There is some possibility that the entire flabby text-generating and -consuming loop would have no necessary connection to actual compliance, or any shared understanding of reality. It would all be metalinguistic games, with various “documents” being exchanged but with no clear connection to the real objects of concern: products on one hand; laws, regulations, and policies on the other hand. By some misguided measure of “efficiency,” this system might seem like an improvement: more text handled, by fewer people, per unit of time. But in a thoughtful assessment of effectiveness, this process seems like a kind of cyclic garbage, referring to itself but unconnected to anything that matters.


There was a really good post a couple of weeks ago, speculating on the question, "What if AI had written the Gettysburg Address?" The conclusion was that the result would have been very similar to the two-hour speech Edward Everett gave as the warmup act for Lincoln that day. The one that later prompted Everett to tell Lincoln, "I should be glad if I could flatter myself that I came as near to the central idea of the occasion, in two hours, as you did in two minutes."
One quote I've always kept in mind in my own writing came from Ray Bradbury's essay, "Zen in the Art of Writing" - "The artist learns what to leave out." The way you hone your craft is by figuring out what you don't need to say. Such a concept would never occur to an AI unless it was told to do so, and even then it probably wouldn't do a very good job.
https://open.substack.com/pub/danielkwilliams/p/what-if-ai-had-written-the-gettysburg?utm_campaign=post-expanded-share&utm_medium=post%20viewer
I agree that a “verbosity knob” would be really useful. Some AI tools offer summaries of their own responses, but those summaries often repeat parts of the overly detailed answer that didn’t need to be there in the first place.