The widespread availability of powerful tools such as ChatGPT has triggered intense interest in understanding how AI systems work -- along with common misconceptions about their capabilities.
Although the abilities of GPT-4 and other emerging generative AI models are impressive, they result from understandable arrangements of basic algorithmic components, says author Ronald Kneusel. In his book How AI Works: From Sorcery to Science, a recent release from No Starch Press, Kneusel gives readers an accessible overview of the history of AI and the developments that enabled today's generative AI boom.
Drawing on his decades of experience in computer programming and machine learning, Kneusel unpacks misunderstandings about AI and provides a detailed but straightforward explanation of how these systems operate. In this interview with TechTarget Editorial, he sheds light on the inner workings, limitations and benefits of large language models (LLMs) and shares his perspective on the field's rapid evolution.
For more from How AI Works, read an excerpt from Chapter 7, "Large Language Models: True AI At Last?"
Editor's note: The following has been edited for length and clarity.
Can you tell me about your background and what inspired you to write this book?
Ronald Kneusel: My PhD is in computer science, in artificial intelligence, and I also have a master's degree in physics. I've been working with machine learning since 2002 and deep learning since before AlexNet, so I've been doing it for a long time.
The current book, How AI Works, is my third AI book from No Starch. The first two are for people who want to get into working with [AI], but this new one is really for general readers. There are lots of books that talk about AI and the effects it'll have, and that's all critically important. But what I wanted to do with this one is not get into the weeds, but also not give a bird's-eye view -- as I put it, a trees-level view, so you can understand what [AI] is doing without burying yourself in the math.
Are there any misconceptions about AI that you encounter particularly often?
Kneusel: One would be that it's some kind of fancy program, in the sense that people have put it together and thought through it. Throughout the book, I juxtapose symbolic AI with connectionist AI. Connectionist is kind of an old term, but it's basically neural networks; it's what neural networks do. The two were competing for decades. When I learned about AI in the '80s, it was all symbolic. I remember neural networks were mentioned as toys that nothing useful would ever come from. Of course, that was wrong, but understandable at the time. Computers really weren't up to the task of doing what needed to be done to show the true power of a neural network.
I remember talking to people when GPT first started getting noticed, and they would say, 'Well, it's just looking things up.' [The model] is not looking things up. It actually knows all this stuff itself. It's stored it in a compressed format in a way that's probably well beyond our comprehension, just the way our mind's storage of data is currently beyond our comprehension.
People do seem a little shocked to realize that it's kind of an accident -- as I say, the ultimate Bob Ross happy accident. [The model] was intended to predict the next token, and to do that really well, you have to have a certain scale and size of model, and these emergent abilities happen. It's almost as if we stumbled -- though that's not fair to the researchers who spend a lot of effort designing things -- into an architecture that has the ability for this stuff to emerge.
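The "predict the next token" objective Kneusel mentions can be illustrated with a toy bigram model: count which word follows which, then repeatedly pick the most likely successor. This is a deliberately minimal sketch of the objective, not of how GPT-style transformers actually work internally.

```python
from collections import Counter, defaultdict

# Toy training corpus; real LLMs train on trillions of tokens.
corpus = "the cat sat on the mat and the cat slept".split()

# Count how often each token follows each other token (a bigram model).
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(token):
    """Return the most frequently observed next token."""
    return following[token].most_common(1)[0][0]

# Greedy generation: repeatedly predict the next token.
token = "the"
sequence = [token]
for _ in range(4):
    token = predict_next(token)
    sequence.append(token)

print(" ".join(sequence))  # → "the cat sat on the"
```

Scaled up from bigram counts to a transformer with billions of parameters, this same next-token objective is where the emergent abilities Kneusel describes show up.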
In the book, you write that the emergence of LLMs 'has permanently altered the AI landscape.' There's clearly been an expansion of AI's popularity and reach lately, but could you explain why you view these technologies as such a paradigm shift in AI?
Kneusel: Because they're incredibly general. They're incredibly useful. Almost every day, you see some new research showing that one of these emergent abilities has some incredible use. In medicine, multiple studies have shown that large language models have a better bedside manner than doctors. In some cases, they're better at interpreting symptoms than a lot of doctors.
Though actually, expert systems built in the '80s were like that as well. They just never caught on because they were very brittle. And that's actually another aspect: These [LLMs] aren't as brittle. They're robust. I treat them like alien minds. It's a strange half-mind, as my brother calls it.
Can you elaborate on the distinction that you're drawing out there, between those older expert systems as more brittle and these newer LLMs as more robust?
Kneusel: So, in a very narrow area, expert systems did work, and they're still around as rule-based systems that sometimes get used. But if you change the conditions in which they operate very much, they can't adapt, and they fall apart. If you want to incorporate new knowledge of some kind, it's difficult to do that.
I saw a question put to Richard Feynman during a lecture back in the '80s about intelligent machines. He said, 'I have no idea how you would program such a thing.' I was only an undergrad at the time, but I was thinking the same thing. How do you program this sort of behavior? Everyone's been thinking that since the '40s, really. And you can't, but that's what symbolic AI was.
So instead, taking inspiration from biology and all the billions of brains on the planet, if you get enough things together, maybe something will emerge from it. If there's enough operating capacity in terms of rules and functions and things like that -- which I think the transformer architecture has -- then you can expect, maybe, things to emerge from it.
What are some of the most interesting or exciting applications for these models that have such strong generalizability?
Kneusel: The first one that jumped out at me was programming. Medicine is another one. Education. I was very impressed with how quickly Khan Academy grabbed onto this, and I think they were right to do so. [These models] know just about everything. It's amazing.
As people understand how to work with them to avoid or minimize hallucinations, I see no reason why you can't expect to see kids in the future [using them], when they reach a certain age. You would have basically one-on-one private tutoring, which has been known for a long time to be probably the best approach to education, just not feasible. And now it's becoming feasible. So I expect huge changes there. No one can know what these models know. It's just not possible to know that much.
My 87-year-old father in Milwaukee used to ride the trolleys back in the 1940s, and so he challenged GPT-4 to explain a particular line of this very obscure part of the trolley system. It knew about it, explained it in detail, where it ran, how long it ran. No human would know that off the top of their head unless that was a very esoteric hobby they were into. [With AI], you have that breadth. So your tutor will be a PhD in dozens of fields, basically, and able to tell you at any level what you need to know.
The issue of hallucinations has come up frequently as LLMs have been more widely adopted. In the book, you say you 'expect future systems to be combinations of models, including models that validate output before returning it to the user.' What techniques do you find promising for mitigating the problem of hallucinations?
Kneusel: I know Nvidia has been working on kind of a layer between the user and the LLM to do just that. Another area where you might see some sort of reemergence, almost, of portions of symbolic AI is as type checkers and fact checkers. People have used models to check models. There's a whole literature now [analyzing] how you properly query the model to get it to be right more often than not, not to hallucinate.
What I'm finding also is that you have to have a certain level of explicitness [when using LLMs]. You have to think, 'Okay, most people wouldn't think this request might be interpreted this way, but [a model] might.' There are full formulas now for engineering prompts to minimize these sorts of things. So you can imagine outer systems that take your request and reengineer it to minimize [hallucinations].
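The "outer system" Kneusel imagines -- one that takes a user's request and reengineers it to be more explicit before it reaches the model -- can be sketched as a thin wrapper. Everything here is illustrative: `call_model` is a hypothetical stand-in for any LLM API, and the rewriting rules are simple assumptions, not an established prompt-engineering formula.

```python
def call_model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM API call.
    return f"[model response to: {prompt!r}]"

# An explicit instruction appended to discourage guessing (illustrative).
GUARDRAIL_SUFFIX = (
    " Answer only from well-established facts."
    " If you are not sure, say 'I don't know' instead of guessing."
)

def reengineer(user_request: str) -> str:
    """Make an underspecified request more explicit before sending it."""
    prompt = user_request.strip()
    if not prompt.endswith((".", "?", "!")):
        prompt += "."
    return prompt + GUARDRAIL_SUFFIX

def ask(user_request: str) -> str:
    # The outer system: rewrite first, then query the model.
    return call_model(reengineer(user_request))

print(ask("when did the Milwaukee trolleys stop running"))
```

Production systems layer far more on top of this pattern -- retrieval, output validation, a second model checking the first -- but the basic shape is the same: the user's raw request is never what the model actually sees.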
It's interesting that one technique for mitigating hallucinations is more effective prompting, in the form of more explicit instructions. Maybe not as much as in traditional programming, where everything has to be stated explicitly, but still thinking in terms of the constraints of this machine you're working with.
Kneusel: Yeah, I like that comparison, because in traditional programming -- especially imperative programming -- you just arrange the problem step by step: 'Do this, do this, do this.' And then, say you're a manager and you have a bunch of software engineers and [you tell them], 'Okay, here's a spec, go write the code.' There will be back and forth; they'll read it one way, you maybe meant another, maybe your client meant another.
And that's kind of where we are with these models. If you have traditional programming on one side and humans on the other, the models are much more like us. So now you have prompt engineering, and there are actual positions open for prompt engineers. It's a new kind of programming, in a sense. It's a new way of getting this strange thing to do what you want it to do properly.
Do you see prompt engineering being a discipline that sticks around, or do you think that might become less important as models advance?
Kneusel: I think it'll fade away. I think it's a thing now because that's what's necessary now. But as [models] get more sophisticated, I think it'll fade. We all learned how to search in a search engine, right? The first time you used a search engine, maybe you were disappointed in your results. But over time, we learned how to use them properly. And it'll be the same with these tools. We'll learn how to use them properly.
We've touched on hallucinations; I'm wondering what other potential risks or challenges you foresee as these technologies advance.
Kneusel: Probably that because they behave more like us, they're going to be subject to the same problems we have. In fact, I just saw a headline this morning that, even though GPT-4 and others are very good at basic medical diagnoses and talking with patients, they still have biases related to racial disparities in treatment -- assumptions that are not true but still embedded in the field. And because they're trained on the existing data, of course they have those biases now, too.
We're used to trusting computers as infallible sorts of things. Now it's more like a person we're talking to, and that person's going to have a certain viewpoint. That person's got certain biases. And that's where alignment comes into it. The proper behavior you're getting from the commercial models is the result of a lot of interaction with human beings and the model, basically saying, 'This is how you should be responding.' It's a new way of interacting with the computer.
It sounds like there are two risks that you're drawing out there. One is familiar: the importance of high-quality and diverse training data for models. The other is this issue of alignment and figuring out how to set appropriate guardrails for generative AI, which is newer.
Kneusel: Right. It's not solved. So far, we're basically doing what software testing does. If you program in functional programming languages, you can start to get to a position where you can make mathematical statements about whether a program really is correct and has no bugs. But since we can't do that, we just test the heck out of it until we can convince ourselves that we've reasonably done everything we can do.
But still, things happen. I used to work with medical devices, and there was an infusion pump that suddenly killed several people, even though it had been used for ages and it had been extensively tested. The company had to go through and make sure they hit every single branch of this large thing written in C. They did eventually find a combination of inputs that would [explain] what happened, but regular testing would never have found that. No matter how much we align or test the models, there are going to be exceptions.
What advice would you give folks who are interested in getting into AI?
Kneusel: It's a strange time. It really is. Someone asked Geoff Hinton this recently in an interview, and he said, 'Become a plumber.' His idea was dexterity: Even though the models might know exactly how to fix your plumbing, they can't physically fix your plumbing, but we [humans] can.
But people still have to build the models. People still have to understand things. Someone has to build the system. Someone has to learn how to use them and incorporate them. There's a ton of practical use for non-generative AI models all over the place. They're just getting buried inside of systems and devices.
So for getting into AI, I would still learn to code. Learn the math. It's not a lot of math, really -- you can do neural networks with one semester of calculus. You don't need more than that. And there's so much material out there.
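Kneusel's claim that one semester of calculus is enough can be made concrete: training even a single "neuron" is just the chain rule plus repeated small steps downhill. This is a minimal sketch in plain Python (no ML library), fitting one weight to the target function y = 2x with gradient descent.

```python
# Train one "neuron" y = w * x to fit the target y = 2x, using nothing
# but the derivative of the squared error and small downhill steps.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, target) pairs

w = 0.0      # starting weight
lr = 0.05    # learning rate: size of each downhill step

for _ in range(200):
    for x, target in data:
        y = w * x                   # forward pass: the prediction
        # loss = (y - target)**2, so by the chain rule:
        # d(loss)/dw = 2 * (y - target) * x
        grad = 2 * (y - target) * x
        w -= lr * grad              # step against the gradient

print(round(w, 3))  # → 2.0
```

Real networks stack millions of these units and use the same chain rule, applied layer by layer (backpropagation), to compute every gradient -- which is why the calculus requirement stays so modest.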
If you are learning to code, resist the temptation to use [generative AI] too much. It becomes a super powerful tool, and you have to guard yourself a little from overreliance on it. It's like math class. You watch the professor do it, and you're like, 'Yeah, I get it.' And then you sit and stare at the homework, and you're like, 'What is this?' You have to do that part too. That's how we learn.
And for me, knowledge is always valuable. Almost everything I've learned or studied, I've used in some way, somewhere. I get calculus students learning integration techniques asking, 'Why do we do this? I can just go to Wolfram Alpha and type it in, and it gives me the answer.' And I get that, because people in the past had to do it [manually]; now we don't. But still, you should have the general idea of what it is and why it works, even if you're not an expert.
What are the key things that you would want a general reader to take away from this book?
Kneusel: [The book's subtitle] says, 'From Sorcery to Science,' right? AI is not magic. In the end, it boils down to clever arrangements of a basic repeated unit and a very simple algorithm that's been around in its basic form for centuries. It's remarkable that it works, because it sort of shouldn't, but it's that. It didn't fall out of the sky. It's been developed over time, and there's a path it took, and it's comprehensible.