How can I get exactly what I want from ChatGPT-4?
Generative AI sure is a beguiling technology.
If we want to understand GenAI’s capabilities and potential, we need to get our hands dirty with it as much as possible.
ChatGPT, can you help me write this blog post?
For example, a fairly obvious use case for a tool like ChatGPT-4 (currently OpenAI’s premium consumer product at $20/month) is as a research and writing assistant. But to get value, and to decide whether to “add it to our toolbox”, we need to get to grips with its strengths and weaknesses.
Playing with ChatGPT-4, one of the first things you’ll probably realise is just how sensitive it is to the way you phrase instructions. Well, OK, it’s not as sensitive as traditional computer programming, where a single missing character can break an entire program, but it’s still pretty sensitive. How you decide to “prompt” your large language model (LLM) is critical to what it will give you back in return.
“Prompt engineering” — the art/science of crafting, structuring and refining our inputs to LLMs to achieve specific output goals — is super hot right now. Sheila Teo won Singapore’s GPT-4 prompt engineering competition and shared her secret sauce. Inspired by Sheila’s recipe, Jordan Gibbs built a custom GPT that helps you to write better prompts (if you prompt it to do so).
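You can see the sensitivity for yourself by running the same request in two phrasings, side by side. Here’s a minimal sketch using the official openai Python package (v1+), assuming an OPENAI_API_KEY in your environment; the prompts are my own illustrations, not anything from the chats below:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Two phrasings of (roughly) the same request. The differences look
# trivial to a human reader, but they can steer the model towards
# very different outputs.
prompts = [
    "Tell me about the benefits of generative AI.",
    "List exactly five benefits of generative AI, one short bullet each, "
    "aimed at a non-technical reader.",
]

for prompt in prompts:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {prompt}\n{response.choices[0].message.content}\n")
```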
A prompt engineer can make very decent money:
Now, it’s possible we’re in an interim phase, where prompt engineering appears to be vital to getting what we want from AI. Perhaps the next generation of LLMs won’t be so picky about how we talk to them. Who knows? But whatever happens next, as (OpenAI cofounder) Andrej Karpathy put it: “The hottest new programming language is English” (or any other human language!).
To get a feel for this, let’s take a PDF of the UK government’s recently published Generative AI Framework (worth a read, by the way) and see what ChatGPT-4 makes of it.
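(If you’d like to follow along via the API rather than the chat interface, you’ll need the document as plain text first. A minimal sketch, assuming the pypdf package and a hypothetical local filename:)

```python
# Pull the text out of the PDF so we can work with it programmatically.
# Assumes pip install pypdf, and that the framework has been saved
# locally as generative-ai-framework.pdf (filename is illustrative).
from pypdf import PdfReader

reader = PdfReader("generative-ai-framework.pdf")
framework_text = "\n".join(page.extract_text() for page in reader.pages)
print(f"Extracted {len(framework_text):,} characters")
```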
I first wanted to see if I could get it to extract the 10 principles outlined in the document. I uploaded the PDF and prompted:
ChatGPT hasn’t been particularly detailed there, but it has done a good job of capturing the principles in a single, long sentence.
However, I wanted a list:
Good. The model has essentially captured the principles. But GPT has a strong urge to summarise, which can result in subtle changes we might not want. The list is similar to the one in the document, but not identical. Note that GPT hasn’t mentioned that it took some liberties with the list.
Now it wants to summarise even more! That’s not what I asked for. GPT has also made some subtle adjustments to principles 7–10, as we’ll see below.
Bingo! That’s now the exact wording from the document. We got there in the end. I was actually quite surprised we got there, as in previous experiments I had found that GPT was very “reluctant” (sorry, I probably shouldn’t anthropomorphise an LLM so much!) to give me verbatim quotes from documents.
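Given how easily paraphrase creeps in, it’s worth verifying the output rather than trusting it. Here’s a throwaway sanity check of my own devising, reusing the framework_text string from the pypdf sketch above; principles_from_gpt is simply whatever you paste in from the model’s reply:

```python
def check_verbatim(source_text: str, extracted_items: list[str]) -> None:
    """Flag any extracted item that doesn't appear word-for-word in the source."""
    # Normalise whitespace and case so PDF line breaks don't cause false alarms.
    normalised_source = " ".join(source_text.split()).lower()
    for item in extracted_items:
        normalised_item = " ".join(item.split()).lower()
        status = "verbatim" if normalised_item in normalised_source else "paraphrased?"
        print(f"[{status}] {item}")

# Illustrative input: paste the model's actual list here.
principles_from_gpt = [
    "You know what generative AI is and what its limitations are",
]
check_verbatim(framework_text, principles_from_gpt)
```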
Here’s an example of that reluctance, from an earlier session:
As you can imagine (or as you may have seen for yourself, if you’ve tried using LLMs as a research/writing partner), the unpredictability rises steeply when you task the model with summarising much larger pieces of text.
In a separate test I asked GPT-4 to summarise the whole document. This resulted in quite a bit of “push back”.
ChatGPT then went on to provide “tips” on how I might do it myself. Fair enough, I shouldn’t get lazy.
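If you do want a machine-written summary of a long document despite the push-back, the usual workaround on the API side is to chunk the text, summarise each chunk, then summarise the summaries. A rough sketch, again reusing framework_text; note the chunking here is naive and character-based, where a real pipeline would split on section boundaries and count tokens:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarise(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "user",
            "content": "Summarise the following text in one short paragraph:\n\n" + text,
        }],
    )
    return response.choices[0].message.content

def summarise_document(document: str, chunk_size: int = 8_000) -> str:
    # First pass: summarise each fixed-size chunk of the document.
    chunks = [document[i:i + chunk_size] for i in range(0, len(document), chunk_size)]
    chunk_summaries = [summarise(chunk) for chunk in chunks]
    # Second pass: condense the per-chunk summaries into one overview.
    return summarise("\n\n".join(chunk_summaries))

print(summarise_document(framework_text))
```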
But I wonder how much of this is down to the way the model has been trained, using reinforcement learning from human feedback (RLHF)? OpenAI has suffered a lot of bad PR around copyright infringement and cheating on homework. It seems likely that GPT-4 has been trained to “resist” requests to rip content wholesale, or to generate entire articles, essays or blog posts (!) at the drop of a prompt.
Or maybe I just need to bone up on my prompt engineering?
What do you think? I’m very interested to hear about others’ experiences using LLMs as a research/writing partner. Meanwhile, I’ll keep experimenting…