I am on vacation with my family next week (through the following Tuesday). I may have 1-2 posts that I finish this week and put in the queue (or I may not). I am hesitant about scheduling too far in advance given how fast things are moving these days, but I do have a few things on my mind that I have not posted here yet. So we will see. It will be a surprise!
Most of the time when I write about generative AI here, I am writing about text generation. I generally explain how CMOs can use simple tools to help their teams and companies become more effective. Today I am going to get a little more complex. But before I do, a little history and philosophy of AI.
AI as a term has been around as long as computers. The current wave started in 2017 with the publication of the “Attention Is All You Need” paper, which introduced “transformers”. OpenAI went all-in on transformers and created GPT-1 in 2018, GPT-2 in February 2019, and GPT-3 in June 2020. The terminology shifted from the general “AI” to the more specific “Large Language Models”.
But transformers are more than just language models. In January 2021 OpenAI announced DALL-E, which could transform text into images. In March 2022 the first Discord-driven version of Midjourney launched, followed by DALL-E 2 in April and Stable Diffusion in August.
Now we can roughly categorize AI transformers into four groups:
Text-to-text
Text-to-image (or image-to-image)
Text-to-audio (or audio-to-text)
Text-to-video (etc…)
GPT-4 is multi-modal, which means a single tool can input and output text or images.
Beyond this list there is one specific type of text that is special, and deserves its own category:
text-to-code (and code-to-text)
Maybe it is better described as “natural language to code”.
Code is often intimidating to non-developers. As a marketer it is usually enough to understand what can be done with code, roughly how long things will take, and who to talk to to get it done. When a project requires “code”, it usually just means it will take a lot longer than you want, and you need to beg for resources in someone else’s organization to get it done. In the best-case scenario you have a team of dedicated developers (dedicated to you, but reporting into a different structure) who you can meet with on a regular basis to define priorities.
Most of that is not going to change. But some of it will.
Already there are many CMS platforms that allow the marketing team to make changes without going through dev. The next step is utilizing “no-code” or “low-code” tools, but in practice I have only ever seen those used at start-ups or by individual hustlers. Most large companies do not want no-code marketers messing with the website. It’s asking for trouble.
But AI is bringing in a whole new world where marketers can launch products without talking to development at all.
How to code with AI
When the first chess engines appeared they were terrible. Slowly, over many years, they got better and better until they were able to beat the best humans. Even then, the engines still lost to teams of humans working with computers. But within another decade the AI got so good that any contribution from a human made the play worse. We now judge human chess by how closely a player’s moves match what an AI would have done.
The first version of GitHub Copilot, the coding tool built on OpenAI’s Codex model, wasn’t great, but it was still revolutionary for those who used it. When I asked my friend how often he can use the code it creates, he said that was not the way to think about it:
The code it creates is never “ready to go”. It gets about 80% of the code right most of the time. But now, instead of writing code, I can have the AI create the code, and then I just go and edit it. It speeds me up by about 2x.
You couldn’t replace a developer with Copilot v1, but you could replace two developers with Copilot plus a single developer.
That version of Copilot was created with GPT-3. GPT-4 is much, much better.
Now I have heard of non-developers using it like this:
Ask it to create code
Run the code. Get an error
Put the error message into GPT-4 and ask it what went wrong, and how to fix it
GPT-4 apologizes and fixes the code
Run the code. It works.
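As a concrete sketch of that loop, imagine you asked for a function to average monthly lead counts (the function name and the bug here are invented for illustration). A first AI draft might divide by the length of the list unconditionally and crash on an empty list; pasting the `ZeroDivisionError` traceback back into GPT-4 would yield a guarded version like this:

```python
def average_leads(monthly_leads):
    """Return the average of a list of monthly lead counts.

    A first AI draft divided by len(monthly_leads) unconditionally,
    which raised ZeroDivisionError on an empty list. Pasting that
    traceback back into the chat produced this guarded version.
    """
    if not monthly_leads:
        return 0.0
    return sum(monthly_leads) / len(monthly_leads)

print(average_leads([120, 95, 143]))  # average of a quarter of lead counts
print(average_leads([]))              # an empty list no longer crashes
```

The point is not this particular bug but the rhythm: you never have to understand why the first draft failed, only to paste the error back and run the code again.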
My early model of generative AI was that it was a sophisticated text-completion engine. It is, but… so are humans. At this point “text completion” is not a useful way to think about how generative AI works. A better way is to think of the AI as a “reasoning model”. They use language to “figure stuff out”. GPT-3 roughly had the “reasoning ability” of a bright 12-year-old. You could get it to do things, but you had to be very specific and give it lots of examples or it got confused.
GPT-4 is a reasoning engine approximately equivalent to a bright college student. You can be far more general in your ask than with the 12-year-old, but if you ask it to do something beyond that level, you still need to be specific and give it examples. (It is MUCH better at writing jokes, but if you just tell it to “write a joke” you are not going to get anything a professional comedian would use, just as you wouldn’t if you gave the same request to a bright college student.)
GPT-4 also has the ability now to plug into other services. If you ask a bright college student to do math in their head you will get pretty good answers at lower levels of complexity, but as the problems get more complicated the math will start to fail. But give a college student a calculator, and any errors will come from miscommunication or misunderstanding, not calculation. GPT-4 can now plug into Wolfram Alpha. It has a calculator.
It also has the ability to code.
Now it is possible to convert free-text into code, run the code, get an “answer”, and then interpret that answer back in free-text.
The big remaining thing that AI cannot do is have “intention”. Human brains are a mix of “desires” plus “reasoning machines” that attempt to achieve those desires. Up until now AI has just been a “reasoning machine” — any desires come from the human users.
But that, too, is changing.
One tool, Auto-GPT, gives GPT-4 “purpose”. The user still defines what that purpose (“goal”) is, but once the purpose is defined, the AI starts generating “prompts” to get itself to achieve that goal. It iterates and tests on itself, trying different prompts until it judges the goal achieved.
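A heavily simplified sketch of that goal-driven loop, again with a canned `ask_llm` placeholder instead of a real model call (the goal, the step, and the stopping rule are all invented for illustration):

```python
def ask_llm(prompt):
    # Stand-in for a real model; returns canned responses so the
    # agent loop below is runnable without any API.
    if "next step" in prompt:
        return "draft the landing page copy"
    return "DONE"

goal = "launch a one-page product site"
step_count = 0
while step_count < 5:  # safety cap so the agent cannot loop forever
    # The agent writes its own next prompt toward the goal...
    step = ask_llm(f"Given the goal '{goal}', what is the next step?")
    # ...then checks its own work and decides whether the goal is met.
    verdict = ask_llm(f"Goal '{goal}', step taken: '{step}'. Done?")
    step_count += 1
    if verdict == "DONE":
        break
print(f"Stopped after {step_count} step(s).")
```

The human supplies only the first line (the goal); everything after that is the model prompting itself.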
Another example (audio-to-code):
Here are some more tools worth exploring as a CMO:
You can do text-to-code directly from ChatGPT now, but there are many tools being built on top of GPT-4 that will do it in a more user-friendly way. Most still seem to be in beta with waitlists. Here is one example: Create.xyz
MyGPT: A front-end interface for ChatGPT that gives you access to some plug-ins NOW (rather than sitting on the waitlist on the main site; you need your own OpenAI API key)
Wove: A tool for stringing together large numbers of GPT prompts into a workflow
Using text-to-code and incorporating it into your workflow and team capabilities is going to be harder than integrating text-to-text, but it is the future of work. If you are senior now, you will likely be able to coast through the rest of your career without using these new capabilities, just as the previous generation was able to coast without learning how to use email. But do you really want to be the equivalent of the executive who asks his secretary to print his emails so he can read them?
Keep it simple,
Edward
ALSO: Facebook launched a segmentation tool for images. AI-people are very excited.