AI Language Models: Part 1, the briefing
There was a lot of AI news last week that I want to share with this group - so much that trying to cover both the news and its implications is too much for a single essay or briefing. Today I will go through the research news announced last week, along with some of the applications starting to appear that make use of these new advances - a kind of “super briefing.” (It is already too long for Gmail and the bottom is likely to be cut off, so click on the header if you would like to read the whole thing.) Next week I will have an essay on my personal experience using the tools and what I think the implications are of the broader population getting access.
This Week’s Sponsor:
This week's email is sponsored by The Tonic.
The Tonic is a free weekly newsletter that improves the lives of its readers through thoughtful lessons, challenges, and journaling prompts. Stay informed, entertained, and inspired… for free.
Sign up for FREE here!
AI News
DALL-E 2
On April 6th OpenAI released the research version of DALL-E 2, the second, “improved” version of DALL-E, a tool that allows users to create AI-generated images from text prompts. Here is OpenAI’s description of the tool. Here is Sam Altman’s blog post on the implications.
How it works:
Just as GPT language tools predict text based on earlier text, DALL-E predicts pixels based both on other pixels and on the text it is generating the image from. As described by OpenAI: “DALL-E 2 has learned the relationship between images and the text used to describe them. It uses a process called ‘diffusion,’ which starts with a pattern of random dots and gradually alters that pattern towards an image when it recognizes specific aspects of the image.” The result, as Sam Altman writes, is that “It sure does seem to ‘understand’ concepts at many levels and how they relate to each other in sophisticated ways.”
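For the technically curious, here is a rough sketch of that loop in code. This is not OpenAI’s actual implementation (theirs is vastly more sophisticated), just a toy illustration of starting from noise and repeatedly denoising toward the text prompt:

```python
# Toy sketch of diffusion sampling - NOT OpenAI's actual code.
# `denoiser` stands in for a trained network that predicts the noise
# in an image, conditioned on a text embedding of the prompt.
import numpy as np

def sample_image(denoiser, text_embedding, steps=1000, size=(64, 64, 3)):
    """Start from pure random noise and gradually remove it."""
    image = np.random.randn(*size)       # the "pattern of random dots"
    for t in reversed(range(steps)):     # walk the noise schedule backwards
        predicted_noise = denoiser(image, t, text_embedding)
        # Subtract a little of the predicted noise each step, nudging the
        # random pattern towards an image that matches the text. (Real
        # diffusion models use a learned noise schedule, not this
        # uniform 1/steps update.)
        image = image - predicted_noise / steps
    return image
```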
It is difficult to fully grasp what we are talking about without examples. There are many on the OpenAI description page, but as researchers have been given access they have shared some of their own illustrations.
Examples
Nick, a researcher at OpenAI, has turned his friends’ Twitter bios into DALL-E 2 images:
Commitments empathetic, psychedelic, philosophical
Bookbear
Happy sisyphus
Cottagecore tech-adjacent young Robert Moses
There are many more at the link.
Note that this is NOT as simple as just dropping the profile descriptions into DALL-E 2 and copying-and-pasting the results. Nick explains how he did it:
I didn’t just paste prompts into dall-e, I played with style (eg. cyberpunk, oil, etc) to keep it interesting and diverse. Best thought of [as] testing what a somewhat prompt-savvy non-artist + dall-e can make playing for a few min. But also I can't stress enough how incapable an artist I am without dall-e, basically stick figures, and I've also only been playing with dall-e for a couple days so it's not like I've learned prompt-voodoo that can't be picked up by playing around for a little bit… I enter something vaguely what I imagine and get 10 full paintings in about 18 seconds, do that a few times wiggling the prompt a bit as it comes up with new ideas and I like them, then pick one to post. For the most part every painting makes sense every time [for easy prompts], but the style isn't always interesting. For harder prompts (eg draw a happy sisyphus) it makes sense in like 80% of the images or so, and 30% have something creative like [a] smiling boulder. If I had to quantify, I’d say I’d generate 2 or 3 batches (tweaking prompt) before choosing my fav two pics, each batch outputs 20 images (two tabs 10 per), so prob technically cherry picked 2 out of 60. That said usually other 58 weren’t really broken, just boring / bit less fun.
IMO what's important isn't some quantifiable cherry picking ratio, [because] sampling a model is trivial and you can do it as much as you want. What matters is how much time it takes to get something you're happy with. So I think it's really cool what a few min of playing gets you. I do separately think showing pure model output and measuring all the things is useful for papers and such too, but my goal was just to make cool illustrations for my friends, and I think that's much closer to what will actually matter for most people going forward. The flip side of cherry picking for 2 is that I often had 10 that I liked that were pretty different from each other after a few min of playing, and it was really hard to pick just 2, but I didn’t want to change the format halfway through. So many I liked but didn’t include.
This matches my experience working with GPT-3 tools: it often gets you something interesting, but it rarely gets you what you want for the final output on the first try. Imagine you are a brand manager asking for ideas from your creative agency. You ask for something, they generate 20 ideas, and they share 3 with you. None are exactly what you want, so you give feedback and they go back and do another cycle. Rinse and repeat. The process is slow and expensive. With DALL-E 2 you can cycle as many times as you want in real time (and with no ego involved - at least no creative-team ego; it can’t do anything about the brand manager’s ego).
You can also act like a micromanager if you like. OpenAI’s post describes how you can ask DALL-E 2 to add elements to an image, and it does so using contextual clues. Here it is when asked to add a flamingo at different locations within the image:
(Notice the reflection!)
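The research preview is only available through a web interface, but here is a sketch of what such an edit could look like programmatically, assuming OpenAI’s image-edit endpoint in their Python library (the filenames are hypothetical; the transparent region of the mask marks where the flamingo should be painted):

```python
# Sketch of an inpainting-style edit - assumes access to OpenAI's
# image-edit endpoint via the `openai` Python package; the DALL-E 2
# research preview itself has no public API.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.Image.create_edit(
    image=open("lounge.png", "rb"),  # hypothetical source image
    mask=open("mask.png", "rb"),     # transparent pixels = where to paint
    prompt="a flamingo floating in the pool",
    n=4,                             # several candidates to choose from
    size="1024x1024",
)
print(response["data"][0]["url"])    # link to the edited image
```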
Sam Altman says,
DALL•E 2 is a tool that will help artists and illustrators be more creative, but it can also create a “complete work”. This may be an early example of the impact of AI on labor markets. Although I firmly believe AI will create lots of new jobs, and make many existing jobs much better by doing the boring bits well, I think it’s important to be honest that it’s increasingly going to make some jobs not very relevant… It’s a reminder that predictions about AI are very difficult to make. A decade ago, the conventional wisdom was that AI would first impact physical labor, and then cognitive labor, and then maybe someday it could do creative work. It now looks like it’s going to go in the opposite order.
ATMs did not eliminate bank tellers - they actually drove an increase in the number of bank branches, and with it an increase in the number of bank teller jobs. But the new tellers were doing different work than the old tellers. While they still sometimes facilitated withdrawals, that became the exception as that job was outsourced to the machine. Instead they mostly did higher-value work that the machine was not capable of doing. DALL-E 2 will do something similar. It won’t eliminate the need for creative agencies or artists, but it will fundamentally change both the work and the skills required to create art.
Here are some more examples of DALL-E 2 illustrations from random Twitter prompts, and here are more from thoughtfully chosen prompts.
Another text-to-image generation tool is Midjourney. The product is in beta and the company has not shared how the images are generated, but there are many examples available on Twitter. OpenAI’s tool is new and cutting-edge, but they will not have a monopoly on this space (more on this in next week’s essay).
PaLM
Google AI is a competitor to OpenAI. On April 4th they introduced their latest model, the “Pathways Language Model” (PaLM), a 540-billion-parameter model trained across multiple TPU Pods. The result is an AI that can do some incredible things, like “figuring out” puzzles or explaining jokes. While it does all this with a “text prediction engine,” the results are astounding. As kids in school we were always told to “show our work.” Google has found the AI does much better when it is told to explain its work. Without the explanation, the text prediction may just throw out an almost random number, but when it reasons through the answer word by word, the tool does much better:
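Here is the flavor of such a prompt, borrowing the canonical chain-of-thought example from Google’s research - the worked example in the prompt shows the model how to explain itself before answering:

```python
# A minimal chain-of-thought prompt. The first Q/A pair "explains its
# work," which nudges the model to reason step by step on the second.
prompt = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls.
Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis
balls. 5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought
6 more, how many apples do they have?
A:"""

# Fed to a model like PaLM, this tends to elicit the reasoning
# "23 - 20 = 3; 3 + 6 = 9. The answer is 9." rather than a guessed number.
```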
Here it is when asked to explain a joke:
OpenAI has developed a model called Codex (the technology behind GitHub Copilot) that can generate code from natural language descriptions. Here Andrew shows how he used it to create simple games like Wordle (“I used OpenAI’s newest code model to make simple versions of games like Wordle, VR mazes and Zelda ENTIRELY through natural language. I told it what I wanted and did ZERO editing/coding.”). PaLM can do similar work while NOT being a dedicated tool - i.e., the same AI that can explain jokes and solve puzzles can also generate code:
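For a concrete sense of the natural-language-to-code workflow, here is a sketch of a call to Codex through OpenAI’s API (the prompt is illustrative, and Andrew’s games were built through longer interactive sessions; PaLM itself has no public access):

```python
# Sketch of natural-language-to-code via OpenAI's Codex model
# ("code-davinci-002" was the Codex model available at the time).
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

completion = openai.Completion.create(
    model="code-davinci-002",
    prompt=('"""A Python function that scores a Wordle guess against the '
            'answer, returning a list of "green"/"yellow"/"gray" marks."""\n'),
    max_tokens=200,
    temperature=0,  # low temperature keeps the generated code stable
)
print(completion["choices"][0]["text"])  # the model's generated function
```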
More examples:
How it performs vs humans:
PaLM is now better than an “average human” at ‘understanding’ queries, but it is still significantly worse than the best humans. This is similar to the trajectory of chess-playing AIs a few decades ago. Initially they were terrible, but the amazing thing was that they could play at all. Then they were able to put up a reasonably competitive game against amateur players. Eventually they began to challenge the best players. Now they are so much better than any human that human moves are ranked by how closely they align with the “correct” move an AI would choose.
Applications
We are still in the exploring-and-research phase of these AI language models. There are not many consumer (or business) applications on the market. I will explore this more in the next essay, but in the meantime I think it is worth sharing some of the applications that are either live now or have been built on top of these tools.
NovelAI is a writing tool I have been using for the last few months. It uses GPT-3-competing models (which underlying technology it uses has changed and evolved over time based on the pricing it can obtain and the capabilities of those tools). It takes the core AI and then does its own training on top of it to create an AI specialized in writing fiction. It also provides tools for users to upload their own text to re-train or “over-train” an instance of the AI in whatever style the user is interested in. Those “specialist” AIs made by users are then made available for wider use (if the user chooses to make them available). There are now specialist AIs for writers (like Edgar Allan Poe or Arthur Conan Doyle), AIs for genres (Supernatural, Super Hero, Wikipedia, Children’s Literature), and AIs trained on specific fan fiction or even erotica (and super-specialized erotica…).
The following paragraphs were written by NovelAI based just on the text of this section from “Applications” to “fan fiction” (I had to remove the mention of erotica to stop the AI from going off on that tangent). It took about six “generations” to create the text below. I had it re-do all or part of three of the generations in order to get something that I wanted to share. (It would have worked without the editing on my part, but, as I will discuss more next week, that editing is, at this stage of AI ability, part of the process):
The latest release supports open source publishing as well with the ability to publish stories through Medium, Google Books, Amazon Kindle etc., which allows authors access to larger markets than just those who purchase NovelAI's product. These authors include people like Mark Watson (writer behind "The Forgotten Ones", co-authoring books with his wife Elly Griffiths such as "Thrall"), author/game designer Dann Baker ("Gameworld" series; Game: The Life Cycle of Video Games"; Digital Dreams") and even science writer Simon Ings (Author of The Heretic Queen: Katherine of Aragon and Henry VIII"). This level of interest from within the industry shows how far we have come since 2016 when only researchers could get access to these new technologies.
Narrative Science makes software called Quill aimed at generating plots for video games and other narrative driven media. They use the same type of techniques that were applied to creating dialogue between characters to generate story ideas. An early example was about three years ago where they published one million sentences generated by their system in partnership with MIT Media Labs' Storytelling Lab. That work resulted in thousands of articles written around what seemed to be quite good plot lines and interesting character developments.
As far as I can tell, all of the authors, book titles, companies, and company products listed above are completely fictional. In one earlier attempt it generated a link to a Medium post (that 404ed). This is what you get for using a tool specialized in fiction to write nonfiction.
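As for how NovelAI’s re-training works: the company has not published its pipeline, but the general recipe for “over-training” a base model on a user’s own text looks roughly like the sketch below. Assumptions here: the Hugging Face transformers and datasets libraries, GPT-2 as a small stand-in for the much larger models NovelAI runs, and a hypothetical my_style.txt containing the user’s writing.

```python
# Generic sketch of "over-training" a base language model on a user's
# own text - NOT NovelAI's actual pipeline, which is not public.
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

# The user's uploaded writing sample, one passage per line (hypothetical file).
dataset = load_dataset("text", data_files={"train": "my_style.txt"})
tokenized = dataset["train"].map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="style-model", num_train_epochs=1),
    train_dataset=tokenized,
    # Causal LM objective: predict the next token, no masked-LM trick.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # the result is a "specialist" model in the user's style
```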
More Applications
GraphiAI is a tool aimed directly at marketers. It uses GPT-3 to generate blog posts on specific topics. Like NovelAI, it takes a generalized model and then “over-trains” it on specific niches so that it is able to write quality content within a B2B niche. I spoke with the founder, and if there is interest I would be willing to schedule an interview. Let me know by replying to this email.
Randall Munroe (creator of the xkcd comics) used GPT-3 to have conversations with historical figures. In one he asks “Shakespeare” how he decided to incorporate Shrek into new editions of his plays. Unclear how much is real GPT-3 and how much is Munroe. But that is almost the point?
Andy Zeng, a research scientist at Google AI, has shared some research on “Socratic Models.” The idea is to combine multiple types of AI in order to get better results. For example: take a video, convert it to text, then interpret the text and use that to create an image. This allows (among other things) an AI to answer questions about a series of videos.
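Google has not, as far as I know, released code for this, but the “models talking to each other through language” idea can be sketched in a few lines - every function below is a hypothetical placeholder for a real model:

```python
# Conceptual sketch of a Socratic Models pipeline - not Google's code.
# `vision_language_model`, `speech_to_text`, and `language_model` are
# hypothetical placeholders for three separately trained models.

def describe_video(frames, audio):
    """Convert a video into plain text that a language model can read."""
    captions = [vision_language_model(frame) for frame in frames]
    transcript = speech_to_text(audio)
    return "\n".join(captions) + "\n" + transcript

def answer_question(frames, audio, question):
    """Language is the common interface between the models."""
    context = describe_video(frames, audio)
    prompt = f"Video log:\n{context}\n\nQ: {question}\nA:"
    return language_model(prompt)
```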
Karol Hausman, another Google AI researcher, explains how these techniques allow for the creation of robots that can respond correctly to verbal requests like “Pick up the soap” or “Place the sponge in the cabinet next to the fridge” - commands they were never directly trained on.
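As I understand it (hedged accordingly - this is my sketch, not Google’s code), the trick is to have the language model score how useful each of the robot’s pre-trained skills would be for the request, weight that by whether the skill is currently feasible, and run the winner:

```python
# Sketch of grounding language in robot skills (in the spirit of
# Google's work; not their code). `llm_score` and `affordance_score`
# are hypothetical: the first asks a language model how useful a skill
# is for the instruction, the second asks the robot's value function
# whether the skill can succeed from its current state.

def pick_skill(instruction, skills, llm_score, affordance_score):
    return max(
        skills,
        key=lambda skill: llm_score(instruction, skill) * affordance_score(skill),
    )

# e.g. pick_skill("pick up the soap",
#                 ["grasp the soap", "open the cabinet", "go to the fridge"],
#                 llm_score, affordance_score)
```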
Arthur B explains why creating music with AI is likely easier than creating images, and that current AI tools may already have “superhuman” abilities in music composition.
Ethan Mollick, a professor at Wharton, shows how having access to AlphaGo has dramatically improved the performance of human Go players at the highest level. Even when AI is better than humans, it can still help us learn and improve.
On a less positive note, Shannon Bond, an NPR tech correspondent investigated how some companies were creating fake LinkedIn profiles using AI facial image generation to engage in fraud.
What are the implications of all this? Will this change marketing? Will it change everything?
This post is already too long for Gmail (and it is un-edited! Dangerous!). Next week I will explore some of the implications of these tools and go into more detail on how a user works with the AI. Mark your calendars!
Edward