This is going to be something a little bit different: a podcast roundup. For the past few years I've been doing a roundup post and essay about once a quarter. This will be an audio version of that. I wanted to try a podcast format where it's primarily clips from others with me narrating along the way. Why? It sounded like fun, and I love experimenting with how to make content in different mediums. I think it turned out pretty well, although it came out much longer than I initially thought.

Going through all these audio files was a pain, so I wrote a script that transcribed them all and allows me to search by topic and group related content. This was kinda fun to make as well.

Most of this episode is on AI, similar to my last few roundups, but I also did a section on the future of education, and I've got some carveouts for other interesting content at the end as well. The podcast is almost an hour long, but you can skip to the relevant sections using the timestamps in the description. And if you're not an audio person, there's the usual written version below.

* 01:37 — 🤖 The A.I. Frontier
* 08:45 — AI: How useful are language models?
* 14:05 — AI: What does AI allow us to do?
* 14:33 — AI: Easier to communicate
* 18:16 — AI: Creativity
* 25:50 — AI: Augmented Intelligence
* 33:29 — 🧑‍🏫 The Education Frontier
* 43:00 — 🔗 Interesting Content

## 🤖 A.I. Frontier

In this section about AI, I'm going to first cover some basic fundamentals of how these AI models work, and then really get into their use cases and what they allow us to do. It's been 3 years since the original GPT-3 release, 6 months since ChatGPT, and we've had GPT-4 and other open source models for more than 3 months now. So with that said, I want to talk about the current state of GenAI, in particular language models, because these hold the most generalized promise. Here's Andrej Karpathy in a talk called "State of GPT" with more on what models are out there and what they can do:

> Now since then we've seen an entire evolutionary tree of base models that everyone has trained. Not all of these models are available. For example, the GPT-4 base model was never released. The GPT-4 model that you might be interacting with over API is not a base model, it's an assistant model. And we're going to cover how to get those in a bit. GPT-3 base model is available via the API under the name DaVinci. And GPT-2 base model is available even as weights on our GitHub repo. But currently the best available base model probably is the LLaMA series from Meta, although it is not commercially licensed.
>
> Now one thing to point out is base models are not assistants. They don't want to answer you, they don't want to make answers to your questions. They just want to complete documents. So if you tell them, write a poem about the bread and cheese, it will answer questions with more questions. It's just completing what it thinks is a document. However, you can prompt base models in a specific way that is more likely to work.

A review of what we have now:

* Monolithic foundation models from OpenAI, Google, Anthropic, etc.
* Assistant models built on top of these, like ChatGPT, Bard, and Claude.
* Open source models — particularly the semi-open-sourced LLaMA model from Meta, which you can run locally and even fine-tune with your own data.

State-of-the-art GPT-4 is reportedly a "mixture of experts" assistant model that routes your prompt to whichever of its expert models can complete it best. This is a big difference from the original GPT-3.
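To make the base-model-versus-assistant distinction concrete, here's a minimal sketch using the pre-1.0 `openai` Python client. The prompts and model choices are my own illustration, not something from Karpathy's talk: a base model like `davinci` (the GPT-3 base model in the API) just continues whatever document you give it, so you frame the task as a pattern to complete, while an assistant model can simply be asked.

```python
# Sketch: base models complete documents; assistant models answer questions.
# Assumes the pre-1.0 `openai` Python client and OPENAI_API_KEY set in the environment.
import openai

# Base model: frame the task as a document to continue, e.g. a few-shot Q/A
# pattern, and it will keep the pattern going.
base_prompt = (
    "Q: What is the capital of France?\n"
    "A: Paris\n"
    "Q: What is the capital of Japan?\n"
    "A:"
)
base = openai.Completion.create(model="davinci", prompt=base_prompt, max_tokens=5)
print(base.choices[0].text.strip())

# Assistant model: trained to follow instructions, so you can just ask directly.
chat = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "What is the capital of Japan?"}],
)
print(chat.choices[0].message.content)
```

Without the few-shot framing, the base model is just as likely to continue your question with more questions, exactly as Karpathy describes.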
In the early days of GPT-3, if you "asked" it a question, there was a good chance it would spit out something rude or offensive, or simply not answer the question at all. So how do you get from those base models to assistant models like ChatGPT that are pleasant to talk to? Back to Karpathy:

> So instead, we have a different path to make actual GPT assistants, not just base model document completers. And so that takes us into supervised fine-tuning. So in the supervised fine-tuning stage, we are going to collect small but high-quality data sets. And in this case, we're going to ask human contractors to gather data of the form prompt and ideal response. And we're going to collect lots of these, typically tens of thousands or something like that. And then we're going to still do language modeling on this data. So nothing changed algorithmically. We're just swapping out a training set. So it used to be internet documents, which is high-quantity, low-quality. We swap that out for basically Q&A, prompt-response kind of data, and that is low-quantity, high-quality. So we would still do language modeling. And then after training, we get an SFT model. And you can actually deploy these models. And they are actual assistants. And they work to some extent.

The performance of these models seems to have surprised even the engineers who created them. How and why do these models work? Stephen Wolfram released a great essay about this earlier this year, aptly called "What Is ChatGPT Doing… and Why Does It Work?" The essay is basically a small book (Wolfram did in fact later publish it as one), so it's long, but if you're interested in this stuff I'd really recommend it, even if not everything makes sense on a first read. He explains models, neural nets, and a bunch of the other building blocks. Here's the gist of what ChatGPT does:

> The basic concept of ChatGPT is at some level rather simple. Start from a huge sample of human-created text from the web, books, etc. Then train a neural net to generate text that's "like this". And in particular, make it able to start from a "prompt" and then continue with text that's "like what it's been trained with". As we've seen, the actual neural net in ChatGPT is made up of very simple elements—though billions of them. And the basic operation of the neural net is also very simple, consisting essentially of passing input derived from the text it's generated so far "once through its elements" (without any loops, etc.) for every new word (or part of a word) that it generates.
>
> . . .
>
> The specific engineering of ChatGPT has made it quite compelling. But ultimately (at least until it can use outside tools) ChatGPT is "merely" pulling out some "coherent thread of text" from the "statistics of conventional wisdom" that it's accumulated. But it's amazing how human-like the results are. And as I've discussed, this suggests something that's at least scientifically very important: that human language (and the patterns of thinking behind it) are somehow simpler and more "law like" in their structure than we thought. ChatGPT has implicitly discovered it. But we can potentially explicitly expose it, with semantic grammar, computational language, etc.
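To make that "one pass per new word" operation concrete, here's a minimal sketch of the loop Wolfram describes, using the small open GPT-2 weights mentioned earlier via Hugging Face's `transformers` library (my choice for illustration, not something Wolfram or Karpathy prescribes): the text so far goes through the network once, out comes a probability distribution over the next token, we sample one, append it, and repeat.

```python
# Minimal sketch of autoregressive generation with an open base model (GPT-2).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("The best thing about language models is", return_tensors="pt").input_ids

for _ in range(20):                                   # generate 20 new tokens
    with torch.no_grad():
        logits = model(input_ids).logits              # one pass over the text so far
    probs = torch.softmax(logits[0, -1], dim=-1)      # distribution over the next token
    next_token = torch.multinomial(probs, num_samples=1)               # sample from it
    input_ids = torch.cat([input_ids, next_token.unsqueeze(0)], dim=1) # append and repeat

print(tokenizer.decode(input_ids[0]))
```

Everything else (sampling temperature, the assistant behavior layered on by fine-tuning, tool use) sits on top of this same basic loop.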
In a YouTube lecture he put out, Wolfram explains that ChatGPT's ability to generate plausible text comes from its ability to exploit the known structures of language, specifically the regularity of grammatical syntax. We know that sentences aren't random jumbles of words.

> Sentences are made up with nouns in particular places, verbs in particular places, and we can represent that by a parse tree in which we say, here's the whole sentence, there's a noun phrase, a verb phrase, another noun phrase, these are broken down in certain ways. This is the parse tree, and in order for this to be a grammatically correct sentence, there are only certain possible forms of parse tree that correspond to a grammatically correct sentence. So this is a regularity of language that we've known for a couple of thousand years.

### How useful are language models?

How useful really are language models, and what are they good and bad at?

> I have occasionally used it to summarize super long emails, but I've never used it to write one. Actually, summarizing documents is something I use it for a lot. It's super good at that. I use it for translation. I use it to learn things.

That was Sam Altman, co-founder and CEO of OpenAI, saying he primarily uses it for summarizing. At another event he admitted that ChatGPT plug-ins haven't really caught on yet, despite their potential.

What are they good for, then? What are the LLM "primitives" — or the base capabilities that they're actually good at? The major primitives are:

* Summarization
* Text expansion
* Basic reasoning
* Semantic search in meaning space
* Most of all, translation — between human languages, computer languages, and any combination of them. I'll talk about this in a little bit.

They're clearly not good at retrieving information or being a store of data, in the same way human brains aren't very good at this naturally without heavy training. This might never really be a core primitive of generalized LLMs. They are not a database and likely never will be. And that's OK — they can always be supplemented with vector or structured data sources. Andrej Karpathy talks here about what LLMs aren't good at, but also how they can easily be supplemented:

> As I mentioned, the context window of a transformer is its working memory. If you can load the working memory with any information that is relevant to the task, the model will work extremely well, because it can immediately access all that memory. And the emerging recipe there is you take relevant documents, you split them up into chunks, you embed all of them, and you basically get embedding vectors that represent that data. You store that in the vector store, and then at test time, you make some kind of a query to your vector store, and you fetch chunks that might be relevant to your task, and you stuff them into the prompt, and then you generate. So this can work quite well in practice. So this is I think similar to when you and I solve problems. You can do everything from your memory and transformers have very large and extensive memory, but also it really helps to reference some primary documents. So whenever you find yourself going back to a textbook to find something