FutureBlind Podcast

Max Olson

This is the audio edition of the FutureBlind blog. Episodes will be rare: on occasion I'll record an audio version of an essay that tries to take advantage of the audio medium with clips from others, good sound design, and more. futureblind.com

Episodes

  1. 07/12/2023

    Roundup #7: Augmented Intelligence & The Education Frontier

    This is going to be something a little different: a podcast roundup. For the past few years I’ve been doing a roundup post and essay about once a quarter. This is an audio version of that. I wanted to try a podcast format that’s primarily clips from others with me narrating along the way. Why? It sounded like fun, and I love experimenting with how to make content in different mediums. I think it turned out pretty well, although much longer than I initially expected. Going through all these audio files was a pain, so I wrote a script that transcribed them all and lets me search by topic and group related content. That was kinda fun to make as well. Most of this episode is on AI, similar to my last few roundups, but I also did a section on the future of education, and I’ve got some carveouts for other interesting content at the end as well. The podcast is almost an hour long, but you should be able to skip to relevant sections using the timestamps in the description. And if you’re not an audio person, there’s the usual written version below.

    * 01:37 — 🤖 The A.I. Frontier
    * 08:45 — AI: How useful are language models?
    * 14:05 — AI: What does AI allow us to do?
    * 14:33 — AI: Easier to communicate
    * 18:16 — AI: Creativity
    * 25:50 — AI: Augmented Intelligence
    * 33:29 — 🧑‍🏫 The Education Frontier
    * 43:00 — 🔗 Interesting Content

    🤖 The A.I. Frontier

    In this section about AI, I’m going to first cover some basic fundamentals of how these AI models work, and then really get into their use cases and what they allow us to do. It’s been 3 years since the original GPT-3 release, 6 months since ChatGPT, and we’ve had GPT-4 and other open-source models for more than 3 months now. So with that said, I want to talk about the current state of GenAI, in particular language models, because these have the most generalized promise.
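As an aside, the transcription-and-search script mentioned above isn’t published anywhere, but the search-by-topic step can be sketched in a few lines of plain Python. The `Segment` fields and the matching rule here are my assumptions, not the author’s actual code:

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """One transcribed chunk of audio (fields are assumed, not the author's)."""
    episode: str
    start: float   # seconds into the episode
    text: str

def search(segments, query):
    """Return segments whose text contains every word of the query."""
    words = query.lower().split()
    return [s for s in segments if all(w in s.text.lower() for w in words)]

segments = [
    Segment("State of GPT", 95.0, "base models are not assistants"),
    Segment("Wolfram lecture", 410.0, "the parse tree of a grammatical sentence"),
]
hits = search(segments, "base models")
```

A real version would pair this with a transcription model and fuzzier matching, but the grouping-by-topic workflow is essentially this loop.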
    Here’s Andrej Karpathy in a talk called “State of GPT” with more on what models are out there and what they can do: Now since then we've seen an entire evolutionary tree of base models that everyone has trained. Not all of these models are available. For example, the GPT-4 base model was never released. The GPT-4 model that you might be interacting with over API is not a base model, it's an assistant model. And we're going to cover how to get those in a bit. GPT-3 base model is available via the API under the name DaVinci. And GPT-2 base model is available even as weights on our GitHub repo. But currently the best available base model probably is the LLaMA series from Meta, although it is not commercially licensed. Now one thing to point out is base models are not assistants. They don't want to answer to you, they don't want to make answers to your questions. They just want to complete documents. So if you tell them, write a poem about the bread and cheese, it will answer questions with more questions. It's just completing what it thinks is a document. However, you can prompt them in a specific way for base models that is more likely to work.

    A review of what we have now:
    * Monolithic foundation models from OpenAI, Google, Anthropic, etc.
    * Assistant models built on top of these, like ChatGPT, Bard, and Claude.
    * Open-source models — particularly the semi-open-sourced LLaMA model from Meta, which you can run locally and even fine-tune with your own data.

    State-of-the-art GPT-4 is reportedly a “mixture of experts” assistant model that routes each input to whichever expert sub-network can handle it best. This is a big difference from the original GPT-3. In the early days of GPT-3, if you “asked” it a question, it had a high chance of spitting out something rude, offensive, or something that didn’t really answer the question. How do they get from the base models to assistant models like ChatGPT that are pleasant to talk to?
    So instead, we have a different path to make actual GPT assistants, not just base-model document completers. And so that takes us into supervised fine-tuning. So in the supervised fine-tuning stage, we are going to collect small but high-quality data sets. And in this case, we're going to ask human contractors to gather data of the form prompt and ideal response. And we're going to collect lots of these, typically tens of thousands or something like that. And then we're going to still do language modeling on this data. So nothing changed algorithmically. We're just swapping out a training set. So it used to be internet documents, which is high-quantity, low-quality data, swapped for basically Q&A prompt-response kind of data, and that is low-quantity, high-quality. So we would still do language modeling. And then after training, we get an SFT model. And you can actually deploy these models. And they are actual assistants. And they work to some extent.

    The performance of these models seems to have surprised even the engineers who created them. How and why do these models work? Stephen Wolfram released a great essay about this earlier this year, aptly called “What Is ChatGPT Doing… and Why Does It Work?” The essay was basically a small book, and Wolfram actually did turn it into one. So it’s long, but if you’re interested in these topics, I’d really recommend it, even if not everything makes sense at first. He explains models, neural nets, and a bunch of other building blocks. Here’s the gist of what ChatGPT does: The basic concept of ChatGPT is at some level rather simple. Start from a huge sample of human-created text from the web, books, etc. Then train a neural net to generate text that’s “like this”. And in particular, make it able to start from a “prompt” and then continue with text that’s “like what it’s been trained with”. As we’ve seen, the actual neural net in ChatGPT is made up of very simple elements—though billions of them.
    And the basic operation of the neural net is also very simple, consisting essentially of passing input derived from the text it’s generated so far “once through its elements” (without any loops, etc.) for every new word (or part of a word) that it generates. . . . The specific engineering of ChatGPT has made it quite compelling. But ultimately (at least until it can use outside tools) ChatGPT is “merely” pulling out some “coherent thread of text” from the “statistics of conventional wisdom” that it’s accumulated. But it’s amazing how human-like the results are. And as I’ve discussed, this suggests something that’s at least scientifically very important: that human language (and the patterns of thinking behind it) are somehow simpler and more “law like” in their structure than we thought. ChatGPT has implicitly discovered it. But we can potentially explicitly expose it, with semantic grammar, computational language, etc.

    In a YouTube lecture he put out, Wolfram explains that ChatGPT’s ability to generate plausible text comes from its use of the known structures of language, specifically the regularity of grammatical syntax: We know that sentences aren't random jumbles of words. Sentences are made up with nouns in particular places, verbs in particular places, and we can represent that by a parse tree in which we say, here's the whole sentence, there's a noun phrase, a verb phrase, another noun phrase, these are broken down in certain ways. This is the parse tree, and in order for this to be a grammatically correct sentence, there are only certain possible forms of parse tree that correspond to a grammatically correct sentence. So this is a regularity of language that we've known for a couple of thousand years.

    How useful are language models?

    How useful really are language models, and what are they good and bad at? I have occasionally used it to summarize super long emails, but I've never used it to write one.
    Actually, summarizing documents is something I use it for a lot. It's super good at that. I use it for translation. I use it to learn things. That was Sam Altman, co-founder and CEO of OpenAI, saying he primarily uses it for summarizing. At another event he admitted that ChatGPT plug-ins haven’t really caught on yet, despite their potential. What are they good for then? What are the LLM “primitives” — or the base capabilities that they’re actually good at? The major primitives are:
    * Summarization
    * Text expansion
    * Basic reasoning
    * Semantic search in meaning space
    * Most of all, translation — between human languages, computer languages, and any combination of them. I’ll talk about this in a little bit.

    They’re clearly not good at retrieving information or being a store of data, the same way human brains aren’t very good at this naturally without heavy training. This might never really be a core primitive of generalized LLMs. They are not a database and likely never will be. And that’s ok — they can always be supplemented with vector or structured data sources. As I mentioned, the context window of a transformer is its working memory. If you can load the working memory with any information that is relevant to the task, the model will work extremely well, because it can immediately access all that memory. Andrej Karpathy talks here about what LLMs aren’t good at, but also how they can easily be supplemented: And the emerging recipe there is you take relevant documents, you split them up into chunks, you embed all of them, and you basically get embedding vectors that represent that data. You store that in the vector store, and then at test time, you make some kind of a query to your vector store, and you fetch chunks that might be relevant to your task, and you stuff them into the prompt, and then you generate. So this can work quite well in practice. So this is I think similar to when you and I solve problems.
You can do everything from your memory and transformers have very large and extensive memory, but also it really helps to reference some primary documents. So whenever you find yourself going back to a textbook to find something
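The chunk, embed, retrieve, and stuff-into-the-prompt recipe Karpathy describes can be sketched end to end. In this minimal sketch a bag-of-words count stands in for a learned embedding model, and the prompt template is my assumption; a real system would use an embedding model and a vector database:

```python
from collections import Counter
import math

def embed(text):
    """Toy embedding: bag-of-words counts (a real system uses a learned model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_store(documents, chunk_size=20):
    """Split each document into chunks and store (embedding, chunk) pairs."""
    store = []
    for doc in documents:
        words = doc.split()
        for i in range(0, len(words), chunk_size):
            chunk = " ".join(words[i:i + chunk_size])
            store.append((embed(chunk), chunk))
    return store

def retrieve(store, query, k=2):
    """Fetch the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(item[0], q), reverse=True)
    return [chunk for _, chunk in ranked[:k]]

def make_prompt(store, question):
    """Stuff retrieved chunks into the prompt, then hand it to the model."""
    context = "\n".join(retrieve(store, question))
    return f"Context:\n{context}\n\nQuestion: {question}"
```

The shape is the same whatever the embedding: retrieval loads the transformer’s working memory (the context window) with relevant text, which is exactly the supplementation Karpathy is pointing at.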

    55 min
  2. 09/27/2022

    Take the Iterative Path

    One of the greatest business successes of the last 20 years has been SpaceX’s rise to dominance. SpaceX now launches more rockets to orbit than any other company (or nation) in the world. They seem to move fast on every level, out-executing and out-innovating everyone in the industry. Their story has been rightfully told as one of engineering brilliance and determination. But at its core, the key to their success is much simpler. There’s a clue in this NASA report on the Commercial Crew Program: SpaceX and Boeing have very different philosophies in terms of how they develop hardware. SpaceX focuses on rapidly iterating through a build-test-learn approach that drives modifications toward design maturity. Boeing utilizes a well-established systems engineering methodology targeted at an initial investment in engineering studies and analysis to mature the system design prior to building and testing the hardware. Each approach has advantages and disadvantages. This is the heart of why SpaceX won. They take an iterative path.

    Taking the determinate path

    Let’s talk about the Boeing philosophy first, which is the most common approach taken by traditional aerospace companies. “There are basically two approaches to building complex systems like rockets: linear and iterative design,” Eric Berger writes in “Liftoff”, his book about the early history of SpaceX: The linear method begins with an initial goal, and moves through developing requirements to meet that goal, followed by numerous qualification tests of subsystems before assembling them into the major pieces of the rocket, such as its structures, propulsion, and avionics. With linear design, years are spent engineering a project before development begins. This is because it is difficult, time-consuming, and expensive to modify a design and requirements after beginning to build hardware. I call this the “determinate path” — in trying to accomplish a goal, the path to get there is planned and fixed in advance.
    In project management this method is called waterfall, an “approach that emphasizes a linear progression from beginning to end of a project. This methodology, often used by engineers, is front-loaded to rely on careful planning, detailed documentation, and consecutive execution.” Spend a lot of time scoping and planning carefully upfront, then move progressively forward step-by-step. This is the “measure twice, cut once” approach. You may be familiar with it, as it’s very common in organizations everywhere. There can be many reasons why this path would be taken:
    * From the start you have very clear, unambiguous requirements (from customers, management, etc.).
    * You think you can figure out exactly how to build something before building it, so you’d probably want to plan it all in advance.
    * Your fixed costs are high, which can force you to make decisions up front. Take traditional auto manufacturing. A door mold machine might cost $50 or $100M, so you have to figure out what the design of the door will be first. (But this means that if later they have a new idea for a better car door, they won’t want to change it because of the sunk costs of the mold machine.)
    * You have a lot of resources, which makes you think you can just brute-force it and overwhelm the problem with money and people. (Many overfunded startups are guilty of this.)

    But there is another way . . .

    Taking the iterative path

    When I think of the most impactful technologies of the last 100 years, nearly all were created by small teams of tinkerers. Why? It’s easier for these teams to take an iterative path. Taking this path means rapid prototyping, testing concepts against reality, failing, and adapting. Continuing from the book “Liftoff”: The iterative approach begins with a goal and almost immediately leaps into concept designs, bench tests, and prototypes. The mantra with this approach is build and test early, find failures, and adapt.
    Focus more on building and finding failure modes than making things perfect. Project managers call it “agile”, or at Facebook, “move fast and break things.” The canonical example of this to me is the Wright brothers, previously bicycle mechanics, building iterations of their airplane design over and over, and failing until they succeeded. This approach ended up being common in the origin stories of all the airplane manufacturers and defense companies — Martin Marietta, Lockheed, Northrop Grumman, etc. — where again you had relatively small teams of self-taught tinkerers building complex machines through a process of iteration, failure, and learning until they succeeded.

    How can you reconcile this “fail fast” approach with the care that’s needed to reliably build things where human lives are on the line? The answer is that these can be two different parts of the organization, working together but with different focuses. “[SpaceX is] launching 5 or 6 times a month and on their pads they need operational excellence with zero risk — you know, they’re doing innovation but it’s minimal innovation. Blowing things up on the pad is not a good idea — you want that down to zero because human lives and certainly lots of capital is at risk.” This is Steve Blank on a recent Village Global podcast. He continues: But on the other hand, they have another part of the company that in fact believes in not only blowing things up on the test pad — because if you’re not doing that you’re not pushing the envelope fast enough — it’s the cycle time of doing that. So they have an agile innovation process. Now think about that. This is the same company doing two very different things with two different groups of people, two different risk profiles, but more importantly they’re talking to each other. It’s not “here are the smart people, and here are the people turning the crank,” they’re learning from each other.
    The guys building the Raptor engines and Starship need to know where the GFC plugs in and what materials and details they need to get right on the next rocket. And the people doing the existing rockets can learn about new materials and incremental upgrades, so they are innovating, but innovating with minimal risk.

    The iterative path is easier to take when you’re nimble and the cost of failure is low. This is why it’s so common in software. But as the previously mentioned companies have shown, it’s also the best approach in hardware and complex frontier tech. And just as the traditional aerospace companies have demonstrated, organizations that are very bureaucratic now were almost always more iterative in the past. The early history of Lockheed’s Skunk Works division is informative, and I believe it later served as one of the models for SpaceX’s approach. Skunk Works was an R&D group created by Kelly Johnson within Lockheed in 1943, during World War II, when they got the contract to build the P-80 Shooting Star. From a documentary on the birth of Skunk Works: Lockheed was already swamped in terms of manpower, tooling, and facilities with wartime contracts but this was a blessing in disguise, an opportunity to implement an idea he’d been pestering Robert Gross about for years. Let him round up a small group of talented people: designers, engineers and shop men. Put them under one roof where they could all work closely together and give him complete authority over everything from procurement to flight tests. Johnson gathered 28 engineers including himself, and 105 “shop men” (I assume this just means workers who could build what the engineers designed) and built a small facility out of discarded shipping crates using a circus tent for a roof. He then laid out the original rules that would become the foundation of Skunk Works over the next 30 years: . . . he’d be responsible for all decisions. Paperwork and red tape would be cut to the minimum.
    Each engineer would be designer, shop contact, parts chaser, and mechanic, and each would remain within a stone’s throw of the shop at all times. . . . Forcefully reminded that simplicity is the keynote of good design, the designers jumped into their work. But this was a new kind of operation, and instead of moving from stage to stage, the schedule demanded an extraordinary degree of concurrency. The time from initial concept to delivery of the first P-80 to test pilots would be only 5 months. In fact, nearly all of the early planes coming out of Lockheed took less than 6 months from concept to delivery. Crazy! Even the famous A-12 (later the SR-71 Blackbird) took less than 4 years from initial idea to rollout. This may seem like a lot when you’re used to super-fast software timelines, but this is 4 years for one of the fastest, most successful aircraft ever built.

    The scrappy culture lived on in later Skunk Works projects. This is Ben Rich, who led the division in later years, on their building of the F-117 (the Darth-Vader-looking stealth fighter you’ve probably seen before): On the F-117, we had to get the guy to climb into the cockpit. So I went to the local builders mart, and bought one of these ladders for 50 bucks, and we just used it. . . . We didn’t have to spend thousands of dollars designing it for Mil spec — military specification — and we did simple things like that. The more you learn about the history of building things, the more you hear stories like this, even with highly complex innovations. The development of the Sidewinder missile is another interesting example: again, a small team, rapid iteration, creative solutions to problems.

    Why is iteration better?

    Taking the iterative path tests your model against reality, getting to the truth as fast as possible. There are a few major downsides to the linear approach:
    * Clear specs and requirements from the outset may seem like a good thing.
    Much of the time, though, they don’t match reality. This is especially true in areas that are push

    20 min
  3. 05/03/2021

    The Future of Space, Part II: The Potential

    Getting to space is about to get a lot easier. I talked about the reasons why in the last episode. Now for the fun part: what it will lead to. This summary is focused on some of the changes we're likely to see in the next 5 to 20 years. Here's a link to the full essay, or you can listen and check out the links below.

    Links mentioned
    * NASA announcement that Starship had won the contract to land humans on the Moon again
    * "The Lunar Space Elevator" on the Cool Worlds YouTube channel
    * Kessler syndrome — the scenario where one or more orbital collisions cause a cascade of further collisions
    * NASA's Orbital Debris quarterly newsletter
    * Axiom Space's breakdown of why microgravity is beneficial
    * Varda Space Industries
    * Ivan Kirigin interviews Delian Asparouhov about Varda and building factories in space
    * A concept for an asteroid railway (check out the drawings)
    * More on the 1999 NASA study about space-based solar power

    Resources to follow along
    * /r/spacex — I've been a lurker on this subreddit for nearly 10 years now and can say that it's probably the best source for news and intelligent discussion on SpaceX. NASA Spaceflight has a good discussion board as well.
    * NASA YouTube channel
    * SpaceX YouTube channel
    * Everyday Astronaut YouTube channel — There's a handful of really good space- and rocket-focused channels from both professional rocket scientists and amateur space enthusiasts. Tim Dodd's "Everyday Astronaut" is probably my favorite. Tons of good content, with technical explanations, news breakdowns, interviews and more. Scott Manley's is another good one, as is Marcus House's.

    Reference material
    * The Case for Space, by Robert Zubrin. Probably the best rundown of why we need to continue our push into space, how we'll get there, and what it will look like when we do.
    * Beyond, by Chris Impey. Very similar to The Case for Space, so if you're interested I would just read that. I read Beyond first a few years ago and it has a lot of great explanations and potential futures, like using nuclear or fusion engines to explore the universe.
    * Winchell Chung's Atomic Rockets. Basically a Wikipedia for space travel and sci-fi concepts. You could spend days on this site without finishing it.

    Get full access to FutureBlind at futureblind.com/subscribe

    27 min
  4. 04/26/2021

    The Future of Space, Part I: The Setup

    The cost and ease of getting to space are about to improve by many orders of magnitude. This will drive the space industry to be one of the biggest sources of growth over the next 10-20 years. This is the first of a two-part essay on the upcoming future of the space industry. See the full post: https://futureblind.com/2021/03/03/the-future-of-space-1/

    Footnotes & References
    * Here's the Google Sheet with the table I used to calculate the breakdown of cost to LEO for Falcon and Starship rockets.
    * On Elon Musk's use of first principles that helped build SpaceX.
    * A good explainer video on the full-flow combustion cycle engine and its efficiencies.

    Here’s a timeline of a few milestones of the recent commercialization of space:
    * 2008-12 — Commercial Resupply Services (CRS) contracts of $1.6B to SpaceX and $1.9B to Orbital Sciences to deliver supplies to the ISS. This helps fund Falcon 9 development.
    * 2012-05 — SpaceX Dragon capsule launches “empty” to perform tests and dock with the ISS, the first commercial spacecraft ever to do so.
    * 2012-10 — SpaceX CRS-1 mission sends Dragon with supplies to the ISS. Dragon is the only cargo vehicle at the time capable of returning supplies to Earth.
    * 2014-09 — NASA awards final Commercial Crew Program (CCP) contracts to SpaceX ($2.6B) and Boeing ($4.2B) for the capability to send 4-5 astronauts to the ISS. First flights for both initially planned for 2017.
    * 2020-01 — NASA awards Axiom Space the first ever contract to build a commercial module for the ISS.
    * 2020-04 — NASA awards lunar lander contracts to Blue Origin, Dynetics, and SpaceX under the Artemis program. The goal is to land “the first woman and the next man” on the Moon by 2024.
    * 2020-05 — The Commercial Crew Demo mission sends 2 astronauts to the ISS. These are the first astronauts on a commercial mission, and the first from US soil since the retirement of the Space Shuttle in 2011. 10 million people worldwide watched it live.
    * 2020-11 — Crew-1, the first operational flight, sends 4 astronauts to the ISS. Due to delays and other issues, Boeing’s Starliner isn’t set to fly for another year.
    * 2020-12 — NASA awards Blue Origin a Launch Services contract to transport planetary, Earth observation, exploration, and scientific satellites.

    Get full access to FutureBlind at futureblind.com/subscribe
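The cost-to-LEO breakdown lives in the linked Google Sheet, but the core arithmetic is just launch price divided by payload capacity. A sketch of that calculation, using rough publicly cited Falcon 9 figures (not the author's spreadsheet numbers, and both inputs vary by mission):

```python
def cost_per_kg(launch_price_usd, payload_to_leo_kg):
    """Dollars per kilogram delivered to low Earth orbit."""
    return launch_price_usd / payload_to_leo_kg

# Rough, publicly cited figures: ~$67M list price, ~22,800 kg to LEO.
falcon9 = cost_per_kg(67_000_000, 22_800)   # roughly $2,900/kg
```

Starship's promise is driving both inputs in the right direction at once: a bigger denominator (payload) and, through full reusability, a smaller numerator (marginal launch cost).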

    12 min
