52 min

High Agency Pydantic > VC Backed Frameworks — with Jason Liu of Instructor Latent Space: The AI Engineer Podcast — Practitioners talking LLMs, CodeGen, Agents, Multimodality, AI UX, GPU Infra and al

- Technology

We are reuniting for the 2nd AI UX demo day in SF on Apr 28. Sign up to demo here!
And don’t forget tickets for the AI Engineer World’s Fair — for early birds who join before keynote announcements!
About a year ago there was a lot of buzz around prompt engineering techniques to force structured output. Our friend Simon Willison tweeted a bunch of tips and tricks, but the most iconic one is Riley Goodside making it a matter of life or death:
Guardrails (friend of the pod and AI Engineer speaker), Marvin (AI Engineer speaker), and jsonformer had also come out at the time. In June 2023, Jason Liu (today’s guest!) open sourced his “OpenAI Function Call and Pydantic Integration Module”, now known as Instructor, which quickly turned prompt engineering black magic into a clean, developer-friendly SDK.
A few months later, model providers started to add function calling capabilities to their APIs as well as structured outputs support like “JSON Mode”, which was announced at OpenAI Dev Day (see recap here).
In just a handful of months, we went from threatening to kill grandmas to first-class support from the research labs. And yet, Instructor was still downloaded 150,000 times last month. Why?
What Instructor looks like
Instructor patches your LLM provider SDKs to offer a new response_model option to which you can pass a structure defined in Pydantic. It currently supports OpenAI, Anthropic, Cohere, and a long tail of models through LiteLLM.
What Instructor is for
There are three core use cases to Instructor:
* Extracting structured data: Taking an input like an image of a receipt and extracting structured data from it, such as a list of checkout items with their prices, fees, and coupon codes.
* Extracting graphs: Identifying nodes and edges in a given input to extract complex entities and their relationships. For example, extracting relationships between characters in a story or dependencies between tasks.
* Query understanding: Defining a schema for an API call and using a language model to resolve a request into a more complex one that an embedding could not handle. For example, creating date intervals from queries like “what was the latest thing that happened this week?” to then pass onto a RAG system or similar.
Jason called all these different ways of getting data from LLMs “typed responses”: taking strings and turning them into data structures.
Structured outputs as a planning tool
The first wave of agents was all about open-ended iteration and planning, with projects like AutoGPT and BabyAGI. Models would come up with a possible list of steps, and start going down the list one by one. It’s really easy for them to go down the wrong branch, or get stuck on a single step with no way to intervene.
What if these planning steps were returned to us as DAGs using structured output, and then managed as workflows? This also makes it easy to better train model on how to create these plans, as they are much more structured than a bullet point list. Once you have this structure, each piece can be modified individually by different specialized models.
You can read some of Jason’s experiments here:
While LLMs will keep improving (Llama3 just got released as we write this), having a consistent structure for the output will make it a lot easier to swap models in and out.
Jason’s overall message on how we can move from ReAct loops to more controllable Agent workflows mirrors the “Process” discussion from our Elicit episode:

Watch the talk
As a bonus, here’s Jason’s talk from last year’s AI Engineer Summit. He’ll also be a speaker at this year’s AI Engineer World’s Fair!
Timestamps
* [00:00:00] Introductions
* [00:02:23] Early experiments with Generative AI at StitchFix
* [00:08:11] Design philosophy behind the Instructor library
* [00:11:12] JSON Mode vs Function Calling
* [00:12:30] Single vs parallel function calling
* [00:14:00] How many functions is too many?
* [00:17:39] How to evaluat