Open Source Startup Podcast, hosted by Robby (Cowboy VC) & Tim (Essence VC)
The leading podcast on how to build a successful open source company.
Learn from the founders of HashiCorp, Chronosphere, Vercel, MongoDB, DBT, mobile.dev and more!
E134: Making Complex Data RAG-Ready with Unstructured
Brian Raymond is Founder & CEO of Unstructured, the platform to extract and transform complex data for use with every major vector database and LLM framework. Their open source project has 7K stars on GitHub and includes libraries and APIs that let users build custom preprocessing pipelines for labeling, training, and production machine learning workflows. Today, their tools have over 6M downloads and are used by 50K companies.
Unstructured has raised $65M from investors including Bain, Essence VC, and Menlo Ventures.
In this episode, we dig into Brian's process of talking to 100 data scientists before launching Unstructured, why the long tail of data matters for LLMs, competing with their own open source, why being a "boring company" is valuable for today's LLM stack, why they liked having government design partners, why world-class design & marketing are huge differentiators for open source companies & more!
E133: Reinventing Authorization with Google's Zanzibar Paper
Jake Moshenko is Co-Founder & CEO of AuthZed, the scalable authorization platform based on Google's Zanzibar white paper. Their open source permissions database, SpiceDB, has 5K stars on GitHub and enables fine-grained access control for customer applications.
AuthZed has raised $4M from investors including Work-Bench and Amplify.
In this episode, we dig into the Zanzibar approach to auth, branding themselves as a database, building for big companies from the get-go, their Hacker News launch and how getting on the front page kickstarted their project's growth, monetizing early & more!
E132: From General Purpose to Specialized Databases
Joran Dirk Greef is Founder & CEO of TigerBeetle, the open source financial transactions database. Their project, also called tigerbeetle, has over 7K stars on GitHub and is designed for mission-critical, high-performance workloads.
TigerBeetle has raised $6M from investors including Amplify.
In this episode, we discuss why general-purpose databases don't scale for high-volume transactional workloads and the broader need for specialized databases, open source vs. source available, the enterprise commercial stack of management, monitoring, security, and identity, their unique take on monetization & more!
E131: Why the Next Generation of Time Series Databases Will Be Multimodal
Niko West is Co-Founder & CEO of Rerun, the open source visualization engine for streams of multimodal data.
Rerun has raised over $3M from investors including Costanoa.
In this episode, we discuss how Rerun found early success in gaming, why building in Rust was important, how open source expanded the segments Rerun could serve, why they thought about monetization early, the importance of visual and video content & more!
E130: Orchestrating AI Workloads with Union AI
Ketan Umare is Co-Founder & CEO of Union AI, the scalable MLOps platform for AI orchestration, built on the Flyte open source project.
Union AI has raised $29M from investors including NEA & Nava Ventures.
In this episode, we dig into the differences between Union AI and Airflow, what's unique about orchestrating AI workloads, bringing software engineering practices to AI & more!
E129: The Race to Help Build Custom AI Models
Sahil Chaudhary is Founder of Glaive AI, the platform that uses synthetic data to help companies build models that are faster and cheaper than general-purpose models and outperform them.
In this episode, we discuss why education is so important for GenAI infra companies at this stage, how synthetic data helps companies move from prototype to production, why generating synthetic data may be a better approach than cleaning existing data, why they're targeting AI-native startups as an initial market & more!