6 Episodes

How AI is Built dives into the different building blocks necessary to develop AI applications: how they work, how you can get started, and how you can master them. Build on the breakthroughs of others. Follow along as Nicolay learns from the best data engineers, ML engineers, solution architects, and tech founders.

How AI Is Built Nicolay Gerold

    • Technology
    • 5.0 • 3 ratings


    Lance v2: Rethinking Columnar Storage for Faster Lookups, Nulls, and Flexible Encodings | changelog 2


    In this episode of Changelog, Weston Pace dives into the latest updates to LanceDB, an open-source vector database and file format. Lance's new V2 file format redefines the traditional notion of columnar storage, allowing for more efficient handling of large multimodal datasets like images and embeddings. Weston discusses the goals driving LanceDB's development, including null value support, multimodal data handling, and finding an optimal balance for search performance.

    Sound Bites

    "A little bit more power to actually just try."
    "We're becoming a little bit more feature complete with regards to Arrow."
    "Weird data representations that are actually really optimized for your use case."

    Key Points


    Weston introduces LanceDB, an open-source multimodal vector database and file format.
    The goals behind LanceDB's design: handling null values, multimodal data, and finding the right balance between point lookups and full dataset scan performance.
    The Lance V2 file format
    Potential use cases

    Conversation Highlights


    On the benefits of Arrow integration: Strengthening the connection with the Arrow data ecosystem for seamless data handling.
    Why "columnar container format"?: A broader definition than "table format" to encompass more unconventional use cases.
    Tackling multimodal data: How LanceDB V2 enables storage of large multimodal data efficiently and without needing tons of memory.
    Python's role in encoding experimentation: Providing a way to rapidly prototype custom encodings and plug them into LanceDB.

    LanceDB:


    X (Twitter)
    GitHub
    Web
    Discord
    VectorDB Recipes
    Lance V2

    Weston Pace:


    LinkedIn
    GitHub

    Nicolay Gerold:


    LinkedIn
    X (Twitter)

    Chapters

    00:00 Introducing Lance: A New File Format

    06:46 Enabling Custom Encodings in Lance

    11:51 Exploring the Relationship Between Lance and Arrow

    20:04 New Chapter

    Lance file format, nulls, round-tripping data, optimized data representations, full-text search, encodings, downsides, multimodal data, compression, point lookups, full scan performance, non-contiguous columns, custom encodings


    ---

    Send in a voice message: https://podcasters.spotify.com/pod/show/nicolaygerold/message

    • 21 min
    Unlocking AI with Supabase: Postgres Configuration, Real-Time Processing, and Extensions


    Had a fantastic conversation with Christopher Williams, Solutions Architect at Supabase, about setting up Postgres the right way for AI. We dug deep into Supabase, exploring:


    Core components and how they power real-time AI solutions
    Optimizing Postgres for AI workloads
    The magic of PG Vector and other key extensions
    Supabase’s future and exciting new features



    ---


    • 31 min
    AI Inside Your Database, Real-Time AI, Declarative ML/AI | ep 3


    If you've ever wanted a simpler way to integrate AI directly into your database, SuperDuperDB might be the answer. SuperDuperDB lets you easily apply AI processes to your data while keeping everything up-to-date with real-time calculations. It works with various databases and aims to make AI development less of a headache.

    In this podcast, we explore:


    How SuperDuperDB bridges the gap between AI and databases.
    The benefits of real-time AI processes within your data deployment.
    SuperDuperDB's framework for configuring AI workflows.
    The future of AI-powered databases.

    Takeaways


    SuperDuperDB enables developers to apply AI processes directly to their data stores
    The platform supports real-time computation of embeddings or classifications, keeping the data deployment up to date
    SuperDuperDB provides a framework for configuring AI processes that work in close combination with the data deployment
    The platform supports a variety of databases, including operational and analytical databases
    SuperDuperDB aims to simplify AI development by abstracting the data layer and infrastructure
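The real-time pattern in the takeaways above can be sketched in a few lines: a store that recomputes a derived AI output whenever a record changes, so the deployment never serves stale embeddings. All names here are hypothetical, and the "embedding" is just character statistics; this is not the SuperDuperDB API:

```python
# Minimal sketch of compute-next-to-data: every write immediately
# triggers the model function, so derived outputs stay in sync
# with the underlying records.

class ListeningStore:
    def __init__(self, embed_fn):
        self.embed_fn = embed_fn   # model applied on every write
        self.records = {}
        self.embeddings = {}

    def upsert(self, key, text):
        # Writing data recomputes its embedding in the same step,
        # so the data deployment is always up to date.
        self.records[key] = text
        self.embeddings[key] = self.embed_fn(text)

def toy_embed(text):
    # Stand-in for a real embedding model.
    return [len(text), text.count(" ")]

store = ListeningStore(toy_embed)
store.upsert("doc1", "hello world")
print(store.embeddings["doc1"])  # [11, 1]
```

A real system would run the model asynchronously and against many database backends, but the contract is the same: the write path owns the recomputation.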

    Duncan Blythe:


    LinkedIn

    SuperDuperDB:


    Docs
    Website
    LinkedIn

    Nicolay Gerold:


    LinkedIn
    X (Twitter)

    Chapters

    00:00 Introduction to SuperDuperDB

    04:19 Real-time Computation and Data Deployment

    13:46 Bringing Compute and Database Closer Together

    29:30 Declarative Machine Learning with SuperDuperDB

    35:09 Future Plans for SuperDuperDB

    SuperDuperDB, AI, databases, embeddings, classifications, data deployment, operational databases, analytical databases, AI development, data science


    ---


    • 36 min
    Supabase acquires OrioleDB, A New Database Engine for PostgreSQL | changelog 1


    Supabase just acquired OrioleDB, a storage engine for PostgreSQL.

    Oriole gets creative with MVCC! It uses an UNDO log rather than keeping multiple versions of an entire data row (tuple). This means when you update data, Oriole tracks the changes needed to "undo" the update if necessary. Think of this like the "undo" function in a text editor. Instead of keeping a full copy of the old text, it just remembers what changed. This can be much smaller. This also saves space by eliminating the need for a garbage collection process.



    It also has a bunch of additional performance boosters like data compression, easy integration with data lakes, and index-organized tables.
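The undo-log idea above can be contrasted with classic row versioning in a toy sketch. This is purely illustrative Python, not how either engine stores bytes on disk:

```python
# Toy contrast of the two MVCC strategies described above.
# Strategy A (classic Postgres): every update copies the whole tuple,
# and old versions linger until garbage collection (VACUUM).
versions = [{"id": 1, "name": "Ada", "balance": 100}]
versions.append({**versions[-1], "balance": 90})  # full row copied

# Strategy B (undo log): update the row in place and log only what
# is needed to reverse the change -- like "undo" in a text editor.
row = {"id": 1, "name": "Ada", "balance": 100}
undo_log = [("balance", row["balance"])]  # remember the old value only
row["balance"] = 90

# Rolling back replays the undo entries in reverse order.
for field, old_value in reversed(undo_log):
    row[field] = old_value
assert row["balance"] == 100
```

The space saving is visible even here: strategy A stores a second full row per update, while strategy B stores one (field, old value) pair.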

    Show notes:


    Oriole joins Supabase
    Oriole Git
    Percona Talk on OrioleDB
    Supabase

    Chris Gwilliams:


    LinkedIn

    Nicolay Gerold:


    LinkedIn
    X (Twitter)

    00:42 Introduction to OrioleDB

    04:38 The Undo Log Approach

    08:39 Improving Performance for High Throughput Databases

    11:08 My take on OrioleDB

    OrioleDB, storage engine, Postgres, table access methods, undo log, high throughput databases, automated features, new use cases, S3, data migration


    ---


    • 13 min
    AI Powered Data Transformation, Combining gen & trad AI, Semantic Validation | ep 2


    Today’s guest is Antonio Bustamante, a serial entrepreneur who previously built Kite and Silo and is now working to fix bad data. He is building bem, the data tool to transform any data into the schema your AI and software needs.

    bem.ai is a data tool focused on transforming any data into the schema your AI and software need. It acts as a systems-interoperability layer, allowing systems that previously couldn't communicate to exchange information. Learn what role LLMs play in data transformation, how to build reliable data infrastructure, and more.

    "Surprisingly, the hardest was semi-structured data. That is data that should be structured, but is unreliable, undocumented, hard to work with."

    "We were spending close to four or five million dollars a year just in integrations, which is no small budget for a company that size. So I was pretty much determined to fix this problem once and for all."

    "bem focuses on being the system's interoperability layer."

    "We basically take in anything you send us, we transform it exactly into your internal data schema so that you don't have to parse, process, transform anything of that sort."

    "LLMs are about 30% of it... A lot of it is very, very thorough validation layers, great infrastructure, just ensuring reliability and connection to our user systems."

    "You can use a million-token context window and feed an entire document to an LLM. I can guarantee you, if you don't semantically chunk it out before, you're not going to get the right results."

    "We're obsessed with time to value... Our milestone is basically five minute onboarding max, and then you're ready to go."
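Antonio's chunking point above can be made concrete with a minimal stand-in: split a document on paragraph boundaries under a size budget rather than feeding the whole thing into one giant context window. Real pipelines use embeddings or structure-aware parsers; blank-line splitting is only a sketch of the idea:

```python
# Toy semantic-ish chunker: group whole paragraphs into chunks that
# fit a per-call budget, never splitting mid-paragraph.

def chunk_by_paragraph(document, max_chars=200):
    chunks, current = [], ""
    for para in document.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Start a new chunk when adding this paragraph would
        # overflow the budget for a single LLM call.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

doc = "First topic paragraph.\n\nSecond topic paragraph.\n\n" + "X" * 250
print(chunk_by_paragraph(doc, max_chars=100))
```

Keeping semantically related paragraphs together is what lets the model see a coherent unit instead of an arbitrary character window.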

    Antonio Bustamante


    LinkedIn
    X (Twitter)

    bem.ai


    LinkedIn
    Website

    Nicolay Gerold:


    LinkedIn
    X (Twitter)

    Semi-structured data, Data integrations, Large language models (LLMs), Data transformation, Schema interoperability, Fault tolerance, Validation layers, System reliability, Schema evolution, Enterprise software, Data pipelines.

    Chapters

    00:00 The Problem of Integrations

    05:58 Building Fault Tolerant Systems

    13:51 Versioning and Semantic Validation

    27:33 BEM in the Data Ecosystem

    34:40 Future Plans and Onboarding


    ---


    • 37 min
    Multimodal AI, Storing 1 Billion Vectors, Building Data Infrastructure | ep 1


    Imagine a world where data bottlenecks, slow data loaders, or memory issues on the VM don't hold back machine learning.

    Machine learning and AI success depends on the speed at which you can iterate. LanceDB is here to enable fast experiments on top of terabytes of unstructured data. It is the database for AI. Dive with us into how LanceDB was built, what went into the decision to use Rust as the main implementation language, the potential of AI on top of LanceDB, and more.

    "LanceDB is the database for AI...to manage their data, to do a performant billion scale vector search."

    “We're big believers in the composable data systems vision."

    "You can insert data into LanceDB using pandas DataFrames...to sort of really large 'embed the internet' kind of workflows."

    "We wanted to create a new generation of data infrastructure that makes their [AI engineers] lives a lot easier."

    "LanceDB offers up to 1,000 times faster performance than Parquet."

    Chang She:


    LinkedIn
    X (Twitter)

    LanceDB:


    X (Twitter)
    GitHub
    Web
    Discord
    VectorDB Recipes

    Nicolay Gerold:


    LinkedIn
    X (Twitter)

    Chapters:

    00:00 Introduction to LanceDB

    02:16 Building LanceDB in Rust

    12:10 Optimizing Data Infrastructure

    26:20 Surprising Use Cases for LanceDB

    32:01 The Future of LanceDB

    LanceDB, AI, database, Rust, multimodal AI, data infrastructure, embeddings, images, performance, Parquet, machine learning, model database, function registries, agents.


    ---


    • 34 min

