DBRX and Open Source Mixture of Experts LLMs with Hagay Lupesko ODSC's Ai X Podcast


Today on our podcast, we're thrilled to have Hagay Lupesko, Senior Director of Engineering in the Mosaic AI team at Databricks and one of the key architects behind Databricks' groundbreaking large language model, DBRX.

Previously, Hagay was the VP of Engineering at MosaicML, which was acquired by Databricks in 2023. Hagay has also held AI engineering leadership roles at Meta, AWS, and GE Healthcare.

Our topic today is the open-source DBRX large language model, which stands out in the LLM landscape for its fine-grained use of the Mixture of Experts (MoE) architecture. Rather than activating the full network for every input, DBRX routes each token through the most suitable subset of its 16 experts, with 4 experts active at a time. This results in faster processing and potentially better performance compared to traditional dense LLM architectures.
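To make the routing idea concrete, here is a minimal sketch of top-k expert routing in an MoE layer. The expert count (16) and number of active experts (4) follow DBRX's published configuration; the tiny dimensions, random weights, and single-matrix "experts" are illustrative stand-ins, not DBRX's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, n_experts, top_k = 8, 16, 4

# Router: a linear layer scoring each expert for a given token.
router_w = rng.standard_normal((d_model, n_experts))

# Each "expert" stands in for a feed-forward block (here a single matrix).
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_forward(x):
    """Route one token vector through its top-k experts."""
    logits = x @ router_w                # one score per expert
    top = np.argsort(logits)[-top_k:]    # indices of the k highest-scoring experts
    # Softmax over the selected logits only, so the k gate weights sum to 1.
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()
    # Output is the gate-weighted sum of the chosen experts' outputs.
    return sum(wi * (x @ experts[i]) for wi, i in zip(w, top))

token = rng.standard_normal(d_model)
out = moe_forward(token)
print(out.shape)  # (8,)
```

The payoff is that only 4 of the 16 expert blocks run per token, so compute per token stays close to a much smaller dense model while total parameter count (and capacity) is far larger.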


We'll be exploring the inspiration behind DBRX, the advantages of Mixture of Experts, and how it positions DBRX within the larger LLM landscape.

Podcast Topics:

- DBRX backstory and Databricks' Mosaic AI Research team
- Inspiration for the open-source DBRX LLM, and what gap it fills
- Core features of DBRX that distinguish it from other LLMs
- The Mixture-of-Experts (MoE) architecture
- How the MoE architecture enhances LLMs
- Comparison to other MoE models like Mixtral-8x7B and Grok-1
- Advanced DBRX architecture features:
- Rotary Position Encodings (RoPE): https://paperswithcode.com/method/rope
- Gated Linear Units (GLU): https://paperswithcode.com/method/glu
- Grouped Query Attention (GQA): https://towardsdatascience.com/demystifying-gqa-grouped-query-attention-3fb97b678e4a
- GPT-4 Tokenizer (tiktoken): https://github.com/openai/tiktoken
- Types of tasks and applications Mixture-of-Experts models are particularly well-suited for
- RAG (Retrieval-Augmented Generation) and LLMs for enterprise applications
- How open-source MoE models like DBRX are being used by Databricks customers
- What’s next in 2024 for DBRX, mixture-of-experts models, and LLMs in general
- How to keep up with the evolving AI field

Show Notes:

Learn more about and connect with Hagay Lupesko: https://www.linkedin.com/in/hagaylupesko/

Learn more about DBRX, its use of Mixture of Experts, and scaling laws:

- Introducing DBRX: A New State-of-the-Art Open LLM
https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm
- Mixture of Experts
https://opendatascience.com/what-is-mixture-of-experts-and-how-can-they-boost-llms/#google_vignette
- Lost in the Middle: How Language Models Use Long Contexts
https://cs.stanford.edu/~nfliu/papers/lost-in-the-middle.arxiv2023.pdf
- HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
https://hotpotqa.github.io
- Scaling Laws
https://en.wikipedia.org/wiki/Neural_scaling_law
- Training Compute-Optimal Large Language Models
https://arxiv.org/pdf/2203.15556
- Advanced Architecture Features & Techniques in DBRX
- Rotary Position Encodings (RoPE): https://paperswithcode.com/method/rope
- Gated Linear Units (GLU): https://paperswithcode.com/method/glu
- Grouped Query Attention (GQA): https://towardsdatascience.com/demystifying-gqa-grouped-query-attention-3fb97b678e4a
- GPT-4 Tokenizer (tiktoken): https://github.com/openai/tiktoken

This episode was sponsored by:
Ai+ Training https://aiplus.training/
Home to hundreds of hours of on-demand, self-paced AI training, ODSC interviews, free webinars, and certifications in in-demand skills like LLMs and prompt engineering.

And created in partnership with ODSC https://odsc.com/
The leading AI training conference, featuring expert-led, hands-on workshops, training sessions, and talks on cutting-edge AI topics.
Never miss an episode, subscribe now!

