273 episodes
GPT Reviews Earkind
A daily show about AI made by AI: news, announcements, and research from arXiv, mixed in with some fun. Hosted by Giovani Pete Tizzano, an overly hyped AI enthusiast; Robert, an often unimpressed analyst; Olivia, an overly online reader; and Belinda, a witty research expert.
Open LLM Upgrades 🆕 // Gemma 2 Performance 💎 // SeaKR's Self-aware Learning 🧠
HuggingFace has upgraded the Open LLM Leaderboard to v2, adding new benchmarks and improving the evaluation suite for easier reproducibility.
Gemma 2, a new addition to the Gemma family of lightweight open models, delivers the best performance for its size and offers competitive alternatives to models that are 2-3× bigger.
SeaKR is a new retrieval-augmented generation method that re-ranks retrieved knowledge based on the LLM's self-aware uncertainty, outperforming existing adaptive RAG methods at generating text with relevant and accurate information.
Step-DPO is a new method that applies preference optimization to individual reasoning steps rather than whole answers, achieving impressive results in long-chain mathematical reasoning.
Contact: sergi@earkind.com
Timestamps:
00:34 Introduction
01:21 HuggingFace Updates Open LLM Leaderboard
03:19 Gemma 2: Improving Open Language Models at a Practical Size
04:16 From bare metal to a 70B model: infrastructure set-up and scripts
05:21 Fake sponsor
07:11 SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation
08:47 Simulating Classroom Education with LLM-Empowered Agents
10:16 Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
12:31 Outro
OpenAI Voice Delay ⏰ // Evolution-Simulating Language Model 🦕 // Multi-Granularity Vision Flow 🌉
OpenAI's advanced Voice Mode for ChatGPT Plus users has been delayed, but the company is taking a cautious approach to ensure safety and reliability.
ESM3 is a language model that can simulate 500 million years of evolution, making biology programmable and opening up possibilities for medicine, biology research, and clean energy.
R2R is an open-source project on GitHub that offers a comprehensive and state-of-the-art retrieval-augmented generation system for developers, making it accessible to anyone who wants to try it out.
MG-LLaVA is a new multi-modal large language model that enhances visual processing capabilities by incorporating a multi-granularity vision flow, including low-resolution, high-resolution, and object-centric features.
Contact: sergi@earkind.com
Timestamps:
00:34 Introduction
01:36 OpenAI Delays ChatGPT Voice Mode
03:27 ESM3 Simulating 500 million years of evolution with a language model
04:38 Rag to Riches
06:00 Fake sponsor
08:11 MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning
09:49 Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
11:13 Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
13:02 Outro
Apple-Meta Partnership Fails Due to Privacy 🍎 // AI Meets Quantum Computing ⚛️ // Record Labels Sue AI Startups 🎵
Apple reportedly shelved a partnership to integrate Meta's AI models over privacy concerns.
IBM is working to integrate AI technology into quantum computing.
Record labels are suing AI startups for training models on their copyrighted songs.
The research papers cover long-context multimodal understanding, weight-averaged reinforcement learning policies, and benchmarking code generation.
Contact: sergi@earkind.com
Timestamps:
00:34 Introduction
02:07 Apple shelved the idea of integrating Meta’s AI models over privacy concerns, report says
03:25 IBM Develops The AI-Quantum Link
05:25 Record Labels Sue Two Startups for Training AI Models on Their Songs
06:50 Fake sponsor
08:42 Long Context Transfer from Language to Vision
10:27 WARP: On the Benefits of Weight Averaged Rewarded Policies
12:11 BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
13:55 Outro
Safe Superintelligence Inc. 👍 // Massive Supercomputer Partnership 💻 // Claude 3.5 Sonnet Launch 🚀
Safe Superintelligence Inc. has launched with the goal of building a safe superintelligence AI that won't turn on humanity.
Dell, Nvidia, and Super Micro Computer are partnering with xAI and Elon Musk to build a massive supercomputer that could use up to 100,000 Nvidia H100 GPUs, potentially making it 4x larger than the biggest existing AI clusters.
Anthropic has launched Claude 3.5 Sonnet, their latest model family, which outperforms competitor models and even their own Claude 3 Opus on a wide range of evaluations.
The papers discussed in this episode explore the decision boundaries of large language models, auto-optimized training hyperparameters for IR models, and thinking step-by-step across modalities using whiteboard-of-thought. These findings could have important implications for the future development of AI.
Contact: sergi@earkind.com
Timestamps:
00:34 Introduction
01:40 Ilya Sutskever Launches Safe Superintelligence Inc.
03:04 Dell joins forces with Nvidia, Grok, xAI and Elon Musk
04:23 Anthropic Launches Claude 3.5 Sonnet
06:10 Fake sponsor
08:16 Probing the Decision Boundaries of In-context Learning in Large Language Models
09:47 Prompts as Auto-Optimized Training Hyperparameters: Training Best-in-Class IR Models from Scratch with 10 Gold Labels
11:05 Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities
12:54 Outro
DeepMind's AI Soundtracks 🎥 // Challenges of Training AI Clusters ⚡ // Large Language Model Factual Knowledge 🤯
Google DeepMind's new AI tool generates video soundtracks by combining text prompts with visual content.
Building large AI training clusters poses challenges in power, network topology, and reliability.
New research examines how large language models acquire factual knowledge during pretraining and tests their probabilistic reasoning capabilities.
LLARVA's vision-action instruction tuning enhances robot learning.
Contact: sergi@earkind.com
Timestamps:
00:34 Introduction
01:47 Google DeepMind’s new AI tool uses video pixels and text prompts to generate soundtracks
03:31 100,000 H100 Clusters: Power, Network Topology, Ethernet vs InfiniBand, Reliability, Failures, Checkpointing
05:22 Large language model data pipelines and Common Crawl (WARC/WAT/WET)
06:47 Fake sponsor
08:20 How Do Large Language Models Acquire Factual Knowledge During Pretraining?
10:01 What Are the Odds? Language Models Are Capable of Probabilistic Reasoning
11:22 LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning
13:06 Outro
TikTok's AI-Generated Avatars 🌎 // NVIDIA's Synthetic Data 🧪 // Cohere's Generative Models 🤖
TikTok is expanding its Symphony ad suite with AI-generated avatars of creators and paid actors, as well as a global translation tool for multi-language support.
NVIDIA has released an open synthetic data generation pipeline for training large language models, which could benefit industries that rely on natural language processing.
Cohere's latest generative models, Command R and R+, can automate and streamline complex business workflows, saving time and increasing efficiency.
XLand-100B is a large-scale multi-task dataset for in-context reinforcement learning, providing a challenging benchmark for researchers in the field. CountGen addresses the challenge of controlling the number of depicted objects in text-to-image generation, while MM-NIAH is the first benchmark specifically designed to test how well existing multimodal large language models comprehend long multimodal documents.
Contact: sergi@earkind.com
Timestamps:
00:34 Introduction
01:23 TikTok ads may soon contain AI-generated avatars of your favorite creators
02:59 NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models
04:43 Automating Complex Business Workflows with Cohere: Multi-Step Tool Use in Action
06:17 Fake sponsor
08:22 XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
10:23 Make It Count: Text-to-Image Generation with an Accurate Number of Objects
11:58 Needle In A Multimodal Haystack
13:37 Outro