273 episodes
GPT Reviews Earkind
A daily show about AI made by AI: news, announcements, and research from arXiv, mixed in with some fun. Hosted by Giovani Pete Tizzano, an overly hyped AI enthusiast; Robert, an often unimpressed analyst; Olivia, an overly online reader; and Belinda, a witty research expert.
Open LLM Upgrades 🆕 // Gemma 2 Performance 💎 // SeaKR's Self-aware Learning 🧠
HuggingFace has upgraded the Open LLM Leaderboard to v2, adding new benchmarks and improving the evaluation suite for easier reproducibility.
Gemma 2, a new addition to the Gemma family of lightweight open models, delivers the best performance for its size and offers competitive alternatives to models that are 2-3× bigger.
SeaKR is a new retrieval-augmented generation method that re-ranks retrieved knowledge based on the LLM's self-aware uncertainty, outperforming existing adaptive RAG methods at generating text with relevant and accurate information.
Step-DPO is a new method that applies preference optimization to individual reasoning steps rather than whole answers, achieving impressive results in long-chain mathematical reasoning.
Contact: sergi@earkind.com
Timestamps:
00:34 Introduction
01:21 HuggingFace Updates Open LLM Leaderboard
03:19 Gemma 2: Improving Open Language Models at a Practical Size
04:16 From bare metal to a 70B model: infrastructure set-up and scripts
05:21 Fake sponsor
07:11 SeaKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation
08:47 Simulating Classroom Education with LLM-Empowered Agents
10:16 Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
12:31 Outro
OpenAI Voice Delay ⏰ // Evolution-Simulating Language Model 🦕 // Multi-Granularity Vision Flow 🌉
OpenAI's advanced Voice Mode for ChatGPT Plus users has been delayed, but the company is taking a cautious approach to ensure safety and reliability.
ESM3 is a language model that can simulate 500 million years of evolution, making biology programmable and opening up possibilities for medicine, biology research, and clean energy.
R2R is an open-source project on GitHub that offers a comprehensive and state-of-the-art retrieval-augmented generation system for developers, making it accessible to anyone who wants to try it out.
MG-LLaVA is a new multi-modal large language model that enhances visual processing capabilities by incorporating a multi-granularity vision flow, including low-resolution, high-resolution, and object-centric features.
Contact: sergi@earkind.com
Timestamps:
00:34 Introduction
01:36 OpenAI Delays ChatGPT Voice Mode
03:27 ESM3 Simulating 500 million years of evolution with a language model
04:38 Rag to Riches
06:00 Fake sponsor
08:11 MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning
09:49 Recite, Reconstruct, Recollect: Memorization in LMs as a Multifaceted Phenomenon
11:13 Math-LLaVA: Bootstrapping Mathematical Reasoning for Multimodal Large Language Models
13:02 Outro
Apple-Meta Partnership Fails Due to Privacy 🍎 // AI Meets Quantum Computing ⚛️ // Record Labels Sue AI Startups 🎵
Apple reportedly shelved a partnership to integrate Meta's AI models over privacy concerns.
IBM is working to integrate AI technology into quantum computing.
Record labels are suing AI startups for training models on their copyrighted songs.
The research papers cover long-context multimodal understanding, weight-averaged reinforcement learning policies, and benchmarking code generation.
Contact: sergi@earkind.com
Timestamps:
00:34 Introduction
02:07 Apple shelved the idea of integrating Meta’s AI models over privacy concerns, report says
03:25 IBM Develops The AI-Quantum Link
05:25 Record Labels Sue Two Startups for Training AI Models on Their Songs
06:50 Fake sponsor
08:42 Long Context Transfer from Language to Vision
10:27 WARP: On the Benefits of Weight Averaged Rewarded Policies
12:11 BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
13:55 Outro
Safe Superintelligence Inc. 👍 // Massive Supercomputer Partnership 💻 // Claude 3.5 Sonnet Launch 🚀
Safe Superintelligence Inc. has launched with the goal of building a safe superintelligence AI that won't turn on humanity.
Dell, Nvidia, and Super Micro Computer are partnering with xAI and Elon Musk to build a massive supercomputer that could use up to 100,000 Nvidia H100 GPUs, potentially making it 4x larger than the biggest existing AI clusters.
Anthropic has launched Claude 3.5 Sonnet, their latest model family, which outperforms competitor models and even their own Claude 3 Opus on a wide range of evaluations.
The papers discussed in this episode explore the decision boundaries of large language models, auto-optimized training hyperparameters for IR models, and thinking step-by-step across modalities using whiteboard-of-thought. These findings could have important implications for the future development of AI.
Contact: sergi@earkind.com
Timestamps:
00:34 Introduction
01:40 Ilya Sutskever Launches Safe Superintelligence Inc.
03:04 Dell joins forces with Nvidia, Grok, xAI and Elon Musk
04:23 Anthropic Launches Claude 3.5 Sonnet
06:10 Fake sponsor
08:16 Probing the Decision Boundaries of In-context Learning in Large Language Models
09:47 Prompts as Auto-Optimized Training Hyperparameters: Training Best-in-Class IR Models from Scratch with 10 Gold Labels
11:05 Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities
12:54 Outro
DeepMind's AI Soundtracks 🎥 // Challenges of Training AI Clusters ⚡ // Large Language Model Factual Knowledge 🤯
Google DeepMind's new AI tool generates video soundtracks by combining text prompts with visual content.
Building large AI training clusters poses challenges in power, network topology, and reliability.
New research examines how large language models acquire factual knowledge during pretraining and tests their probabilistic reasoning capabilities.
LLARVA's vision-action instruction tuning enhances robot learning.
Contact: sergi@earkind.com
Timestamps:
00:34 Introduction
01:47 Google DeepMind’s new AI tool uses video pixels and text prompts to generate soundtracks
03:31 100,000 H100 Clusters: Power, Network Topology, Ethernet vs InfiniBand, Reliability, Failures, Checkpointing
05:22 Large language model data pipelines and Common Crawl (WARC/WAT/WET)
06:47 Fake sponsor
08:20 How Do Large Language Models Acquire Factual Knowledge During Pretraining?
10:01 What Are the Odds? Language Models Are Capable of Probabilistic Reasoning
11:22 LLARVA: Vision-Action Instruction Tuning Enhances Robot Learning
13:06 Outro
TikTok's AI-Generated Avatars 🌎 // NVIDIA's Synthetic Data 🧪 // Cohere's Generative Models 🤖
TikTok is expanding its Symphony ad suite with AI-generated avatars of creators and paid actors, as well as a global translation tool for multi-language support.
NVIDIA has released an open synthetic data generation pipeline for training large language models, which could benefit industries that rely on natural language processing.
Cohere's latest generative models, Command R and R+, can automate and streamline complex business workflows, saving time and increasing efficiency.
XLand-100B is a large-scale multi-task dataset for in-context reinforcement learning, providing a challenging benchmark for researchers in the field. CountGen addresses the challenge of controlling the number of depicted objects in text-to-image generation, while MM-NIAH is the first benchmark specifically designed to test how well existing multimodal large language models comprehend long multimodal documents.
Contact: sergi@earkind.com
Timestamps:
00:34 Introduction
01:23 TikTok ads may soon contain AI-generated avatars of your favorite creators
02:59 NVIDIA Releases Open Synthetic Data Generation Pipeline for Training Large Language Models
04:43 Automating Complex Business Workflows with Cohere: Multi-Step Tool Use in Action
06:17 Fake sponsor
08:22 XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning
10:23 Make It Count: Text-to-Image Generation with an Accurate Number of Objects
11:58 Needle In A Multimodal Haystack
13:37 Outro