ThursdAI - The top AI news from the past week

From Weights & Biases: join AI Evangelist Alex Volkov and a panel of experts to cover everything important that happened in the world of AI over the past week

Every ThursdAI, Alex Volkov hosts a panel of experts, AI engineers, data scientists, and prompt spellcasters on X (Twitter) Spaces to discuss everything major and important that happened in the world of AI over the past week. Topics include LLMs, open source, new capabilities, OpenAI, competitors in the AI space, new LLM models, AI art and diffusion, and much more. sub.thursdai.news

  1. 6 DAYS AGO

    📆 ThursdAI - Nov 14 - Qwen 2.5 Coder, No Walls, Gemini 1114 👑 LLM, ChatGPT OS integrations & more AI news

    This week is a very exciting one in the world of AI news, as we get 3 SOTA models: one in overall LLM rankings, one in OSS coding, and one in OSS voice, plus a bunch of breaking news during the show (which we reacted to live on the pod, and as we're now doing video, you can see us freak out in real time at 59:32).

00:00 Welcome to ThursdAI
00:25 Meet the Hosts
02:38 Show Format and Community
03:18 TLDR Overview
04:01 Open Source Highlights
13:31 Qwen Coder 2.5 Release
14:00 Speculative Decoding and Model Performance
22:18 Interactive Demos and Artifacts
28:20 Training Insights and Future Prospects
33:54 Breaking News: Nexus Flow
36:23 Exploring Athene v2 Agent Capabilities
36:48 Understanding ArenaHard and Benchmarking
40:55 Scaling and Limitations in AI Models
43:04 Nexus Flow and Scaling Debate
49:00 Open Source LLMs and New Releases
52:29 FrontierMath Benchmark and Quantization Challenges
58:50 Gemini Experimental 1114 Release and Performance
01:11:28 LLM Observability with Weave
01:14:55 Introduction to Tracing and Evaluations
01:15:50 Weave API Toolkit Overview
01:16:08 Buzz Corner: Weights & Biases
01:16:18 Nous Forge Reasoning API
01:26:39 Breaking News: OpenAI's New MacOS Features
01:27:41 Live Demo: ChatGPT Integration with VS Code
01:34:28 Ultravox: Real-Time AI Conversations
01:42:03 Tilde Research and Stargazer Tool
01:46:12 Conclusion and Final Thoughts

This week there was also a debate online about whether deep learning (and "scale is all you need") has hit a wall, with publications citing folks like Ilya Sutskever to claim it has, and folks like Yann LeCun saying "I told you so". TL;DR? Multiple huge breakthroughs later, both Oriol from DeepMind and Sam Altman are saying "what wall?", and Heiner from X.ai is saying "skill issue": there are no walls in sight, despite tech journalism's love of pretending there is. Also, what happened to Yann?
๐Ÿ˜ตโ€๐Ÿ’ซ Ok, back to our scheduled programming, here's the TL;DR, afterwhich, a breakdown of the most important things about today's update, and as always, I encourage you to watch / listen to the show, as we cover way more than I summarize here ๐Ÿ™‚ TL;DR and Show Notes: * Open Source LLMs * Qwen Coder 2.5 32B (+5 others) - Sonnet @ home (HF, Blog, Tech Report) * The End of Quantization? (X, Original Thread) * Epoch : FrontierMath new benchmark for advanced MATH reasoning in AI (Blog) * Common Corpus: Largest multilingual 2T token dataset (blog) * NexusFlow - Athena v2 - open model suite (X, Blog, HF) * Big CO LLMs + APIs * Gemini 1114 is new king LLM #1 LMArena (X) * Nous Forge Reasoning API - beta (Blog, X) * Reuters reports "AI is hitting a wall" and it's becoming a meme (Article) * Cursor acq. SuperMaven (X) * This Weeks Buzz * Weave JS/TS support is here ๐Ÿ™Œ * Voice & Audio * Fixie releases UltraVox SOTA (Demo, HF, API) * Suno v4 is coming and it's bonkers amazing (Alex Song, SOTA Jingle) * Tools demoed * Qwen artifacts - HF Demo * Tilde Galaxy - Interp Tool This is a public episode. If youโ€™d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe

    1h 49m
  2. NOV 8

    📆 ThursdAI - Nov 7 - Video version, full o1 was given and taken away, Anthropic price hike-u, halloween 💀 recap & more AI news

    👋 Hey all, this is Alex, coming to you from very sunny California, as I'm in SF again while there's a complete snowstorm back home in Denver (brrr). I flew here for the hackathon I kept telling you about, and it was glorious: we had over 400 registered, over 200 approved hackers, and 21 teams submitting incredible projects 👏 You can follow some of these here. I then decided to stick around and record the show from SF, finally pulled the plug and asked for some budget, and I present: the first ThursdAI recorded from the newly minted W&B podcast studio at our office in SF 🎉 This isn't the only first. Today, all of the regular co-hosts of ThursdAI met on video for the first time, after over a year of hanging out weekly. We've finally made the switch to video, and you know what? Given how good AI podcasts are getting, we may have to stick with this video thing! We played one such clip from a new model called hertz-dev, a new conversational audio generation model. Given that today's episode is a video podcast, I would love for you to see it, so here are the timestamps for the chapters, followed by the TL;DR and show notes in raw format. I would love to hear from folks who read the longer-form newsletters: do you miss them? Should I bring them back? Please leave me a comment 🙏 (I may send you a survey.) This was a generally slow week (for AI!! not for... ehrm, other stuff) and it was a fun podcast! Leave me a comment about what you think of this new format.
Chapter Timestamps
00:00 Introduction and Agenda Overview
00:15 Open Source LLMs: Small Models
01:25 Open Source LLMs: Large Models
02:22 Big Companies and LLM Announcements
04:47 Hackathon Recap and Community Highlights
18:46 Technical Deep Dive: HertzDev and FishSpeech
33:11 Human in the Loop: AI Agents
36:24 Augmented Reality Lab Assistant
36:53 Hackathon Highlights and Community Vibes
37:17 Chef Puppet and Meta Ray Bans Raffle
37:46 Introducing Fester the Skeleton
38:37 Fester's Performance and Community Reactions
39:35 Technical Insights and Project Details
42:42 Big Companies API Updates
43:17 Haiku 3.5: Performance and Pricing
43:44 Comparing Haiku and Sonnet Models
51:32 XAI Grok: New Features and Pricing
57:23 OpenAI's O1 Model: Leaks and Expectations
01:08:42 Transformer ASIC: The Future of AI Hardware
01:13:18 The Future of Training and Inference Chips
01:13:52 Oasis Demo and Etched AI Controversy
01:14:37 Nisten's Skepticism on Etched AI
01:19:15 Human Layer Introduction with Dex
01:19:24 Building and Managing AI Agents
01:20:54 Challenges and Innovations in AI Agent Development
01:21:28 Human Layer's Vision and Future
01:36:34 Recap and Closing Remarks

ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
Show Notes and Links:
* Interview
  * Dexter Horthy (X) from HumanLayer
* Open Source LLMs
  * SmolLM2: the new best open 1B-parameter language model (X)
  * Meta released MobileLLM (125M, 350M, 600M, 1B) (HF)
  * Tencent Hunyuan-Large - 389B MoE with 52B active (X, HF, Paper)
* Big CO LLMs + APIs
  * OpenAI buys and opens chat.com
  * Anthropic releases Claude Haiku 3.5 via API (X, Blog)
  * OpenAI drops o1 full - and pulls it back (but not before it got jailbroken)
  * X.ai now offers $25/mo of free Grok API credits (X, Platform)
  * Etched announces Sohu - the first Transformer ASIC - 500K tok/s (etched)
  * PPXL is not valued at 9B lol
* This week's Buzz
  * Recap of SF hackathon w/ AI Tinkerers (X)
  * Fester the Halloween Toy aka Project Halloweave, videos from trick-or-treating (X, Writeup)
* Voice & Audio
  * Hertz-dev - 8.5B conversational audio gen (X, Blog)
  * Fish Agent v0.1 3B - speech-to-speech model (HF, Demo)
* AI Art & Diffusion & 3D
  * FLUX 1.1 [pro] is now HD - 4x resolution (X, Blog)

    1h 38m
  3. NOV 1

    📆 ThursdAI - Spooky Halloween edition with Video!

    Hey everyone, Happy Halloween! Alex here, coming to you live from my mad scientist lair! For the first-ever live video stream of ThursdAI, I dressed up as a mad scientist and had my co-host, Fester the AI-powered skeleton, join me (as well as my usual co-hosts, haha) in a very energetic and hopefully entertaining video stream! Since it's Halloween today, Fester (and I) have a very busy schedule, so no super-long ThursdAI newsletter today; we're still not in the realm of Gemini being able to write a decent draft that takes everything we talked about and covers all the breaking news, so I'm afraid I will have to wish you a Happy Halloween and ask that you watch/listen to the episode. The TL;DR and show links from today don't cover all the breaking news, but the major things we saw today (and caught live on the show as Breaking News) were: ChatGPT now has search, and Gemini has grounded search as well (seems like OpenAI's streak of releasing something right before Google announces it continues). Here's a quick trailer of the major things that happened:

This week's buzz - Halloween AI toy with Weave

In this week's buzz, my long-awaited Halloween project is finally live and operational! I've posted a public Weave dashboard here and the code (that you can run on your Mac!) here. Really looking forward to seeing all the amazing costumes the kiddos come up with and how Gemini will be able to respond to them. Follow along!

Ok, and finally, my raw TL;DR notes and links for this week. Happy Halloween everyone, I'm running off to spook the kiddos (and of course record and post about it!)

ThursdAI - Oct 31 - TL;DR

TL;DR of all topics covered:
* Open Source LLMs:
  * Microsoft's OmniParser: SOTA UI parsing (MIT licensed) (𝕏)
    * Groundbreaking model for web automation (MIT license).
    * State-of-the-art UI parsing and understanding.
    * Outperforms GPT-4V in parsing web UI.
    * Designed for web automation tasks.
    * Can be integrated into various development workflows.
  * ZhipuAI's GLM-4-Voice: end-to-end Chinese/English speech (𝕏)
    * End-to-end voice model for Chinese and English speech.
    * Open-sourced and readily available.
    * Focuses on direct speech understanding and generation.
    * Potential applications in various speech-related tasks.
  * Meta releases LongVU: video LM for long videos (𝕏)
    * Handles long videos with impressive performance.
    * Uses DINOv2 for downsampling, eliminating redundant scenes.
    * Fuses features using DINOv2 and SigLIP.
    * Selected tokens are passed to Qwen2/Llama-3.2-3B.
    * Demo and model are available on HuggingFace.
    * Potential for significant advancements in video understanding.
  * OpenAI's new factuality benchmark (Blog, GitHub)
    * Introducing SimpleQA: a new factuality benchmark.
    * Goal: high correctness, diversity, challenging for frontier models.
    * Question curation: AI trainers, verified by a second trainer.
    * Quality assurance: 3% inherent error rate.
    * Topic diversity: wide range of topics.
    * Grading methodology: "correct", "incorrect", "not attempted".
    * Model comparison: smaller models answer fewer correctly.
    * Calibration measurement: larger models are more calibrated.
    * Limitations: only for short, fact-seeking queries.
    * Conclusion: drive research on trustworthy AI.
* Big CO LLMs + APIs:
  * ChatGPT now has Search! (𝕏)
    * Grounded search results via browsing the web.
    * Still hallucinates.
    * Reincarnation of SearchGPT inside ChatGPT.
  * Apple Intelligence launch: image features for iOS 18.2
    * Officially launched for developers in iOS 18.2.
    * Includes Image Playground and Genmoji.
    * Aims to enhance image creation and manipulation on iPhones.
  * GitHub Universe AI news: Copilot expands, new Spark tool (𝕏)
    * GitHub Copilot now supports Claude, Gemini, and OpenAI models.
    * GitHub Spark: create micro-apps using natural

    1h 49m
  4. OCT 25

    📅 ThursdAI - Oct 24 - Claude 3.5 controls your PC?! Talking AIs with 🦾, Multimodal Weave, Video Models mania + more AI news from this 🔥 week.

    Hey all, Alex here, coming to you from the (surprisingly) sunny Seattle, with just a mind-boggling week of releases. Really, just on Tuesday there was so much news already that I had to post a recap thread, something I usually do only after I finish ThursdAI! From Anthropic reclaiming the close-second, sometimes-first AI lab position and giving Claude the wheel in the form of computer use powers, to more than 3 AI video generation updates (including open-source ones), to Apple updating the Apple Intelligence beta, it's honestly been very hard to keep up, and again, this is literally part of my job! But once again I'm glad that we were able to cover this in ~2 hrs, including multiple interviews with returning co-hosts (Simon Willison came back, Killian came back), so if you're only a reader at this point, definitely listen to the show! Ok, as always (recently), the TL;DR and show notes are at the bottom (I'm trying to get you to scroll through, ha, is it working?), so grab a bucket of popcorn and let's dive in 👇

Claude's Big Week: Computer Control, Code Wizardry, and the Mysterious Case of the Missing Opus

Anthropic dominated the headlines this week with a flurry of updates and announcements. Let's start with the new Claude Sonnet 3.5 (really, they didn't update the version number, it's still 3.5, tho a different API model).

Claude Sonnet 3.5: Coding Prodigy or Benchmark Buster?

The new Sonnet model shows impressive results on coding benchmarks, surpassing even OpenAI's o1-preview on some. "It absolutely crushes coding benchmarks like Aider and SWE-bench Verified," I exclaimed on the show. But a closer look reveals a more nuanced picture: mixed results on other benchmarks indicate that Sonnet 3.5 might not be the universal champion some anticipated. A friend who ran held-back internal benchmarks was disappointed, highlighting weaknesses in scientific reasoning and certain writing tasks. Some folks are seeing it be lazier on some full-code completions, while the maximum output is now doubled from 4K to 8K tokens! This goes to show, again, that benchmarks don't tell the full story, so we wait for LMArena (formerly LMSys Arena) and the vibe checks from across the community. However, it absolutely dominates in code tasks, that much is clear already. Below is a screenshot of the new model on the Aider code-editing benchmark, a fairly reliable way to judge a model's code output; they also have a code-refactoring benchmark.

Haiku 3.5 and the Vanishing Opus: Anthropic's Cryptic Clues

Further adding to the intrigue, Anthropic announced Claude 3.5 Haiku! They usually provide immediate access, but Haiku remains elusive: it's said to be available by end of the month, which is very, very soon. Making things even more curious, their highly anticipated Opus model has seemingly vanished from their website. "They've gone completely silent on 3.5 Opus," Simon Willison (𝕏) noted, mentioning conspiracy theories that this new Sonnet might simply be a rebranded Opus? 🕯️ 🕯️ We'll make a summoning circle for the new Opus and update you once it lands (maybe next year).

Claude Takes Control (Sort Of): Computer Use API and the Dawn of AI Agents (𝕏)

The biggest bombshell this week? Anthropic's computer use. This isn't just about executing code; it's about Claude interacting with computers: clicking buttons, browsing the web, and yes, even ordering pizza! Killian Lucas (𝕏), creator of Open Interpreter, returned to ThursdAI to discuss this groundbreaking development. "This stuff of computer use… it's the same argument as for having humanoid robots: the web is human-shaped, and we need AIs to interact with computers and the web the way humans do," Killian explained, illuminating the potential for bridging the digital and physical worlds.
Simon, though enthusiastic, provided a dose of realism

    1h 56m
  5. OCT 18

    📆 ThursdAI - Oct 17 - Robots, Rockets, and Multi Modal Mania with open source voice cloning, OpenAI new voice API and more AI news

    Hey folks, Alex here from Weights & Biases, and this week has been absolutely bonkers. From robots walking among us to rockets landing on chopsticks (well, almost), the future is feeling palpably closer. And if real-world robots and reusable spaceship boosters weren't enough, the open-source AI community has been cooking, dropping new models and techniques faster than a Starship launch. So buckle up, grab your space helmet and noise-canceling headphones (we'll get to why those are important!), and let's blast off into this week's AI adventures! TL;DR and show notes + links at the end of the post 👇

Robots and Rockets: A Glimpse into the Future

I gotta start with the real-world stuff because, let's be honest, it's mind-blowing. We had Robert Scoble (yes, the Robert Scoble) join us after attending the Tesla We, Robot AI event, reporting on Optimus robots strolling through crowds, serving drinks, and generally being ridiculously futuristic. Autonomous robo-taxis were also cruising around, giving us a taste of a driverless future. Robert's enthusiasm was infectious: "It was a vision of the future, and from that standpoint, it succeeded wonderfully." I couldn't agree more. While the market might have had a mini-meltdown (apparently investors aren't ready for robot butlers yet), the sheer audacity of Tesla's vision is exhilarating. These robots aren't just cool gadgets; they represent a fundamental shift in how we interact with technology and the world around us. And they're learning fast: just days after the event, Tesla released a video of Optimus operating autonomously, showcasing the rapid progress they're making. And speaking of audacious visions, SpaceX decided to one-up everyone (including themselves) by launching Starship and catching the booster with Mechazilla, their giant robotic chopsticks (okay, technically a launch tower, but you get the picture). Waking up early with my daughter to watch this live was pure magic. As Ryan Carson put it, "It was magical watching this… my kid who's 16… all of his friends are getting their imaginations lit by this experience." That's exactly what we need: more imagination and less doomerism! The future is coming whether we like it or not, and I, for one, am excited.

Open Source LLMs and Tools: The Community Delivers (Again!)

Okay, back to the virtual world (for now). This week's open-source scene was electric, with new model releases and tools that have everyone buzzing (and benchmarking like crazy!).

* Nemotron 70B, Hype vs. Reality: NVIDIA dropped their Nemotron 70B instruct model, claiming impressive scores on certain benchmarks (Arena Hard, AlpacaEval), even suggesting it outperforms GPT-4o and Claude 3.5. As always, we take these claims with a grain of salt (remember Reflection?), and our resident expert, Nisten, was quick to run his own tests. The verdict? Nemotron is good, "a pretty good model to use," but maybe not the giant-killer some hyped it up to be. Still, kudos to NVIDIA for pushing the open-source boundaries. (Hugging Face, Harrison Kingsley evals)
* Zamba 2, Hybrid Vigor: Zyphra, in collaboration with NVIDIA, released Zamba 2, a hybrid SSM-transformer model. We had Paolo Glorioso, a researcher from Zyphra, join us to break down this unique architecture, which combines the strengths of transformers and state space models (SSMs). He highlighted the memory and latency advantages of SSMs, especially for on-device applications. Definitely worth checking out if you're interested in transformer alternatives and efficient inference.
* Zyda 2, Data is King (and Queen): Alongside Zamba 2, Zyphra also dropped Zyda 2, a massive 5-trillion-token dataset, filtered, deduplicated, and ready for LLM training. This kind of open-source data release is a huge boon to the community, fueling the next generation of models. (X)
* Ministral, Pocket-Sized Power: On the one-year anniversary of the iconic Mistral 7B release, Mistral announced two new sm

    1h 35m
  6. OCT 10

    📆 ThursdAI - Oct 10 - Two Nobel Prizes in AI!? Meta Movie Gen (and sounds) amazing, Pyramid Flow a 2B video model, 2 new VLMs & more AI news!

    Hey folks, we are finally due for a "relaxing" week in AI: no more HUGE company announcements (if you don't consider Meta Movie Gen huge), no conferences or dev days, and some time for open-source projects to shine (while we all wait for Opus 3.5 to shake things up). This week was very multimodal on the show: we covered 2 new video models, one that's tiny and open source and one massive from Meta that is aiming for Sora's crown, and 2 new VLMs, one from our friends at Reka that understands videos and audio, while the other, from Rhymes, is Apache 2 licensed. We also had a chat with Kwindla Kramer about the OpenAI Realtime API and its shortcomings, and about voice AI in general. All right, the TL;DR and show notes, starting with the 2 Nobel prizes in AI 👇

* 2 AI Nobel prizes
  * John Hopfield and Geoffrey Hinton have been awarded the Nobel Prize in Physics
  * Demis Hassabis, John Jumper & David Baker have been awarded this year's Nobel Prize in Chemistry
* Open Source LLMs & VLMs
  * TxT360: a globally deduplicated dataset for LLM pre-training (Blog, Dataset)
  * Rhymes Aria - 25.3B multimodal MoE model that can take image/video inputs, Apache 2 (Blog, HF, Try It)
  * Maitrix and LLM360 launch a new decentralized arena (Leaderboard, Blog)
  * New Gradio 5 with server-side rendering (X)
  * llamafile now comes with a chat interface and syntax highlighting (X)
* Big CO LLMs + APIs
  * OpenAI releases MLE-bench - a new Kaggle-focused benchmark for AI agents (Paper, Github)
  * Inflection is still alive - going for enterprise lol (Blog)
  * New Reka Flash 21B (X, Blog, Try It)
* This week's Buzz
  * We chatted about Cursor, it went viral, there are many tips
  * W&B releases HEMM - benchmarks of text-to-image generation models (X, Github, Leaderboard)
* Vision & Video
  * Meta presents Movie Gen 30B - image and text to video models (Blog, Paper)
  * Pyramid Flow - open source img2video model, MIT license (X, Blog, HF, Paper, Github)
* Voice & Audio
  * Working with OpenAI Realtime Audio - Alex's conversation with Kwindla from trydaily.com
  * Cartesia Sonic goes multilingual (X)
  * Voice hackathon in SF with $20K in prizes (and a remote track) - sign up
* Tools
  * LM Studio ships with MLX natively (X, Download)
  * UITHUB.com - turn any GitHub repo into 1 long file for LLMs

A Historic Week: TWO AI Nobel Prizes!

This week wasn't just big; it was HISTORIC. As Yam put it, "two Nobel prizes for AI in a single week. It's historic." And he's absolutely spot on! Geoffrey Hinton, often called the "godfather of modern AI," alongside John Hopfield, was awarded the Nobel Prize in Physics for their foundational work on neural networks, work that paved the way for everything we're seeing today. Think backpropagation, Boltzmann machines: these are concepts that underpin much of modern deep learning. It's about time they got the recognition they deserve!

Yoshua Bengio posted about this in a very nice quote: "@HopfieldJohn and @geoffreyhinton, along with collaborators, have created a beautiful and insightful bridge between physics and AI. They invented neural networks that were not only inspired by the brain, but also by central notions in physics such as energy, temperature, system dynamics, energy barriers, the role of randomness and noise, connecting the local properties, e.g., of atoms or neurons, to global ones like entropy and attractors. And they went beyond the physics to show how these ideas could give rise to memory, learning and generative models; concepts which are still at the forefront of modern AI research."

And Hinton's post-Nobel quote? Pure gold: "I'm particularly proud of the fact that one of my students fired Sam Altman." He went on to explain his concerns about OpenAI's apparent shift in focus from safety to profits. Spicy take! It sparked quite a conversation about the

    1h 30m
  7. OCT 4

    📆 ThursdAI - Oct 3 - OpenAI RealTime API, ChatGPT Canvas & other DevDay news (how I met Sam Altman), Gemini 1.5 8B is basically free, BFL makes FLUX 1.1 6x faster, Rev breaks whisper records...

    Hey, it's Alex. Ok, so my mind is officially blown. I was sure this week was going to be wild, but I didn't expect everyone else besides OpenAI to pile on, exactly on ThursdAI. I'm coming back from Dev Day (number 2) and am still processing, and I wanted to actually do a recap by humans, not just the NotebookLM one I posted during the keynote itself (which was awesome and scary in a "will AI replace me as a podcaster" kind of way), and it was incredible to have Simon Willison, who was sitting just behind me most of Dev Day, join me for the recap! But then the news kept coming: OpenAI released Canvas, a whole new way of interacting with ChatGPT; BFL released a new FLUX version that's up to 6x faster; Rev released a Whisper-beating ASR that does diarization; and Google released Gemini 1.5 Flash 8B and said that with prompt caching (which OpenAI now also has, yay) it will cost a whopping $0.01 per million tokens. That's 1 cent per million tokens, for a multimodal model with a 1-million-token context window 🤯 This whole week was crazy, as last ThursdAI, after finishing the newsletter, I went to meet tons of folks at AI Tinkerers in Seattle and did a little EvalForge demo (which you can see here), and I wanted to share EvalForge with you as well; it's early but very promising, so feedback and PRs are welcome! WHAT A WEEK. TL;DR for those who want the links, and let's dive in 👇

* OpenAI - Dev Day Recap (Alex, Simon Willison)
  * Recap of Dev Day
  * RealTime API launched
  * Prompt Caching launched
  * Model Distillation is the new finetune
  * Finetuning 4o with images (Skalski guide)
  * Fireside chat Q&A with Sam
* Open Source LLMs
  * NVIDIA finally releases NVLM (HF)
* This week's Buzz
  * Alex's demo of EvalForge at the AI Tinkerers event in Seattle (Demo, EvalForge, AI Tinkerers)
* Big Companies & APIs
  * Google has released Gemini Flash 8B - $0.01 per million tokens cached (X, Blog)
* Voice & Audio
  * Rev breaks SOTA on ASR with Rev ASR and Rev Diarize (Blog, Github, HF)
* AI Art & Diffusion & 3D
  * BFL releases FLUX1.1 [pro] - 3x-6x faster than 1.0 and higher quality (was 🫐) - (Blog, Try it)

The day I met Sam Altman / Dev Day recap

Last Dev Day (my coverage here) was a "singular" day in AI for me, given it also had the "keep AI open source" event with Nous Research and Grimes, and this Dev Day I was delighted to find out that the vibe was completely different, focused less on bombastic announcements or models and more on practical, dev-focused things. This meant that OpenAI cherry-picked folks who actively develop with their tools, and they didn't invite traditional media, only folks like yours truly, @swyx from Latent Space, Rowan from Rundown, Simon Willison, and Dan Shipper, you know, newsletter and podcast folks who actually build! This also meant that many, many OpenAI employees who work on the products and APIs we get to use were there to receive feedback, help folks with prompting, and just generally interact with the devs and build that community. I want to shout out my friends Ilan (who was in the keynote as the strawberry salesman interacting with a RealTime API agent), Will DePue from the Sora team, with whom I had an incredible conversation about the ethics and legality of projects, Christine McLeavey, who runs the Audio team, with whom I shared a video of my daughter crying when ChatGPT didn't understand her, Katia, Kevin, and Romain on the incredible DevEx/DevRel team, and finally, my new buddy Jason, who does infra and was fighting bugs all day and only joined the pub after shipping RealTime to all of us. I've collected all these folks in a convenient and super-high-signal X list here, so definitely give that list a follow if you'd like to tap into their streams. For the actual announcements, I've already covered this in my Dev Day post here (which was paid-subscribers only, but is now open to all), and Simon did an incredible summary on his Substack as well. The highlights were definitely the new RealTime API that le
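That "1 cent per million tokens" claim is easy to sanity-check with a couple of lines of arithmetic. Here's a minimal sketch: the $0.01-per-1M-token cached-input rate is the one quoted above; the specific token counts are just illustrative, not from the episode.

```python
# Sanity-checking the quoted Gemini 1.5 Flash-8B cached-input rate:
# $0.01 per 1M tokens. Everything below is plain arithmetic on that
# single quoted number, not an additional pricing claim.

def cost_usd(tokens: int, rate_per_mtok: float = 0.01) -> float:
    """USD cost for `tokens` input tokens at `rate_per_mtok` dollars per 1M tokens."""
    return tokens / 1_000_000 * rate_per_mtok

# Filling the entire 1M-token context window once costs a single cent:
print(cost_usd(1_000_000))            # 0.01
# A 50K-token cached prompt costs a twentieth of a cent:
print(f"{cost_usd(50_000):.4f}")      # 0.0005
```

At that rate, even a chat app that re-sends a huge cached system prompt on every turn stays in rounding-error territory, which is why the show treats this model as "basically free".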

    1h 45m
  8. OCT 1

    OpenAI Dev Day 2024 keynote

    Hey, Alex here. Super quick, as I'm still attending Dev Day, but I didn't want to leave you hanging (if you're a paid subscriber!). I have decided to outsource my job and give the amazing podcasters of NotebookLM the whole transcript of the opening keynote of OpenAI Dev Day. You can see a blog of everything they just posted here. Here's a summary of all that was announced:

* Developer-Centric Approach: OpenAI consistently emphasized the importance of developers in their mission to build beneficial AGI. The speaker stated, "OpenAI's mission is to build AGI that benefits all of humanity, and developers are critical to that mission... we cannot do this without you."
* Reasoning as a New Frontier: The introduction of the o1 series of models marks a significant step towards AI with advanced reasoning capabilities, going beyond the limitations of previous GPT models.
* Multimodal Capabilities: OpenAI is expanding the potential of AI applications by introducing multimodal capabilities, particularly focusing on real-time speech-to-speech interaction through the new Realtime API.
* Customization and Fine-Tuning: Empowering developers to customize models is a key theme. OpenAI introduced vision fine-tuning with images and announced easier access to fine-tuning with model distillation tools.
* Accessibility and Scalability: OpenAI demonstrated a commitment to making AI more accessible and cost-effective for developers through initiatives like price reductions, prompt caching, and model distillation tools.

Important Ideas and Facts:

1. The o1 Models:
* Represent a shift towards AI models with enhanced reasoning capabilities, surpassing previous generations in problem-solving and logical thought processes.
* o1-preview is positioned as the most powerful reasoning model, designed for complex problems requiring extended thought processes.
* o1-mini offers a faster, cheaper, and smaller alternative, particularly suited for tasks like code debugging and agent-based applications.
* Both models demonstrate advanced capabilities in coding, math, and scientific reasoning.
* OpenAI highlighted the ability of o1 models to work with developers as "thought partners," understanding complex instructions and contributing to the development process.
Quote: "The shift to reasoning introduces a new shape of AI capability. The ability for our model to scale and correct the process is pretty mind-blowing. So we are resetting the clock, and we are introducing a new series of models under the name o1."

2. Realtime API:
* Enables developers to build real-time AI experiences directly into their applications using WebSockets.
* Launches with support for speech-to-speech interaction, leveraging the technology behind ChatGPT's advanced voice models.
* Offers natural and seamless integration of voice capabilities, allowing for dynamic and interactive user experiences.
* Showcased the potential to revolutionize human-computer interaction across various domains like driving, education, and accessibility.
Quote: "You know, a lot of you have been asking about building amazing speech-to-speech experiences right into your apps. Well now, you can."

3. Vision Fine-Tuning and Model Distillation:
* Vision fine-tuning introduces the ability to use images for fine-tuning, enabling developers to enhance model performance in image-understanding tasks.
* Fine-tuning with vision opens up opportunities in diverse fields such as product recommendations, medical imaging, and autonomous driving.
* OpenAI emphasized the accessibility of these features, stating that "fine-tuning with vision is available to every single developer."
* Model distillation tools facilitate the creation of smaller, more efficient models by transferring knowledge from larger models like o1 and GPT-4o.
* This approach addresses cost concerns and makes advanced AI capabilities more accessible for a wider range of applications and developers.
Quote: "With distillation, you take the o

    6 min

Ratings & Reviews

5 out of 5 (11 Ratings)

