ThursdAI - The top AI news from the past week

From Weights & Biases: join AI Evangelist Alex Volkov and a panel of experts covering everything important that happened in the world of AI in the past week

Every ThursdAI, Alex Volkov hosts a panel of experts, AI engineers, data scientists, and prompt spellcasters on Twitter Spaces, as we discuss everything major and important that happened in the world of AI in the past week. Topics include LLMs, open source, new capabilities, OpenAI, competitors in the AI space, new LLM models, AI art and diffusion, and much more. sub.thursdai.news

  1. 5 days ago

    📆 ThursdAI - Nov 7 - Video version, full o1 was given and taken away, Anthropic price hike-u, Halloween 💀 recap & more AI news

    👋 Hey all, this is Alex, coming to you from very sunny California, as I'm in SF again while there's a complete snowstorm back home in Denver (brrr). I flew here for the hackathon I kept telling you about, and it was glorious: we had over 400 registered, over 200 approved hackers, and 21 teams submitting incredible projects 👏 You can follow some of these here. I then decided to stick around and record the show from SF, finally pulled the plug and asked for some budget, and I present: the first ThursdAI recorded from the newly minted W&B podcast studio at our office in SF 🎉

    This isn't the only first. Today, after over a year of hanging out weekly, all of the regular co-hosts of ThursdAI met on video for the first time. We've finally made the switch to video, and you know what? Given how good AI podcasts are getting, we may have to stick with this video thing! We played one such clip from a new model called hertz-dev, a conversational audio generation model (more in the show notes below).

    Given that today's episode is a video podcast, I would love for you to see it, so here are the timestamps for the chapters, followed by the TL;DR and show notes in raw format. I would love to hear from folks who read the longer-form newsletters: do you miss them? Should I bring them back? Please leave me a comment 🙏 (I may send you a survey). This was a generally slow week (for AI!! not for... ehrm, other stuff) and it was a fun podcast! Leave me a comment about what you think of this new format.

    Chapter Timestamps:
    00:00 Introduction and Agenda Overview
    00:15 Open Source LLMs: Small Models
    01:25 Open Source LLMs: Large Models
    02:22 Big Companies and LLM Announcements
    04:47 Hackathon Recap and Community Highlights
    18:46 Technical Deep Dive: HertzDev and FishSpeech
    33:11 Human in the Loop: AI Agents
    36:24 Augmented Reality Lab Assistant
    36:53 Hackathon Highlights and Community Vibes
    37:17 Chef Puppet and Meta Ray Bans Raffle
    37:46 Introducing Fester the Skeleton
    38:37 Fester's Performance and Community Reactions
    39:35 Technical Insights and Project Details
    42:42 Big Companies API Updates
    43:17 Haiku 3.5: Performance and Pricing
    43:44 Comparing Haiku and Sonnet Models
    51:32 XAI Grok: New Features and Pricing
    57:23 OpenAI's O1 Model: Leaks and Expectations
    01:08:42 Transformer ASIC: The Future of AI Hardware
    01:13:18 The Future of Training and Inference Chips
    01:13:52 Oasis Demo and Etched AI Controversy
    01:14:37 Nisten's Skepticism on Etched AI
    01:19:15 Human Layer Introduction with Dex
    01:19:24 Building and Managing AI Agents
    01:20:54 Challenges and Innovations in AI Agent Development
    01:21:28 Human Layer's Vision and Future
    01:36:34 Recap and Closing Remarks
    Show Notes and Links:

    * Interview
      * Dexter Horthy (X) from HumanLayer
    * Open Source LLMs
      * SmolLM2: the new, best, and open 1B-parameter language model (X)
      * Meta released MobileLLM (125M, 350M, 600M, 1B) (HF)
      * Tencent Hunyuan Large - 389B x 52B (active) MoE (X, HF, Paper)
    * Big CO LLMs + APIs
      * OpenAI buys and opens chat.com
      * Anthropic releases Claude Haiku 3.5 via API (X, Blog)
      * OpenAI drops o1 full - and pulls it back (but not before it got jailbroken)
      * X.ai now offers $25/mo of free Grok API credits (X, Platform) - a minimal call sketch follows these notes
      * Etched announces Sohu - the first Transformer ASIC - 500K tok/s (etched)
      * PPXL is not valued at 9B lol
    * This week's Buzz
      * Recap of SF hackathon w/ AI Tinkerers (X)
      * Fester the Halloween Toy aka Project Halloweave - videos from trick or treating (X, Writeup)
    * Voice & Audio
      * Hertz-dev - 8.5B conversational audio gen (X, Blog)
      * Fish Agent v0.1 3B - speech-to-speech model (HF, Demo)
    * AI Art & Diffusion & 3D
      * FLUX 1.1 [pro] is now HD - 4x resolution (X, blog)
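Since the Grok API was OpenAI-compatible at launch, here is a minimal call sketch for trying those $25/mo credits. The base URL and the `grok-beta` model name reflect x.ai's docs at the time and may have changed since, so treat them as assumptions:

```python
# Minimal sketch: calling the Grok API through the OpenAI Python client.
# Assumes x.ai's OpenAI-compatible endpoint and the "grok-beta" model id
# from the launch docs - both may have changed since.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XAI_API_KEY"],  # funded by the $25/mo free credits
    base_url="https://api.x.ai/v1",     # x.ai's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="grok-beta",
    messages=[{"role": "user", "content": "Summarize this week's AI news in one line."}],
)
print(response.choices[0].message.content)
```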

    1 hr 38 min
  2. Nov 1

    📆 ThursdAI - Spooky Halloween edition with Video!

    Hey everyone, Happy Halloween! Alex here, coming to you live from my mad scientist lair! For the first-ever live video stream of ThursdAI, I dressed up as a mad scientist and had my co-host, Fester the AI-powered skeleton, join me (as well as my usual co-hosts, haha) in a very energetic and hopefully entertaining video stream!

    Since it's Halloween today, Fester (and I) have a very busy schedule, so no super-long ThursdAI newsletter today; we're still not in the realm of Gemini being able to write a decent draft that takes everything we talked about and covers all the breaking news. I'm afraid I'll have to wish you a Happy Halloween and ask that you watch/listen to the episode. The TL;DR and show links below don't cover all the breaking news, but the major things we saw today (and caught live on the show as breaking news) were: ChatGPT now has search, and Gemini has grounded search as well (seems like OpenAI's streak of releasing something right before Google announces it continues). Here's a quick trailer of the major things that happened:

    This week's buzz - Halloween AI toy with Weave

    In this week's buzz, my long-awaited Halloween project is finally live and operational! I've posted a public Weave dashboard here and the code (that you can run on your Mac!) here. Really looking forward to seeing all the amazing costumes the kiddos come up with and how Gemini will respond to them - follow along! (A minimal sketch of the Weave tracing involved follows the TL;DR below.)

    Ok, and finally, my raw TL;DR notes and links for this week. Happy Halloween everyone, I'm running off to spook the kiddos (and of course record and post about it!)

    ThursdAI - Oct 31 - TL;DR of all topics covered:

    * Open Source LLMs
      * Microsoft's OmniParser: SOTA UI parsing (MIT licensed) 𝕏
        * Groundbreaking model for web automation (MIT license).
        * State-of-the-art UI parsing and understanding.
        * Outperforms GPT-4V in parsing web UI.
        * Designed for web automation tasks.
        * Can be integrated into various development workflows.
      * ZhipuAI's GLM-4-Voice: end-to-end Chinese/English speech 𝕏
        * End-to-end voice model for Chinese and English speech.
        * Open-sourced and readily available.
        * Focuses on direct speech understanding and generation.
        * Potential applications in various speech-related tasks.
      * Meta releases LongVU: video LM for long videos 𝕏
        * Handles long videos with impressive performance.
        * Uses DINOv2 for downsampling, eliminating redundant scenes.
        * Fuses features using DINOv2 and SigLIP.
        * Selected tokens are passed to Qwen2/Llama-3.2-3B.
        * Demo and model are available on HuggingFace.
        * Potential for significant advancements in video understanding.
      * OpenAI's new factuality benchmark (Blog, Github)
        * Introducing SimpleQA: a new factuality benchmark.
        * Goal: high correctness, diversity, and difficulty for frontier models.
        * Question curation: AI trainers, verified by a second trainer.
        * Quality assurance: 3% inherent error rate.
        * Topic diversity: wide range of topics.
        * Grading methodology: "correct", "incorrect", "not attempted".
        * Model comparison: smaller models answer fewer questions correctly.
        * Calibration measurement: larger models are more calibrated.
        * Limitations: only for short, fact-seeking queries.
        * Conclusion: drive research on trustworthy AI.
    * Big CO LLMs + APIs
      * ChatGPT now has Search! (X)
        * Grounded search results in browsing the web.
        * Still hallucinates.
        * Reincarnation of SearchGPT inside ChatGPT.
      * Apple Intelligence launch: image features for iOS 18.2
        * Officially launched for developers in iOS 18.2.
        * Includes Image Playground and Genmoji.
        * Aims to enhance image creation and manipulation on iPhones.
      * GitHub Universe AI news: Copilot expands, new Spark tool 𝕏
        * GitHub Copilot now supports Claude, Gemini, and OpenAI models.
        * GitHub Spark: create micro-apps using natural language
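As promised above, here's a minimal sketch of the kind of Weave tracing behind the Halloween toy. Only `weave.init` and the `@weave.op` decorator are the actual W&B Weave API; the project and function names are hypothetical stand-ins for the real project code:

```python
# Minimal sketch of W&B Weave tracing, the way the Halloween toy logs its
# costume chats. weave.init and @weave.op are the real Weave API; the
# project name and function are hypothetical stand-ins.
import weave

weave.init("halloweave-demo")  # hypothetical project name

@weave.op()
def greet_costume(costume_description: str) -> str:
    # The real project calls Gemini with the camera's view of the costume;
    # here we return a canned spooky line so the sketch stays self-contained.
    return f"Wow, a {costume_description}?! Enter... if you dare 💀"

# Every call to a @weave.op function is traced and shows up on the
# (public, in this case) Weave dashboard, inputs and outputs included.
print(greet_costume("tiny astronaut"))
```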

    1 hr 49 min
  3. Oct 25

    📅 ThursdAI - Oct 24 - Claude 3.5 controls your PC?! Talking AIs with 🦾, Multimodal Weave, Video Models mania + more AI news from this 🔥 week.

    Hey all, Alex here, coming to you from (surprisingly) sunny Seattle, after just a mind-boggling week of releases. Really, just on Tuesday there was so much news already! I had to post a recap thread, something I usually do only after I finish ThursdAI! From Anthropic reclaiming the close-second, sometimes-first AI lab position and giving Claude the wheel in the form of computer use powers, to more than 3 AI video generation updates (including open source ones), to Apple updating the Apple Intelligence beta - it's honestly been very hard to keep up, and again, this is literally part of my job! But once again I'm glad we were able to cover this in ~2hrs, including multiple interviews with returning co-hosts (Simon Willison came back, Killian came back), so if you're only a reader at this point, definitely listen to the show! Ok, as always (recently), the TL;DR and show notes are at the bottom (I'm trying to get you to scroll through, ha - is it working?), so grab a bucket of popcorn and let's dive in 👇

    Claude's Big Week: Computer Control, Code Wizardry, and the Mysterious Case of the Missing Opus

    Anthropic dominated the headlines this week with a flurry of updates and announcements. Let's start with the new Claude Sonnet 3.5 (really, they didn't update the version number - it's still 3.5, tho it's a different API model).

    Claude Sonnet 3.5: Coding Prodigy or Benchmark Buster?

    The new Sonnet model shows impressive results on coding benchmarks, surpassing even OpenAI's o1-preview on some. "It absolutely crushes coding benchmarks like Aider and SWE-bench Verified," I exclaimed on the show. But a closer look reveals a more nuanced picture. Mixed results on other benchmarks indicate that Sonnet 3.5 might not be the universal champion some anticipated. A friend of mine who keeps internal benchmarks (which he holds back from publishing) was disappointed, highlighting weaknesses in scientific reasoning and certain writing tasks. Some folks are seeing it be lazier on some full code completions, while the maximum output is now doubled from 4K to 8K tokens! This goes to show, again, that benchmarks don't tell the full story, so we'll wait for LMArena (formerly LMSys Arena) and the vibe checks from across the community. However, it absolutely dominates in code tasks - that much is clear already. Below is a screenshot of the new model on the Aider code-editing benchmark, a fairly reliable way to judge a model's code output; they also have a code-refactoring benchmark.

    Haiku 3.5 and the Vanishing Opus: Anthropic's Cryptic Clues

    Further adding to the intrigue, Anthropic announced Claude 3.5 Haiku! They usually provide immediate access, but Haiku remains elusive, with Anthropic saying it will be available by the end of the month - which is very, very soon. Making things even more curious, their highly anticipated Opus model has seemingly vanished from their website. "They've gone completely silent on 3.5 Opus," Simon Willison (𝕏) noted, mentioning conspiracy theories that this new Sonnet might simply be a rebranded Opus? 🕯️ 🕯️ We'll make a summoning circle for new Opus and update you once it lands (maybe next year).

    Claude Takes Control (Sort Of): Computer Use API and the Dawn of AI Agents (𝕏)

    The biggest bombshell this week? Anthropic's Computer Use. This isn't just about executing code; it's about Claude interacting with computers - clicking buttons, browsing the web, and yes, even ordering pizza! (A minimal sketch of the API shape follows below.)
    Killian Lukas (𝕏), creator of Open Interpreter, returned to ThursdAI to discuss this groundbreaking development. "This stuff of computer use… it's the same argument as for having humanoid robots: the web is human-shaped, and we need AIs to interact with computers and the web the way humans do," Killian explained, illuminating the potential for bridging the digital and physical worlds. Simon, though enthusiastic, provided a dose of realism
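To make the computer use discussion concrete, here's a minimal sketch of what a Computer Use request looked like at launch. The tool type and beta flag strings follow Anthropic's launch docs as best I recall, so verify them against the current documentation:

```python
# Minimal sketch of an Anthropic Computer Use request at launch. The tool
# type and beta flag strings follow the launch docs as I recall them -
# verify against current docs. Requires ANTHROPIC_API_KEY in the env.
import anthropic

client = anthropic.Anthropic()

response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",  # the "new" Sonnet 3.5 discussed above
    max_tokens=1024,
    tools=[{
        "type": "computer_20241022",   # virtual screen/mouse/keyboard tool
        "name": "computer",
        "display_width_px": 1280,
        "display_height_px": 800,
    }],
    messages=[{"role": "user", "content": "Open a browser and find a pizza place near me."}],
    betas=["computer-use-2024-10-22"],
)

# Claude responds with tool_use blocks (take a screenshot, click at x/y,
# type text...) that YOUR agent loop must execute and report back - the
# model never touches the machine directly.
print(response.content)
```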

    1 hr 56 min
  4. Oct 18

    📆 ThursdAI - Oct 17 - Robots, Rockets, and Multi Modal Mania with open source voice cloning, OpenAI new voice API and more AI news

    Hey folks, Alex here from Weights & Biases, and this week has been absolutely bonkers. From robots walking among us to rockets landing on chopsticks (well, almost), the future is feeling palpably closer. And if real-world robots and reusable spaceship boosters weren't enough, the open-source AI community has been cooking, dropping new models and techniques faster than a Starship launch. So buckle up, grab your space helmet and noise-canceling headphones (we'll get to why those are important!), and let's blast off into this week's AI adventures! TL;DR and show notes + links at the end of the post 👇

    Robots and Rockets: A Glimpse into the Future

    I gotta start with the real-world stuff because, let's be honest, it's mind-blowing. We had Robert Scoble (yes, the Robert Scoble) join us after attending the Tesla We, Robot AI event, reporting on Optimus robots strolling through crowds, serving drinks, and generally being ridiculously futuristic. Autonomous robo-taxis were also cruising around, giving us a taste of a driverless future. Robert's enthusiasm was infectious: "It was a vision of the future, and from that standpoint, it succeeded wonderfully." I couldn't agree more. While the market might have had a mini-meltdown (apparently investors aren't ready for robot butlers yet), the sheer audacity of Tesla's vision is exhilarating. These robots aren't just cool gadgets; they represent a fundamental shift in how we interact with technology and the world around us. And they're learning fast: just days after the event, Tesla released a video of Optimus operating autonomously, showcasing the rapid progress they're making.

    And speaking of audacious visions, SpaceX decided to one-up everyone (including themselves) by launching Starship and catching the booster with Mechazilla - their giant robotic chopsticks (okay, technically a launch tower, but you get the picture). Waking up early with my daughter to watch this live was pure magic. As Ryan Carson put it, "It was magical watching this… my kid who's 16… all of his friends are getting their imaginations lit by this experience." That's exactly what we need - more imagination and less doomerism! The future is coming whether we like it or not, and I, for one, am excited.

    Open Source LLMs and Tools: The Community Delivers (Again!)

    Okay, back to the virtual world (for now). This week's open-source scene was electric, with new model releases and tools that have everyone buzzing (and benchmarking like crazy!).

    * Nemotron 70B - Hype vs. Reality: NVIDIA dropped their Nemotron 70B instruct model, claiming impressive scores on certain benchmarks (Arena Hard, AlpacaEval), even suggesting it outperforms GPT-4 and Claude 3.5. As always, we take these claims with a grain of salt (remember Reflection?), and our resident expert, Nisten, was quick to run his own tests. The verdict? Nemotron is good, "a pretty good model to use," but maybe not the giant-killer some hyped it up to be. Still, kudos to NVIDIA for pushing the open-source boundaries. (Hugging Face, Harrison Kingsley evals)
    * Zamba 2 - Hybrid Vigor: Zyphra, in collaboration with NVIDIA, released Zamba 2, a hybrid Sparse Mixture of Experts (SMoE) model. We had Paolo Glorioso, a researcher from Zyphra, join us to break down this unique architecture, which combines the strengths of transformers and state space models (SSMs). He highlighted the memory and latency advantages of SSMs, especially for on-device applications. Definitely worth checking out if you're interested in transformer alternatives and efficient inference.
    * Zyda 2 - Data is King (and Queen): Alongside Zamba 2, Zyphra also dropped Zyda 2, a massive 5-trillion-token dataset - filtered, deduplicated, and ready for LLM training. This kind of open-source data release is a huge boon to the community, fueling the next generation of models. (X) (A quick loading sketch follows below.)
    * Ministral - Pocket-Sized Power: On the one-year anniversary of the iconic Mistral 7B release, Mistral announced two new small models
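For the Zyda 2 release above, here's a rough sketch of peeking at the dataset via streaming, so you don't download 5T tokens up front. The `Zyphra/Zyda-2` repo id and the `text` field are assumptions based on the announcement, and the dataset may require a config name, so check the HF dataset card:

```python
# Rough sketch: stream a few Zyda-2 samples instead of downloading ~5T
# tokens. Repo id and the `text` field are assumptions from the
# announcement - check the HF dataset card for the real schema/configs.
from datasets import load_dataset

ds = load_dataset("Zyphra/Zyda-2", split="train", streaming=True)

for i, example in enumerate(ds.take(3)):
    text = example.get("text", "")
    print(f"--- sample {i}: {text[:200]!r}")
```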

    1 hr 35 min
  5. Oct 10

    📆 ThursdAI - Oct 10 - Two Nobel Prizes in AI!? Meta Movie Gen (and sounds) amazing, Pyramid Flow a 2B video model, 2 new VLMs & more AI news!

    Hey folks, we are finally due for a "relaxing" week in AI: no more HUGE company announcements (if you don't consider Meta Movie Gen huge), no conferences or dev days, and some time for open source projects to shine (while we all wait for Opus 3.5 to shake things up). This week was very multimodal on the show: we covered 2 new video models - one that's tiny and open source, and one massive model from Meta that's aiming for SORA's crown - and 2 new VLMs: one from our friends at REKA that understands videos and audio, while the other, from Rhymes, is Apache 2 licensed. We also had a chat with Kwindla Kramer about the OpenAI RealTime API and its shortcomings, and about voice AIs in general.

    Alright, let's do the TL;DR and show notes, and we'll start with the 2 Nobel Prizes in AI 👇

    * 2 AI Nobel Prizes
      * John Hopfield and Geoffrey Hinton have been awarded the Nobel Prize in Physics.
      * Demis Hassabis, John Jumper & David Baker have been awarded this year's #NobelPrize in Chemistry.
    * Open Source LLMs & VLMs
      * TxT360: a globally deduplicated dataset for LLM pre-training (Blog, Dataset)
      * Rhymes Aria - 25.3B multimodal MoE model that can take image/video inputs, Apache 2 (Blog, HF, Try It)
      * Maitrix and LLM360 launch a new decentralized arena (Leaderboard, Blog)
      * New Gradio 5 with server-side rendering (X)
      * Llamafile now comes with a chat interface and syntax highlighting (X)
    * Big CO LLMs + APIs
      * OpenAI releases MLE-bench - a new Kaggle-focused benchmark for AI agents (Paper, Github)
      * Inflection is still alive - going for enterprise lol (Blog)
      * New Reka Flash 21B (X, Blog, Try It)
    * This week's Buzz
      * We chatted about Cursor, it went viral, there are many tips
      * W&B releases HEMM - benchmarks of text-to-image generation models (X, Github, Leaderboard)
    * Vision & Video
      * Meta presents Movie Gen 30B - image and text to video models (blog, paper)
      * Pyramid Flow - open source img2video model, MIT license (X, Blog, HF, Paper, Github)
    * Voice & Audio
      * Working with OpenAI RealTime Audio - Alex's conversation with Kwindla from trydaily.com
      * Cartesia Sonic goes multilingual (X)
      * Voice hackathon in SF with $20K in prizes (and a remote track) - sign up
    * Tools
      * LM Studio ships with MLX natively (X, Download)
      * UITHUB.com - turn any GitHub repo into one long file for LLMs (a quick usage sketch follows this episode's notes)

    A Historic Week: TWO AI Nobel Prizes!

    This week wasn't just big; it was HISTORIC. As Yam put it, "two Nobel prizes for AI in a single week. It's historic." And he's absolutely spot on! Geoffrey Hinton, often called the "grandfather of modern AI," alongside John Hopfield, was awarded the Nobel Prize in Physics for their foundational work on neural networks - work that paved the way for everything we're seeing today. Think backpropagation, Boltzmann machines - these are concepts that underpin much of modern deep learning. It's about time they got the recognition they deserve! Yoshua Bengio posted about this in a very nice quote:

    "@HopfieldJohn and @geoffreyhinton, along with collaborators, have created a beautiful and insightful bridge between physics and AI. They invented neural networks that were not only inspired by the brain, but also by central notions in physics such as energy, temperature, system dynamics, energy barriers, the role of randomness and noise, connecting the local properties, e.g., of atoms or neurons, to global ones like entropy and attractors. And they went beyond the physics to show how these ideas could give rise to memory, learning and generative models; concepts which are still at the forefront of modern AI research."

    And Hinton's post-Nobel quote? Pure gold: "I'm particularly proud of the fact that one of my students fired Sam Altman." He went on to explain his concerns about OpenAI's apparent shift in focus from safety to profits. Spicy take! It sparked quite a conversation about the
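About that UITHUB.com item in the Tools list: the whole trick is a URL swap - replace `github.com` with `uithub.com` in any repo URL and you get the repo flattened into one long, LLM-ready text file. A minimal sketch (the exact response format is uithub's to change, so treat the details as assumptions):

```python
# Minimal sketch of the uithub.com trick: swap github.com for uithub.com
# and fetch the repo as one long LLM-ready text file. The response format
# is uithub's to change, so treat the details as assumptions.
import urllib.request

repo_url = "https://github.com/openai/tiktoken"  # any public repo
flat_url = repo_url.replace("github.com", "uithub.com")

with urllib.request.urlopen(flat_url) as resp:
    flat_text = resp.read().decode("utf-8", errors="replace")

# One long file: directory tree plus file contents, ready to paste into a prompt.
print(flat_text[:500])
```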

    1 hr 30 min
  6. Oct 4

    📆 ThursdAI - Oct 3 - OpenAI RealTime API, ChatGPT Canvas & other DevDay news (how I met Sam Altman), Gemini 1.5 8B is basically free, BFL makes FLUX 1.1 6x faster, Rev breaks Whisper records...

    Hey, it's Alex. Ok, so my mind is officially blown. I was sure this week was going to be wild, but I didn't expect everyone else besides OpenAI to pile on, exactly on ThursdAI. I'm coming back from Dev Day (number 2) and am still processing; I wanted to actually do a recap by humans, not just the NotebookLM one I posted during the keynote itself (which was awesome and scary in a "will AI replace me as a podcaster" kind of way), and it was incredible to have Simon Willison, who was sitting just behind me most of Dev Day, join me for the recap! But then the news kept coming: OpenAI released Canvas, a whole new way of interacting with ChatGPT; BFL released a new FLUX version that's up to 6x faster; Rev released a Whisper-killer ASR that does diarization; and Google released Gemini 1.5 Flash 8B and said that with prompt caching (which OpenAI now also has, yay) it will cost a whopping $0.01 / Mtok. That's 1 cent per million tokens, for a multimodal model with a 1 million token context window. 🤯

    This whole week was crazy, as last ThursdAI, after finishing the newsletter, I went to meet tons of folks at the AI Tinkerers in Seattle and did a little EvalForge demo (which you can see here), and I wanted to share EvalForge with you as well - it's early but very promising, so feedback and PRs are welcome! WHAT A WEEK. TL;DR for those who want the links, and let's dive in 👇

    * OpenAI - Dev Day Recap (Alex, Simon Willison)
      * Recap of Dev Day
      * RealTime API launched
      * Prompt Caching launched
      * Model Distillation is the new finetune
      * Finetuning 4o with images (Skalski guide)
      * Fireside chat Q&A with Sam
    * Open Source LLMs
      * NVIDIA finally releases NVLM (HF)
    * This week's Buzz
      * Alex demoed EvalForge at the AI Tinkerers event in Seattle (Demo, EvalForge, AI Tinkerers)
    * Big Companies & APIs
      * Google has released Gemini Flash 8B - $0.01 per million cached tokens (X, Blog) - a quick cost sketch follows this episode's notes
    * Voice & Audio
      * Rev breaks SOTA on ASR with Rev ASR and Rev Diarize (Blog, Github, HF)
    * AI Art & Diffusion & 3D
      * BFL releases FLUX 1.1 [pro] - 3x-6x faster than 1.0 and higher quality (was 🫐) (Blog, Try it)

    The Day I Met Sam Altman / Dev Day Recap

    Last Dev Day (my coverage here) was a "singular" day in AI for me, given it also had the "keep AI open source" event with Nous Research and Grimes, and this Dev Day I was delighted to find out that the vibe was completely different, focused less on bombastic announcements or models and more on practical, dev-focused things. This meant that OpenAI cherry-picked folks who actively develop with their tools, and they didn't invite traditional media - only folks like yours truly, @swyx from Latent Space, Rowan from Rundown, Simon Willison, and Dan Shipper; you know, newsletter and podcast folks who actually build! This also meant that many, many OpenAI employees who work on the products and APIs we get to use were there to receive feedback, help folks with prompting, and just generally interact with the devs and build that community.
    I want to shout out my friends: Ilan (who was in the keynote as the strawberry salesman interacting with the RealTime API agent); Will DePue from the SORA team, with whom we had an incredible conversation about the ethics and legality of projects; Christine McLeavey, who runs the Audio team, with whom I shared a video of my daughter crying when ChatGPT didn't understand her; Katia, Kevin, and Romain on the incredible DevEx/DevRel team; and finally, my new buddy Jason, who does infra and was fighting bugs all day and only joined the pub after shipping RealTime to all of us. I've collected all these folks in a convenient and super-high-signal X list here, so definitely give that list a follow if you'd like to tap into their streams.

    For the actual announcements, I've already covered this in my Dev Day post here (which was paid subscribers only, but is now open to all), and Simon did an incredible summary on his Substack as well. The highlights were definitely the new RealTime API that le
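Circling back to the Gemini Flash 8B pricing in the TL;DR, here's the arithmetic as a tiny sketch. The 1 cent per million cached tokens is the number quoted above; the uncached rate is my assumption from the launch pricing, so double-check Google's pricing page:

```python
# Back-of-envelope cost sketch for Gemini 1.5 Flash-8B. The cached rate is
# the "1 cent per million tokens" quoted above; the uncached rate is my
# assumption from launch pricing - verify on Google's pricing page.
CACHED_PER_MTOK = 0.01      # USD per 1M cached input tokens (quoted above)
UNCACHED_PER_MTOK = 0.0375  # USD per 1M uncached input tokens (assumed)

def input_cost(tokens: int, cached: bool) -> float:
    rate = CACHED_PER_MTOK if cached else UNCACHED_PER_MTOK
    return tokens / 1_000_000 * rate

# Feeding the model its full 1M-token context window:
print(f"uncached: ${input_cost(1_000_000, cached=False):.4f}")  # $0.0375
print(f"cached:   ${input_cost(1_000_000, cached=True):.4f}")   # $0.0100
```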

    1 hr 45 min
  7. Oct 1

    OpenAI Dev Day 2024 keynote

    Hey, Alex here. Super quick, as I'm still attending Dev Day, but I didn't want to leave you hanging (if you're a paid subscriber!). I have decided to outsource my job and give the amazing podcasters of NotebookLM the whole transcript of the opening keynote of OpenAI Dev Day. You can see a blog of everything they just posted here. Here's a summary of everything that was announced:

    * Developer-Centric Approach: OpenAI consistently emphasized the importance of developers in their mission to build beneficial AGI. The speaker stated, "OpenAI's mission is to build AGI that benefits all of humanity, and developers are critical to that mission... we cannot do this without you."
    * Reasoning as a New Frontier: The introduction of the o1 series of models marks a significant step towards AI with advanced reasoning capabilities, going beyond the limitations of previous GPT models.
    * Multimodal Capabilities: OpenAI is expanding the potential of AI applications by introducing multimodal capabilities, particularly focusing on real-time speech-to-speech interaction through the new Realtime API.
    * Customization and Fine-Tuning: Empowering developers to customize models is a key theme. OpenAI introduced vision fine-tuning with images and announced easier access to fine-tuning with model distillation tools.
    * Accessibility and Scalability: OpenAI demonstrated a commitment to making AI more accessible and cost-effective for developers through initiatives like price reductions, prompt caching, and model distillation tools.

    Important Ideas and Facts:

    1. The o1 Models:
    * Represent a shift towards AI models with enhanced reasoning capabilities, surpassing previous generations in problem-solving and logical thought processes.
    * o1-preview is positioned as the most powerful reasoning model, designed for complex problems requiring extended thought processes.
    * o1-mini offers a faster, cheaper, and smaller alternative, particularly suited for tasks like code debugging and agent-based applications.
    * Both models demonstrate advanced capabilities in coding, math, and scientific reasoning.
    * OpenAI highlighted the ability of o1 models to work with developers as "thought partners," understanding complex instructions and contributing to the development process.

    Quote: "The shift to reasoning introduces a new shape of AI capability. The ability for our model to scale and correct the process is pretty mind-blowing. So we are resetting the clock, and we are introducing a new series of models under the name o1."

    2. Realtime API: (a minimal connection sketch follows these notes)
    * Enables developers to build real-time AI experiences directly into their applications using WebSockets.
    * Launches with support for speech-to-speech interaction, leveraging the technology behind ChatGPT's advanced voice mode.
    * Offers natural and seamless integration of voice capabilities, allowing for dynamic and interactive user experiences.
    * Showcased the potential to revolutionize human-computer interaction across various domains like driving, education, and accessibility.

    Quote: "You know, a lot of you have been asking about building amazing speech-to-speech experiences right into your apps. Well now, you can."

    3. Vision, Fine-Tuning, and Model Distillation:
    * Vision introduces the ability to use images for fine-tuning, enabling developers to enhance model performance in image understanding tasks.
    * Fine-tuning with Vision opens up opportunities in diverse fields such as product recommendations, medical imaging, and autonomous driving.
    * OpenAI emphasized the accessibility of these features, stating that "fine-tuning with Vision is available to every single developer."
    * Model distillation tools facilitate the creation of smaller, more efficient models by transferring knowledge from larger models like o1 and GPT-4.
    * This approach addresses cost concerns and makes advanced AI capabilities more accessible for a wider range of applications and developers.

    Quote: "With distillation, you take the o
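As referenced in the Realtime API section above, the API is WebSocket-based. Here's a minimal connection sketch; the endpoint, headers, and event names follow the launch docs as best I recall, so treat them as assumptions to verify against the current API reference:

```python
# Minimal sketch of opening a Realtime API session over WebSockets.
# Endpoint, headers, and event names follow the launch docs as I recall
# them - treat as assumptions and check the current API reference.
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

async def main() -> None:
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    async with websockets.connect(URL, extra_headers=headers) as ws:
        # Ask the server for a spoken + text response.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["audio", "text"],
                "instructions": "Say hello to ThursdAI listeners.",
            },
        }))
        # The server streams events back: audio deltas, transcripts, etc.
        async for message in ws:
            print(json.loads(message).get("type"))

asyncio.run(main())
```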

    6 min
  8. Sep 26

    📅 ThursdAI - Sep 26 - 🔥 Llama 3.2 multimodal & meta connect recap, new Gemini 002, Advanced Voice mode & more AI news

    Hey everyone, it's Alex (still traveling!), and oh boy, what a week again! Advanced Voice Mode is finally here from OpenAI, Google updated their Gemini models in a huge way, and then Meta announced multimodal Llamas and on-device mini Llamas (and we also got a "better"? multimodal model from Allen AI called MOLMO!). From the Weights & Biases perspective, our hackathon was a success this weekend, and then I went down to Menlo Park for my first Meta Connect conference, full of news and updates - full recap below.

    Overall, another crazy week in AI, and it seems that everyone is trying to rush something out the door before OpenAI Dev Day next week (which I'll cover as well!). Get ready, folks, because Dev Day is going to be epic!

    TL;DR of all topics covered:

    * Open Source LLMs
      * Meta Llama 3.2 multimodal models (11B & 90B) (X, HF, try free) - a minimal inference sketch follows this episode's notes
      * Meta Llama 3.2 tiny models, 1B & 3B parameters (X, Blog, download)
      * Allen AI releases MOLMO - open SOTA multimodal AI models (X, Blog, HF, Try It)
    * Big CO LLMs + APIs
      * OpenAI releases Advanced Voice Mode to all & Mira Murati leaves OpenAI
      * Google updates Gemini 1.5-Pro-002 and 1.5-Flash-002 (Blog)
    * This week's Buzz
      * Our free course is LIVE - more than 3,000 people have already started learning how to build advanced RAG++
      * Sponsoring tonight's AI Tinkerers in Seattle - if you're in Seattle, come through for my demo
    * Voice & Audio
      * Meta also launches a voice mode (demo)
    * Tools & Others
      * Project ORION - holographic glasses are here! (link)

    Meta gives us new Llamas and AI hardware

    Llama 3.2 Multimodal 11B and 90B

    This was by far the biggest open source release of this week (tho, see below, it may not be the "best"), as a rumored release finally came out, and Meta has given our Llama eyes! Coming in 2 versions (well, 4 if you count the base models, which they also released), these new multimodal Llamas were trained with an adapter architecture, keeping the underlying text models the same and placing a vision encoder - trained and finetuned separately - on top.

    "Llama 90B is among the best open-source multimodal models available" - Meta team at launch

    These new vision adapters were trained on a massive 6 billion images, including synthetic data generated by the 405B model for questions/captions, and finetuned with a subset of 600M high-quality image pairs. Unlike the rest of their models, the Meta team did NOT claim SOTA on these models, and the benchmarks are very good but not the best we've seen (Qwen 2 VL from a couple of weeks ago, and MOLMO from today, beat it on several benchmarks). With text-only inputs, the Llama 3.2 Vision models are functionally the same as the Llama 3.1 text models; this allows the Llama 3.2 Vision models to be a drop-in replacement for Llama 3.1 8B/70B with added image understanding capabilities. It seems these models don't support multi-image or video inputs (unlike Pixtral, for example), nor tool use with images. Meta will also release these models on meta.ai and every other platform, and they cited a crazy 500 million monthly active users of their AI services across all their apps 🤯, which marks them as the leading AI services provider in the world.

    Llama 3.2 Lightweight Models (1B/3B)

    The additional, and maybe more exciting, thing that we got from Meta was the introduction of the small/lightweight models of 1B and 3B parameters.
    Trained on up to 9T tokens, and distilled/pruned from larger models, these are aimed at on-device inference (and by "device" here we mean anything from laptops to mobiles to, soon... glasses? More on this later). In fact, Meta released an iOS demo that runs these models: it takes a group chat, summarizes it, and calls the calendar tool to schedule based on the conversation - and all of this happens on-device, without the info ever leaving to a larger model. They have also been
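As promised in the TL;DR, here's a minimal sketch of image Q&A with the 11B vision model via 🤗 transformers. It mirrors the launch example as I remember it (the `MllamaForConditionalGeneration` class landed in transformers 4.45), so verify against the model card; the image filename is a placeholder:

```python
# Minimal sketch of image Q&A with Llama 3.2 11B Vision via transformers
# (>=4.45), mirroring the launch example as I recall it - check the model
# card for the authoritative snippet. Requires accepting Meta's license.
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("fester_the_skeleton.jpg")  # placeholder local image
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image in one sentence."},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=60)
print(processor.decode(output[0], skip_special_tokens=True))
```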

    1 hr 47 min


