Hello SundAI - our world through the lens of AI

Roger Basler de Roca

"Hello SundAI - Our World Through the Lens of AI," is your twice-weekly dive into how artificial intelligence shapes our digital landscape. Hosted by Roger and SundAI the AI, this podcast brings you practical tips, cutting-edge tools, and insightful interviews every Sunday and Wednesday morning. Whether you're a seasoned tech enthusiast or just starting to explore the digital domain, tune in to discover innovative ways to get things done and propel yourself forward in a world increasingly driven by AI. Our hashtag is: #helloSundai

  1. 08/24/2025

    The Illusion of Thinking: Decoding AI's Reasoning Limits

    In this episode, we enter the world of Large Reasoning Models (LRMs). We explore advanced AI systems such as OpenAI's o1/o3, DeepSeek-R1, and Claude 3.7 Sonnet Thinking—models that generate detailed "thinking processes" (Chain-of-Thought, CoT) with built-in self-reflection before answering. These systems promise a new era of problem-solving, yet their true capabilities, scaling behavior, and limitations remain only partially understood.

    By conducting systematic investigations in controlled puzzle environments—including the Tower of Hanoi, Checker Jumping, River Crossing, and Blocks World—we uncover both the strengths and surprising weaknesses of LRMs (see the sketch below for how such puzzles scale). These environments allow precise control over task complexity while avoiding the data-contamination issues that often plague established benchmarks in mathematics and coding.

    A striking finding: LRMs face a complete accuracy collapse beyond certain complexity thresholds. Paradoxically, their reasoning effort (measured in "thinking tokens") first increases with complexity, only to decline after a point—even when token budgets are sufficient. We identify three distinct performance regimes: low-complexity tasks, where standard Large Language Models (LLMs) still outperform LRMs; medium-complexity tasks, where LRMs' additional "thinking" shows a clear advantage; and high-complexity tasks, where both LLMs and LRMs collapse entirely.

    Another challenge is "overthinking": on simpler problems, LRMs often find correct solutions early but continue to pursue false alternatives, wasting computational resources. Even more surprising is their weakness in exact computation: they fail to leverage explicit algorithms, even when provided, and show inconsistent reasoning across different puzzle types.

    This episode invites you to rethink assumptions about AI's capacity for generalizable reasoning. What does it truly mean for a machine to "think" under increasing complexity? And how should these insights shape the next generation of AI design and deployment?

    Source: Shojaee, P., Mirzadeh, I., Alizadeh, K., Horton, M., Bengio, S., & Farajtabar, M. (2025). The illusion of thinking: Understanding the strengths and limitations of reasoning models via the lens of problem complexity. arXiv preprint. https://arxiv.org/abs/2506.06941

    Disclaimer: This podcast is generated by Roger Basler de Roca (contact) with the use of AI. The voices are artificially generated and the discussion is based on public research data. I do not claim any ownership of the presented material; it is for educational purposes only. https://rogerbasler.ch/en/contact/
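
    For a concrete feel of how these puzzle environments scale, here is a minimal Python sketch (illustrative only, not taken from the paper) that generates the optimal Tower of Hanoi move sequence; the required number of moves grows as 2^n - 1 with the number of disks n, which is what lets task complexity be dialed up precisely.

    def hanoi_moves(n, source="A", target="C", spare="B"):
        """Return the optimal move list for n disks (always 2**n - 1 moves)."""
        if n == 0:
            return []
        return (hanoi_moves(n - 1, source, spare, target)
                + [(source, target)]
                + hanoi_moves(n - 1, spare, target, source))

    # Complexity grows exponentially with the number of disks, so even a
    # modest increase in n quickly outgrows any fixed "thinking" budget.
    for n in range(1, 11):
        print(n, len(hanoi_moves(n)))  # 1, 3, 7, ..., 1023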

    16 min
  2. 06/09/2025

    AI Cannot Think: When AI Reasoning Models Hit Their Limit

    Join us as we dive into a groundbreaking study that systematically investigates the strengths and fundamental limitations of Large Reasoning Models (LRMs), the cutting-edge AI systems behind advanced "thinking" mechanisms like Chain-of-Thought with self-reflection. Moving beyond traditional, often contaminated, mathematical and coding benchmarks, this research uses controllable puzzle environments like the Tower of Hanoi, Checker Jumping, River Crossing, and Blocks World to precisely manipulate problem complexity and offer unprecedented insights into how LRMs "think".

    You'll discover surprising findings (a small verifier sketch follows this list, showing how such puzzles can be scored exactly):

    Three distinct performance regimes: Standard Large Language Models (LLMs) surprisingly outperform LRMs on low-complexity tasks; LRMs demonstrate an advantage on medium-complexity tasks thanks to their additional "thinking" processes; but crucially, both model types experience a complete accuracy collapse on high-complexity tasks.

    A counter-intuitive scaling limit: LRMs' reasoning effort, measured in token usage, increases up to a certain complexity point, then paradoxically declines despite an adequate token budget. This suggests a fundamental inference-time scaling limitation in their reasoning capabilities relative to problem complexity.

    Inconsistencies and limitations in exact computation: LRMs struggle to benefit from being explicitly given algorithms, failing to improve even when provided with step-by-step instructions for puzzles like the Tower of Hanoi. They also exhibit inconsistent reasoning across puzzle types, performing many correct moves in one scenario (e.g., Tower of Hanoi) but failing much earlier in another (e.g., River Crossing), which points to problems with generalizable reasoning rather than mere problem-solving strategy discovery.

    The "overthinking" phenomenon: For simpler problems, LRMs often find correct solutions early in their reasoning trace but then continue to inefficiently explore incorrect alternatives, wasting computational effort.

    This episode challenges prevailing assumptions about LRM capabilities and raises crucial questions about their true reasoning potential, paving the way for future investigations into more robust AI reasoning.

    Disclaimer: This podcast is generated by Roger Basler de Roca (contact) with the use of AI. The voices are artificially generated and the discussion is based on public research data. I do not claim any ownership of the presented material; it is for educational purposes only. https://rogerbasler.ch/en/contact/
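
    As a minimal illustration (my own sketch, not code from the study) of why such puzzles make clean benchmarks: the snippet below verifies a proposed Tower of Hanoi move sequence exactly, so a model's answer can be scored move by move without relying on answer keys that might have leaked into training data.

    def verify_hanoi(moves, n_disks):
        """Check whether a list of (source, target) moves solves an n-disk Tower of Hanoi."""
        pegs = {"A": list(range(n_disks, 0, -1)), "B": [], "C": []}  # bottom-to-top
        for src, dst in moves:
            if not pegs[src]:
                return False  # moving from an empty peg
            if pegs[dst] and pegs[dst][-1] < pegs[src][-1]:
                return False  # larger disk placed on a smaller one
            pegs[dst].append(pegs[src].pop())
        return pegs["C"] == list(range(n_disks, 0, -1))

    # The optimal 7-move solution for three disks passes the check.
    print(verify_hanoi([("A", "C"), ("A", "B"), ("C", "B"), ("A", "C"),
                        ("B", "A"), ("B", "C"), ("A", "C")], 3))  # True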

    16 min
  3. 04/20/2025

    AI finally passed the Turing Test

    Has AI finally passed the Turing Test? Dive into the groundbreaking news from UC San Diego, where research published in March 2025 claims that GPT-4.5 convinced human judges it was a real person 73% of the time, even more often than actual humans in the same test. But what does this historic moment truly signify for the future of artificial intelligence?

    This podcast explores the original concept of the Turing Test, proposed by Alan Turing in 1950 as a practical measure of a machine's ability to exhibit intelligent behavior indistinguishable from that of a human through conversation. We'll examine the rigorous controlled study that led to GPT-4.5's alleged success, involving 284 participants and five-minute conversations.

    We'll delve into what passing the Turing Test actually means – and, crucially, what it doesn't. Is this the dawn of true AI consciousness or Artificial General Intelligence (AGI)? The sources clarify that the Turing Test specifically measures conversational ability and human likeness in dialogue, not sentience or general intelligence.

    Discover the key factors that contributed to this breakthrough, including massive increases in model parameters and training data, sophisticated prompting (especially the use of a "persona prompt"; see the hypothetical sketch below), learning from human feedback, and models designed for conversation. We will also discuss the intriguing finding that human judges often identified someone as human when they lacked knowledge or made mistakes, showing a shift in our perception of AI.

    However, the podcast will also address the criticisms and limitations of the Turing Test. We'll explore the argument that it is merely a test of functionality and does not necessarily indicate genuine human-like thinking. We'll also touch on alternative tests for AI that aim to assess creativity, problem-solving, and other aspects of intelligence beyond conversation, such as the Metzinger Test and the Lovelace 2.0 Test.

    Finally, we will consider the profound implications of AI systems convincingly simulating human conversation, including the economic impact on roles requiring human-like interaction, the potential effects on social relationships, and the ethical considerations around deception and manipulation. Join us to unpack this milestone in computing history and discuss what the blurring lines between human and machine communication mean for our society, economy, and lives.

    Source: https://theconversation.com/chatgpt-just-passed-the-turing-test-but-that-doesnt-mean-ai-is-now-as-smart-as-humans-253946

    Disclaimer: This podcast is generated by Roger Basler de Roca (contact) with the use of AI. The voices are artificially generated and the discussion is based on public research data. I do not claim any ownership of the presented material; it is for educational purposes only. https://rogerbasler.ch/en/contact/
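
    To make the idea of a "persona prompt" concrete, here is a small, purely hypothetical Python sketch; the wording is invented for illustration and is not the prompt used in the UC San Diego study.

    # A hypothetical persona prompt, loosely inspired by the idea described above.
    persona_prompt = (
        "You are a 23-year-old student who spends a lot of time online. "
        "Write casually, in short messages, with occasional typos and slang. "
        "Avoid showing unusually broad knowledge, and admit it when you're unsure."
    )

    # Packaged in the common system/user chat-message format.
    conversation = [
        {"role": "system", "content": persona_prompt},
        {"role": "user", "content": "hey, what did you get up to this weekend?"},
    ]

    for message in conversation:
        print(f"{message['role']}: {message['content']}")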

    17 min
  4. 04/15/2025

    Google's approach to AGI - artificial general intelligence

    This episode unpacks a 145-page paper from Google DeepMind outlining their strategic approach to managing the risks and responsibilities of AGI development.

    1. Defining AGI and 'Exceptional AGI': We begin by clarifying what DeepMind means by AGI: an AI system capable of performing any task a human can. More specifically, they introduce the notion of 'Exceptional AGI' – a system whose performance matches or exceeds that of the top 1% of professionals across a wide range of non-physical tasks. (Note: DeepMind is a British AI company, founded in 2010 and acquired by Google in 2014.)

    2. Understanding the Risk Landscape: AGI, while full of potential, also presents serious risks – from systemic harm to outright existential threats. DeepMind identifies four core areas of concern: abuse (intentional misuse by actors with harmful intent), misconduct (reckless or unethical use), errors (unexpected failures or flaws in design), and structural risks (long-term unintended societal or economic consequences). Among these, abuse and misconduct receive particular attention due to their immediacy and severity.

    3. Mitigating AGI Threats: DeepMind's Technical Strategy: To counter these dangers, DeepMind proposes a multi-layered technical safety strategy with a twofold goal: to prevent access to powerful capabilities by bad actors, and to better understand and predict AI behaviour as systems grow in autonomy and complexity. This approach integrates mechanisms for oversight, constraint, and continual evaluation.

    4. Debate Within the AI Field: The path, however, is far from settled. Within the AI research community, there is ongoing skepticism regarding both the feasibility of AGI and the assumptions underlying safety interventions. Critics argue that AGI remains too vaguely defined to justify such extensive safeguards, while others warn that dismissing the risks could be equally shortsighted.

    5. Timelines and Trajectories: When might we see AGI? DeepMind's report considers the emergence of 'Exceptional AGI' plausible before the end of this decade – that is, before 2030. While no exact date is predicted, the implication is clear: preparation cannot wait.

    This episode offers a rare look behind the scenes at how a leading AI lab is thinking about, and preparing for, the future of artificial general intelligence. It also raises the broader question: how should societies respond when technology begins to exceed traditional human limits?

    Source: https://storage.googleapis.com/deepmind-media/DeepMind.com/Blog/evaluating-potential-cybersecurity-threats-of-advanced-ai/An_Approach_to_Technical_AGI_Safety_Apr_2025.pdf

    Disclaimer: This podcast is generated by Roger Basler de Roca (contact) with the use of AI. The voices are artificially generated and the discussion is based on public research data. I do not claim any ownership of the presented material; it is for educational purposes only. https://rogerbasler.ch/en/contact/

    26 min
  5. 03/16/2025

    The Byte Latent Transformer (BLT): A Token-Free Approach to LLMs

    The Byte Latent Transformer (BLT) is a novel byte-level large language model (LLM) that processes raw byte data by dynamically grouping bytes into entropy-based patches, eliminating the need for tokenization. Key points (a small illustrative sketch of entropy-based patching follows below):

    Dynamic Patching: BLT segments data into variable-length patches based on entropy, allocating more computation where complexity is higher—unlike token-based models that treat all tokens equally.

    Efficiency & Robustness: BLT matches tokenized LLM performance while improving inference efficiency (using up to 50% fewer FLOPs) and enhancing robustness to noisy inputs and character-level tasks.

    Scalability: Scaling studies up to 8B parameters and 4T training bytes show that BLT achieves better scaling trends at a fixed inference cost than token-based models.

    Architecture (Entropy-Based Patching): A small byte-level model estimates entropy to determine patch boundaries, allocating more compute to complex sequences (e.g., word beginnings).

    Performance Gains: BLT achieves parity with Llama 3 in FLOP-controlled training and outperforms it on character-level tasks and low-resource translation.

    Patch Size Scaling: Larger patches (e.g., 8 bytes) improve scaling efficiency by reducing latent transformer compute needs, enabling larger model sizes within a fixed inference budget.

    "Byte-ifying" Tokenizers: Pre-trained token-based models (e.g., Llama 3.1) can initialize BLT's transformer, leading to faster convergence and improved performance on specific tasks.

    BLT introduces a fundamentally new approach to LLMs, leveraging raw bytes instead of tokens for more efficient, scalable, and robust language modeling.

    This is Hello Sunday - the podcast in digital business where we look back and ahead, so you can focus on next week's challenges. Thank you for listening to Hello Sunday - make sure to subscribe and spread the word, so others can be inspired too. Hello SundAI - our world through the lens of AI.

    Disclaimer: This podcast is generated by Roger Basler de Roca (contact) with the use of AI. The voices are artificially generated and the discussion is based on public research data. I do not claim any ownership of the presented material; it is for educational purposes only. https://rogerbasler.ch/en/contact/
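
    As a rough intuition for entropy-based patching, here is a small Python sketch; it is not the BLT implementation, and it substitutes simple bigram counts over a reference corpus for the paper's learned byte-level entropy model.

    import math
    from collections import Counter

    def next_byte_entropy(context: bytes, corpus: bytes) -> float:
        """Crude stand-in for a learned entropy model: entropy of the next byte,
        estimated from bigram counts in a reference corpus."""
        follow = Counter(corpus[i + 1] for i in range(len(corpus) - 1)
                         if corpus[i] == context[-1])
        total = sum(follow.values())
        if total == 0:
            return 8.0  # no evidence: assume maximal uncertainty (bits per byte)
        return -sum((c / total) * math.log2(c / total) for c in follow.values())

    def entropy_patches(data: bytes, corpus: bytes, threshold: float = 2.0):
        """Start a new patch whenever predicted next-byte entropy exceeds the
        threshold, so hard-to-predict regions get more (smaller) patches."""
        patches, current = [], bytearray()
        for i, byte in enumerate(data):
            if current and next_byte_entropy(data[:i], corpus) > threshold:
                patches.append(bytes(current))
                current = bytearray()
            current.append(byte)
        if current:
            patches.append(bytes(current))
        return patches

    corpus = b"the quick brown fox jumps over the lazy dog " * 50
    print(entropy_patches(b"the quick brown fox", corpus))
    # [b'the ', b'quick ', b'brown ', b'fox'] - new patches begin at word starts,
    # mirroring how BLT spends more compute where the next byte is hard to predict.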

    15 min
