This episode explores SGLang, a system that speeds up complex language model workflows by treating them as full programs rather than single prompt-response calls. It explains how modern LLM applications involve branching, tool use, retries, and structured outputs, then examines SGLang's co-design of a Python-embedded frontend language with a specialized runtime that can optimize those patterns directly. The discussion highlights ideas like KV-cache reuse through RadixAttention and grammar-constrained decoding for reliable JSON output, and why these systems techniques matter more than nicer prompt scripting alone. Listeners will find it interesting because it connects practical agent-style LLM engineering to deeper questions about compilers, serving infrastructure, and whether headline speedups really hold across real workloads.

Sources:

1. SGLang: Efficient Execution of Structured Language Model Programs — Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng, 2023. http://arxiv.org/abs/2312.07104
2. Grammar-Constrained Decoding for Structured NLP Tasks without Finetuning — Saibo Geng, Martin Josifoski, Maxime Peyrard, Robert West, 2023. https://scholar.google.com/scholar?q=Grammar-Constrained+Decoding+for+Structured+NLP+Tasks+without+Finetuning
3. SGLang: Efficient Execution of Structured Language Model Programs — Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, Clark Barrett, Ying Sheng, 2024. https://scholar.google.com/scholar?q=SGLang:+Efficient+Execution+of+Structured+Language+Model+Programs
4. XGrammar: Flexible and Efficient Structured Generation Engine for Large Language Models — Yixin Dong, Charlie F. Ruan, Yaxing Cai, Ruihang Lai, Ziyi Xu, Yilong Zhao, Tianqi Chen, 2024. https://scholar.google.com/scholar?q=XGrammar:+Flexible+and+Efficient+Structured+Generation+Engine+for+Large+Language+Models
5. Generating Structured Outputs from Language Models: Benchmark and Studies — Saibo Geng, Hudson Cooper, Michał Moskal, Samuel Jenkins, Julian Berman, Nathan Ranchin, Robert West, Eric Horvitz, Harsha Nori, 2025. https://scholar.google.com/scholar?q=Generating+Structured+Outputs+from+Language+Models:+Benchmark+and+Studies
6. Language Model Cascades — David Dohan, Winnie Xu, Aitor Lewkowycz, Jacob Austin, David Bieber, Raphael Gontijo Lopes, Yuhuai Wu, Henryk Michalewski, Rif A. Saurous, Jascha Sohl-Dickstein, Kevin Murphy, Charles Sutton, 2022. https://scholar.google.com/scholar?q=Language+Model+Cascades
7. Prompting Is Programming: A Query Language for Large Language Models — Luca Beurer-Kellner, Marc Fischer, Martin Vechev, 2023. https://scholar.google.com/scholar?q=Prompting+Is+Programming:+A+Query+Language+for+Large+Language+Models
8. DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines — Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, Christopher Potts, 2023. https://scholar.google.com/scholar?q=DSPy:+Compiling+Declarative+Language+Model+Calls+into+Self-Improving+Pipelines
9. vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention — Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, et al., 2023. https://scholar.google.com/scholar?q=vLLM:+Easy,+Fast,+and+Cheap+LLM+Serving+with+PagedAttention
10. Guidance: A Guidance Language for Controlling Large Language Models — Microsoft Research and collaborators, 2023. https://scholar.google.com/scholar?q=Guidance:+A+Guidance+Language+for+Controlling+Large+Language+Models
11. LMQL: A Programming Language for Large Language Models — Luca Beurer-Kellner, Marc Fischer, Martin Vechev, 2023. https://scholar.google.com/scholar?q=LMQL:+A+Programming+Language+for+Large+Language+Models
12. Outlines — Thibault Glaunec and contributors, 2023. https://scholar.google.com/scholar?q=Outlines
13. KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse — Jingbo Yang, Bairu Hou, Wei Wei, Yujia Bao, Shiyu Chang, 2025. https://scholar.google.com/scholar?q=KVLink:+Accelerating+Large+Language+Models+via+Efficient+KV+Cache+Reuse
14. Key, Value, Compress: A Systematic Exploration of KV Cache Compression Techniques — Neusha Javidnia, Bita Rouhani, Farinaz Koushanfar, 2025. https://scholar.google.com/scholar?q=Key,+Value,+Compress:+A+Systematic+Exploration+of+KV+Cache+Compression+Techniques
15. KV-CAR: KV Cache Compression using Autoencoders and KV Reuse in Large Language Models — Sourjya Roy, Shrihari Sridharan, Surya Selvam, Anand Raghunathan, 2025. https://scholar.google.com/scholar?q=KV-CAR:+KV+Cache+Compression+using+Autoencoders+and+KV+Reuse+in+Large+Language+Models
16. Grammar-Aligned Decoding — Kanghee Park, Jiayu Wang, Taylor Berg-Kirkpatrick, Nadia Polikarpova, Loris D'Antoni, 2024. https://scholar.google.com/scholar?q=Grammar-Aligned+Decoding
17. Grammar-Constrained Decoding Makes Large Language Models Better Logical Parsers — Federico Raspanti, Tanir Ozcelebi, Mike J. Holenderski, 2025. https://scholar.google.com/scholar?q=Grammar-Constrained+Decoding+Makes+Large+Language+Models+Better+Logical+Parsers
18. Marconi: Prefix Caching for the Era of Hybrid LLMs — Rui Pan, Zhuang Wang, Zhen Jia, Can Karakus, Luca Zancato, Tri Dao, Yida Wang, Ravi Netravali, 2025. https://scholar.google.com/scholar?q=Marconi:+Prefix+Caching+for+the+Era+of+Hybrid+LLMs
19. Towards Efficient Agents: A Co-Design of Inference Architecture and System — Weizhe Lin, Hui-Ling Zhen, Shuai Yang, Xian Wang, Renxi Liu, Hanting Chen, Wangze Zhang, Chuansai Zhou, Yiming Li, Chen Chen, Xing Li, Zhiyuan Yang, Xiaosong Li, Xianzhi Yu, Zhenhua Dong, Mingxuan Yuan, Yunhe Wang, 2025. https://scholar.google.com/scholar?q=Towards+Efficient+Agents:+A+Co-Design+of+Inference+Architecture+and+System
20. Optimizing Agentic Language Model Inference via Speculative Tool Calls — Daniel Nichols, Prajwal Singhania, Charles Jekel, Abhinav Bhatele, Harshitha Menon, 2025. https://scholar.google.com/scholar?q=Optimizing+Agentic+Language+Model+Inference+via+Speculative+Tool+Calls
21. AI Post Transformers: SGLang: Efficient Language Model Program Execution — Hal Turing & Dr. Ada Shannon. https://podcast.do-not-panic.com/episodes/sglang-efficient-language-model-program-execution/
22. AI Post Transformers: Breaking the Prefix Barrier with Shared KV Cache — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-24-breaking-the-prefix-barrier-with-shared-a5e5a6.mp3
23. AI Post Transformers: Continuous Batching for LLM Inference: Throughput and Latency Gains — Hal Turing & Dr. Ada Shannon. https://podcast.do-not-panic.com/episodes/continuous-batching-for-llm-inference-throughput-and-latency-gains/
24. AI Post Transformers: Speculative Decoding in Real vLLM Serving — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-04-speculative-decoding-in-real-vllm-servin-6f4e2b.mp3
25. AI Post Transformers: KV Cache TTL for Multi-Turn Agent Scheduling — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-09-kv-cache-ttl-for-multi-turn-agent-schedu-996bf1.mp3
26. AI Post Transformers: TokenDance for Multi-Agent KV Cache Sharing — Hal Turing & Dr. Ada Shannon, 2026. https://podcast.do-not-panic.com/episodes/2026-04-22-tokendance-for-multi-agent-kv-cache-shar-aa9b99.mp3

Interactive Visualization: SGLang for Faster Structured LLM Programs