9 episodes

I'm Gil Elbaz, Co-founder and CTO of Datagen. In this podcast, I speak with interesting computer vision thinkers and practitioners. I ask the big questions that touch on the issues and challenges that ML and CV engineers deal with every day. On the way, I hope you uncover a new subject or gain a different perspective, as well as enjoying engaging conversation. It’s about much more than the technical processes – it’s about people, journeys, and ideas. Turn up the volume, insights inside.

Unboxing AI: The Podcast for Computer Vision Engineers

    • Science


    YOLO: Building AI with an Open-Source Community


    ABSTRACT

    Our guest this episode is Glenn Jocher, CEO and founder of Ultralytics, the company that brought you YOLOv5 and YOLOv8. Gil and Glenn discuss how to build an open-source community on GitHub, the history of YOLO, and even particle physics. They also talk about the progress of AI, diffusion and transformer models, and the importance of simulated synthetic data today. The first episode of season 2 is full of stimulating conversation about the applications of YOLO and the impact of open source on the AI community.

    TOPICS & TIMESTAMPS

    0:00 Introduction
    2:03 First Steps in Machine Learning
    9:40 Neutrino Particles and Simulating Neutrino Detectors
    14:18 Ultralytics
    17:36 GitHub
    21:09 History of YOLO
    25:28 YOLO for Keypoints
    29:00 Applications of YOLO
    30:48 Transformer and Diffusion Models for Detection
    35:00 Speed Bottleneck
    37:23 Simulated Synthetic Data Today
    42:08 Sentience of AGI and Progress of AI
    46:42 ChatGPT, CLIP and LLaMA Open Source Models
    50:04 Advice for Next Generation CV Engineers

    LINKS & RESOURCES

    LinkedIn

    Twitter

    Google Scholar

    Ultralytics

    GitHub

    National Geospatial-Intelligence Agency

    Neutrino

    Antineutrino

    Joseph Redmon

    Ali Farhadi

    Enrico Fermi

    Kashmir World Foundation

    R-CNN

    Fast R-CNN

    LLaMA model

    MS COCO

    GUEST BIO

    Glenn Jocher is the founder and CEO of Ultralytics, a company focused on enabling developers to create practical, real-time computer vision capabilities, with a mission to make AI easy to develop. He has built one of the largest developer communities in the machine learning space on GitHub, with over 50,000 stars for his YOLOv5 and YOLOv8 releases. These are among the leading packages for edge-device computer vision, focusing on object classification, detection, and segmentation at real-time speeds with limited compute resources. Glenn previously worked at the United States National Geospatial-Intelligence Agency and published the first ever global antineutrino map.


    • 52 min
    Synthetic Data: Simulation & Visual Effects at Scale


    ABSTRACT

    Gil Elbaz speaks with Tadas Baltrusaitis, who recently released the seminal paper DigiFace-1M: 1 Million Digital Face Images for Face Recognition. Tadas is a true believer in synthetic data and shares his deep knowledge of the subject, along with insights on the current state of the field and what CV engineers need to know. Join Gil as they discuss morphable models, multimodal learning, domain gaps, edge cases and more.

    TOPICS & TIMESTAMPS

    0:00 Introduction

    2:06 Getting started in computer science

    3:40 Inferring mental states from facial expressions

    7:16 Challenges of facial expressions

    8:40 OpenFace

    10:46 MATLAB to Python

    13:17 Multimodal Machine Learning

    15:52 Multimodals and Synthetic Data

    16:54 Morphable Models

    19:34 HoloLens

    22:07 Skill Sets for CV Engineers

    25:25 What is Synthetic Data?

    27:07 GANs and Diffusion Models

    31:24 Fake It Till You Make It

    35:25 Domain Gaps

    36:32 Long Tails (Edge Cases)

    39:42 Training vs. Testing

    41:53 Future of NeRF and Diffusion Models

    48:26 Avatars and VR/AR

    50:39 Advice for Next Generation CV Engineers

    51:58 Season One Wrap-Up

    LINKS & RESOURCES

    Tadas Baltrusaitis

    LinkedIn

    GitHub

    Google Scholar

    Fake It Till You Make It

    Video 

    Github

    DigiFace-1M

    A 3D Morphable Eye Region Model for Gaze Estimation

    HoloLens

    Multimodal Machine Learning: A Survey and Taxonomy 

    3D Face Reconstruction with Dense Landmarks

    OpenFace

    OpenFace 2.0

    Dr. Rana el Kaliouby

    Dr. Louis-Philippe Morency

    Peter Robinson

    Jamie Shotton

    Errol Wood

    Affectiva

    GUEST BIO

    Tadas Baltrusaitis is a principal scientist in the Microsoft Mixed Reality and AI lab in Cambridge, UK, where he leads the human synthetics team. He recently co-authored the groundbreaking paper DigiFace-1M, a dataset of 1 million synthetic images for facial recognition. Tadas is also a co-author of Fake It Till You Make It: Face Analysis in the Wild Using Synthetic Data Alone, among other outstanding papers. His PhD research focused on automatic facial expression analysis in difficult real-world settings, and he was a postdoctoral associate at Carnegie Mellon University, where his primary research lay in the automatic understanding of human behavior, expressions and mental states using computer vision.


    • 53 min
    SLAM and the Evolution of Spatial AI


    Host Gil Elbaz welcomes Andrew J. Davison, the father of SLAM. Andrew and Gil dive right into how SLAM started and how it has evolved. They discuss Spatial AI and what it means, along with global belief propagation. Of course, they talk about robotics, how it is being impacted by new technologies like NeRF, and the current state of the art.

    Timestamps and Topics

    [00:00:00] Intro

    [00:02:07] Early Research Leading to SLAM

    [00:04:49] Why SLAM

    [00:08:20] Computer Vision Based SLAM

    [00:09:18] MonoSLAM Breakthrough

    [00:13:47] Applications of SLAM
    [00:16:27] Modern Versions of SLAM
    [00:21:50] Spatial AI
    [00:26:04] Implicit vs. Explicit Scene Representations
    [00:34:32] Impact on Robotics
    [00:38:46] Reinforcement Learning (RL)
    [00:43:10] Belief Propagation Algorithms for Parallel Compute
    [00:50:51] Connection to Cellular Automata
    [00:55:55] Recommendations for the Next Generation of Researchers
    Interesting Links:

    Andrew Blake

    Hugh Durrant-Whyte

    John Leonard

    Steven J. Lovegrove

    Alex Mordvintsev

    Prof. David Murray

    Richard Newcombe

    Renato Salas-Moreno 

    Andrew Zisserman

    A visual introduction to Gaussian Belief Propagation

    GitHub: Gaussian Belief Propagation

    A Robot Web for Distributed Many-Device Localisation

    In-Place Scene Labelling and Understanding with Implicit Scene Representation

    Video 

    Video: Robotic manipulation of objects using SOTA

    Andrew Reacting to NeRF in 2020

    Cellular automata

    Neural cellular automata

    Dyson Robotics

    Guest Bio

    Andrew Davison is a professor of Robot Vision in the Department of Computing at Imperial College London. He is also the founder and director of the Dyson Robotics Lab. Andrew pioneered the cornerstone algorithm SLAM (Simultaneous Localisation and Mapping) and has continued to develop it in substantial ways since then. His research focuses on improving and enhancing SLAM in terms of dynamics, scale, detail level, efficiency, and semantic understanding of real-time video. SLAM has since evolved into the whole new domain of "Spatial AI", leveraging neural implicit representations and a suite of cutting-edge methods to create a fully coherent representation of the real world from video.


    • 1 hr 2 min
    The Next Frontier: Computer Vision on 3D Data - with Or Litany, Sr. Research Scientist, NVIDIA


    Gil Elbaz hosts Or Litany, a senior research scientist at NVIDIA. They discuss the impact of 3D on computer vision and where it’s going in the near future. They also talk about how industry and academia influence each other. Or speaks about the future of 3D generative models, NeRF, and how multimodal models are changing computer vision. Together, Gil and Or explore the best ways to succeed in the field of AI.

    TOPICS & TIMESTAMPS

    [0:34] Intro

    [2:01] Starting his journey 

    [5:03] Heat transfer equation in graphics 

    [10:21] Multimodal changing Computer Vision 

    [17:47] Why is 3D Important?

    [23:17] 3D Generative Models in the next years

    [26:25] Neural Rendering

    [29:39] Connection between images/video & 3D

    [31:39] Temporal Data 

    [33:45] Autonomous Driving & Simulation

    [36:27] Prof. Leonidas Guibas

    [41:56] NeRF & Editing 3D information 

    [46:02] Manipulation of 3D representations

    [52:23]  Future of NeRF

    [1:02:31] Google 

    [1:06:03] Meta [FAIR] experience

    [1:09:57] Nvidia 

    [1:10:58] Sanja Fidler

    [1:16:38] Consciousness 

    [1:21:31] Career Tips for Computer Vision Engineers

    Or Litany:

    LinkedIn

    Google Scholar

    Github

    Interesting links:

    Alex Bronstein

    Angel Chang

    Sanja Fidler

    Leonidas Guibas

    Judy Hoffman

    Justin Johnson

    Fei-Fei Li

    Ameesh Makadia

    Manolis Savva

    Srinath Sridhar

    Charles Ruizhongtai Qi

    PointNet

    Red-Black Tree

    Nvidia

    Two minute papers

    The Three-Body Problem

    EG3D: Efficient Geometry-aware 3D Generative Adversarial Networks  



    GUEST BIO

    Our guest is Or Litany, currently a senior research scientist at NVIDIA. He earned his BSc in physics and mathematics from the Hebrew University and his master’s degree from the Technion. He then went on to do his PhD at Tel Aviv University, where he worked on analyzing 3D data with graph neural networks under Professor Alex Bronstein. For his postdoc, Or attended Stanford University, studying under the legendary Professor Leonidas Guibas, and also worked as part of FAIR, Meta’s research group, where he pushed the cutting edge of 3D data analysis. Or is an extremely accomplished researcher whose work focuses on 3D deep learning for scene understanding, point cloud analysis, and shape analysis. In 2023, Or will join the Technion as an assistant professor.


    • 1 hr 24 min
    Body Models Driving the Age of the Avatar – with Michael J. Black, Director, Perceiving Systems Department, Max Planck Institute for Intelligent Systems


    In this episode of Unboxing AI, I host Michael J. Black from the Max Planck Institute. We speak about body models, his journeys in industry and academia, representing all human body types, and the age of the avatar. Michael talks about the early days of computer vision, his experience commercializing body models through his startup, Body Labs, and how the metaverse and our avatars will revolutionize our everyday lives.

    Episode transcript and more at UnboxingAI.show

    TOPICS & TIMESTAMPS

    00:39 Guest Intro
    01:41 What are body models and why are they so useful?
    04:17 Human interpretability - important or not?
    05:32 Real use cases for body models
    10:54 History of body model development leading to SMPL
    19:21 Body model development beyond SMPL: MANO, FLAME, SMPL-X, and more
    22:11 Edge cases: dealing with unique body shapes
    24:45 Early days of computer vision
    27:37 Working at Xerox PARC
    30:00 Shifting to academia
    31:30 The vision for Perceiving Systems at MPI-IS
    34:15 Innovation and team structure at Perceiving Systems
    37:40 Perceiving Systems - similarities to a startup
    40:38 Founding Body Labs
    45:30 Body Labs' Acquisition by Amazon
    47:24 Distinguished Amazon Scholar role
    49:03 About Meshcapade
    50:05 What is the metaverse?
    50:56 The age of the avatar
    56:32 Career Tips for Computer Vision Engineers

    LINKS AND RESOURCES

    Michael J. Black @ MPI-IS
    LinkedIn
    Google Scholar
    Twitter
    YouTube

    Papers at CVPR 2022
    BEV
    OSSO
    EMOCA

    Body Models
    SMPL
    FLAME
    MANO
    SMPL-X
    STAR
    SCAPE

    About Meshcapade
    Website
    GitHub
    Instagram

    About Perceiving Systems
    Overview Video
    Website

    GUEST BIO

    Our guest is Michael J. Black, one of the founding directors of the Max Planck Institute for Intelligent Systems in Tübingen, Germany. He completed his PhD in computer science at Yale University, his postdoc at the University of Toronto, and has co-authored over 200 peer-reviewed papers to date. His research focuses on understanding humans and their behavior in video, working at the boundary of computer vision, machine learning, and computer graphics. His work on realistic 3D human body models such as SMPL has been widely used in both academia and industry, and in 2017, the start-up he co-founded to commercialize these technologies was acquired by Amazon. Today, Michael and his teams at MPI are developing exciting new capabilities in computer vision that will be important for the future of 3D avatars, the metaverse and beyond.


    • 59 min
    Solving Autonomous Driving At Scale – With Vijay Badrinarayanan, VP of AI, Wayve


    In this episode of Unboxing AI, meet Vijay Badrinarayanan, the VP of AI at Wayve, and learn about Wayve’s end-to-end machine learning approach to self-driving. Along the way, Vijay shares what it was like working for Magic Leap in the early days, and relates the research journey that led to SegNet.

    TOPICS & TIMESTAMPS

    00:47 Guest Intro

    02:38 Academia & Classic Computer Vision

    08:56 PostDoc @ Cambridge - Road scene segmentation

    18:42 Technical Challenges Faced During Early Deep Computer Vision

    20:24 Meeting Alex Kendall; SegNet

    25:15 Transition from Academia to Production Computer Vision at Magic Leap

    27:09 Deep Eye-Gaze Estimation at Magic Leap

    33:21 Joining Wayve

    36:09 AV 1.0: First-gen autonomy

    40:08 On Tesla, LiDARs and their unique approach to AV

    46:37 Wayve's AV 2.0 Approach

    48:42 Programming By Data / Data-as-Code

    51:02 Addressing the Long Tail Problem in AV

    53:13 Powering AV 2.0 with Simulation

    58:30 Re-simulation, Closing the Loop & Testing Neural Networks

    1:01:44 The Future of AI and Advanced Approaches

    1:11:50 Are there other 2.0s? Next industries to revolutionize

    1:13:48 Next Steps for Wayve

    1:14:59 Human-level AI

    1:16:35 Career Tips for Computer Vision Engineers



    LINKS AND RESOURCES

     - On The Guest - Vijay Badrinarayanan

    LinkedIn: https://www.linkedin.com/in/vijay-badrinarayanan-6578692/

    Twitter: https://twitter.com/vijaycivs

    Google Scholar: https://scholar.google.com/citations?user=WuJckpkAAAAJ



     - About Wayve

    https://wayve.ai/

    https://sifted.eu/articles/wayve-autonomous-driving/

    AV 2.0 Technical Thesis - Reimagining an autonomous vehicle: https://arxiv.org/abs/2108.05805



     - SegNet

    Vijay and Alex Kendall, together with Roberto Cipolla, released a revolutionary paper on segmentation, presenting a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation.

    https://ieeexplore.ieee.org/abstract/document/7803544/



     - Good NeRF explainer here:

    https://datagen.tech/guides/synthetic-data/neural-radiance-field-nerf/



     - DALL-E 2

    https://openai.com/dall-e-2/



     - StyleGAN2

    https://github.com/NVlabs/stylegan2



    GUEST BIO

    Vijay Badrinarayanan is VP of AI at Wayve, a company pioneering AI technology that enables autonomous vehicles to drive in complex urban environments. He has been at the forefront of deep learning and artificial intelligence (AI) research and product development since the inception of the deep learning era. His joint research on semantic segmentation, conducted at the University of Cambridge with Alex Kendall, CEO of Wayve, is among the most highly cited publications in deep learning. As Director of Deep Learning and AI at Magic Leap Inc. in California, he led R&D teams that delivered impactful, first-of-its-kind deep neural network products for power-constrained mixed reality headsets. At Wayve, Vijay aims to deepen the company’s investment in deep learning to develop the end-to-end learned brains behind its self-driving technology. He is building a vision and learning team in Mountain View, CA, focused on actively researched AI topics such as representation learning, simulation intelligence, and combined vision and language models, with a view to making meaningful product impact and bringing this cutting-edge approach to AVs to market.



    FULL TRANSCRIPT AND MORE AT:

    https://unboxingai.show/

    • 1 hr 19 min
