46 episodes

I make videos about machine learning research papers, programming, issues of the AI community, and the broader impact of AI on society.

Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar (preferred to Patreon): https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq

Yannic Kilcher Videos (Audio Only), by Yannic Kilcher

    • Technology


    [ML News] Roomba Avoids Poop | Textless NLP | TikTok Algorithm Secrets | New Schmidhuber Blog

    #schmidhuber #tiktok #roomba



    Your regularly irregular update on what's happening in the world of Machine Learning.



    OUTLINE:

    0:00 - Intro

    0:15 - Sponsor: Weights & Biases

    1:55 - ML YouTuber reaches 100k subscribers

    2:40 - Facebook AI pushes Textless NLP

    5:30 - Schmidhuber blog post: I invented everything

    7:55 - TikTok algorithm rabbitholes users

    10:45 - Roomba learns to avoid poop

    11:50 - AI can spot art forgeries

    14:55 - DeepMind's plans to separate from Google

    16:15 - Cohere raises 40M

    16:55 - US Judge rejects AI inventor on patent

    17:55 - Altman: GPT-4 not much bigger than GPT-3

    18:45 - Salesforce CodeT5

    19:45 - DeepMind Reinforcement Learning Lecture Series

    20:15 - WikiGraphs Dataset

    20:40 - LiveCell Dataset

    21:00 - SpeechBrain

    21:10 - AI-generated influencer gains 100 sponsorships

    22:20 - AI News Questions

    23:15 - AI hiring tools reject millions of valid applicants



    Sponsor: Weights & Biases

    https://wandb.me/start



    References:

    Facebook AI creates Textless NLP

    https://ai.facebook.com/blog/textless...

    https://speechbot.github.io/pgslm/?fb...



    Schmidhuber invented everything

    https://people.idsia.ch/~juergen/most...



    How TikTok's algorithm works

    https://www.wsj.com/video/series/insi...



    Roomba learns to avoid poop

    https://edition.cnn.com/2021/09/09/te...



    Amateur develops fake art detector

    https://blogs.nvidia.com/blog/2021/08...

    https://spectrum.ieee.org/this-ai-can...



    DeepMind's plan to break away from Google

    https://www.businessinsider.com/deepm...

    https://archive.ph/8s5IK



    Cohere raises USD 40M

    https://www.fastcompany.com/90670635/...

    https://cohere.ai/



    US judge refuses AI patent

    https://www.theregister.com/2021/09/0...



    Sam Altman on GPT-4

    https://www.reddit.com/r/OpenAI/comme...



    Salesforce releases CodeT5

    https://blog.einstein.ai/codet5/



    DeepMind RL lecture series

    https://deepmind.com/learning-resourc...



    WikiGraphs Dataset

    https://github.com/deepmind/deepmind-...



    LiveCell Dataset

    https://sartorius-research.github.io/...

    https://www.nature.com/articles/s4159...



    SpeechBrain Library

    https://speechbrain.github.io/



    AI generated influencer lands 100 sponsorships

    https://www.allkpop.com/article/2021/...



    AI News Questions

    https://www.forbes.com/sites/tomtaull...

    https://mindmatters.ai/2021/09/isnt-i...

    https://fortune.com/2021/09/07/deepmi...

    https://www.forbes.com/sites/anniebro...

    https://www.cnbctv18.com/views/view-a...

    https://www.kcrw.com/culture/shows/li...

    https://techcrunch.com/2021/09/07/ai-...

    https://www.forbes.com/sites/bernardm...



    AI hiring tools mistakenly reject millions of applicants

    https://www.theverge.com/2021/9/6/226...



    Links:

    TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick

    YouTube: https://www.youtube.com/c/yannickilcher

    Twitter: https://twitter.com/ykilcher

    Discord: https://discord.gg/4H8xxDF

    BitChute: https://www.bitchute.com/channel/yann...

    Minds: https://www.minds.com/ykilcher

    Parler: https://parler.com/profile/YannicKilcher

    LinkedIn: https://www.linkedin.com/in/ykilcher

    BiliBili: https://space.bilibili.com/1824646584



    If you want to support me, the best thing to do is to share out the content :)

    • 25 min
    Celebrating 100k Subscribers! (w/ Channel Statistics)

    #yannickilcher #machinelearning #100k



    OUTLINE:

    0:00 - 100k!

    1:00 - Announcements & Thanks

    3:55 - Channel Statistics



    Links:

    TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick

    YouTube: https://www.youtube.com/c/yannickilcher

    Twitter: https://twitter.com/ykilcher

    Discord: https://discord.gg/4H8xxDF

    BitChute: https://www.bitchute.com/channel/yann...

    Minds: https://www.minds.com/ykilcher

    Parler: https://parler.com/profile/YannicKilcher

    LinkedIn: https://www.linkedin.com/in/yannic-ki...

    BiliBili: https://space.bilibili.com/1824646584



    If you want to support me, the best thing to do is to share out the content :)



    If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):

    SubscribeStar: https://www.subscribestar.com/yannick...

    Patreon: https://www.patreon.com/yannickilcher

    Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq

    Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2

    Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m

    Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

    • 9 min
    [ML News] AI predicts race from X-Ray | Google kills HealthStreams | Boosting Search with MuZero

    #mlnews #schmidhuber #muzero



    Your regular updates on what's happening in the ML world!



    OUTLINE:

    0:00 - Intro

    0:15 - Sponsor: Weights & Biases

    1:45 - Google shuts down health streams

    4:25 - AI predicts race from blurry X-Rays

    7:35 - Facebook labels black men as primates

    11:05 - Distill papers on Graph Neural Networks

    11:50 - Jürgen Schmidhuber to lead KAUST AI Initiative

    12:35 - GitHub brief on DMCA notices for source code

    14:55 - Helpful Reddit Threads

    19:40 - Simple Tricks to improve Transformers

    20:40 - Apple's Unconstrained Scene Generation

    21:40 - Common Objects in 3D dataset

    22:20 - WarpDrive Multi-Agent RL framework

    23:10 - My new paper: Boosting Search Agents & MuZero

    25:15 - Can AI detect depression from speech?



    References:

    Google shuts down Health Streams

    https://techcrunch.com/2021/08/26/goo...



    AI predicts race from X-Rays

    https://www.iflscience.com/technology...

    https://arxiv.org/ftp/arxiv/papers/21...



    Facebook labels black men as primates

    https://www.nytimes.com/2021/09/03/te...

    https://en.wikipedia.org/wiki/Human



    Distill articles on GNNs

    https://distill.pub/2021/gnn-intro/

    https://distill.pub/2021/understandin...



    Jürgen Schmidhuber leads KAUST AI initiative

    https://people.idsia.ch/~juergen/kaus...



    GitHub issues court brief on code DMCAs

    https://github.blog/2021-08-31-vague-...



    Useful Reddit Threads

    https://www.reddit.com/r/MachineLearn...

    https://www.reddit.com/r/MachineLearn...

    https://www.reddit.com/r/MachineLearn...

    https://www.reddit.com/r/MachineLearn...



    Tricks to improve Transformers

    https://arxiv.org/pdf/2108.12284.pdf



    Unconstrained Scene Generation

    https://apple.github.io/ml-gsn/



    Common Objects in 3D dataset

    https://ai.facebook.com/blog/common-o...



    WarpDrive Multi-Agent RL framework

    https://blog.einstein.ai/warpdrive-fa...



    Boosting Search Engines / MuZero Code

    https://arxiv.org/abs/2109.00527

    https://github.com/google-research/go...

    https://github.com/google-research/la...



    Can AI detect depression?

    https://venturebeat.com/2021/08/31/ai...



    Links:

    TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick

    YouTube: https://www.youtube.com/c/yannickilcher

    Twitter: https://twitter.com/ykilcher

    Discord: https://discord.gg/4H8xxDF

    BitChute: https://www.bitchute.com/channel/yann...

    Minds: https://www.minds.com/ykilcher

    Parler: https://parler.com/profile/YannicKilcher

    LinkedIn: https://www.linkedin.com/in/yannic-ki...

    BiliBili: https://space.bilibili.com/1824646584



    If you want to support me, the best thing to do is to share out the content :)



    If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):

    SubscribeStar: https://www.subscribestar.com/yannick...

    Patreon: https://www.patreon.com/yannickilcher

    Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq

    Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2

    Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m

    Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

    • 27 min
    ∞-former: Infinite Memory Transformer (aka Infty-Former / Infinity-Former, Research Paper Explained)

    #inftyformer #infinityformer #transformer



    Vanilla Transformers are excellent sequence models, but they suffer from harsh constraints on the length of the sequences they can process. Several attempts have been made to extend the Transformer's sequence length, but few have gone beyond a constant-factor improvement. This paper presents a method, based on continuous attention mechanisms, to attend to an unbounded past by representing it as a continuous signal rather than a sequence. This enables the ∞-former to enrich the current context with global information, which improves performance on long-range dependencies in sequence tasks. Further, the paper introduces the concept of sticky memories, which highlights past events of particular importance and elevates their representation in the long-term memory.
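
    As a rough sketch of the mechanism described above (the RBF parameterization, ridge fit, and numerical integration below are simplifications of mine; the paper's closed-form version differs in detail): the past is compressed into a small set of basis coefficients, and a query then reads from it through a Gaussian density instead of a softmax over positions.

    ```python
    import numpy as np

    def fit_continuous_signal(X, n_basis=16, ridge=1e-3):
        """Regress a (seq_len, d) sequence onto n_basis Gaussian RBFs on [0, 1]."""
        L, d = X.shape
        t = np.linspace(0, 1, L)[:, None]               # token positions in [0, 1]
        centers = np.linspace(0, 1, n_basis)[None, :]
        Psi = np.exp(-((t - centers) ** 2) / (2 * 0.05 ** 2))   # (L, n_basis)
        # Ridge regression: coefficients B such that Psi @ B approximates X.
        B = np.linalg.solve(Psi.T @ Psi + ridge * np.eye(n_basis), Psi.T @ X)
        return B, centers.ravel()

    def continuous_read(B, centers, mu, sigma):
        """Expectation of the signal under a Gaussian attention density N(mu, sigma^2)."""
        ts = np.linspace(0, 1, 512)
        p = np.exp(-((ts - mu) ** 2) / (2 * sigma ** 2))
        p /= p.sum()                                    # normalized density on the grid
        Psi = np.exp(-((ts[:, None] - centers[None, :]) ** 2) / (2 * 0.05 ** 2))
        return p @ (Psi @ B)                            # E_p[X(t)], one context vector

    # 1000 past tokens are stored as only 16 coefficient rows.
    B, c = fit_continuous_signal(np.random.randn(1000, 8))
    ctx = continuous_read(B, c, mu=0.9, sigma=0.05)
    ```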



    OUTLINE:

    0:00 - Intro & Overview

    1:10 - Sponsor Spot: Weights & Biases

    3:35 - Problem Statement

    8:00 - Continuous Attention Mechanism

    16:25 - Unbounded Memory via concatenation & contraction

    18:05 - Does this make sense?

    20:25 - How the Long-Term Memory is used in an attention layer

    27:40 - Entire Architecture Recap

    29:30 - Sticky Memories by Importance Sampling

    31:25 - Commentary: Pros and cons of using heuristics

    32:30 - Experiments & Results



    Paper: https://arxiv.org/abs/2109.00301



    Sponsor: Weights & Biases

    https://wandb.me/start



    Abstract:

    Transformers struggle when attending to long contexts, since the amount of computation grows with the context length, and therefore they cannot model long-term memories effectively. Several variations have been proposed to alleviate this problem, but they all have a finite memory capacity, being forced to drop old information. In this paper, we propose the ∞-former, which extends the vanilla transformer with an unbounded long-term memory. By making use of a continuous-space attention mechanism to attend over the long-term memory, the ∞-former's attention complexity becomes independent of the context length. Thus, it is able to model arbitrarily long contexts and maintain "sticky memories" while keeping a fixed computation budget. Experiments on a synthetic sorting task demonstrate the ability of the ∞-former to retain information from long sequences. We also perform experiments on language modeling, by training a model from scratch and by fine-tuning a pre-trained language model, which show benefits of unbounded long-term memories.
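
    To make the abstract's "fixed computation budget" claim concrete, here is a hedged sketch of the concatenate-and-contract step (the naming and the uniform sampling grid are my assumptions; it reuses fit_continuous_signal from the sketch above): the old signal is sampled back to a short sequence, new tokens are appended, and the result is re-fit to the same number of basis functions, so the memory never grows with context length.

    ```python
    def contract_memory(B, centers, new_tokens, n_keep=64):
        # Reconstruct a fixed number of samples from the current continuous memory.
        # (The "sticky memories" variant would place these samples according to
        # past attention mass instead of uniformly, keeping important events sharp.)
        ts = np.linspace(0, 1, n_keep)
        Psi = np.exp(-((ts[:, None] - centers[None, :]) ** 2) / (2 * 0.05 ** 2))
        old = Psi @ B                                   # (n_keep, d) approximate past
        # Append the new tokens and re-fit to the same basis size: the representation
        # stays at n_basis coefficients no matter how much history it has absorbed.
        return fit_continuous_signal(np.vstack([old, new_tokens]), n_basis=B.shape[0])

    B, c = contract_memory(B, c, np.random.randn(32, 8))
    ```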



    Authors: Pedro Henrique Martins, Zita Marinho, André F. T. Martins



    Links:

    TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick

    YouTube: https://www.youtube.com/c/yannickilcher

    Twitter: https://twitter.com/ykilcher

    Discord: https://discord.gg/4H8xxDF

    BitChute: https://www.bitchute.com/channel/yann...

    Minds: https://www.minds.com/ykilcher

    Parler: https://parler.com/profile/YannicKilcher

    LinkedIn: https://www.linkedin.com/in/yannic-ki...

    BiliBili: https://space.bilibili.com/1824646584



    If you want to support me, the best thing to do is to share out the content :)



    If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):

    SubscribeStar: https://www.subscribestar.com/yannick...

    Patreon: https://www.patreon.com/yannickilcher

    Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq

    Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2

    Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m

    Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

    • 36 min
    [ML News] Blind Chess AI Competition | Graph NNs for traffic | AI gift suggestions

    #mlnews #chess #neurips



    OUTLINE:

    0:00 - Intro

    0:30 - Reconnaissance Blind Chess NeurIPS 2021 Competition

    3:40 - Colab Pro no longer top priority for GPUs

    4:45 - DeepMind uses Graph NNs to do traffic prediction

    6:00 - Helpful Libraries: Isaac Gym, Differentiable Human, LVIS, BEHAVIOR

    10:25 - Cerebras Wafer Scale Engine Cluster

    12:15 - AI Voice Synthesis for Val Kilmer

    14:20 - Can AI give thoughtful gifts?



    References:

    Reconnaissance Blind Chess NeurIPS 2021 Competition

    https://rbc.jhuapl.edu/

    https://rbc.jhuapl.edu/gameRules



    Colab Pro no longer top priority

    https://www.reddit.com/r/MachineLearn...



    Google Maps ETA prediction using Graph Neural Networks

    https://arxiv.org/pdf/2108.11482.pdf



    Isaac Gym: RL simulator on GPU

    https://arxiv.org/abs/2108.10470

    https://sites.google.com/view/isaacgy...

    https://developer.nvidia.com/isaac-gym



    Cerebras Cluster for massive AI models

    https://www.wired.com/story/cerebras-...



    Helpful Libraries / Datasets

    https://nimblephysics.org/docs/human-...

    https://www.lvisdataset.org/

    https://arxiv.org/pdf/2108.03332.pdf



    AI Voice Reconstruction

    https://www.washingtonpost.com/techno...



    Can AI make thoughtful gifts?

    https://www.forbes.com/sites/anniebro...



    Links:

    TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick

    YouTube: https://www.youtube.com/c/yannickilcher

    Twitter: https://twitter.com/ykilcher

    Discord: https://discord.gg/4H8xxDF

    BitChute: https://www.bitchute.com/channel/yann...

    Minds: https://www.minds.com/ykilcher

    Parler: https://parler.com/profile/YannicKilcher

    LinkedIn: https://www.linkedin.com/in/yannic-ki...

    BiliBili: https://space.bilibili.com/1824646584



    If you want to support me, the best thing to do is to share out the content :)



    If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):

    SubscribeStar: https://www.subscribestar.com/yannick...

    Patreon: https://www.patreon.com/yannickilcher

    Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq

    Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2

    Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m

    Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

    • 17 min
    ALiBi - Train Short, Test Long: Attention with linear biases enables input length extrapolation

    #alibi #transformers #attention



    Transformers are essentially set models that need additional inputs to make sense of sequence data. The most widespread of these are position encodings or position embeddings, which add sequence index information in various forms. However, this limits the resulting model: it cannot run inference on sequences longer than those it was trained on, as it would encounter unfamiliar position encodings. ALiBi solves this by proposing simple, fixed linear biases as position information, adding negligible overhead in time and memory; surprisingly, the resulting model can handle inference on sequences many times as long as its training sequences.
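
    As a minimal illustration of the idea (not the authors' reference implementation; the head count and tensor shapes are invented for the example), the linear bias is simply added to the attention logits, so no position information has to be learned or stored:

    ```python
    import math
    import torch

    def alibi_slopes(n_heads: int) -> torch.Tensor:
        # Head-specific slopes form a geometric sequence; for n heads the paper
        # uses 2^(-8/n), 2^(-16/n), ..., 2^(-8).
        return torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])

    def alibi_attention(q, k, v):
        # q, k, v: (n_heads, seq_len, d_head) -- shapes are illustrative.
        n_heads, seq_len, d_head = q.shape
        scores = q @ k.transpose(-2, -1) / math.sqrt(d_head)
        # Bias proportional to query-key distance, instead of adding position
        # embeddings to the token embeddings.
        pos = torch.arange(seq_len)
        rel = pos[None, :] - pos[:, None]               # entry (i, j) = j - i
        bias = alibi_slopes(n_heads)[:, None, None] * rel
        causal = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), 1)
        scores = (scores + bias).masked_fill(causal, float("-inf"))
        return torch.softmax(scores, dim=-1) @ v

    q, k, v = torch.randn(3, 8, 16, 4).unbind(0)        # 8 heads, 16 tokens
    out = alibi_attention(q, k, v)
    ```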



    OUTLINE:

    0:00 - Intro & Overview

    1:40 - Position Encodings in Transformers

    4:55 - Sinusoidal Position Encodings

    11:50 - ALiBi Position Encodings

    20:50 - How to choose the slope parameter

    23:55 - Experimental Results

    29:10 - Comments & Conclusion



    Paper: https://ofir.io/train_short_test_long...

    Code: https://github.com/ofirpress/attentio...



    Abstract:

    Since the introduction of the transformer model by Vaswani et al. (2017), a fundamental question remains open: how to achieve extrapolation at inference time to longer sequences than seen during training? We first show that extrapolation can be improved by changing the position representation method, though we find that existing proposals do not allow efficient extrapolation. We introduce a simple and efficient method, Attention with Linear Biases (ALiBi), that allows for extrapolation. ALiBi does not add positional embeddings to the word embeddings; instead, it biases the query-key attention scores with a term that is proportional to their distance. We show that this method allows training a 1.3 billion parameter model on input sequences of length 1024 that extrapolates to input sequences of length 2048, achieving the same perplexity as a sinusoidal position embedding model trained on inputs of length 2048, 11% faster and using 11% less memory. ALiBi’s inductive bias towards recency allows it to outperform multiple strong position methods on the WikiText-103 benchmark. Finally, we provide analysis of ALiBi to understand why it leads to better performance.
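
    Because the bias in the sketch above is computed from the sequence length at run time, nothing ties the model to a trained maximum length; the extrapolation the abstract describes falls out for free (toy demonstration, continuing the snippet above):

    ```python
    # Run the same attention on sequences 4x longer than before: the distance
    # bias is rebuilt for the new length, so no unfamiliar encodings appear.
    q, k, v = torch.randn(3, 8, 64, 32).unbind(0)
    print(alibi_attention(q, k, v).shape)               # torch.Size([8, 64, 32])
    print(alibi_slopes(8))                              # 1/2, 1/4, ..., 1/256
    ```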



    Authors: Ofir Press, Noah A. Smith, Mike Lewis



    Links:

    TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick

    YouTube: https://www.youtube.com/c/yannickilcher

    Twitter: https://twitter.com/ykilcher

    Discord: https://discord.gg/4H8xxDF

    BitChute: https://www.bitchute.com/channel/yann...

    Minds: https://www.minds.com/ykilcher

    Parler: https://parler.com/profile/YannicKilcher

    LinkedIn: https://www.linkedin.com/in/yannic-ki...

    BiliBili: https://space.bilibili.com/1824646584



    If you want to support me, the best thing to do is to share out the content :)



    If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):

    SubscribeStar: https://www.subscribestar.com/yannick...

    Patreon: https://www.patreon.com/yannickilcher

    Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq

    Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2

    Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m

    Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

    • 31 min
