177 episodes

I make videos about machine learning research papers, programming, issues of the AI community, and the broader impact of AI on society.

Twitter: https://twitter.com/ykilcher
Discord: https://discord.gg/4H8xxDF

If you want to support me, the best thing to do is to share out the content :)

If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
SubscribeStar (preferred to Patreon): https://www.subscribestar.com/yannickilcher
Patreon: https://www.patreon.com/yannickilcher
Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq

Yannic Kilcher Videos (Audio Only) • Yannic Kilcher

    • Technology

    Efficient Streaming Language Models with Attention Sinks (Paper Explained)

    #llm #ai #chatgpt

    How does one run inference with a generative autoregressive language model that has been trained with a fixed context size? Streaming LLMs keep the efficiency of windowed attention but avoid its drop in performance by using attention sinks - an interesting phenomenon where the token at position 0 acts as an absorber of "extra" attention.
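
    A minimal sketch of that cache policy (my own illustration, not the authors' code): keep the key/value entries of the first few tokens as attention sinks plus a sliding window of the most recent tokens, and evict everything in between. The class and parameter names are made up for illustration.

    from collections import deque

    class SinkKVCache:
        """Keep the first n_sink KV entries forever plus a rolling window of recent ones."""
        def __init__(self, n_sink=4, window=1020):
            self.n_sink = n_sink                    # attention-sink tokens kept forever
            self.sink = []                          # KV entries of the initial tokens
            self.recent = deque(maxlen=window)      # rolling window of recent KV entries

        def append(self, kv):
            if len(self.sink) < self.n_sink:
                self.sink.append(kv)                # still filling the sink slots
            else:
                self.recent.append(kv)              # deque evicts the oldest entry itself

        def context(self):
            # Attention for the next token is computed over sinks + window,
            # so memory stays constant no matter how long the stream gets.
            return self.sink + list(self.recent)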

    OUTLINE:
    0:00 - Introduction
    1:20 - What is the problem?
    10:30 - The hypothesis: Attention Sinks
    15:10 - Experimental evidence
    18:45 - Streaming LLMs
    20:45 - Semantics or position?
    22:30 - Can attention sinks be learned?
    27:45 - More experiments
    30:10 - Comparison to Big Bird


    Paper: https://arxiv.org/abs/2309.17453

    Abstract:
    Deploying Large Language Models (LLMs) in streaming applications such as multi-round dialogue, where long interactions are expected, is urgently needed but poses two major challenges. Firstly, during the decoding stage, caching previous tokens' Key and Value states (KV) consumes extensive memory. Secondly, popular LLMs cannot generalize to longer texts than the training sequence length. Window attention, where only the most recent KVs are cached, is a natural approach -- but we show that it fails when the text length surpasses the cache size. We observe an interesting phenomenon, namely attention sink, that keeping the KV of initial tokens will largely recover the performance of window attention. In this paper, we first demonstrate that the emergence of attention sink is due to the strong attention scores towards initial tokens as a "sink" even if they are not semantically important. Based on the above analysis, we introduce StreamingLLM, an efficient framework that enables LLMs trained with a finite length attention window to generalize to infinite sequence lengths without any fine-tuning. We show that StreamingLLM can enable Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling with up to 4 million tokens and more. In addition, we discover that adding a placeholder token as a dedicated attention sink during pre-training can further improve streaming deployment. In streaming settings, StreamingLLM outperforms the sliding window recomputation baseline by up to 22.2x speedup. Code and datasets are provided at this https URL.

    Authors: Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, Mike Lewis

    Links:
    Homepage: https://ykilcher.com
    Merch: https://ykilcher.com/merch
    YouTube: https://www.youtube.com/c/yannickilcher
    Twitter: https://twitter.com/ykilcher
    Discord: https://ykilcher.com/discord
    LinkedIn: https://www.linkedin.com/in/ykilcher

    If you want to support me, the best thing to do is to share out the content :)

    If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
    SubscribeStar: https://www.subscribestar.com/yannickilcher
    Patreon: https://www.patreon.com/yannickilcher
    Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
    Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
    Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
    Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

    • 32 min
    Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution (Paper Explained)

    #ai #promptengineering #evolution

    Promptbreeder is a self-improving self-referential system for automated prompt engineering. Give it a task description and a dataset, and it will automatically come up with appropriate prompts for the task. This is achieved by an evolutionary algorithm where not only the prompts, but also the mutation-prompts are improved over time in a population-based, diversity-focused approach.
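
    To make that loop concrete, here is a rough sketch under my own simplifying assumptions: a binary-tournament population of (task-prompt, mutation-prompt) pairs, where the mutation-prompt itself occasionally gets rewritten - the self-referential step. The `llm` and `score` callables and all prompt strings are stand-ins, not the paper's exact setup.

    import random

    def score(llm, task_prompt, train_set):
        # Fitness = accuracy on the training set when the task-prompt is prepended.
        hits = sum(llm(f"{task_prompt}\n{x}") == y for x, y in train_set)
        return hits / len(train_set)

    def promptbreeder(llm, population, train_set, generations=50, p_hyper=0.1):
        # population: list of {"task": str, "mutation": str} dicts
        for _ in range(generations):
            a, b = random.sample(range(len(population)), 2)        # binary tournament
            fa = score(llm, population[a]["task"], train_set)
            fb = score(llm, population[b]["task"], train_set)
            win, lose = (a, b) if fa >= fb else (b, a)
            parent = population[win]
            # Mutate the winning task-prompt using its mutation-prompt ...
            child_task = llm(f"{parent['mutation']}\nINSTRUCTION: {parent['task']}")
            # ... and sometimes mutate the mutation-prompt itself (self-referential step).
            child_mut = parent["mutation"]
            if random.random() < p_hyper:
                child_mut = llm(f"Please improve this mutation prompt:\n{parent['mutation']}")
            population[lose] = {"task": child_task, "mutation": child_mut}   # overwrite the loser
        return max(population, key=lambda p: score(llm, p["task"], train_set))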

    OUTLINE:
    0:00 - Introduction
    2:10 - From manual to automated prompt engineering
    10:40 - How does Promptbreeder work?
    21:30 - Mutation operators
    36:00 - Experimental Results
    38:05 - A walk through the appendix

    Paper: https://arxiv.org/abs/2309.16797

    Abstract:
    Popular prompt strategies like Chain-of-Thought Prompting can dramatically improve the reasoning abilities of Large Language Models (LLMs) in various domains. However, such hand-crafted prompt-strategies are often sub-optimal. In this paper, we present Promptbreeder, a general-purpose self-referential self-improvement mechanism that evolves and adapts prompts for a given domain. Driven by an LLM, Promptbreeder mutates a population of task-prompts, and subsequently evaluates them for fitness on a training set. Crucially, the mutation of these task-prompts is governed by mutation-prompts that the LLM generates and improves throughout evolution in a self-referential way. That is, Promptbreeder is not just improving task-prompts, but it is also improving the mutation-prompts that improve these task-prompts. Promptbreeder outperforms state-of-the-art prompt strategies such as Chain-of-Thought and Plan-and-Solve Prompting on commonly used arithmetic and commonsense reasoning benchmarks. Furthermore, Promptbreeder is able to evolve intricate task-prompts for the challenging problem of hate speech classification.

    Authors: Chrisantha Fernando, Dylan Banarse, Henryk Michalewski, Simon Osindero, Tim Rocktäschel

    Links:
    Homepage: https://ykilcher.com
    Merch: https://ykilcher.com/merch
    YouTube: https://www.youtube.com/c/yannickilcher
    Twitter: https://twitter.com/ykilcher
    Discord: https://ykilcher.com/discord
    LinkedIn: https://www.linkedin.com/in/ykilcher

    If you want to support me, the best thing to do is to share out the content :)

    If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
    SubscribeStar: https://www.subscribestar.com/yannickilcher
    Patreon: https://www.patreon.com/yannickilcher
    Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
    Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
    Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
    Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

    • 46 min
    Retentive Network: A Successor to Transformer for Large Language Models (Paper Explained)

    #ai #retnet #transformers

    Retention is an alternative to Attention in Transformers that can be written both in a parallel and in a recurrent fashion. This means the architecture achieves training parallelism while maintaining low-cost inference. Experiments in the paper look very promising.
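
    To see why the two forms are interchangeable, here is a tiny numpy check for a single retention head (ignoring the rotation and normalization details of the full RetNet layer; the shapes and decay value are arbitrary choices of mine): the parallel form used for training and the recurrent form used for O(1)-per-token inference produce identical outputs.

    import numpy as np

    rng = np.random.default_rng(0)
    T, d = 6, 4                        # sequence length, head dimension
    gamma = 0.9                        # per-head decay factor
    Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))

    # Parallel form (training): O = (Q K^T * D) V with D[n, m] = gamma**(n - m) for n >= m, else 0.
    D = np.tril(gamma ** (np.arange(T)[:, None] - np.arange(T)[None, :]))
    O_parallel = (Q @ K.T * D) @ V

    # Recurrent form (inference): S_n = gamma * S_{n-1} + K_n^T V_n,  O_n = Q_n S_n.
    S = np.zeros((d, d))
    O_recurrent = np.zeros((T, d))
    for n in range(T):
        S = gamma * S + np.outer(K[n], V[n])     # constant-size state, O(1) per token
        O_recurrent[n] = Q[n] @ S

    assert np.allclose(O_parallel, O_recurrent)  # same outputs, different cost profiles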

    OUTLINE:
    0:00 - Intro
    2:40 - The impossible triangle
    6:55 - Parallel vs sequential
    15:35 - Retention mechanism
    21:00 - Chunkwise and multi-scale retention
    24:10 - Comparison to other architectures
    26:30 - Experimental evaluation

    Paper: https://arxiv.org/abs/2307.08621

    Abstract:
    In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection between recurrence and attention. Then we propose the retention mechanism for sequence modeling, which supports three computation paradigms, i.e., parallel, recurrent, and chunkwise recurrent. Specifically, the parallel representation allows for training parallelism. The recurrent representation enables low-cost O(1) inference, which improves decoding throughput, latency, and GPU memory without sacrificing performance. The chunkwise recurrent representation facilitates efficient long-sequence modeling with linear complexity, where each chunk is encoded parallelly while recurrently summarizing the chunks. Experimental results on language modeling show that RetNet achieves favorable scaling results, parallel training, low-cost deployment, and efficient inference. The intriguing properties make RetNet a strong successor to Transformer for large language models. Code will be available at this https URL.

    Authors: Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, Furu Wei


    Links:
    Homepage: https://ykilcher.com
    Merch: https://ykilcher.com/merch
    YouTube: https://www.youtube.com/c/yannickilcher
    Twitter: https://twitter.com/ykilcher
    Discord: https://ykilcher.com/discord
    LinkedIn: https://www.linkedin.com/in/ykilcher

    If you want to support me, the best thing to do is to share out the content :)

    If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
    SubscribeStar: https://www.subscribestar.com/yannickilcher
    Patreon: https://www.patreon.com/yannickilcher
    Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
    Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
    Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
    Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

    • 28 min
    Reinforced Self-Training (ReST) for Language Modeling (Paper Explained)

    #ai #rlhf #llm

    ReST uses a bootstrap-like method to produce its own extended dataset and trains on ever higher-quality subsets of it to improve its own reward. The method allows for reusing the same generated data multiple times and thus has an efficiency advantage over online RL techniques like PPO.
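
    A hand-wavy sketch of that outer loop: a Grow step samples a dataset from the current policy, then several Improve steps fine-tune on increasingly high-reward subsets of that same data. The callables, their signatures, and the threshold schedule are placeholders of mine, not the paper's exact setup.

    def rest(policy, prompts, generate, reward, finetune,
             grow_steps=2, thresholds=(0.0, 0.5, 0.7, 0.9)):
        for _ in range(grow_steps):
            # Grow: sample several continuations per prompt from the current policy.
            samples = [(p, y) for p in prompts for y in generate(policy, p, num_samples=8)]
            scored = [(p, y, reward(p, y)) for p, y in samples]
            # Improve: reuse the same generated data at ever stricter reward thresholds.
            for tau in thresholds:
                subset = [(p, y) for p, y, r in scored if r >= tau]
                policy = finetune(policy, subset)   # offline update on the filtered subset
        return policy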

    Paper: https://arxiv.org/abs/2308.08998

    Abstract:
    Reinforcement learning from human feedback (RLHF) can improve the quality of large language model's (LLM) outputs by aligning them with human preferences. We propose a simple algorithm for aligning LLMs with human preferences inspired by growing batch reinforcement learning (RL), which we call Reinforced Self-Training (ReST). Given an initial LLM policy, ReST produces a dataset by generating samples from the policy, which are then used to improve the LLM policy using offline RL algorithms. ReST is more efficient than typical online RLHF methods because the training dataset is produced offline, which allows data reuse. While ReST is a general approach applicable to all generative learning settings, we focus on its application to machine translation. Our results show that ReST can substantially improve translation quality, as measured by automated metrics and human evaluation on machine translation benchmarks in a compute and sample-efficient manner.

    Authors: Caglar Gulcehre, Tom Le Paine, Srivatsan Srinivasan, Ksenia Konyushkova, Lotte Weerts, Abhishek Sharma, Aditya Siddhant, Alex Ahern, Miaosen Wang, Chenjie Gu, Wolfgang Macherey, Arnaud Doucet, Orhan Firat, Nando de Freitas

    Links:
    Homepage: https://ykilcher.com
    Merch: https://ykilcher.com/merch
    YouTube: https://www.youtube.com/c/yannickilcher
    Twitter: https://twitter.com/ykilcher
    Discord: https://ykilcher.com/discord
    LinkedIn: https://www.linkedin.com/in/ykilcher

    If you want to support me, the best thing to do is to share out the content :)

    If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
    SubscribeStar: https://www.subscribestar.com/yannickilcher
    Patreon: https://www.patreon.com/yannickilcher
    Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
    Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
    Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
    Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

    • 53 min
    [ML News] LLaMA2 Released | LLMs for Robots | Multimodality on the Rise

    #mlnews #llama2 #openai

    Your regular irregular update on the world of Machine Learning.

    References:
    https://twitter.com/ylecun/status/1681336284453781505
    https://ai.meta.com/llama/
    https://about.fb.com/news/2023/07/llama-2-statement-of-support/
    https://247wallst.com/special-report/2023/08/12/this-is-the-biggest-social-media-platform-ranking-the-worlds-largest-networking-sites/4/
    https://github.com/Alpha-VLLM/LLaMA2-Accessory
    https://together.ai/blog/llama-2-7b-32k?s=09&utm_source=pocket_saves
    https://github.com/imoneoi/openchat
    https://twitter.com/lmsysorg/status/1686794639469371393?s=09&t=sS3awkbavmSMSmwp64Ef4A&utm_source=pocket_saves
    https://huggingface.co/lmsys/vicuna-13b-v1.5-16k
    https://blog.google/outreach-initiatives/public-policy/google-microsoft-openai-anthropic-frontier-model-forum/
    https://www.earthdata.nasa.gov/news/impact-ibm-hls-foundation-model?utm_source=pocket_reader
    https://huggingface.co/ibm-nasa-geospatial/Prithvi-100M
    https://ai.meta.com/blog/generative-ai-text-images-cm3leon/
    https://www.deepmind.com/blog/rt-2-new-model-translates-vision-and-language-into-action?utm_source=twitter&utm_medium=social&utm_campaign=rt2
    https://arxiv.org/abs/2307.14334
    https://sites.research.google/med-palm/
    https://open-catalyst.metademolab.com/?utm_source=twitter&utm_medium=organic_social&utm_campaign=opencatalyst&utm_content=card
    https://open-catalyst.metademolab.com/demo
    https://www.anthropic.com/index/claude-2?utm_source=pocket_reader
    https://claude.ai/login
    https://audiocraft.metademolab.com/?utm_source=pocket_saves
    https://venturebeat.com/programming-development/stability-ai-launches-stablecode-an-llm-for-code-generation/
    https://stability.ai/blog/stablecode-llm-generative-ai-coding
    https://twitter.com/JeffDean/status/1686806525862608896?s=09&t=LG2z9ok9QExHbSy0fvBsxA&utm_source=pocket_saves
    https://sites.research.google/open-buildings/
    https://twitter.com/deliprao/status/1687283117873106946?s=09&t=1NmC-B55Z8IuF_HTuGOo7w&utm_source=pocket_saves
    https://arxiv.org/pdf/2308.01320.pdf
    https://twitter.com/javilopen/status/1687795349719547905?utm_source=pocket_saves
    https://research.nvidia.com/labs/par/Perfusion/
    https://ar5iv.labs.arxiv.org/html/2307.14936
    https://www.linkedin.com/feed/update/urn:li:activity:7093463974750371840/?utm_source=pocket_saves
    https://huggingface.co/syzymon/long_llama_3b_instruct
    https://arxiv.org/abs/2307.03170
    https://dynalang.github.io/
    https://github.com/mlfoundations/open_flamingo
    https://twitter.com/akshay_pachaar/status/1687079353937698816?s=09&t=fos8QSCsGEEM6dMflhq0Mg&utm_source=pocket_saves
    https://github.com/OpenBMB/ToolBench
    https://llm-attacks.org/
    https://arstechnica.com/information-technology/2023/07/openai-discontinues-its-ai-writing-detector-due-to-low-rate-of-accuracy/
    https://sites.google.com/view/steve-1
    https://github.com/Shalev-Lifshitz/STEVE-1
    https://erichartford.com/dolphin
    https://huggingface.co/ehartford/dolphin-llama-13b
    https://www.mosaicml.com/blog/long-context-mpt-7b-8k
    https://twitter.com/camenduru/status/1688045780244848640?s=09&t=ubJ2Qtz-TG6Xo3_GMtt2Cw&utm_source=pocket_saves
    https://github.com/IDEA-Research/DWPose
    https://twitter.com/tri_dao/status/1680987577913065472?s=09&t=Q181vFmM6d3nDq-5BwfDeg&utm_source=pocket_saves
    https://tridao.me/publications/flash2/flash2.pdf
    https://thehackernews.com/2023/07/wormgpt-new-ai-tool-allows.html
    https://www.tomshardware.com/news/ai-steals-data-with-keystroke-audio
    https://arxiv.org/pdf/2308.01074.pdf
    https://www.foxnews.com/politics/ai-test-flight-air-force-unmanned-wingman-aircraft
    https://www.theverge.com/2023/8/2/23817406/white-castle-soundhound-ai-sliders
    https://www.google.com/search?sca_esv=556495916&q=food+delivery+bot+kicked&tbm=vid&source=lnms&sa=X&ved=2ahUKEwjZ6PDPrdmAAxUThf0HHWzrBGgQ0pQJegQIChAB&cshid=1691920142432720&biw=2327&bih=1180&dpr=2.2
    https://www.youtube.com/watch?v=--n_NhmXnfc
    https://www.thesun.co.uk/tech/20793591/coop-delivery-robots-cambridge-kicked-by-workers-tiktok/

    • 44 min
    How Cyber Criminals Are Using ChatGPT (w/ Sergey Shykevich)

    #cybercrime #chatgpt #security

    An interview with Sergey Shykevich, Threat Intelligence Group Manager at Check Point, about how models like ChatGPT have impacted the realm of cyber crime.

    https://threatmap.checkpoint.com/

    Links:
    Homepage: https://ykilcher.com
    Merch: https://ykilcher.com/merch
    YouTube: https://www.youtube.com/c/yannickilcher
    Twitter: https://twitter.com/ykilcher
    Discord: https://ykilcher.com/discord
    LinkedIn: https://www.linkedin.com/in/ykilcher

    If you want to support me, the best thing to do is to share out the content :)

    If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this):
    SubscribeStar: https://www.subscribestar.com/yannickilcher
    Patreon: https://www.patreon.com/yannickilcher
    Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq
    Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2
    Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m
    Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n

    • 29 min
