Beyond ChatGPT and Gemini: Anthropic's Claude and the $4 billion Amazon investment. How AI industry benchmarks such as LMSYS Arena Elo and MMLU (Measuring Massive Multitask Language Understanding) are constructed, what they measure, and how to use them to evaluate LLMs. Solo episode.
Anthropic's Claude
https://claude.ai [Note: I am not sponsored by Anthropic]
LMSYS Leaderboard
https://chat.lmsys.org/?leaderboard
To stay in touch, sign up for our newsletter at https://www.superprompt.fm
Information
- Show
- Frequency: Updated weekly
- Published: May 27, 2024, 10:00 AM UTC
- Length: 11 min
- Season: 1
- Episode: 24
- Rating: All ages
