[QA] Evaluation of Large Language Models via Coupled Token Generation

This paper argues for controlling for randomization when evaluating large language models. It shows that coupled autoregressive generation can yield different model rankings than vanilla autoregressive generation, while requiring fewer samples.

https://arxiv.org/abs/2502.01754

YouTube: https://www.youtube.com/@ArxivPapers

TikTok: https://www.tiktok.com/@arxiv_papers

Apple Podcasts: https://podcasts.apple.com/us/podcast/arxiv-papers/id1692476016

Spotify: https://podcasters.spotify.com/pod/show/arxiv-papers

Show: Arxiv Papers

Frequency: Updated Daily

Published: February 5, 2025 at 5:00 AM UTC

Length: 8 min

Rating: Clean
