Keeping you up to date with the latest trends and best-performing architectures in this fast-evolving field of computer science.
Selecting papers by comparative results, citations, and influence, we educate you on the latest research.
Consider supporting us on Patreon.com/PapersRead for feedback and ideas.
Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers
Large pretrained language models have shown surprising In-Context Learning (ICL) ability. With a few demonstration input-label pairs, they can predict the label for an unseen input without additional parameter updates. Despite the great success in performance, the working mechanism of ICL still remains an open problem. In order to better understand how ICL works, this paper explains language models as meta-optimizers and understands ICL as a kind of implicit finetuning.
2022: Damai Dai, Yutao Sun, Li Dong, Y. Hao, Zhifang Sui, Furu Wei
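One way to make the "implicit finetuning" reading concrete is the algebraic duality between an accumulated gradient-style update on a linear layer and unnormalized linear attention. The sketch below is only an illustration under that simplification (a single linear layer, no softmax); it is not the paper's code, and every name and number in it (`apply_updated_weights`, `apply_via_linear_attention`, the toy vectors) is hypothetical.

```python
# Illustrative sketch. Assumptions: one linear layer, attention without
# softmax, toy numbers. All names are hypothetical, not from the paper.

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def apply_updated_weights(W0, errors, inputs, query):
    """Explicit view: build W0 + sum_i e_i x_i^T, then apply it to the query."""
    W = [row[:] for row in W0]
    for e, x in zip(errors, inputs):
        for r in range(len(W)):
            for c in range(len(W[0])):
                W[r][c] += e[r] * x[c]
    return matvec(W, query)


def apply_via_linear_attention(W0, errors, inputs, query):
    """Dual view: W0 q + sum_i e_i (x_i . q), i.e. attention with
    values e_i and keys x_i, and no weight matrix ever updated."""
    out = matvec(W0, query)
    for e, x in zip(errors, inputs):
        score = dot(x, query)
        out = [o + score * ei for o, ei in zip(out, e)]
    return out


W0 = [[1.0, 0.0], [0.0, 1.0]]
errors = [[0.5, -1.0], [2.0, 0.0]]  # stand-ins for error signals
inputs = [[1.0, 2.0], [0.0, 1.0]]   # stand-ins for demonstration inputs
query = [3.0, 1.0]

explicit = apply_updated_weights(W0, errors, inputs, query)
dual = apply_via_linear_attention(W0, errors, inputs, query)
```

Both calls return the same vector, which is the sense in which attending over demonstrations can be read as applying a weight update without changing any parameters.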
How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection
The introduction of ChatGPT has garnered widespread attention in both academic and industrial communities. ChatGPT is able to respond effectively to a wide range of human questions, providing fluent and comprehensive answers that significantly surpass previous public chatbots in terms of security and usefulness. On one hand, people are curious about how ChatGPT is able to achieve such strength and how far it is from human experts. On the other hand, people are starting to worry about the potential negative impacts that large language models (LLMs) like ChatGPT could have on society, such as fake news, plagiarism, and social security issues. In this work, we collected tens of thousands of comparison responses from both human experts and ChatGPT, with questions ranging from open-domain, financial, medical, legal, and psychological areas. We call the collected dataset the Human ChatGPT Comparison Corpus (HC3). Based on the HC3 dataset, we study the characteristics of ChatGPT's responses, the differences and gaps from human experts, and future directions for LLMs. We conducted comprehensive human evaluations and linguistic analyses of ChatGPT-generated content compared with that of humans, where many interesting results are revealed.
2023: Biyang Guo, Xin Zhang, Ziyuan Wang, Minqi Jiang, Jinran Nie, Yuxuan Ding, Jianwei Yue, Yupeng Wu
Why do Nearest Neighbor Language Models Work?
Language models (LMs) compute the probability of a text by sequentially computing a representation of an already-seen context and using this representation to predict the next word. Currently, most LMs calculate these representations through a neural network consuming the immediate previous context. Recently, however, retrieval-augmented LMs have been shown to improve over standard neural LMs by accessing information retrieved from a large datastore, in addition to their standard, parametric, next-word prediction. In this paper, we set out to understand why retrieval-augmented language models, and specifically k-nearest neighbor language models (kNN-LMs), perform better than standard parametric LMs, even when the k-nearest neighbor component retrieves examples from the same training set that the LM was originally trained on.
2023: Frank F. Xu, Uri Alon, Graham Neubig
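As a rough illustration of how a kNN-LM combines retrieval with parametric prediction, the sketch below interpolates a toy parametric distribution with a distribution built from the nearest datastore entries. It assumes the commonly described formulation (a softmax over negative distances to retrieved keys, mixed with the LM distribution by a weight `lam`). The datastore, vectors, `lam` value, and function names are all made up for illustration.

```python
import math
from collections import defaultdict


def knn_distribution(query, datastore, k):
    """Softmax over negative squared L2 distances of the k nearest keys,
    with probability mass aggregated per stored next token."""
    scored = sorted(
        (sum((q - c) ** 2 for q, c in zip(query, key)), token)
        for key, token in datastore
    )[:k]
    weights = [(math.exp(-dist), token) for dist, token in scored]
    total = sum(w for w, _ in weights)
    probs = defaultdict(float)
    for w, token in weights:
        probs[token] += w / total
    return dict(probs)


def interpolate(p_lm, p_knn, lam):
    """Mix the two distributions: lam * p_knn + (1 - lam) * p_lm."""
    vocab = set(p_lm) | set(p_knn)
    return {
        t: lam * p_knn.get(t, 0.0) + (1 - lam) * p_lm.get(t, 0.0)
        for t in vocab
    }


# Toy datastore of (context representation, next token) pairs.
datastore = [
    ((0.0, 0.0), "cat"),
    ((0.1, 0.0), "cat"),
    ((1.0, 1.0), "dog"),
    ((5.0, 5.0), "fish"),
]
p_lm = {"cat": 0.3, "dog": 0.6, "fish": 0.1}  # hypothetical parametric output

p_knn = knn_distribution((0.0, 0.0), datastore, k=3)
mixed = interpolate(p_lm, p_knn, lam=0.25)
```

In this toy case the retrieved neighbors mostly store "cat", so the mixed distribution shifts probability toward "cat" relative to the parametric model alone.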
Text2Poster: Laying Out Stylized Texts on Retrieved Images
Poster generation is a significant task for a wide range of applications, which is often time-consuming and requires lots of manual editing and artistic experience. In this paper, we propose a novel data-driven framework, called Text2Poster, to automatically generate visually-effective posters from textual information. Imitating the process of manual poster editing, our framework leverages a large-scale pretrained visual-textual model to retrieve background images from given texts, lays out the texts on the images iteratively by cascaded autoencoders, and finally, stylizes the texts by a matching-based method. We learn the modules of the framework by weakly- and self-supervised learning strategies, mitigating the demand for labeled data. Both objective and subjective experiments demonstrate that our Text2Poster outperforms state-of-the-art methods, including academic research and commercial software, on the quality of generated posters.
2022: Chuhao Jin, H. Xu, Ruihua Song, Zhiwu Lu
Designing BERT for Convolutional Networks: Sparse and Hierarchical Masked Modeling
We identify and overcome two key obstacles in extending the success of BERT-style pre-training, or masked image modeling, to convolutional networks (convnets). We validate our approach on both classical (ResNet) and modern (ConvNeXt) models. Improvements on object detection and instance segmentation are more substantial (up to +3.5%), verifying the strong transferability of the learned features. We also find favorable scaling behavior, observing larger gains on larger models. All this evidence reveals a promising future of generative pre-training on convnets.
2023: Keyu Tian, Yi Jiang, Qishuai Diao, Chen Lin, Liwei Wang, Zehuan Yuan
Ranked #1 on Instance Segmentation on COCO 2017 val
Reversible Column Networks
We propose a new neural network design paradigm Reversible Column Network (RevCol). The main body of RevCol is composed of multiple copies of subnetworks, named columns respectively, between which multi-level reversible connections are
2022: Y. Cai, Yi Zhou, Qi Han, Jia-Ying Sun, Xiangwen Kong, Jun Yu Li, Xiangyu Zhang