Welcome to the NLP highlights podcast, where we invite researchers to talk about their work in various areas in natural language processing. The hosts are Matt Gardner, Pradeep Dasigi (research scientists at the Allen Institute for Artificial Intelligence) and Waleed Ammar (research scientist at Google). All views expressed belong to the hosts and guests and do not represent their employers.
122 - Statutory Reasoning in Tax Law, with Nils Holzenberger
We invited Nils Holzenberger, a PhD student at JHU to talk about a dataset involving statutory reasoning in tax law Holzenberger et al. released recently. This dataset includes difficult textual entailment and question answering problems that involve reasoning about how sections in tax law are applicable to specific cases. They also released a Prolog solver that fully solves the problems, and show that learned models using dense representations of text perform poorly. We discussed why this is the case, and how one can train models to solve these challenges.
Project webpage: https://nlp.jhu.edu/law/
121 - Language and the Brain, with Alona Fyshe
We invited Alona Fyshe to talk about the link between NLP and the human brain. We began by talking about what we currently know about the connection between representations used in NLP and representations recorded in the brain. We also discussed how different brain imaging techniques compare to each other. We then dove into experiments investigating how hidden states of LSTM language models correlate with EEG brain imaging data on three types of language inputs: well-formed grammatical sentences, pseudo-word sentences preserving syntax but not semantics, and word-lists preserving neither. We talk about the kinds of conclusions that can be drawn from these correlations and conclude by discussing avenues for future work.
120 - Evaluation of Text Generation, with Asli Celikyilmaz
We invited Asli Celikyilmaz for this episode to talk about evaluation of text generation systems. We discussed the challenges in evaluating generated text, and covered human and automated metrics, with a discussion of recent developments in learning metrics. We also talked about some open research questions, including the difficulties in evaluating factual correctness of generated text.
Asli Celikyilmaz is a Principal Researcher at Microsoft Research.
Link to a survey co-authored by Asli on this topic: https://arxiv.org/abs/2006.14799
119 - Social NLP, with Diyi Yang
In this episode, Diyi Yang gives us an overview of using NLP models for social applications, including understanding social relationships, processes, roles, and power. As NLP systems are getting used more and more in the real world, they additionally have increasing social impacts that must be studied. We talk about how to get started in this field, what datasets exist and are commonly used, and potential ethical issues. We additionally cover two of Diyi's recent papers, on neutralizing subjective bias in text, and on modeling persuasiveness in text.
Diyi Yang is an assistant professor in the School of Interactive Computing at Georgia Tech.
118 - Coreference Resolution, with Marta Recasens
In this episode, we talked about Coreference Resolution with Marta Recasens, a Research Scientist at Google. We discussed the complexity involved in resolving references in language, the simplification of the problem that the NLP community has focused on by talking about specific datasets, and the complex coreference phenomena that are not yet captured in those datasets. We also briefly talked about how coreference is handled in languages other than English, and how some of the notions we have about modeling coreference phenomena in English do not necessarily transfer to other languages. We ended the discussion by talking about large language models, and to what extent they might be good at handling coreference.
117 - Interpreting NLP Model Predictions, with Sameer Singh
We interviewed Sameer Singh for this episode, and discussed an overview of recent work in interpreting NLP model predictions, particularly instance-level interpretations. We started out by talking about why it is important to interpret model outputs and why it is a hard problem. We then dove into the details of three kinds of interpretation techniques: attribution based methods, interpretation using influence functions, and generating explanations. Towards the end, we spent some time discussing how explanations of model behavior can be evaluated, and some limitations and potential concerns in evaluation methods.
Sameer Singh is an Assistant Professor of Computer Science at the University of California, Irvine.
Some of the techniques discussed in this episode have been implemented in the AllenNLP Interpret framework (details and demo here: https://allennlp.org/interpret).