[S2-E9] LaMDA: Language Models for Dialog Applications | SeaVoice Stories

Source: https://arxiv.org/pdf/2201.08239.pdf
LaMDA: Language Models for Dialog Applications

We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of Transformer-based neural language models specialized for dialog, which have up to 137B parameters and are pre-trained on 1.56T words of public dialog data and web text.

While model scaling alone can improve quality, it shows less improvement on safety and factual grounding. We demonstrate that fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements towards the two key challenges of safety and factual grounding.

The first challenge, safety, involves ensuring that the model’s responses are consistent with a set of human values, such as preventing harmful suggestions and unfair bias. We quantify safety using a metric based on an illustrative set of human values, and we find that filtering candidate responses using a LaMDA classifier fine-tuned with a small amount of crowdworker-annotated data offers a promising approach to improving model safety.
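
As a rough illustration of this filtering step, the Python sketch below scores each candidate response with a fine-tuned safety classifier and keeps only those above a threshold. The classifier interface, its `score` method, and the threshold value are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch of safety filtering, assuming a fine-tuned classifier that
# returns P(safe) for a (context, response) pair. The interface and the
# threshold are illustrative, not LaMDA's actual code.

SAFETY_THRESHOLD = 0.9  # assumed cutoff, not taken from the paper


def filter_unsafe(context: str, candidates: list[str], safety_classifier) -> list[str]:
    """Keep only candidates the safety classifier scores at or above the threshold."""
    return [
        response
        for response in candidates
        if safety_classifier.score(context, response) >= SAFETY_THRESHOLD  # hypothetical API
    ]
```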

The second challenge, factual grounding, involves enabling the model to consult external knowledge sources, such as an information retrieval system, a language translator, and a calculator. We quantify factuality using a groundedness metric, and we find that our approach enables the model to generate responses grounded in known sources, rather than responses that merely sound plausible.
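
To make the idea of consulting external sources concrete, here is a minimal sketch of a dispatcher that routes a model-issued query to an information retrieval system, a translator, or a calculator and returns the evidence a grounded response could cite. The tool names, query format, and interfaces are assumptions, not the paper's actual protocol.

```python
# Illustrative dispatcher for an external "toolset": the model emits a query
# such as "IR: height of the Eiffel Tower", and the dispatcher routes it to a
# retrieval system, translator, or calculator. All names and the query format
# are assumptions for illustration.

def consult_toolset(query: str, search, translate, calculate) -> str:
    tool, _, payload = query.partition(": ")
    if tool == "IR":
        return search(payload)          # information retrieval system
    if tool == "TRANSLATE":
        return translate(payload)       # language translator
    if tool == "CALC":
        return str(calculate(payload))  # calculator
    return ""                           # unknown tool: no external evidence
```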

Finally, we explore the use of LaMDA in the domains of education and content recommendations, and analyze their helpfulness and role consistency.

Figure 1: Impact of model pre-training alone vs. with fine-tuning in LaMDA on dialog quality (left), and safety and factual grounding (right). The quality metric (SSI) corresponds to sensibleness, specificity, and interestingness. See Section 4 for more details on these metrics.
1 Introduction
Language model pre-training is an increasingly promising research approach in NLP [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]. As pre-training uses unlabeled text, it can be combined with scaling model and dataset sizes to achieve better performance or new capabilities [13]. For example, GPT-3 [12], a 175B parameter model trained on a large corpus of unlabeled text, shows an impressive ability in few-shot learning thanks to scaling. 
Dialog models [14, 15, 16], one of the most interesting applications of large language models, successfully take advantage of Transformers’ ability to represent long-term dependencies in text [17, 18]. Similar to general language models [13], Adiwardana et al. [17] show that dialog models are also well suited to model scaling. There is a strong correlation between model size and dialog quality. Inspired by these successes, we train LaMDA, a family of Transformer-based neural language models designed for dialog.
These models’ sizes range from 2B to 137B parameters, and they are pre-trained on a dataset of 1.56T words from public dialog data and other public web documents (Section 3). LaMDA makes use of a single model to perform multiple tasks: it generates potential responses, which are then filtered for safety, grounded on an external knowledge source, and re-ranked to find the highest-quality response. We study the benefits of model scaling with LaMDA on our three key metrics: quality, safety, and groundedness (Section 4).
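The single-model, multi-stage flow described above can be pictured with the following sketch: generate candidate responses, drop unsafe ones, ground the survivors in external knowledge, then re-rank and return the best. Every helper used here (generate, is_safe, ground, quality_score) is a placeholder, not LaMDA's actual API.

```python
# Sketch of the generate -> filter -> ground -> re-rank flow described above.
# All helpers on `model` are placeholders for illustration only.

def respond(context: str, model, num_candidates: int = 16) -> str:
    candidates = [model.generate(context) for _ in range(num_candidates)]
    safe = [c for c in candidates if model.is_safe(context, c)]
    grounded = [model.ground(context, c) for c in safe]  # revise to cite external sources
    if not grounded:
        return "I don't know."  # fallback when every candidate is filtered out
    return max(grounded, key=lambda c: model.quality_score(context, c))
```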
We observe that: (a) model scaling alone improves quality, but its improvements on safety and groundedness are far behind human performance, and (b) combining scaling and fine-tuning improves LaMDA significantly on all metrics, and although the model’s performance remains below human levels in safety and groundedness, the quality gap to measured crowdworker levels can be narrowed (labeled ‘Human’ in Figure 1). The first metric, quality, is based on three components: sensibleness, specificity, and interestingness (Section 4).
We collect annotated data […]
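
As a rough sketch of how such annotations could be aggregated into the quality (SSI) metric, the snippet below averages binary sensibleness, specificity, and interestingness labels across responses and then averages the three components; this simple averaging is an assumption for illustration, not the paper's exact definition.

```python
# Illustrative aggregation of quality (SSI) from per-response crowdworker labels.
from statistics import mean


def ssi_score(labels: list[dict]) -> float:
    """labels: one dict per response, e.g. {"sensible": 1, "specific": 0, "interesting": 1}."""
    sensibleness = mean(l["sensible"] for l in labels)
    specificity = mean(l["specific"] for l in labels)
    interestingness = mean(l["interesting"] for l in labels)
    return mean([sensibleness, specificity, interestingness])
```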
