Human Language Modeling using HaRT: Human-aware Recurrent Transformers

Language Modeling as a task grounded in the "natural" generators of language, people.

Objective

To model the probability of the next word w_{t,i} in the current document t based on the past words w_{t,1:i-1} in that document and a user state U_{1:t-1} built from the user's earlier documents.
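Written out as an equation, this objective is:

```latex
% HuLM objective: probability of the next word in document t, conditioned on
% the document's preceding words and the user state from documents 1..t-1.
\begin{equation*}
  \Pr\!\left( w_{t,i} \,\middle|\, w_{t,1:i-1},\; U_{1:t-1} \right)
\end{equation*}
```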

Background

Language modeling is fundamental to NLP, with many large transformer-based models now in widespread use.

So, What's missing?

Large language models treat dependent inputs as independent even when they are not.

Additionally, the inherent higher-order structure of language (words come from documents, and documents come from people) is not made explicit in the language modeling task used by large LMs.

Indeed, different ways of incorporating human information into NLP models have recently been shown to improve accuracy on many NLP tasks. HuLM brings together ideas from human factor inclusion/adaptation and personalized modeling within the framework of large pre-trained models. While not its primary goal, human language modeling may also yield effective approaches for extending the context available during language modeling. Broadly, then, HuLM relates to three areas of prior work.

Task: Human Language Modeling (HuLM)

To address the above gaps, we propose Human Language Modeling (HuLM), a language modeling task grounded in the "natural" generators of language, people.

Building from the traditional language modeling task, which estimates

p(w_i | w_{1:i-1}),

in HuLM we also condition on a user state U:

p(w_{t,i} | w_{t,1:i-1}, U)

However, human states are somewhat stable, but not entirely static.

To account for this, we condition on a dynamic user state:

p(w_{t,i} | w_{t,1:i-1}, U_{1:t-1})

Method: Human-aware Recurrent Transformer (HaRT)

To address HuLM, we introduce HaRT: Human-aware Recurrent Transformer, an auto-regressive transformer with a recurrent user state. HaRT builds on the recurrent transformer approaches of Yoshida et al. (2020) and Transformer-XL (Dai et al., 2019).
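The precise HaRT architecture (which layers compute the user state and how it is injected back into self-attention) is described in the paper; the snippet below is only a minimal PyTorch sketch of the general recurrent-user-state idea, where each document is processed by a transformer layer that can attend to a user-state vector, and that vector is updated recurrently from one document to the next. The module names, dimensions, and GRU-style update are illustrative assumptions, not HaRT's actual design.

```python
import torch
import torch.nn as nn

class RecurrentUserStateBlock(nn.Module):
    """Minimal sketch (not HaRT itself): a transformer layer over one
    document whose tokens can attend to a prepended user-state vector;
    the user state is then updated recurrently so it carries information
    from this document into the next one."""

    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        # Illustrative recurrent update of the user state.
        self.update = nn.GRUCell(d_model, d_model)

    def forward(self, doc_embeds: torch.Tensor, user_state: torch.Tensor):
        # doc_embeds: (batch, seq_len, d_model) embeddings of one document
        # user_state: (batch, d_model) summary of the user's past documents
        x = torch.cat([user_state.unsqueeze(1), doc_embeds], dim=1)
        # Causal mask keeps the block auto-regressive over token positions.
        n = x.size(1)
        causal = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
        h = self.layer(x, src_mask=causal)
        # Summarize the document and fold it into the user state.
        doc_summary = h[:, 1:, :].mean(dim=1)
        new_state = self.update(doc_summary, user_state)
        return h[:, 1:, :], new_state

# Usage: process one user's documents in temporal order, carrying the state.
block = RecurrentUserStateBlock()
state = torch.zeros(1, 256)                      # initial user state
for doc in [torch.randn(1, 12, 256), torch.randn(1, 8, 256)]:
    hidden, state = block(doc, state)            # state now reflects past docs
```

In HaRT proper, the user state is computed and re-inserted at specific layers of a GPT-2-style decoder over blocks of a user's messages; see the paper for the exact formulation.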

Pre-training

We pre-train HaRT for the HuLM task on two datasets:

  • The HuLM Corpus (HLC) from the paper: Twitter and Facebook data that we cannot release due to privacy considerations.
  • The Twitter dataset from our paper, used to pre-train HaRTTwt so that the model can be released publicly.
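Whichever corpus is used, HuLM pre-training needs each user's messages kept together and ordered in time so the recurrent user state can be carried across their documents. A rough sketch of that grouping step follows; the field names (user_id, timestamp, text) are hypothetical placeholders, not the actual corpus schema.

```python
from collections import defaultdict

def group_messages_by_user(messages):
    """Group messages per user and sort each user's messages by time,
    yielding the temporally ordered document sequences that HuLM
    conditions on. `messages` is an iterable of dicts with hypothetical
    keys: 'user_id', 'timestamp', 'text'."""
    per_user = defaultdict(list)
    for msg in messages:
        per_user[msg["user_id"]].append(msg)
    return {
        user: [m["text"] for m in sorted(msgs, key=lambda m: m["timestamp"])]
        for user, msgs in per_user.items()
    }

# Toy example.
toy = [
    {"user_id": "u1", "timestamp": 2, "text": "second post"},
    {"user_id": "u1", "timestamp": 1, "text": "first post"},
    {"user_id": "u2", "timestamp": 5, "text": "hello"},
]
print(group_messages_by_user(toy))
# {'u1': ['first post', 'second post'], 'u2': ['hello']}
```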

State-of-the-Art Results

For comparison, we evaluate HaRTTwt on the language modeling task (over the test data from the paper and over Twitter-only test data) and on document-level fine-tuning tasks. HaRTTwt's results differ slightly from, but are in line with, those of the full HaRT model (pre-trained on HLC). HaRTTwt training and evaluations were run on two DGX A100 GPUs.
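As a reminder of what the perplexity numbers below measure: perplexity is the exponentiated average negative log-likelihood the model assigns to the gold test tokens. A minimal sketch, assuming per-token log-probabilities have already been collected from a model:

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood) over the evaluated
    tokens. `token_log_probs` are natural-log probabilities the model
    assigned to each gold next token."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# Toy example: three tokens predicted with probabilities 0.5, 0.25, and 0.1.
print(perplexity([math.log(0.5), math.log(0.25), math.log(0.1)]))  # ~4.31
```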

Language Model Perplexity

Model           Test HLC (ppl)    Test Twt (ppl)
GPT-2 frozen    116.35            144.67
GPT-2 HLC       48.51             39.93
HaRT Twt        33.15             23.76
HaRT            26.11             24.70

Document-level Downstream Tasks

Model        Stance (F1)    Sentiment (F1)
GPT-2 HLC    68.60          76.75
HaRT Twt     70.53          77.01
HaRT         71.10          78.25