Human Language Modeling using HaRT: Human-aware Recurrent Transformers

Language Modeling as a task grounded in the "natural" generators of language, people.

Objective

To model the probability of the next word w_{t,i} in the current document t based on the past words w_{t,1:i-1} in that document and a user state U_{1:t-1} built from the user's earlier documents.
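Written out as an equation, this objective is:

```latex
% HuLM objective: probability of the next word in document t, conditioned on
% the document's preceding words and the user state from documents 1..t-1.
\begin{equation*}
  \Pr\!\left( w_{t,i} \,\middle|\, w_{t,1:i-1},\; U_{1:t-1} \right)
\end{equation*}
```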

Background

Language modeling is fundamental to NLP, with many large transformer-based models now in widespread use.

So, What's missing?

Large language models treat dependent inputs as independent even when they are not.

Additionally, the inherent higher-order structure of language (words come from documents, and documents come from people) is not made explicit in the language modeling task used by large LMs.

Indeed, different ways of incorporating human information into NLP models have recently been shown to improve accuracy on many NLP tasks. HuLM brings together ideas from human factor inclusion/adaptation and personalized modeling within the framework of large pre-trained models. While not its primary goal, human language modeling may also yield effective approaches for extending the context available during language modeling. Broadly, then, HuLM relates to three areas of prior work.

Task: Human Language Modeling (HuLM)

To address the above gaps, we propose Human Language Modeling (HuLM), a language modeling task grounded in the "natural" generators of language, people.

Building from the traditional language modeling task, which estimates

p(w_i | w_{1:i-1}),

in HuLM we also condition on a user state U:

p(w_{t,i} | w_{t,1:i-1}, U)

However, human states are somewhat stable, but not entirely static.

To account for this, we condition on a dynamic user state:

p(w_{t,i} | w_{t,1:i-1}, U_{1:t-1})

Method: Human-aware Recurrent Transformer (HaRT)

To address HuLM, we introduce HaRT: Human-aware Recurrent Transformer, an auto-regressive transformer with a recurrent user state. HaRT builds on the recurrent transformer approaches of Yoshida et al. (2020) and Transformer-XL (Dai et al., 2019).
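The precise HaRT architecture (which layers compute the user state and how it is injected back into self-attention) is described in the paper; the snippet below is only a minimal PyTorch sketch of the general recurrent-user-state idea, where each document is processed by a transformer layer that can attend to a user-state vector, and that vector is updated recurrently from one document to the next. The module names, dimensions, and GRU-style update are illustrative assumptions, not HaRT's actual design.

```python
import torch
import torch.nn as nn

class RecurrentUserStateBlock(nn.Module):
    """Minimal sketch (not HaRT itself): a transformer layer over one
    document whose tokens can attend to a prepended user-state vector;
    the user state is then updated recurrently so it carries information
    from this document into the next one."""

    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        # Illustrative recurrent update of the user state.
        self.update = nn.GRUCell(d_model, d_model)

    def forward(self, doc_embeds: torch.Tensor, user_state: torch.Tensor):
        # doc_embeds: (batch, seq_len, d_model) embeddings of one document
        # user_state: (batch, d_model) summary of the user's past documents
        x = torch.cat([user_state.unsqueeze(1), doc_embeds], dim=1)
        # Causal mask keeps the block auto-regressive over token positions.
        n = x.size(1)
        causal = torch.triu(torch.full((n, n), float("-inf")), diagonal=1)
        h = self.layer(x, src_mask=causal)
        # Summarize the document and fold it into the user state.
        doc_summary = h[:, 1:, :].mean(dim=1)
        new_state = self.update(doc_summary, user_state)
        return h[:, 1:, :], new_state

# Usage: process one user's documents in temporal order, carrying the state.
block = RecurrentUserStateBlock()
state = torch.zeros(1, 256)                      # initial user state
for doc in [torch.randn(1, 12, 256), torch.randn(1, 8, 256)]:
    hidden, state = block(doc, state)            # state now reflects past docs
```

In HaRT proper, the user state is computed and re-inserted at specific layers of a GPT-2-style decoder over blocks of a user's messages; see the paper for the exact formulation.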

Pre-training

We pre-train HaRT for the HuLM task on two datasets:

  • The HuLM Corpus (HLC) from the paper: Twitter and Facebook data that we cannot release due to privacy considerations.
  • The Twitter dataset from our paper, used to pre-train HaRTTwt so that the model can be released publicly.
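Whichever corpus is used, HuLM pre-training needs each user's messages kept together and ordered in time so the recurrent user state can be carried across their documents. A rough sketch of that grouping step follows; the field names (user_id, timestamp, text) are hypothetical placeholders, not the actual corpus schema.

```python
from collections import defaultdict

def group_messages_by_user(messages):
    """Group messages per user and sort each user's messages by time,
    yielding the temporally ordered document sequences that HuLM
    conditions on. `messages` is an iterable of dicts with hypothetical
    keys: 'user_id', 'timestamp', 'text'."""
    per_user = defaultdict(list)
    for msg in messages:
        per_user[msg["user_id"]].append(msg)
    return {
        user: [m["text"] for m in sorted(msgs, key=lambda m: m["timestamp"])]
        for user, msgs in per_user.items()
    }

# Toy example.
toy = [
    {"user_id": "u1", "timestamp": 2, "text": "second post"},
    {"user_id": "u1", "timestamp": 1, "text": "first post"},
    {"user_id": "u2", "timestamp": 5, "text": "hello"},
]
print(group_messages_by_user(toy))
# {'u1': ['first post', 'second post'], 'u2': ['hello']}
```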

State-of-the-Art Results

For comparison, we evaluate HaRTTwt on the language modeling task (over the test data from the paper and over Twitter-only test data) and on document-level fine-tuning tasks. HaRTTwt's results differ slightly from, but are in line with, those of the full HaRT model (pre-trained on HLC). HaRTTwt training and evaluations were run on two DGX A100 GPUs.
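As a reminder of what the perplexity numbers below measure: perplexity is the exponentiated average negative log-likelihood the model assigns to the gold test tokens. A minimal sketch, assuming per-token log-probabilities have already been collected from a model:

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp(mean negative log-likelihood) over the evaluated
    tokens. `token_log_probs` are natural-log probabilities the model
    assigned to each gold next token."""
    nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(nll)

# Toy example: three tokens predicted with probabilities 0.5, 0.25, and 0.1.
print(perplexity([math.log(0.5), math.log(0.25), math.log(0.1)]))  # ~4.31
```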

Language Model Perplexity

Model           Test HLC (ppl)    Test Twt (ppl)
GPT-2 frozen    116.35            144.67
GPT-2 HLC       48.51             39.93
HaRT Twt        33.15             23.76
HaRT            26.11             24.70

Document-level Downstream Tasks

Model        Stance (F1)    Sentiment (F1)
GPT-2 HLC    68.60          76.75
HaRT Twt     70.53          77.01
HaRT         71.10          78.25