A language model is a machine learning model that estimates how likely a sequence of words is in a natural language. It can serve as the basis for many language-based tasks, for instance:
- Question answering
- Semantic search
- Summarization
- Many other tasks that operate on natural language
In a domain like weather forecasting, it’s easy to see how past data helps a model to predict a future state. But how do you apply that to language?
How language modeling works
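Language models learn word probabilities by analyzing large amounts of text. During training, the text is fed through an algorithm that learns how context shapes which words are likely to appear; for example, that "sat" is often followed by "on". Formally, the probability of a sentence is the product of the probability of each word given the words before it, so predicting the next word well is the core of the task. The model then applies what it has learned to predict the next word in a sequence or to generate new sentences. In effect, it learns statistical features of the language and uses them to interpret and produce phrases it has never seen before.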
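The counting idea can be made concrete with a bigram (n = 2) model, which approximates "the words before it" by just the single previous word. The following is a minimal Python sketch, assuming whitespace tokenization, a toy three-sentence corpus, and maximum-likelihood estimates with no smoothing. The function names `train_bigram`, `predict_next`, and `sentence_probability` and the `<s>`/`</s>` boundary markers are illustrative choices, not part of any library.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count word pairs and return maximum-likelihood estimates P(next | prev)."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        # <s> and </s> mark sentence boundaries so the model also learns
        # which words tend to start and end a sentence.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    probs = {}
    for prev, nexts in counts.items():
        total = sum(nexts.values())
        probs[prev] = {w: c / total for w, c in nexts.items()}
    return probs


def sentence_probability(probs, sentence):
    """Chain rule with the bigram approximation: product of P(w_i | w_{i-1})."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    p = 1.0
    for prev, nxt in zip(tokens, tokens[1:]):
        # Unseen pairs get probability 0 because this sketch uses no smoothing.
        p *= probs.get(prev, {}).get(nxt, 0.0)
    return p


def predict_next(probs, prev):
    """Return the most probable word after `prev`, or None if `prev` was never seen."""
    candidates = probs.get(prev.lower())
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat ate the fish",
]
model = train_bigram(corpus)
print(predict_next(model, "sat"))  # on
print(sentence_probability(model, "the cat sat on the rug"))
```

With this corpus, "the cat sat on the rug" receives probability 1/36 even though it never appears in the training text, because every word pair in it does. A sentence containing an unseen pair such as "dog ate" gets probability 0, which is why practical n-gram models add smoothing.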
Language model types
- N-gram
- Unigram
- Exponential
- Neural network
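Exponential language models (also called maximum-entropy or log-linear models) replace raw counts with a weighted sum of feature functions f_i(h, w) of the history h and a candidate word w, normalized over the vocabulary: P(w | h) = exp(sum_i lambda_i * f_i(h, w)) / sum_w' exp(sum_i lambda_i * f_i(h, w')). The Python sketch below assumes three hand-written binary features and hand-picked weights; both are hypothetical and chosen only to show the mechanics, whereas a real model learns its weights from data.

```python
import math


# Hypothetical binary feature functions f_i(history, word), for illustration only.
def features(history, word):
    prev = history[-1] if history else "<s>"
    return [
        1.0 if (prev, word) == ("the", "cat") else 0.0,  # bigram feature
        1.0 if word in history else 0.0,                 # repetition feature
        1.0 if word.endswith("s") else 0.0,              # crude plural feature
    ]


# Hypothetical weights lambda_i; in practice these are fit by maximizing likelihood.
WEIGHTS = [2.0, 0.5, -1.0]


def next_word_distribution(history, vocabulary, weights=WEIGHTS):
    """P(w | h) = exp(sum_i lambda_i * f_i(h, w)) / Z(h), computed with a softmax."""
    scores = {
        w: sum(lam * f for lam, f in zip(weights, features(history, w)))
        for w in vocabulary
    }
    # Subtract the maximum score before exponentiating for numerical stability;
    # this does not change the normalized probabilities.
    m = max(scores.values())
    exps = {w: math.exp(s - m) for w, s in scores.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}


vocab = ["cat", "cats", "the", "sat"]
dist = next_word_distribution(["the"], vocab)
for w, p in sorted(dist.items(), key=lambda kv: -kv[1]):
    print(f"{w:>5}: {p:.3f}")
```

The softmax normalization is shared with neural network language models, which compute the per-word scores with a learned network instead of hand-written features.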
Notable language models
- Pathways Language Model (PaLM): 540-billion-parameter model from Google Research.
- Generalist Language Model (GLaM): 1.2-trillion-parameter mixture-of-experts model from Google Research.
- Language Models for Dialog Applications (LaMDA): 137-billion-parameter model from Google Research.
- Megatron-Turing NLG: 530-billion-parameter model from Microsoft and Nvidia.
- DreamFusion: text-to-3D generation built on the Imagen text-to-image model, from Google Research.
- GET3D: a generative model of textured 3D meshes, from Nvidia.
- MineCLIP: a video-language model trained on Minecraft videos, from Nvidia.
- BLOOM: BigScience Large Open-science Open-access Multilingual Language Model with 176 billion parameters.
- GPT: Generative Pre-trained Transformer, a model family from OpenAI.
- GPT-2: Generative Pre-trained Transformer 2, with 1.5 billion parameters.
- GPT-3: Generative Pre-trained Transformer 3, with 175 billion parameters (requiring 800 GB of storage) and a 2,048-token context window.
- GPT-3.5/ChatGPT/InstructGPT, from OpenAI.
- GPT-NeoX-20B: an open-source autoregressive language model with 20 billion parameters, from EleutherAI.
- BERT: Bidirectional Encoder Representations from Transformers, from Google.
- OPT-175B by Meta AI: another 175-billion-parameter language model, available to the broader AI research community.
- Point-E by OpenAI: a 3D model generator.
- RT-1 (Robotics Transformer 1) by Google: a model for controlling robots.
- ERNIE-Code by Baidu: a 560-million-parameter multilingual model for code.
- VALL-E by Microsoft: text-to-speech synthesis that imitates a speaker from a 3-second speech sample. It was pre-trained on 60,000 hours of English speech from 7,000 unique speakers (dataset: LibriLight).