An explanation of masked language models

In this video, TechTarget editor Jen English talks about masked language models.

Masked language models take generative AI to the next level.

Masked language models, or MLMs, have emerged as a breakthrough in natural language processing, revolutionizing how machines understand and generate human language.

Masked language modeling is used to train language models such as transformers, which learn to grasp the intricate nuances of language by predicting missing words within a given context.

Hugging Face is well known for providing access to a wide range of pretrained models, including masked language models such as BERT.

At the core of masked language models lies the concept of masking tokens. During training, certain words -- also known as tokens -- are intentionally masked, and the model is tasked with predicting the correct word based on the surrounding context. This enables the model to learn word relationships, semantics and grammatical structures. For example, in the sentence, "The cat [blank] the tree," the model might predict the word "climbed" as the masked token.
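Here is a minimal sketch of that idea using the Hugging Face transformers library; the model choice and example sentence are illustrative, not something specified in the video:

from transformers import pipeline

# Load a pretrained masked language model; bert-base-uncased is an illustrative choice.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# BERT marks the hidden word with its special [MASK] token.
predictions = fill_mask("The cat [MASK] the tree.")

for prediction in predictions:
    # Each prediction includes a candidate token and a confidence score.
    print(f"{prediction['token_str']}: {prediction['score']:.3f}")

With a well-trained model, a context-appropriate verb such as "climbed" should appear near the top of the ranked list.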

Traditional or causal language models, such as GPT-2, GPT-3 and GPT-Neo, are unidirectional: they predict the next token in a sequence and can attend only to the words that come before it. MLMs, however, are bidirectional and can attend to context on both the left and right sides of a masked token when making predictions.
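As a rough illustration of that difference -- the model names here are assumptions -- a causal model can only continue a prompt from left to right, while a masked model draws on the words on both sides of the blank:

from transformers import pipeline

# A causal model such as GPT-2 sees only the tokens to the left of the position it predicts.
causal_lm = pipeline("text-generation", model="gpt2")
print(causal_lm("The cat", max_new_tokens=3)[0]["generated_text"])

# A masked model such as BERT sees "The cat" on the left and "the tree" on the right.
masked_lm = pipeline("fill-mask", model="bert-base-uncased")
print(masked_lm("The cat [MASK] the tree.")[0]["token_str"])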

MLMs such as BERT excel in language-related tasks, including text classification, named entity recognition and sentiment analysis, thanks to their extensive training on large, diverse datasets.

Here are the key advantages of masked language models:

  • They can handle ambiguous language. MLMs interpret words based on their surrounding context, disambiguating homonyms and other words with multiple meanings. This improves their ability to comprehend natural language and generate more coherent responses.
  • They are bidirectional. Unlike conventional language models, which consider only the words on one side of a token when making predictions, MLMs attend to the surrounding words on both sides. Access to both preceding and succeeding words gives them a deeper comprehension of semantics and sentence structure.
  • They can be fine-tuned for specific tasks. Fine-tuning enables efficient and effective adaptation to new domains or languages, and this transfer-learning approach reduces the need for large amounts of labeled data and conserves computing power; see the sketch after this list.
  • They offer a wide range of applications. With all the above features, MLMs open doors to a wide range of applications, from virtual assistants and sentiment analysis to machine translation and beyond. For example, they can help virtual assistants understand user intent by predicting the missing or masked words in user queries.
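Here is a rough sketch of the fine-tuning point above. The model name, labels and two-example dataset are placeholders rather than a real workflow; the idea is that a pretrained masked language model can be reused as a sentiment classifier with only a small amount of labeled data:

import torch
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Reuse a pretrained MLM as a two-class sentiment classifier.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

# Tiny placeholder dataset; a real project would use far more labeled examples.
texts = ["I loved this product.", "This was a terrible experience."]
labels = [1, 0]
encodings = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

class TinyDataset(torch.utils.data.Dataset):
    # Wraps the tokenized texts and labels in the format Trainer expects.
    def __len__(self):
        return len(labels)

    def __getitem__(self, idx):
        item = {key: tensor[idx] for key, tensor in encodings.items()}
        item["labels"] = torch.tensor(labels[idx])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-sentiment", num_train_epochs=1),
    train_dataset=TinyDataset(),
)
trainer.train()

Because the model already learned general language representations during pretraining, a small labeled set and a short training run are enough to adapt it to a new task.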

Masked language models are pushing the boundaries of AI with their wide range of use cases. Are you using masked language models to elevate your AI applications? Share your thoughts in the comments below and be sure to hit that like button and subscribe.

Kinza Yasar is a technical writer for WhatIs with a degree in computer networking.
