
An explanation of masked language models

In this video, TechTarget editor Jen English talks about masked language models.

Masked language models take generative AI to the next level.

Masked language models, or MLMs, have emerged as a breakthrough in natural language processing, revolutionizing how machines understand and generate human language.

Masked language modeling is a training objective used to pretrain transformer-based language models. By predicting missing words within a given context, these models learn to grasp the intricate nuances of language.

Hugging Face is well known for providing access to a wide range of pretrained models, including masked language models such as BERT.

At the core of masked language models lies the concept of masking tokens. During training, certain words -- also known as tokens -- are intentionally masked, and the model is tasked with predicting the correct word based on its surrounding context. This enables the model to learn word relationships, semantics and grammatical structures. For example, in the sentence, "The cat [blank] the tree," the model might predict the word "climbed" as the masked token.
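The masking step described above can be sketched in a few lines of Python. This is only an illustration, not how any particular library implements it: the function name `mask_tokens`, the `[MASK]` placeholder string and whitespace tokenization are assumptions made for the example, and real models typically work on subword tokens. The default rate of 15% mirrors the rate reported for BERT.

```python
import random

MASK = "[MASK]"  # placeholder string chosen for this sketch


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly hide tokens and remember the originals as training targets.

    Returns (masked_tokens, labels). labels[i] holds the original token
    where position i was masked, and None where it was left untouched.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    masked, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)   # the model sees the placeholder...
            labels.append(tok)    # ...and is trained to recover this token
        else:
            masked.append(tok)
            labels.append(None)   # nothing to predict at this position
    return masked, labels


# Whitespace splitting stands in for a real tokenizer here.
tokens = "The cat climbed the tree".split()
masked, labels = mask_tokens(tokens, mask_prob=0.3, seed=1)
print(masked)
print(labels)
```

During training, the model's loss would be computed only at the positions where `labels` is not `None`.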

Traditional or causal language models, such as GPT-2, GPT-3 and GPT-Neo, are unidirectional: they predict the next token in a sequence and attend only to the tokens that come before it. However, MLMs are bidirectional and can attend to both the left and right sides of masked tokens when making predictions.
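One way to picture the difference is through the attention mask, the table that says which positions each token is allowed to look at. The sketch below is a simplified illustration in plain Python; the helper names `causal_mask`, `bidirectional_mask` and `visible_context` are made up for this example and do not come from any library.

```python
def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j.

    A causal model only sees the current and earlier positions (j <= i).
    """
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """A masked language model lets every position see every other one."""
    return [[True] * n for _ in range(n)]


def visible_context(tokens, position, mask):
    """List the other tokens that `position` is allowed to attend to."""
    return [tok for j, tok in enumerate(tokens)
            if mask[position][j] and j != position]


tokens = ["The", "cat", "[MASK]", "the", "tree"]
n = len(tokens)

print(visible_context(tokens, 2, causal_mask(n)))         # ['The', 'cat']
print(visible_context(tokens, 2, bidirectional_mask(n)))  # ['The', 'cat', 'the', 'tree']
```

With the causal mask, the prediction at the blank can use only "The cat"; with the bidirectional mask it can also use "the tree," which is what makes "climbed" an easy guess.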

MLMs such as BERT excel in language-related tasks, including text classification, named entity recognition and sentiment analysis, thanks to their bidirectional view of context and their extensive training on large sets of diverse data.

Here are the key advantages of masked language models:

  • They can handle ambiguous language. MLMs interpret each word in light of the words around it, disambiguating homonyms or words with multiple meanings. This improves their ability to comprehend natural language and generate more coherent responses.
  • They are bidirectional. Unlike causal language models, which consider context on only one side of the token being predicted, MLMs attend to the surrounding words on both sides of the masked tokens. Their ability to access both preceding and succeeding words allows them to have a deeper comprehension of semantics and sentence structure.
  • They can be fine-tuned for specific tasks. This enables efficient and effective adaptation to new domains or languages. This transfer-learning approach reduces the need for large amounts of labeled data and conserves computing power.
  • They offer a wide range of applications. With the features above, MLMs open doors to many applications, from virtual assistants and sentiment analysis to machine translation and beyond. For example, they can help virtual assistants understand user intent by predicting the missing or masked words in user queries.

Masked language models are pushing the boundaries of AI with their wide range of use cases. Are you using masked language models to elevate your AI applications? Share your thoughts in the comments below and be sure to hit that like button and subscribe.

Kinza Yasar is a technical writer for WhatIs with a degree in computer networking.
