computational linguistics (CL)
What is computational linguistics (CL)?
Computational linguistics (CL) is the application of computer science to the analysis and comprehension of written and spoken language. As an interdisciplinary field, CL combines linguistics with computer science and artificial intelligence (AI) and is concerned with understanding language from a computational perspective. Computers that are linguistically competent help facilitate human interaction with machines and software.
Computational linguistics is used in tools such as instant machine translation, speech recognition systems, parsers, text-to-speech synthesizers, interactive voice response systems, search engines, text editors and language instruction materials.
The term computational linguistics is also closely linked to natural language processing (NLP), and these two terms are often used interchangeably.
Applications of computational linguistics
Most work in computational linguistics -- which has both theoretical and applied elements -- is aimed at improving the relationship between computers and basic language. It involves building artifacts that can be used to process and produce language. Building such artifacts requires data scientists to analyze massive amounts of written and spoken language in both structured and unstructured formats.
This article is part of
A guide to artificial intelligence in the enterprise
Applications of CL typically include the following:
- Machine translation. This is the process of using AI to translate one human language to another.
- Application clustering. This is the process of turning multiple computer servers into a cluster.
- Sentiment analysis. Sentiment analysis is an important approach to NLP that identifies the emotional tone behind a body of text.
- Chatbots. These software or computer programs simulate human conversation or chatter through text or voice interactions.
- Information extraction. This is the creation of knowledge from structured and unstructured text.
- Natural language interfaces. These are computer-human interfaces where words, phrases or clauses act as user interface controls.
- Content filtering. This process blocks various language-based web content from reaching users.
- Text mining. Text mining is the process of extracting useful information from massive amounts of unstructured textual data. Tokenization, part-of-speech tagging -- named entity recognition and sentiment analysis -- are used to accomplish this process.
Approaches and methods of computational linguistics
There have been many different approaches and methods of computational linguistics since its beginning in the 1950s. Examples of some CL approaches include the following:
- The corpus-based approach, which is based on the language as it's practically used.
- The comprehension approach, which enables the NLP engine to interpret naturally written commands in a simple rule-governed environment.
- The developmental approach, which adopts the language acquisition strategy of a child by acquiring language over time. The developmental process has a statistical approach to studying language and doesn't take grammatical structure into account.
- The structural approach, which takes a theoretical approach to the structure of a language. This approach uses large samples of a language run through computational models to gain a better understanding of the underlying language structures.
- The production approach focuses on a CL model to produce text. This has been done in a number of ways, including the construction of algorithms that produce text based on example texts from humans. This approach can be broken down into the following two approaches:
- The text-based interactive approach uses text from a human to generate a response by an algorithm. A computer can recognize different patterns and reply based on user input and specified keywords.
- The speech-based interactive approach works similarly to the text-based approach, but user input is made through speech recognition. The user's speech input is recognized as sound waves and is interpreted as patterns by the CL system.
CL vs. NLP
Computational linguistics and natural language processing are similar concepts, as both fields require formal training in computer science, linguistics and machine learning (ML). Both use the same tools, such as ML and AI, to accomplish their goals and many NLP tasks need an understanding or interpretation of language.
Where NLP deals with the ability of a computer program to understand human language as it's spoken and written and to provide sentiment analysis, CL focuses on the computational description of languages as a system. Computational linguistics also leans more toward linguistics and answering linguistic questions with computational tools; NLP, on the other hand, involves the application of processing language.
NLP plays an important role in creating language technologies, including chatbots, speech recognition systems and virtual assistants, such as Siri, Alexa and Cortana. Meanwhile, CL lends its expertise to topics such as preserving languages, analyzing historical documents and building dialogue systems, such as Google Translate.
For more on artificial intelligence in the enterprise, read the following articles:
Artificial intelligence vs. human intelligence: How are they different?
AI vs. machine learning vs. deep learning: Key differences
4 main types of artificial intelligence: Explained
History of computational linguistics
Although the concept of computational linguistics is often associated with AI, CL predates AI's development, according to the Association for Computational Linguistics. One of the first instances of CL came from an attempt to translate text from Russian to English. The thought was that computers could make systematic calculations faster and more accurately than a person, so it wouldn't take long to process a language. However, the complexities found in languages were underestimated, taking much more time and effort to develop a working program.
Two programs were developed in the early 1970s that had more complicated syntax and semantic mapping rules. SHRDLU was a primary language parser developed by computer scientist Terry Winograd at the Massachusetts Institute of Technology. It combined human linguistic models with reasoning methods. This was a major accomplishment for natural language understanding and processing research.
In 1971, the NASA developed Lunar and demonstrated it at a space convention. The Lunar Sciences Natural Language Information System answered convention attendees' questions about the composition of the rocks returned from the Apollo moon missions.
Translating languages was a difficult task before this, as the system had to understand grammar and the syntax in which words were used. Since then, strategies to execute CL began moving away from procedural approaches to ones that were more linguistic, understandable and modular. In the late 1980s, computing processing power increased, which led to a shift to statistical methods when considering CL. This is also around the time when corpus-based statistical approaches were developed.
Modern CL relies on many of the same tools and processes as NLP. These systems use a variety of tools, including AI, ML, deep learning and cognitive computing. As an example, GPT-3, or the third-generation Generative Pre-trained Transformer, is a neural network ML model that produces text based on user input. It was released by OpenAI in 2020 and was trained using internet data to generate any type of text. The program requires a small amount of input text to generate large relevant volumes of text. GPT-3 is a model with more than 175 billion ML parameters. Compared to the largest trained language model before this, Microsoft's Turing-NLG model only had 17 billion parameters. The latest version of GPT, GPT-4, launched in March 2023. Compared to its predecessors, this model is capable of handling more sophisticated tasks, thanks to improvements in its design and capabilities.
What computational linguists do and how to become one
Typically, computational linguists are employed in universities, governmental research labs or large enterprises. In the private sector, vertical companies typically use computational linguists to authenticate the accurate translation of technical manuals. Tech software companies, such as Microsoft, typically hire computational linguists to work on NLP, helping programmers create voice user interfaces that let humans communicate with computing devices as if they were another person. Some common job titles for computational linguists include natural language processing engineer, speech scientist and text analyst.
In terms of skills, computational linguists must have a strong background in computer science and programming, as well as expertise in ML, deep learning, AI, cognitive computing, neuroscience and language analysis. These individuals should also be able to handle large data sets, possess advanced analytical and problem-solving capabilities, and be comfortable interacting with both technical and nontechnical professionals.
Individuals pursuing a job as a linguist generally need a master's or doctoral degree in a computer science-related field or a bachelor's degree with work experience developing natural language software. The ultimate goal of computational linguistics is to enhance communication, revolutionize language technology and elevate human-computer interaction. Common business goals of computational linguistics include the following:
- Create grammatical and semantic frameworks for characterizing languages.
- Translate text from one language to another.
- Offer text and information retrieval that relates to a specific topic.
- Analyze text or spoken language for context, sentiment or other affective qualities.
- Answer questions, including those that require inference and descriptive or discursive answers.
- Summarize text.
- Build dialogue agents capable of completing complex tasks such as making a purchase, planning a trip or scheduling maintenance.
- Create chatbots capable of passing the Turing Test.
- Explore and identify the learning characteristics and processing techniques that constitute both the statistical and structural elements of a language.
Learn about 20 different courses for studying AI, including programs at Cornell University, Harvard University and the University of Maryland, which offer content on computational linguistics.