Browse Definitions :
Definition

fuzzy search

What is a fuzzy search?

A fuzzy search is a technique that uses search algorithms to find strings that match patterns approximately. It's particularly useful for helping users find webpages without having to know exactly what they're looking for or how a word is spelled. Fuzzy searches are also used for Structured Query Language lookups to help database users find records without having to be sure of the exact spelling of the value they're looking for.

A fuzzy search is performed using a fuzzy matching algorithm, which returns a list of results based on likely relevance even though search argument words and spellings may not be an exact match. For web lookups, exact and highly relevant matches appear near the top of the list. Subjective relevance ratings may be given, usually as percentages.

How do fuzzy searches work?

A fuzzy matching algorithm can operate like a spell checker and spelling-error corrector. For example, if a user types "Misissippi" into Yahoo or Google -- both of which use fuzzy matching -- a list of hits is returned along with the question, "Did you mean Mississippi?" Alternative spellings and words that sound the same but are spelled differently are given.

A fuzzy matching algorithm can compensate for common input typing errors, as well as errors introduced by optical character recognition scanning of printed documents. The algorithm can return hits with content that contains a specified base word along with prefixes and suffixes. For example, if planet is entered as a search word, hits occur for sites containing words such as protoplanet or planetary. The program can also find synonyms and related terms, working like an online thesaurus or encyclopedic cross-reference tool.

Fuzzy matching algorithms usually return irrelevant hits, as well as relevant ones. Superfluous results are likely to occur for terms with multiple meanings, only one of which is the meaning the user intends. If the user has only a vague or general idea of the topic or doesn't know exactly what to look for, the ratio of relevant hits to irrelevant hits tends to be low. However, the ratio is even lower when an exact matching program is used in this situation.

Fuzzy searching is much more powerful than exact searching when used for research and investigation and is especially useful when researching unfamiliar, foreign-language or sophisticated terms for which the proper spellings aren't widely known. Fuzzy searching can also be used to locate individuals based on incomplete or partially accurate identifying information.

Many search engines enable users to specifically request a fuzzy search in the search query by using a tilde (~) at the end of the word or term they want to search with fuzziness.

What is the Levenshtein distance?

How close the search term is to the exact match is measured in terms of edit distance -- a metric that represents the cost of converting one string to another. The most widely used method for computing the edit distance is to take the minimum number of single-character changes needed to convert the search term to the exact match. This computation results in the Levenshtein distance, which is named after the Russian scientist who invented the method, Vladimir Levenshtein.

The following single-character changes are used to compute edit distance. Some algorithms use weights to place a higher value on one or more of the primitive operations:

  • Insertion: fuzzy serch → fuzzy search
  • Deletion: fuzzey search → fuzzy search
  • Substitution: fuzzy seerch → fuzzy search
  • Transposition: fuzzy saerch → fuzzy search

What are the pros and cons of fuzzy matching?

While fuzzy matching can be helpful to researchers who don't know the exact spelling of what they're looking for, it also can be a source of frustration, as it may return irrelevant results that the user must then filter.

Fuzzy matching offers the following benefits:

  • Online visitors can find products or product categories without knowing the correct spellings.
  • Tourists can locate cities without knowing exact spellings or by using an approximate spelling in cases where the city name is listed in a different language.
  • Scientists can find articles without having to know the exact titles.
  • Users can find the names of films or books without knowing the exact titles.
  • Users can search for people without knowing the exact spelling of their names.

Fuzzy matching also has the following drawbacks:

  • It sometimes returns too many results, requiring the user to sift through all the suggestions to find the one that's most appropriate. Sometimes, the most appropriate result is toward the end of the list, several pages down; and sometimes, none of the results are appropriate.
  • It only compensates for typos or misspellings and must be complemented with a semantic algorithm to also consider synonyms and semantics.

What is the difference between a fuzzy search and a wildcard search?

A fuzzy search is either applied by default or specifically requested by the user -- frequently by typing a tilde at the end of the search term. The fuzzy search returns matches within a certain edit distance by assuming that any letters in the search term can be inserted, deleted, substituted or transposed.

A wildcard search can only be specifically requested by the user by using wildcards, which are characters that indicate different kinds of matches. The search engine only considers certain letters can be altered, assuming all the other letters are to be matched exactly. The following are two of the most commonly used wildcards along with their meanings:

  1. A question mark (?) is used for single-character replacement. A search on f?zzy might yield fuzzy and fizzy.
  2. An asterisk (*) is used for multiple-character matching. A search on fuzz* might yield fuzzie and fuzzy.

Learn how focused specialty search engines differ from all-purpose search engines, such as Google.

This was last updated in August 2022

Continue Reading About fuzzy search

Networking
  • subnet (subnetwork)

    A subnet, or subnetwork, is a segmented piece of a larger network. More specifically, subnets are a logical partition of an IP ...

  • Transmission Control Protocol (TCP)

    Transmission Control Protocol (TCP) is a standard protocol on the internet that ensures the reliable transmission of data between...

  • secure access service edge (SASE)

    Secure access service edge (SASE), pronounced sassy, is a cloud architecture model that bundles together network and cloud-native...

Security
  • intrusion detection system (IDS)

    An intrusion detection system monitors (IDS) network traffic for suspicious activity and sends alerts when such activity is ...

  • cyber attack

    A cyber attack is any malicious attempt to gain unauthorized access to a computer, computing system or computer network with the ...

  • digital signature

    A digital signature is a mathematical technique used to validate the authenticity and integrity of a digital document, message or...

CIO
  • What is data privacy?

    Data privacy, also called information privacy, is an aspect of data protection that addresses the proper storage, access, ...

  • product development (new product development)

    Product development -- also called new product management -- is a series of steps that includes the conceptualization, design, ...

  • innovation culture

    Innovation culture is the work environment that leaders cultivate to nurture unorthodox thinking and its application.

HRSoftware
  • organizational network analysis (ONA)

    Organizational network analysis (ONA) is a quantitative method for modeling and analyzing how communications, information, ...

  • HireVue

    HireVue is an enterprise video interviewing technology provider of a platform that lets recruiters and hiring managers screen ...

  • Human Resource Certification Institute (HRCI)

    Human Resource Certification Institute (HRCI) is a U.S.-based credentialing organization offering certifications to HR ...

Customer Experience
  • What is an outbound call?

    An outbound call is one initiated by a contact center agent to prospective customers and focuses on sales, lead generation, ...

  • What is lead-to-revenue management (L2RM)?

    Lead-to-revenue management (L2RM) is a set of sales and marketing methods focusing on generating revenue throughout the customer ...

  • What is relationship marketing?

    Relationship marketing is a facet of customer relationship management (CRM) that focuses on customer loyalty and long-term ...

Close