Pike v8.0 release 1738

Method String.levenshtein_distance()


Method levenshtein_distance

int levenshtein_distance(string a, string b)

Description

This function calculates the Levenshtein distance between two strings a and b. The Levenshtein distance is the minimum number of single-character insertions, deletions or substitutions needed to convert a into b.

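Mathematically, the Levenshtein distance between two strings $a$ and $b$ is $\operatorname{lev}_{a,b}(|a|, |b|)$, where

$$
\operatorname{lev}_{a,b}(i, j) =
\begin{cases}
\max(i, j) & \text{if } \min(i, j) = 0,\\[4pt]
\min\begin{cases}
\operatorname{lev}_{a,b}(i, j-1) + 1\\
\operatorname{lev}_{a,b}(i-1, j) + 1\\
\operatorname{lev}_{a,b}(i-1, j-1) + [a_i \neq b_j]
\end{cases} & \text{otherwise,}
\end{cases}
$$

and $[a_i \neq b_j]$ is 1 when the $i$-th character of $a$ differs from the $j$-th character of $b$, and 0 otherwise.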

Note that each term of the minimum corresponds to one kind of edit: the first to inserting a character into a (equivalently, deleting a character from b), the second to deleting a character from a, and the third to either a match or a substitution, depending on whether the respective characters a_i and b_j are equal.
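The recurrence above can be evaluated row by row with dynamic programming, keeping only the previous row of the table. The following is a minimal Pike sketch of that approach, written for illustration only: `my_levenshtein` is a hypothetical name, not part of Pike's API, and this is not a claim about how String.levenshtein_distance is implemented internally.

```pike
// Hypothetical illustration of the recurrence; not Pike's own implementation.
int my_levenshtein(string a, string b)
{
  // prev[j] holds lev(i-1, j). Row 0 is lev(0, j) == j, i.e. ({ 0, 1, ..., |b| }).
  array(int) prev = indices(allocate(sizeof(b) + 1));

  for (int i = 1; i <= sizeof(a); i++) {
    array(int) cur = allocate(sizeof(b) + 1);
    cur[0] = i;  // lev(i, 0) == i
    for (int j = 1; j <= sizeof(b); j++) {
      cur[j] = min(cur[j-1] + 1,                     // insert a character into a
                   prev[j] + 1,                      // delete a character from a
                   prev[j-1] + (a[i-1] != b[j-1]));  // match (+0) or substitute (+1)
    }
    prev = cur;
  }
  return prev[sizeof(b)];  // lev(|a|, |b|)
}

int main()
{
  write("%d\n", my_levenshtein("pike", "bikes"));
  return 0;
}
```

Keeping only two rows makes memory use proportional to |b| + 1 rather than to the full (|a| + 1) by (|b| + 1) table, while the running time stays proportional to |a| * |b|.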

Example: The Levenshtein distance between "pike" and "bikes" is 2, since the following two edits change one into the other, and there is no way to do it with fewer than two edits:

- "pike" -> "bike" (substitute "p" with "b")
- "bike" -> "bikes" (add "s" at the end)
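A minimal usage sketch, assuming a Pike 8.0 interpreter where String.levenshtein_distance is available as documented on this page; it reproduces the example above.

```pike
int main()
{
  // Distance from the example above: one substitution plus one insertion.
  int d = String.levenshtein_distance("pike", "bikes");
  write("%d\n", d);  // prints 2
  return 0;
}
```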

Note that the time needed to compute the Levenshtein distance is roughly proportional to the product of the two string lengths. The function is therefore best suited to fuzzy string matching where at least one of the strings is short.
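As a sketch of the fuzzy-matching use case, the hypothetical helper `closest_match` below (not part of Pike's API) returns the candidate closest to a short input word, or 0 if none is within `max_dist` edits. It assumes String.levenshtein_distance as documented here, and uses the fact that the distance is never smaller than the difference in string lengths to skip candidates that cannot improve on the best match found so far.

```pike
// Hypothetical helper for illustration: nearest candidate within max_dist edits.
string closest_match(string word, array(string) candidates, int max_dist)
{
  string best = 0;
  int best_dist = max_dist + 1;
  foreach (candidates, string c) {
    // Lower bound: at least |len(c) - len(word)| insertions or deletions are needed.
    int diff = sizeof(c) - sizeof(word);
    if (diff < 0) diff = -diff;
    if (diff >= best_dist)
      continue;  // cannot beat the current best, so skip the full computation
    int d = String.levenshtein_distance(word, c);
    if (d < best_dist) {
      best = c;
      best_dist = d;
    }
  }
  return best;
}

int main()
{
  array(string) commands = ({ "write", "werror", "read", "search" });
  // "wrte" is one insertion away from "write".
  write("%s\n", closest_match("wrte", commands, 2) || "(none)");
  return 0;
}
```

The length check is cheap compared with a full distance computation, which matters when many candidates are compared against the same word.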