Essays/Levenshtein Distance
Introduction
The Levenshtein distance between two strings is the minimum number of edits (insertions, deletions, or single-character substitutions) needed to transform one string to the other. We derive a non-looping calculation of this distance.
A Looping Solution
First, a translation of the pseudo-code in the Wikipedia article into a looping function in J:
LevenshteinMatrix=: 4 : 0 d=. (1+(#x),#y)$0 for_i. i.1+#x do. d=. i (<i,0)}d end. NB. deletion for_j. i.1+#y do. d=. j (<0,j)}d end. NB. insertion for_j. i.#y do. for_i. i.#x do. if. (i{x) = j{y do. d=. d (<1+i,j)}~ (<i,j){d else. m=. 1 + (<i,1+j){d NB. deletion m=. m <. 1 + (<(1+i),j){d NB. insertion m=. m <. 1 + (<i,j){d NB. substitution d=. m (<1+i,j)}d end. end. end. )
The function returns the entire matrix rather than just the Levenshtein distance; the actual distance is the number in the lower right corner. It reproduces the two test cases in the Wikipedia article:
'sitting' LevenshteinMatrix 'kitten' 0 1 2 3 4 5 6 1 1 2 3 4 5 6 2 2 1 2 3 4 5 3 3 2 1 2 3 4 4 4 3 2 1 2 3 5 5 4 3 2 2 3 6 6 5 4 3 3 2 7 7 6 5 4 4 3 'Sunday' LevenshteinMatrix 'Saturday' 0 1 2 3 4 5 6 7 8 1 0 1 2 3 4 5 6 7 2 1 1 2 2 3 4 5 6 3 2 2 2 3 3 4 5 6 4 3 3 3 3 4 3 4 5 5 4 3 4 4 4 4 3 4 6 5 4 4 5 5 5 4 3
A Non-Looping Solution
A non-looping version obtains through a series of algebraic manipulations, replacing the inner loop with an insert (f/) and the outer loop with an application of the power operator (^:). The result of this function is the transpose of the matrix produced by LevenshteinMatrix.
lmy=: 4 : 'y,(0{x){(1+(1{x)<.{:y),2{x' lmx=: 4 : 0 b=. (_1+#y){x d=. {:y m=. (}.<.}:) d y , > lmy&.>/ (|.<"1 b,.m,.}:d),<#y ) LM=: =/~ lmx^:(#@[) ,:@i.@>:@#@[
Testing against the looping version:
'sitting' (|:@LM -: LevenshteinMatrix) 'kitten' 1 'Sunday' (|:@LM -: LevenshteinMatrix) 'Saturday' 1
Version Producing Only The Scalar Distance
By Henry Rich.
levdist=: 4 : 0 " 1 'a b'=. (/: #&>)x;y z=. >: iz =. i.#b for_j. a do. z=. <./\&.(-&iz) (>: <. (j ~: b) + |.!.j_index) z end. {:z )
Collected Definitions
LevenshteinMatrix=: 4 : 0 d=. (1+(#x),#y)$0 for_i. i.1+#x do. d=. i (<i,0)}d end. NB. deletion for_j. i.1+#y do. d=. j (<0,j)}d end. NB. insertion for_j. i.#y do. for_i. i.#x do. if. (i{x) = j{y do. d=. d (<1+i,j)}~ (<i,j){d else. m=. 1 + (<i,1+j){d NB. deletion m=. m <. 1 + (<(1+i),j){d NB. insertion m=. m <. 1 + (<i,j){d NB. substitution d=. m (<1+i,j)}d end. end. end. ) lmy=: 4 : 'y,(0{x){(1+(1{x)<.{:y),2{x' lmx=: 4 : 0 b=. (_1+#y){x d=. {:y m=. (}.<.}:) d y , > lmy&.>/ (|.<"1 b,.m,.}:d),<#y ) LM=: =/~ lmx^:(#@[) ,:@i.@>:@#@[ levdist=: 4 : 0 " 1 'a b'=. (/: #&>)x;y z=. >: iz =. i.#b for_j. a do. z=. <./\&.(-&iz) (>: <. (j ~: b) + |.!.j_index) z end. {:z )
Contributed by Roger Hui.