   f(k) = MaxOverlap ( "p0 p1 ... pk" )

   where:

        "p0 p1 ... pk" = the prefix of length k+1 of pattern P
Graphically:
   Given P = "p0 p1 ... pm-1"
   Given k = 1, 2, ..., m-1     (k = 0 ==> f(0) = 0)

   1. Extract the sub-pattern:  "p0 p1 ... pk"

   2. Find the first (= largest) overlap:

         Try:   (p0) p1 p2 ... pk-1
                     p0 p1 ... pk-1 pk

         If (no match)
         Try:   (p0) p1 p2 ... pk-1
                        p0 p1 ... pk-1 pk

         And so on...

      The first overlap is the longest !
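The two steps above can be sketched directly in Java. This is only a brute-force illustration of the definition, not the efficient method; the class name BruteForceOverlap and the method names maxOverlap and failureBruteForce are made up for this sketch.

```java
import java.util.Arrays;

final class BruteForceOverlap {
    // Longest proper suffix of s that is also a prefix of s.
    // Candidates are tried from the largest length down, so the
    // first match found is the longest overlap.
    static int maxOverlap(String s) {
        for (int len = s.length() - 1; len > 0; len--) {
            // Compare the first len characters with the last len characters.
            if (s.regionMatches(0, s, s.length() - len, len)) {
                return len;
            }
        }
        return 0; // no overlap at all
    }

    // f(k) = maxOverlap of the prefix of length k+1 of p.
    static int[] failureBruteForce(String p) {
        int[] f = new int[p.length()];
        for (int k = 0; k < p.length(); k++) {
            f[k] = maxOverlap(p.substring(0, k + 1));
        }
        return f;
    }

    public static void main(String[] args) {
        // prints [0, 0, 1, 2, 0, 1, 2, 3, 4, 3]
        System.out.println(Arrays.toString(failureBruteForce("ababyababa")));
    }
}
```

Each call to maxOverlap starts from scratch, so this sketch does not reuse any work between f(k-1) and f(k).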
In other words: the longest overlapping suffix and prefix in "p0 p1 ... pk-1" has x characters:
                        f(k-1) = x characters
                     <--------------------------->
   p1 p2 p3 ...      pk-x   pk-x+1  pk-x+2 ....  pk-1
                      ^       ^       ^           ^
                      |       |       |   equal   |
                      v       v       v           v
                      p0      p1      p2   ....   px-1   px ... pk-1
$64,000 question: can we use f(k−1) to help compute f(k)?
Yes, because the prefix used to compute f(k) is the prefix used to compute f(k−1) with one more character, pk, appended:
        prefix used to compute f(k-1)
     +--------------------------------+
     |                                |
      p0 p1 p2 .... px-1 ... pk-1       pk
     |                                    |
     +------------------------------------+
          prefix used to compute f(k)
Next, we will see how to exploit this similarity to compute f(k).
Proof: by contradiction
Proof:
   k       = 0123456
   Pattern = aaabaaa
(This trick is a bit tricky :))
Worked out further:
   public static int[] KMP_failure_function(String P)
   {
      int k, i, x, m;
      int f[] = new int[P.length()];   // f[] stores the function values

      m = P.length();

      f[0] = 0;                        // f[0] is always 0...

      for ( k = 1; k < m; k++ )
      {
         // Compute f(k) and store in f[k]

         i = k-1;                // Try to use f(k-1) to compute f(k)
         x = f[i];               // Character position to match against P[k]

         if ( P[k] == P[x] )     // Note: make sure x is valid
         {
            f[k] = f[i] + 1;
            continue;            // Compute next f(k) value
         }
         else
         {
            i = x-1;             // Try next prefix (and next f(i)) to compute f(k)
            x = f[i];            // Character position to match against P[k]
         }

         if ( P[k] == P[x] )     // Note: make sure x is valid
         {
            f[k] = f[i] + 1;
            continue;            // Compute next f(k) value
         }
         else
         {
            i = x-1;             // Try next prefix (and next f(i)) to compute f(k)
            x = f[i];            // Character position to match against P[k]
         }

         ....   (obviously we will make this into a loop !!!)
      }
   }
   public static int[] KMP_failure_function(String P)
   {
      int k, i, x, m;
      int f[] = new int[P.length()];

      m = P.length();

      if ( m == 0 )
         return f;                     // Empty pattern: there is no f(0) to store

      f[0] = 0;                        // f(0) is always 0

      for ( k = 1; k < m; k++ )
      {
         // Compute f[k]

         i = k-1;                      // First try to use f(k-1) to compute f(k)
         x = f[i];                     // Character position to match against P[k]

         while ( P.charAt(x) != P.charAt(k) )
         {
            i = x-1;                   // Try the next candidate f(.) to compute f(k)

            if ( i < 0 )               // No candidate left: f[i] would be invalid
               break;                  // STOP the search !!!

            x = f[i];
         }

         if ( i < 0 )
            f[k] = 0;                  // No overlap at all: max overlap = 0 characters
         else
            f[k] = f[i] + 1;           // We can compute f(k) using f(i)
      }

      return f;
   }
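The loop version can be checked against the definition of f(k). The sketch below restates the loop in a hypothetical class FailureCheck and compares it, prefix by prefix, with a direct search for the longest overlap on every pattern over the letters a and b up to a chosen length. The names failure, byDefinition and agreeUpTo are assumptions made for this illustration.

```java
import java.util.Arrays;

final class FailureCheck {
    // Loop version, restated from the method above.
    static int[] failure(String p) {
        int m = p.length();
        int[] f = new int[m];              // f[0] is 0 by default
        for (int k = 1; k < m; k++) {
            int i = k - 1;                 // first candidate comes from f(k-1)
            int x = f[i];
            while (p.charAt(x) != p.charAt(k)) {
                i = x - 1;                 // fall back to the next candidate
                if (i < 0) break;          // no candidate left
                x = f[i];
            }
            f[k] = (i < 0) ? 0 : f[i] + 1;
        }
        return f;
    }

    // Definition: longest proper prefix of s that is also a suffix of s.
    static int byDefinition(String s) {
        for (int len = s.length() - 1; len > 0; len--) {
            if (s.endsWith(s.substring(0, len))) return len;
        }
        return 0;
    }

    // Compare both computations on every pattern over {a, b}
    // with 1 to maxLen characters.
    static boolean agreeUpTo(int maxLen) {
        for (int len = 1; len <= maxLen; len++) {
            for (int bits = 0; bits < (1 << len); bits++) {
                StringBuilder sb = new StringBuilder();
                for (int j = 0; j < len; j++) {
                    sb.append(((bits >> j) & 1) == 0 ? 'a' : 'b');
                }
                String p = sb.toString();
                int[] f = failure(p);
                for (int k = 0; k < len; k++) {
                    if (f[k] != byDefinition(p.substring(0, k + 1))) return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(failure("ababyababa")));
        System.out.println(agreeUpTo(10));
    }
}
```

An exhaustive comparison on short patterns is not a proof, but it is a cheap way to catch an off-by-one error in the fallback step.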
How to run the program:
Example:
   >>> java ComputeF
   P = ababyababa
   -----------------------------------------------
   Prefix = ab --- Computing f(1):
   ===================================
   Try using: f(0) = 0
   =====================================
   Matching: i = 1, j = 0
   01
   ab
    ab
    01
    ^
    |
   No overlap possible... --> f[1] = 0
   -----------------------------------------------
   .......
   -----------------------------------------------
   Prefix = ababyababa --- Computing f(9):
   ===================================
   Try using: f(8) = 4
   =====================================
   Matching: i = 9, j = 4
   0123456789
   ababyababa
        ababyababa
        0123456789
            ^
            |
   ===================================
   Try using: f(3) = 2
   =====================================
   Matching: i = 9, j = 2
   0123456789
   ababyababa
          ababyababa
          0123456789
            ^
            |
   Overlap found ... --> f[9] = 3
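A trace similar in spirit to the one above can be produced with a small variation of the loop. The sketch below is not the ComputeF program, whose source is not shown here; the class name FailureTrace, the method trace and the line format are assumptions. For each k it records every position x whose character was compared with P[k], in the order tried.

```java
import java.util.ArrayList;
import java.util.List;

final class FailureTrace {
    // Returns one line per k = 1 .. m-1 listing the candidate positions x
    // that were compared with P[k], followed by the resulting f[k].
    static List<String> trace(String p) {
        int m = p.length();
        int[] f = new int[m];
        List<String> lines = new ArrayList<>();
        for (int k = 1; k < m; k++) {
            StringBuilder tried = new StringBuilder();
            int i = k - 1;
            int x = f[i];
            while (true) {
                if (tried.length() > 0) tried.append(", ");
                tried.append("x=").append(x);
                if (p.charAt(x) == p.charAt(k)) {
                    f[k] = x + 1;          // same as f[i] + 1, since x = f[i]
                    break;
                }
                i = x - 1;                 // fall back to the next candidate
                if (i < 0) {
                    f[k] = 0;              // no candidate left: no overlap
                    break;
                }
                x = f[i];
            }
            lines.add("k=" + k + ": tried " + tried + " -> f[" + k + "]=" + f[k]);
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : trace("ababyababa")) {
            System.out.println(line);
        }
        // last line printed: k=9: tried x=4, x=2 -> f[9]=3
    }
}
```

For the pattern aaabaaa the same method reports that k = 3 tries x=2, x=1 and x=0 before settling on f[3]=0, which shows the repeated fallback through earlier f values.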