f(k) = MaxOverlap ( "p0 p1 ... pk" )
where:
"p0 p1 ... pk" = the prefix of length k+1 of pattern P
|
Graphically:
|
Given P = "p0 p1 ... pm-1"
Given k = 1, 2, ..., m-1 (k = 0 ==> f(0) = 0)
1. Extract the sub-pattern: "p0 p1 ... pk"
2. Find the first (= largest) overlap by sliding a copy of the sub-pattern one position at a time:
   Try:   (p0) p1 p2 ... pk-1 pk
               p0 p1 ... pk-2 pk-1
   If (no match)
   Try:   (p0 p1) p2 p3 ... pk
                  p0 p1 ... pk-2
   And so on... The first overlap found is the longest!
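The search order described above can be written down directly in Java. This is only a brute-force sketch for illustration, not the efficient method these notes build toward; the class name `MaxOverlapSketch` and the helpers `maxOverlap` and `f` are names assumed for this example.

```java
// Brute-force sketch: class and method names are assumptions for illustration.
class MaxOverlapSketch {
    // Length of the longest proper suffix of s that is also a prefix of s.
    static int maxOverlap(String s) {
        // Try the longest candidate first, so the first match is the maximum.
        for (int len = s.length() - 1; len > 0; len--) {
            String suffix = s.substring(s.length() - len);
            if (s.startsWith(suffix)) {   // prefix of length len == suffix of length len
                return len;
            }
        }
        return 0;                         // no overlap at all
    }

    // f(k) = MaxOverlap of the prefix p0 p1 ... pk (length k+1).
    static int f(String P, int k) {
        return maxOverlap(P.substring(0, k + 1));
    }

    public static void main(String[] args) {
        String P = "aaabaaa";
        for (int k = 0; k < P.length(); k++) {
            System.out.println("f(" + k + ") = " + f(P, k));
        }
    }
}
```

Each call tries up to k candidate lengths and each try compares up to k characters, which is why a faster way to obtain f(k) is worth looking for.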
|
|
|
In other words: suppose the longest overlapping suffix and prefix of "p0 p1 ... pk-1" has x characters, i.e. f(k-1) = x:
                     f(k-1) = x characters
                 <--------------------------->
   p1 p2 p3 ...  pk-x   pk-x+1   ...     pk-1
                  ^       ^               ^
                  |       |     equal     |
                  v       v               v
                  p0      p1     ...     px-1   px ... pk-1
|
$64,000 question: can we use f(k-1) to help compute f(k)?
|
Yes, because the prefix used to compute f(k) is the prefix used to compute f(k-1) extended by one character:
prefix used to compute f(k-1)
+--------------------------------+
| |
p0 p1 p2 .... px-1 ... pk-1 pk
| |
+------------------------------------+
prefix used to compute f(k)
|
We will next learn how to exploit the similarity to compute f(k)
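One consequence of this similarity can be checked mechanically: removing the last character pk from any overlap of "p0 ... pk" leaves an overlap of "p0 ... pk-1", so f(k) can be at most f(k-1) + 1. The sketch below tests that bound with a brute-force helper; `SimilarityCheck`, `maxOverlap` and `growsByAtMostOne` are names assumed for this example and do not come from the original program.

```java
// Sketch under stated assumptions: all names here are hypothetical helpers.
class SimilarityCheck {
    // Brute force: longest proper suffix of s that is also a prefix of s.
    static int maxOverlap(String s) {
        for (int len = s.length() - 1; len > 0; len--) {
            if (s.startsWith(s.substring(s.length() - len))) {
                return len;
            }
        }
        return 0;
    }

    // Returns true if f(k) <= f(k-1) + 1 holds for every k = 1 .. m-1.
    static boolean growsByAtMostOne(String P) {
        int prev = 0;                                  // f(0) = 0
        for (int k = 1; k < P.length(); k++) {
            int cur = maxOverlap(P.substring(0, k + 1));
            if (cur > prev + 1) {
                return false;                          // bound violated
            }
            prev = cur;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(growsByAtMostOne("aaabaaa"));     // prints true
        System.out.println(growsByAtMostOne("ababyababa"));  // prints true
    }
}
```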
|
Proof: by contradiction
|
|
Proof:
|
k = 0123456
Pattern = aaabaaa
|
(This trick is a bit tricky :))
|
Worked out further:
|
|
|
public static int[] KMP_failure_function( String P )
{
   int k, i, x, m;
   int f[] = new int[P.length()];   // f[] stores the function values
   m = P.length();
   f[0] = 0;                        // f[0] is always 0...
   for ( k = 1; k < m; k++ )
   {
      // Compute f(k) and store in f[k]
      i = k-1;                      // Try to use f(k-1) to compute f(k)
      x = f[i];                     // Character position to match against P[k]
      if ( P.charAt(k) == P.charAt(x) )   // Note: make sure x is valid
      {
         f[k] = f[i] + 1;
         continue;                  // Compute next f(k) value
      }
      else
      {
         i = x-1;                   // Try next prefix (and next f(i)) to compute f(k)
         x = f[i];                  // Character position to match against P[k]
      }
      if ( P.charAt(k) == P.charAt(x) )   // Note: make sure x is valid
      {
         f[k] = f[i] + 1;
         continue;                  // Compute next f(k) value
      }
      else
      {
         i = x-1;                   // Try next prefix (and next f(i)) to compute f(k)
         x = f[i];                  // Character position to match against P[k]
      }
      .... (obviously we will make this into a loop !!!)
   }
}
|
public static int[] KMP_failure_function(String P)
{
   int k, i, x, m;
   int f[] = new int[P.length()];
   m = P.length();
   f[0] = 0;                        // f(0) is always 0
   for ( k = 1; k < m; k++ )
   {
      // Compute f[k]
      i = k-1;                      // First try to use f(k-1) to compute f(k)
      x = f[i];
      while ( P.charAt(x) != P.charAt(k) )
      {
         i = x-1;                   // Try the next candidate f(.) to compute f(k)
         if ( i < 0 )               // No candidate left: f[i] would be out of range
            break;                  // STOP the search !!!
         x = f[i];
      }
      if ( i < 0 )
         f[k] = 0;                  // No overlap at all: max overlap = 0 characters
      else
         f[k] = f[i] + 1;           // We can compute f(k) using f(i)
   }
   return f;
}
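To see the values the function produces, it can be wrapped in a small driver. The sketch below repeats the method so that it compiles on its own; the class name `FailureFunctionDemo` and the `main` method are assumptions for this example and are not the `ComputeF` program used in these notes.

```java
import java.util.Arrays;

// Hypothetical driver class, assumed for illustration only.
class FailureFunctionDemo {
    // Same logic as the method in these notes, repeated so the sketch is self-contained.
    static int[] KMP_failure_function(String P) {
        int m = P.length();
        int[] f = new int[m];
        f[0] = 0;                               // assumes P is non-empty
        for (int k = 1; k < m; k++) {
            int i = k - 1;                      // first try f(k-1)
            int x = f[i];
            while (P.charAt(x) != P.charAt(k)) {
                i = x - 1;                      // next candidate
                if (i < 0) {
                    break;                      // no candidate left
                }
                x = f[i];
            }
            f[k] = (i < 0) ? 0 : f[i] + 1;
        }
        return f;
    }

    public static void main(String[] args) {
        // prints [0, 1, 2, 0, 1, 2, 3]
        System.out.println(Arrays.toString(KMP_failure_function("aaabaaa")));
        // prints [0, 0, 1, 2, 0, 1, 2, 3, 4, 3]
        System.out.println(Arrays.toString(KMP_failure_function("ababyababa")));
    }
}
```

Note that the method writes `f[0]` unconditionally, so this sketch assumes a non-empty pattern; an empty string would throw an `ArrayIndexOutOfBoundsException`.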
|
How to run the program:
|
Example:
>>> java ComputeF
P = ababyababa
-----------------------------------------------
Prefix = ab --- Computing f(1):
===================================
Try using: f(0) = 0
=====================================
Matching: i = 1, j = 0
   01
   ab
    ab
    01
    ^
    |
No overlap possible... --> f[1] = 0
-----------------------------------------------
.......
-----------------------------------------------
Prefix = ababyababa --- Computing f(9):
===================================
Try using: f(8) = 4
=====================================
Matching: i = 9, j = 4
   0123456789
   ababyababa
        ababyababa
        0123456789
            ^
            |
===================================
Try using: f(3) = 2
=====================================
Matching: i = 9, j = 2
   0123456789
   ababyababa
          ababyababa
          0123456789
            ^
            |
Overlap found ... --> f[9] = 3
|