对于给定的字符串s,如果存在子串k,当k重复一定次数n后n * k可以包含s或者和s相等,我们把k叫做s的覆盖子串,其中长度最短的k,其为最小覆盖子串。我们先给结论,对input string s run KMP算法,求得next数组,假设s的长度为len,最小覆盖子串的就为从头开始长度len - next[len]的子串k。
首先我们证明k肯定是覆盖子串,假设next[len] <= len / 2:
也就是说对于长度为len的s,我们有长度为next[len]的公共前后缀s2和s3,那么长度为len - next[len]的s3就是最小覆盖子串,因为我们s3重复的时候其对应的s1前缀肯定会覆盖s2,所以s3是覆盖子串。
假设next[len] > len / 2:
对于长度为len的s,我们有长度为next[len]的公共前后缀s1和s2。那么长度为len - next[len]的s1是覆盖子串,假设我们将s3平移与s2对其,我们有:
- s3.p2 = s2.p1 = s3.p1 = s1
- s3.p3 = s2.p2 = s3.p2 = s1
- ...
- s3.pk = s2.pk-1 = s3.pk-1 = s1
下面我们继续证明其为最小的覆盖子串。假设存在比k更短的覆盖子串a,那么s可以表示为 n* a + b,b为a的prefix。那么我们其最长的公共前后缀为(n - 1) * a + b,其长度显然大于 next[len],因为a.length < k.length = len - next[len],那么len - a.length > next[len]。但是kmp求的结论一定最长的公共前后缀,所以矛盾。
时间复杂度就是KMP的复杂度,O(n),代码如下:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
string MCS(string str) | |
{ | |
//kmp to find minimum substring cover | |
int k = -1, i = 0, len = str.size(); | |
vector<int> next(len + 1, -1); | |
while(i < len) | |
{ | |
if(k == -1 || str[k] == str[i]) | |
next[++i] = ++k; | |
else k = next[k]; | |
} | |
int subLen = len - next[len]; | |
return str.substr(0, subLen); | |
} |
No comments:
Post a Comment