Saturday, April 7, 2018

[LeetCode]Encode String with Shortest Length



区间DP的问题,我们用dp[i][j]表示子串s[i:j]的最小压缩串,我们有如下递推方程:

  • dp[i][j] = min(dp[i][k] + dp[k + 1][j]), where 1 <= k < j, '+' means concatenate here
  • dp[i][j] = min(dp[i][j], compress(s[i:j])), compress is a function to find if we can shorten input string by find a repeating pattern in itself
这里compress的功能是,找到string自身的重复串来压缩string。比如对于abcabcabc,我们只进行区间dp是没有办法找到3[abc]的,所以我们必须对自身进行compress。对自身进行压缩的就是寻找自身最小覆盖子串的过程。有多个覆盖子串我们肯定是取最小的,对于d[str]的形式,如果我们double str的长度最多也就使得d减少一位,所以str越短越好。
时间复杂度O(n^3),寻找自身最小覆盖子串需要线性时间,代码如下:


class Solution {
public:
string encode(string s) {
int len = s.size();
//dp represents minimum encoded stirng for substring s[i:j]
//dp[i][j] = min(min(dp[i][k] + dp[k + 1][j]), compress(s[i:j])), where i <= k < j
//compress(s) represent compressed s by finding a repeating pattern in s itself
//e.g., for s = "abcabcabc" compress(s) = 3[abc]
//for s = "abcabcab", compress(s) = abcabcab since we can't find a repeating
//pattern which can reconstruct s
vector<vector<string>> dp(len, vector<string>(len));
for(int j = 0; j < len; ++j)
{
for(int i = j; i >= 0 ; --i)
{
dp[i][j] = s.substr(i, j - i + 1);
if(j - i >= 4)
{
int subLen = compress(dp[i][j]);
if(subLen)
{
string compressed = to_string((j - i + 1) / subLen) + "[" + dp[i][i + subLen - 1] + "]";
if(compressed.size() < dp[i][j].size())dp[i][j] = compressed;
}
for(int k = i; k < j; ++k)
{
string str = dp[i][k] + dp[k + 1][j];
if(str.size() < dp[i][j].size())
dp[i][j] = str;
}
}
}
}
return dp[0][len - 1];
}
private:
//if current string itself can find a repeating pattern
int compress(string str)
{
//kmp to find minimum substring cover
//same as https://leetcode.com/problems/repeated-substring-pattern/description/
int k = -1, i = 0, len = str.size();
vector<int> next(len + 1, -1);
while(i < len)
{
if(k == -1 || str[k] == str[i])
next[++i] = ++k;
else k = next[k];
}
int subLen = len - next[len];
return len % subLen? 0: subLen;
}
};

No comments:

Post a Comment