Sunday, October 14, 2018

[POJ]1816 Wild Words


A word is a string of lowercases. A word pattern is a string of lowercases, '?'s and '*'s. In a pattern, a '?' matches any single lowercase, and a '*' matches none or more lowercases. 

There are many word patterns and some words in your hand. For each word, your task is to tell which patterns match it. 

这道题是正则匹配的题,需要我们处理的是?和*两个字符。关于正则表达式的匹配,我们之前介绍过很多不同的方法,包括DFS,DP和NFA:

上面几种方法对于解决单串模式串和输入串的匹配都很不错,但是这道题需要我们对输入字符串找出所有匹配的模式串。上面的方法需要我们对每一个模式串去跑DFS,DP或者生成NFA去进行匹配,并不是很方便。我们可以考虑用Trie来缩减一些重复的工作,我们把所有的模式串存入Trie,之后对于每一个输入的待匹配串假设当前位为i:
  • 如果当前是节点是普通的字母,我们直接和待匹配串的当前位进行比较
  • 如果当前节点是?,待匹配串的这一位可以直接匹配
  • 如果当前节点是*,因为*可以匹配任意多的字符,所以我们需要遍历[i, len - 1]所有的可能
当我们到达待匹配串的末尾时:
  • 如果我们正好也到达了某个模式串的末尾,说明我们匹配了某个模式串
  • 如果我们停在普通的字母或者?表示的节点,说明模式串还有字符没有被match完,不匹配
  • 如果我们停在*表示的节点,我们还需要继续dfs,因为如果存在之后全为***的模式串还是可以继续匹配的
值得注意的是,相同的匹配串可能有多个,所以trie的节点我们需要一个vector来记录所有的。代码如下:


//POJ 1816
/*
A word is a string of lowercases. A word pattern is a string of lowercases, '?'s and '*'s. In a pattern, a '?' matches any single lowercase, and a '*' matches none or more lowercases.
There are many word patterns and some words in your hand. For each word, your task is to tell which patterns match it.
Input
The first line of input contains two integers N (0 < N <= 100000) and M (0 < M <=100), representing the number of word patterns and the number of words. Each of the following N lines contains a word pattern, assuming all the patterns are numbered from 0 to N-1. After those, each of the last M lines contains a word.
You can assume that the length of patterns will not exceed 6, and the length of words will not exceed 20.
Output
For each word, print a line contains the numbers of matched patterns by increasing order. Each number is followed by a single blank. If there is no pattern that can match the word, print "Not match".
*/
#include <cstdio>
#include <vector>
#include <cstring>
#include <string>
#define MAXN 100005
#define MAXM 105
using namespace std;
int ans[MAXN], n, m;
int getId(char c)
{
switch (c) {
case '?':
return 'z' - 'a' + 1;
case '*':
return 'z' - 'a' + 2;
default:
return c - 'a';;
}
}
struct Node
{
Node* children[28];
vector<int> ids;//we may have duplicate patterns
Node()
{
memset(children, 0, sizeof(children));
}
};
class Trie
{
public:
Trie()
{
root = new Node();
}
~Trie()
{
clear(root);
}
void insert(string s, int id)
{
Node* curr = root;
for(int i = 0; i < s.size(); ++i)
{
int idx = getId(s[i]);
if(!curr->children[idx])
curr->children[idx] = new Node();
curr = curr->children[idx];
}
curr->ids.push_back(id);
}
void query(string s)
{
memset(ans, 0, sizeof(ans));
bool matched = false;
dfs(root, s, 0);
for(int i = 0; i < n; i++)
{
if(ans[i])
{
matched = true;
printf("%d ", i);
}
}
if(!matched)
printf("Not match");
printf("\n");
}
private:
Node* root;
void clear(Node* curr)
{
if(!curr)return;
for(int i = 0; i < 28; ++i)
clear(curr->children[i]);
delete curr;
}
void dfs(Node* curr, string s, int i)
{
if(!curr)return;
if(i == s.size())
{
for(int j = 0; j < curr->ids.size(); ++j)
ans[curr->ids[j]] = 1;
if(curr->children[getId('*')])
dfs(curr->children[getId('*')], s, i);
return;
}
int idx = getId(s[i]);
if(curr->children[idx])dfs(curr->children[idx], s, i + 1);
if(curr->children[getId('?')])dfs(curr->children[getId('?')], s, i + 1);
if(curr->children[getId('*')])
{
for(int j = i; j <= s.size(); ++j)
dfs(curr->children[getId('*')], s, j);
}
}
};
int main()
{
scanf("%d%d",&n,&m);
Trie trie;
//construct trie based on patterns
for(int i = 0; i < n; i++)
{
char s[25];
scanf("%s", s);
string str(s);
trie.insert(str,i);
}
for(int i = 0; i < m; i++)
{
char s[25];
scanf("%s",s);
string str(s);
trie.query(str);
}
return 0;
}

No comments:

Post a Comment