Playing with RMMSeg
Original post: http://www.blogkid.net/archives/1334.html
As I mentioned before, RMMSeg is a great tool for segmenting Chinese text into words. Today I ran a few tests, just for fun.
To install RMMSeg, just type this in a shell:
gem install rmmseg
Or, if you get the “uninitialized constant Gem::GemRunner (NameError)” error, try:
gem1.8 install rmmseg
Once finished, we can easily call the powerful analyzer like this:
root@:~# echo "我爱北京天安门" | rmmseg
我爱 北京 天安门
root@:~# echo "blogkid爱北京天安门" | rmmseg
blogkid 爱 北京 天安门
root@:~# echo "2005年进入杭州电子科技大学软件工程专业" | rmmseg
2005 年 进入 杭州 电子 科技 大学 软件 工程 专业
Hmmm, RMMSeg’s dictionary does not contain the word “软件工程” (software engineering), so it was split into “软件” and “工程”. We can add the word by hand, though this is not recommended:
vim /path_to_ruby/gems/1.8/gems/rmmseg-0.1.6/data/words.dic
You’ll see a list of words. Just add “软件工程” as a new line, save and exit.
root@:~# echo "2005年进入杭州电子科技大学软件工程专业" | rmmseg
2005 年 进入 杭州 电子 科技 大学 软件工程 专业
Now “软件工程” comes out as a single word.
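Why does one extra line in words.dic change the output? RMMSeg is built on the MMSeg algorithm, which looks up candidate words in its dictionary and favors longer matches, so a word missing from the dictionary can only come out in pieces. The toy forward maximum-matching segmenter below illustrates just that effect. It is a simplified sketch, not RMMSeg's code: real MMSeg also resolves ambiguities with its chunk-based rules, and the `segment` method and the tiny dictionary here are made up for this example.

```ruby
# encoding: utf-8
require 'set'

# Hypothetical toy segmenter (greedy forward maximum matching).
# NOT RMMSeg's implementation; it only shows how the dictionary drives the result.
def segment(text, dict, max_len = 4)
  tokens = []
  i = 0
  while i < text.length
    # Keep runs of ASCII letters/digits (e.g. "blogkid", "2005") as one token
    if (m = text[i..-1].match(/\A[A-Za-z0-9]+/))
      tokens << m[0]
      i += m[0].length
      next
    end
    # Try the longest candidate first; shrink until it is a dictionary word
    # or only a single character is left
    len = [max_len, text.length - i].min
    len -= 1 while len > 1 && !dict.include?(text[i, len])
    tokens << text[i, len]
    i += len
  end
  tokens
end

# A made-up mini dictionary standing in for words.dic
dict = Set.new(%w[年 进入 杭州 电子 科技 大学 软件 工程 专业])
text = "2005年进入杭州电子科技大学软件工程专业"

puts segment(text, dict).join(" ")
# => 2005 年 进入 杭州 电子 科技 大学 软件 工程 专业

dict << "软件工程" # like adding a new line to words.dic
puts segment(text, dict).join(" ")
# => 2005 年 进入 杭州 电子 科技 大学 软件工程 专业
```

Because the longest match wins, adding “软件工程” makes the four-character candidate succeed before the segmenter ever falls back to “软件”.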
Thanks to pluskid.