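When searching for products on Taobao, you may notice that Chinese characters, pinyin, or a mix of pinyin and Chinese characters all return the results you want. I looked into it today: this kind of search can be implemented with the pinyin analysis plugin for Elasticsearch.
Installing and using IK: https://blog.csdn.net/wwd0501/article/details/78271279
Installing the pinyin analysis plugin for version 5.2.2: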
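1. Download the plugin from https://github.com/medcl/elasticsearch-analysis-pinyin/releases and pick the release whose version matches your ES version. Downloading the prebuilt zip is recommended. If you download the source instead, you have to build the zip yourself with Maven:
https://github.com/medcl/elasticsearch-analysis-pinyin
mvn clean package
After a successful build, elasticsearch-analysis-pinyin-5.2.2.zip can be found under target/releases.
2. Place the zip file in the {ES_HOME}/plugins/pinyin/ directory and unzip it directly into that directory; then restart the ES service.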
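3. Test. Create an index named medcl with a custom pinyin analyzer, then run that analyzer on a sample string: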
curl -XPUT http://localhost:9200/medcl/ -d'
{
  "index" : {
    "analysis" : {
      "analyzer" : {
        "pinyin_analyzer" : {
          "tokenizer" : "my_pinyin"
        }
      },
      "tokenizer" : {
        "my_pinyin" : {
          "type" : "pinyin",
          "keep_separate_first_letter" : false,
          "keep_full_pinyin" : true,
          "keep_original" : true,
          "limit_first_letter_length" : 16,
          "lowercase" : true,
          "remove_duplicated_term" : true
        }
      }
    }
  }
}'
POST medcl/_analyze
{
  "analyzer": "pinyin_analyzer",
  "text": "刘德华"
}
The analysis result is:
{
  "tokens" : [
    {
      "token" : "liu",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "word",
      "position" : 0
    },
    {
      "token" : "de",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "word",
      "position" : 1
    },
    {
      "token" : "hua",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "word",
      "position" : 2
    },
    {
      "token" : "刘德华",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 3
    },
    {
      "token" : "ldh",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "word",
      "position" : 4
    }
  ]
}
If you see this result, the pinyin analysis plugin is installed correctly.
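To make the relationship between the tokens in that result more concrete, here is a small Python sketch of how the first-letter token (ldh) relates to the full-pinyin tokens (liu, de, hua). It only illustrates what the keep_first_letter and limit_first_letter_length options control, as described in the plugin README. The function name first_letter_token is hypothetical, and this is not the plugin's actual implementation.

```python
def first_letter_token(syllables, limit=16):
    """Join the first letter of each pinyin syllable, cut to `limit` characters.

    Illustration only: mimics the effect of keep_first_letter combined with
    limit_first_letter_length; it is not the plugin's real code.
    """
    return "".join(s[0] for s in syllables if s)[:limit]


if __name__ == "__main__":
    # Full-pinyin tokens taken from the _analyze response for "刘德华".
    print(first_letter_token(["liu", "de", "hua"]))  # prints "ldh"
```

With the default limit of 16 used in the settings above, a very long name would yield at most a 16-character first-letter token in this sketch.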
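4. Installing and using the IK analyzer: https://blog.csdn.net/wwd0501/article/details/78258274
5. Configure IK + pinyin analysis
Index settings (if the medcl index created in step 3 still exists, delete it first, for example with curl -XDELETE http://localhost:9200/medcl, because creating an index that already exists fails):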
curl -XPUT "http://localhost:9200/medcl/" -d'
{
  "index": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "ik_max_word"
        },
        "pinyin_analyzer": {
          "tokenizer": "shopmall_pinyin"
        }
      },
      "tokenizer": {
        "shopmall_pinyin": {
          "keep_joined_full_pinyin": "true",
          "keep_first_letter": "true",
          "keep_separate_first_letter": "false",
          "lowercase": "true",
          "type": "pinyin",
          "limit_first_letter_length": "16",
          "keep_original": "true",
          "keep_full_pinyin": "true"
        }
      }
    }
  }
}'
For the meaning of the parameters above, see the plugin's README: https://github.com/medcl/elasticsearch-analysis-pinyin
Create the mapping:
curl -XPOST http://localhost:9200/medcl/folks/_mapping -d'
{
  "folks": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "ik_max_word",
        "include_in_all": true,
        "fields": {
          "pinyin": {
            "type": "text",
            "analyzer": "pinyin_analyzer"
          }
        }
      }
    }
  }
}'
Add test documents:
curl -XPOST http://localhost:9200/medcl/folks/ -d'{"name":"刘德华"}'
curl -XPOST http://localhost:9200/medcl/folks/ -d'{"name":"中华人民共和国国歌"}'
Test the analysis results:
Pinyin analysis:
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:liu"
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:de"
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:hua"
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name.pinyin:ldh"
Chinese analysis:
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name:刘"
curl -XPOST "http://localhost:9200/medcl/folks/_search?q=name:刘德"
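The two requests above put raw Chinese characters into the URL, which some shells or HTTP clients may not transmit as intended. The following is a minimal Python sketch, assuming the same localhost:9200 node and medcl/folks index used in this post, that percent-encodes the query string before it is sent. The helper name build_search_url is hypothetical.

```python
from urllib.parse import quote

# Assumption: the same local node as in the curl examples.
ES_BASE = "http://localhost:9200"


def build_search_url(field, text, index="medcl", doc_type="folks"):
    """Return a URI-search URL whose q parameter is percent-encoded as UTF-8."""
    query = quote(f"{field}:{text}", safe="")
    return f"{ES_BASE}/{index}/{doc_type}/_search?q={query}"


if __name__ == "__main__":
    print(build_search_url("name", "刘德"))
    print(build_search_url("name.pinyin", "ldh"))
```

The resulting URL can be passed to curl in place of the hand-written one.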
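Note: the user's search input is classified with regular expressions into Chinese, pinyin, Chinese + pinyin, Chinese + pinyin + digits + special characters, and so on, and is then searched as follows:
1. Chinese input: if the search returns no results, convert the input to pinyin and search again; if the pinyin search also returns nothing, run a fuzzy search; if there are still no results, consider showing recommendations instead.
2. Pinyin input: if the search returns no results, run a fuzzy search.
3. Chinese + pinyin input: handled as pinyin for now.
4. Input mixing Chinese, pinyin, digits, and special characters: handled as pinyin for now.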
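The fallback rules above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the regular expressions are one possible way to classify input, and search_chinese, search_pinyin, search_fuzzy, and to_pinyin are hypothetical callables supplied by the caller (for example, thin wrappers around Elasticsearch queries and a pinyin conversion library). None of them come from the plugin.

```python
import re

HAN_ONLY = re.compile(r"[\u4e00-\u9fff]+")  # common CJK ideographs only
LATIN_ONLY = re.compile(r"[A-Za-z]+")       # plain pinyin letters only


def classify(text):
    """Classify input as 'chinese', 'pinyin', or 'mixed' (everything else)."""
    if HAN_ONLY.fullmatch(text):
        return "chinese"
    if LATIN_ONLY.fullmatch(text):
        return "pinyin"
    return "mixed"


def search_with_fallback(text, search_chinese, search_pinyin, search_fuzzy, to_pinyin):
    """Try each search step in order and return the first non-empty result."""
    if classify(text) == "chinese":
        steps = [
            lambda: search_chinese(text),
            lambda: search_pinyin(to_pinyin(text)),
            # Assumption: the fuzzy step runs on the original input.
            lambda: search_fuzzy(text),
        ]
    else:
        # Pinyin and mixed input are both handled as pinyin, per rules 2 to 4.
        steps = [lambda: search_pinyin(text), lambda: search_fuzzy(text)]
    for step in steps:
        hits = step()
        if hits:
            return hits
    return []  # nothing found; the caller may show recommendations instead


if __name__ == "__main__":
    print(classify("刘德华"), classify("ldh"), classify("刘dehua"))
```

Passing the search functions in as arguments keeps the fallback order separate from how each query is actually sent to Elasticsearch.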
References: http://blog.csdn.net/napoay/article/details/53907921
http://www.jianshu.com/p/653f7b33e63c
https://github.com/medcl/elasticsearch-analysis-pinyin
https://my.oschina.net/xiaohui249/blog/214505