Summary of Common elasticsearch Commands (Part 3)


1. The Standard Analyzer

POST _analyze
{
  "analyzer": "standard",
  "text":     "The quick brown fox."
}


# Output:
{
  "tokens" : [
    {
      "token" : "上",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "海",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "大",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    }
  ]
}

The standard analyzer handles English well, but it simply breaks Chinese text into single characters,
so for Chinese we use the IK analyzer instead.
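For comparison, here is the same request on English text. The standard analyzer splits on word boundaries and lowercases, so the expected tokens are roughly as noted below (abridged, not the full response):

POST _analyze
{
  "analyzer": "standard",
  "text":     "The quick brown fox."
}

# Expected tokens: "the", "quick", "brown", "fox"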

2. Installing the IK Analyzer

2.1 Download the ik analyzer plugin

https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip
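For example, the archive can be fetched directly on the server with wget (any download directory works, as long as the unzip step below points at it):

wget https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.4.2/elasticsearch-analysis-ik-7.4.2.zip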

2.2 Unzip the plugin into the es plugins/ik directory

Since we installed es with docker earlier and mapped /data/elasticsearch/plugins to the host:

# Create the directory
mkdir -p /data/elasticsearch/plugins/ik

# Unzip elasticsearch-analysis-ik-7.4.2.zip
# into /data/elasticsearch/plugins/ik
unzip elasticsearch-analysis-ik-7.4.2.zip -d /data/elasticsearch/plugins/ik/

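If the archive was unzipped as root, elasticsearch inside the container may not be able to read the plugin files; in that case, relaxing the permissions on the mapped directory (the exact mode here is just a working assumption) usually resolves it:

# Make the plugin directory readable by the elasticsearch user in the container
chmod -R 755 /data/elasticsearch/plugins/ik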

2.3 Verify that ik was installed successfully

# Enter the docker container
docker exec -it ebbb6ee33542 /bin/bash

# List the installed plugins to check the installation
/usr/share/elasticsearch/bin/elasticsearch-plugin list
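The same check can also be run in one shot, without opening a shell inside the container:

docker exec -it ebbb6ee33542 /usr/share/elasticsearch/bin/elasticsearch-plugin list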

2.4 Restart elasticsearch

docker restart ebbb6ee33542
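After the restart, a quick way to confirm the node is back up (assuming the usual 9200 port mapping was used when the container was created) is:

curl http://localhost:9200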

3. Using the IK Analyzer

3.1 Smart segmentation: ik_smart

POST _analyze
{
  "analyzer": "ik_smart",
  "text":     "上海大"
}

#  Output

{
  "tokens" : [
    {
      "token" : "上海",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "大",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "CN_CHAR",
      "position" : 1
    }
  ]
}

3.2 Fine-grained segmentation: ik_max_word

POST _analyze
{
  "analyzer": "ik_max_word",
  "text":     "我是中国人"
}

#  Output

{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "是",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "中国人",
      "start_offset" : 2,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "中国",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "国人",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
}

4. Custom Dictionary Extension

4.1 Install nginx

Install nginx with docker.
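A minimal sketch of the container setup that the rest of this section assumes (the image tag and port mapping are assumptions; the /data/nginx/html mapping is what the next step relies on, and /usr/share/nginx/html is the default document root of the official nginx image):

docker run -d --name nginx -p 80:80 \
  -v /data/nginx/html:/usr/share/nginx/html \
  nginx:1.10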

4.2 Configure nginx

Create the es directory:

mkdir -p /data/nginx/html/es

Create the custom dictionary file for the analyzer:

vi /data/nginx/html/es/fenci.txt

Contents:

小黄人
尚硅谷

Test that the file is reachable:
http://192.168.103.129/es/fenci.txt
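The same check from the command line:

curl http://192.168.103.129/es/fenci.txt

# Expected output:
小黄人
尚硅谷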

4.3 Configure IK

# Open the file
vi /data/elasticsearch/plugins/ik/config/IKAnalyzer.cfg.xml

# Original configuration:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
        <comment>IK Analyzer 扩展配置</comment>
        <!--用户可以在这里配置自己的扩展字典 -->
        <entry key="ext_dict"></entry>
         <!--用户可以在这里配置自己的扩展停止词字典-->
        <entry key="ext_stopwords"></entry>
        <!--用户可以在这里配置远程扩展字典 -->
        <!-- <entry key="remote_ext_dict">words_location</entry> -->
        <!--用户可以在这里配置远程扩展停止词字典-->
        <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>


Modify the configuration:

# Uncomment the remote_ext_dict line and point it at the dictionary file served by nginx
<entry key="remote_ext_dict">http://192.168.103.129/es/fenci.txt</entry>
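Putting it together, the relevant part of the modified IKAnalyzer.cfg.xml looks like this (only the remote_ext_dict entry changes):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
        <comment>IK Analyzer 扩展配置</comment>
        <entry key="ext_dict"></entry>
        <entry key="ext_stopwords"></entry>
        <!-- Remote extension dictionary served by nginx -->
        <entry key="remote_ext_dict">http://192.168.103.129/es/fenci.txt</entry>
        <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>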

4.4 Restart es

docker restart ebbb6ee33542

4.5 Test

POST _analyze
{
  "analyzer": "ik_max_word",
  "text":     "尚硅谷项目"
}

Output:

{
  "tokens" : [
    {
      "token" : "尚硅谷",
      "start_offset" : 0,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "硅谷",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "项目",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    }
  ]
}
