Elasticsearch Field Types Explained, with Selection Advice

draymond7107

Last modified: 2025-04-23 14:54:17

License: CC 4.0 BY-SA

Category: elasticSearch. Tags: elasticsearch

First published: 2025-04-01 13:45:57

Article link: https://blog.csdn.net/Draymond_feng/article/details/146139616

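Field type	Purpose	Example
Text	String field for full-text search.	`{ "type": "text" }`
Keyword	String field for exact matching.	`{ "type": "keyword" }`
Numeric	Numeric field (`integer`, `long`, `float`, and so on).	`{ "type": "integer" }`
Date	Date and time field.	`{ "type": "date" }`
Boolean	Boolean field.	`{ "type": "boolean" }`
Object	JSON object field.	`{ "type": "object" }`
Nested	Field holding an array of JSON objects.	`{ "type": "nested" }`
Geo-point	Latitude/longitude coordinate field.	`{ "type": "geo_point" }`
IP	IP address field.	`{ "type": "ip" }`
Completion	Autocomplete (suggester) field.	`{ "type": "completion" }`
Runtime Fields	Field whose value is computed at query time; declared under the mapping's `runtime` section rather than through a `type` value.	`{ "runtime": { "<field>": { "type": "keyword" } } }`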

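Field Types in Detail

1. Text

Characteristics: a string type for full-text search. Values are analyzed (tokenized) at index time, so a query can match individual terms instead of the whole string.

Use cases: article bodies, product descriptions, and other free text.

Create the mapping: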

PUT /my_index
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text"
      }
    }
  }
}

Index a document:

POST /my_index/_doc/1
{
  "content": "This is a sample text for Elasticsearch."
}

Search:

GET /my_index/_search
{
  "query": {
    "match": {
      "content": "sample text"
    }
  }
}
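
One way to picture why the match query above finds the document: both the stored text and the query string are analyzed into terms, and match (with its default OR operator) succeeds when at least one query term is among the indexed terms. The plain-Python sketch below mimics that idea. The helper names `analyze` and `match` are made up for illustration, and the regex tokenizer is only a crude approximation of the standard analyzer, not how Elasticsearch implements it.

```python
import re


def analyze(text):
    """Crude stand-in for the standard analyzer: lowercase, then split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def match(field_value, query_text):
    """Approximate a match query with the default OR operator: any shared term is a hit."""
    return bool(set(analyze(field_value)) & set(analyze(query_text)))


doc = "This is a sample text for Elasticsearch."
print(analyze(doc))
print(match(doc, "sample text"))     # both query terms are indexed
print(match(doc, "Sample MISSING"))  # one shared term is enough under OR
print(match(doc, "kibana"))          # no shared term, so no hit
```

Relevance scoring, stemming, and stop words are left out on purpose; the point is only that matching happens term by term.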

2. Keyword

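Characteristics: a string type for exact matching. Values are not analyzed, which makes it a good fit for IDs and tags, and for exact queries, aggregations, and sorting.

Use cases: user IDs, product SKUs, tags.

Create the mapping:

PUT /my_index
{
  "mappings": {
    "properties": {
      "tag": {
        "type": "keyword"
      }
    }
  }
}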

Index a document:

POST /my_index/_doc/1
{
  "tag": "elasticsearch"
}

Search:

GET /my_index/_search
{
  "query": {
    "term": {
      "tag": "elasticsearch"
    }
  }
}

tag is itself a keyword field, so the term query targets it directly.
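
A term query does not analyze the query value, which is why it pairs naturally with keyword and tends to surprise people on text fields. The sketch below contrasts the two cases in plain Python. `term_on_keyword`, `term_on_text`, and the simplified `analyze` tokenizer are illustrative stand-ins, assuming a keyword field without a normalizer and a text field with a lowercase-and-split analyzer.

```python
import re


def analyze(text):
    """Simplified analyzer (an assumption of this sketch): lowercase and split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def term_on_keyword(stored_value, query_value):
    """keyword field: the whole stored string is the single indexed term."""
    return stored_value == query_value


def term_on_text(stored_value, query_value):
    """text field: the unanalyzed query value is compared with the analyzed (lowercased) terms."""
    return query_value in analyze(stored_value)


print(term_on_keyword("elasticsearch", "elasticsearch"))  # exact value matches
print(term_on_keyword("elasticsearch", "Elasticsearch"))  # case differs, no match
print(term_on_text("Quick Brown Fox", "Quick"))           # indexed term is "quick", no match
print(term_on_text("Quick Brown Fox", "quick"))           # matches the lowercased term
```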

3. Numeric

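Characteristics: covers long, integer, short, byte, double, float, and related types, used to store numeric data.

Use cases:

long: large integers, such as user IDs.
float, double: values with a fractional part, such as prices and ratings.

Create the mapping:

PUT /my_index
{
  "mappings": {
    "properties": {
      "age": {
        "type": "integer"
      }
    }
  }
}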

Index a document:

POST /my_index/_doc/1
{
  "age": 25
}

Search:

GET /my_index/_search
{
  "query": {
    "range": {
      "age": {
        "gte": 20,
        "lte": 30
      }
    }
  }
}
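
The integer types differ only in the range they can hold, so one reasonable way to pick among byte, short, integer, and long is to take the smallest one that covers the expected values. The helper below is a hypothetical sketch of that rule (the name `smallest_integer_type` is not part of any Elasticsearch client); the ranges are the usual signed 8/16/32/64-bit limits.

```python
# (type name, minimum, maximum) for the signed integer field types
INTEGER_TYPES = [
    ("byte", -2**7, 2**7 - 1),
    ("short", -2**15, 2**15 - 1),
    ("integer", -2**31, 2**31 - 1),
    ("long", -2**63, 2**63 - 1),
]


def smallest_integer_type(min_value, max_value):
    """Return the narrowest integer field type whose range covers [min_value, max_value]."""
    for name, low, high in INTEGER_TYPES:
        if low <= min_value and max_value <= high:
            return name
    raise ValueError("range does not fit in a 64-bit signed integer")


print(smallest_integer_type(0, 120))            # an age range
print(smallest_integer_type(0, 40_000))         # too large for short
print(smallest_integer_type(0, 3_000_000_000))  # too large for integer
```

The mapping above uses integer for age, which is perfectly valid; a narrower type is an optimization, not a requirement.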

4. Date

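Characteristics: stores dates and times. Several formats are accepted; with the default mapping these are ISO 8601 strings and epoch milliseconds.

Use cases: order dates, log timestamps.

Create the mapping:

PUT /my_index
{
  "mappings": {
    "properties": {
      "timestamp": {
        "type": "date"
      }
    }
  }
}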

Index a document:

POST /my_index/_doc/1
{
  "timestamp": "2023-10-01T12:00:00Z"
}

Search:

GET /my_index/_search
{
  "query": {
    "range": {
      "timestamp": {
        "gte": "2023-10-01T00:00:00Z",
        "lte": "2023-10-01T23:59:59Z"
      }
    }
  }
}
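
Internally a date is kept as a long holding milliseconds since the epoch in UTC, so the range query above boils down to comparing integers. The following sketch converts the sample values with the Python standard library to make that concrete. `to_epoch_millis` is an illustrative helper, and it assumes timestamps that end in "Z".

```python
from datetime import datetime


def to_epoch_millis(iso_string):
    """Parse an ISO 8601 UTC timestamp ending in 'Z' and return epoch milliseconds."""
    parsed = datetime.fromisoformat(iso_string.replace("Z", "+00:00"))
    return int(parsed.timestamp() * 1000)


ts = to_epoch_millis("2023-10-01T12:00:00Z")     # the indexed document
lower = to_epoch_millis("2023-10-01T00:00:00Z")  # gte bound
upper = to_epoch_millis("2023-10-01T23:59:59Z")  # lte bound

print(ts)
print(lower <= ts <= upper)  # the document falls inside the queried day
```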

5. Boolean

Characteristics: stores a boolean value (true or false).

Use cases: flags such as "has paid" or "is active".

Create the mapping:

PUT /my_index
{
  "mappings": {
    "properties": {
      "is_active": {
        "type": "boolean"
      }
    }
  }
}

Index a document:

POST /my_index/_doc/1
{
  "is_active": true
}

Search:

GET /my_index/_search
{
  "query": {
    "term": {
      "is_active": true
    }
  }
}

6. Object

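Characteristics: stores a JSON object.

Flattened storage: the fields of an inner object are "flattened" into the parent document.
No independence: objects in an array do not stay separate; their values are merged per field.
Simple queries: the query syntax is simpler than for the nested type, and queries are cheaper.

Use cases: embedded user information such as a user's address, where the sub-object never needs to be queried on its own.

Create the mapping:

An order mapping that contains the item list plus buyer and seller information:

PUT /object_orders
{
  "mappings": {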
    "properties": {
      "order_id": {
        "type": "keyword"
      },
      "order_date": {
        "type": "date"
      },
      "total_amount": {
        "type": "double"
      },
      "status": {
        "type": "keyword"
      },
      "payment_method": {
        "type": "keyword"
      },
      "items": {
        "type": "object",
        "properties": {
          "product_id": {
            "type": "keyword"
          },
          "product_name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          },
          "quantity": {
            "type": "integer"
          },
          "price": {
            "type": "double"
          },
          "category": {
            "type": "keyword"
          }
        }
      },
      "buyer": {
        "type": "object",
        "properties": {
          "user_id": {
            "type": "keyword"
          },
          "name": {
            "type": "text"
          },
          "email": {
            "type": "keyword"
          },
          "shipping_address": {
            "type": "text"
          }
        }
      },
      "seller": {
        "type": "object",
        "properties": {
          "seller_id": {
            "type": "keyword"
          },
          "name": {
            "type": "text"
          },
          "rating": {
            "type": "float"
          }
        }
      }
    }
  }
}

Index a document:

POST /object_orders/_doc/1
{
  "order_id": "ORD-2023-002",
  "order_date": "2023-10-16T14:45:00Z",
  "total_amount": 89.98,
  "status": "processing",
  "payment_method": "paypal",
  "items": [
    {
      "product_id": "P-1003",
      "product_name": "Bluetooth Speaker",
      "quantity": 1,
      "price": 59.99,
      "category": "electronics"
    },
    {
      "product_id": "P-3001",
      "product_name": "Screen Protector",
      "quantity": 1,
      "price": 29.99,
      "category": "accessories"
    }
  ],
  "buyer": {
    "user_id": "U-10002",
    "name": "Alice Johnson",
    "email": "alice.j@example.com",
    "shipping_address": "456 Oak Ave, Los Angeles, CA 90001"
  },
  "seller": {
    "seller_id": "S-5002",
    "name": "GadgetWorld",
    "rating": 4.5
  }
}

Search:

GET /object_orders/_search
{
  "query": {
    "term": {
      "status": "processing"
    }
  }
}

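Limitations of the Object type

Lost array boundaries: when items is an array of objects, the objects lose the boundaries between them.
Cross-matching: a query cannot require that product_name is Bluetooth Speaker and price is 29.99 on the same item.
No per-object queries: a specific object inside the array cannot be queried on its own.

When to choose the Object type

When the sub-objects do not need to stay independent of each other.
When query performance matters more than exact relationships.
When the data structure is simple and no complex queries are needed.
When arrays in the data usually hold a single object.

Cross-matching example:

One order contains two items:

    {
      "product_name": "Bluetooth Speaker",
      "price": 59.99
    },
    {
      "product_name": "Screen Protector",
      "price": 29.99
    }

A search for product_name = Bluetooth Speaker AND price = 29.99 still returns this order, even though no single item satisfies both conditions. Use the nested type when the match must hold per item.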
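
The cross-matching effect is easier to see once the array is written the way an object field is effectively indexed: one multi-valued list per field, with no record of which values came from which item. The sketch below reproduces that in plain Python. `flatten_object_array` and `object_match` are illustrative helpers, and exact value comparison stands in for real query matching.

```python
def flatten_object_array(items, path="items"):
    """Mimic object-type indexing: collect the values of each field into one list."""
    flat = {}
    for item in items:
        for field, value in item.items():
            flat.setdefault(f"{path}.{field}", []).append(value)
    return flat


def object_match(flat, conditions):
    """Each condition is checked against the merged values, independently of the others."""
    return all(value in flat.get(field, []) for field, value in conditions.items())


items = [
    {"product_name": "Bluetooth Speaker", "price": 59.99},
    {"product_name": "Screen Protector", "price": 29.99},
]

flat = flatten_object_array(items)
print(flat)

# The name comes from the first item and the price from the second, yet the order matches.
print(object_match(flat, {"items.product_name": "Bluetooth Speaker", "items.price": 29.99}))
```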

7. Nested

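Characteristics: stores an array of JSON objects in which each object is indexed independently and can be queried on its own.

Use cases: the item list of an order, a user's list of addresses.

Create the mapping:

An order mapping with the order fields plus item, buyer, and seller information:

PUT /orders
{
  "mappings": {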
    "properties": {
      "order_id": {
        "type": "keyword"
      },
      "order_date": {
        "type": "date"
      },
      "total_amount": {
        "type": "double"
      },
      "status": {
        "type": "keyword"
      },
      "payment_method": {
        "type": "keyword"
      },
      "items": {
        "type": "nested",
        "properties": {
          "product_id": {
            "type": "keyword"
          },
          "product_name": {
            "type": "text",
            "fields": {
              "keyword": {
                "type": "keyword"
              }
            }
          },
          "quantity": {
            "type": "integer"
          },
          "price": {
            "type": "double"
          },
          "category": {
            "type": "keyword"
          }
        }
      },
      "buyer": {
        "type": "nested",
        "properties": {
          "user_id": {
            "type": "keyword"
          },
          "name": {
            "type": "text"
          },
          "email": {
            "type": "keyword"
          },
          "shipping_address": {
            "type": "text"
          }
        }
      },
      "seller": {
        "type": "nested",
        "properties": {
          "seller_id": {
            "type": "keyword"
          },
          "name": {
            "type": "text"
          },
          "rating": {
            "type": "float"
          }
        }
      }
    }
  }
}

Index a document:

POST /orders/_doc/1
{
  "order_id": "ORD-2023-001",
  "order_date": "2023-10-15T10:30:00Z",
  "total_amount": 125.99,
  "status": "completed",
  "payment_method": "credit_card",
  "items": [
    {
      "product_id": "P-1001",
      "product_name": "Wireless Headphones",
      "quantity": 1,
      "price": 99.99,
      "category": "electronics"
    },
    {
      "product_id": "P-2002",
      "product_name": "USB-C Cable",
      "quantity": 2,
      "price": 13.00,
      "category": "accessories"
    }
  ],
  "buyer": {
    "user_id": "U-10001",
    "name": "John Smith",
    "email": "john.smith@example.com",
    "shipping_address": "123 Main St, New York, NY 10001"
  },
  "seller": {
    "seller_id": "S-5001",
    "name": "TechGadgets Inc.",
    "rating": 4.8
  }
}

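Search:

Find orders that contain a specific product:

GET /orders/_search
{
  "query": {
    "nested": {
      "path": "items",
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "items.product_name": "Wireless Headphones"
              }
            },
            {
              "range": {
                "items.price": {
                  "gte": 50
                }
              }
            }
          ]
        }
      }
    }
  }
}

Query keyword: nested

When filtering on nested fields, path must be set correctly: it names the nested field (items here), and fields inside the inner query use the full path (for example items.price).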
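
Conceptually, a nested query evaluates the whole inner query against one array element at a time, and the parent document matches if any single element satisfies it. A minimal Python model of that behavior, using the two items from the sample order, might look like the sketch below. `nested_match` and the two predicate functions are hypothetical names, and substring checks stand in for the real match query.

```python
def nested_match(items, predicate):
    """Mimic a nested query: the whole predicate must hold within a single object."""
    return any(predicate(item) for item in items)


items = [
    {"product_name": "Wireless Headphones", "price": 99.99},
    {"product_name": "USB-C Cable", "price": 13.00},
]


def headphones_from_50(item):
    return "headphones" in item["product_name"].lower() and item["price"] >= 50


def cable_from_50(item):
    return "cable" in item["product_name"].lower() and item["price"] >= 50


print(nested_match(items, headphones_from_50))  # one item satisfies both conditions
print(nested_match(items, cable_from_50))       # no single item satisfies both
```

With an object mapping the second check would wrongly succeed, because "cable" and a price of at least 50 both occur somewhere in the array.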

8. Geo_point

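Characteristics:

Stores a geographic location (latitude and longitude).
Supports geo-distance queries.

Use cases:

User locations, shop addresses.

Create the mapping:

The location of a shop: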

PUT /geo_shops
{
  "mappings": {
    "properties": {
      "shop_id": {
        "type": "keyword"
      },
      "shop_name": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword"
          }
        }
      },
      "location": {
        "type": "geo_point"
      },
      "address": {
        "type": "text"
      },
      "category": {
        "type": "keyword"
      }
    }
  }
}

Index a document:

POST /geo_shops/_doc/1
{
  "shop_id": "S001",
  "shop_name": "Central Coffee",
  "location": {
    "lat": 39.9042,
    "lon": 116.4074
  },
  "address": "1 Wangfujing Street, Beijing",
  "category": "cafe"
}

Search:

Ways to query within a 1 km radius

1. geo_distance query (the most common)

GET /geo_shops/_search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "geo_distance": {
          "distance": "1km",
          "location": {
            "lat": 39.9087,
            "lon": 116.3975
          }
        }
      }
    }
  }
}

2. Distance query with sorting (nearest first)

GET /geo_shops/_search
{
  "query": {
    "bool": {
      "filter": {
        "geo_distance": {
          "distance": "1km",
          "location": {
            "lat": 39.9087,
            "lon": 116.3975
          }
        }
      }
    }
  },
  "sort": [
    {
      "_geo_distance": {
        "location": {
          "lat": 39.9087,
          "lon": 116.3975
        },
        "order": "asc",
        "unit": "km",
        "mode": "min",
        "distance_type": "arc"
      }
    }
  ]
}

3. Query that also returns the distance

GET /geo_shops/_search
{
  "query": {
    "bool": {
      "filter": {
        "geo_distance": {
          "distance": "1km",
          "location": {
            "lat": 39.9087,
            "lon": 116.3975
          }
        }
      }
    }
  },
  "script_fields": {
    "distance_km": {
      "script": {
        "source": "doc['location'].arcDistance(params.lat, params.lon) / 1000",
        "params": {
          "lat": 39.9087,
          "lon": 116.3975
        }
      }
    }
  }
}

4. Combined query (specific category + 1 km radius)

GET /geo_shops/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "category": "cafe"
          }
        }
      ],
      "filter": {
        "geo_distance": {
          "distance": "1km",
          "location": {
            "lat": 39.9087,
            "lon": 116.3975
          }
        }
      }
    }
  }
}

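Advanced parameters

distance_type:
- arc (default): most accurate; computes the distance on a sphere
- plane: faster but slightly less accurate; suited to short distances
mode (for documents with multiple locations):
- min (default when sorting ascending): use the nearest point
- max (default when sorting descending): use the farthest point
- avg: use the average distance
- median: use the median distance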
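
To get a feel for the difference between arc and plane, the two styles of calculation can be reproduced by hand. The sketch below uses the haversine formula for the spherical distance and an equirectangular approximation for the flat one, applied to the sample shop and the query center used above. These are textbook formulas chosen for illustration; Elasticsearch's own implementations may differ in detail, and the Earth radius constant is an assumption of this sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed for this sketch)


def arc_distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance via the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def plane_distance_km(lat1, lon1, lat2, lon2):
    """Flat (equirectangular) approximation: cheaper, and close enough over short distances."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_KM * math.hypot(x, y)


shop = (39.9042, 116.4074)    # Central Coffee from the sample document
center = (39.9087, 116.3975)  # query point used in the searches above

arc = arc_distance_km(*center, *shop)
plane = plane_distance_km(*center, *shop)
print(round(arc, 3), round(plane, 3))
print(arc <= 1.0)  # whether the shop falls inside the 1 km radius
```

At this scale the two results agree to well under a meter, which is consistent with plane being recommended only for short distances.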

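Performance tips

Keep doc_values enabled on geo fields (the default); distance sorting and scripts read from them.
indexed_shape, precision, and tree_levels belong to geo_shape fields and queries rather than geo_point, so they do not apply to the mapping used here.
Put geo conditions in filter context rather than query context: no score is computed and the results can be cached.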

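Field selection advice

Choosing between keyword and numeric

Use-case comparison:

When keyword fits

Exact-match queries: when you need exact matching (term query).

Aggregations: for terms aggregations and similar analysis.

No numeric range queries or arithmetic: identifier-style data such as order numbers and national ID numbers.

Numbers with leading zeros: a postal code like "00123" must keep its format.

When numeric fits

Range queries: greater-than, less-than, and other numeric range conditions.

Arithmetic: sums, averages, and other calculations.

Sorting: ordering by numeric magnitude.

Space efficiency: large values take less space than they would as keyword.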

Performance comparison

Indexing performance

Metric	keyword	numeric
Indexing speed	Slightly slower	Faster
Index size	Larger	Smaller
Memory usage	Higher	Lower

Query performance

Query type	keyword	numeric
Exact match	⭐️⭐️⭐️⭐️⭐️	⭐️⭐️⭐️
Range query	❌ No numeric ranges (strings compare lexicographically)	⭐️⭐️⭐️⭐️⭐️
Aggregations	⭐️⭐️⭐️⭐️	⭐️⭐️⭐️⭐️
Sorting	Lexicographic order	Numeric order

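Examples

Choose keyword when

Identifier-style numbers: user IDs, order numbers, and the like should be keyword even if they consist only of digits.

Fixed-length digit strings: bank card numbers, phone numbers, and other values whose leading zeros must be preserved.

Enumerated codes: status codes, error codes, and similar values.

Choose numeric when

Values that feed calculations: sums, averages, and so on.

Continuously varying values: temperatures, timestamps, and so on.

Measures: prices, quantities, ages, and so on.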
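
Two of the points above, leading zeros and sort order, can be checked with nothing but strings and integers. The snippet below is a small illustration in plain Python; string comparison plays the role of keyword ordering and integer comparison the role of numeric ordering. The sample values are made up.

```python
postal_code = "00123"
as_number = int(postal_code)
print(as_number)                      # the numeric form drops the leading zeros
print(str(as_number) == postal_code)  # so the original text cannot be recovered

order_numbers = ["9", "10", "100", "25"]
lexicographic = sorted(order_numbers)       # how string (keyword-style) sorting orders them
by_value = sorted(order_numbers, key=int)   # how numeric sorting orders them
print(lexicographic)
print(by_value)
```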

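Summary:

A term match on a floating-point field (float, double) can go wrong because of precision, so use keyword for exact matching.
Use a numeric type when the numbers relate to each other (sums, averages, and trends are meaningful); use keyword for everything else.
If you need both exact matching and range matching, define the field as keyword with a numeric sub-field (or the other way round) to cover both kinds of query.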
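
The precision concern is easiest to demonstrate with a 32-bit float, which cannot represent every integer above 2^24. The sketch below round-trips two different values through 32 bits and shows that they collide, then builds one possible multi-field mapping for the "exact match plus range" case. The field name `order_no` and the sub-field name `as_long` are hypothetical, and the long sub-field assumes the values are purely numeric strings.

```python
import json
import struct


def as_float32(value):
    """Round-trip a Python float through 32 bits, roughly what a float-typed field can hold."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Two different identifiers collapse to the same 32-bit float.
print(as_float32(16777217.0) == as_float32(16777216.0))

# Hypothetical mapping: keyword for exact matches, a long sub-field for range queries.
mapping = {
    "mappings": {
        "properties": {
            "order_no": {
                "type": "keyword",
                "fields": {"as_long": {"type": "long"}},
            }
        }
    }
}
print(json.dumps(mapping, indent=2))
```

With such a mapping, term queries would target order_no and range queries order_no.as_long.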

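Choosing between keyword and text

Scenario	Recommended type	Reason	Example
Exact match	`keyword`	Not analyzed; matches the complete value	Order numbers, IDs, tags
Full-text search	`text` with a `keyword` multi-field	Supports both analyzed search and exact matching	Product names, article titles
Multilingual content	`text` with an explicit analyzer	Tokenizes according to the language	ik for Chinese, standard for English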
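Choosing between nested and Object

nested vs. Object: comparison and use cases

Feature	Object (plain object)	Nested
Definition	A standard JSON object	An array of objects in which each element keeps its own field grouping, typically for one-to-many relationships
Storage	Flattened into the fields of the parent document	Each object is a separate hidden document that logically belongs to the parent
Query performance	Suited to simple queries	Suited to queries that correlate fields within one array element, at a higher cost
Update efficiency	Any change reindexes the document	Any change reindexes the parent and all of its nested objects; elements cannot be updated independently
Use cases	Simple one-to-one relationships	One-to-many relationships where each element must stay intact

Operation	Object performance	Nested performance
Read whole document	Fast	Medium
Update whole document	Fast	Slow
Update some fields	Medium	Slow (the parent and all nested objects are reindexed)
Complex queries	Fast, but array elements can cross-match	Slower, but matched per element
Indexing efficiency	High	Lower (one extra hidden document per nested object)

Query type	Object support	Nested support
Simple field query	Excellent	Excellent
Correlated conditions within one array element	Limited	Excellent
Independent query on array elements	Not supported	Supported
Aggregation over array elements	Limited	Excellent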
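
The storage row explains most of the cost difference: every nested object becomes its own hidden document next to the root document. As a back-of-the-envelope check, the hypothetical helper below counts how many underlying documents a single source document would produce, given which top-level fields are mapped as nested. It ignores nested fields inside nested fields and is only meant as an estimate.

```python
def lucene_doc_count(document, nested_paths):
    """Estimate underlying documents: 1 root plus 1 per object under each nested field."""
    count = 1
    for path in nested_paths:
        value = document.get(path)
        if isinstance(value, list):
            count += len(value)
        elif isinstance(value, dict):
            count += 1
    return count


order = {
    "order_id": "ORD-2023-001",
    "items": [{"product_id": "P-1001"}, {"product_id": "P-2002"}],
    "buyer": {"user_id": "U-10001"},
    "seller": {"seller_id": "S-5001"},
}

print(lucene_doc_count(order, ["items", "buyer", "seller"]))  # all three mapped as nested
print(lucene_doc_count(order, ["items"]))                     # only the array is nested
print(lucene_doc_count(order, []))                            # plain object mapping
```

Seen this way, mapping single objects such as buyer and seller as nested adds documents without buying any extra query precision.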

Object optimization: flattening

The object type suits one-to-one models, so it can be flattened:

{
  "order": {
    "order_id": "ORD20231115001",
    "order_date": "2023-11-15T10:30:00Z",
    "seller": {
      "seller_id": "SELLER001",
      "seller_name": "超时代数码旗舰店",
      "contact": {
        "phone": "0755-12345678",
        "email": "seller@chaoshidai.com"
      }
    }
  }
}

After flattening:

{
  "order": {
    "order_id": "ORD20231115001",
    "order_date": "2023-11-15T10:30:00Z",
    "seller_seller_id": "SELLER001",
    "seller_seller_name": "超时代数码旗舰店",
    "seller_contact_phone": "0755-12345678",
    "seller_contact_email": "seller@chaoshidai.com"
  }
}
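
If the flattening is done in application code before indexing, a small recursive helper is usually enough. The sketch below joins nested keys with an underscore and reproduces the field names shown above when applied to the inner order object. The function name `flatten` and the choice of separator are assumptions of this example, and lists are passed through unchanged.

```python
def flatten(obj, prefix="", sep="_"):
    """Recursively join nested keys with `sep`; non-dict values are kept as-is."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


order = {
    "order_id": "ORD20231115001",
    "order_date": "2023-11-15T10:30:00Z",
    "seller": {
        "seller_id": "SELLER001",
        "seller_name": "超时代数码旗舰店",
        "contact": {"phone": "0755-12345678", "email": "seller@chaoshidai.com"},
    },
}

flat_order = flatten(order)
for name, value in flat_order.items():
    print(name, "=", value)
```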

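1. Storage efficiency

Dimension	Nested object (Object)	Flattened structure
Storage space	Usually smaller (the shared part of the field names is written once)	May use roughly 5-15% more (the prefix is repeated in every field name)
Serialization/deserialization	Slower (the nested structure must be parsed)	Roughly 20-30% faster (linear structure)
Compression	Compresses better (nested structures repeat more patterns)	Compression ratio roughly 5-10% lower

2. Write performance

Operation	Nested object	Flattened structure
Full update	Comparable	Comparable
Partial update	May need to rewrite the whole inner object	Individual fields can be sent on their own (roughly 30-50% faster)
Concurrent writes	More prone to write conflicts	Finer-grained updates (single fields rather than the whole inner object)

3. Query performance

Nested structures can be roughly 10-20% slower on complex queries (multi-level paths must be resolved).
Flattened structures index more efficiently.

Summary: flattening uses more storage, but serialization, queries, and updates perform better.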

Recommendation: take a middle path. Data that is not searched and is rarely updated can stay in an object (to save space); data that must be searchable should be flattened.

{
  "order": {
    "order_id": "ORD20231115001",
    "order_date": "2023-11-15T10:30:00Z",
    "seller_seller_id": "SELLER001",       // flattened
    "seller_seller_name": "超时代数码旗舰店",

    "seller": {   // kept as an object
      "contact": {
        "phone": "0755-12345678",
        "email": "seller@chaoshidai.com"
      }
    }
  }
}

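Nested optimization: flattening

If the set of nested entries is fixed and known in advance (for example, a student's scores for Chinese, math, English, politics, history, ...), the objects can be turned into one field per entry as shown here. This does not suit something like university courses, where the set of courses changes.

-- Original data --
{
  "student_name": "zhangsan",
  "score": [
    {
      "type": "chinese",
      "score": 120
    },
    {
      "type": "english",
      "score": 130
    },
    {
      "type": "math",
      "score": 140
    }
  ]
}

Data after flattening:

-- Flattened data --
{
  "student_name": "zhangsan",
  "score_chinese": 120,
  "score_english": 130,
  "score_math": 140
}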
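
The same transformation can be scripted on the application side. The sketch below pivots the score array into one field per subject and rejects subjects outside a known set, which reflects the condition that the set of entries must be fixed. `flatten_scores` and `KNOWN_SUBJECTS` are hypothetical names used only for this example.

```python
KNOWN_SUBJECTS = {"chinese", "english", "math", "politics", "history"}


def flatten_scores(student, known_subjects=KNOWN_SUBJECTS):
    """Turn the nested score array into one score_<subject> field per entry."""
    flat = {"student_name": student["student_name"]}
    for entry in student["score"]:
        subject = entry["type"]
        if subject not in known_subjects:
            # An open-ended set of subjects would keep adding fields to the mapping.
            raise ValueError(f"unexpected subject: {subject}")
        flat[f"score_{subject}"] = entry["score"]
    return flat


student = {
    "student_name": "zhangsan",
    "score": [
        {"type": "chinese", "score": 120},
        {"type": "english", "score": 130},
        {"type": "math", "score": 140},
    ],
}

flat_student = flatten_scores(student)
print(flat_student)
```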
