Advanced Settings and Usage
The Couchbase Elasticsearch plug-in is highly configurable and provides many settings to customize how your documents are indexed.
Configuration for the plug-in is specified as part of the Elasticsearch configuration file (usually elasticsearch.yml) and is currently only read when Elasticsearch starts.
Basic Settings
- couchbase.port: The port the plug-in will listen on. The default port is 9091.
- couchbase.username: The username for HTTP basic authentication. The default username is Administrator.
- couchbase.password: The password for HTTP basic authentication. There is no default password.
- couchbase.num_vbuckets: The number of vBuckets that Elasticsearch should pretend to have. The default is 64 vBuckets on OS X and 1024 vBuckets on all other platforms. This setting must match the number of vBuckets on the source Couchbase cluster.
- couchbase.maxConcurrentRequests: The number of concurrent requests that the plug-in will allow. The default is 1024 requests. If the load on the machine gets too high, you can lower the number of requests allowed.
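As a rough illustration, the basic settings might be written in elasticsearch.yml as shown here. This is only a sketch: the flat dotted-key form mirrors the inline examples used elsewhere on this page, the password is a placeholder, and the remaining values simply restate the documented defaults.

```yaml
# Sketch of the basic plug-in settings in elasticsearch.yml.
# Every value except the password restates a default described above.
couchbase.port: 9091
couchbase.username: Administrator
couchbase.password: change-me          # placeholder; there is no default password
couchbase.num_vbuckets: 1024           # must match the source Couchbase cluster
couchbase.maxConcurrentRequests: 1024  # lower this if the machine is overloaded
```

Because the configuration is only read when Elasticsearch starts, a change to any of these values takes effect after the next restart.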
Advanced Settings
- couchbase.ignoreFailures: Enabling this flag will cause the plug-in to return a success status to Couchbase even if it cannot index some of the documents. This will prevent XDCR replication from being stalled by indexing errors in Elasticsearch, for example when a schema change breaks some of the Elasticsearch type mappings. The default is false.
- couchbase.ignoreDeletes: Specifying one or more index names here will cause the plug-in to ignore document deletion and expiration for those indexes. This can be used to turn Elasticsearch into a sort of searchable archive for a Couchbase bucket. This also means that the index will continue to grow indefinitely.
- couchbase.wrapCounters: Enabling this flag will cause the plug-in to wrap integer values from Couchbase, which are not valid JSON documents, in a simple document before indexing them in Elasticsearch. The resulting document is in the format { "value" : <value> } and is stored under the ID of the original value from Couchbase.
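The advanced flags could be combined along the following lines. Treat this as a sketch under assumptions: the index name beer-sample is borrowed from the sample query result later on this page, and only a single index name is shown because the syntax for listing several names is not described here.

```yaml
# Sketch of the advanced settings; the values are illustrative, not defaults.
couchbase.ignoreFailures: true        # default is false
couchbase.ignoreDeletes: beer-sample  # assumed: one index name; deletions and expirations are ignored for it
couchbase.wrapCounters: true          # index integer values as { "value" : <value> }
```

Enabling ignoreDeletes trades index size for history, so it is worth monitoring disk usage on any index configured this way.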
Mapping Couchbase Documents to Elasticsearch Types
- couchbase.typeSelector: The type selector class to use for mapping documents to types.
  - org.elasticsearch.transport.couchbase.capi.DefaultTypeSelector: Maps all documents to the specified type. As the name implies, this is the default type selector and can be omitted from the configuration file.
    - couchbase.typeSelector.defaultDocumentType: The document type to which the DefaultTypeSelector will map all documents. Defaults to "couchbaseDocument".
    - couchbase.typeSelector.checkpointDocumentType: The document type to which replication checkpoint documents will be mapped. Defaults to "couchbaseCheckpoint".
  - org.elasticsearch.transport.couchbase.capi.DelimiterTypeSelector: If the document ID is of the format <type><delimiter><*>, this type selector will map the document to the type <type>; otherwise it will use the DefaultTypeSelector for the type mapping. The default delimiter is : (a colon), so for example a document with the ID user:123 will be indexed under the type user.
    - couchbase.typeSelector.documentTypeDelimiter: Optional. The delimiter to use for the DelimiterTypeSelector. The default is : (a colon).
  - org.elasticsearch.transport.couchbase.capi.GroupRegexTypeSelector: Maps document IDs that match the specified regular expression to the type captured by the group named type. If the document ID doesn’t match the regular expression, or the regular expression doesn’t define a capture group named type, the DefaultTypeSelector is used instead.
    - couchbase.typeSelector.documentTypesRegex: Specifies the regular expression for mapping Couchbase document IDs to Elasticsearch types. For example, ^(?<type>\w+)::.+$ will map document IDs of the format <type>::<stuff> to the type <type>, so the ID user::123 will be indexed under the type user.
  - org.elasticsearch.transport.couchbase.capi.RegexTypeSelector: Maps document IDs that match the specified regular expressions to the named types. If the ID doesn’t match any of the specified expressions, the DefaultTypeSelector is used to select the type.
    - couchbase.typeSelector.documentTypesRegex.*: Specifies a regular expression with a named type. For example, couchbase.typeSelector.documentTypesRegex.users: ^user-.+$ will map all document IDs that start with the string user- to the type users.
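The three non-default selectors described above might be configured as in the following sketch. Only one couchbase.typeSelector can be active at a time, so the alternatives are commented out. The single quotes around the regular expressions are a YAML precaution so that the backslash and other special characters are passed through literally; the assumption that the two fallback type settings are honored when another selector falls back to the DefaultTypeSelector is noted in the comments.

```yaml
# Option 1 (sketch): DelimiterTypeSelector, so that user:123 is indexed under the type "user".
couchbase.typeSelector: org.elasticsearch.transport.couchbase.capi.DelimiterTypeSelector
couchbase.typeSelector.documentTypeDelimiter: ':'   # optional; ':' is already the default

# Option 2 (sketch): GroupRegexTypeSelector, so that user::123 is indexed under "user".
# couchbase.typeSelector: org.elasticsearch.transport.couchbase.capi.GroupRegexTypeSelector
# couchbase.typeSelector.documentTypesRegex: '^(?<type>\w+)::.+$'

# Option 3 (sketch): RegexTypeSelector, so that IDs starting with "user-" map to "users".
# couchbase.typeSelector: org.elasticsearch.transport.couchbase.capi.RegexTypeSelector
# couchbase.typeSelector.documentTypesRegex.users: '^user-.+$'

# Fallback types (the values shown are the documented defaults).
# Assumption: these also apply when another selector falls back to the DefaultTypeSelector.
couchbase.typeSelector.defaultDocumentType: couchbaseDocument
couchbase.typeSelector.checkpointDocumentType: couchbaseCheckpoint
```

The delimiter selector is the simplest choice when every ID follows one convention. The regex selectors are better suited to buckets that mix several ID formats.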
Mapping Parent-child Relationships
- couchbase.parentSelector: The parent selector class to use for mapping child documents to parents. Note that because of the nature of XDCR, it’s possible that the child document will be replicated before the parent, leading to unpredictable behavior on the Elasticsearch side.
  - org.elasticsearch.transport.couchbase.capi.DefaultParentSelector: Maps documents to parents according to a predefined map of types to field names.
    - couchbase.parentSelector.documentTypeParentFields.*: Specifies which document field contains the ID of the parent document for that particular type. For example, couchbase.parentSelector.documentTypeParentFields.order: doc.user_id will set the parent ID of all documents in the type order to the value of the user_id field.
  - org.elasticsearch.transport.couchbase.capi.RegexParentSelector: Maps documents to parents according to a specified regular expression with the capture group parent. Optionally lets you specify the format for the parent document ID.
    - couchbase.parentSelector.documentTypesParentRegex.*: A named regular expression for matching the parent document ID. For example, couchbase.parentSelector.documentTypesParentRegex.typeA: ^typeA::(?<parent>.+) with the document ID typeA::123 will use 123 as the parent document ID.
    - couchbase.parentSelector.documentTypesParentFormat.*: Specifies an optional format for the parent document ID matched by the regular expression above. Uses <parent> as the placeholder for the matched ID. For example, couchbase.parentSelector.documentTypesParentFormat.typeA: parentType::<parent> with the previous example will produce the parent document ID parentType::123.
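A configuration for either parent selector might look like the sketch below. It assumes that the full couchbase.parentSelector. prefix given in the setting names applies to every key, and it reuses the order and typeA examples from the descriptions above. The quoting is a YAML precaution only.

```yaml
# Option 1 (sketch): DefaultParentSelector reads the parent ID from a document field.
couchbase.parentSelector: org.elasticsearch.transport.couchbase.capi.DefaultParentSelector
couchbase.parentSelector.documentTypeParentFields.order: doc.user_id

# Option 2 (sketch): RegexParentSelector derives the parent ID from the document ID,
# so typeA::123 gets the parent ID parentType::123.
# couchbase.parentSelector: org.elasticsearch.transport.couchbase.capi.RegexParentSelector
# couchbase.parentSelector.documentTypesParentRegex.typeA: '^typeA::(?<parent>.+)'
# couchbase.parentSelector.documentTypesParentFormat.typeA: 'parentType::<parent>'
```

Either way, keep in mind that XDCR does not guarantee that a parent arrives before its children.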
Specifying Custom Document Routing
- couchbase.documentTypeRoutingFields.*: A mapping of types to custom document routing paths. For example, specifying couchbase.documentTypeRoutingFields.users: user_id will use the field user_id as the custom routing path for the type users.
Filtering Documents on the Elasticsearch Side
- couchbase.keyFilter: The document filter class to use for filtering documents on the plug-in side. Couchbase sends all documents through XDCR no matter what, and the document filter simply chooses whether to index or ignore certain documents according to their ID.
  - org.elasticsearch.transport.couchbase.capi.DefaultKeyFilter: The default filter, which lets all documents through. Can be omitted from the configuration file.
  - org.elasticsearch.transport.couchbase.capi.RegexKeyFilter: Filters documents by matching their IDs against one or more regular expressions.
    - couchbase.keyFilter.type: Either include or exclude. Specifies whether the filter will include or exclude documents based on the matched regular expressions. If you choose include, then only documents with IDs that match one of the regular expressions will be indexed. If you choose exclude, then only documents with IDs that do not match any of the regular expressions will be indexed.
    - couchbase.keyFilter.keyFiltersRegex.*: Specifies one or more regular expressions to match against the document IDs before indexing the documents in Elasticsearch. For example, couchbase.keyFilter.type: exclude together with couchbase.keyFilter.keyFiltersRegex.temp: ^temp.*$ will cause the plug-in to ignore any documents whose IDs start with temp.
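Putting the filter settings together, an exclude filter could be sketched as follows. The pattern is written as ^temp.*$ so that it matches any ID beginning with temp. The commented-out include variant uses a hypothetical filter name, users, purely for illustration.

```yaml
# Sketch: index everything except documents whose IDs start with "temp".
couchbase.keyFilter: org.elasticsearch.transport.couchbase.capi.RegexKeyFilter
couchbase.keyFilter.type: exclude
couchbase.keyFilter.keyFiltersRegex.temp: '^temp.*$'

# Inverse (sketch): index only documents whose IDs start with "user-".
# "users" is a hypothetical name for the expression.
# couchbase.keyFilter.type: include
# couchbase.keyFilter.keyFiltersRegex.users: '^user-.+$'
```

Filtering happens after the documents have crossed the network, so it reduces index size but not XDCR traffic.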
Understanding Metadata
As you get more advanced in your usage of the Couchbase Elasticsearch plug-in, it might be helpful for you to understand what is actually sent via the plug-in and how Elasticsearch uses it. When you send a JSON document to Couchbase Server to store, it looks similar to the following:
{
  "name": "Green Monsta Ale",
  "abv": 7.3,
  "ibu": 0,
  "srm": 0,
  "upc": 0,
  "type": "beer",
  "brewery_id": "wachusetts_brewing_company",
  "updated": "2010-07-22 20:00:20",
  "description": "A BIG PALE ALE with an awesome balance of Belgian malts with Fuggles and East Kent Golding hops.",
  "style": "American-Style Strong Pale Ale",
  "category": "North American Ale"
}
Here we have a JSON document with all the information for a beer in our application. When Couchbase stores this document, it adds metadata about the document so that we now have JSON in Couchbase that looks like this:
{
  {
    "id": "wachusetts_brewing_company-green_monsta_ale",
    "rev": "1-00000005ce01e6210000000000000000",
    "expiration": 0,
    "flags": 0,
    "type": "json"
  },
  {
    "name": "Green Monsta Ale",
    "abv": 7.3,
    "ibu": 0,
    "srm": 0,
    "upc": 0,
    "type": "beer",
    "brewery_id": "wachusetts_brewing_company",
    "updated": "2010-07-22 20:00:20",
    "description": "A BIG PALE ALE with an awesome balance of Belgian malts with Fuggles and East Kent Golding hops.",
    "style": "American-Style Strong Pale Ale",
    "category": "North American Ale"
  }
}
The metadata that Couchbase Server stores with our beer document contains the key for the document, an internal revision number, expiration, flags and the type of document. When Couchbase Server replicates data to Elasticsearch via the plug-in, it sends this entire JSON including the metadata. Elasticsearch will then index the document and will store the following JSON with document metadata:
{
  "id": "wachusetts_brewing_company-green_monsta_ale",
  "rev": "1-00000005ce01e6210000000000000000",
  "expiration": 0,
  "flags": 0,
  "type": "json"
}
And finally when you query Elasticsearch and get a result set, it will contain the document metadata only:
{
  "took": 22,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.18642133,
    "hits": [
      {
        "_index": "beer-sample",
        "_type": "couchbaseDocument",
        "_id": "wachusetts_brewing_company-green_monsta_ale",
        "_score": 0.18642133,
        "_source": {
          "meta": {
            "id": "wachusetts_brewing_company-green_monsta_ale",
            "rev": "1-00000005ce01e6210000000000000000",
            "flags": 0,
            "expiration": 0
          }
        }
      }
    ]
  }
}