A newer version of this documentation is available.

View Latest

MapReduce Views Using the Python SDK with Couchbase Server

You can use MapReduce views to create queryable indexes in Couchbase Server.

The normal CRUD methods allow you to look up a document by its ID. A MapReduce (view query) allows you to look up one or more documents based on various criteria. MapReduce views are comprised of a map function that is executed once per document (this is done incrementally, so this is not run each time you query the view) and an optional reduce function that performs aggregation on the results of the map function. The map and reduce functions are stored on the server and written in JavaScript.

MapReduce queries can be further customized during query time to allow only a subset (or range) of the data to be returned.

See the Incremental MapReduce Views and Querying Data with Views sections of the general documentation to learn more about views and their architecture.

The following example is the definition of a by_name view in a "beer" design document. This view checks whether a document is a beer and has a name. If it does, it emits the beer’s name into the index. This view allows beers to be queried for by name. For example, it’s now possible to ask the question "What beers start with A?"

function (doc, meta) {
    if (doc.type && doc.type == "beer" && doc.name) {
        emit(doc.name, null);
    }
}

A Spatial View can instead be queried with a range or bounding box. For example, let’s imagine we have stored landmarks with coordinates for their home city (eg. Paris, Vienna, Berlin and New York) under geo, and each city’s coordinates is represented as two attributes, lon and lat. The following spatial view map function could be used to find landmarks within Europe, as a "by_location" view in a "spatial" design document:

function (doc, meta) {
    if (doc.type && doc.type == "landmark" && doc.geo) {
        emit([doc.geo.lon, doc.geo.lat], null);
    }
}

Querying Views with the Python SDK

Querying a view through the Python client is performed through the query() method on the Bucket class. This method returns a Python iterator that yields the results of the query (in the form of a ViewRow object). The ViewRow object contains the key and value properties (which are the first and second arguments to the view’s emit() function, respectively) as well as the docid property, which may be passed to the get() method to return the actual document.

bkt = Bucket('couchbase://192.168.33.101/beer-sample')
resiter = bkt.query('beer', 'by_name')
for row in resiter:
    print row

You can also manually construct a couchbase.views.params.Query object, which allows you to set various properties on the query:

from couchbase.bucket import Bucket
from couchbase.views.params import Query

bkt = Bucket('couchbase://192.168.33.101/beer-sample')
q = Query()
q.limit = 5 # Limit to 5 results
q.mapkey_range = ('A','A'+Query.STRING_RANGE_END)

rows = bkt.query('beer', 'by_name', query=q)
for row in rows:
    print row

Here’s some sample output for the previous query:

ViewRow(key=u'A. LeCoq Imperial Extra Double Stout 1999', value=None,
docid=u'harvey_son_lewes-a_lecoq_imperial_extra_double_stout_1999', doc=None)
ViewRow(key=u'A. LeCoq Imperial Extra Double Stout 2000', value=None,
docid=u'harvey_son_lewes-a_lecoq_imperial_extra_double_stout_2000', doc=None)
ViewRow(key=u'Aass Brewery', value=None, docid=u'aass_brewery', doc=None)
ViewRow(key=u'Abana Amber Ale', value=None, docid=u'mickey_finn_s_brewery-abana_amber_ale', doc=None)
ViewRow(key=u'Abbaye de Floreffe Double', value=None,
docid=u'brasserie_lefebvre-abbaye_de_floreffe_double', doc=None)

You can also pass the query string directly. This is useful if you’re using the Couchbase Web UI, or if you’re more familiar with the direct Apache CouchDB interface:

rows = bkt.query('beer', 'by_name',
                 query=Query.from_any("startkey=%22A%22&endkey=%22A%5Cuefff%22&limit=5"))

Querying Geospatial Views

To query a geospatial view, you will need to construct a SpatialQuery object (couchbase.views.params.SpatialQuery). Spatial queries accept a start_range and an end_range parameter which allow you to limit the enclosing bounding boxes of the result. The arguments to these parameters are Python lists or tuples, with each element corresponding to a component emitted by the key (the first two components implicitly being the longitude and latitude of the result itself).

On output, spatial queries yield instances of SpatialRow classes. A SpatialRow is similar to a ViewRow, with an added geometry property.

Querying a spatial view
from couchbase.views.params import SpatialQuery

q = SpatialQuery(start_range=[0, -90, None], end_range=[180, 90, None])
for row in bkt.query('geodesign', 'geoview', query=q):
    print "Key:", row.key
    print "Value:", row.value
    print "Geometry", row.geometry

Querying views using the Twisted API

Because the normal couchbase.bucket's query() interfaces uses a blocking interface centered around iterators, the txcouchbase provides two different methods for querying.

The first is the queryAll() method which returns a Deferred object, which has its callback invoked with a list of all rows:

from twisted.internet import reactor
from txcouchbase.bucket import Bucket
from couchbase.exceptions import CouchbaseError

def on_view_rows(rows):
    for row in rows:
        print row

bkt = Bucket('couchbase://192.168.33.101/beer-sample')
d = bkt.queryAll("beer", "brewery_beers", limit=5)
d.addCallback(on_view_rows)
reactor.run()

A more advanced way to retrieve view results is by using a RowHandler. The RowHandler is a class that should be subclassed by the user. The subclass must implement one method that is invoked when new results are available and another method that is invoked when all results have been completed:

from twisted.internet import reactor
from txcouchbase.bucket import Bucket
from couchbase.async.view import AsyncViewBase

class MyRowHandler(AsyncViewBase):
    def on_rows(self, rows):
        print "Got new set of rows"
        for row in rows:
            print "   ROW:", row

    def on_done(self):
        print "All rows received!"

bkt = Bucket('couchbase://192.168.33.101/beer-sample')
bkt.queryEx(MyRowHandler, "beer", "brewery_beers", limit=50)
reactor.run()