MapReduce Views Using the Python SDK with Couchbase Server
You can use MapReduce views to create queryable indexes in Couchbase Server.
The normal CRUD methods allow you to look up a document by its ID. A MapReduce view query allows you to look up one or more documents based on other criteria. A MapReduce view consists of a map function, which is executed once per document (indexing is incremental, so the map function is not rerun each time you query the view), and an optional reduce function, which aggregates the results of the map function. The map and reduce functions are stored on the server and written in JavaScript.
MapReduce queries can be further customized at query time so that only a subset (or range) of the data is returned.
See the Incremental MapReduce Views and Querying Data with Views sections of the general documentation to learn more about views and their architecture.
The following example is the definition of a by_name view in a "beer" design document.
This view checks whether a document is a beer and has a name.
If it does, it emits the beer’s name into the index.
This view allows beers to be queried by name.
For example, it’s now possible to ask the question "What beers start with A?"
function (doc, meta) {
  if (doc.type && doc.type == "beer" && doc.name) {
    emit(doc.name, null);
  }
}
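To make the indexing step concrete, here is a minimal pure-Python sketch of what the view conceptually does: run the map function once per document, sort the emitted keys, and answer "What beers start with A?" with a key range. The sample documents, the helper names (map_by_name, build_index, query_range) and the "\uffff" upper-bound sentinel are assumptions made for illustration, not part of Couchbase. The sketch also orders keys by Python string comparison, whereas the server applies its own collation rules.

```python
import bisect

# Hypothetical sample documents, keyed by document ID (not real bucket data).
docs = {
    "aass_brewery-amber": {"type": "beer", "name": "Aass Amber"},
    "abita-turbodog": {"type": "beer", "name": "Turbodog"},
    "abita_brewing": {"type": "brewery", "name": "Abita Brewing"},
    "unnamed_beer": {"type": "beer"},
    "avery-ipa": {"type": "beer", "name": "Avery IPA"},
}


def map_by_name(doc):
    """Python stand-in for the JavaScript map function: yield (key, value)."""
    if doc.get("type") == "beer" and doc.get("name"):
        yield doc["name"], None


def build_index(documents):
    """Run the map function once per document and sort rows by emitted key."""
    rows = []
    for doc_id, doc in documents.items():
        for key, value in map_by_name(doc):
            rows.append((key, value, doc_id))
    rows.sort(key=lambda row: row[0])
    return rows


def query_range(index, startkey, endkey):
    """Return the rows whose key k satisfies startkey <= k <= endkey."""
    keys = [row[0] for row in index]
    lo = bisect.bisect_left(keys, startkey)
    hi = bisect.bisect_right(keys, endkey)
    return index[lo:hi]


index = build_index(docs)
# "A" followed by a very high code point acts as the end of the "A..." range.
starts_with_a = query_range(index, "A", "A" + "\uffff")
for key, value, doc_id in starts_with_a:
    print(key, doc_id)
```

Only documents that pass the map function's checks appear in the index, so the brewery and the unnamed beer never show up in query results.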
A Spatial View can instead be queried with a range or bounding box.
For example, imagine we have stored landmarks with the coordinates of their home city (e.g. Paris, Vienna, Berlin and New York) under geo, with each city’s coordinates represented as two attributes, lon and lat.
The following spatial view map function could be used to find landmarks within Europe, as a "by_location" view in a "spatial" design document:
function (doc, meta) {
  if (doc.type && doc.type == "landmark" && doc.geo) {
    emit([doc.geo.lon, doc.geo.lat], null);
  }
}
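The bounding-box idea can be sketched in plain Python without a server. The block below mirrors the map function above and then keeps only the keys that fall inside a box. The landmark documents, their approximate coordinates, the rough "Europe" box, and the convention that None means "unbounded" are all assumptions chosen for illustration.

```python
# Hypothetical landmark documents; coordinates are approximate city centers.
landmarks = {
    "eiffel_tower": {"type": "landmark", "geo": {"lon": 2.35, "lat": 48.85}},
    "stephansdom": {"type": "landmark", "geo": {"lon": 16.37, "lat": 48.21}},
    "brandenburg_gate": {"type": "landmark", "geo": {"lon": 13.40, "lat": 52.52}},
    "statue_of_liberty": {"type": "landmark", "geo": {"lon": -74.01, "lat": 40.71}},
    "some_hotel": {"type": "hotel", "geo": {"lon": 2.35, "lat": 48.85}},
}


def map_by_location(doc):
    """Python stand-in for the spatial map function: yield ([lon, lat], None)."""
    if doc.get("type") == "landmark" and doc.get("geo"):
        yield [doc["geo"]["lon"], doc["geo"]["lat"]], None


def in_bounding_box(key, start_range, end_range):
    """True when every key component lies within [start, end].

    A bound of None is treated as unbounded (an assumption of this sketch).
    """
    for component, low, high in zip(key, start_range, end_range):
        if low is not None and component < low:
            return False
        if high is not None and component > high:
            return False
    return True


# A rough box around Europe: longitude -10..40, latitude 35..70.
start_range = [-10.0, 35.0]
end_range = [40.0, 70.0]

matches = sorted(
    doc_id
    for doc_id, doc in landmarks.items()
    for key, _value in map_by_location(doc)
    if in_bounding_box(key, start_range, end_range)
)
print(matches)
```

The New York landmark is excluded by its longitude, and the hotel is never emitted because the map function only accepts landmarks.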
Querying Views with the Python SDK
To query a view with the Python client, use the query() method of the Bucket class.
This method returns a Python iterator that yields the results of the query as ViewRow objects.
Each ViewRow object contains the key and value properties (the first and second arguments to the view’s emit() function, respectively) as well as the docid property, which may be passed to the get() method to retrieve the actual document.
from couchbase.bucket import Bucket
bkt = Bucket('couchbase://192.168.33.101/beer-sample')
resiter = bkt.query('beer', 'by_name')
# Each iteration yields one ViewRow
for row in resiter:
    print(row)
You can also manually construct a couchbase.views.params.Query object, which allows you to set various properties on the query:
from couchbase.bucket import Bucket
from couchbase.views.params import Query
bkt = Bucket('couchbase://192.168.33.101/beer-sample')
q = Query()
q.limit = 5 # Limit to 5 results
q.mapkey_range = ('A', 'A' + Query.STRING_RANGE_END) # Keys that start with "A"
rows = bkt.query('beer', 'by_name', query=q)
for row in rows:
    print(row)
Here’s some sample output for the previous query:
ViewRow(key=u'A. LeCoq Imperial Extra Double Stout 1999', value=None, docid=u'harvey_son_lewes-a_lecoq_imperial_extra_double_stout_1999', doc=None)
ViewRow(key=u'A. LeCoq Imperial Extra Double Stout 2000', value=None, docid=u'harvey_son_lewes-a_lecoq_imperial_extra_double_stout_2000', doc=None)
ViewRow(key=u'Aass Brewery', value=None, docid=u'aass_brewery', doc=None)
ViewRow(key=u'Abana Amber Ale', value=None, docid=u'mickey_finn_s_brewery-abana_amber_ale', doc=None)
ViewRow(key=u'Abbaye de Floreffe Double', value=None, docid=u'brasserie_lefebvre-abbaye_de_floreffe_double', doc=None)
You can also pass the query string directly. This is useful if you’re using the Couchbase Web UI, or if you’re more familiar with the direct Apache CouchDB interface:
rows = bkt.query('beer', 'by_name',
                 query=Query.from_any("startkey=%22A%22&endkey=%22A%5Cuefff%22&limit=5"))
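The raw query string above is easier to read once you see how it can be assembled: each value is written as JSON and then percent-encoded. The following Python 3 sketch rebuilds that exact string using only the standard library. The helper name encode_view_params is hypothetical, and JSON-encoding every value is a simplification that happens to work for these three parameters; it is not a description of how the SDK encodes options internally.

```python
import json
from urllib.parse import quote


def encode_view_params(params):
    """JSON-encode each value, then percent-encode it for use in a URL."""
    parts = []
    for name, value in params:
        encoded = quote(json.dumps(value), safe="")
        parts.append("{}={}".format(name, encoded))
    return "&".join(parts)


# A list of pairs keeps the parameters in a predictable order.
query_string = encode_view_params([
    ("startkey", "A"),
    ("endkey", "A\uefff"),
    ("limit", 5),
])
print(query_string)
```

Here %22 is a double quote and %5C is a backslash, so the endkey decodes to the JSON string "A\uefff", which serves as the upper bound of the "A" range.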
Querying Geospatial Views
To query a geospatial view, construct a SpatialQuery object (couchbase.views.params.SpatialQuery).
Spatial queries accept a start_range and an end_range parameter, which together limit the bounding box that results must fall within.
The arguments to these parameters are Python lists or tuples, with each element corresponding to a component emitted by the key (the first two components implicitly being the longitude and latitude of the result itself).
On output, spatial queries yield SpatialRow instances.
A SpatialRow is similar to a ViewRow, with an added geometry property.
from couchbase.views.params import SpatialQuery
q = SpatialQuery(start_range=[0, -90, None], end_range=[180, 90, None])
for row in bkt.query('geodesign', 'geoview', query=q):
    print("Key: {0}".format(row.key))
    print("Value: {0}".format(row.value))
    print("Geometry: {0}".format(row.geometry))
Querying Views Using the Twisted API
Because the query() interface in the normal couchbase.bucket module is blocking and centered around iterators, the txcouchbase package provides two different methods for querying.
The first is the queryAll() method, which returns a Deferred object whose callback is invoked with a list of all rows:
from twisted.internet import reactor
from txcouchbase.bucket import Bucket
from couchbase.exceptions import CouchbaseError
def on_view_rows(rows):
    # Called once, with the complete list of rows
    for row in rows:
        print(row)
bkt = Bucket('couchbase://192.168.33.101/beer-sample')
d = bkt.queryAll("beer", "brewery_beers", limit=5)
d.addCallback(on_view_rows)
reactor.run()
A more advanced way to retrieve view results is to use a RowHandler.
The RowHandler is a class that you subclass; in the example that follows, the handler subclasses AsyncViewBase.
The subclass must implement one method that is invoked when new results are available and another method that is invoked once all results have been received:
from twisted.internet import reactor
from txcouchbase.bucket import Bucket
from couchbase.async.view import AsyncViewBase
class MyRowHandler(AsyncViewBase):
    def on_rows(self, rows):
        # Invoked each time a new batch of rows arrives
        print("Got new set of rows")
        for row in rows:
            print(" ROW: {0}".format(row))
    def on_done(self):
        # Invoked once, after the final batch
        print("All rows received!")
bkt = Bucket('couchbase://192.168.33.101/beer-sample')
bkt.queryEx(MyRowHandler, "beer", "brewery_beers", limit=50)
reactor.run()
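The callback flow of the handler above can be tried out without Twisted or a server. The sketch below is a pure-Python simulation: RowCollector and deliver_in_batches are hypothetical names, and the batching loop only stands in for whatever the SDK and the reactor do when results arrive. It shows the contract described above, on_rows called zero or more times with partial results, followed by a single on_done.

```python
class RowCollector(object):
    """Hypothetical handler exposing the same two callbacks as the example."""

    def __init__(self):
        self.batches = []
        self.done = False

    def on_rows(self, rows):
        # Called once per batch of newly available rows.
        self.batches.append(list(rows))

    def on_done(self):
        # Called exactly once, after the last batch.
        self.done = True


def deliver_in_batches(handler, rows, batch_size):
    """Stand-in for the event loop: push rows in chunks, then signal the end."""
    for start in range(0, len(rows), batch_size):
        handler.on_rows(rows[start:start + batch_size])
    handler.on_done()


handler = RowCollector()
fake_rows = ["row-%d" % i for i in range(5)]
deliver_in_batches(handler, fake_rows, batch_size=2)
print(handler.batches)
print(handler.done)
```

Processing rows per batch keeps memory use bounded for large result sets, which is the main reason to prefer this style over collecting every row into one list.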