jugad2 - Vasudev Ram on software innovation: bsddb

Thursday, January 23, 2014

Publish Berkeley DB data to PDF with xtopdf

Berkeley DB (sometimes called BDB or BSD DB) is an embedded (*) key-value database with a long history and a huge user base. It is quite fast and supports very large data sizes. Berkeley DB was developed by Sleepycat Software, which was acquired by Oracle in 2006.

(*) "Embedded" here means not client-server: Berkeley DB is a database library that gets linked with your application. It does not mean "embedded" in the sense of software embedded in hardware devices, although Berkeley DB can also be used that way, since it is small in size.

Excerpt from the Wikipedia article about Berkeley DB linked above:

[ Berkeley DB (BDB) is a software library that provides a high-performance embedded database for key/value data. Berkeley DB is written in C with API bindings for C++, C#, PHP, Java, Perl, Python, Ruby, Tcl, Smalltalk, and many other programming languages. BDB stores arbitrary key/data pairs as byte arrays, and supports multiple data items for a single key. Berkeley DB is not a relational database.[1]
BDB can support thousands of simultaneous threads of control or concurrent processes manipulating databases as large as 256 terabytes,[2] on a wide variety of operating systems including most Unix-like and Windows systems, and real-time operating systems. ]

[ Incidentally, Mike Olson, the former CEO of Sleepycat Software, is now the Chief Strategy Officer of Cloudera, which I blogged about here today:

Cloudera's Impala engine - SQL querying of Hadoop data. ]

I've used Berkeley DB off and on since before Sleepycat Software was acquired by Oracle, from at least C, Python and Ruby.

Today I thought of writing a program that enables a user to publish the data in a Berkeley DB database to PDF, using my xtopdf toolkit for PDF creation. Here is the program, BSDDBToPDF.py:

# BSDDBToPDF.py

# Program to convert Berkeley DB (BSD DB) data to PDF.
# Uses Python's bsddb library (deprecated since Python 2.6,
# removed from the standard library in Python 3) and xtopdf.
# Author: Vasudev Ram - http://www.dancingbison.com

import sys
import bsddb
from PDFWriter import PDFWriter

try:
    # Flag 'c' opens the DB read/write and doesn't delete it if it exists.
    fruits_db = bsddb.btopen('fruits.db', 'c')
    fruits = [
            ('apple', 'The apple is a red fruit.'),
            ('banana', 'The banana is a yellow fruit.'),
            ('cherry', 'The cherry is a red fruit.'),
            ('durian', 'The durian is a yellow fruit.')
            ]
    # Add the key/value fruit records to the DB.
    for fruit in fruits:
        fruits_db[fruit[0]] = fruit[1]
    fruits_db.close()

    # Read the key/value fruit records from the DB and write them to PDF.
    with PDFWriter("fruits.pdf") as pw:
        pw.setFont("Courier", 12)
        pw.setHeader("BSDDBToPDF demo: fruits.db to fruits.pdf")
        pw.setFooter("Generated by xtopdf")
        fruits_db = bsddb.btopen('fruits.db', 'c')
        print "FRUITS"
        print
        pw.writeLine("FRUITS")
        pw.writeLine(" ")
        for key in fruits_db.keys():
            print key
            print fruits_db[key]
            print
            pw.writeLine(key)
            pw.writeLine(fruits_db[key])
            pw.writeLine(" ")
        fruits_db.close()

except Exception as e:
    sys.stderr.write("ERROR: Caught exception: " + repr(e) + "\n")
    sys.exit(1)
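
Since bsddb is not in the Python 3 standard library, here is a minimal sketch of the same store-then-read pattern for Python 3. It is not a port of the program above. It assumes the standard library's dbm.dumb module as a stand-in for bsddb, and it only collects the lines that would be passed to PDFWriter.writeLine() instead of writing a PDF. The names store_records and report_lines are made up for this illustration.

```python
import dbm.dumb
import os
import tempfile

FRUITS = [
    ('apple', 'The apple is a red fruit.'),
    ('banana', 'The banana is a yellow fruit.'),
    ('cherry', 'The cherry is a red fruit.'),
    ('durian', 'The durian is a yellow fruit.'),
]


def store_records(db_path, records):
    """Write (key, value) string pairs to a dbm.dumb database."""
    db = dbm.dumb.open(db_path, 'c')
    try:
        for key, value in records:
            # dbm databases hold bytes, so encode the text explicitly.
            db[key.encode('utf-8')] = value.encode('utf-8')
    finally:
        db.close()


def report_lines(db_path, title):
    """Read all records back and return the lines a PDF writer would get."""
    lines = [title, " "]
    db = dbm.dumb.open(db_path, 'r')
    try:
        # Sort explicitly: dbm.dumb does not promise any key order.
        for key in sorted(db.keys()):
            lines.append(key.decode('utf-8'))
            lines.append(db[key].decode('utf-8'))
            lines.append(" ")
    finally:
        db.close()
    return lines


if __name__ == '__main__':
    # Hypothetical location: a fresh temporary directory.
    db_path = os.path.join(tempfile.mkdtemp(), 'fruits')
    store_records(db_path, FRUITS)
    for line in report_lines(db_path, "FRUITS"):
        print(line)
```

To produce a PDF from this, each returned line could be passed to pw.writeLine() inside the same kind of PDFWriter block used in the program above.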

And here is a screenshot of the PDF output of the program:

- Vasudev Ram - Dancing Bison Enterprises


Friday, April 19, 2013

sqlite3dbm, an SQLite-backed dbm module

By Vasudev Ram

Saw this today. It appears to be hosted on Yelp's GitHub account.

They created it as a tool to help with Hadoop work on Amazon EMR (Elastic Map Reduce).

sqlite3dbm provides a SQLite-backed dictionary conforming to the dbm interface, along with a shelve class that wraps the dict and provides serialization for it.

I tried it out, and it worked as advertised.

How to use sqlite3dbm:

Import the module and call its open() function to create an SQLite database. This returns a handle to it (let's call it "db"), on which you can use Python dict syntax to store data.

Then, in the same program or in another one later, you can fetch and/or modify that data again, using dict syntax.
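
To make that workflow concrete, here is a rough sketch of the general idea of a dict interface over SQLite. It is not sqlite3dbm's code or API. It uses only the standard library's sqlite3 module, and the class name SqliteDict, the table name kv and the file name demo.sqlite3 are all made up for this illustration.

```python
import os
import sqlite3
import tempfile


class SqliteDict:
    """Hypothetical minimal dict-like store backed by one SQLite table."""

    def __init__(self, path):
        self._conn = sqlite3.connect(path)
        # One table with the key as primary key is enough for a key-value store.
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")
        self._conn.commit()

    def __setitem__(self, key, value):
        # INSERT OR REPLACE gives dict semantics: assigning overwrites.
        self._conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
            (key, value))
        self._conn.commit()

    def __getitem__(self, key):
        row = self._conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __contains__(self, key):
        row = self._conn.execute(
            "SELECT 1 FROM kv WHERE key = ?", (key,)).fetchone()
        return row is not None

    def keys(self):
        cursor = self._conn.execute("SELECT key FROM kv ORDER BY key")
        return [row[0] for row in cursor]

    def close(self):
        self._conn.close()


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'demo.sqlite3')

    # First "program": create the store and save some data.
    db = SqliteDict(path)
    db['apple'] = 'red'
    db.close()

    # Later "program": reopen the same file, then read and modify the data.
    db = SqliteDict(path)
    print(db['apple'])
    db['apple'] = 'green'
    print(db['apple'])
    db.close()
```

A real module would also need deletion, iteration and a way to store non-string values (sqlite3dbm offers a shelve class for serialization), all of which this sketch leaves out.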

Interesting idea. dbm modules, which implement key-value stores, are less powerful than relational databases (SQL), and were probably developed earlier (think ISAM, etc.), so it looks a bit backwards to implement a dbm-type store on top of SQLite. But the sqlite3dbm project page gives at least some justification for that:

[ This module was born to provide random-access extra data for Hadoop jobs on Amazon’s Elastic Map Reduce (EMR) cluster. We used to use bsddb for this because of its dead-simple dict interface. Unfortunately, bsddb is deprecated for removal from the standard library and also has inter-version compatibility problems that make it not work on EMR. sqlite3 is the obvious alternative for a persistent store, but its powerful SQL interface can be too complex when you just want a dict. Thus, sqlite3dbm was born to provide a simple dictionary API on top of the ubiquitous and easily available sqlite3.

This module requires no setup or configuration once installed. Its goal is a stupid-simple solution whenever a persistent dictionary is desired. ]

- Vasudev Ram - Dancing Bison Enterprises
