Friday, July 23, 2010

Thesis abstract

The storage and retrieval of chemical graphical datatypes such as structures and reactions in relational database systems is a common technique used in academia and industry alike. Due to the computationally intensive algorithms used for (sub)graph isomorphism detection, such systems commonly use faster screening mechanisms to reduce the set of potential matches before applying the aforementioned algorithms.

Widely used screening mechanisms are based on numerical and binary vectors, called fingerprints, with a clear dominance of binary fingerprints due to the raw speed advantage of bitwise operations and compactness in storage. The two most commonly used types of binary fingerprints are path-generated and substructure-generated, both of which have specific shortcomings, especially blind spots.
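
To illustrate the screening idea with binary fingerprints, here is a minimal Python sketch. It is not the Pgchem::Tigress implementation; the function names and the toy 8-bit fingerprints are made up for illustration. The usual screen for substructure search keeps a candidate only if every bit set in the query fingerprint is also set in the candidate fingerprint, which takes one bitwise AND and one comparison.

```python
def passes_screen(query_fp: int, candidate_fp: int) -> bool:
    """Return True if every bit set in the query is also set in the candidate."""
    return (query_fp & candidate_fp) == query_fp


def screen(query_fp: int, candidates: dict) -> list:
    """Keep only the candidates that may contain the query as a substructure."""
    return [name for name, fp in candidates.items() if passes_screen(query_fp, fp)]


# Toy 8-bit fingerprints, invented for illustration only.
candidates = {
    "mol_a": 0b10110110,
    "mol_b": 0b00100100,
    "mol_c": 0b11111111,
}
query = 0b00110100

survivors = screen(query, candidates)
print(survivors)  # ['mol_a', 'mol_c']
```

Only the survivors would then be handed to the expensive (sub)graph isomorphism check. A screen of this kind can let false positives through, but it must never reject a true match.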

To overcome these shortcomings, the Pgchem::Tigress chemistry extension to the PostgreSQL object-relational database management system uses a hybrid binary fingerprint, consisting of an invariant path-generated part and a substructure-generated part which is externally configurable through a dictionary of substructure patterns.

This thesis presents a novel approach of using dynamic discrete optimization to find an optimized dictionary configuration for the substructure-generated part of the fingerprint for arbitrary sets of structural data.

By applying the method developed in this thesis, the computational power necessary to run a chemical information system can be reduced by 42 percent on average. Improving the query throughput makes it possible to avoid upgrading the server hardware to the next level of computational power, so the saved operating costs are realized as opportunity revenues.

Update: It is now readable online.

Monday, July 19, 2010

Thesis defence

The date of my defence is 21.07.2010, 10:00-11:15 UTC+2.

If somebody's willing to cross their fingers for me, it'll be appreciated. :-)

Friday, July 9, 2010

ChemSpider Web API, anyone?

I'm trying to do a structure search using the published ChemSpider Web API: http://www.chemspider.com/WebAPI.aspx

I thought it would be sufficient to take the form they provide, put a molfile into the designated area, add a submit button, and it would work.

But all I get back are blank pages with no content. No error messages, nothing useful.

How is the darn thing meant to be used?

UPDATE: Adding a method="post" to the form did the trick, but apparently only substructure searches with an unlimited result set are supported. This results in very poor performance with structures of low selectivity. Still not good.

UPDATE 2:

  1. I had an extra line in my molfile.
  2. There is really a bug in the API. Substructure search triggers exact search and vice versa. Since the API is heavily in use with workarounds to this bug, fixing this would break many applications.


Monday, July 5, 2010

The curious case of the infinite canvas

After some fruitless attempts to extend the usable canvas of DCE ChemPad with some kind of self-made view port, in order to allow drawing of larger-than-screen structures, I think I have now found a less painful way.

Simply wrapping the DCE Control in a HorizontalScrollView was not enough, because the ScrollView intercepts too many MotionEvents, but after subclassing HorizontalScrollView and overriding onInterceptTouchEvent(), it seems to work as expected. The code needs some more polishing, though, before I can call it working.

As you can see in the screenshot, the structure can now be scrolled left and right as needed to draw more than would fit on the screen. The thin line at the bottom separates the drawing area from the scroll-sensitive area, because otherwise the whole screen would be scroll sensitive, and drawing while scrolling is not very precise. The screenshot is from a NexusOne with Android 2.2.

Wednesday, June 2, 2010

DCE ChemPad Update 1.1

A few bugs have been fixed (sorry, no autolayout yet :-)), all controls have been moved into the menu, and the title bar has been removed to save screen real estate. The vibration feedback for fusing atoms now works as expected, and you can send the molfile via the phone's messaging systems. Since attachment handling seems to be not fully implemented on Android, the molfile is embedded as text in the message itself. A localized help function has been added to the application.

Wednesday, May 26, 2010

DCE ChemPad 1.0 is out in the wild

DCE ChemPad is released for free on the Android Market!

It shows the capabilities of the DCE Chemistry Editor Control to add a chemical editor to arbitrary Android applications.

It was tested on the HTC Nexus One, ACER Liquid S 100, Motorola Milestone/Droid and the Emulator and should work with all Android versions >= 1.5. It reportedly does not work on the Motorola Cliq.

If you have access to the Market, please try it and tell me what you think...

Ah, screenshots:

Thursday, April 22, 2010

Solving the 'big finger vs. tiny screen' problem

  • Learn how to make custom View controls - check
  • Learn to make compound controls - check
  • Learn how to paint on the canvas - check
  • Design an effective 2D rendering pipeline for undirected graphs - check
  • Remember basic planar trigonometry - check
  • Design a fuzzy lock-on selection method for the touchscreen - check
  • Design a robust backing model - check
  • Design an effective UI for the touchscreen - check
  • Implement all the nasty details - check

Wednesday, April 14, 2010

NexusOne: Big finger vs. tiny screen

I now have a NexusOne at hand.

And while it is much faster than the Emulator, a very profane problem has come up. It is impossible to precisely draw a chemical structure with an editor designed for the mouse! It is just too sensitive to use with a finger on a tiny screen. And a pen won't work with a capacitive touchscreen...

Maybe I have to take an intensive look at the android.graphics package.

Thursday, April 1, 2010

JavaScript molecule editor roundup

Next to JsDraw, which had its 1.0 release recently, I've found two more pure JavaScript molecule editors:

WebCME, which has a lot of features, notably a large library of molecules, but is quite painful to use. This is because its developers have chosen a system of 'select two atoms, add bond, deselect them, select another two atoms, add bond, oh wrong one, delete bond, add correct bond...' for drawing.

The ChemDoodle web components on the other hand have a totally minimalistic, yet powerful UI. Atoms are drawn by mouseclick, bonds drawn by mouse drag. Atom types can be changed via keyboard, bond types by mouseclick and delete is done by the Backspace or Delete key.

Unfortunately, none of the three works on mobile browsers. Either they don't work at all or they work only partially.

In contrast, the jsMolEditor does work even in Android WebViews, but no longer seems to be under development. I suspect that it was abandoned in favour of JsDraw.

What a pity. Having a working JavaScript molecule editor, even a simple one, that runs in smartphone WebViews would open a whole new world of applications for those devices. They now have the computational power to handle chemical data, but who wants to enter SMILES strings by hand...

pgchem::tigress 1.2 is out

http://pgfoundry.org/projects/pgchem/

Built and tested against PostgreSQL 8.4.2 with OpenBabel 2.2.3 on XP 32 bit, Windows 7 64 bit and Ubuntu 8.04 LTS 32 bit.

MACCS166 binary fingerprints.

Dice and Tversky similarity.

Small bug fixes.
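
For readers unfamiliar with the two similarity measures named above, here is a small Python sketch of their textbook definitions on binary fingerprints. It is not taken from the pgchem::tigress sources; the function names and the toy fingerprints are assumptions for illustration. With a and b the number of bits set in each fingerprint and c the number of bits set in both, Dice is 2c/(a+b) and Tversky is c/(alpha*(a-c) + beta*(b-c) + c).

```python
def popcount(x: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def dice(fp1: int, fp2: int) -> float:
    """Dice similarity: 2c / (a + b)."""
    a, b, c = popcount(fp1), popcount(fp2), popcount(fp1 & fp2)
    return 2.0 * c / (a + b) if (a + b) else 0.0


def tversky(fp1: int, fp2: int, alpha: float, beta: float) -> float:
    """Tversky similarity: c / (alpha*(a-c) + beta*(b-c) + c)."""
    a, b, c = popcount(fp1), popcount(fp2), popcount(fp1 & fp2)
    denominator = alpha * (a - c) + beta * (b - c) + c
    return c / denominator if denominator else 0.0


# Toy fingerprints, invented for illustration only.
fp1, fp2 = 0b1110, 0b0111

dice_value = dice(fp1, fp2)                   # 2*2 / (3+3)
tanimoto_value = tversky(fp1, fp2, 1.0, 1.0)  # alpha = beta = 1 gives Tanimoto
dice_via_tversky = tversky(fp1, fp2, 0.5, 0.5)  # alpha = beta = 0.5 gives Dice
```

Returning 0.0 for two empty fingerprints is a convention chosen for this sketch, not something the release notes specify.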

Tuesday, March 16, 2010

How to run mx on Android

mx runs on Android!

To make it work, you need to build from sources and remove all references to javax.swing (which doesn't seem to break the rest of the code btw.), since Android does not contain AWT or Swing.

Then repackage the jar and it can be used in Android applications.
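
To find the places that need touching before the rebuild, a small helper script can list every source line that mentions the desktop UI packages. The sketch below is a hypothetical helper, not part of mx; it assumes the sources are plain .java files under one directory, and it also looks for java.awt in addition to the javax.swing references mentioned above. The demo at the bottom runs against two throwaway files created in a temporary directory.

```python
import re
import tempfile
from pathlib import Path

# Packages that are not available on Android (assumption: matching on the
# package prefix is enough to find imports and fully qualified uses).
FORBIDDEN = re.compile(r"\b(javax\.swing|java\.awt)\b")


def find_desktop_ui_references(source_root):
    """Return (file, line number, line) for every line mentioning Swing or AWT."""
    hits = []
    for path in sorted(Path(source_root).rglob("*.java")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if FORBIDDEN.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits


# Demo on two throwaway files; real use would point at the mx source tree.
with tempfile.TemporaryDirectory() as root:
    (Path(root) / "Viewer.java").write_text(
        "import javax.swing.JPanel;\npublic class Viewer {}\n", encoding="utf-8")
    (Path(root) / "Molecule.java").write_text(
        "public class Molecule {}\n", encoding="utf-8")
    demo_hits = find_desktop_ui_references(root)

for path, lineno, line in demo_hits:
    print(f"{path}:{lineno}: {line}")
```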

Chemistry on the smartphone, yay. :-)

Wednesday, March 10, 2010

Changing the public API between minor versions of PostgreSQL

I hate it when they do this.

Between 8.3 and 8.4, the API for CREATE OPERATOR CLASS has changed. The RECHECK flag is now obsolete; instead, the index decides dynamically whether a match is lossy or not.

While this in itself is an improvement, it creates an incompatibility between GiST C code and scripts written for 8.3 and those written for 8.4. Fortunately, the fix seems to be an easy one...

Wednesday, March 3, 2010

Wrapping native libraries with JNA: Dingo

Even with the advent of pure Java chemoinformatics toolkits like MX or the CDK, there is a lot of interesting native code floating around on the net. Unfortunately, wrapping native code with JNI is no real fun.

JNA comes to the rescue. It does all the necessary loading and marshalling stuff dynamically in the background for you. All you need is a declaration of the native interface, the rest is magic.

Here's an incomplete but working example for Dingo 1.0:

package your_package_here;

import com.sun.jna.Native;

public class NativeDingoWrapper {

  // Bind the native methods declared below to the "dingo" shared library.
  static {
    Native.register("dingo");
  }

  public static native int dingoSetOutputFormat(String anOutputFormat);
  public static native int dingoSetColoring(int aColoringFlag);
  public static native int dingoSetHighlightColorEnabled(int aHighlightFlag);
  public static native int dingoSetHighlightThicknessEnabled(int aHighlightThicknessFlag);
  public static native int dingoSetStereoOldStyle(int aStereoFlag);
  public static native int dingoSetImageSize(int aWidth, int aHeight);
  public static native int dingoLoadMolFromString(String aMol);
  public static native int dingoLoadMolFromFile(String aFile);
  public static native int dingoSetOutputFile(String anOutputFile);
  public static native int dingoRender();
  public static native int dingoMoleculeIsEmpty();
}


And that's it.

The only drawback of JNA is that it needs a glue DLL specific to the operating system, so theoretically it is more platform limited than JNI.

But since "JNA has been built and tested on OSX (ppc, x86, x86_64), linux (x86, amd64), FreeBSD/OpenBSD (x86, amd64), Solaris (x86, amd64, sparc, sparcv9) and Windows (x86, amd64). It has also been built for windows/mobile and Linux/ppc64, although those platforms are not included in the distribution." this is a quite limited limitation for most cases.

I have successfully wrapped Dingo and Barsoi with JNA and up to now it just works as advertised.

Monday, February 15, 2010

Chemoinformatics in the browser: Fingerprint similarity calculation

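Well, there are other things that can be done in JavaScript beyond substructure search. For example, Tanimoto binary fingerprint similarity calculation needs just a few short functions:

// Count the set bits of one byte (0..255) with a packed lookup table:
// 0xE994 stores the bit counts of the values 0..7 in two bits each.
function popcount8(b) {
    var c, bi3b = 0xE994;
    c  = 3 & (bi3b >> ((b << 1) & 14));  // bits 0-2
    c += 3 & (bi3b >> ((b >> 2) & 14));  // bits 3-5
    c += 3 & (bi3b >> ((b >> 5) & 6));   // bits 6-7
    return c;
}

// Count the set bits of a signed 32-bit block, one byte at a time.
function popcount(b) {
    var c = 0;
    for (var shift = 0; shift < 32; shift += 8) {
        c += popcount8((b >>> shift) & 0xFF);
    }
    return c;
}

function tanimoto(fp1, fp2) {
    var a = 0;  // bits set in fp1
    var b = 0;  // bits set in fp2
    var c = 0;  // bits set in both

    for (var i = fp1.length - 1; i >= 0; i--) {
        var block_fp1 = fp1[i];
        var block_fp2 = fp2[i];
        a += popcount(block_fp1);
        b += popcount(block_fp2);
        c += popcount(block_fp1 & block_fp2);
    }
    return c / (a + b - c);
}

The fingerprints have to be converted into JavaScript arrays of equal length containing signed 32-bit integers:

onclick="alert(tanimoto([1, -1073741825], [3, 2147483647]));"

Here a = 32, b = 33 and c = 31, so the result is 31/34, or about 0.91.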

Friday, February 12, 2010

Chemoinformatics in the browser: Firefox catches up

>That's a big difference. Which version of Firefox? If 3.5, have you tried 3.6?

Yes, today. Chrome 4 is not faster than Chrome 3 but Firefox 3.6 now allows jobs of about 50 structures.

Browser          Max. job size
Chrome 3.x       100
Chrome 4.x       100
Firefox 3.5.x    25
Firefox 3.6.x    50
IE 6.x           5
IE 8.x           10

Those batch sizes allow for script execution times of about 1 second. The idea is that a job of this size does not interfere with other scripts on a page if it is running embedded, e.g. in an invisible iframe.

If the page is dedicated, much larger jobs might be possible, up to the point where the browser shows the 'A script is not responding' error message.

Update: IE 8.x is twice as fast as IE 6.x, but still slow compared to the competitors.

Thursday, February 11, 2010

Chemoinformatics in the browser: Chrome finishes first

While developing my little demo in the previous article, I found that different browsers could handle different job sizes depending on how fast their JavaScript engines are.

Chrome 3.x finishes first, ahead of Firefox 3.x, and ye olde IE 6.x is almost unusably slow for substructure searching with JavaScript.

The possible job sizes are:

Browser          Max. job size
Chrome 3.x       100
Firefox 3.x      25
IE 6.x           5

Thus, the server sizes jobs according to the user-agent header sent:
# Pick timeout and maximum job size from the User-Agent header.
if 'Firefox/3' in uatype:
    timeout = 500
    maxsize = 25
elif 'Chrome/3' in uatype:
    timeout = 200
    maxsize = 100
elif 'MSIE' in uatype:
    timeout = 1000
    maxsize = 5
else:
    # Unknown browser.
    return
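
The fragment above lives inside the request handler. As a self-contained sketch of the same dispatch, the limits can be kept in a small table and looked up by substring. The function name job_limits and the table layout are made up for this example; the numbers are the ones from the fragment, and the unit of the timeout is whatever the server uses.

```python
# (User-Agent marker, timeout, maximum job size), checked in this order.
JOB_LIMITS = [
    ("Firefox/3", 500, 25),
    ("Chrome/3", 200, 100),
    ("MSIE", 1000, 5),
]


def job_limits(user_agent):
    """Return (timeout, maxsize) for a known browser, or None otherwise."""
    for marker, timeout, maxsize in JOB_LIMITS:
        if marker in user_agent:
            return timeout, maxsize
    return None


chrome_ua = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) "
             "AppleWebKit/532.0 (KHTML, like Gecko) Chrome/3.0.195.38 Safari/532.0")
print(job_limits(chrome_ua))        # (200, 100)
print(job_limits("SomeOtherBot"))   # None
```

Adding a new browser then only means adding a row to the table.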

While I knew that Chrome's JavaScript engine is fast, I didn't expect it to be that dominant.

Monday, February 8, 2010

Browsers of the world: Map! Reduce! Map! Reduce!

This article about the idea of collaborative map/reduce in the browser and this one on Depth-First gave me the idea to try something other than distributed word counting: distributed substructure matching.

The server was quickly written in Python. The backend in this case is PostgreSQL, with a table holding the structures as V2000 molfiles in plain text format. No magic so far.

Here's the code of the server.

More interesting might be how the substructure matching itself is done in 100% JavaScript. This is doable now thanks to JSDraw, a pure JavaScript structure editor, which on closer inspection has some more interesting tricks up its sleeve, notably a substructure matching capability.

The server schedules a job of maxsize random molecules from the database and constructs a page containing those molecules as molfiles. After the page has completely loaded in the browser, the matching is done and the page is posted back to the server which parses the result. Once manually started by opening http://:8080/get, the pages keep reloading automatically by means of a meta http-equiv="refresh" in the result page.
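
The round trip described above can be sketched in a few lines of Python. This is not the code of the actual server: the function names, the in-memory dictionary standing in for the database table, and the plain text test standing in for the JSDraw matching in the browser are all assumptions made for illustration.

```python
import random


def schedule_job(structures, maxsize, rng=random):
    """Map step: pick up to maxsize random structures (id -> molfile)."""
    ids = rng.sample(sorted(structures), min(maxsize, len(structures)))
    return {sid: structures[sid] for sid in ids}


def reduce_results(hits, posted):
    """Reduce step: merge one posted result (id -> matched?) into the hit set."""
    hits.update(sid for sid, matched in posted.items() if matched)
    return hits


# Placeholder "molfiles"; a real table would hold V2000 molfile text.
structures = {1: "toy-molfile C6H6", 2: "toy-molfile CH4", 3: "toy-molfile C6H5OH"}

job = schedule_job(structures, 2, random.Random(0))

# Stand-in for the browser: the real matching is done by JavaScript on the page.
posted = {sid: "C6H" in molfile for sid, molfile in job.items()}

hits = reduce_results(set(), posted)
print(sorted(hits))
```

In the real setup the job would be rendered into a page, and the posted dictionary would be parsed from the form data the browser sends back.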

Of course, the server is very basic. It notably lacks keeping track of the results and housekeeping to restart broken jobs and uses a hardcoded substructure as search argument.

But it can be done.