The storage and retrieval of chemical graphical datatypes such as structures and
reactions in relational database systems is a common technique used in academia and
industry alike. Due to the computationally intensive algorithms used for (sub)graph isomorphism
detection, such systems commonly use faster screening mechanisms in
order to reduce the set of potential matches before applying the aforementioned
algorithms.
Widely used screening mechanisms are based on numerical and binary vectors, called
fingerprints, with a clear dominance of binary fingerprints due to the raw speed
advantage of bitwise operations and compactness in storage. The two most commonly
used types of binary fingerprints are path-generated and substructure-generated, both
of which have specific shortcomings, especially blind spots.
To overcome these shortcomings, the Pgchem::Tigress chemistry extension to the
PostgreSQL object-relational database management system uses a hybrid binary
fingerprint, consisting of an invariant path-generated part and a substructure-generated
part which is externally configurable through a dictionary of substructure patterns.
This thesis presents a novel approach of using dynamic discrete optimization to find an
optimized dictionary configuration for the substructure-generated part of the fingerprint
for arbitrary sets of structural data.
By means of applying the method developed in this thesis, the computational power
necessary to run a chemical information system can be reduced by 42 percent on
average. By improving the query throughput, upgrading the server hardware to the
next level of computational power can be avoided and thus opportunity revenues of
the operating costs are realized.
Update: It is now readable online.
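The screening step described in the abstract can be illustrated with a minimal Python sketch. It assumes that fingerprints are held as plain integers used as bitsets. The function names `passes_screen` and `screen` and the toy bit patterns are hypothetical and not taken from Pgchem::Tigress.

```python
def passes_screen(query_fp, target_fp):
    """Return True if every bit set in the query is also set in the target."""
    return (query_fp & target_fp) == query_fp


def screen(query_fp, targets):
    """Keep only the targets that pass the bitwise screen for the query."""
    return [name for name, fp in targets.items() if passes_screen(query_fp, fp)]


# Toy fingerprints (hypothetical bit patterns, not real chemistry).
targets = {
    "mol_a": 0b101101,
    "mol_b": 0b001001,
    "mol_c": 0b111111,
}
candidates = screen(0b100101, targets)
print(candidates)
```

Only the surviving candidates would then be passed on to the expensive (sub)graph isomorphism test; the screen itself is a single AND and a comparison per structure.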
Friday, July 23, 2010
Monday, July 19, 2010
Thesis defence
The date of my defence is 21.07.2010, 10:00-11:15 UTC+2.
If somebody's willing to cross their fingers for me, it'll be appreciated. :-)
Friday, July 9, 2010
ChemSpider Web API, anyone?
I'm trying to do a structure search using the published ChemSpider Web API: http://www.chemspider.com/WebAPI.aspx
I thought it would be sufficient to take the form they provide, put a molfile into the designated area, add a submit button, and it would work.
Monday, July 5, 2010
The curious case of the infinite canvas
After some fruitless attempts to extend the usable canvas of DCE ChemPad with some kind of self-made view port, in order to allow drawing of larger-than-screen structures, I think I have now found a less painful way.
Simply wrapping the DCE Control in a HorizontalScrollView was not enough, because the ScrollView intercepts too many MotionEvents. After subclassing HorizontalScrollView and overriding onInterceptTouchEvent(), however, it seems to work as expected. The code needs some more polishing before I can call it working, though.
As you can see on the screenshot, the structure now can be scrolled left and right as needed to draw more than would fit on the screen. The thin line on the bottom separates the drawing from the scrolling sensitive area, because otherwise the whole screen would be scroll sensitive and drawing while scrolling is not very precise. The screenshot is from a NexusOne with Android 2.2.
Wednesday, June 2, 2010
DCE ChemPad Update 1.1
A few bugs have been fixed (sorry, no autolayout yet :-)), all controls have been moved into the menu, and the title bar has been removed to save screen real estate. The vibration feedback for fusing atoms now works as expected and you can send the molfile via the phone's messaging systems. Since it seems that attachment handling is not fully implemented on Android, the molfile is embedded as text in the message itself. A localized help function was added to the application.
Wednesday, May 26, 2010
DCE ChemPad 1.0 is out in the wild
DCE ChemPad is released for free on the Android Market!
It shows the capabilities of the DCE Chemistry Editor Control to add a chemical editor to arbitrary Android applications.
It was tested on the HTC Nexus One, ACER Liquid S 100, Motorola Milestone/Droid and the Emulator and should work with all Android versions >= 1.5. It reportedly does not work on the Motorola Cliq.
If you have access to the Market, please try it and tell me what you think...
Ah, screenshots:
Friday, May 7, 2010
Thursday, April 22, 2010
Solving the 'big finger vs. tiny screen' problem
- Learn how to make custom View controls - check
- Learn to make compound controls - check
- Learn how to paint on the canvas - check
- Design an effective 2D rendering pipeline for undirected graphs - check
- Remember basic planar trigonometry - check
- Design a fuzzy lock-on selection method for the touchscreen - check
- Design a robust backing model - check
- Design an effective UI for the touchscreen - check
- Implement all the nasty details - check
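The post does not say how the fuzzy lock-on selection works. One possible reading, sketched below in Python rather than Android Java, is to snap a touch to the nearest atom within a tolerance radius. The function name `lock_on`, the `radius` default and the coordinates are all hypothetical.

```python
import math


def lock_on(touch_x, touch_y, atoms, radius=40.0):
    """Return the index of the atom nearest to the touch point, or None.

    atoms is a list of (x, y) tuples; radius is the snap tolerance in pixels.
    """
    best_index = None
    best_distance = radius
    for index, (x, y) in enumerate(atoms):
        distance = math.hypot(x - touch_x, y - touch_y)
        if distance <= best_distance:
            best_index = index
            best_distance = distance
    return best_index


atoms = [(100.0, 100.0), (160.0, 100.0), (130.0, 152.0)]
print(lock_on(110.0, 95.0, atoms))   # touch close to the first atom
print(lock_on(300.0, 300.0, atoms))  # touch far away from every atom
```

A larger radius makes selection more forgiving for a finger, at the cost of more ambiguous picks in crowded structures.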
Wednesday, April 14, 2010
NexusOne: Big finger vs. tiny screen
I now have a NexusOne at hand.
And while it is much faster than the Emulator, a very profane problem has come up. It is impossible to precisely draw a chemical structure with an editor designed for the mouse! It is just too sensitive to be used with a finger on a tiny screen. And a pen won't work with a capacitive touchscreen...
Maybe I have to take an intensive look at the android.graphics package.
Thursday, April 1, 2010
JavaScript molecule editor roundup
Next to JsDraw, which had its 1.0 release recently, I've found two more pure JavaScript molecule editors:
WebCME has a lot of features, notably a large library of molecules, but is quite painful to use. This is because its developers have chosen a system of 'select two atoms, add bond, deselect them, select another two atoms, add bond, oh wrong one, delete bond, add correct bond...' for drawing.
The ChemDoodle web components on the other hand have a totally minimalistic, yet powerful UI. Atoms are drawn by mouseclick, bonds drawn by mouse drag. Atom types can be changed via keyboard, bond types by mouseclick and delete is done by the Backspace or Delete key.
Unfortunately none of the three works on mobile browsers. Either they don't work at all or only by half.
In contrast, the jsMolEditor does work even in Android WebViews, but no longer seems to be under development. I suspect that it was abandoned in favour of JsDraw.
What a pity. Having a working (even if simple) JavaScript molecule editor that runs in smartphone WebViews would open a whole new world of applications for those devices. They now have the computational power to handle chemical data, but who wants to enter SMILES strings by hand...
pgchem::tigress 1.2 is out
http://pgfoundry.org/projects/pgchem/
Built and tested against PostgreSQL 8.4.2 with OpenBabel 2.2.3 on XP 32 bit, Windows 7 64 bit and Ubuntu 8.04 LTS 32 bit.
MACCS166 binary fingerprints.
Dice and Tversky similarity.
Small bug fixes.
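For readers unfamiliar with the two measures, here is a minimal Python sketch of the textbook definitions on binary fingerprints, with a and b the bit counts of the two fingerprints and c the count of their common bits. It treats fingerprints as non-negative integers. The function names and parameters are illustrative only and are not the pgchem::tigress SQL interface.

```python
def popcount(value):
    """Count the set bits of a non-negative integer."""
    return bin(value).count("1")


def dice(fp1, fp2):
    """Dice coefficient: 2c / (a + b)."""
    a, b, c = popcount(fp1), popcount(fp2), popcount(fp1 & fp2)
    return 2.0 * c / (a + b) if (a + b) else 0.0


def tversky(fp1, fp2, alpha, beta):
    """Tversky index: c / (alpha * (a - c) + beta * (b - c) + c)."""
    a, b, c = popcount(fp1), popcount(fp2), popcount(fp1 & fp2)
    denominator = alpha * (a - c) + beta * (b - c) + c
    return c / denominator if denominator else 0.0


fp1, fp2 = 0b1110, 0b0111
print(dice(fp1, fp2))
print(tversky(fp1, fp2, 1.0, 1.0))
print(tversky(fp1, fp2, 0.5, 0.5))
```

With alpha = beta = 1 the Tversky index reduces to the Tanimoto coefficient, and with alpha = beta = 0.5 it reduces to Dice; unequal weights make the measure asymmetric.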
Tuesday, March 16, 2010
How to run mx on Android
mx runs on Android!
To make it work, you need to build from sources and remove all references to javax.swing (which doesn't seem to break the rest of the code btw.), since Android does not contain AWT or Swing.
Then repackage the jar and it can be used in Android applications.
Chemistry on the smartphone, yay. :-)
Wednesday, March 10, 2010
Changing the public API between minor versions of PostgreSQL
I hate it when they do this.
Between 8.3 and 8.4, the API for CREATE OPERATOR CLASS has changed. Now the RECHECK flag is obsolete, letting the index dynamically decide if it is lossy or not.
While this in itself is an improvement, it generates an incompatibility between GiST C code and scripts written for 8.3 and 8.4. Fortunately, the fix seems to be an easy one...
Wednesday, March 3, 2010
Wrapping native libraries with JNA: Dingo
Even with the advent of pure Java chemoinformatics toolkits like MX or the CDK, there is a lot of interesting native code floating around on the net. Unfortunately, wrapping native code with JNI is no real fun.
JNA comes to the rescue. It does all the necessary loading and marshalling stuff dynamically in the background for you. All you need is a declaration of the native interface, the rest is magic.
Here's an incomplete but working example for Dingo 1.0:
package your_package_here;

import com.sun.jna.Native;

public class NativeDingoWrapper {

    // Bind the native methods declared below to the "dingo" shared library.
    static {
        Native.register("dingo");
    }

    // Rendering options
    public static native int dingoSetOutputFormat(String anOutputFormat);
    public static native int dingoSetColoring(int aColoringFlag);
    public static native int dingoSetHighlightColorEnabled(int aHighlightFlag);
    public static native int dingoSetHighlightThicknessEnabled(int aHighlightThicknessFlag);
    public static native int dingoSetStereoOldStyle(int aStereoFlag);
    public static native int dingoSetImageSize(int aWidth, int aHeight);

    // Input, output and rendering
    public static native int dingoLoadMolFromString(String aMol);
    public static native int dingoLoadMolFromFile(String aFile);
    public static native int dingoSetOutputFile(String anOutputFile);
    public static native int dingoRender();
    public static native int dingoMoleculeIsEmpty();
}
And that's it.
The only drawback of JNA is that it needs a glue DLL specific to the operating system, so theoretically it is more platform limited than JNI.
But since "JNA has been built and tested on OSX (ppc, x86, x86_64), linux (x86, amd64), FreeBSD/OpenBSD (x86, amd64), Solaris (x86, amd64, sparc, sparcv9) and Windows (x86, amd64). It has also been built for windows/mobile and Linux/ppc64, although those platforms are not included in the distribution.", this is only a minor limitation in most cases.
I have successfully wrapped Dingo and Barsoi with JNA and up to now it just works as advertised.
Monday, February 15, 2010
Chemoinformatics in the browser: Fingerprint similarity calculation
Well, there are other things that can be done in JavaScript beyond substructure search. For example, Tanimoto binary fingerprint similarity calculation needs just two short functions:
function popcount(b) {
    // Count the set bits of a 32-bit integer (parallel bit count).
    b = b >>> 0; // treat the signed input as an unsigned 32-bit value
    b = b - ((b >>> 1) & 0x55555555);
    b = (b & 0x33333333) + ((b >>> 2) & 0x33333333);
    b = (b + (b >>> 4)) & 0x0F0F0F0F;
    return (b + (b >>> 8) + (b >>> 16) + (b >>> 24)) & 0x3F;
}

function tanimoto(fp1, fp2) {
    var a = 0; // bits set in fp1
    var b = 0; // bits set in fp2
    var c = 0; // bits set in both
    for (var i = fp1.length - 1; i >= 0; i--) {
        var block_fp1 = fp1[i];
        var block_fp2 = fp2[i];
        a += popcount(block_fp1);
        b += popcount(block_fp2);
        c += popcount(block_fp1 & block_fp2);
    }
    return c / (a + b - c);
}

The fingerprints have to be converted into JavaScript arrays of equal length containing signed 32-bit integers:

onclick="alert(tanimoto(new Array(1, -1073741825), new Array(3, 2147483647)));"

0.9118 (31/34, rounded)
Friday, February 12, 2010
Chemoinformatics in the browser: Firefox catches up
>That's a big difference. Which version of Firefox? If 3.5, have you tried 3.6?
Yes, today. Chrome 4 is not faster than Chrome 3 but Firefox 3.6 now allows jobs of about 50 structures.
Browser | max. job size |
Chrome 3.x | 100 |
Chrome 4.x | 100 |
Firefox 3.5.x | 25 |
Firefox 3.6.x | 50 |
IE 6.x | 5 |
IE 8.x | 10 |
Those batch sizes allow for script execution times of about 1 second. The idea is that a job of this size does not interfere with other scripts on a page if it is running embedded, e.g. in an invisible iframe.
If the page is dedicated, much larger jobs might be possible, up to the limit at which the browser triggers the 'A script is not responding' error message.
Update: IE 8.x is twice as fast as IE 6.x, but still slow compared to the competitors.
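The job sizes from the table above could be looked up from the User-Agent header roughly as follows. This is only a sketch: the token strings such as `Firefox/3.6` and `MSIE 8` are assumptions about what the browsers send, and `max_job_size` is a hypothetical helper, not the server's actual code.

```python
# Maximum job sizes from the table above; the first matching token wins.
JOB_SIZES = [
    ("Chrome/3", 100),
    ("Chrome/4", 100),
    ("Firefox/3.5", 25),
    ("Firefox/3.6", 50),
    ("MSIE 6", 5),
    ("MSIE 8", 10),
]


def max_job_size(user_agent):
    """Return the job size for a User-Agent string, or None if unknown."""
    for token, size in JOB_SIZES:
        if token in user_agent:
            return size
    return None


print(max_job_size("Mozilla/5.0 (Windows; U) Gecko/20100115 Firefox/3.6"))
print(max_job_size("Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)"))
print(max_job_size("SomeOtherBrowser/1.0"))
```

Returning None for unknown browsers leaves it to the caller to decide whether to refuse the client or fall back to a conservative default.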
Thursday, February 11, 2010
Chemoinformatics in the browser: Chrome finishes first
While developing my little demo in the previous article, I found that different browsers could handle different job sizes depending on how fast their JavaScript engines are.
Chrome 3.x finishes first before Firefox 3.x and ye olde IE 6.x is almost unusably slow for substructure searching with JavaScript.
The possible job sizes are:
Browser | max. job size |
Chrome 3.x | 100 |
Firefox 3.x | 25 |
IE 6.x | 5 |
Thus, the server sizes jobs according to the user-agent header sent:
if (uatype.find('Firefox/3') != -1):
    timeout = 500
    maxsize = 25
elif (uatype.find('Chrome/3') != -1):
    timeout = 200
    maxsize = 100
elif (uatype.find('MSIE') != -1):
    maxsize = 5
    timeout = 1000
else:
    return
While I knew that Chrome's JavaScript engine is fast, I didn't expect it to be that dominant.
Monday, February 8, 2010
Browsers of the world: Map! Reduce! Map! Reduce!
This article about the idea of collaborative map/reduce in the browser and this one on Depth-First gave me the idea to try something other than distributed word counting: distributed substructure matching.
The server was quickly written in Python; the backend in this case is PostgreSQL with a table holding the structures as V2000 molfiles in plain text format. No magic so far.
Here's the code of the server.
More interesting might be how the substructure matching itself is done in 100% JavaScript. This is doable now thanks to JSDraw, a pure JavaScript structure editor which, on closer inspection, has some more interesting tricks up its sleeve, notably a substructure matching capability.
The server schedules a job of maxsize random molecules from the database and constructs a page containing those molecules as molfiles. After the page has completely loaded in the browser, the matching is done and the page is posted back to the server which parses the result. Once manually started by opening http://:8080/get, the pages keep reloading automatically by means of a meta http-equiv="refresh" in the result page.
Of course, the server is very basic. It notably lacks keeping track of the results and housekeeping to restart broken jobs and uses a hardcoded substructure as search argument.
But it can be done.
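The job cycle described above might look roughly like this in Python. It is a sketch under stated assumptions, not the server's actual code: the structures come from an in-memory list instead of PostgreSQL, the function names and the `/result` post target are hypothetical, and only the `/get` path and the meta refresh are taken from the description.

```python
import html
import random


def build_job_page(molfiles, maxsize):
    """Embed up to maxsize randomly chosen molfiles in an HTML form."""
    job = random.sample(molfiles, min(maxsize, len(molfiles)))
    fields = [
        '<textarea name="mol%d">%s</textarea>' % (i, html.escape(mol))
        for i, mol in enumerate(job)
    ]
    return (
        '<html><body><form method="post" action="/result">'
        + "".join(fields)
        + "</form></body></html>"
    )


def build_result_page(delay_seconds=1):
    """Result page that sends the browser back to /get for the next job."""
    return (
        '<html><head><meta http-equiv="refresh" content="%d; url=/get">'
        "</head><body>ok</body></html>" % delay_seconds
    )


molfiles = ["molfile 1", "molfile 2", "molfile 3"]  # placeholders, not real molfiles
print(build_job_page(molfiles, 2).count("<textarea"))
print(build_result_page(3))
```

The client-side matching and the automatic form submission would still have to be added to the job page as JavaScript; they are left out here.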