02 March 2013
Galaxy has some initial support for built-in multiprocessing. In the datatype definition you can define merge() and split() functions to divide and conquer your input datasets. After every split file has been processed, all results are merged automatically.
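To make the idea concrete, here is a minimal stand-alone sketch of the split, process and merge pattern for a line-based format such as SMILES. It is not Galaxy's datatype API: the names `split_lines`, `merge_chunks` and `process_in_chunks` are made up for this illustration, and the chunks are processed one after another here, whereas Galaxy would hand each chunk to its own job.

```python
from itertools import islice


def split_lines(lines, chunk_size):
    """Yield successive lists of at most chunk_size lines (hypothetical helper)."""
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def merge_chunks(chunks):
    """Concatenate the per-chunk results, keeping the original order."""
    merged = []
    for chunk in chunks:
        merged.extend(chunk)
    return merged


def process_in_chunks(lines, worker, chunk_size=2):
    """Apply worker to every line, one chunk at a time, then merge the results."""
    # Sequential for simplicity; each chunk could run as a separate job instead.
    results = [[worker(line) for line in chunk]
               for chunk in split_lines(lines, chunk_size)]
    return merge_chunks(results)
```

Because the merge step keeps the chunk order, the output lines up with the input no matter how many chunks were used.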
Today I committed an extended definition of the SMILES, InChI and SDF datatypes and enabled the multiprocessing feature in a few tools, such as QED and the converters. Depending on your Galaxy configuration and the number of available CPU cores, your calculations can now be several times faster.
While processing 50,000,000 SMILES I found a small bug in Galaxy's concatenation routine. Using cat to merge thousands of files does not work, because the operating system limits the total length of a command line to ARG_MAX.
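As a rough illustration of the limit, the following sketch estimates whether a list of arguments would still fit. The function name `fits_on_command_line` is hypothetical, and the estimate is deliberately simple: it ignores the environment variables, which count against the same limit, so treat the result as optimistic.

```python
import os


def fits_on_command_line(args, limit=None):
    """Roughly check whether args fit within ARG_MAX (hypothetical helper)."""
    if limit is None:
        # SC_ARG_MAX is the POSIX name for the limit; Unix-like systems only.
        limit = os.sysconf("SC_ARG_MAX")
    # Every argument is passed as a NUL-terminated byte string.
    needed = sum(len(os.fsencode(arg)) + 1 for arg in args)
    return needed <= limit
```

With long temporary file paths, a few thousand split files can be enough to exceed the limit.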
The solution is to use Python's shutil module and iterate over all split files.
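A minimal sketch of that approach could look like the following. It is not the exact patch; the function name and signature are assumptions made for this example.

```python
import shutil


def merge_files(split_files, output_file):
    """Concatenate all split files into output_file, in the given order."""
    with open(output_file, "wb") as merged:
        for path in split_files:
            with open(path, "rb") as part:
                # Copies in fixed-size blocks, so memory use stays constant.
                shutil.copyfileobj(part, merged)
```

Since the file names never end up on a command line, the number of files no longer matters.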
That solution is no slower than cat, it is Pythonic, and it works for an unlimited number of files. The patch has been submitted as pull request #141.