DataNaut
Services
Articles & Whitepapers
The best way to understand what we do is to learn what we’ve done for other businesses and how we did it.


Life on the Grid

Grid computing is becoming an increasingly popular architecture, and our research indicated that the technology was an excellent fit for TIGR MEV. Several grid frameworks are available, from Sun and the Globus Project among others, but after our evaluation we selected Parallel Virtual Machine (PVM).

According to the PVM web site, PVM is open source software that enables a collection of heterogeneous computers to be used as a coherent and flexible concurrent computational resource. PVM software executes on each machine in a user-configurable pool, and presents a unified, general, and powerful computational environment of concurrent applications. The PVM system transparently handles message routing, data conversion for incompatible architectures, and other tasks that are necessary for operation in a heterogeneous, network environment.

PVM provides a unified framework within which parallel programs can be developed in an efficient and straightforward manner using existing hardware. This framework enables a collection of heterogeneous computer systems to be viewed as a single parallel virtual machine. We used the PVM framework to build the server implementations of MEV analysis algorithms.

TIGR MEV features parallel versions of HCL and Relevance Networks; SVM and KMC are partially working as of this writing. The Master-Worker design pattern was used to implement these algorithms. The concept is that there is a master computer on the network with many worker computers, all available to lend a hand crunching data. The Master-Worker pattern employs a job metaphor to represent a unit of work on the grid, where a job represents a request to perform some analysis function on a provided data set. The master assigns jobs to the workers, and when a job completes the worker sends the result back to the master. The size of a job sent to a worker depends directly on that worker's computation speed: a faster worker receives larger jobs. Using this scheduling rule we guarantee a proportional workload across all the workers.
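The scheduling rule above can be illustrated with a small sketch. This is not the TIGR MEV or PVM code; the names `Job` and `plan_jobs`, the idea of describing a job as a row range, and the relative `speeds` values are all assumptions made for illustration. It shows only how a master might size one job per worker in proportion to that worker's speed.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Job:
    """Hypothetical unit of work: analyze rows [start, stop) of the data set."""
    worker_id: int
    start: int  # inclusive row index
    stop: int   # exclusive row index


def plan_jobs(n_rows: int, speeds: Sequence[float]) -> List[Job]:
    """Split n_rows into one job per worker, sized in proportion to speed."""
    if n_rows < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need n_rows >= 0 and positive worker speeds")
    total = sum(speeds)
    # Ideal (fractional) share of rows for each worker.
    ideal = [n_rows * s / total for s in speeds]
    sizes = [int(x) for x in ideal]
    # Hand out the rows lost to rounding, largest remainder first.
    leftover = n_rows - sum(sizes)
    by_remainder = sorted(
        range(len(speeds)), key=lambda i: ideal[i] - sizes[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        sizes[i] += 1
    # Turn the sizes into contiguous row ranges, one per worker.
    jobs, start = [], 0
    for worker_id, size in enumerate(sizes):
        jobs.append(Job(worker_id, start, start + size))
        start += size
    return jobs


# Two equal workers and one that is twice as fast.
for job in plan_jobs(100, [1.0, 1.0, 2.0]):
    print(job)
```

In this sketch the worker that is twice as fast receives half of the rows, and the two slower workers receive a quarter each, so all three would be expected to finish at about the same time.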

After benchmarking our approach to the Relevance Network algorithm, we found that the calculation time follows the empirical formula T = To/(K-1), where To is the time to calculate a Relevance Network on a single machine and K is the number of equivalent computers on the grid. For example, if a Relevance Network task takes 1 hour on a single computer (To = 1 hour), how long will it take to run on 4 computers in the grid (K = 4)? The formula gives T = To/(K-1) = (1 hour)/(4-1) = (1 hour)/3 = 20 minutes.
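As a quick check of the arithmetic, the formula can be written as a small helper. The function name `grid_time` is hypothetical. One reading of the K-1 divisor, which is our assumption rather than something the benchmark states, is that one of the K machines acts as the master and only the remaining K-1 do the computation. The formula is undefined for K = 1, so the sketch rejects fewer than two computers.

```python
def grid_time(t_single: float, k: int) -> float:
    """Estimated run time T = To / (K - 1) on K equal computers.

    t_single is To, the single-machine time, in any unit; the result
    is in the same unit.
    """
    if k < 2:
        # K = 1 would divide by zero; the empirical formula covers grids only.
        raise ValueError("the formula needs at least 2 computers (K >= 2)")
    return t_single / (k - 1)


# The example from the text: To = 60 minutes, K = 4 computers.
print(grid_time(60.0, 4))  # 60 / 3 = 20.0 minutes
```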

For more information about how the Relevance Network was implemented, see the section A Closer Look at the Relevance Network Algorithm. It discusses the approach for the algorithm and how it was parallelized in detail, and it is a primer for those looking to build parallel software.




TIGR MEV is an open source bioinformatics system used for computational microarray analysis. Portions of this software were developed by DataNaut Inc.; however, all rights and title in and to this software are owned and retained by The Institute for Genomic Research. If you are interested in obtaining the software, visit the TIGR web site.

DataNaut provides software development consulting services and has extensive expertise in microarray technologies. Organizations interested in using DataNaut consulting services or in having TIGR MEV customized for specific research applications can send email to info@datanaut.com.

© 2012 Datanaut, Inc. All Rights Reserved.