
The best way to understand what we do is to learn what we’ve done for
other businesses and how we did it.
Life on the Grid
Grid computing is becoming an increasingly popular architecture, and research indicated that the technology
was an excellent fit for TIGR MEV. Several grid frameworks are available, from Sun and the Globus Project
among others, but after our evaluation we selected Parallel Virtual Machine (PVM).
According to the PVM web site, PVM is open source software that
enables a collection of heterogeneous computers to be used as a coherent and flexible concurrent computational
resource. PVM software executes on each machine in a user-configurable pool, and presents a unified, general,
and powerful computational environment of concurrent applications. The PVM system transparently handles message
routing, data conversion for incompatible architectures, and other tasks that are necessary for operation in a
heterogeneous, network environment.
PVM provides a unified framework within which parallel programs can be developed in an efficient and
straightforward manner using existing hardware. This framework enables a collection of heterogeneous
computer systems to be viewed as a single parallel virtual machine. We used the PVM framework to build
the server implementations of MEV analysis algorithms.
TIGR MEV features parallel versions of HCL and Relevance Networks; parallel versions of SVM and KMC are
partially working as of this writing. These algorithms were implemented with the Master-Worker design
pattern: one master computer on the network coordinates many worker computers, all of which are available
to lend a hand crunching data. The pattern uses a job metaphor to represent a unit of work on the grid,
where a job is a request to perform some analysis function on a provided data set. The master assigns
jobs to the workers, and when a job completes the worker sends its result back to the master. The size of
the job sent to a worker depends directly on that worker's computation speed: a faster worker receives
larger jobs. With this scheduling rule, each worker's workload is proportional to its speed.
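The scheduling rule can be illustrated with a minimal sketch. This is not the TIGR MEV implementation (the source does not show it); the function name split_proportionally and the idea of expressing worker speed as a relative number are assumptions made for illustration. The sketch divides a fixed number of work items so that each worker's job size is proportional to its speed.

```python
import math
from fractions import Fraction


def split_proportionally(total_items, worker_speeds):
    """Return one job size per worker, proportional to that worker's speed.

    Hypothetical helper for illustration; not taken from TIGR MEV.
    """
    if total_items < 0 or not worker_speeds or min(worker_speeds) <= 0:
        raise ValueError("need a non-negative item count and positive speeds")
    # Exact fractions avoid floating-point drift when the shares are summed.
    speeds = [Fraction(speed) for speed in worker_speeds]
    total_speed = sum(speeds)
    # Ideal (possibly fractional) share of the work for each worker.
    ideal = [total_items * speed / total_speed for speed in speeds]
    # Round down first so the master never hands out more items than exist.
    sizes = [math.floor(share) for share in ideal]
    # Give the leftover items to the workers with the largest fractional parts.
    leftover = total_items - sum(sizes)
    by_remainder = sorted(range(len(ideal)),
                          key=lambda i: ideal[i] - sizes[i], reverse=True)
    for i in by_remainder[:leftover]:
        sizes[i] += 1
    return sizes


# Three workers; the third is assumed to be twice as fast as the others.
print(split_proportionally(100, [1, 1, 2]))  # prints [25, 25, 50]
```

A real master would also need a way to measure worker speed, for example by timing a small calibration job; that step is outside this sketch.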
Benchmarking our approach to the Relevance Network algorithm yielded an empirical formula for calculation
time: T = To/(K-1), where To is the time to calculate the Relevance Network on a single machine and K is
the number of equally powerful computers on the grid. For example, if a Relevance Network task takes
1 hour on a single computer (To = 1 hour), how much time will it take to run on 4 computers in the grid
(K = 4)? The answer is given by the formula: T = To/(K-1) = (1 hour)/(4-1) = (1 hour)/3 = 20 minutes.
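The empirical formula is easy to turn into a small estimator. The sketch below is illustrative only; the function name estimated_grid_time is an assumption, and the check for at least two computers is added because the formula divides by K-1.

```python
def estimated_grid_time(single_machine_time, num_computers):
    """Estimate grid run time with the empirical formula T = To/(K-1).

    single_machine_time is To, in any time unit; num_computers is K.
    Hypothetical helper for illustration; not taken from TIGR MEV.
    """
    if num_computers < 2:
        # K = 1 would divide by zero; the formula only covers K >= 2.
        raise ValueError("the formula needs at least 2 computers on the grid")
    return single_machine_time / (num_computers - 1)


# The example from the text: To = 60 minutes, K = 4 computers.
print(estimated_grid_time(60, 4))  # prints 20.0
```

The result is in the same unit as the single-machine time, so the example above reports 20 minutes.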
For more information about how the Relevance Network was implemented, see the section
A Closer Look at the Relevance Network Algorithm. That section discusses in detail the approach to the
algorithm and how it was parallelized, and serves as a primer for those looking to build parallel software.
TIGR MEV is an open source bioinformatics system used for computational microarray analysis. Portions of
this software were developed by DataNaut Inc.; however, all rights and title in and to this software
are owned and retained by The Institute for Genomic Research. If you are interested in obtaining the
software visit the TIGR web site.
DataNaut provides software development consulting services with extensive expertise with microarray
technologies. Organizations that are interested in using DataNaut consulting services or having
TIGR MEV customized for specific research applications can send email to info@datanaut.com.