Behind the TIGR MEV client application is a powerful computational grid that allows
researchers to analyze data sets that could not be processed
on a single PC. However, before a grid architecture was selected for TIGR MEV, extensive
analysis and benchmarking were performed to determine the best approach.
First, the analysis. In TIGR MEV, microarray experiments are grouped into a single matrix.
In this matrix, the number of columns equals the number of experiments and the
number of rows equals the number of probes (genes) on the microarray chip.
In the most complicated case, with 32768 probes and 100 experiments, the combined experiment matrix
has 32768 rows and 100 columns.
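The layout described above can be illustrated with a minimal Python sketch. The function name `build_experiment_matrix` and the sample values are hypothetical and are not taken from TIGR MEV; a tiny 4 x 3 matrix stands in for the 32768 x 100 case.

```python
def build_experiment_matrix(experiments):
    """Combine per-experiment probe values into one matrix.

    `experiments` is a list of columns, one per experiment; each column
    holds one value per probe. The result has one row per probe and one
    column per experiment.
    """
    n_probes = len(experiments[0])
    if any(len(column) != n_probes for column in experiments):
        raise ValueError("every experiment must cover the same probes")
    # Transpose: row i collects the value of probe i in every experiment.
    return [list(row) for row in zip(*experiments)]


# Made-up sample data: three experiments measured on four probes.
experiments = [
    [0.1, 0.5, 0.9, 1.3],  # experiment 1
    [0.2, 0.6, 1.0, 1.4],  # experiment 2
    [0.3, 0.7, 1.1, 1.5],  # experiment 3
]
matrix = build_experiment_matrix(experiments)
n_rows, n_cols = len(matrix), len(matrix[0])
print(n_rows, n_cols)  # 4 rows (probes), 3 columns (experiments)
```

Each row of the result is one probe's vector across all experiments, which is the unit that the clustering algorithms compare.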
From a mathematical point of view, these are 32768 vectors in 100-dimensional space. Now
let’s estimate the computational complexity of Relevance Networks. Our implementation
of this algorithm has a complexity of roughly O(N²), where N is the number of vectors
being clustered. In this case N can be as large as 32768. Therefore 32768²/2 = 1,073,741,824/2 =
536,870,912 elementary operations are required to execute the algorithm. The elementary operation is the calculation of the distance
(Euclidean, for example) between two vectors in 100-dimensional space. This is a very processor-intensive operation:
benchmarks indicate a run time of two hours on a 600 MHz Pentium III PC running Windows 2000. The
situation is even worse when calculating HCL (hierarchical clustering), because it requires roughly N³ operations.
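The arithmetic above can be checked with a short Python sketch. It is an illustration only, not the TIGR MEV implementation, and the function names are hypothetical. It shows the elementary operation (a Euclidean distance) and the operation count for N = 32768, both as the N²/2 estimate used in the text and as the exact number of distinct vector pairs, N(N-1)/2.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two vectors of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def estimated_operations(n):
    """The estimate used in the text: N^2 / 2 distance calculations."""
    return n * n // 2


def exact_pair_count(n):
    """Exact number of unordered pairs of distinct vectors: N(N-1)/2."""
    return n * (n - 1) // 2


n = 32768
print(estimated_operations(n))  # 536870912
print(exact_pair_count(n))      # 536854528
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```

The two counts differ by only 16384, so the N²/2 estimate is accurate enough for sizing the problem.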
There are several ways to improve calculation speed, each with benefits and pitfalls. One
is to decrease the cost of elementary operations using processor-specific code optimizations or MMX-like
instructions; however, this ties the application to a specific CPU architecture. Another
way is to use a grid that allows parallel computations to be executed on a cluster of computers. The
problem with a grid is that not all analysis algorithms can run on it, because
for some algorithms the calculation cannot be spread across many computers. The best way is to use
both techniques; however, for this version of TIGR MEV only a grid architecture was used
to boost performance. After a review of grid software, Parallel Virtual Machine³ (PVM)
was selected as the environment to host the grid. While there are many parallel implementations,
PVM is free and will run on Unix, Linux, and Windows hosts. Therefore PVM allows
us to create an inexpensive, fast grid to solve large computational tasks for TIGR MEV.
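To illustrate why a pairwise distance calculation lends itself to a grid, the following Python sketch splits the work into independent tasks. It is a sketch under stated assumptions: it does not use the PVM API, the names `assign_rows`, `worker_task`, and `all_distances` are hypothetical, and the tasks run one after another in a single process, whereas on a grid each task would be sent to a different host.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two vectors of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_rows(n_rows, n_workers):
    """Round-robin assignment: worker k gets rows k, k + n_workers, ...

    Early rows have more later rows to compare against, so interleaving
    them spreads the load more evenly than contiguous blocks would.
    """
    return [list(range(k, n_rows, n_workers)) for k in range(n_workers)]


def worker_task(matrix, rows):
    """Work for one host: distances from each assigned row to later rows."""
    result = {}
    for i in rows:
        for j in range(i + 1, len(matrix)):
            result[(i, j)] = euclidean_distance(matrix[i], matrix[j])
    return result


def all_distances(matrix, n_workers):
    """Merge the results of all tasks into one table of pair distances."""
    distances = {}
    # The tasks share no state, so they could run on separate hosts;
    # here they are executed sequentially as a stand-in for the grid.
    for rows in assign_rows(len(matrix), n_workers):
        distances.update(worker_task(matrix, rows))
    return distances


# Made-up sample data: four vectors in two dimensions.
matrix = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [0.0, 1.0]]
distances = all_distances(matrix, n_workers=2)
print(len(distances))     # 6 pairs for 4 vectors
print(distances[(0, 1)])  # 5.0
```

Because no task depends on another task's result, the only coordination needed is distributing the matrix and collecting the partial results. Algorithms whose steps depend on earlier results do not decompose this simply.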
TIGR MEV is an open source bioinformatics system used for computational microarray analysis. Portions of
this software were developed by DataNaut Inc.; however, all rights and title in and to this software
are owned and retained by The Institute for Genomic Research. If you are interested in obtaining the
software, visit the TIGR web site.
DataNaut provides software development consulting services and has extensive expertise in microarray
technologies. Organizations that are interested in using DataNaut consulting services or having
TIGR MEV customized for specific research applications can send email to info@datanaut.com.