DataNaut
Services
Articles & Whitepapers
The best way to understand what we do is to learn what we’ve done for other businesses and how we did it.


Making the Case For a Grid

Behind the TIGR MEV client application is a powerful computational grid that allows researchers to analyze data sets too large to process on a single PC. However, before a grid architecture was selected for TIGR MEV, extensive analysis and benchmarking were performed to determine the best approach.

First, the analysis. In TIGR MEV, microarray experiments are grouped into a single matrix. The number of columns equals the number of experiments, and the number of rows equals the number of probes (genes) on the microarray chip. In the most demanding case, with 32768 probes and 100 experiments, the combined experiment matrix has 32768 rows and 100 columns.

From a mathematical point of view, there are 32768 vectors in 100-dimensional space. Now let’s estimate the computational complexity of Relevance Networks. Our implementation of this algorithm has a complexity of ~O(N²), where N is the number of vectors being clustered. In this case N can be as large as 32768. Therefore 32768²/2 = 1,073,741,824/2 = 536,870,912 elementary operations are required to execute the algorithm. The elementary operation is the calculation of the distance (Euclidean, for example) between two vectors in 100-dimensional space. This is a very processor-intensive operation: benchmarks indicate two hours on a 600 MHz Pentium III Windows 2000 PC. The situation is even worse when calculating HCL, because it requires ~N³ operations.
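The operation count above can be checked with a short sketch. This is not the TIGR MEV implementation; the function names `pair_count`, `euclidean`, and `all_pair_distances` are hypothetical and chosen only for illustration. It assumes the elementary operation is one Euclidean distance per unordered pair of vectors, which gives N(N-1)/2 pairs, slightly fewer than the N²/2 estimate used in the text.

```python
import math


def pair_count(n):
    """Number of unordered pairs among n vectors: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def all_pair_distances(vectors):
    """Distance for every unordered pair (i, j) with i < j.

    This double loop is the O(N^2) part of the estimate.
    """
    n = len(vectors)
    return {
        (i, j): euclidean(vectors[i], vectors[j])
        for i in range(n)
        for j in range(i + 1, n)
    }


# Exact pair count for the largest case discussed in the text.
print(pair_count(32768))      # 536854528
# The N^2 / 2 estimate from the text, for comparison.
print(32768 ** 2 // 2)        # 536870912
```

Each of those roughly 537 million operations loops over 100 coordinates, which is why the total running time is measured in hours on a single PC of that era.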

There are several ways to improve calculation speed, each with benefits and pitfalls. One is to decrease the cost of the elementary operations using processor-specific code optimizations or MMX-like instructions; however, this ties the application to a specific CPU architecture. Another is to use a grid that allows parallel computations to be executed on a cluster of computers. The problem with a grid is that not every analysis algorithm can run on it, because some calculations cannot be spread across many computers. The best approach is to use both techniques; however, for this version of TIGR MEV only a grid architecture was used to boost performance. After a review of grid software, Parallel Virtual Machine³ (PVM) was selected as the environment to host the grid. While there are many parallel implementations, PVM is free and will run on Unix, Linux and Windows hosts. PVM therefore allows us to create an inexpensive, fast grid to solve large computational tasks for TIGR MEV.
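To illustrate why the pairwise distance calculation suits a grid, here is a minimal sketch of how the work might be divided. It does not use PVM and is not the TIGR MEV code: `split_rows`, `worker_task`, and `run_on_grid` are hypothetical names, and the "workers" are simulated sequentially in one process. The assumption is that each worker receives the full matrix plus a list of row indices and returns the distances for its own rows, so no communication is needed between workers until the results are merged.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def split_rows(n, workers):
    """Deal row indices round-robin so each worker gets a similar share of pairs.

    Row i is paired with n - 1 - i later rows, so contiguous blocks would
    give the first worker far more work than the last.
    """
    return [list(range(w, n, workers)) for w in range(workers)]


def worker_task(vectors, rows):
    """Distances for pairs (i, j) with j > i, for the rows given to one worker."""
    n = len(vectors)
    return {
        (i, j): euclidean(vectors[i], vectors[j])
        for i in rows
        for j in range(i + 1, n)
    }


def run_on_grid(vectors, workers):
    """Simulate the grid: run each worker's task in turn and merge the results.

    On a real grid each task would be sent to a different host; that
    transport layer is deliberately left out of this sketch.
    """
    merged = {}
    for rows in split_rows(len(vectors), workers):
        merged.update(worker_task(vectors, rows))
    return merged


vectors = [[0, 0], [3, 4], [6, 8], [0, 1], [1, 1]]
print(split_rows(len(vectors), 2))        # [[0, 2, 4], [1, 3]]
print(len(run_on_grid(vectors, 3)))       # 10
```

Because every pair is computed independently, the tasks share no intermediate state. Algorithms whose steps depend on the result of the previous step do not divide this cleanly, which is the limitation noted above.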

Page 13 of 19



TIGR MEV is an open source bioinformatics system used for computational microarray analysis. Portions of this software were developed by DataNaut Inc.; however, all rights and title in and to this software are owned and retained by The Institute for Genomic Research. If you are interested in obtaining the software visit the TIGR web site.

DataNaut provides software development consulting services and has extensive expertise in microarray technologies. Organizations that are interested in using DataNaut consulting services or having TIGR MEV customized for specific research applications can send email to info@datanaut.com.

© 2012 Datanaut, Inc. All Rights Reserved.