Question: Extreme Parallelization For NGS Analysis
Josh Tietjen • 10 wrote:
I'd like to start an open discussion on parallelization for NGS data.
I noticed that Galaxy recently came out with a cloud-based interface
using Amazon EC2. I've been trying to learn how NGS analysis algorithms
(for alignment, assembly, etc.) are actually parallelized, but I have
had trouble finding specific documentation and resources describing how
this works and how it is implemented. Any direction or resources that
people can provide would be much appreciated.
Also, I have seen some recent papers describing parallelization of
specific algorithms (such as PASQUAL from Georgia Tech), but they all
seem to run on relatively "small" networks of distributed computing
resources. Does anyone have an idea of how far the parallelization and
speedup of these analyses can be pushed? How difficult would it be to
implement something that runs on a distributed network of, say, 100,000
computers, or even a million? Is there a bottleneck somewhere that would
prevent that from being feasible for NGS analysis? Or would it make the
analyses amazingly fast compared to what's available now? I'm thinking
of a system like the one the SETI@home project has set up for its
distributed computing user base, and I'm wondering what the limits are
and how one could implement such a system if the user base is already
in place.