18

I'm not sure if this is the right place for this but...

I am looking for a list of computationally "hard" problems such that if a problem from this list could be solved effectively, it would be (significantly, or otherwise) beneficial in some form or another to the biology community.

Some examples I have found (or at least ones that were tagged as "np-hard" problems):

Multiple sequence alignment problem
Protein threading / design problem
Map / sequence assembly problem

The list does not have to be extensive, but hopefully more than a few.

Thank you!

(Also, I couldn't think of any more tags to add so feel free to help out there as well.)

Note: Same question at biostars: https://www.biostars.org/p/98112/

Colton

3 Answers

4

I'm no expert in computational biology, but I am very interested in it and do some big data analysis in R for my own projects, so I will try to provide some information.

This book (http://www.ncbi.nlm.nih.gov/books/NBK25461/) is an excellent overview of the grand challenges in computational biology and its emerging fields, so I recommend at least looking through the list on that page if you don't want to read the entire book. To me, all of this points to the fact that we are in the big data era: we have masses of data, but making sense of it, putting it into perspective, and analysing it in a way that yields new insights is difficult.

You might also be interested in BOINC, which runs a number of computationally challenging projects through crowdsourcing/grid computing (http://boinc.berkeley.edu/projects.php). On the more theoretical side, here is a list of unsolved problems in mathematics (http://en.wikipedia.org/wiki/List_of_unsolved_problems_in_mathematics). Solving some of these would advance our understanding of many modelling problems, which carry over into biological problems as well, although I'm not at all an expert in modelling biological systems.

Hope this helps!

Behzad Rowshanravan
3

De-novo assembly

To construct a genome from short reads, it's necessary to construct a graph of those reads. We do this by breaking the reads into k-mers and assembling them into a graph.

enter image description here

In this example, the k-mer length is 3. We can reconstruct the genome by visiting each node exactly once, as in the diagram. Such a route is known as a Hamiltonian path.

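Unfortunately, finding such a path is NP-hard: no efficient (polynomial-time) algorithm for it is known, and none exists unless P = NP. Instead, in bioinformatics we usually build a de Bruijn graph, in which each k-mer is an edge joining its (k-1)-mer prefix to its (k-1)-mer suffix, and look for an Eulerian path or cycle, which uses every edge exactly once and can be found efficiently.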

enter image description here
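As a rough illustration of the Eulerian approach, here is a minimal Python sketch. It assumes error-free reads, treats each distinct k-mer as one edge (so repeated regions are not handled), and assumes an Eulerian path exists. The function names (`kmers`, `de_bruijn`, `eulerian_path`, `assemble`) are made up for this example and do not come from any real assembler. The path is found with Hierholzer's algorithm.

```python
from collections import defaultdict


def kmers(read, k):
    """Return all substrings of length k in the read, in order."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn(reads, k):
    """Build a de Bruijn graph: each distinct k-mer is an edge from its
    (k-1)-mer prefix to its (k-1)-mer suffix.
    Assumption: k-mer multiplicity is ignored (no repeat handling)."""
    graph = defaultdict(list)
    for kmer in sorted({km for read in reads for km in kmers(read, k)}):
        graph[kmer[:-1]].append(kmer[1:])
    return graph


def eulerian_path(graph):
    """Hierholzer's algorithm; assumes an Eulerian path exists."""
    in_deg = defaultdict(int)
    for targets in graph.values():
        for v in targets:
            in_deg[v] += 1
    # Start at the node with one more outgoing than incoming edge, if any;
    # otherwise (Eulerian cycle) start at an arbitrary node.
    start = next(
        (u for u in sorted(graph) if len(graph[u]) - in_deg[u] == 1),
        min(graph),
    )
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: node is final in the path
    return path[::-1]


def assemble(reads, k):
    """Spell the sequence along the Eulerian path."""
    path = eulerian_path(de_bruijn(reads, k))
    return path[0] + "".join(node[-1] for node in path[1:])


print(assemble(["ATGGC", "GGCGT", "GCGTA"], 3))  # ATGGCGTA
```

Real genomes contain repeats, so there can be several Eulerian paths and the reconstruction is not unique in general. Real assemblers also have to deal with sequencing errors and uneven coverage, which this sketch ignores.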

SmallChess
1

This is more of a long comment than an answer, but I thought it might still interest you. We usually care about NP-hardness because it means that we can't solve some problem efficiently. But NP-hardness isn't the only kind of intractability that should matter to biologists.

If we re-frame NP problems as optimization problems (instead of the strict definition as decision problems) then what NP-hardness usually says is that we can't (in general and in reasonable time) find the global optimum. However, in a lot of evolutionary settings we wouldn't even care about a global optimum, we'd be happy with a local one.

The difficulty of finding local optima is captured by the complexity class known as Polynomial Local Search (PLS). And here we also have surprising complexity barriers. For example, Kauffman's NK-model of static fitness landscapes (also, see this blog post and SE question for an overview) is not only NP-hard [Weinberger, E. (1996), "NP-completeness of Kauffman's N-k model, a Tuneably Rugged Fitness Landscape", Santa Fe Institute Working Paper, 96-02-003] but also PLS-hard (Kaznatcheev, 2017; see here for an overview or here for the final paper). This means that not only can arbitrary evolutionary dynamics not find global optima in reasonable time, but in general they can't even always find local optima on these finite (but exponentially large) landscapes. This suggests that local optimality might not be a reasonable assumption even on static landscapes.
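To make "finding a local optimum" concrete, here is a small Python sketch of an NK-style landscape with a greedy adaptive walk. Several things are assumptions made for illustration: each locus interacts with its k adjacent loci (with wrap-around), the fitness contributions are uniform random numbers, and the walk always moves to the fittest one-mutation neighbour. The names `make_nk_landscape`, `neighbours`, and `adaptive_walk` are hypothetical. This toy does not demonstrate the hardness result; on small random instances the walk stops after a few steps. The PLS-hardness statement concerns how badly the number of steps can grow on worst-case landscapes.

```python
import itertools
import random


def make_nk_landscape(n, k, seed=0):
    """Return a fitness function on binary genotypes of length n.
    Assumption: locus i interacts with the k loci that follow it
    (cyclically), and contributions are drawn uniformly from [0, 1)."""
    rng = random.Random(seed)
    tables = [
        {bits: rng.random() for bits in itertools.product((0, 1), repeat=k + 1)}
        for _ in range(n)
    ]

    def fitness(genotype):
        total = 0.0
        for i in range(n):
            key = tuple(genotype[(i + j) % n] for j in range(k + 1))
            total += tables[i][key]
        return total / n

    return fitness


def neighbours(genotype):
    """Yield every genotype that differs by a single point mutation."""
    for i in range(len(genotype)):
        yield genotype[:i] + (1 - genotype[i],) + genotype[i + 1:]


def adaptive_walk(fitness, start):
    """Move to the fittest neighbour until no neighbour is strictly fitter.
    Returns the local optimum reached and the number of steps taken."""
    current, steps = start, 0
    while True:
        best = max(neighbours(current), key=fitness)
        if fitness(best) <= fitness(current):
            return current, steps
        current, steps = best, steps + 1


fitness = make_nk_landscape(n=10, k=2, seed=1)
optimum, steps = adaptive_walk(fitness, start=(0,) * 10)
print(optimum, steps, round(fitness(optimum), 3))
```

The walk always terminates because fitness strictly increases at every step and there are only 2^n genotypes. The point at which it stops is a local optimum, which need not be the global one.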

Artem Kaznatcheev