- Michael Elberfeld, Ilka Schnoor, Till Tantau:
Influence of Tree Topology Restrictions on the Complexity of Haplotyping with Missing Data.
Theoretical Computer Science, 432:38–51, 2012.
Show PDF | Go to website | Show abstract
Haplotyping, also known as haplotype phase prediction, is the problem of predicting likely haplotypes based on genotype data. One fast haplotyping method is based on an evolutionary model where a perfect phylogenetic tree is sought that explains the observed data. Unfortunately, when data entries are missing, which is often the case in laboratory data, the resulting formal problem IPPH, which stands for incomplete perfect phylogeny haplotyping, is NP-complete. Even radically simplified versions, such as the restriction to phylogenetic trees consisting of just two directed paths from a given root, are still NP-complete; but here, at least, a fixed-parameter algorithm is known. Such drastic and ad hoc simplifications turn out to be unnecessary to make IPPH tractable: we present the first theoretical analysis of a parametrized algorithm, which we develop in the course of the paper, that works for arbitrary instances of IPPH. This tractability result is optimal insofar as we prove IPPH to be NP-complete whenever any of the parameters we consider is not fixed, but part of the input.
- Michael Elberfeld, Till Tantau:
Phylogeny- and Parsimony-Based Haplotype Inference with Constraints.
Information and Computation, 213:33–47, 2012.
Show PDF | Go to website | Show abstract
Haplotyping, also known as haplotype phase prediction, is the problem of predicting likely haplotypes based on genotype data. One fast computational haplotyping method is based on an evolutionary model where a perfect phylogenetic tree is sought that explains the observed data. An extension of this approach tries to incorporate prior knowledge in the form of a set of candidate haplotypes from which the right haplotypes must be chosen. The objective is to increase the accuracy of haplotyping methods, but it was conjectured that the resulting formal problem constrained perfect phylogeny haplotyping might be NP-complete. In the paper at hand we present a polynomial-time algorithm for it. Our algorithmic ideas also yield new fixed-parameter algorithms for related haplotyping problems based on the maximum parsimony assumption.
- Michael Elberfeld, Ilka Schnoor, Till Tantau:
Influence of Tree Topology Restrictions on the Complexity of Haplotyping with Missing Data.
In Proceedings of the 6th Annual Conference on Theory and Applications of Models of Computation (TAMC 2009), Volume 5532 of Lecture Notes in Computer Science, pp. 201-210.
Springer,
2009.
Show PDF | Go to website | Show abstract
Haplotyping, also known as haplotype phase prediction, is the problem of predicting likely haplotypes from genotype data. One fast haplotyping method is based on an evolutionary model where a perfect phylogenetic tree is sought that explains the observed data. Unfortunately, when data entries are missing as is often the case in laboratory data, the resulting incomplete perfect phylogeny haplotyping problem IPPH is NP-complete and no theoretical results are known concerning its approximability, fixed-parameter tractability, or exact algorithms for it. Even radically simplified versions, such as the restriction to phylogenetic trees consisting of just two directed paths from a given root, are still NP-complete; but here a fixed-parameter algorithm is known. We show that such drastic and ad hoc simplifications are not necessary to make IPPH fixed-parameter tractable: We present the first theoretical analysis of an algorithm, which we develop in the course of the paper, that works for arbitrary instances of IPPH. On the negative side we show that restricting the topology of perfect phylogenies does not always reduce the computational complexity: while the incomplete directed perfect phylogeny problem is well-known to be solvable in polynomial time, we show that the same problem restricted to path topologies is NP-complete.
- Michael Elberfeld:
Perfect Phylogeny Haplotyping is Complete for Logspace.
Technical report abs/0905.0602 [cs.CC],
Computing Research Repository (CoRR),
2009.
Go to website | Show abstract
Haplotyping is the bioinformatics problem of predicting likely haplotypes based on given genotypes. It can be approached using Gusfield's perfect phylogeny haplotyping (PPH) method for which polynomial and linear time algorithms exist. These algorithm use sophisticated data structures or do a stepwise transformation of the genotype data into haplotype data and, therefore, need a linear amount of space. We are interested in the exact computational complexity of PPH and show that it can be solved space-efficiently by an algorithm that needs only a logarithmic amount of space. Together with the recently proved L-hardness of PPH, we establish L-completeness. Our algorithm relies on a new characterization for PPH in terms of bipartite graphs, which can be used both to decide and construct perfect phylogenies for genotypes efficiently.
- Jens Gramm, Tzvika Hartman, Till Nierhoff, Roded Sharan, Till Tantau:
On the complexity of SNP block partitioning under the perfect phylogeny model.
Discrete Mathematics, 309(18):5610-5617, 2009.
Go to website | Show abstract
ecent technologies for typing single nucleotide polymorphisms (SNPs) across a population are producing genome-wide genotype data for tens of thousands of SNP sites. The emergence of such large data sets underscores the importance of algorithms for large-scale haplotyping. Common haplotyping approaches first partition the SNPs into blocks of high linkage-disequilibrium, and then infer haplotypes for each block separately. We investigate an integrated haplotyping approach where a partition of the SNPs into a minimum number of non-contiguous subsets is sought, such that each subset can be haplotyped under the perfect phylogeny model. We show that finding an optimum partition is NP-hard even if we are guaranteed that two subsets suffice. On the positive side, we show that a variant of the problem, in which each subset is required to admit a perfect path phylogeny haplotyping, is solvable in polynomial time.
- Jens Gramm, Arfst Nickelsen, Till Tantau:
Fixed-Parameter Algorithms in Phylogenetics.
In Bioinformatics: Volume I: Data, Sequence Analysis and Evolution, Volume 452 of Methods in Molecular Biology, pp. 507-535.
Springer,
2008.
Show abstract
We survey the use of fixed-parameter algorithms in
phylogenetics. A central computational problem in
this field is the construction of a likely phylogeny
(genealogical tree) for a set of species based on
observed differences in the phenotype, on
differences in the genotype, or on given partial
phylogenies. Ideally, one would like to construct
so-called perfect phylogenies, which arise from an
elementary evolutionary model, but in practice one
must often be content with phylogenies whose
"distance from perfection" is as small as
possible. The computation of phylogenies also has
applications in seemingly unrelated areas such as
genomic sequencing and finding and understanding
genes. The numerous computational problems arising
in phylogenetics are often NP-complete, but for many
natural parametrizations they can be solved using
fixed-parameter algorithms.
- Michael Elberfeld, Till Tantau:
Computational Complexity of Perfect-Phylogeny-Related Haplotyping Problems.
Technical report SIIM-TR-A-08-02,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
2008.
Show PDF | Show abstract
Haplotyping, also known as haplotype phase prediction, is the
problem of predicting likely haplotypes based on genotype data. This
problem, which has strong practical applications, can be approached
using both statistical as well as combinatorial methods. While the
most direct combinatorial approach, maximum
parsimony, leads to NP-complete problems, the perfect phylogeny
model proposed by Gusfield yields a problem, called PPH, that can be
solved in polynomial (even linear) time. Even this may
not be fast enough when the whole genome is studied, leading to the
question of whether parallel algorithms can be used to solve the PPH
problem. In the present paper we answer this question affirmatively,
but we also give lower complexity bounds on its complexity. In
detail, we show that the problem lies in Mod$_2$L, a subclass of the
circuit complexity class NC$^2$, and is hard for
logarithmic space and thus presumably not in NC$^1$. We also
investigate variants of the PPH problem that have been studied in
the literature, like the perfect path phylogeny haplotyping problem
and the combined problem where a perfect phylogeny of maximal parsimony
is sought, and show that some of these variants are TC$^0$-complete or
lie in AC$^0$.
- Michael Elberfeld, Till Tantau:
Computational Complexity of Perfect-Phylogeny-Related Haplotyping Problems.
In Proceedings of the 33rd International Symposium on Mathematical Foundations of Computer Science (MFCS 2008), Volume 5162 of Lecture Notes in Computer Science, pp. 299-310.
Springer,
2008.
Go to website | Show abstract
Haplotyping, also known as haplotype phase prediction, is the
problem of predicting likely haplotypes based on genotype data. This
problem, which has strong practical applications, can be approached
using both statistical as well as combinatorial methods. While the
most direct combinatorial approach, maximum
parsimony, leads to NP-complete problems, the perfect phylogeny
model proposed by Gusfield yields a problem, called PPH, that can be
solved in polynomial (even linear) time. Even this may
not be fast enough when the whole genome is studied, leading to the
question of whether parallel algorithms can be used to solve the PPH
problem. In the present paper we answer this question affirmatively,
but we also give lower complexity bounds on its complexity. In
detail, we show that the problem lies in Mod$_2$L, a subclass of the
circuit complexity class NC$^2$, and is hard for
logarithmic space and thus presumably not in NC$^1$. We also
investigate variants of the PPH problem that have been studied in
the literature, like the perfect path phylogeny haplotyping problem
and the combined problem where a perfect phylogeny of maximal parsimony
is sought, and show that some of these variants are TC$^0$-complete or
lie in AC$^0$.
- Michael Elberfeld, Ilka Schnoor, Till Tantau:
Influence of Tree Topology Restrictions on the Complexity of Haplotyping with Missing Data.
Technical report SIIM-TR-A-08-05,
Schriftenreihe der Institute für Informatik/Mathematik der Universität zu Lübeck,
2008.
Show PDF | Show abstract
Haplotyping, also known as haplotype phase prediction, is the
problem of predicting likely haplotypes based on genotype data. One
fast haplotyping method is based on an evolutionary model where a
perfect phylogenetic tree is sought that explains the
observed data. Unfortunately, when data entries are missing, as is
often the case in real laboratory data, the resulting formal problem
IPPH, which stands for incomplete perfect phylogeny
haplotyping, is NP-complete and no theoretical results are known
concerning its approximability, fixed-parameter tractability or
exact algorithms for it. Even radically simplified versions, such as
the restriction to phylogenetic trees consisting of just two directed
paths from a given root, are still NP-complete, but here, at least,
a fixed-parameter algorithm is known. We generalize this algorithm
to arbitrary tree topologies and present the first
theoretical analysis of an algorithm that works on arbitrary
instances of the original IPPH problem. At the same time we
also show that restricting the tree topology does not always make
finding phylogenies easier: while the incomplete directed perfect phylogeny
problem is well-known to be solvable in polynomial time, we show
that the same problem restricted to path topologies is NP-complete.
- Jens Gramm, Arfst Nickelsen, Till Tantau:
Fixed-Parameter Algorithms in Phylogenetics.
The Computer Journal, 51(1):79--101, 2008.
Show PDF | Show abstract
We survey the use of fixed-parameter algorithms in
the field of phylogenetics, which is the study of
evolutionary relationships. The central problem in
phylogenetics is the reconstruction of the
evolutionary history of biological species, but its
methods also apply to linguistics, philology, or
architecture. A basic computational problem is the
reconstruction of a likely phylogeny (genealogical
tree) for a set of species based on observed
differences in the phenotype like color or form of
limbs, based on differences in the genotype like
mutated nucleotide positions in the DNA sequence, or
based on given partial phylogenies. Ideally, one
would like to construct so-called perfect
phylogenies, which arise from a very simple
evolutionary model, but in practice one must often
be content with phylogenies whose ``distance from
perfection'' is as small as possible. The
computation of phylogenies has applications in
seemingly unrelated areas such as genomic sequencing
and finding and understanding genes. The numerous
computational problems arising in phylogenetics
often are NP-complete, but for many natural
parametrizations they can be solved using
fixed-parameter algorithms.