Re: New paper on Neoaves
You and David hold to the assumption that homoplasy is random, and can be
overcome simply by enlarging the dataset or expanding the taxon sample.
That was a late-night oversight on my part. If some of the taxa in a data
matrix have a base-composition bias, an evolution model estimated from the
data can clearly lead to a wrong result.
Adding taxa seems to be a good idea in general, however.
Shannon M. Hedtke, Ted M. Townsend & David M. Hillis: Resolution of
Phylogenetic Conflict in Large Data Sets by Increased Taxon Sampling,
Systematic Biology 55(3), 522 -- 529 (June 2006)
No abstract, so I'll retype the conclusions:
"No particular number of genes or taxa will guarantee that phylogenetic
reconstruction is accurate, even if bootstrap support for that
reconstruction is high. If conflicting signals between genes are due to
method inconsistency, adding more genes may lead to increasing support for
the incorrect phylogenetic reconstruction. [That's the definition of "method
inconsistency".] In such cases, increasing taxon representation may improve
accuracy more than does increasing gene number. If we incorporate our
understanding of sources of inconsistency into study design, resulting
phylogenies are more likely to be representative of evolutionary history.
For any given study, how can an investigator know whether it is better
to add more characters or add more taxa to a phylogenetic analysis? High
support values for individual clades indicate that sufficient characters
have been collected to converge on a robust result. Unfortunately, the
well-supported result may be wrong, particularly if small trees with long
branches are being estimated. This outcome appears to be especially likely
when intensively sampled genomes have been selected across relatively few,
distantly related species -- as with model organisms. In such cases, any
slight systematic bias can become magnified and misinterpreted as
phylogenetic signal. High bootstrap or other support values are almost
guaranteed with genome-sized character sets: the analyses will tend to
converge on some answer, even if the answer has more to do with biases in
the analysis than phylogenetic history. Therefore, it is important to
investigate possible sources of systematic bias, such as long-branch
attraction or model misspecification. Simulation studies can help determine
the likelihood of long-branch attraction problems in these situations and
suggest where additional taxon sampling should occur."
Here's one case that highlights the problem. Naylor and Brown (1998) used
over 12,000 bases (from 19 mitochondrial genes) and recovered 100% bootstrap
support for a clade that comprised vertebrates and echinoderms to the
exclusion of amphioxus (lancelet). This topology was recovered regardless
of the method of analysis.
Since this was 1998 (you know, when some people honestly believed Rodentia
was paraphyletic to the rest of Placentalia, and Passeriformes the
sister-group to all other extant birds), I'll simply blame the model, and
perhaps the fact that Bayesian analysis was not yet available.
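To make the point about bootstrap values concrete, here is a toy simulation in Python. It is only a sketch under loudly stated assumptions, not a real phylogenetic analysis: each character simply "votes" for one of three possible four-taxon trees, tree A is assumed to be the true one, and a slight systematic bias is assumed to make tree B the most frequently supported. The probabilities, the seed, and the function names are all made up for illustration.

```python
import random
from collections import Counter

def simulate_characters(n, probs, rng):
    """Draw n characters; each one supports tree 'A', 'B' or 'C'."""
    return rng.choices("ABC", weights=probs, k=n)

def bootstrap_support(chars, target, replicates, rng):
    """Fraction of bootstrap replicates in which `target` wins the vote."""
    n = len(chars)
    wins = 0
    for _ in range(replicates):
        # Resample the characters with replacement, as in a bootstrap.
        counts = Counter(rng.choices(chars, k=n))
        best = max("ABC", key=lambda tree: counts[tree])
        if best == target:
            wins += 1
    return wins / replicates

rng = random.Random(42)  # fixed seed, chosen arbitrarily

# Hypothetical per-character probabilities: A is the true tree,
# but an assumed systematic bias slightly favours the wrong tree B.
probs = [0.32, 0.38, 0.30]

small = simulate_characters(50, probs, rng)
large = simulate_characters(20000, probs, rng)

support_small = bootstrap_support(small, "B", 100, rng)
support_large = bootstrap_support(large, "B", 100, rng)

print("support for wrong tree B,     50 characters:", support_small)
print("support for wrong tree B, 20,000 characters:", support_large)
```

With only 50 characters the support for B depends heavily on sampling noise, but with 20,000 characters it should come out at or very near 100%, even though B is wrong by construction. Adding characters only sharpens whatever the bias prefers, which is the behaviour the quoted conclusions warn about.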
But the thing that is obvious about *morphology*-based phylogenetic
analyses is that they are almost always followed by a discussion of which
morphological characters (synapomorphies) unite which taxa. In other
words, it's plain to see the identity of the characters that diagnose
certain clades. This rarely happens with *molecular* clades. Here, the
characters are at the level of genes and amino acids, and the structural
and functional properties of the sequences are skimmed over.
The reason is obvious: molecular characters all look the same and are thus
very boring, with the few exceptions of those that can be treated like
morphological ones (e.g. the unique 9-bp deletion in BRCA1 that is an
autapomorphy of Afrotheria).
BTW, the "Ontogeny Discombobulates Phylogeny" paper (Wiens et al., Syst.
Biol., February 2005) has a couple of clades that have high Bayesian
posterior probabilities and are nevertheless clearly spurious. However,
these are based on morphological data of paedomorphic salamanders (miscoded
as if they were metamorphosed adults).