A "new" phylogeny & methodology
Is it just me, or do other people find this endless hunt for ever more
"characters" just plain CRAZY? The computer will, of course, grind out trees
as long as you give it "characters" to chew on, but do you really BELIEVE in
the resulting phylogenies, or are they just so much nonsense, to be discarded
when the next batch of computer-generated trees, crunching even more
"characters," emerges? If so, why not hold out until you've found ALL
possible "characters," grind out the one big tree and be done with it?
As I see it (perhaps incorrectly), one could imagine a theoretical matrix
of thousands, perhaps tens of thousands, of characters, resulting from an
incredibly detailed analysis of thousands of specimens. Would this tell us
anything useful? My guess would be: no, not really -- it would be loaded
with "noise." Using such an analysis, one could, for example, cite every
percent grade in the length of a limb bone as a separate state -- something
like "Ratio of femur/metatarsus length 0 = 50%, 1 = 49%, 2 = 48%...", etc.,
but surely this kind of thing is well within the range of individual
variation and is statistical noise. And that is the heart of the problem of
"too many characters" -- once you have populations of specimens large enough
and if you peer at them long enough, you're going to start seeing
differences that are due solely to individual variation (or, on a slightly
larger scale, clinal variation), but a cladistic-type analysis can't tell
you that; it will still produce a branching graphic placing all the
specimens into some kind of order when, in reality, no such order exists.
(Although I can see why it would be a useful tool, this is the biggest
danger inherent in doing phylogenetic analyses with individual specimens
instead of species or other taxa.)
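To make the "noise" point concrete, here is a minimal sketch in Python. The ratios are invented numbers for five hypothetical individuals of a single population, and `score_ratio` and `bin_width` are names made up for this illustration, not anything from a real analysis package. It only shows that one-percent states split conspecific individuals into several "states," while a coarser bin does not.

```python
def score_ratio(ratio_percent, bin_width):
    """Return a state index: the number of whole bins below the ratio."""
    return int(ratio_percent // bin_width)

# Hypothetical femur/metatarsus ratios (percent) for five individuals
# of ONE population -- invented values, for illustration only.
individuals = [45.6, 46.3, 47.1, 47.8, 48.4]

# One state per percent: individual variation is scored as "difference".
fine_states = {score_ratio(r, 1.0) for r in individuals}

# One state per ten percent: the same individuals share a single state.
coarse_states = {score_ratio(r, 10.0) for r in individuals}

print(len(fine_states), "states at 1% bins;",
      len(coarse_states), "state at 10% bins")
```

Any tree-building program fed the one-percent scores would dutifully resolve these five individuals, even though nothing separates them but ordinary variation.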
At the other end of the spectrum, if one is too general with one's
characters (or has too few), then the analysis can't make any sense out of
real divergence patterns at all. I have some problems with analyses that
say things like "Prezygapophyses on caudal vertebrae 0 = short; 1 = long."
This doesn't tell anyone anything -- what is short??? what is long??? These
terms need very explicit explanations (short, for example, may mean "<50%
length of centrum" or "does not extend beyond cranial end of centrum"). If
they're ambiguous, then they aren't scientific because they can't be
reanalyzed and retested by other scientists -- they can't read the mind of
the original author to determine what, in his/her mind, "long" and "short"
are, and as we are all aware, one person's "short" is another person's
"long!" An analysis with characters thus defined, and with too few
characters, is far more likely to result in erroneous and/or polytomic trees
instead of something more sensible.
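As a rough sketch of what an explicit definition buys you, the Python below scores the prezygapophysis character using the "<50% length of centrum" wording suggested above. The function name, the millimeter measurements, and the treatment of exactly 50% as "long" are all assumptions made for this example; the point is only that a second worker with the same measurements gets the same score.

```python
def score_prezygapophysis(prezyg_length_mm, centrum_length_mm, threshold=0.5):
    """Score 0 (short) if the prezygapophysis is less than `threshold`
    times centrum length, otherwise 1 (long).

    Assumption for this sketch: a ratio exactly at the threshold is "long".
    """
    if centrum_length_mm <= 0:
        raise ValueError("centrum length must be positive")
    ratio = prezyg_length_mm / centrum_length_mm
    return 0 if ratio < threshold else 1

# Hypothetical measurements, in millimeters.
print(score_prezygapophysis(20.0, 50.0))  # ratio 0.4
print(score_prezygapophysis(30.0, 50.0))  # ratio 0.6
```

With the threshold written down, disagreement shifts from "what did the author mean by short?" to "is 50% the right cutoff?", which is at least something that can be argued and retested.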
Lastly, one must be quite careful in separating character states -- I don't
mean functional decouplings (which is a whole 'nother ball o' wax...), but
splitting what is better stated as a single, multi-state character into
separate binary characters scattered throughout a matrix. For example, some
analyses of the past have cited the astragalar ascending process:tibia
length ratio frequently -- one character would be if the process is 0 = <
1/6 the length of the tibia; 1 = > 1/6, and later we would see the same
character again, but with 0 = < 1/4 the length of the tibia; 1 = >
1/4. This is much more simply stated as a single character with states: 0 =
< 1/6; 1 = 1/6-1/4; 2 = > 1/4. (Of course, putting in too many state
possibilities gets back to the scenario outlined above!) This reduces the
overall number of characters in an analysis, but retains all the
information, and makes it easier to visualize the evolutionary
possibilities under consideration. I believe that these were split into
separate characters in the past so that each could be discussed in
the appropriate portion of a paper along with the node or stem for which it
is diagnostic, but that same information can be read fairly easily off the
data matrix, so the split really isn't necessary.
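The recoding of the astragalar ascending process example can be sketched in a few lines of Python. The function names are invented for this illustration, and the handling of ratios exactly equal to 1/6 or 1/4 is my assumption, since the character definitions as usually written leave those boundaries open. The sketch shows that the two binary characters and the single three-state character carry the same information.

```python
def score_binary_pair(ratio):
    """Score the ratio as two separate binary characters.

    Assumed boundary handling: exactly 1/6 scores 1 on the first
    character; exactly 1/4 scores 0 on the second.
    """
    first = 0 if ratio < 1 / 6 else 1    # 0 = < 1/6; 1 = >= 1/6
    second = 0 if ratio <= 1 / 4 else 1  # 0 = <= 1/4; 1 = > 1/4
    return first, second

def score_multistate(ratio):
    """Score the same ratio as one character with three states."""
    if ratio < 1 / 6:
        return 0  # < 1/6
    if ratio <= 1 / 4:
        return 1  # 1/6 to 1/4
    return 2      # > 1/4

# Hypothetical process:tibia length ratios.
for ratio in (0.10, 0.20, 0.30):
    print(ratio, score_binary_pair(ratio), score_multistate(ratio))
```

Only three of the four possible binary combinations can ever occur (a process cannot be both shorter than 1/6 and longer than 1/4 of the tibia), which is another way of seeing that the pair is really one character.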
The problem here, of course, is that there are no hard-and-fast rules
about what constitutes "too few" and what constitutes "too many" characters
-- how do we know if we've got too few to perceive reality, and how do we
know if we've got so many that reality is drowned out by the noise? More
isn't necessarily better here (although the number of characters is
certainly going to rise as more taxa are added -- an analysis of all members
of the Sauropsida probably ought to have more characters than an analysis of
the Alligatorinae!) As far as I can tell, we're still running on a sort of
"common sense," intuition-based system for assessing whether or not the
number of characters in an analysis is too few, enough, or too many. And,
of course, no two people are really going to agree on where the lines should
be drawn (which is why they haven't been drawn)...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jerry D. Harris
Dept of Earth & Environmental Science
University of Pennsylvania
240 S 33rd St
Philadelphia PA 19104-6316
Phone: (215) 898-5630
Fax: (215) 898-0964
E-mail: jdharris@sas.upenn.edu
and dinogami@hotmail.com
http://www.sas.upenn.edu/~jdharris