
A "new" phylogeny & methodology



Is it just me, or do other people find this endless hunt for ever more
"characters" just plain CRAZY? The computer will, of course, grind out trees
as long as you give it "characters" to chew on, but do you really BELIEVE in
the resulting phylogenies, or are they just so much nonsense, to be discarded
when the next batch of computer-generated trees, crunching even more
"characters," emerges? If so, why not hold out until you've found ALL
possible "characters," grind out the one big tree and be done with it?

As I see it (perhaps incorrectly), one could imagine a theoretical matrix of thousands, perhaps tens of thousands, of characters, resulting from an incredibly detailed analysis of thousands of specimens. Would this tell us anything useful? My guess would be: no, not really -- it would be loaded with "noise." In such an analysis, one could, for example, score every percentage point in the relative length of a limb bone as a separate state -- something like "Ratio of femur/metatarsus length: 0 = 50%, 1 = 49%, 2 = 48%...", etc. -- but surely differences this small are well within the range of individual variation and amount to statistical noise. And that is the heart of the problem of "too many characters" -- once you have populations of specimens large enough, and if you peer at them long enough, you're going to start seeing differences that are due solely to individual variation (or, on a slightly larger scale, clinal variation). A cladistic-type analysis can't tell you that; it will still produce a branching diagram placing all the specimens into some kind of order when, in reality, no such order exists. (Although I can see why it would be a useful tool, this is the biggest danger inherent in doing phylogenetic analyses with individual specimens instead of species or other taxa.)
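To make the noise problem concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the population mean (0.50), the spread (0.02), the sample size, and the function name score_states are my own assumptions, not measurements from any real taxon. It draws one homogeneous "population" of femur/metatarsus ratios and scores it twice, once in 1% bins and once in very coarse bins.

```python
import math
import random
from collections import Counter

def score_states(ratios, bin_width):
    """Score each ratio as the index of the bin (of width bin_width) it falls in."""
    return [math.floor(r / bin_width) for r in ratios]

# Hypothetical single population: one mean ratio plus individual variation.
rng = random.Random(0)
ratios = [rng.gauss(0.50, 0.02) for _ in range(200)]

fine = Counter(score_states(ratios, 0.01))    # one state per percentage point
coarse = Counter(score_states(ratios, 0.20))  # very broad states

print("distinct states with 1% bins: ", len(fine))
print("distinct states with 20% bins:", len(coarse))
```

Under these assumptions the 1% scoring splits a single population into many "states," even though nothing but individual variation separates the specimens. A tree-building program fed those states has no way of knowing that.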


At the other end of the spectrum, if one is too general with one's characters (or has too few), then the analysis can't make any sense out of real divergence patterns at all. I have some problems with analyses that say things like "Prezygapophyses on caudal vertebrae 0 = short; 1 = long." This doesn't tell anyone anything -- what is short??? what is long??? These terms need very explicit explanations (short, for example, may mean "<50% length of centrum" or "does not extend beyond cranial end of centrum"). If they're ambiguous, then they aren't scientific because they can't be reanalyzed and retested by other scientists -- other workers can't read the mind of the original author to determine what, in his/her mind, "long" and "short" are, and as we are all aware, one person's "short" is another person's "long!" An analysis with characters defined this vaguely, and with too few characters, is far more likely to result in erroneous and/or polytomous trees instead of something more sensible.
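As a rough illustration of what an explicit definition buys you, here is a small Python sketch in which "short" is pinned to the "<50% length of centrum" example above. The function name, the millimeter units, and the choice to score a ratio of exactly 50% as "long" are my own assumptions; the point is only that anyone with the same measurements gets the same score.

```python
def score_prezygapophysis(prezyg_length_mm, centrum_length_mm, threshold=0.5):
    """Return 0 ("short") if the prezygapophysis is less than `threshold`
    of centrum length, otherwise 1 ("long").

    The 0.5 default follows the "<50% length of centrum" example; treating
    exactly 50% as "long" is an arbitrary, hypothetical boundary rule.
    """
    if centrum_length_mm <= 0:
        raise ValueError("centrum length must be positive")
    ratio = prezyg_length_mm / centrum_length_mm
    return 0 if ratio < threshold else 1

# Hypothetical measurements, purely for illustration.
print(score_prezygapophysis(10.0, 40.0))  # ratio 0.25
print(score_prezygapophysis(30.0, 40.0))  # ratio 0.75
```

Whether 50% is the right cutoff is a separate argument, but at least it is an argument someone else can have, because the definition is written down.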

Lastly, one must be quite careful in separating character states -- not functional decouplings (which is a whole 'nother ball o' wax...), but in having what are better stated as single, multi-state characters distributed throughout a matrix as separate, binary characters. For example, some past analyses have used the ratio of astragalar ascending process length to tibia length more than once -- one character would be whether the process is 0 = < 1/6 the length of the tibia; 1 = > 1/6, and later we would see essentially the same character again, but with 0 = < 1/4 the length of the tibia; 1 = > 1/4. This is much more simply stated as a single character with states: 0 = < 1/6; 1 = 1/6-1/4; 2 = > 1/4. (Of course, putting in too many state possibilities gets back to the scenario outlined above!) This reduces the overall number of characters in an analysis, but retains all the information, and makes it easier to visualize the evolutionary possibilities under consideration. I believe these were split into separate characters in the past so that each could be discussed in the appropriate portion of a paper, along with the node or stem for which it is diagnostic, but that same information can be read fairly easily off the data matrix, so the split really isn't necessary.
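The recoding described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name is hypothetical, None stands in for a missing ("?") score, and the taxon labels and scores are invented. It takes the two binary scores (the 1/6 character and the 1/4 character) and returns the single three-state score.

```python
def merge_threshold_characters(low, high):
    """Merge two binary scores into one ordered three-state score.

    low  : 0 if process < 1/6 tibia length, 1 if > 1/6, None if unknown
    high : 0 if process < 1/4 tibia length, 1 if > 1/4, None if unknown
    Returns 0 (< 1/6), 1 (1/6-1/4), 2 (> 1/4), or None if undetermined.
    """
    for score in (low, high):
        if score not in (0, 1, None):
            raise ValueError("scores must be 0, 1, or None")
    if low == 0 and high == 1:
        # A process cannot be below 1/6 and above 1/4 at the same time.
        raise ValueError("inconsistent scores: < 1/6 but > 1/4")
    if high == 1:
        return 2          # above 1/4 implies above 1/6
    if low == 0:
        return 0          # below 1/6 implies below 1/4
    if low is None or high is None:
        return None       # not enough information to pick a state
    return 1              # above 1/6 but below 1/4

# Invented scores for hypothetical taxa: (1/6 character, 1/4 character).
matrix = {"Taxon A": (0, 0), "Taxon B": (1, 0), "Taxon C": (1, 1), "Taxon D": (1, None)}
merged = {taxon: merge_threshold_characters(*pair) for taxon, pair in matrix.items()}
print(merged)
```

Note that the merged version also catches something the two separate binary characters cannot: a taxon scored as "< 1/6" in one place and "> 1/4" in another is flagged as a scoring error instead of quietly going into the matrix.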

The problem here, of course, is that there are no hard-and-fast rules about what constitutes "too few" and what constitutes "too many" characters -- how do we know if we've got too few to perceive reality, and how do we know if we've got so many that reality is drowned out by the noise? More isn't necessarily better here (although the number of characters is certainly going to rise as more taxa are added -- an analysis of all members of the Sauropsida probably ought to have more characters than an analysis of the Alligatorinae!). As far as I can tell, we're still running on a sort of "common sense," intuition-based system for assessing whether the number of characters in an analysis is too few, enough, or too many. And, of course, no two people are really going to agree on where the lines should be drawn (which is why they haven't been drawn)...

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Jerry D. Harris
Dept of Earth & Environmental Science
University of Pennsylvania
240 S 33rd St
Philadelphia PA  19104-6316
Phone: (215) 898-5630
Fax: (215) 898-0964
E-mail: jdharris@sas.upenn.edu
and     dinogami@hotmail.com
http://www.sas.upenn.edu/~jdharris
