
Cladostuffola



I think there is a tremendous amount of confusion out there
about how science really works here, and I'm going to try to
clear this up.

Finding trees is only one aspect of doing a phylogenetic
analysis. The procedures for finding the trees are themselves
optimization procedures given the data at hand, just as
running multivariate statistics on morphometric data is an
optimization procedure. Even doing a t-test is a procedure,
and that's all it is. The trick is in developing hypotheses
around the context a researcher is working on and analyzing
patterns to see whether the hypothesis is supported or not
under the chosen optimization procedures. The optimization
changes given the number of characters, which characters, how
they are coded, how the OTUs are lumped, etc., so the trees
are run based on this starting set of assumptions. There is
more power in trying lots of different sets of data and
evolving bigger and better matrices. But, if done right, the
optimization procedure allows the evaluation of ideas. So, I
suggest Stegoceras is a sister taxon of Tyrannosaurus. I code
457 characters and extract a bunch of trees. I see it only
takes 321 reversals to favor my hypothesis, so I am happy
with it. Except that people doing randomization tests can
show that this is a zillion times worse than rearranging my
characters at random and re-running, which suggests that I am
full of crap as far as that hypothesis goes. Cool.
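
To make that randomization idea concrete, here is a minimal
sketch in Python of scoring a hypothesized tree against a
character matrix with Fitch parsimony, then re-scoring it
against matrices whose character states have been shuffled
among the taxa. The taxa, characters, and topology are all
invented for illustration; a real randomization test (a
PTP-style permutation test, say) would search for the
shortest tree on each permuted matrix rather than re-scoring
one fixed tree.

import random

# Hypothetical toy matrix: five taxa, five binary characters (all invented).
matrix = {
    "TaxonA": "00110",
    "TaxonB": "00111",
    "TaxonC": "01011",
    "TaxonD": "11001",
    "TaxonE": "11000",
}

# A fixed rooted topology for the hypothesis, written as nested tuples.
tree = (("TaxonA", "TaxonB"), ("TaxonC", ("TaxonD", "TaxonE")))

def fitch(node, i, data):
    """Fitch parsimony for character i: returns (state set, steps) for this subtree."""
    if isinstance(node, str):                 # leaf: its observed state, zero steps
        return {data[node][i]}, 0
    lset, lsteps = fitch(node[0], i, data)
    rset, rsteps = fitch(node[1], i, data)
    if lset & rset:                           # children agree: no extra step
        return lset & rset, lsteps + rsteps
    return lset | rset, lsteps + rsteps + 1   # conflict: one more step

def tree_length(node, data):
    nchars = len(next(iter(data.values())))
    return sum(fitch(node, i, data)[1] for i in range(nchars))

observed = tree_length(tree, matrix)

# Shuffle each character's states across the taxa and re-score the same tree.
taxa = list(matrix)
null_lengths = []
for _ in range(999):
    permuted = {t: "" for t in taxa}
    for i in range(len(matrix[taxa[0]])):
        column = [matrix[t][i] for t in taxa]
        random.shuffle(column)
        for t, state in zip(taxa, column):
            permuted[t] += state
    null_lengths.append(tree_length(tree, permuted))

as_good = sum(1 for length in null_lengths if length <= observed)
print(f"observed length {observed}; {as_good}/999 permutations as short or shorter")

If the observed length is not conspicuously shorter than the
permuted ones, the matrix isn't telling you much about that
grouping - which is exactly the "full of crap" verdict above.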

Now remember, science is a dialectic - it evolves by using
data and analyses to show where the research should proceed,
and this feedback leads to more data collection and
modification of the procedures used. Waiting until everything
is described before running analyses, as was suggested, is
absurd. First of all, without interim analyses you'd never
know when you were done, and we'll never really be done
anyway. All research works this way, and the interim papers
by Tom and Chris and Sereno are intended to show where the
interesting patterns are popping up and what changes in the
program of research (not computer program) need to be made.
A tree that changes a lot in some aspects as different
researchers do work at varying times sends up red flags as to
where there are problems; those parts can then be studied
further to see what is causing the instability and how it can
be stabilized. Other sections stay pretty much the same, and
that robustness tells us a lot as well, especially if the
character sets are very different.
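
One way to spot those stable and unstable sections is simply
to tally how often each clade shows up across the trees from
different analyses. A minimal sketch, again in Python, with
invented taxa and topologies:

from collections import Counter

def clades(node, found=None):
    """Collect the leaf set of every internal node in a nested-tuple tree."""
    if found is None:
        found = []
    if isinstance(node, str):
        return {node}, found
    leaves = set()
    for child in node:
        child_leaves, _ = clades(child, found)
        leaves |= child_leaves
    found.append(frozenset(leaves))           # the root's full leaf set is included too
    return leaves, found

# Three hypothetical trees from three different analyses (all invented).
trees = [
    (("TaxonA", "TaxonB"), ("TaxonC", ("TaxonD", "TaxonE"))),
    (("TaxonA", "TaxonB"), (("TaxonC", "TaxonD"), "TaxonE")),
    ((("TaxonA", "TaxonB"), "TaxonC"), ("TaxonD", "TaxonE")),
]

counts = Counter()
for t in trees:
    _, found = clades(t)
    counts.update(set(found))                 # count each clade once per tree

for clade, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    flag = "stable" if n == len(trees) else "red flag"
    print(f"{sorted(clade)}: {n}/{len(trees)} trees ({flag})")

Clades that turn up in every tree are the robust parts;
clades that come and go are the ones worth going back to the
specimens for.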

So the testing is not done by just running trees one time.
The testing is done by assembling character suites and other
information, making predictions about relationships based on
the data, and testing those predictions by doing the tree
extraction; the resulting trees are then further analyzed
using a wide variety of different procedures. The results are
decoded and digested, and new hypotheses are developed that
are hopefully better. These hypotheses can then be further
tested/refuted with alternative data sets, and that's what
keeps the process moving - looking for congruence from many
directions really helps us see stability and robustness. If
foot data show one tree and everything else strongly
supports a whole bunch of related but different trees, then
that suggests the foot data need to be looked at very
critically, or that certain processes are working differently
than we expect.

I have been harping on these huge character matrices for
years, not because bigger isn't better, but because we need
to develop a means for the detailed evaluation of each
character so that we have confidence in the quality of the
input. Not a fault of any of the guys doing the work - I
truly believe that Sereno and Forster and Holtz and Brochu,
etc., probably do a great job. The morphometrician in me
just wants to see detailed analyses of the ratios and other
characters. It might be an interesting exercise to try to
develop a consistent set of characters for use by everyone -
not that it couldn't evolve, by any means - so that not
everyone has a different set with many subtle equivalents of
others' characters. We just need a means - perhaps on the web
for each group - where researchers agree (agreement on the
end result not necessary) that these 243 characters (or
whatever) are to be coded in this way, perhaps with a few
variants. New papers can then go into detail on an additional
set of 35 characters and run the data; the 35 are then
discussed, and those that make sense get added to the 243.
Then we would all know which characters are simple
heterochronic shifts, which are functionally integrated,
etc. But wegottadosomething so we can optimize (another
optimization procedure!) what we get out of the hard work
that these guys are doing. Further, I think people grossly
overstate how different the results are; there are lots of
strong trends, and more and more parts get stabilized.
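
The mechanics of such a shared list could be dead simple.
Here is a minimal sketch, assuming a plain Python dictionary
as the agreed registry and a proposal step that only flags
literal duplicates by name (all character names below are
invented; spotting the subtle equivalents is the human
discussion, not the code):

# Agreed, numbered character list (stand-in for "the 243"; entries invented).
agreed = {
    1: {"name": "premaxillary tooth count", "states": {0: "more than four", 1: "four or fewer"}},
    2: {"name": "obturator process on ischium", "states": {0: "absent", 1: "present"}},
}

# New characters proposed by a paper (stand-in for "the 35"; entries invented).
proposed = [
    {"name": "pedal digit I length relative to II", "states": {0: ">= half", 1: "< half"}},
    {"name": "premaxillary tooth count", "states": {0: "more than four", 1: "four or fewer"}},
]

def merge_proposals(agreed, proposed):
    """Add proposals whose names are not already in the agreed list; flag the rest."""
    existing = {c["name"] for c in agreed.values()}
    next_id = max(agreed) + 1
    merged = dict(agreed)
    flagged = []
    for char in proposed:
        if char["name"] in existing:
            flagged.append(char["name"])   # candidate duplicate - discuss, don't add
        else:
            merged[next_id] = char
            next_id += 1
    return merged, flagged

merged, flagged = merge_proposals(agreed, proposed)
print(f"{len(merged)} characters after merge; flagged for discussion: {flagged}")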

Rant completed! Back to SVP crapola

Ralph Chapman