On Fri, Sep 16, 2011 at 2:54 PM, Erik Boehm <erikboehm07@yahoo.com> wrote:
And when we find that extant organisms use more than 1 of the 4 possible
codon sequences to encode Serine at that position?
Align the sequences and put them on a tree.
Let's take an example of humans and chimps as the extant organisms.
In more than one case (such as hemoglobin), humans and chimps have the
exact same amino acid sequence, but a slightly different DNA sequence to
encode that protein.
Suppose we find a fossil ape that is just outside the Human-Chimp clade.
Which DNA sequence do we use?
The reconstructed common ancestral one. We do it with humans: SNPs
and other intrapopulation variation are taken into account.
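To illustrate how an outgroup settles a human/chimp difference at a
single site, here is a minimal Fitch parsimony sketch in Python. The
tree shape ((human, chimp), gorilla), the bases T/C/C and all function
names are assumptions made up for the example, not real data. Ties are
broken alphabetically, which is an arbitrary choice.

```python
def fitch_sets(node, tips, sets):
    """Bottom-up pass: compute the candidate state set for every node."""
    if isinstance(node, str):              # a tip: its observed base
        sets[node] = {tips[node]}
        return sets[node]
    left, right = node                     # internal node: (left, right)
    a = fitch_sets(left, tips, sets)
    b = fitch_sets(right, tips, sets)
    # Intersection if the children agree on something, otherwise union.
    sets[node] = (a & b) or (a | b)
    return sets[node]


def fitch_assign(node, sets, parent_state, states):
    """Top-down pass: keep the parent's state when it is allowed."""
    allowed = sets[node]
    # Arbitrary tie-break (alphabetical) when the parent state is not allowed.
    state = parent_state if parent_state in allowed else sorted(allowed)[0]
    states[node] = state
    if not isinstance(node, str):
        for child in node:
            fitch_assign(child, sets, state, states)


def reconstruct(tree, tips):
    """Return one most-parsimonious state for every node in the tree."""
    sets, states = {}, {}
    fitch_sets(tree, tips, sets)
    fitch_assign(tree, sets, None, states)
    return states


# Hypothetical third-codon-position bases, for illustration only.
tree = (("human", "chimp"), "gorilla")
tips = {"human": "T", "chimp": "C", "gorilla": "C"}
states = reconstruct(tree, tips)
print(states[("human", "chimp")])
```

With these made-up inputs the human-chimp ancestor is assigned C: the
gorilla agrees with the chimp, so one change on the human branch is the
cheapest explanation.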
What if it is on the human branch, but very basal? Do we assume, in
every case where there is ambiguity between humans and chimps, that we
use the human version? Sure, you could compare to gorillas, but what of
the cases where the protein sequence is different (as the amino acid
identity with them is not 100%)? If we find consensus between the chimp
and gorilla sequences, we conclude the chimp sequence is basal, but for
this basal member of the human branch... we still don't know when to go
with the human encoding for a particular aa, or the chimp encoding.
We could use maximum likelihood, for example.
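A rough sketch of what that looks like for one site, assuming the
Jukes-Cantor (JC69) model and three lineages radiating from the
ancestral node. Because JC69 is time-reversible, the gorilla branch can
be treated as one path running from the human-chimp ancestor through
the root. The branch lengths (in expected substitutions per site), the
bases and the function names are all hypothetical.

```python
import math

BASES = "ACGT"


def jc69_prob(a, b, t):
    """JC69 probability of observing base b after time t, starting from a."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e


def ancestor_posterior(observed):
    """Marginal posterior over the base at an ancestral node.

    observed: list of (base, branch_length) pairs, one per lineage
    leaving the node. Equal base frequencies (0.25) act as the prior.
    """
    weights = {}
    for x in BASES:
        w = 0.25
        for base, t in observed:
            w *= jc69_prob(x, base, t)
        weights[x] = w
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


# Hypothetical data: human T, chimp C, gorilla C (via the root).
post = ancestor_posterior([("T", 0.006), ("C", 0.006), ("C", 0.02)])
best = max(post, key=post.get)
print(best, post)
```

With these invented numbers C comes out as the most probable ancestral
base, while T keeps a small but nonzero probability. That is the point:
the reconstruction is a weighted inference with stated uncertainty.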
You simply cannot reconstruct the DNA sequence that made the amino acid
sequence with any certainty. You will be reduced to arbitrary guessing.
"cannot with *any* certainty" is an exaggeration. Such reconstructions
are made all the time, whenever the ancestral state is inferred from
extant sequences.
Every single codon assignment is going to involve some level of guessing
(unless it is methionine in a vertebrate).
That is true. Actually, even when we find a methionine it will involve
some level of guessing. It is just that it will not be random guessing.
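To put numbers on how much codon-level ambiguity each amino acid
carries, here is a small Python sketch that counts synonymous codons
under the standard nuclear genetic code. The 64-letter string is the
usual compact encoding of that table in T, C, A, G order. The variable
and function names are mine, and other genetic codes (mitochondrial
ones, for example) would give different counts.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# Number of codons that encode each amino acid ("*" marks a stop codon).
degeneracy = Counter(CODON_TABLE.values())


def codons_for(aa):
    """All codons that encode the given one-letter amino acid."""
    return sorted(c for c, a in CODON_TABLE.items() if a == aa)


print(degeneracy["M"], codons_for("M"))
print(degeneracy["S"], codons_for("S"))
```

Under the standard code methionine has a single codon (ATG), while
serine has six: the four TCN codons plus AGT and AGC.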
Even when consensus sequences exist, you still find many variations from
species to species (SNPs)....
Not only