On Fri, Sep 16, 2011 at 2:54 PM, Erik Boehm <erikboehm07@yahoo.com> wrote:
And when we find that extant organisms use more than one of the six possible
codons that encode serine at that position?
Align the sequences and put them on a tree.
Let's take humans and chimps as an example of extant organisms.
In more than one case (such as hemoglobin), humans and chimps have the
exact same amino acid sequence, but a slightly different DNA sequence
encoding that protein.
Suppose we find a fossil ape that is just outside the Human-Chimp clade.
Which DNA sequence do we use?
The reconstructed common ancestral one. We already do this within humans:
SNPs and other intrapopulation variation are taken into account.
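One standard way to formalize "reconstruct the common ancestor, using an outgroup" is Fitch parsimony (a different technique from the maximum likelihood approach mentioned later in this thread). The sketch below assumes the tree ((human, chimp), gorilla) and uses made-up serine codons (TCT in humans, TCC in chimps and gorillas) purely for illustration; the function names are hypothetical. Ties are broken alphabetically, which is exactly the arbitrary step that arises when the outgroup does not settle the question.

```python
from typing import Dict, Tuple, Union

Tree = Union[str, tuple]  # a leaf name, or a (left, right) pair of subtrees


def fitch(tree: Tree, leaf_base: Dict[str, str]) -> Tuple[Dict[Tree, str], int]:
    """Fitch parsimony for one site: returns an assigned base per node and the cost."""
    sets: Dict[Tree, set] = {}
    cost = 0

    def up(node: Tree) -> set:
        # Bottom-up pass: intersect child state sets, or take the union (+1 change).
        nonlocal cost
        if isinstance(node, str):
            s = {leaf_base[node]}
        else:
            left, right = up(node[0]), up(node[1])
            s = left & right
            if not s:
                s = left | right
                cost += 1
        sets[node] = s
        return s

    up(tree)
    assigned: Dict[Tree, str] = {}

    def down(node: Tree, parent_state) -> None:
        # Top-down pass: keep the parent's base if allowed, else pick one (arbitrary tie-break).
        s = sets[node]
        state = parent_state if parent_state in s else min(s)
        assigned[node] = state
        if not isinstance(node, str):
            down(node[0], state)
            down(node[1], state)

    down(tree, None)
    return assigned, cost


TREE = (("human", "chimp"), "gorilla")  # hypothetical topology
HC = ("human", "chimp")                 # the human-chimp ancestor node


def ancestral_codon(codons: Dict[str, str]) -> str:
    """Reconstruct the human-chimp ancestral codon position by position."""
    result = ""
    for i in range(3):
        assigned, _ = fitch(TREE, {sp: seq[i] for sp, seq in codons.items()})
        result += assigned[HC]
    return result


print(ancestral_codon({"human": "TCT", "chimp": "TCC", "gorilla": "TCC"}))
```

Here the gorilla outgroup breaks the human/chimp tie at the third position in favour of C. If the gorilla had a third base instead, the root would get the set {A, C, T} and the choice at the human-chimp node would fall back to the tie-break.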
What if it is on the human branch, but very basal? Do we assume, in every
case where there is ambiguity between humans and chimps, that we use the
human version? Sure, you could compare to gorillas, but what about the cases
where the protein sequence is different (since the amino acid identity with
them is not 100%)? If we find consensus between the chimp and gorilla
sequences, we conclude the chimp sequence is basal, but for this basal member
of the human branch... we still don't know when to go with the human
encoding for a particular amino acid, or the chimp encoding.
We could use maximum likelihood, for example.
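As a rough illustration of what maximum likelihood reconstruction looks like, here is a minimal sketch under strong simplifying assumptions: the Jukes-Cantor substitution model, each codon position treated as an independent nucleotide site (a real analysis would more likely use a codon model), a fixed tree ((human, chimp), gorilla) with invented branch lengths, and the same hypothetical codons. The function names and numbers are illustrative only. The output is the marginal posterior probability of each base at the human-chimp ancestor, so each "guess" comes with an explicit probability.

```python
import math
from typing import Dict, List, Tuple

BASES = "ACGT"


def jc_prob(a: str, b: str, t: float) -> float:
    """Jukes-Cantor probability that base a becomes base b along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e


def ancestor_posterior(human: str, chimp: str, gorilla: str,
                       t_h: float, t_c: float, t_hc: float, t_g: float) -> Dict[str, float]:
    """Marginal posterior of the base at node A on the rooted tree
    ((human:t_h, chimp:t_c)A:t_hc, gorilla:t_g)R, with uniform base frequencies at R."""
    weights = {}
    for a in BASES:
        # Likelihood of the two descendant tips given A = a.
        below = jc_prob(a, human, t_h) * jc_prob(a, chimp, t_c)
        # Sum over the unknown root base r: prior(r) * P(r -> a) * P(r -> gorilla).
        above = sum(0.25 * jc_prob(r, a, t_hc) * jc_prob(r, gorilla, t_g) for r in BASES)
        weights[a] = below * above
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def reconstruct_codon(h: str, c: str, g: str, **branches: float) -> List[Tuple[str, float]]:
    """Most probable ancestral base, and its posterior, at each codon position."""
    result = []
    for hb, cb, gb in zip(h, c, g):
        post = ancestor_posterior(hb, cb, gb, **branches)
        best = max(post, key=post.get)
        result.append((best, post[best]))
    return result


# Hypothetical branch lengths, in expected substitutions per site.
BRANCHES = dict(t_h=0.01, t_c=0.01, t_hc=0.01, t_g=0.02)

for base, p in reconstruct_codon("TCT", "TCC", "TCC", **BRANCHES):
    print(base, round(p, 3))
```

With these made-up numbers the third position is reconstructed as C with a posterior close to, but not exactly, 1; the remaining probability sits mostly on T. Making the gorilla branch much longer weakens the outgroup's vote and pulls that posterior down, which is how the uncertainty gets quantified rather than guessed.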
You simply cannot reconstruct the DNA sequence that encoded the amino acid
sequence with any certainty. You will be reduced to arbitrary guessing.
"cannot with *any* certainty" is an exaggeration. Such reconstructions are
made all the time, whenever an ancestral state is inferred from extant sequences.
Every single codon assignment is going to involve some level of guessing
(unless it is methionine in a vertebrate).
That is true. Actually, even when we find a methionine it will involve
some level of guessing. It is just that it will not be random
guessing.
Even when consensus sequences exist, you still find many variations from
species to species (SNPs)....
Not only