April 8, 2004
Tamara Davis (Biology)
Phenotype and Genotype
Prepared by Anne Dalke
Additions, revisions, extensions
are encouraged in the Forum
Tamara talked with us about information contained (and not contained)
in a genome. Some of the ideas she opened up for discussion:
- what information is actually obtained by genome projects
- how we interpret the information (guesswork vs. actual knowledge)
- whether or not the sequence of the DNA is really a blueprint for development
- what kind of information can't be gained from knowledge of the DNA sequence
- the plasticity of information in the genome.
Tamara began her presentation by describing the ways in which biologists
can now reconstruct the genome: since DNA is double-stranded, and we
know the complementary sequences, we can build them, strand by
strand, amplifying short fragments. Knowing the sequence, we can
design short sequences and fragments, using the slight overlap between
to allow them to interact, and so build entire chains of viruses.
The goal is to put together a minimal cellular genome, the amount
required for cell life.
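
As a rough illustration of the strand-by-strand assembly described above
(a sketch only, not the laboratory method itself; the fragment sequences
and function names are invented for the example), complementary strands
and overlapping fragments can be modeled like this:

```python
# Illustrative only: the sequences and helper names are made up for this sketch.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the complementary strand, read in the opposite direction."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def overlap_length(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for size in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def assemble(fragments: list[str]) -> str:
    """Join fragments given in order, writing each shared overlap only once."""
    assembled = fragments[0]
    for fragment in fragments[1:]:
        assembled += fragment[overlap_length(assembled, fragment):]
    return assembled


# Three short, hypothetical fragments whose ends overlap by three bases.
fragments = ["ATGGCGT", "CGTTAAC", "AACGGA"]
sequence = assemble(fragments)
print(sequence)
print(reverse_complement(sequence))
```

The sketch assumes the fragments are error-free and already in the right
order, which is exactly what cannot be taken for granted at the bench.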
Caution was urged, however, regarding our
excitement about what this means: creating a genome, de novo, is not
the same thing as creating an organism; it means only that researchers
can create the proteins surrounding and protecting an infectious virus.
We know only the order of multitudes. Genes are not a controlled
property; there are lots of other factors responsible for
controlling what happens. In principle, to create a cellular life
form is much more difficult than to create a virus, and it is wrong
to think that we can. We now have available templates to build
complementarily to; we can manipulate each molecule. But there is
a 1-in-500 error rate in the creation of synthetic viruses. What,
Tamara was asked, is the error rate of life? Although the error rate
in DNA reproduction is actually 1-in-10,000 to 1-in-1,000, the cell has
a variety of correction mechanisms, so that actually only 1-in-a-billion
remain uncorrected. Conceptually, once you have the sequence, only time
stands between you and making the molecule. What is increasingly
understood is that, although lots is possible in principle, it takes
so long that, in practice, it can't be done. The numbers are so large
that we might as well "forget it." It's like connecting all the neurons
in the brain; or knowing that enough monkeys, typing long enough,
can produce a Shakespearean play.
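
To see why the numbers matter, here is a back-of-the-envelope sketch
using the rates quoted above. It assumes (this was not stated in the
discussion) that each rate applies per base and that errors are
independent; the sequence lengths are hypothetical round numbers.

```python
# Error rates quoted in the discussion, treated here as per-base rates
# (that reading is an assumption of this sketch).
SYNTHESIS_ERROR = 1 / 500                  # synthetic construction
RAW_COPY_ERROR = 1 / 10_000                # low end of the quoted copying range
CORRECTED_COPY_ERROR = 1 / 1_000_000_000   # after the cell's correction mechanisms


def expected_errors(length: int, rate: float) -> float:
    """Average number of errors in a sequence of `length` bases."""
    return length * rate


def chance_error_free(length: int, rate: float) -> float:
    """Probability that every base is correct, assuming independent errors."""
    return (1 - rate) ** length


# Hypothetical lengths chosen only to show the trend.
for label, length in [("short sequence", 5_000), ("long sequence", 1_000_000)]:
    for name, rate in [
        ("synthesis", SYNTHESIS_ERROR),
        ("uncorrected copying", RAW_COPY_ERROR),
        ("corrected copying", CORRECTED_COPY_ERROR),
    ]:
        print(
            f"{label}, {name}: "
            f"{expected_errors(length, rate):.4g} expected errors, "
            f"{chance_error_free(length, rate):.4g} chance of none"
        )
```

Under these assumptions the chance of an error-free product shrinks
exponentially with length, which is one way of reading the remark that
with big enough numbers there is no practical way not to have errors.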
Where, in building a synthetic model, is error introduced? There is an awareness that we cannot control the reaction in a chemical process. Although we should, in principle, be able to block off polymers so nothing can be added to them, we cannot. When we order DNA, we expect a mistake. This is why numbers are so important: we can conceive of precision, but we ultimately "run into the second law": there is no practical way, with big enough numbers, NOT to have errors.
Gene sequencing, Tamara suggested, is "not the most useful" project; aside from the order of bases, it does not give all the information we need. That information must be annotated in order to make sense of it: what are the repetitive, middle, regulatory elements, and what is garbage? These questions are answered by comparing a sequence to--and asking how it resembles--what was identified by earlier mechanisms. A lot of this annotating involves guessing and hypothesizing. We have to go back and test whether the identification of "genes" is correct. Sequencing is easy and fairly straightforward; it is the annotation and analysis of that information which are the more complicated projects.
We sought together for an accurate metaphor to describe this complicated process: describing it as "putting the puzzle pieces together" does not adequately describe the complexity. We were asked, instead, to imagine an instruction manual for a Mac and one for a PC; to imagine eliminating all punctuation and spacing (thereby running all letters together); then to imagine randomly inserting one set of instructions into the other. This imagined scenario gets us closer to the experience of trying to trace "how my cells know where to read and where not to." The expression of genes is controlled by a series of factors; we cannot go from the sequencing of the genome to "making it do the right thing in the right cell." There is genetic information, which is stored in a set of processes or structures. Like a program written for a Mac, which will not run on a PC, information in a cell is read by a decoder, and is meaningless without it.
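
The instruction-manual metaphor can be tried out directly. The sketch
below is only an illustration of the thought experiment; the two
"manuals" are invented one-line stand-ins, and the function names are
hypothetical.

```python
import random
import string


def strip_formatting(text: str) -> str:
    """Drop punctuation and spacing, running all letters together."""
    return "".join(ch for ch in text.lower() if ch in string.ascii_lowercase)


def scramble(host: str, guest: str, pieces: int, rng: random.Random) -> str:
    """Cut `guest` into chunks and insert them, in order, at random points in `host`.

    Assumes `guest` contains at least one letter and `pieces` is at least 1.
    """
    host, guest = strip_formatting(host), strip_formatting(guest)
    size = -(-len(guest) // pieces)  # ceiling division
    chunks = [guest[i:i + size] for i in range(0, len(guest), size)]
    cuts = sorted(rng.randrange(len(host) + 1) for _ in chunks)

    out, previous = [], 0
    for cut, chunk in zip(cuts, chunks):
        out.append(host[previous:cut])
        out.append(chunk)
        previous = cut
    out.append(host[previous:])
    return "".join(out)


# Invented stand-ins for the two manuals.
mac_manual = "Click the menu, then choose Restart."
pc_manual = "Press the power key; wait for the prompt."

mixed = scramble(mac_manual, pc_manual, pieces=4, rng=random.Random(0))
print(mixed)
```

Every letter of both manuals is still present in the output, yet nothing
in the string itself says where one set of instructions stops and the
other starts; that knowledge has to come from the reader, which is the
point of the comparison to a cell's decoder.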
We then extrapolated from what Tamara was saying to ask whether this was not true for "all information": the sequence of DNA is a blueprint for development, but we need also to understand the order of modalities and associated proteins that turn genes off and on. Tamara's research explores the modifications to DNA that cause it to behave differently in different situations. The actual bases, for instance, are modified by the methyl group, which can shut down the expression of genes (by preventing proteins from binding) or can recruit proteins. Methylation is, however, primarily associated with silencing, in a process known as genomic imprinting: the parent-specific (not random, not stochastic) selection of which of two inherited functions will "pump RNA," and which will shut down. Something "outside DNA" does this: the expression status in every individual in every generation has to be reactivated. The actual state is controlled by methylation, a method whereby cells can tell which are mom's genes, which dad's. How does a germ line know what it's making/methylating? How does our "machinery" target these genes? We have no idea how they know how to do what they do, or why they follow different cues in different circumstances.
Tamara's work involves looking at gene regulation in mice. There are so many layers of information; just knowing the sequence of the DNA does not get her very far. Having the sequencing information to work with, however, allows her to move into interesting and interestingly challenging work. Biologists thought that getting the sequencing data was going to be more challenging than it was; the project came in on time, under budget. As soon as researchers acquired the ability to plot the location of all known objects (this works in the study of other systems too, such as discovering the order of a galaxy), they were able to think about higher level orders (such as the distribution of galaxies). As we move to different scales, we need to know what else contributes to the organization that we recognize. This is no surprise: we should have realized, from the beginning, that "knowing the code was not knowing all," that we can only think of code in conjunction with its decoder.
Acknowledging that the process is biochemical, and looking at the decoding (the ability to translate a gene into a protein) is part of the information we need to recognize. Rather than dismissing some aspects of the process as "just environment" or "context," we need to acknowledge that a tremendous amount of the information in the genome is "non-structural," not part of the protein-coding sequence. The convention now is to identify everything that is not part of the regulatory sequence as "noise," but much of what is identified as "not information" is central to the decoding process. The mother's womb is not yet replaceable by a test tube. The decoder, in this understanding, becomes part of the realm of information, and is clearly bigger than the genome itself. In fact, most of the information is in the decoder.
There was a convention of thinking of information as discrete and meaningful in itself. But can there actually BE information without a decoder? Can you even say you HAVE information, without a means of decoding it? This is actually a practical problem: can we send messages elsewhere in the universe, without knowing anything about what other life forms use as decoder? (You call this a PRACTICAL question??)
The work of Susan Oyama on The Ontogeny of Information was evoked. Oyama argues that the conventional distinction between environment and what's innate is "silly": why do we speak of a genome and its environment or context? Why don't we speak of the environment and its genetic context? Why don't we (preferably) talk about developmental systems? Can you even separate out levels of order? Aren't they just arbitrary? Do all humans need to order and sequence the complexities they encounter? Might some cultures be less driven to these activities? Might some other intelligence construct information differently, see randomness and complexity where we see organization? Is our construction of order (and our failure to find it) nonsense, or human stupidity? The more we learn, the less "junk" we see (in the genome, for instance); might "junk" simply be the holding place (as "noise" has been, in earlier conversations) for what we don't understand?
Interesting comparative work has been done with genomes in other species--in the puffer fish, for instance, which has a highly compacted genome with very little junk. There is a conceptual problem here: one can create a functioning cell, and observe that removing some portion of the DNA has no effect. That we can't see an effect doesn't preclude the possibility that it could be doing something we haven't looked at; this doesn't mean it doesn't have a role to play.
Our next meeting is Thursday, April 15, when Jenny Rickard (who is
Dean of Undergraduate Admissions and Financial Aid at Bryn Mawr) is
planning to have a discussion with us about "Information, Meaning and
Noise in College Admissions and Financial Aid." She'll invite that
discussion by reviewing, first, perennial questions in college
admissions (What are all the sources of information about colleges?
What means the most to the prospective student? What's the most
important piece of information in a college application?) as well as
recent headlines in admissions and financial aid (What is the
relevant information behind the controversy with early decision and
early action? What does Harvard's recent announcement regarding
financial aid for students with family incomes below $40,000 really
mean? What do we mean by diversity in college admissions?)