April 2003, Page 16
Evaluating forensic DNA evidence: Essential
elements of a competent defense review
By William C. Thompson; Simon Ford; Travis Doom;
Michael Raymer; Dan E. Krane
"I get a sinking feeling when I hear a client has been
fingered by a DNA test," a defense lawyer recently told us. "Seems
there's not much I can do but negotiate a guilty
plea."
Promoters of forensic DNA testing have done a good job
selling the public, and even many criminal defense lawyers, on the
idea that DNA tests provide a unique and infallible identification.
DNA evidence has sent thousands of people to prison and, in recent
years, has played a vital role in exonerating men who were falsely
convicted. Even former critics of DNA testing, like Barry Scheck,
are widely quoted attesting to the reliability of the DNA evidence
in their cases. It is easy to assume that any past problems with DNA
evidence have been worked out and that the tests are now
unassailable.
The problem with this assumption is that it
ignores case-to-case variations in the nature and quality of DNA
evidence. Although DNA technology has indeed improved since it was
first used just 15 years ago, and the tests have the potential to
produce powerful and convincing results, that potential is not
realized in every case. Even when the reliability and admissibility
of the underlying test is well established, there is no guarantee
that a test will produce reliable results every time it is used. In
our experience there often are case-specific issues and problems
that greatly affect the quality and relevance of DNA test results.
In those situations, DNA evidence is far less probative than it
might initially appear.
The criminal justice system presently
does a poor job of distinguishing unassailably powerful DNA evidence
from weak, misleading DNA evidence. The fault for that serious lapse
lies partly with those defense lawyers who fail to evaluate the DNA
evidence adequately in their cases. This article describes the steps
that a defense lawyer should take in cases that turn on DNA evidence
in order to ascertain whether and how this evidence should be
challenged.
Our focus here is on the
most widely used form of DNA testing, which examines genetic
variants called short tandem repeats, or STRs. Our goal is to
explain what you need to know, why you need to know it, and how you
get the materials and help you need. We leave for a future article
discussion of another less common and even more problematic form of
DNA testing, which examines mitochondrial DNA (mtDNA).
Understanding the lab report
The first item you need in a DNA
case is the lab report. The report should state
what samples were tested, what type of DNA test was performed, and
which samples could (and could not) have a common source. Reports
generally also provide a "table of alleles" showing the DNA profile
of each sample. The DNA profile is a list of the alleles (genetic
markers) found at a number of loci (the plural of "locus," a
position) within the human genome. To understand DNA evidence, you
must first understand the table of alleles.

Figure 1 shows a table of alleles, as
represented in a typical lab report. This table shows the DNA
profiles of five samples — blood from a crime scene and reference
samples from four suspects. These samples were tested with an
automated instrument called the ABI Prism 310 Genetic Analyzer™
using a set of genetic probes called ProfilerPlus™. A company called
Applied Biosystems, Inc. (ABI) developed this system for typing DNA.
It is currently the most widely used method for forensic DNA typing
in the United States, used by about 85 percent of laboratories that
do forensic DNA testing.1
Across the top of the table
are the names of the various loci examined by the test. The
ProfilerPlus™ system examines ten loci. (Labs sometimes also run
another set of genetic probes, called Cofiler™, which includes four
additional loci.) The alleles that the test detected at each locus
are identified by numbers. Thus, at locus D3S1358, the test detected
alleles 15 and 16 on the bloodstain. At each locus, a person has two
alleles, one inherited from each parent. In some cases, only one
allele is detected, which is interpreted as meaning that by chance
the person inherited the same allele from each parent. (See, e.g.,
Suspect 2's profile at locus D3S1358 and Suspect 4's profile at
locus D8S1179 in Figure 1.) However, most samples will have two
different alleles at each locus, as seen in Figure 1.
Each allele is a short fragment of
DNA from a specific location on the human genome known as an STR
(short tandem repeat). STRs are places in human DNA where a short
section of the genetic code repeats itself. Everyone has these
repeating segments, but the number of repetitions (and hence the
length of these segments) varies among individuals. The numbers
assigned to the alleles indicate the number of repetitions of the
core sequence of genetic code. ProfilerPlus™ identifies and labels
fragments of DNA that contain STRs. The Genetic Analyzer then
measures their length and thereby determines which alleles are
present.
By examining the DNA
profiles, one can tell whether each suspect could or could not have
been the source of the blood. Suspects 1, 2 and 4 are ruled out as
possible sources because they have different alleles than the blood
at one or more loci. However, Suspect 3 has exactly the same alleles
at every locus, which indicates he could have been the source of the
blood. In a case like this, the lab report will typically say that
Suspects 1, 2 and 4 are "excluded" as possible sources of the blood,
and that Suspect 3 "matches" or is "included" as a possible donor.
One of the loci analyzed is called amelogenin (Amel) and is used
for typing the sex of a contributor to a sample. Males have X and Y
versions of the alleles at that locus; females have only the X
because they inherit two copies of the X chromosome. All of the
profiles shown in Figure 1 appear to be of males.
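The inclusion/exclusion comparison described above can be sketched in
a few lines of Python. This is only an illustration under simplifying
assumptions: the profiles are hypothetical (they do not reproduce
Figure 1), the function name compare_profiles is our own, and the
comparison assumes complete profiles from a single source.

```python
# Hypothetical profiles: locus -> set of alleles detected.
# These values are illustrative and are NOT the values in Figure 1.
evidence = {"D3S1358": {15, 16}, "vWA": {17, 18}, "FGA": {21, 24}}
suspects = {
    "Suspect A": {"D3S1358": {15, 16}, "vWA": {17, 18}, "FGA": {21, 24}},
    "Suspect B": {"D3S1358": {15}, "vWA": {17, 18}, "FGA": {21, 24}},
}

def compare_profiles(evidence, reference):
    """Return the loci at which the reference alleles differ from the evidence."""
    return sorted(
        locus for locus in evidence if evidence[locus] != reference.get(locus)
    )

results = {}
for name, profile in suspects.items():
    mismatches = compare_profiles(evidence, profile)
    # A difference at even one locus is treated as an exclusion.
    results[name] = "excluded" if mismatches else "included"
    print(name, results[name], mismatches)
```

A strict equality test like this one is only reasonable for clean,
single-source samples. It says nothing about how the evidence should
be read when a sample is mixed, degraded, or near the limits of
detection.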
Lab reports generally also contain
estimates of the statistical frequency of the matching profiles in
various reference populations (which are intended to represent major
racial and ethnic groups). Crime labs compute these estimates by
determining the frequency of each allele in a sample population, and
then compounding the individual frequencies by multiplying them
together. If 10% (1 in 10) of Caucasian Americans are known to
exhibit the 14 allele at the first locus (D3S1358) and 20% (1 in 5)
are known to have the 15 allele, then the frequency of the pair of
alleles would be estimated as 2 x 0.10 x 0.20 = 0.04, or 4% among
Caucasian Americans. The frequencies at each locus are simply
multiplied together (sometimes with a minor modification meant to
take into account the possibility of under-represented ethnic
groups), producing frequency estimates for the overall profile that
can be staggeringly small: often on the order of 1 in a billion to 1
in a quintillion, or even less. Needless to say, such evidence can
be very impressive.
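The arithmetic above can be reproduced with a short Python sketch.
The function names are our own, the ten-locus profile is hypothetical,
and the treatment of a single-allele locus as p x p is a standard
textbook assumption that the example in the text does not cover. The
sketch also omits the "minor modification" that some labs apply.

```python
from fractions import Fraction

def locus_frequency(p, q=None):
    """Genotype frequency at one locus.

    Two different alleles: 2 * p * q (as in the worked example).
    One allele only: p * p (an assumption, not stated in the text).
    """
    return p * p if q is None else 2 * p * q

def profile_frequency(loci):
    """Multiply the per-locus frequencies together."""
    total = Fraction(1)
    for alleles in loci:
        total *= locus_frequency(*alleles)
    return total

# Worked example from the text: allele frequencies of 1 in 10 and 1 in 5.
d3 = locus_frequency(Fraction(1, 10), Fraction(1, 5))
print(d3, float(d3))  # 1/25 0.04

# Hypothetical profile: ten loci, each with a genotype frequency of 4%.
ten_loci = profile_frequency([(Fraction(1, 10), Fraction(1, 5))] * 10)
print(f"about 1 in {int(1 / ten_loci):,}")
```

Even with a fairly common genotype at every locus, multiplying across
ten loci drives the estimate down to roughly 1 in 95 trillion, which
is how the very small numbers in lab reports arise.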
When the
estimated frequency of the shared profile is very low, some labs
will simply state "to a scientific certainty" that the samples
sharing that profile are from the same person. For example, the FBI
laboratory will claim two samples are from the same person if the
estimated frequency of the shared profile among unrelated
individuals is below one in 260 billion. Other labs use different
cut-off values for making identity claims. All of the cut-off values
are arbitrary: there is no scientific reason for setting the cut-off
at any particular level, just as there is no formally recognized way
of being "scientifically certain" about anything. Moreover, these
identity claims can be misleading because they imply that there
could be no alternative explanation for the "match," such as
laboratory error, and they ignore the fact that close relatives are
far more likely to have matching profiles than unrelated
individuals. They can also be misleading in that the DNA tests
themselves are powerless to provide any insight into the
circumstances under which the sample was deposited and are generally
unable to determine the type of tissue that was
involved.
Looking behind the lab report: Are the laboratory's conclusions fully supported by the test results?
Many defense lawyers simply accept lab reports at
face value without looking behind them to see whether the actual
test results fully support the laboratory's conclusions. This can be
a serious mistake.
In our
experience, examination of the underlying laboratory data frequently
reveals limitations or problems that would not be apparent from the
laboratory report, such as inconsistencies between purportedly
"matching" profiles, evidence of additional unreported contributors
to evidentiary samples, errors in statistical computations and
unreported problems with experimental controls that raise doubts
about the validity of the results. Yet forensic DNA analysts tell us
that they receive discovery requests from defense lawyers in only
10-15% of cases in which their tests incriminate a suspect.
Although current DNA tests rely
heavily on computer-automated equipment, the interpretation of the
results often requires subjective judgment. When faced with an
ambiguous situation, where the call could go either way, crime lab
analysts frequently slant their interpretations in ways that support
prosecution theories.2
Part of the
problem is that forensic scientists refuse to take appropriate steps
to "blind" themselves to the government's expected (or desired)
outcome when interpreting test results. We often see indications, in
the laboratory notes themselves, that the analysts are familiar with
facts of their cases, including information that has nothing to do
with genetic testing, and that they are acutely aware of which
results will help or hurt the prosecution team. A DNA analyst in one
case wrote:
"Suspect-known crip gang member — keeps 'skating' on
charges-never serves time. This robbery he gets hit in head with bar
stool — left blood trail. [Detective] Miller wants to connect this
guy to scene w/DNA …"
In another case, where the defense
lawyer had suggested that another individual besides the defendant
had been involved in the crime, and might have left DNA, the DNA
laboratory notes include the notation: "Death penalty case. Need to
eliminate [other individual] as a possible suspect."
It is
well known that people tend to see what they expect (and desire) to
see when they evaluate ambiguous data.3 This tendency can cause
analysts to unintentionally slant their interpretations in a manner
consistent with prosecution theories of the case. Furthermore, some
analysts appear to rely on non-genetic evidence to help them
interpret DNA test results. When one of us questioned an analyst's
interpretation of a problematic case, the analyst defended her
position by saying: "I know I am right — they found the victim's
purse in [the defendant's] apartment." Backwards reasoning of this
type (i.e., "we know the defendant is guilty, so the DNA evidence
must be incriminating") is another factor that can cause analysts to
slant their reports in a manner that supports police theories of the
case. Hence, it is vital that defense counsel look behind the
laboratory report to determine whether the lab's conclusions are
well supported, and whether there is more to the story than the
report tells.
Behind the Table of Alleles Detected (Figure 1) is
a set of computer-generated graphs called electropherograms that
display the test results. When evaluating STR evidence, a defense
lawyer should always examine the electropherograms because they
sometimes reveal unreported ambiguities and, fairly frequently,
evidence of additional, unknown contributors. The electropherograms
shown in Figure 2 display the results for the crime scene blood and
four suspects discussed above at three of the ten loci summarized in
Figure 1.
The "peaks" in the electropherograms
indicate the presence of human DNA. The peaks on the left side of
the graphs represent alleles at locus D3S1358; those in the center
represent alleles at locus vWA; and those on the right represent
alleles at locus FGA. The numbers under each peak are
computer-generated labels that indicate which allele each peak
represents and how high the peak is relative to the baseline.
By examining the electropherograms
in Figure 2, one can readily see that the computerized system
detected two alleles in the blood from the crime scene at locus
D3S1358. These are alleles 15 and 16, which are reported in the
Table of Alleles (Figure 1). The other alleles reported in the
allele chart (Figure 1) can also be seen. Our initial examination of
these electropherograms reveals no obvious problems of
interpretation in this case.
However, other cases are not so
clear-cut. Consider the electropherogram in Figure 3, which shows the
DNA test results that purportedly "matched" a defendant to a saliva
sample taken from the breast of an alleged sexual assault victim.
Although the laboratory report stated that the same alleles were
found in both samples at these three loci, close examination of the
electropherograms supports a significantly different conclusion.
There are two additional "peaks" in the saliva sample that the
laboratory failed to report — a peak labeled "12" (indicating allele
12) at locus D3S1358, and a peak labeled "OL Allele" (indicating a
possible "off-ladder," or unclassified, allele) at locus FGA. The
laboratory decided to ignore these two peaks and never mentioned
them in its report. A defense lawyer who failed to examine the
underlying test results would never have known about them. However,
they clearly complicate the interpretation of the evidence — raising
the possibility, for example, that the DNA on the breast swab is
from a person with alleles 12 and 17 at locus D3S1358, rather than
just allele 17, which would exclude the defendant as a possible
contributor.

Sources of ambiguity in STR interpretation
A number of factors can
introduce ambiguity into STR evidence, leaving the results open to
alternative interpretations. To competently represent an individual
incriminated by DNA evidence, defense counsel must uncover these
ambiguities, when they exist, understand their implications, and
explain them to the trier-of-fact.
Mixtures. One of the most common complications in the
analysis of DNA evidence is the presence of DNA from multiple
sources. A sample that contains DNA from two or more individuals is
referred to as a mixture. A single person is expected to contribute
at most two alleles for each locus. If more than two peaks are
visible at any locus, there is strong reason to believe that the
sample is a mixture.
By their very
nature, mixtures are difficult to interpret. The number of
contributors is often unclear. Although the presence of three or
more alleles at any locus signals the presence of more than one
contributor, it often is difficult to tell whether the sample
originated from two, three, or even more individuals because the
various contributors may share many alleles. If alleles 14, 15 and
18 are observed at a locus, they could be from two individuals, A
and B, where A contributed 15 and B contributed 14, 18.
Alternatively, A could have contributed 14, 15 while B contributed
15, 18, and so on. There might also be three contributors. For
example, A could have contributed 14, 15, while B contributed 15, 18
and C contributed 15. Many other combinations are also consistent
with the findings. A study of one database of 649 individuals found
over 5 million three-way combinations of individuals that would have
shown four or fewer alleles across all 13 commonly tested STR loci.5
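To see how quickly the alternatives multiply, the single-locus example
above (alleles 14, 15 and 18) can be enumerated directly. The sketch
below is our own illustration; the function name explanations is
hypothetical, and the code assumes that every contributor's alleles
were detected and that no peak is an artifact.

```python
from itertools import combinations_with_replacement

observed = {14, 15, 18}  # alleles seen at one locus, as in the example

# Every genotype one person could have using only the observed alleles.
genotypes = list(combinations_with_replacement(sorted(observed), 2))

def explanations(n_contributors):
    """List unordered sets of genotypes whose combined alleles
    are exactly the observed alleles."""
    found = []
    for combo in combinations_with_replacement(genotypes, n_contributors):
        if set().union(*combo) == observed:
            found.append(combo)
    return found

two = explanations(2)
three = explanations(3)
print(len(two), "two-person explanations, e.g.", two[0])
print(len(three), "three-person explanations")
```

At this one locus there are 6 possible two-person explanations and 29
possible three-person explanations. The count grows rapidly when
several loci are considered together.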
Some laboratories try to determine which alleles go with
which contributor based on peak heights. They assume that the taller
peaks (which generally indicate larger quantities of DNA at the
start of the analysis) are associated with a "primary" contributor
and the shorter peaks with a "secondary" contributor. In Figure 4,
for example, a laboratory analyst might conclude that alleles 15 and
18 in the left locus (D3S1358), and alleles 19 and 21 in the right
locus (FGA) are associated with a primary contributor, while allele
16 in the left locus and alleles 22 and 25 in the right locus are
associated with a secondary contributor. But these inferences are
often problematic because a variety of factors, other than the
quantity of DNA present, can affect peak height. Moreover, labs are
often inconsistent in the way they make such inferences, treating
peak heights as a reliable indicator of DNA quantity when doing so
supports the government's case, and treating them as unreliable when
it does not.
These interpretive
ambiguities make it difficult, and sometimes impossible, to estimate
the statistical likelihood that a randomly chosen individual will be
"included" (or, could not be "excluded") as a possible contributor
to a mixed sample. Defense lawyers should look carefully at the way
in which laboratories compute statistical estimates in mixture cases
because these estimates often are based on debatable assumptions
that are unfavorable to the defendant.
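One statistic sometimes reported for mixtures is a "combined
probability of inclusion": the chance that both alleles of a randomly
chosen person fall among the alleles observed at every locus. The
sketch below shows the arithmetic only. The allele frequencies are
hypothetical, the allele labels are borrowed from the Figure 4
discussion purely for illustration, and nothing here should be read
as the method used by any particular laboratory.

```python
from math import prod

# Hypothetical allele frequencies for the alleles observed in a mixture.
mixture = {
    "D3S1358": {15: 0.25, 16: 0.23, 18: 0.16},
    "FGA": {19: 0.06, 21: 0.17, 22: 0.19, 25: 0.07},
}

def locus_inclusion(freqs):
    """Chance that both alleles of a random person are among those observed."""
    return sum(freqs.values()) ** 2

# Multiply the per-locus values, assuming the loci are independent.
cpi = prod(locus_inclusion(f) for f in mixture.values())
print(f"combined probability of inclusion: {cpi:.4f}")
```

Note the assumption built into this calculation: every allele of every
contributor is taken to have been detected. When that assumption is
doubtful, the resulting number is doubtful as well.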

Degradation. As samples age, DNA, like any
chemical, begins to break down (or degrade). This process occurs
slowly if the samples are carefully preserved but can occur rapidly
when the samples are exposed for even a short time to unfavorable
conditions, such as warmth, moisture or sunlight.
Degradation skews the relationship
between peak heights and the quantity of DNA present. Generally,
degradation produces a downward slope in peak heights from left to
right across the electropherogram, because degradation is more
likely to interfere with the detection of longer sequences of
repeated DNA (the alleles on the right side of the electropherogram)
than shorter sequences (the alleles on the left side).
Degraded samples can
be difficult to type. The process of degradation can reduce the
height of some peaks, making them too low to be distinguished
reliably from background "noise" in the data, or making them
disappear entirely, while other peaks from the same sample can still
be scored. In mixed samples, it may be impossible to determine
whether the alleles of one or more contributors have become
undetectable at some loci. Often analysts simply guess whether all
alleles have been detected or not, which renders their conclusions
speculative and leaves the results open to a variety of alternative
interpretations. Further, the two or more biological samples that
make up a mixture may show different levels of degradation, perhaps
due to their having been deposited at different times or due to
differences in the protection offered by different cell types. Such
possibilities make the interpretation of degraded mixed samples
particularly prone to subjective (unscientific) interpretation.
Allelic Dropout. In some instances,
an STR test will detect only one of the two alleles from a
particular contributor at a particular locus. Generally this occurs
when the quantity of DNA is relatively low, either because the
sample is limited or because the DNA it contains is degraded, and
hence the test is near its threshold of sensitivity. The potential
for allelic dropout complicates the process of interpretation
because analysts must decide whether a mismatch between two profiles
reflects a true genetic difference or simply the failure of the test
to detect all of the alleles in one of the samples.

Figure 6 shows three additional loci from
the case shown in Figure 3, in which a defendant's profile was
"matched" to the profile of a saliva sample from a woman's breast.
The laboratory reported that the DNA profile of the saliva sample
shown in Figure 6 was consistent with the defendant's profile,
despite the absence of the defendant's 14 allele at locus D13S317,
because it assumed that the 14 allele had "dropped out." However,
the occurrence of "allelic dropout" cannot be independently verified
— the only evidence that this phenomenon occurred is the
"inconsistency" that it purports to explain. Obviously, there is
another possible interpretation that is more favorable for this
defendant — i.e., that police arrested the wrong man.
Spurious Peaks. An additional
complication in STR interpretation is that electropherograms often
exhibit spurious peaks that do not indicate the presence of DNA.
These extra peaks are referred to as "technical artifacts" and are
produced by unavoidable imperfections of the DNA analysis process.
The most common artifacts are stutter, noise and pull-up.
Stutter peaks are small peaks that
occur immediately before (and, less frequently, after) a real peak.
Stutter occurs as a by-product of the process used to amplify DNA
from evidence samples. In samples known to be from a single source,
stutter is identifiable by its size and position. However, it is
sometimes difficult to distinguish stutter peaks from the true peaks
of a secondary contributor in samples that contain (or might contain)
DNA from more than one person.
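A simple rule of the kind used to flag stutter by "size and position"
might look like the sketch below. The peak heights are hypothetical,
the 15 percent ratio is an assumed value chosen for illustration
(laboratories set their own), and the function name possible_stutter
is our own.

```python
# Hypothetical peaks at one locus: allele (repeat count) -> height in RFU.
peaks = {14: 120, 15: 1500, 17: 1400}

STUTTER_RATIO = 0.15  # assumed cut-off, for illustration only

def possible_stutter(peaks, ratio=STUTTER_RATIO):
    """Flag peaks that sit one repeat before a much taller peak."""
    flagged = []
    for allele, height in peaks.items():
        parent = peaks.get(allele + 1)  # the peak one repeat longer
        if parent is not None and height <= ratio * parent:
            flagged.append(allele)
    return flagged

print(possible_stutter(peaks))  # [14]
```

The rule flags the small peak at 14 because it sits just before the
tall peak at 15. It cannot tell whether that peak is stutter or the
allele of a second, minor contributor; that judgment is left to the
analyst.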
"Noise" is the term
used to describe small background peaks that occur along the
baseline in all samples. A wide variety of factors (including air
bubbles, urea crystals, and sample contamination) can create small
random flashes that occasionally may be large enough to be confused
with an actual peak or to mask actual peaks.
Pull-up (sometimes referred to as bleed-through)
represents a failure of the analysis software to discriminate
between the different dye colors used during the generation of the
test results. A signal from a locus labeled with blue dye, for
example, might mistakenly be interpreted as a yellow or green
signal, thereby creating false peaks at the yellow or green loci.
Pull-up can usually be identified through careful analysis of the
position of peaks across the color spectrum, but there is a danger
that pull-up will go unrecognized, particularly when the result it
produces is consistent with what the analyst expected or wanted to
find.
Although many technical
artifacts are clearly identifiable, standards for determining
whether a peak is a true peak or a technical artifact are often
rather subjective, leaving room for disagreement among experts.
Furthermore, analysts often appear inconsistent across cases in how
they apply interpretive standards — accepting that a signal is a
"true peak" more readily when it is consistent with the expected
result than when it is not. Hence, these interpretations need to be
examined carefully.
Spikes, blobs and other false peaks. A
number of different technical phenomena can affect genetic
analyzers, causing spurious results called "artifacts" to appear in
the electropherograms. Spikes are narrow peaks usually attributed to
fluctuation in voltage or the presence of minute air bubbles in the
capillary. Spikes are usually seen in the same position in all four
colors. Blobs are false peaks thought to arise when some
colored dye becomes detached from the DNA and gets picked up by the
detector. Blobs are usually wider than real peaks and are typically
only seen in one color. The "OL Allele" shown in Figure 8 below may
be a blob.

Spikes and blobs are not reproducible,
which means that if the sample is run through the genetic analyzer
again these artifacts should not re-appear in the same place. Hence,
the correct way to confirm that a questionable peak is an artifact
is to rerun the sample. However, to save time, analysts often simply
rely on their "professional experience" to decide which results are
spurious and which are real. This practice can be problematic
because no generally accepted objective criteria have yet been
established to discriminate between artifacts and real peaks (other
than retesting).
Threshold Issues: Short Peaks, "Weak"
Alleles. When the quantity of DNA being analyzed is very low
(as indicated by low peak-heights) the genetic analyzer may fail to
detect the entire profile of a contributor. Furthermore, it may be
difficult to distinguish true low-level peaks from technical
artifacts. Consequently, most forensic laboratories have established
peak-height thresholds for "scoring" alleles. A peak is counted only
if its height (expressed in relative fluorescence units, or RFU)
exceeds a set value.
There are no generally
accepted thresholds for how high peaks must be to qualify as a "true
allele." Applied Biosystems, Inc., which sells the most widely used
system for STR typing (the ABI Prism 310 Genetic Analyzer™ with the
ProfilerPlus™ system) recommends a peak-height threshold of 150 RFU,
saying that peaks below this level must be interpreted with caution.
However, many crime laboratories that use the ABI system have set
lower thresholds (down to 40 RFU in some instances). And crime
laboratories sometimes apply their standards in an inconsistent
manner from case to case or even within a single case. Hence, a
defendant may be convicted in one case based on "peaks" that would
not be counted in another case, or by another lab. And in some cases
there may be unreported peaks, just below the threshold, that would
change the interpretation of the case if considered.
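The effect of the threshold choice is easy to demonstrate. In the
sketch below the two thresholds (150 and 40 RFU) come from the
discussion above, but the peak heights are hypothetical and the
function name scored_alleles is our own.

```python
# Hypothetical peak heights (RFU) at one locus.
peaks = {12: 60, 15: 900, 17: 140}

def scored_alleles(peaks, threshold_rfu):
    """Return the alleles whose peak height meets the scoring threshold."""
    return sorted(a for a, h in peaks.items() if h >= threshold_rfu)

for threshold in (150, 40):
    print(threshold, scored_alleles(peaks, threshold))
```

With a 150 RFU threshold only allele 15 is scored, and the sample
looks like it came from one person with a single allele at this locus.
With a 40 RFU threshold alleles 12, 15 and 17 are all scored, and the
same data suggest a mixture.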
Finding and evaluating low-level
peaks can be difficult because labs can set their analytic software
to ignore peaks below a specified level and can print out
electropherograms in a manner that fails to identify low-level
alleles. The best way to assess low-level alleles is to obtain
copies of the electronic data files produced by the genetic analyzer
and have them re-analyzed by an expert who has access to the
analytic software.
Figure 9 shows
electropherograms from a rape/homicide case. The defendant admitted
having intercourse with the victim, but contended another man had
subsequently raped and killed her. The crime lab reported finding
only the defendant's profile in vaginal samples from the victim; the
lab report stated that the second man was "excluded" as a possible
source of the semen collected from the victim's body. However, a
review of the electronic data by a defense expert revealed low-level
alleles (peaks) consistent with those of the second man, which
significantly helped the defense case. Notice how these low-level
alleles are obscured in the upper electropherogram (which the lab
initially provided in response to a discovery request) due to the
use of a large scale (0-2000 RFU) on the Y-axis. These low peaks are
revealed in the lower electropherogram, where the defense expert set
the software with a lower threshold of detection and produced an
electropherogram with a lower scale (0-150 RFU).

Notes
1. Bureau of Justice Statistics, Survey of DNA Crime
Laboratories, 2001. National Institute of Justice, NCJ 191191,
January 2002. <http://www.ojp.usdoj.gov/bjs/pub/pdf/sdnacl01.pdf>
2. See William C. Thompson, Subjective
interpretation, laboratory error and the value of DNA evidence:
Three case studies, 96 Genetica 153 (1995); William C. Thompson,
Accepting Lower Standards: The National Research Council's Second
Report on Forensic DNA Evidence, 37 Jurimetrics 405 (1997); William
C. Thompson, Examiner Bias in Forensic RFLP Analysis, Scientific
Testimony: An Online Journal, www.scientific.org.
3. See D. Michael Risinger, Michael J. Saks, William
C. Thompson, & Robert Rosenthal, The Daubert/Kumho Implications
of Observer Effects in Forensic Science: Hidden Problems of
Expectation and Suggestion, 90 Cal. L. Rev. 1 (2002).
4. For
more background information on STR testing, see John M. Butler,
Forensic DNA Typing: Biology and Technology Behind STR Markers
(2001).
5. For more information
about this study, contact Dan Krane.