Error Rate
-
Michele Triplett
Dennis,
Ok, I’ll bite, but I should warn you that I have more questions than answers.
I’d say the total error rate (ER) is equal to the practitioner error rate (ER(P)) plus the error rate of the method (or the inherent error rate of the test) (ER(M)).
ER = ER(P) + ER(M)
Some testing methods have an inherent error rate, like home pregnancy tests. The ACE-V method doesn’t seem to have an inherent error rate. What about practitioner error rate? If the ACE-V methodology is applied correctly it would seem like this portion of the error rate should also be zero. One could conclude that if ACE-V is used correctly then the error rate would be zero.
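To make the arithmetic concrete, here's a small sketch with made-up numbers (nothing here is a measured rate, and the function name is just for illustration). It shows why the "zero error rate" claim only works if both components are asserted to be zero, which is exactly the step I question below:

```python
# Hypothetical illustration of the additive decomposition ER = ER(P) + ER(M):
# total error rate as practitioner rate plus inherent method rate.
def total_error_rate(practitioner_rate: float, method_rate: float) -> float:
    return practitioner_rate + method_rate

# The "zero error rate" argument asserts both components are zero:
assert total_error_rate(0.0, 0.0) == 0.0

# But once errors are observed, at least one component must be nonzero,
# e.g. a hypothetical 0.8% practitioner rate with a zero method rate:
print(total_error_rate(0.008, 0.0))  # 0.008
```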
I can see the logic in this, but I think there’s more to the problem. First, we know that errors have occurred. Should we then attribute all errors to practitioners using ACE-V improperly? How can we determine this when practitioners don’t document how they used ACE-V? Is there a standard way to use ACE-V? From reading different websites and books, I’d say there isn’t a general consensus about how to properly use ACE-V. Therefore each type of use may have a different error rate.
Here’s another thought (kind of straying from the topic): what if I were trained incorrectly in how to use ACE-V, and then made an erroneous ID? Should I be fired, demoted, or retrained? Who decides what the proper re-training should be, since there seems to be no standard explanation of ACE-V? If I use the method the way I was trained, is this still a practitioner error? Maybe it’s an industry (or agency) error?
Another thing I wonder is whether the error rate of ACE-V can really be determined to be zero. The only sciences that claim to have a zero error rate are exact sciences. Other sciences hold that all conclusions are continually open for review. If this is true, then the claim that conclusions are “Absolute and Final” seems to be outside the limits of science. I guess I’d say that conclusions in science are “the best possible given the available data”.
Back to how error rates relate to Daubert: In the video from the “Commonwealth vs. Patterson” (which was linked in the most recent issue of Jon Stimac’s FP Stuff) the judges seemed to have interpreted that Daubert requires both types of error rates.
Michele
-
Michele Triplett
I agree with Steve and Pat (sorry, wrong post), this is confusing. To add another level to the confusion, there's also the issue of whether false negatives are considered errors. Some would argue that they aren't important errors since nobody would be falsely accused due to this type of error. What about clerical errors? I guess the error rate could be stated as:
ER = Type 1 ER(P) + Type 2 ER(P) + Clerical ER(P) + ER(M)
And the error rate could go down exponentially depending on how many verifiers there are: ER squared with one verifier, ER cubed with two verifiers, etc. (assuming each check is independent).
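The exponential claim rests on an independence assumption worth spelling out. Here's a minimal sketch with hypothetical numbers (the 1% rate and the function name are mine, purely for illustration), assuming each verifier errs independently of the original examiner:

```python
# Combined false-positive rate when an identification must survive the
# original examiner plus n independent verifiers: every check must fail
# the same way, so the individual rates multiply.
def combined_error_rate(per_examiner_rate: float, total_examiners: int) -> float:
    return per_examiner_rate ** total_examiners

base = 0.01  # hypothetical 1% per-examiner error rate

for verifiers in range(3):
    total = combined_error_rate(base, verifiers + 1)
    print(f"{verifiers} verifier(s): {total:.0e}")
# 0 verifier(s): 1e-02
# 1 verifier(s): 1e-04
# 2 verifier(s): 1e-06
```

Of course, in practice verifiers usually know a conclusion has already been reached, so the checks are not truly independent and the real combined rate sits somewhere between ER and this multiplied figure.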
Did you see the video of the recent Daubert challenge? The last 9 minutes talks a lot about error rates but gave me very little as far as answers.
Michele
michele.triplett@metrokc.gov
-
Charles Parker
- Posts: 586
- Joined: Mon Jul 04, 2005 6:15 am
- Location: Cedar Creek, TX
Error Rate
Michelle:
You bring up some interesting topics that I think need to be discussed regarding error rate. I agree that the topic of false negatives has kind of been put in the back of the barn because it supposedly does no one harm. I do not think the public would agree if a miss were made on a subject in an armed robbery and one month later he did another robbery resulting in someone's death. Although not a latent print analysis, the miss made a few months ago on a record search, where a subject wanted for murder was missed and then went on to kill someone else, was a false negative that certainly did harm. The philosophy that false negatives do not count because they harm no one is a fallacy, and one that I do not believe the public will accept. They may not count from an administrative view, but to say they do not hurt anyone is strictly conjecture, and conjecture should not be part of the solution.
But here comes the devil. Everyone (almost) has made one, so how do you count it and counsel it? Tough question.
The next issue is the analogy of testing methods = comparative analysis. I am afraid I do not see the connection. I have seen posts before where the writer associates the comparison of a latent print with a known exemplar to a test for an A or B component. To me, Latent Print Analysis is not a test. It is the use of reasoning (deductive, I prefer) based upon knowledge, training and experience to determine if a latent print and the exemplar have a common (individual) source.
The last issue, and this one is really going to get me in trouble, is this: I do not believe or see where conclusions (opinions) based upon latent print analysis are "Absolute and Final". If they are, then the next time I get ready for court we can bring in one of the best examiners in the world to follow me in court and say that I made a correct identification without ever having to look at the latent and the known. I will go you one better. Let us say that you come to me and ask me to go into court and testify to an identification (individualization) without comparing the latent with the inked print, and that I would not need to since the entire SWGFAST group has looked at it and determined them to be from the same source. I am still going to want to compare that latent print with the exemplar. I probably will not disagree with them, but I will not know until such time as I sit down and conduct the comparison.
Absolute and Final is a fallacy and I think it is the one thing that hangs in the craw of most people. When two examiners agree that a latent print and an inked print have the same source then that is their opinion from my point of view. When I get a chance to do the comparison and agree with their conclusion, then three of us are of the same opinion.
Now with that being said, if five examiners are of the opinion that latent A matches subject A, and 1,350++ examiners say that it does not, then I think the five have a little bit of a problem. It is no longer just a matter of an opinion; it has proceeded to the next level of being a consensus. There is something wrong with the perception, training, education, etc., ad infinitum of the five.
I do like your post, and hope that my interchange here has not dissuaded you from sharing your ideas. Keep it up.
Knuckle Draggin Country Cousin
Cedar Creek, TX
-
Guest
Michele,

Michele Triplett wrote: How can we determine this when practitioners don’t document how they used ACE-V?
This seems to be a common theme of yours: that documentation of analysis will reduce the rate of error. I disagree. I think the Mayfield case is a prime example; once bias crept in, documentation did not help. It seems you are in favor of documenting your analysis, but give little attention to blind verification.
Haven't you seen those taste test commercials? You're not supposed to tell the taste tester what they are supposed to believe (or your verifier). Even if you do have catchy phrases by your coworkers to get you through court.
-
Michele Triplett
Guest,
You’re right, I do advocate for documentation, but not in all cases, just for conclusions that aren’t obvious. The examiners in the Mayfield case documented their conclusions but not what led them to those conclusions. One example is that they determined there were 4 impressions, but they didn’t document what evidence supported this conclusion.
You’re wrong about me giving little attention to BV. I just wrote a long post on how we use it in our office on a regular basis. I support and encourage blind testing!!
You’re right about catchy phrases: they’re useless if you don’t understand them and can’t elaborate on them.
About your taste test example: in a taste test, the testing is done before a conclusion is reached, as I think it should be. In ACE-V, this would be the comparison phase. If verification is a peer review process (as Ashbaugh says), it can’t be done blindly or it would violate the peer review process required by science. Of course this is just my opinion, which I can substantiate. Can you substantiate how peer review can be done blindly and still claim to be scientific?
Charles,
I agree that the idea of absolute conclusions is a fallacy; some refer to it as “dogma”. Even if we don’t accept it, here’s one of the problems: this is published as being an accepted industry concept, and I’ve never seen anything published that contradicts it. Christophe Champod did bring this up in the e-symposium, supporting the idea that absolute conclusions are unscientific. (I sure hope I’m remembering this right!!) And I believe Steve Meagher supported the use of this term and explained why it’s used. As I recall, he did a good job of defending its use. I’ve also seen it mentioned in a few other places, but I’ve personally never seen it in a book or a peer-reviewed journal. This is a huge problem as far as Daubert is concerned. We need this new or different perspective published! If anyone knows of any article or book where it says that conclusions aren’t absolute (hopefully worded better than that), please let me know. Because the idea of absolute conclusions is published, it’s still being used in our industry. The attorney in the recent Daubert hearing even tried to articulate this idea, but the judges didn’t seem to go for it.
New topic:
I’m glad you brought up the example of 5 examiners having one conclusion and 1,350++ examiners having a different opinion. You assume that the 5 examiners are wrong just by the numbers. Wouldn’t you agree that we shouldn’t look at the number of examiners on each side, but should instead decide who’s right based on the supporting data available to make the conclusion (reproducible friction ridge characteristics)?
Another topic:
How to count and counsel people on false negatives: by the way, all the examiners in my office can confirm that I’ve made my fair share of these. The problem isn’t that I’ve made them but why I’ve made them. I was trained that there were three conclusions I could come to: positive, negative, or incomplete (need better prints or mcp’s). This may not have been because of improper training; it’s very possible that I misunderstood what was being said. Either way, I didn’t use the term inconclusive until the last year or so. Personally, I think that examiners should be encouraged to use and understand the term ‘inconclusive’. Since I’ve started using inconclusive as a conclusion, my false negative rate has dropped significantly. People could assume from this story that I abuse the term inconclusive, but I really don’t use it that often, though I do use it.
I know you also brought up the analogy of testing methods = comparative analysis. I think this post is long enough so I won’t comment about it here but since I’m one of the people who have used this analogy I’ll email you directly about it.
Michele
Michele.triplett@metrokc.gov
-
L.J.Steele
- Posts: 430
- Joined: Mon Aug 22, 2005 6:26 am
- Location: Massachusetts
- Contact:
Error Rate
I think one would have to look at the definition used in other sciences. I've got a few friends that work in biomedical research. When I was working on the Patterson amicus brief, I talked with them about Mr. Meagher's testimony that ACE-V as a method has no error rate, only practitioners do. They found it extremely odd to make that assertion -- in their field, the error rate is the rate at which an incorrect result is reached, regardless of how the error was made.
If you read Cole's article, More than Zero: Accounting for Error in Latent Fingerprint Identification, 95:3 J. Crim. L. & Criminology 985 (2005), you'll find a discussion of the 20 or so known mis-ID (false positive) cases. Mayfield is not the only one to involve several experienced examiners. It seems odd to me that all of those folks misunderstood and misapplied the test.
ACE-V, by its nature involves a series of subjective judgements -- most critically the decision about what to call a difference/discrepancy between the latent and exemplar -- is it explainable or not. If ACE-V were purely mechanical -- how many blood cells are in this slide; did the substance turn blue or red when exposed to a reagent -- then one might be able to blame all of the errors on the observer. Here, however, the observer is a critical part of the process.
To go back to the above example -- one would think that "how many blood cells are on this slide" is a pretty simple question. Turns out there's an observer-effect problem -- the technician will tend to come out with an answer that fits the expected norm. A machine count shows much more variation. See Berkson, The Error of Estimate of the Blood Cell Count as Made with the Hemocytometer, 128 Am. J. Physiology 309, 322 (1940). You can get this effect with scales, x-rays, lab readouts, pretty much anything where a human being knows what the answer should be and takes a subconscious mental shortcut. (The key idea here is the shortcut is _subconscious_ -- the person is unaware that they didn't do the test correctly.)
I think "zero error rate" is going to remain a weak spot and a logical target for Daubert challenges. To a typical juror there's always room for a mistake. Admitting that it is theoretically possible, then explaining all the steps taken to make it highly unlikely puts fingerprints in the same camp as DNA, blood typing, etc.
-
Steve Everist
- Site Admin
- Posts: 551
- Joined: Sun Jul 03, 2005 4:27 pm
- Location: Bellevue, WA
Guest,

Anonymous wrote:
Michele,

Michele Triplett wrote: How can we determine this when practitioners don’t document how they used ACE-V?
This seems to be a common theme of yours, that documentation of analysis will reduce rate of error. I disagree. I think the Mayfield case is a prime example. Once subject bias crept in documentation did not help. It seems you are in favor of documenting your analysis, but give little attention to blind verification.
Haven't you seen those taste test commercials? You're not supposed to tell the taste tester what they are supposed to believe (or your verifier). Even if you do have catchy phrases by your coworkers to get you through court.
I've been reading this post since it came up the other day and I haven't found anything that substantiates either of your two claims:
That Michele indicates that documentation of analysis will reduce rate of error or that she gives little attention to blind verification.
I do have the benefit of sharing an office with her, so I know some of her feelings on different topics that may or may not end up on the board. Even with that, I still haven't found anything posted to the board to back up these claims.
What I've found her saying regarding reducing errors is that doing an analysis scientifically will reduce errors, and that documentation is a way of assuring that the analysis was done scientifically. I don't interpret this idea as being synonymous with "documentation reduces errors." I also don't interpret it as being synonymous with "documentation makes an analysis scientific." Instead, I see it as a way of auditing whether or not an analysis was done scientifically, based on the methodology of that science. Here's part of a previous post where she discusses documentation along with error rates:

Michele Triplett wrote: I’ll finally get to my point! The study with the 5 examiners seems to show that we can be influenced by outside information and perhaps doing our analysis blindly would reduce errors. I believe that doing an analysis scientifically reduces errors (which may include blind verification) and without documentation we can’t be assured that the examiners in this case did their analysis scientifically. Perhaps the large error rate in this study is due to improper use of methodology and not strictly outside influences.

Regarding your claim about her giving little attention to blind verification, it is mentioned twice in the paragraph quoted above. First, she says that perhaps doing the analysis blindly could reduce errors. Second, she includes blind verification in the following sentence about what a scientific analysis may involve.
Further down in the same thread quoted above, I state my claim for the misuse of blind verification as being part of the V in ACE-V instead of as part of the C. Much of my post is a result of the conversations that we've had on the topic.
If you go further into that same thread, you'll find this from Michele:
Michele Triplett wrote: When the latent has minimal quality or quantity, blind testing may be needed. Give the latent and the known prints to different examiners without any information. The number of examiners you give it to will depend on the quality and quantity of the latent. Ask the examiners to document why they’ve arrived at their conclusions. You may find that the characteristics you are using to support an identification aren’t reproducible to other examiners. Besides using blind testing, you may want to do additional testing, such as the use of predicting additional characteristics. Set the print aside for a few days. When you go back to it, note at what point you think you have confirmed your hypothesis. Then look for additional information in the latent and see if you can predict its existence in the known print. If you don’t have additional characteristics to test your hypothesis, you may not want to individualize it. Another tool to use is consultation with other examiners.

I don't see that Michele pays little attention to blind testing/verification; I just think she advocates the correct use of it as a testing method and not a verification method. I know from personal experience that she does use it as part of her casework. But I've got the benefit of working with her to know this.
Steve E.