Show me the print - a group exercise

Boyd Baumgartner · Post by **Boyd Baumgartner** » Mon Sep 17, 2018 10:47 am

Your calves must be tired from jumping so hard to conclusions

This was the comparison that got Ivan Futrell decertified due to what was deemed an erroneous exclusion by the IAI. (yes, Ivan's permission was given to share this).

The article incorrectly identifies this error as a misidentification when in fact it would have been deemed a missed identification or erroneous exclusion

http://www.poconorecord.com/article/200 ... /306299974

Because they made a technical error in misidentifying a fingerprint on a gun in a Monroe County murder case, two retired FBI analysts have been given one-year suspensions on their forensic expert certifications by the disciplinary body which gave them those certifications.

Ivan Futrell and George Wynn are appealing the International Association for Identification’s decision to suspend them for a year after they provided fingerprint analysis results requested by defense attorneys in the Helen Biank murder case. The suspensions prohibit Futrell and Wynn from testifying or being relied on as experts in any further court proceedings anywhere for that one year.

The issue as I see it goes back to the whole Certification board and standards for conclusions as opposed to just mere ground truth. What did Ivan mean by Exclusion? Was he using the conclusions and definitions the FBI was using at the time? Many in the discipline still see exclusion as 'not identified' not necessarily 'there is evidence that this was not made by person x'. Other people teach that exclusions must have certain criteria associated with the latent such as orientation, area and clear target group.

Even the Fingerprint Source book in the introduction to chapter 3 says:

Two impressions can be analyzed, compared, and evaluated, and if sufficient quality and quantity of detail is present (or lacking) in a corresponding area of both impressions, a competent examiner can effect an individualization or exclusion (identify or exclude an individual).

Does this mean incompetent examiners use inconclusive or perhaps that there is unconscious pressure to make such definitive conclusions? Or does this even say anything without giving those qualitative criteria by which an examiner is to arrive at a conclusion?

The problem that I see existing with this issue is as such: The IAI was presented with what, in effect was a conflict between two sets of conclusions and chose to uphold one and reject another. We don't know the standards they used to arrive at such a decision, or even if it was the best decision. Anecdotally, from this exercise, it seems unjustified to uphold an ID on this comparison. Furthermore, if they have standards by which the resolve disputes, why aren't those used on the certification test and fully articulated?

ID-NEWS-Vol39-No3-2009.jpg

So what do you think?

Bill Schade · Post by **Bill Schade** » Mon Sep 17, 2018 11:37 am

Wasn't George Wynn one of the examiners who got the Ricky Jackson comparison "right"

Seems like you're only as good as your last comparison.

And requiring them to take a 40 hour course on Latent Print Identification before applying for recertification? There must be more to the story than that

Pat A. Wertheim · Post by **Pat A. Wertheim** » Mon Sep 17, 2018 12:27 pm

Boyd Baumgartner wrote: ↑Mon Sep 17, 2018 8:32 am What if I told you that someone was decertified over this comparison.... What would you say?

Pat A. Wertheim wrote: ↑Mon Sep 17, 2018 9:26 am Next question -- are you going to turn over the names of the people who agreed with "identification?"

You're right, of course. When you said "comparison," I inferred "identification."

But more to your question, I knew about Ivan's decertification, but the comparison was never available for review. Not being privy to the actual discussions and charts that the cert board based their decision on, one can only wonder. But based on what we've seen here, I would be hard pressed to have to consider "identification" versus "exclusion" and come down hard on either side. Your informal survey shows a number of us fall in that zone.

So to take this discussion to the next step, which is where I presume you want us to extend this thought process, if this print had been on the Certification renewal test and an examiner were forced to make a hard call on either "identification" or "exclusion," there would be a significant number of certifications revoked.

Boyd Baumgartner · Post by **Boyd Baumgartner** » Mon Sep 17, 2018 12:53 pm

Pat A. Wertheim wrote: ↑Mon Sep 17, 2018 12:27 pm So to take this discussion to the next step, which is where I presume you want us to extend this thought process, if this print had been on the Certification renewal test and an examiner were forced to make a hard call on either "identification" or "exclusion," there would be a significant number of certifications revoked.

I don't know if I was going there with it necessarily. I was just curious how it would play out controlling for extraneous information and knowing that this comparison had real world implications. For me, it all rolls up into what I see as all interrelated: articulated standards, proficiency testing that mirrors casework, differentiating the strengths of conclusions, and the role of the IAI as more than just a networking organization. I think it's problematic that someone's professional reputation and ability to testify could change just because the makeup of the board investigating an incident like this was comprised of different people. In a worst case scenario, someone might be referred to a Brady List because of this.

anwilson · Post by **anwilson** » Mon Sep 17, 2018 1:40 pm

But more to your question, I knew about Ivan's decertification, but the comparison was never available for review. Not being privy to the actual discussions and charts that the cert board based their decision on, one can only wonder. But based on what we've seen here, I would be hard pressed to have to consider "identification" versus "exclusion" and come down hard on either side. Your informal survey shows a number of us fall in that zone.

I started in the field back in 2011. My first outside training was with Ivan. He talked about this incident in depth with the class and I will admit to feeling at the time that he was just disgruntled. When I saw this comparison now 7 years later (I work with Boyd so I was privy to it before he posted) I felt horrible in my original view of Ivan. While no one seems to have excluded the print I think the consensus at least appears to fall somewhere between identification and exclusion. For me, this goes to the heart of the importance of Inconclusive as an option in casework. I'd love to know if that was an option at the time and if it wasn't, would any of those involved change their conclusion if they now had the option to go Inconclusive?

The other question I have would be in a situation where you have a poor quality latent like this but interpret some level of consistency, is it more appropriate to potentially overstate an ID, call the print no value, or exclude the print when you don't have the option of inconclusive? It seems from the voting that people would say no value regardless which is interesting. I'd love to hear some of the thoughts on saying no value after a comparison has been performed.

John Vanderkolk · Post by **John Vanderkolk** » Tue Sep 18, 2018 10:07 am

Back in the day, I was trained for writing examination notes with something like "ident" and "non-ident" after comparisons. My conclusions were often stated something like, "...identified as having been made by the person who made the prints in item 2. No other identifications were effected." Back in the day, what did "non-identification" or "No other identifications were effected" mean to me? Basically, they meant I did not determine the source of the prints. I remember the SWGFAST discussions on "non-ident" back in the day. It took a while for "non-identification" to go away, if it has actually gone away. I wonder if "non-ident" was the conclusion that resulted in de-certification.

josher89 · Post by **josher89** » Wed Sep 19, 2018 5:37 am

When I first started my training in 2004, I was told by senior co-workers that "non-ident" was a safety net. It meant you couldn't find it at the time, but you weren't ruling out the possibility that it could be them. Maybe with a different set of exemplars, maybe on a different day, etc. it might be ID'd but it was a rare thing to say "excluded".

ER · Post by ER » Tue Sep 25, 2018 5:50 pm

Is this a complex comparison? What makes it so/not?
Absolutely yes. Because multiple distortion factors including deposition pressure, possible overlay, uncertainty in precise minutiae location, and color difference. And then after comparison it's definitely complex because there's a difference of opinions among experts.

If this is a complex comparison AND it's an ID, is it as strong of an ID as say a 10P to 10p ID? Why?
I think there might be enough there to ID (so long as it's not from an AFIS search). Is it as strong of an ID as inked to inked. Of course not. The risk of error increases as the amount of information decreases and with increased distortion. The ID decision is still very accurate, but this is definitely a comparison where trained examiners might validly reach either the ID or INC decision.

Should the strength of an ID be reported as well for transparency's sake?
I think this depends on what you mean by 'reported'. Is it necessary to put in the levels of distortion or the number of points into every report? Absolutely not. All of the analysis and any differing conclusions must be in the case notes, but that level of detail is not appropriate for every single report. If an examiner is asked about the difficulty of the comparison in an interview, deposition, or on the stand, should the examiner be honest of forthcoming? Absolutely yes.

What role does objectivity play in the conclusion?
Especially in complex comparisons, the examiner should be documenting which minutiae were marked during analysis vs. later in the examination process. This is a comparison where a blind verification should be utilized in addition to a normal non-blind comparison. (I'm hoping that's what the question was aiming for.

Do minutiae introduced after the exposure to the known diminish their value? (the GYRO paper seems to think so)
Yes. At least to some extent.

If you need the orange minutiae to tip you over the threshold for sufficiency for an ID are you biased?
No. Using the word 'tip' suggests the concept of a scale. Orange points of similarity may be smaller pebbles on the scale than nice green points, but they still have some weight that can tip the scale. Features showing similarity each add some weight towards ID, and features showing differences add some weight away from ID. It's much more appropriate to utilize orange points as smaller pebbles on the scale than to discount them entirely like they never existed because of an inappropriate fear of 'bias'.

If you consume all the data in order to ID the print, does it diminish the weight of the conclusion (ID more correctly weighted as 'Support for ID')
I've never heard anyone express a concern of "consuming all the data". I'll rephrase the question as, "Does an ID with fewer points have less weight?" Well, yes. But it's still an ID. Maybe it's better to say that an ID with data that was not "consumed" has an increased weight.

Should objectivity be a standard for a conclusion?
No. And if it is, then apply it to the latent print field after it's applied to psychologists, medical examiners, biologists, meteorologists, doctors, medical technicians, and lawyers first.

If you have to alter your original GYR minutiae from the analysis, should that really cause them to be orange?
No. GYRO is just a method of documentation. If you move a green point from an ending to a bifurcation, it's still green. Your documentation showed that it moved, but that's fine. GYRO doesn't indicate moved points. It's best practice to be have this documented somehow, but changing the color entirely for connective ambiguity isn't what GYRO was designed to do.

Is GYRO a sufficient tool given that it doesn't mark sequence, distortion, or differences in data between the prints?
No. But it's not meant to. It's meant to document confidence in minutiae location (and sequence) during Analysis. Additional documentation should be used to document any distortions or differences (if any).

How does one reconcile GYRO's notion of confidence in minutiae with subjectivity of minutiae?
My thoughts on GYRO's indication of confidence is that when I find the match, I'll have to delete 5% of my green points (95% confidence), 25% of my yellow points (75% confidence) and 50% of my red points (50% confidence). This doesn't count moving points, because my confidence indication is meant to mean the presence of some ridge event occurring, not whether the point is an ending or a bifurcation. Now, I do need a better tool to calibrate myself on these confidence levels by tracking all this, but I'm working on that.

Given that the comments in the Analysis were remarkably consistent regarding the presence of overlapping ridges, can an ID be aggregated from two fragments of what is arguably the same print?
Possibly, but I currently haven't seen adequate evidence that LPE's are highly accurate at determining when two fragments are actually from the same print (or are even adjacent). John Black's paper is a good start, but my recollection of that paper seemed to indicate that when LPE's leave non-adjacent prints in a position that looks adjacent, examiners incorrectly marked them as adjacent. I also seem to remember that the study design was changed slightly to prevent LPE's from leaving non-adjacent prints that look adjacent.

if people render a conclusion outside of the majority, should that be considered an error? Should they be held accountable?
I don't think that an exclusion decision is appropriate here, and should be considered an error. Agency policy should be clear on the consequences of an erroneous exclusion, but it should not be severe (especially in this case).

Does your agency utilize a process similar to this where you render a consensus conclusion? (AFIS Inconclusives, Complex Prints, etc)
I would think that every conclusion is a consensus conclusion since every conclusion must be verified. So yes?

Boyd Baumgartner · Post by **Boyd Baumgartner** » Wed Sep 26, 2018 12:44 pm

ER wrote: ↑Tue Sep 25, 2018 5:50 pm If you need the orange minutiae to tip you over the threshold for sufficiency for an ID are you biased?
No. Using the word 'tip' suggests the concept of a scale. Orange points of similarity may be smaller pebbles on the scale than nice green points, but they still have some weight that can tip the scale. Features showing similarity each add some weight towards ID, and features showing differences add some weight away from ID. It's much more appropriate to utilize orange points as smaller pebbles on the scale than to discount them entirely like they never existed because of an inappropriate fear of 'bias'.

I suppose I'm interested in fleshing out your overall view of objectivity and bias as they seem to be at variance with what's published. For instance, take a look at the White Box Study:

White Box Study Conclusions Paragraph 8 wrote:ACE distinguishes between the Comparison phase (assessment of features) and Evaluation phase (determination), implying that determinations are based on the assessment of features. However, our results suggest that this is not a simple causal relation: examiners’ markups are also influenced by their determinations. How this reverse influence occurs is not obvious. Examiners may subconsciously reach a preliminary determination quickly and this influences their behavior during Comparison (e.g., level of effort expended, how to treat ambiguous features). After making a decision, examiners may then revise their annotations to help document that decision, and examiners may be more motivated to provide thorough and careful markup in support of individualizations than other determinations. As evidence in support of our conjecture, we note in particular the distributions of minutia counts, which show a step increase associated with decision thresholds: this step occurred at about seven minutiae for most examiners, but at 12 for those examiners following a 12-point standard.

I understand your explanation, in essence to say that basically any sign of similarity adds weight, but then you go on to say:

ER wrote: ↑Tue Sep 25, 2018 5:50 pm I think there might be enough there to ID (so long as it's not from an AFIS search)

Doesn't this all just add up to not bias as a fear, but as a factor in assessing similarity and weight? Logically speaking, the minutiae are no more or less similar just because it came from AFIS vs a detective/trial consultation/etc. Is there a alternative universe scenario where you ID this print in one universe because you're consulting on the case and you inconclusive it in another because it came from AFIS?

NRivera · Post by **NRivera** » Thu Sep 27, 2018 7:35 am

I have to echo this response somewhat. It does make me cringe a little bit to hear that a comparison decision is influenced by the source of the known print. I know that's a reality for many examiners, and there are good reasons for it, but I don't think it's what we should strive for. Yes, we are far more likely to encounter a close non-match from AFIS. That should be cause for caution in every comparison, not just AFIS searches. AFIS is just an easier way to find them. That being said, I view GYRO markings as being more for my benefit than any reviewer or outside customer of the data. I was trained that it's ok to be "wrong" on your analysis markings so long as you adhere to due diligence in searching for the ridge detail. I mean, who hasn't found a matching known and realized that their initial latent orientation was more than 30 degrees off? Marked up a latent palm that ended up being identified to a finger? It's not often, but it can happen and it doesn't invalidate an identification when it does. Nothing says you can't use GYRO to annotate orientation and anatomical source on analysis as well. That's a good reminder to you on comparison of how you should tailor your search strategy, especially on difficult prints.
I also wanted to throw my 2 cents in on "consuming all the data". The textbook answer to the question of how many features does it take to make an ID goes along the lines of considering all the data present, not just the number of points. Following that train of thought, I submit that you should be consuming all the data on every comparison, whether you annotate it or not. Do we gain anything from annotating 120 minutiae on the comparison of a really good quality latent? Absolutely not. Drawing a hard line on what or how much to annotate is a big gray area that we could debate ad-nauseum, but we should be considering everything that is there at least visually every time. Why? Because close-non-matches from AFIS.

ER · Post by ER » Fri Sep 28, 2018 5:43 pm

When I compare in general, I have a standard of what I am comfortable identifying. It's a bit nebulous and hard to verbalize, but it's based on years of training, practice, research, testing, and discussion with other experts. But when I find similarities that result from an AFIS search, I want a bit more for a few different reasons.

1) It's just a numbers thing. The relevant population when the suspect is listed on the request and the relevant population when the name comes from AFIS is very different, and this difference gives many orders of magnitude differences in the prior odds. In order to overcome the higher priority, it is appropriate to require a high LR. Now, this is all without using numbers but the theory is pretty solid.

2) I think there is enough evidence out there to believe that borderline ID's from AFIS searches have a somewhat higher risk of error than borderline IDs that didn't come from AFIS searches. Requiring slightly more information helps to mitigate this increased risk. Yes, we always need to be careful, and yes, there is a risk of error even without AFIS, but the slightly higher risk means that a higher threshold is appropriate.

3) We've all seen the most famous example of an erroneous ID to an AFIS candidate. I don't think that the FBI (or even the rest of us at the time) fully appreciated the power and slightly increased risk from IDing AFIS candidates, and we ended up with Mayfield. I'm hearing some preliminary data from Europe where a review of proficiency tests showed a significant false positive rate (much higher than the black box paper). It sounds like there were quite a few close-non-matches from AFIS included in the tests. (I need to get more info here, so please don't take any of the Europe stuff as completely accurate.) If you've taken a class from me in the past 5 years, you've seen the print that about 20% of examiners erroneously ID. Although that's comparing under non-ideal conditions and in an erroneous exclusion class, it still shows a practical example of the increased risk.

Before I answer about the White Box quote, what part do you see as not matching up with what I was saying about orange points. I'm missing the specific thing here.

NRivera · Post by **NRivera** » Mon Oct 01, 2018 8:50 am

ER wrote: ↑Fri Sep 28, 2018 5:43 pm ...The relevant population when the suspect is listed on the request and the relevant population when the name comes from AFIS is very different, and this difference gives many orders of magnitude differences in the prior odds.

I understand what you're trying to say here ER, and it makes sense. I am looking at the prior odds from the perspective that they are determined by who had legitimate access to deposit the questioned print, but I can see how the pool of potential subjects for comparison will also influence that number. In addition, we usually have no idea what the prior odds could be based on legitimate access. Having said that, It's not at all uncommon, at least for us, to exclude named subjects for comparison and subsequently identify a completely different individual in AFIS so maybe my experience in that regard is different from most. Would you say that seeing a named subject on a request creates a certain expectation on your part when approaching a comparison? If you are adjusting your decision threshold based upon that information, is that not a form of contextual bias?
I believe we would be on significantly more solid footing if we approached every comparison as if it came from an AFIS search and focused solely on the data at hand during ACE-V.

Boyd Baumgartner · Post by **Boyd Baumgartner** » Tue Oct 02, 2018 8:27 am

ER wrote: ↑Fri Sep 28, 2018 5:43 pm Before I answer about the White Box quote, what part do you see as not matching up with what I was saying about orange points. I'm missing the specific thing here.

I suppose it's all intertwined so I'll try to untangle it, but I'll start off with the fact that I tend to see Pat's concept of 'Reliable Predictability' as the way I approach comparisons because I think that's the strongest strategy given what we focus on. Namely, the data in a print. The job of the examiner is to navigate the print and look for opportunities to anchor to various data points in order to reach a conclusion. Just as a rock climber anchors to various points, the weaker the more perilous, so does the Examiner. With peril being a risk of error in the case of a conclusion, especially that of an ID. For me, this sets the stage for the significance of orange dots, the moving of original dots and ties in with the white box quote.

I think the White Box Study abstract gets it right when it says:

White Box Abstract wrote: The extensive variability in annotations also means that we must treat any individual examiner's minutia counts as interpretations of the (unknowable) information content of the prints: saying “the prints had N corresponding minutiae marked” is not the same as “the prints had N corresponding minutiae.”

And this:

White Box Discussion and Results Paragraph 2 wrote:Value is a preemptive sufficiency decision: NV indicates that any comparison would be inconclusive.

Taken together, I see the practical implications of this as such. If an examiner changes the location of their data points, they were not strong to begin with, and their preemptive bid as it were about the ability to ID was incorrect. (maybe think about this as the prior probabilities in a Bayesian framework). Given my reliable predictability framework, the more one has to move a point, the less predictive power it had/has.

In the same vein, if an examiner marks data points in the latent after seeing the known and they didn't mark them in the analysis, this severely diminishes their predicting power because of the directionality of their predictive power. Their predicting power flows only to the latent. I wouldn't discount them altogether, but I would certainly would debate their role as 'evidence' (again, given a Bayesian framework), and if I had a significant portion of them, I'd conservatively lean inconclusive.

I think it's problematic to consider 'relevant population' for comparisons because it's biasing (aka has zero to do with the data in the print). First, if your LIMS tracks it, count how many officer and victim prints you ID each year. Given the relevant population of officers and victims to potential suspects, relevant population should dictate you look at officer and victim prints first. Secondly, given that human behavior follows pareto distribution a purely relevant population framework would mean you give more credence to subjects developed through AFIS, not less. If you truly thought AFIS candidates had less relevance, you wouldn't stop looking once you identified the print. Thirdly, you don't know how relevant subject names are in a case. You don't know when the detectives are on a 'fishing expedition', when the victim actually turns out to be the suspect in a case, when there's a crooked cop, etc. From a purely statistical population standpoint, the relevant population should only be applicable to populations with those features, with those sequences, with those shapes, in that orientation, in that spatial relationship, which is the work Hardy-Weinberg does in DNA statistics that we lack.

CLPEX Chat Board

Show me the print - a group exercise

Based on what you've seen, the most appropriate conclusion is:

Re: Show me the print - a group exercise

Re:What do you think?

Re: Show me the print - a group exercise

Re: Show me the print - a group exercise

Re: Show me the print - a group exercise

Re: Show me the print - a group exercise

Re: Show me the print - a group exercise

Re: Show me the print - a group exercise

Re: Show me the print - a group exercise

Re: Show me the print - a group exercise

Re: Show me the print - a group exercise

Re: Show me the print - a group exercise

Re: Show me the print - a group exercise