I asked a question that was basically this: Considering that all the performance testing (black box, white box, Miami-Dade) was done on 3-conclusion scales, does moving to this 5-conclusion scale undermine the validity of fingerprint identification, especially given that foundational validity, as outlined by PCAST:
“requires that [the method] be shown, based on empirical studies, to be repeatable, reproducible, and accurate, at levels that have been measured and are appropriate to the intended application.”

While obviously the OSAC members didn't think it undermined the validity (spoiler alert: it does), they also pointed to a study that compared the 3-conclusion scale to the 5-conclusion scale, which I believe was cited as evidence that the scale is valid (but not in a way recognized by the PCAST report, mind you). They also framed the reluctance of people to adopt the scale in terms of fear. I always love the fact that any time there's a new policy in the works, it's presented as literally having no risks or trade-offs associated with it. This should always be a red flag. There are trade-offs to every policy change.
Let's take a look at the paper and discuss what I mean.
From the paper:
“However, it is important to note that our results need only to be approximately similar to casework, because the goal of this study is not to measure error rates on an absolute scale, but to consider what changes might occur if an expanded conclusion scale is adopted.”

There goes the 'appropriate to the intended application' prong of the PCAST quote above regarding foundational validity. This paper, in its own words, wasn't intended to do that. That was evident from the methods section, where it's stated that:
“The experiment differed from normal casework in that participants only had 3 minutes to complete each trial and latent prints and exemplar prints were shown at the same time.”

This, in effect, makes the study more of a Seven Minute Abs of comparisons than anything else.
But wait! There’s more. Later in the paper it says:
“The distribution of proportions shown in Table 5 suggests that our comparisons were of similar difficulty to those from black box studies, which are designed to emulate the difficulty of impressions encountered in casework. Thus, we believe that our choice of latent impressions and comparison exemplars produced an environment that is similar to actual casework.”

First of all, difficulty is a function of the Examiner. I wish someone would write that down. We're more concerned with complexity, which is a function of the print: lack of clarity, lack of orientation, use of level 3 data, limited data (read: boundary cases). Complexity is even in the ULW comparison software as a checkbox. It also has a poor man's version substituting as the Quality Metric in the ULW LFIS/LFFS encoding modules.
Other experimental design flaws include:
“This experimental design omitted the ‘of value’ decision. We made this decision because the interpretation of our results depend in part on model fits from signal detection theory, and it is difficult to fit models in which an initial quality threshold is assessed.”

Don't get me wrong, I like the study. I just don't think it's evidence for the validity of the 5-conclusion standard in the way that it was sold on the call. Let's look at what the paper has to say about the results.
“First, the proportion of Identification responses to mated pairs drops from 0.377 in the 3-conclusions scale to 0.266 in the 5-conclusion scale. This suggests that examiners were redefining the term Identification to represent only the trials with the strongest evidence for same source.”

Cue needle scratch on the record sound.
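In signal detection terms (the same framework the authors fit their models in), that drop corresponds to a rightward shift of the Identification criterion. A minimal back-of-the-envelope sketch, assuming an equal-variance Gaussian model (my simplification, not the paper's actual fit):

```python
from scipy.stats import norm

# Under a Gaussian signal detection model, an examiner responds
# "Identification" when the perceived similarity of a mated pair exceeds
# a criterion c. The ID rate on mated pairs then pins down how far the
# criterion sits above the mated-pair mean, in standard deviation units.
id_rate_3 = 0.377  # ID rate on mated pairs, 3-conclusion scale (from the paper)
id_rate_5 = 0.266  # ID rate on mated pairs, 5-conclusion scale (from the paper)

# c - mu = Phi^{-1}(1 - ID rate)
c3 = norm.ppf(1 - id_rate_3)  # ~0.31 SD above the mated-pair mean
c5 = norm.ppf(1 - id_rate_5)  # ~0.63 SD above the mated-pair mean

print(f"Implied criterion shift: {c5 - c3:.2f} SD to the right")  # ~0.31 SD
```

Same examiners, same prints; the only thing that moved is where the Identification label starts.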
This is problematic in the sense that it reduces IDs to being uni-dimensional. On the call (I forget who asked the question), there was some discussion that Support for Same Source actually had a multi-dimensional component, meaning that there could be a scale within that scale. And let's look at the problem with that.
“Second, note that the Inconclusive rate drops from 0.569 in the 3-conclusion scale to 0.351 in the 5-conclusion scale. Some of these Inconclusive responses likely distributed to the Support for same source response, because not all of the Support for Same Source responses could have come from the weak Identification trials (0.377-0.266 is only 0.111, whereas the proportion of support for same source is 0.241).”

Backing up a minute to the Materials and Methods section, we see this:
“This experimental design omitted the ‘of value’ decision. We made this decision because the interpretation of our results depend in part on model fits from signal detection theory, and it is difficult to fit models in which an initial quality threshold is assessed. Both scales included an ‘inconclusive’ category, and while we understand that in casework ‘no value’ and ‘inconclusive’ have different meanings, we considered the two to be approximately equal for the purposes of comparing the traditional and expanded conclusion scales.”

Given the inconclusive rate and their definition of inconclusive in this scale as literally having no value, the implication here is that there is a tendency to erroneously associate a person to a case, whereas in the 3-point scale that is not a problem. But that's only if you think Support for Same Source means you're associating a person to the case (or that it's inculpatory, as the paper terms it). In Figure 1, it appears as though the Jury thinks that way.
So, in essence, what's happened here is that we've redefined the meaning of Identification, the jury will redefine the meaning of 'Support for Same Source', and we've actually added to this bucket people who should not have been there. The overall effect: we've outsourced the overstating from the Examiner to the jury, but made no real difference.
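The bookkeeping on that redistribution is worth spelling out. A quick sketch using the proportions quoted above (the floor on migrated Inconclusives is my arithmetic, not a figure the paper reports):

```python
# Response proportions on mated pairs, taken from the paper's results
id_3, id_5 = 0.377, 0.266    # Identification: 3- vs 5-conclusion scale
inc_3, inc_5 = 0.569, 0.351  # Inconclusive: 3- vs 5-conclusion scale
sss_5 = 0.241                # Support for Same Source: 5-conclusion scale

weak_ids = id_3 - id_5                # 0.111: former IDs demoted to a weaker label
from_inconclusive = sss_5 - weak_ids  # 0.130: SSS responses that cannot be ex-IDs

print(f"At most {weak_ids:.3f} of Support for Same Source are former Identifications")
print(f"At least {from_inconclusive:.3f} had to migrate up from Inconclusive")
print(f"Inconclusive itself dropped by {inc_3 - inc_5:.3f}, so the room is there")
```

That 0.130 is the bucket of comparisons that were Inconclusive under the 3-conclusion scale but now carry an inculpatory-sounding label.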
Hold up though, there's still one last ball we need to juggle: varying degrees of Identification.
The litmus test in any of these is always Mayfield. By the uni-dimensional redefinition of Identification to mean only the strongest, what happens to the Daoud ID? The Zero Point ID? Certainly we aren't lumping Daoud into the strongest category of ID, are we? Remember, the paper says:
“Out of 27 participants, 21 had an identification threshold shifted to the right (i.e. more conservative) in the 5 Conclusion scale relative to the 3 Conclusion scale (exact probability is 0.0029). This demonstrates that examiners redefine what they mean by an Identification when given more categories in the scale (become more conservative).”

So, in the instance of Mayfield, we actually have two competing 'Support for Same Source' propositions. Where is the guidance for that? Especially considering the fact that one of them was an error, not just by a 3-scale standard, but by any standard.
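That 0.0029, for what it's worth, checks out as a one-sided exact sign test (my reconstruction; the paper doesn't name the test in the passage quoted): the probability of 21 or more of 27 examiners shifting right if shifts were a coin flip.

```python
from scipy.stats import binomtest

# One-sided exact sign test: if criterion shifts were 50/50 coin flips,
# how likely is it that 21 or more of 27 examiners shift to the right?
result = binomtest(k=21, n=27, p=0.5, alternative="greater")
print(f"P(>= 21 of 27) = {result.pvalue:.4f}")  # 0.0030, i.e. 0.0029... before rounding
```

The statistic isn't the problem. The problem is what a rightward shift means for IDs made under the old boundary.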
And this brings me to my next point: rejection at the ASB level. On the DOJ call there were numerous questions about the appropriateness of the overlap between OSAC and ASB board members, the implication being that acceptance of the OSAC documents could be forced rather than organic. If we look at the mission statement of the ASB at https://www.asbstandardsboard.org/mission-vission/ (the typo in the URL is too ironic), they're actually charged with:
“Provide training on implementation of ASB standards”

as well as
“Foster collaboration and participation: AAFS; Constituents and Other SDOs”

So riddle me this: how is the ASB supposed to provide training on a standard they didn't approve, one which obviously contradicts itself and shows itself to be impractical (at least at this stage)? Especially in light of the fact that the OSAC cut off collaboration in spite of having overlapping members.
Lastly, there's this: does the IAI Latent Print Certification become invalid now that the DRAFT has been removed? No one has been certified under a 5-conclusion scale, after all. I would envision a whole host of problems with implementing a 5-point scale, given the fiasco we had with adding one of the 3 into the mix not too long ago.
If anyone wants to do a Zoom mock trial where they defend the 5-conclusion scale and I play the defense, let me know. We can record it and put it up here.