AFIS Scores


Boyd Baumgartner
Posts: 567
Joined: Sat Aug 06, 2005 11:03 am

AFIS Scores

Post by Boyd Baumgartner »

Sandy Siegel's mailing list included a quote by Ed German and had what I thought was an interesting proposition, which became a topic of discussion this morning.

It said:
"Perhaps the most valuable fingerprint-related information I have heard in the past year is FBI Latent Print Examiner Kyle Tom’s ongoing research about NGI candidate scores. His preliminary research showed that when the matching scores of the #1 and #2 candidates have a difference of 1,250 or more, 83.5% of the time it will be an identification.

This is important because it means all agencies should consider implementing a policy that any such 1,250 or more difference in NGI candidate responses should require review by more than one examiner - either because the first examiner made an identification, or because there is an increased chance of an erroneous exclusion. Hopefully, the FBI will publish research on this topic in the future.

Some agencies are of the opinion that allowing examiners to see the matching scores in candidate lists biases them and should be precluded. The current chair of the OSAC Friction Ridge Subcommittee AFIS Best Practices Task Group (Mike French) and I are both of the opinion that AFIS matching scores are an important objective (not subjective) measurement which can lend valuable information to the examination process and aid quality assurance."
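Taken at face value, the quoted policy is a one-line decision rule over the candidate list. A minimal sketch in Python — the 1,250 figure comes from the quote above; the candidate names and data layout here are hypothetical, not NGI's actual response format:

Code: Select all

# Sketch of the quoted score-gap rule. Candidates are assumed to arrive
# as (subject_id, score) pairs sorted by score, highest first.
GAP_THRESHOLD = 1250

def flag_for_second_examiner(candidates):
    """True when the #1/#2 score gap meets the threshold, i.e. when the
    quoted policy would require review by more than one examiner."""
    if len(candidates) < 2:
        return False
    return candidates[0][1] - candidates[1][1] >= GAP_THRESHOLD

# A gap of 1,400 meets the threshold:
print(flag_for_second_examiner([("cand_A", 21400), ("cand_B", 20000)]))  # True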
A couple of points I thought were interesting:
  1. Is it problematic to have someone who works for an AFIS provider on the board of a best-practices committee that is pushing the use of a product they're involved in?
  2. Considering NGI does not require you to mark a decision, the data on individualization is most likely coming from the FBI itself. Are the practices at the FBI sufficient to extrapolate to other agencies? (e.g., on what standards are their IDs based? Might that metric change if they engage in practices like the 'Show me the Print' thread?)
  3. What is the correlation of the data to the Quality Metrics in the ULW? If you're running only 10P-quality latents, the data will be skewed.
  4. What is the correlation between orientation being known, pattern type, and presumptive finger position, and how does this affect the number?
  5. How can AFIS scores be claimed to be objective when they're either the result of manual encoding or, in the case of an LFIS, produced by an encoding/matching algorithm whose training and significance weighting are not necessarily objective?

The AFIS score dilemma has even been the topic of some research on its biasing effects, so beware of everything you read on the internet.
Anyway, what do you think?
josher89
Posts: 509
Joined: Mon Aug 21, 2006 10:32 pm
Location: NE USA

Re: AFIS Scores

Post by josher89 »

I'll speak to point 1 now, and the rest when I have more time.

Mike has never, ever, used his position with Morpho (now Idemia) to push an agenda, product or otherwise. In fact, he barely speaks of it outside of a particular task group he and Ed (among others) are on for an upcoming document.

I think scores are generally not useful unless there is a significant gap between the top two candidates. We have observed very large gaps between those scores when #1 is a hit. We've seen hits on #2 and #3 as well, but in those cases the difference among the top three scores is relatively small. We have also seen significant score differences between #1 and #2 candidates where they were obviously different (diff. pattern type) but had similar groupings of L2D - like around a delta. We haven't the time to calculate the values that the FBI has (as mentioned in that post), but I wish we did. I do think that knowing how the score relates to a potential ID is important; each agency can implement quality control measures to help combat identifying close non-matches and making erroneous exclusions, and this kind of data helps in determining when to do that.

I know there are some agencies that will only look at the top two or three candidates (using LFIS) and if a no-hit result happens, they will re-submit as an LFFS with minutiae cleaned up. This is a quality assurance measure and for that lab, I'm guessing it works well.

Scores shouldn't be (and, I'm guessing, aren't) used to determine an ID, but they can be used to assist in making an ID when you have no suspect. Case AFIS works - Glenn and Carey have the proof - and this can be looked at similarly.
"...he wrapped himself in quotations—as a beggar would enfold himself in the purple of emperors." - R. Kipling, 1893
Boyd Baumgartner
Posts: 567
Joined: Sat Aug 06, 2005 11:03 am

Re: AFIS Scores

Post by Boyd Baumgartner »

My point about Mike isn't necessarily that he's pushing an agenda; it's more that he may be a tad biased about integrating scores into an agency's workflow. It's right there in the quote from Ed.
This is important because it means all agencies should consider implementing a policy...
I wouldn't implement a policy based on that. The assumption is that since we work in a resource-limited environment, this can help. This is the basis for 'Hit it and Quit it' case strategies as well, where once one print hits in AFIS, you stop working the case, report the name, and wait for further instruction from the Detective. Definitely resource-saving, but is it best practice? Are scores helpful? When they're helpful, they are... profound, I know.

I think this is the issue with Case AFIS as well, especially when it may involve a significant amount of digital imaging to get those elim prints and additional latents into the system. The time cost may mean a diminished ROI when you look at implementing it, especially if the latents you are running aren't the ones you'd normally run through AFIS. Aren't you just increasing the risk of an inconclusive? There need to be guidelines around such 'best practices'.

josher89 wrote: Tue Oct 16, 2018 10:52 am Case AFIS works - Glenn and Carey have the proof
Glenn looks the same, but Carey looks different than I remember...


youtu.be/OGvjJCPb8tU
Bill Schade
Posts: 243
Joined: Mon Jul 11, 2005 1:46 pm
Location: Clearwater, Florida

Re: Video proof

Post by Bill Schade »

Very funny, but I don't think this was the "proof" that Josh was referring to.
NRivera
Posts: 138
Joined: Fri Oct 16, 2009 8:04 am
Location: Atlanta, GA

Re: AFIS Scores

Post by NRivera »

I ran some similar numbers on our NGI hits a couple of years ago. The average "hit" score was 20577 (N=219), and the average difference to the highest non-matching score was 4523. I would have to crunch my numbers some more to break them out in a manner similar to Kyle's. The biggest takeaway we got from this was that 91.78% of our AFIS IDs were to the 1st candidate. If you went down to 10 candidates, you're still at 98%. I didn't have time to dig into each case and figure out whether that remaining 2% would have translated directly into missing the IDs completely or whether they would have been caught by a separate search. The numbers do justify cutting down the requirement to look at the traditional 20 candidates in many cases, but there is nothing to prevent an examiner from looking at 20, or launching as many searches as they feel necessary in others.

I didn't see it as a call for a policy. I saw 13 instances where the score difference between the matching 1st candidate and the second candidate was less than 1000. From my perspective, a comparison decision is a comparison decision is a comparison decision, and you should put it through the same QA procedures you would otherwise normally use, regardless of the known's origin. If you would consider implementing enhanced QA procedures for an AFIS search result where there is minimal difference in the top two scores, how does that translate to the data actually present in the questioned and known prints? Why wouldn't you consider enhanced QA procedures for similar comparisons that don't originate from AFIS searches? It seems more reasonable to me to rely on the data present in the impressions than solely on what the scores may tell us.
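If anyone wants to run the same tally on their own hits, the computation is straightforward. A rough sketch — assuming each hit is logged as a (hit_rank, hit_score, best_nonmatch_score) record, which is my assumption and not any AFIS export format:

Code: Select all

# Summarize hit records: (hit_rank, hit_score, best_nonmatch_score).
# The figures quoted in this post came from N=219 actual NGI hits.
def summarize_hits(hits):
    n = len(hits)
    avg_hit_score = sum(score for _, score, _ in hits) / n
    avg_gap = sum(score - nonmatch for _, score, nonmatch in hits) / n
    pct_rank1 = 100.0 * sum(1 for rank, _, _ in hits if rank == 1) / n
    pct_top10 = 100.0 * sum(1 for rank, _, _ in hits if rank <= 10) / n
    return avg_hit_score, avg_gap, pct_rank1, pct_top10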
"If at first you don't succeed, skydiving was not for you."
Steve Everist
Site Admin
Posts: 551
Joined: Sun Jul 03, 2005 4:27 pm
Location: Bellevue, WA

Re: AFIS Scores

Post by Steve Everist »

"Perhaps the most valuable fingerprint-related information I have heard in the past year is FBI Latent Print Examiner Kyle Tom’s ongoing research about NGI candidate scores. His preliminary research showed that when the matching scores of the #1 and #2 candidates has a difference of 1,250 or more, 83.5% of the time it will be an identification.
I’m curious to hear more about this data than the 1250/83.5%. Specifically, why were these numbers decided to be the critical indicator where a QA method would be should be applied? Is there something specific about 83.5%? Should this number be higher or lower to trigger policy? What does the graph look like that ranges from 70% - 90%? What about 95% or greater?

Is this for both fingers and palms, or just fingers?
This is important because it means all agencies should consider implementing a policy that any such 1,250 or more difference in NGI candidate responses should require review by more than one examiner –
Why this number?
How prevalent is it for agencies to not review candidate responses by more than one examiner?
…either because the first examiner made an identification, or because there is an increased chance of an erroneous exclusion.
If the examiner made an identification, then I’d hope it would be verified as a matter of policy, not based on an AFIS score. I could even see stats like this going the other way with limited-resource agencies, because the statistics show that over XXXX score, XX% are IDs, so they won't be verified (unless going to court?).

It could be helpful for those agencies that don’t have resources to verify all AFIS non-hit searches, but why is 1250/83.5% the sweet spot? Can these results be expected to apply to agencies that may run prints of different quality levels (higher or lower)?
Hopefully, the FBI will publish research on this topic in the future.
This could be helpful. I feel this has brought up far more questions than answers as it could apply to the multitude of workflows for different agencies. We do 100% verification of AFIS searches, so it wouldn't provide much assistance to us. And I’m not sure it would translate to other agencies who have different workflows and run prints of higher and/or lower quality than the test prints. It seems more like it would need to be studied at the agency level, as Norberto has done.

Even with that, robust QA measures up front may negate the need for creating policy based on this “objective” number.
Steve E.
Boyd Baumgartner
Posts: 567
Joined: Sat Aug 06, 2005 11:03 am

Re: AFIS Scores

Post by Boyd Baumgartner »

I guess I'll tack this on to this thread since it was brought up, and just consider it a broader discussion of the use of AFIS.

Case AFIS


The big ideas wrote:
Forensic Magazine article

Presentation from Glenn

The 2 premises I can see being put forward are (feel free to correct me if I'm wrong):

People erroneously exclude
Searching is time intensive, especially for complex prints

The proposition at hand is that an agency can use AFIS to remedy the two problems listed above by:
Searching complex prints
Searching against people known to be connected to the case (e.g. subjects previously identified, victim elim prints, officer elim prints)

The result:

Better casework, defined as more IDs and more latents closed out through identification.
I think the 'game changer' label as used in the video might be overselling it a bit. The biggest issue is one I discussed in a previous thread re: PCAST and the generalizability of performance studies. The big idea is that novel research itself must be validated, and just because something is valid internally doesn't mean it's valid externally.

So, in essence, the 'proof' that Glenn and Carey have is, at best, constrained to their lab, using their policies, their matching algorithm, those latents, and those subjects. I'm not even disputing the findings. They found what they found... OK.

If we dig deeper on the two premises listed above we can easily ask:

Is there even an exclusion policy, and if so, what is it?
Developing exclusion guidelines as we've discussed on this forum could be a way to reduce erroneous exclusions without spending an extra couple million on a new AFIS.

Are those erroneous exclusions distributed evenly among examiners?
If Steve makes 10 erroneous exclusions in our office and I make 0, saying the average number of erroneous exclusions is 5 does not mean that you should expect to see 5 from each of us; it means Steve needs a performance improvement plan.


What elements go into defining a 'complex' print?
If there is no standard as to what is meant by complex, you may change the hit rate of the Case AFIS. The absolute strongest chance you have of hitting in AFIS is when the encoder is allowed to work on both prints. Why, you ask? Because it's systematic; humans are not. Look at the questions and answers in the 'Show me the Print' thread as to what goes into someone calling a comparison complex, and at the variability in feature extraction. The way we define complex here, as applicable to Case AFIS, is either limited data (low quantity) or a high level of ambiguity (low quality).


Certainly not all Case AFIS performance will be the same, right?
Glenn and Carey's original paper mentions Cogent
Glenn and Bill are discussing Idemia
Forensic Comparison Software also includes this feature


What is the efficiency of performing this task?
Glenn and Carey's presentation mentions that interns did this work. However, is it conceivable that by the time you implemented an exclusion guideline and accounted for the distribution of erroneous exclusions, you could accomplish the same reductions and increase in case quality without spending one red cent on an upgraded AFIS? I think that's a reasonable position to take.


hype.jpg
Dr. Borracho
Posts: 157
Joined: Sun May 03, 2015 11:40 am

Re: AFIS Scores

Post by Dr. Borracho »

I have never used "case AFIS," or as I've also heard it referred to, "closed AFIS."

But I know a couple of examiners who work in a lab that uses the process. As I understand it, they have the option of doing their comparisons in the normal manner first, i.e., compare all latents to all suspects. They claim the idents and search all exclusions and inconclusives through their desktop AFIS software against all the exemplars in the case. The idea is that the extra step is simply a quality assurance measure to minimize erroneous exclusions.

As I understand it, they also have the option of using a search of all latents against all subjects as a first step, but then concluding with a manual search of all inconclusives and exclusions against all exemplars.

Either way, the idea is that the extra step helps to minimize erroneous exclusions and, by policy, relieves the examiners of an error and a CAR if an erroneous exclusion is later discovered.
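In code terms, either ordering reduces to the same two-pass shape: a first pass over every latent, then a second pass by the other method over anything not identified. A sketch, with hypothetical callables standing in for the lab's actual AFIS and bench steps:

Code: Select all

# Two-pass QA workflow as described above. first_pass/second_pass are
# hypothetical stand-ins (manual comparison or desktop case-AFIS search,
# in either order).
def two_pass_workflow(latents, exemplars, first_pass, second_pass):
    results = {}
    for latent in latents:
        decision = first_pass(latent, exemplars)
        if decision in ("exclusion", "inconclusive"):
            # The QA step: re-check non-idents by the other method.
            decision = second_pass(latent, exemplars)
        results[latent] = decision
    return results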

So, help me understand. Either way, what's wrong with that?
"The times, they are a changin' "
-- Bob Dylan, 1964
NRivera
Posts: 138
Joined: Fri Oct 16, 2009 8:04 am
Location: Atlanta, GA

Re: AFIS Scores

Post by NRivera »

What the drunk Doctor said ^^^^^^^^^ :lol:

I still take issue with bringing "complex" latents into the conversation this way. Yes, AFIS close non-matches are a thing that happens. Yes, they can increase error rates. Yes, they warrant additional QA measures because of this. IMHO, if you wouldn't call it because it came from AFIS, you shouldn't call it on a named subject. If you call on enhanced QA measures because it came from AFIS and could be a close non-match, you should be doing the same if it's a named subject. In short, don't be a hero.
"If at first you don't succeed, skydiving was not for you."
Boyd Baumgartner
Posts: 567
Joined: Sat Aug 06, 2005 11:03 am

Re: AFIS Scores

Post by Boyd Baumgartner »

We have the MorphoTrak version of AFIS here, which has closed search, meaning you can pick specific people to search a latent against. It's basically a Case AFIS without the ability to scan in elim prints. Anecdotally, from both using it and seeing its 'hit rate' when verifying cases it's been used in, it's more of a hail-mary option than a strategy. As I said above, it's used more when there is some sort of ambiguity in the print: the orientation or the area of the hand it came from can't be determined, or it hasn't been identified and fails to meet the standards for an AFIS-quality print, but there are subjects who have been identified in the case. Considering that my experience with it has not been all that fruitful, I just don't see it justifying the added work; that's it. I could pull the numbers out of the Morpho Reporting tool that came with the system.

If someone wants to pull the numbers from their AFIS on Hit decisions that come from that kind of closed search and post them, I'd like to see what percentage of runs and what percentage of hits they actually represent.
Boyd Baumgartner
Posts: 567
Joined: Sat Aug 06, 2005 11:03 am

Re: AFIS Scores

Post by Boyd Baumgartner »

I queried our Morpho Reporting tool and found for 2017, we ran 186 Palms and 213 Fingers (399) through the closed search process. We got 11 palm hits and 9 finger hits for a combined hit rate of 5%. I can't tell you where in the workflow they were launched (Upfront to subjects in the case or Post Non-Closed Search)

Closed searches represented 4.65% of the 4574 total searches we launched
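For clarity, the combined hit rate is just:

Code: Select all

# Closed-search hit rate for 2017, from the figures above.
palm_hits, finger_hits = 11, 9
closed_searches = 186 + 213                     # 399 palms + fingers
print(f"{(palm_hits + finger_hits) / closed_searches:.1%}")  # 5.0%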
Steve Everist
Site Admin
Posts: 551
Joined: Sun Jul 03, 2005 4:27 pm
Location: Bellevue, WA

Re: AFIS Scores

Post by Steve Everist »

Boyd Baumgartner wrote: Tue Oct 23, 2018 10:34 am I queried our Morpho Reporting tool and found for 2017, we ran 186 Palms and 213 Fingers (399) through the closed search process. We got 11 palm hits and 9 finger hits for a combined hit rate of 5%. I can't tell you where in the workflow they were launched (Upfront to subjects in the case or Post Non-Closed Search)

Closed searches represented 4.65% of the 4574 total searches we launched
As one of the few people who used this feature, I can tell you that mine were not run upfront. They would have been searches done on the more ambiguous prints left in the case, to see if they could be located rather than called inconclusive (due to ambiguity), which is one of our conclusion options (when a print isn't located and doesn't meet our exclusion guidelines to be excluded).
Steve E.
NRivera
Posts: 138
Joined: Fri Oct 16, 2009 8:04 am
Location: Atlanta, GA

Re: AFIS Scores

Post by NRivera »

That is a lot of AFIS searches compared to our shop. What is your "suitable for AFIS" criteria, if you have one? How many latents of value total did you have for that same year? We tallied just shy of 5,300 latents in 2017, but only a fraction of those were run through AFIS.
"If at first you don't succeed, skydiving was not for you."
Boyd Baumgartner
Posts: 567
Joined: Sat Aug 06, 2005 11:03 am

Re: AFIS Scores

Post by Boyd Baumgartner »

In 2017 we worked 4151 Cases with a total of 14346 Impressions of Value (AFIS or Subject Value), 1478 Individualizations that came directly from AFIS (approximate hit source: 80% local, 19% NGI, 1% WIN), 29128 Total Evaluations, and 1238 distinct subjects Identified.

Our current minimum AFIS standards are:

Required Run:
8-12 objective features with additional non-Galton data and known area/orientation
OR
12 objective features with known area/orientation, without additional non-Galton data
OR
16 objective features and unknown area/orientation

Examiner's discretion:
8-12 objective features with no additional non-Galton data

Edit: I forgot to mention that the impression to be run is likely to be part of the capture of a standard ten-print/palm card.
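For illustration, those standards encode cleanly as a pair of predicates. A hedged sketch — "objective features" and "non-Galton data" are the terms from the list above, while the parameter names and the exact threshold readings are my assumptions:

Code: Select all

# Minimum-standards rules from the list above. Parameter names and the
# >= readings of "8-12" / "12" / "16" are assumptions.
def afis_run_required(features, non_galton, area_known):
    """True when the impression meets one of the three 'Required Run' rules."""
    if area_known and non_galton and 8 <= features <= 12:
        return True
    if area_known and features >= 12:
        return True
    return features >= 16                 # unknown area/orientation

def afis_run_discretionary(features, non_galton):
    """True when the run is left to the examiner's discretion."""
    return 8 <= features <= 12 and not non_galton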
antonroland
Posts: 252
Joined: Fri Feb 01, 2008 5:20 am
Location: Port Elizabeth, South Africa

Re: AFIS Scores

Post by antonroland »

Can anyone tell me why those hit scores are actually even visible in AFIS software?

What purpose do they serve, other than being the source of discussions such as these?
Make a difference day by day, case by case. If you don't make a difference you don't count.