Who shaves the barber? - Bertrand Russell
Bertrand Russell, the British mathematician and philosopher, showed that a popular logical framework of his day, known as set theory, suffered from a self-referential paradox. Namely, that there is no such thing as a set of all sets, because such a set would necessarily need to contain itself, but could not.
It was stated as follows:
"Imagine a town where there is a rule: everyone has to be clean-shaven, and the barber is the only person who can shave you. But the barber has a special rule: he only shaves people who do not shave themselves.

So, everyone in the town has to be clean-shaven, but no one can shave themselves.

So, who shaves the barber?"

This is a paradox because it describes a situation that cannot exist.
This same paradox rears its head from time to time, though reformulated not in the domain of mathematical set theory but in the socio-political sphere. No doubt you’ve read a newspaper headline intimating something like ‘Who polices the police?’ And now it seems that this dilemma has spilled over into the socio-forensic disciplines.
Who examines the Examiner?
These questions pose the same problem as Bertrand Russell’s barber paradox: an infinite regress. It takes sense and turns it into nonsense very quickly. Let’s take a look...
If I say, ‘I’m an expert’, then who is qualified to judge my expertise? And if you grant that there is someone qualified to judge my expertise, who qualifies to judge the expertise of the expert on expertise? I won’t belabor the example, but you get the point: sense-making becomes nonsense at an accelerating rate. This is the essence of infinite regress. It poses solutions that cause problems that need solutions that cause problems, never resolving. In the case about to be discussed we’d call this a grift, because it involves a stream of money and prestige in which the same people causing problems offer solutions, write papers, and form committees that find new problems, to which they offer solutions, write papers, and form committees, blissfully unaware that there are people laughing at the emperor’s proverbial new clothes.
Considering this is a post about fingerprints, let’s get to it. In 2009, it came to light that the Houston Police Department’s (HPD) fingerprint unit was having some problems, some of them potentially criminal.
https://www.chron.com/news/houston-texa ... 622998.php
Houston hired an outside auditor, a company by the name of Ron Smith and Associates (RSA/RS&A). The auditor was hired to check the work of the people thought to be involved in problematic casework. The scope of the contract was later extended to working down a backlog of about 6,000 cases, to the tune of around 5 million dollars.
The written findings of the audit were published by the City of Houston.
https://www.houstontx.gov/police/audit/ ... tPrint.pdf
The findings of the initial audit state:
“It should be noted that, generally speaking, the most significant error which can be found in friction ridge comparisons is an “erroneous identification”, which is the incorrect determination that two areas of friction ridge impressions originated from the same source."
The findings go on to say:
“Based upon the previously established criteria, there were however a significant amount of technical errors which may, or may not have had an impact on the investigations which were represented by these cases”
The technical errors as defined by Ron Smith and Associates included:
“Cases which were reported as not being ‘sufficient for further analysis’, when in fact they did indeed contain latent prints which were sufficient for comparison purposes.” (we’ll come back to this)
It appears that the auditor examines the examiner.
Who audits the Auditor?
During the period in which Ron Smith and Associates was performing backlog reduction, they were asked to compare prints as part of a review of a cold case from 2001, a case that went to appeal based in part on the fingerprint evidence and ultimately resulted in an ethics investigation.
A longer version of the case synopsis given as part of the appeal can be found here:
https://cases.justia.com/texas/first-co ... 1499127228
An abbreviated synopsis of the case is as follows:
2001: the murder of a woman named Herbert, known to be a prostitute, whose body was found in an alley with what were presumed to be bloody fingerprints both on and near her body. The Houston Police Department responded to the scene and was able to enhance and preserve the latent prints via photography, as well as preserve the portion of the actual item that contained the palm prints (a metal pole). Fingernail clippings and semen stains were also preserved by the Harris County Medical Examiner. HPD investigated the case and exhausted its leads with no viable suspects.
2006: DNA from the fingernail clippings yielded a mixed sample from two males, with a major and a minor contributor. DNA from the semen had a single contributor. The DNA profiles were created by an outside laboratory.
2009: CODIS was searched using the DNA profiles and two individuals were identified: Joseph Webster from the mixed sample taken from the fingernail clippings, and Lorenzo Jones from the semen stains. Both individuals admitted to soliciting prostitutes but denied being involved in the murder.
2010: HPD Cold Case Unit requests HPD Latent Print Unit to compare 51 individuals including the 2 individuals developed as part of the CODIS hits. No identifications were found.
2011: Ron Smith and Associates as part of the contract work for the city of Houston is asked to re-examine the prints and compare them against the 51 individuals again. No identifications were found.
2012: Detective Holbrook, the original detective on the case in 2001 returns to HPD homicide, re-reviews the case and per the appeal “instructed Ron Smith to reexamine their prints, believing that the bloody print was of sufficient quality to render an identification”. (We’ll come back to this statement as well). Ron Smith and Associates find similarities, ask for better quality prints from Webster and subsequently make an identification.
2013: HPD asks for the poles containing potential blood and bloody prints to be processed for 1) confirmation of blood and 2) DNA samples. Blood was not confirmed, nor were DNA profiles obtained.
2015: The DNA profile taken from Webster was compared to the minor contributor DNA profile from the fingernail clippings produced in 2006. Webster could not be ruled out as a contributor, with the chance that the DNA profile came from a random person at about 0.43% (1 in 230).
2015: A new DNA profile is obtained from the fingernail clippings using an advanced testing technique and compared to Joseph Webster. The result is that Webster could not be ruled out as a contributor, with the chance that a random person could have the same DNA profile at about 0.0000015% (1 in 68 million).
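As a sanity check on the figures above, converting "1 in N" odds into a percentage is simple arithmetic. The sketch below is my own, not anything from the court record; only the two 1-in-N odds are taken from the appeal as quoted:

```python
def odds_to_percent(one_in_n: float) -> float:
    """Convert '1 in N' odds into a percentage."""
    return 100.0 / one_in_n

# 2015 comparison against the minor-contributor profile
print(f"1 in 230        -> {odds_to_percent(230):.2f}%")  # ~0.43%
# 2015 advanced-technique profile
print(f"1 in 68 million -> {odds_to_percent(68_000_000):.7f}%")
```

The 1-in-230 figure does correspond to about 0.43%, confirming the arithmetic in the record.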
As a result of the DNA found underneath Herbert's fingernails and the identification of Webster as the source of the bloody palm-print, Webster was indicted and tried for Herbert's murder.
During the trial, the State presented evidence that:
• Herbert was a prostitute who worked downtown, and that Webster admitted to possibly having sex with her;
• Herbert's body was found in a narrow and secluded gated alleyway that would not normally be used by the public—a location where someone might reasonably be expected to lure a prostitute he intended to kill;
• the DNA underneath Herbert's fingernails was Webster's—indicating that Herbert attempted to defend herself from Webster or at least had direct physical contact with him before she died;
• the palm-print found next to Herbert's bloody body was Webster's—placing Webster at the scene of the crime, crouched right in front of the narrow stairwell and Herbert's body; and
• the palm-print was made in blood—indicating that Webster moved or had some sort of contact with Herbert's body after she was dead.
Webster was convicted.
Upon appeal, Webster’s counsel argued that the evidence was insufficient to convict. Specifically, as it relates to the latent print evidence, Webster argued that the examiners at HPD and the RSA analysts suffered from bias, since they missed the identification the first time and only made an identification after they were allegedly aware of the DNA hit to Webster.
As a result of the allegations of bias by Webster’s counsel, a complaint was submitted to the Texas Forensic Science Commission (TFSC). The complaint was accepted for investigation at the Commission’s 4/21/2022 meeting.
The Commission offered to:
“evaluate the integrity and reliability of the forensic analysis, offer recommendations for best practices in the discipline of latent print analysis, and make other appropriate recommendations.”
https://www.txcourts.gov/media/1455318/ ... 22-1-1.pdf
This is what the Commission is tasked with doing by law, although a more thorough version of that task would demand more ethical rigor. As written, it comes across more as an ethnography than an investigation.
The Commission is charged with the preparation of a written report that contains:
(1) observations of the commission regarding the integrity and reliability of the forensic analysis conducted;
(2) best practices identified by the commission during the course of the investigation; or
(3) other recommendations that are relevant, as determined by the commission.

It appears the Commission audits the auditor.
Who does the Commission commission?
The TFSC had provided relatively scant information on the status of the ethics complaint, other than brief updates in quarterly meeting minutes posted on its site. One such document, from a 09/15/2022 meeting, states:
“The Commission will contract with Friction Ridge expert Glenn Langenburg to assist the Commission in the development of its report that will highlight key issues and recommendations that address the historical background and evolution of the friction ridge discipline, identify methods for avoiding cognitive bias in the discipline, and recommend changes for a positive impact on the field in Texas and nationwide.”
https://www.txcourts.gov/media/1455833/ ... 1522-3.pdf
Then, just recently, on 10/20/2023, the Forensic Science Commission released a video of its quarterly meeting discussing a forthcoming 73-page full report on the complaint. In the video they thank:
- Glenn Langenburg
- Henry Swofford
- OSAC Program Office
- Texas DPS Latent Print Advisory board
It appears that the Commission commissions experts. It seems we have come full circle.
How do we investigate the ethics of an ethics investigation?
Since we’ve come full circle with the inclusion of new experts who have been brought in to scrutinize the ethics of the original experts, we’re in infinite regress. Who will scrutinize the ethics of these experts? Let’s put the microscope on the experts before we address the complaint. Surely, we wouldn’t want an ethics investigation to have compromised ethics, right?
Glenn Langenburg teaches for Ron Smith and Associates. Someone who is paid by an agency under investigation, and who stands to lose payment or prestige, presents at the very minimum the appearance of an ethical dilemma. He does, however, have at least 13 years of latent print examination under his belt working in an agency.
https://www.ronsmithandassociates.com/p ... urg_CV.pdf
Henry Swofford is more of a bureaucrat than a bench examiner. If you look at his resume, it is not apparent that he has any significant latent print casework experience; the majority of his forensic bench experience comes from blood alcohol content testing. While he has held chair positions on standards boards, management positions, and pursued research, he seems like an odd pick for an actual analysis.
https://dfs.dc.gov/sites/default/files/ ... ord_CV.pdf
This is evident in a leaked document associated with the case which states:
“Henry Swofford did an informal review of the palm print on the pole and could not identify it as belonging to Webster.”
Why wouldn’t Swofford perform a detailed analysis of the print? And why is his inability to identify the print indicative of anything? The answer, as stated above, is that this is not a person who has spent any significant time looking at fingerprints, let alone complex ones in dispute. The fact that he qualifies for, let alone holds, certifications from the International Association for Identification in Latent Fingerprints, Footwear, and Crime Scene Investigation is a discussion for another time, but it would seem that the IAI is little more than a puppy mill for forensic certifications.
Mr. Swofford seems to be at odds with himself over what he deems appropriate techniques. From the leaked document:
“Mr. Swofford analyzed the quality of the palm print itself using the newly adopted best practice recommendations made by the OSAC Friction Ridge Subcommittee in September 2020. He took a copy of the latent print and feature markings of the original RSA examined and analyzed the quality using LQMetric software.”
Compare what he did, namely use LQMetric software, with what he has published about LQMetric software. According to his Ph.D. thesis, LQMetric is:
“...geared entirely towards optimizing or predicting AFIS match performance rather than focused on assessing local ridge clarity (discernibility of feature data) and predicting human performance using image quality attributes considered by human analysts during manual comparisons. Consequently, these types of predictive models are often based on the aggregate of qualitative and quantitative attributes of the entire impression to provide a single estimate of utility or quality. These approaches often lack transparency and often do not necessarily correspond to the same features considered by human analysts during traditional examinations. The motivation behind this focus is largely driven by industry desires to optimize the performance of AFIS in a “lights-out” environment."

https://serval.unil.ch/resource/serva ... 01/REF.pdf
He’s saying two things here. First, a holistic metric is inappropriate because the algorithm is measuring AFIS viability, which stands in opposition to what examiners look at, namely individual features. Second, such tools and models are biased towards the performance of a product, not towards assessing the quality of a print, which is what the tool purports to measure. However, the thesis is also offering a product, one that directly competes with LQMetric. Would it be fair to characterize Swofford as also having a motivation, an industry desire, to optimize his proceeds from the licensing of a software product?
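The criticism of aggregate, single-number quality scores can be illustrated with a toy example. This is entirely my own construction with invented clarity values, not anything from LQMetric itself: two impressions can share the same holistic score while differing completely in the local clarity an examiner actually relies on.

```python
import statistics

# Hypothetical local-clarity maps (0 = unusable, 1 = pristine), one value per
# region of an impression. The values are invented for illustration.
uniform_print = [0.5] * 8              # mediocre clarity everywhere
patchy_print  = [1.0] * 4 + [0.0] * 4  # half pristine, half unusable

for name, clarity in [("uniform", uniform_print), ("patchy", patchy_print)]:
    aggregate = statistics.mean(clarity)     # a holistic, single-number score
    usable = sum(c >= 0.8 for c in clarity)  # regions clear enough to mark features
    print(f"{name}: aggregate={aggregate:.2f}, usable regions={usable}")
```

Both impressions score an identical 0.50 in the aggregate, yet one offers four regions of usable detail and the other none; the single number hides exactly what a human analyst needs to know.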
Salesmanship is not a new concept for Swofford and Langenburg, however. Both have a financial incentive outside their use as consultants in this case. Each has a statistical model which, should it be used to reconcile a dispute, would act as a pseudo-validation of that model, which could then be incorporated into a ‘best practice’ paradigm. That is to say, there is something to be gained by means of bureaucracy, not science.
Swofford model (FRstat): https://forensiccoe.org/statistical-int ... ns-frstat/
Langenburg model (Xena): https://assets.publishing.service.gov.u ... ite_up.pdf
There have been accusations of pseudoscience and snake oil between the two camps previously.
https://docplayer.net/170998794-Defence ... rstat.html
Furthermore, given Swofford’s seat on the Friction Ridge Subcommittee and the American Standards Board of the AAFS his influence in defining ‘best practices’ is problematic at best. Is it any surprise that statistical models have made their way into the conversation even though no model has been validated for use or demonstrated to make more accurate conclusions?
Even the latent print community has grown skeptical of what are, in effect, bureaucrats pushing agendas for personal gain. In a recent request-for-comments period held by the American Academy of Forensic Sciences Standards Board, the community pushed back when statistical models were proposed.
“May use statistical or probabilistic systems is meaningless at this point in time. Does this mean FRStats or Xena? Could it mean an 8 point standard versus a 12‐point standard? Until a published, peer‐reviewed, validated, and accepted "probabilistic system" has been recognized, it should not be left wide open to interpretation.” (pg 27 of 55)
https://www.aafs.org/sites/default/file ... und01.pdf
Did you catch that? A standard was being proposed that was so vague that non-validated statistical models, developed by people with influence over the standards board, could be admitted. This is not science or best practices; it’s lobbying.
So an investigation into an ethics violation by experts reveals at least the potential for an ethics violation by the investigating experts, through perverse incentives: personal gain, even at the expense of truth.
Complaints about the complaint
Next, we need to look at the complaint. We already know that the complaint alleged the conclusion reached was not reproducible and therefore not reliable, and that there was bias based upon the DNA hit to Webster.
It’s out of fashion to blame examiners and fire them for mistakes. If we’ve learned anything from past high-profile errors like the Brandon Mayfield case, it’s that investigators will instead blame 1) the images, 2) the process, and 3) a boogeyman like bias, which is unmeasurable but somehow in everything, everywhere, all at once, and only retroactively inferred.
From the Mayfield case:
“The FBI issued a rare public apology after Mayfield’s release — but maintained the error was due to the low resolution of the print.”
“Second, the OIG examined whether the FBI's verification procedures contributed to the error. FBI procedures require that every identification be verified by a second examiner.”
https://oig.justice.gov/sites/default/f ... /final.pdf (pg 10)
Everywhere:

“However, whether Mayfield's religion was a factor in the Laboratory's failure to revisit its identification and discover the error in the weeks following the initial identification is a more difficult question.” (pg 12)
“However, under procedures in place at the time of the Mayfield identification, the verifier was aware that an identification had already been made by a prior FBI examiner at the time he was requested to conduct the verification. Critics of this procedure assert that it may contribute to the expectation that the second examiner will concur with his colleague.” (pg 10)
All at once:
“The OIG found that a significant cause of the misidentification was that the LPU examiners' interpretation of some features in LFP17 was adjusted or influenced by reasoning "backward" from features that were visible in the known prints of Mayfield. This bias is sometimes referred to as "circular reasoning," and is an important pitfall to be avoided.”
https://oig.justice.gov/sites/default/f ... final.pdf
So what do we see in the ethics investigation?
According to the quarterly meeting:
“TFSC staff forwarded the complaint to RS&A, and they raised questions about the quality of images the blind examiners were given”

And:

“RS&A maintained ‘any competent examiner’ would reach an identification conclusion so there must have been an issue with the images the blind examiners used"
We know that there were two sets of images taken of the latent prints: the film photographs taken at the scene of the crime, and digital images taken by the DEA, which had the actual poles on which the prints were found.
The TFSC’s presentation offers a red herring of sorts when it states:
“The other set of photos of L-1 were taken by the DEA using digital photography. Digital photography was in its infancy at the time and the images were taken at a low resolution 384 ppi. (Today’s standard is 1000 ppi)"
This statement, however, betrays ignorance of digital imaging and illustrates the problem with appeals to ‘best practices’ when those practices are established by fiat rather than evidence. Let’s take a look. Since the TFSC thanks the Organization of Scientific Area Committees for Forensic Science (OSAC), we’ll go to OSAC’s digital imaging standards and see what they say about resolution.
“5.1 The procedure described in this document is in accordance with current SWGFAST guidelines (6), as well as National Institute of Standards and Technology (NIST) standard (7), which specify 1000 pixels per inch (ppi) at 1:1 as the minimum scanning resolution for latent print evidence. This standard appears primarily to be historical and directed towards scanners, rather than cameras, though recent studies suggest that it is suitable for capturing Level 3 detail (8).”
“5.2 While the 1000 ppi resolution standard permits the capture of level three detail in latent prints, it does not mean that any image recorded at a lower resolution would necessarily be of no value for comparison purposes. Such an image could have captured level two details sufficiently for comparison.”
https://compass.astm.org/document/?cont ... -US&page=1
So: the standard is historical, pertains to scanning, and primarily concerns the capture of level three detail, which would most likely not even be present in a viscous blood print.
This is an appeal to history, not science.
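The resolution argument can be made concrete with back-of-the-envelope sampling arithmetic. In the sketch below, the ~0.5 mm ridge period and ~0.1 mm pore size are typical ballpark values I am assuming from the literature, not figures taken from the standard:

```python
MM_PER_INCH = 25.4

# Assumed typical feature scales (ballpark values, not from the standard):
RIDGE_PERIOD_MM = 0.5  # level-2 detail (minutiae) rides on ridges at this scale
PORE_SIZE_MM    = 0.1  # level-3 detail (pores, ridge edge shapes)

def pixels_per_feature(ppi: int, feature_mm: float) -> float:
    """How many pixels span one feature of the given size at a given resolution."""
    return ppi * feature_mm / MM_PER_INCH

for ppi in (384, 1000):
    ridge = pixels_per_feature(ppi, RIDGE_PERIOD_MM)
    pore = pixels_per_feature(ppi, PORE_SIZE_MM)
    print(f"{ppi:4d} ppi: {ridge:4.1f} px per ridge period, {pore:3.1f} px per pore")
```

At 384 ppi a ridge period spans roughly seven to eight pixels, comfortably enough to resolve level-two minutiae, while a pore spans well under two pixels. That is consistent with the standard's own caveat: a lower-resolution image sacrifices level-three detail but can still capture level-two detail sufficient for comparison.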
The TFSC presentation states:
“The report emphasizes the importance of a linear sequential approach to the ACE-V process in which documentation of the features in the questioned impression occurs prior to an examination of a known exemplar.”
It’s fashionable to say this, and was stated both in the OIG report on the Mayfield error and in the PCAST report but is it true? Like, based in science, true?
This very topic was researched in 2015 in a paper titled ‘Changes in latent fingerprint examiners’ markup between analysis and comparison’.
The paper admits:
“.... the details of how to document analysis and comparison are mostly unspecified, and SWGFAST's standards are unenforced, leaving the details to be sorted out by agency standard operating procedures or by the examiners’ judgments.”
The discussion section of the paper finds no correlation between changes in markup and error, nor an error rate different from that of the black box study.
“We observed frequent changes in markups of latents between the Analysis and Comparison phases. All examiners revised at least some markups during the Comparison phase, and almost all examiners changed their markup of minutiae in the majority of comparisons when they individualized. However, the mere occurrence of deleted or added minutiae during Comparison is not an indication of error: most changes were not associated with erroneous conclusions; the error rates on this test were similar to those we reported previously” 
https://noblis.org/wp-content/uploads/2 ... _Final.pdf
The footnote associated with that quote is from a study titled ‘Repeatability and Reproducibility of Decisions by Latent Fingerprint Examiners’.
https://journals.plos.org/plosone/artic ... ne.0032800
There is a sleight of hand in the complaint and the investigation, if you look closely: the allegation of bias. The allegation states the bias is due to Webster being identified via a DNA hit, and NOT from a non-blind verification or from circular reasoning via the exemplars, as was alleged in the Mayfield error. There are two problems with this. First, RS&A missed the identification the first time, when Webster was included among the 51 subjects. If inclusion of Webster was biasing, the implications are twofold: 1) any investigative lead into a suspect developed by a detective is biasing, and 2) overwhelming the examiner with 51 names was more biasing, because the identification was missed. Neither of these issues is addressed, nor is the mechanism by which getting a DNA hit suddenly makes minutiae appear that weren’t there.
“The complaint incorporated information that the crime scene mark was submitted to other examiners along with relevant known exemplars in blind examinations and the blind examinations reached an ‘inconclusive’ conclusion,” and that “[t]he blind examiners include an independent latent print examiner and two HFSC latent print examiners”
The TFSC presentation says that they took into consideration some national reports on forensic science and ‘pertinent empirical research in friction ridge examination’ such as the Noblis black box study. So what does ‘the pertinent empirical research’ say about blind verifications?
Before we get to the Noblis study, what does Glenn Langenburg’s own research into bias state?
“The results showed that fingerprint experts were influenced by contextual information during fingerprint comparisons, but not towards making errors. Instead, fingerprint experts under the biasing conditions provided significantly fewer definitive and erroneous conclusions than the control group.”
https://onlinelibrary.wiley.com/doi/abs ... 09.01025.x
Unless the blind examinations were provided in the context of an unworked case (a near impossibility, given that the case was decades old and film, CDs, old reports, etc., would have been included), the examiners would have known the comparison they were given would be scrutinized, and that itself introduces bias. According to Langenburg’s own research, when this happens, ‘significantly fewer definitive and erroneous conclusions’ are the result. Read: ‘inconclusive’.
This is exactly what the blind examinations found, meaning a methodological error on the TFSC’s part, in how the blind examinations were introduced, is at least as likely as a problem with RS&A or the print.
Further highlighting the deficiency of the investigation is the following research, which shows that collaboration reduces bias in fingerprint conclusions, as opposed to pitting conclusions against each other (examiner vs. blind examiner) and thereby reifying reproducibility. Pooling decisions reflects a wisdom-of-the-crowds approach to complex comparisons.
From the article:
“That is, the pooling of decisions systematically decreases the number of situations where decision-making agents disagree and increases the number of situations where they agree. Pooling decisions thus also reduces outcome variation between decision-making systems at the level of individual cases.”
https://www.sciencedirect.com/science/a ... via%3Dihub
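The pooling effect the quoted article describes can be illustrated with a toy simulation. This is entirely my own sketch, not the paper's model: each examiner observes the same case-level signal plus independent noise, decisions are pooled by majority vote, and we measure how often two independent decision-making systems disagree on the same cases.

```python
import random

def disagreement_rate(pool_size: int, n_cases: int = 20000,
                      noise: float = 1.0, seed: int = 1) -> float:
    """How often two independent pools of examiners disagree on the same cases."""
    rng = random.Random(seed)
    disagree = 0
    for _ in range(n_cases):
        signal = rng.gauss(0.0, 1.0)  # latent similarity/quality for this case

        def pooled_decision() -> bool:
            # Each examiner sees the signal plus independent noise; majority vote.
            votes = sum(rng.gauss(signal, noise) > 0.0 for _ in range(pool_size))
            return 2 * votes > pool_size

        disagree += pooled_decision() != pooled_decision()
    return disagree / n_cases

print(f"single examiner vs single examiner: {disagreement_rate(1):.3f}")
print(f"3-examiner pool vs 3-examiner pool: {disagreement_rate(3):.3f}")
```

Under this model the pooled systems disagree with each other noticeably less often than lone examiners do, which is the paper's point: pooling decisions reduces outcome variation between decision-making systems at the level of individual cases.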
Even previous research, what is commonly referred to as the Noblis White Box study, recognizes that:
“Blind verification can be expected to be effective in detecting most errors and flagging debatable decisions, and should not be limited to individualization decisions.”
“Examiner assessments of difficulty may be useful in targeted quality control, which could focus on difficult decisions: operating procedures could provide means for an examiner to indicate when a particular decision is complex. Quality control measures, however, should not focus solely on difficult decisions, since even easy or obvious decisions were not always repeated or reproduced.”
“Metrics derived from the quality and quantity of features used in making a decision may assist examiners in preventing mistakes, and in making appropriate decisions in complex comparisons. Such metrics may be used to flag complex decisions that should go through additional quality assurance review and in arbitration of disagreements between examiners.”
“Procedures for detailed documentation of the features used in analysis or comparison decisions could be used to assist in arbitrating inter-examiner disagreements at the feature level.”
“Repeatability and reproducibility are useful surrogate measures of the appropriateness of decisions when there is no “correct” decision, as when deciding between individualization and inconclusive. The reproducibility of decisions has operational relevance in situations where more than one examiner makes a decision on the same prints. Reproducibility as assessed in our study can be seen as an estimate of the effects of blind verification– not consulting or non-blind verification.”
The last quote is especially pertinent given that the TFSC explicitly states in the presentation:

“Important to note neither the complaint nor the report allege the identification was wrong, as we do not have ground truth for the sample”.

Simply stated, there is no ‘correct’ decision when deciding between individualization and inconclusive, as explicitly stated in the study above. If ever there were a prescription, this is it, and the TFSC and their ‘experts’ failed miserably.
Based upon best practices informed by the research, blind examinations were an inappropriate investigative method here.
The Final Analysis:
“Incompetence annoys me. Overconfidence terrifies me.” - Malcolm Gladwell
The TFSC stated that the result of their investigation was to:
“...identify methods for avoiding cognitive bias in the discipline and recommend changes for a positive impact on the field in Texas and nationwide.”
A couple of things to note. The TFSC is not a standards board or a research arm, nor does it fund research. Furthermore, if these goals were attainable, you would think they would be requirements of accreditation or licensure of experts in the state of Texas, considering that the TFSC has been responsible for both since 2015. Lastly, that the TFSC thinks it could actually accomplish such discipline-wide goals while clearly being illiterate in the current scientific literature, lacking funding for additional research, and choosing to drape itself in snake-oil experts and fashionable policy prescriptions comes across as pathological.
It would seem that the TFSC is more concerned with the appearance of competence than with its actual practice. Not even mentioned in the presentation is the fact that the cold case detective is stated to have told Ron Smith and Associates that the impression could be identified. If bias were a concern, having police staff dictate what is and is not of identification value should be quite concerning, especially given that the TFSC is in charge of licensing experts. This was explicit in the appeals record:
2012: Detective Holbrook, the original detective on the case in 2001, returns to HPD homicide, re-reviews the case and, per the appeal, “instructed Ron Smith to reexamine their prints, believing that the bloody print was of sufficient quality to render an identification”. Ron Smith and Associates find similarities, ask for better quality prints from Webster and subsequently make an identification.
Additionally, we know an error was made, as it was the very type of error that Ron Smith and Associates documented in its audit of the Houston Police Department.
“Based upon the previously established criteria, there were however a significant amount of technical errors which may, or may not have had an impact on the investigations which were represented by these cases”
The technical errors as defined by Ron Smith and Associates included:
“Cases which were reported as not containing any latent print identifications in which there was a latent print identification” (pg 4)
In summary: an ethics investigation sidestepped the question of truth by hiring experts with questionable credentials and perverse incentives, ignored the science, and published a self-congratulatory work that overlooks deficits it has the authority to change through accreditation, and that will have zero effect on the fingerprint industry other than contempt for those who wrote it. While it comes across as the house that Jack built, it is really a house of cards.
When magical thinking is involved, anything is possible.