Fear and Loathing over Risk Assessments Part 2

How Should We Think about Racial Disparities?

In a previous post, I considered some of the less convincing critiques of pretrial and sentencing risk assessments that sound in the ecological fallacy. That argument mistakenly faults risk scores for merely applying group-level inferences to individual case decisions. The takeaway was straightforward: a comprehensive understanding of actuarial tools must include rigorous counterfactual thinking about a state of the world in which they aren’t available. In this follow-up, I discuss an even more serious claim: that actuarial tools might lead to unjustifiable racial disparities in criminal justice outcomes.


The ProPublica piece to which I linked before focuses on the troubling implications of racial imbalances in scores and predictive accuracy. The article’s opening vignettes compare a black teenage girl who stole a bicycle with a middle-aged white man who stole hardware from a Home Depot. Importantly, he had prior armed robbery convictions, whereas she had no record. The proprietary scoring algorithm known as COMPAS deemed the young girl a high-risk individual and her older counterpart a low-risk one. And yet: “Two years later, we know the computer algorithm got it exactly backward. [The girl] has not been charged with any new crimes. [The man] is serving an eight-year prison term for subsequently breaking into a warehouse and stealing thousands of dollars’ worth of electronics.” (emphasis added) Errors of this sort, what statisticians call Type I and Type II errors, respectively, deserve further scrutiny. But two vignettes are, after all, merely anecdata.

Four researchers set out to crunch the numbers and published both an academic study and the general-audience piece. Using data from Broward County, FL, their initial analysis showed that black and white defendants received different COMPAS scores, even after adjusting for other defendant characteristics. A number of reasons, varying in legitimacy, might explain the differential. For example, COMPAS might rely on inputs that themselves are tainted by systemic racism. But a score differential alone does not establish that the risk assessment is biased. Justifying that claim requires comparing recidivism outcomes by race, conditional on the risk score assigned. For example, if white and black defendants assigned a moderate risk score recommit crimes at the same rate, we would say the mechanism does not introduce racial bias.
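That conditional comparison, sometimes called a calibration test, is easy to illustrate with simulated data. In this sketch the group labels, risk bands, and reoffense probabilities are all invented: the true probability of reoffending depends only on the assigned band, so within each band the two groups should fail at roughly the same rate.

```python
# Minimal sketch of the calibration test described above. All groups,
# bands, and probabilities are invented for illustration.
import random

random.seed(0)

# True reoffense probability depends only on the risk band, not group.
BANDS = {"low": 0.2, "moderate": 0.45, "high": 0.7}

def simulate(group, n=30_000):
    """Simulate n defendants from one group under band-only risk."""
    rows = []
    for _ in range(n):
        band = random.choice(list(BANDS))
        reoffended = random.random() < BANDS[band]
        rows.append((group, band, reoffended))
    return rows

data = simulate("white") + simulate("black")

# Conditional recidivism rate: P(reoffend | band, group). If these
# rates match across groups within each band, the score is calibrated
# and, by this definition, introduces no racial bias.
for band in BANDS:
    for group in ("white", "black"):
        subset = [r for r in data if r[0] == group and r[1] == band]
        rate = sum(r[2] for r in subset) / len(subset)
        print(f"{band:>8} {group:>5}: {rate:.2f}")
```

Run on this simulated data, the within-band rates for the two groups land close together (and near the band’s true probability), which is exactly the pattern the calibration test looks for.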

The study authors then turn to these more relevant conclusions. They observed that COMPAS “correctly predicted an offender’s recidivism 61 percent of the time, but was only correct in its predictions of violent recidivism 20 percent of the time” and that it “correctly predicted recidivism for black and white defendants at roughly the same rate (59 percent for white defendants, and 63 percent for black defendants) but made mistakes in very different ways. It misclassifies the white and black defendants differently when examined over a two-year follow-up period.” Results based on simple comparison tests by race were robust to covariates such as criminal history, age, and gender. But are they robust to more exacting interpretations?

Take the ProPublica contingency tables. These numbers ostensibly reveal the extent of Type I and Type II errors in a meaningful way. The mismatch story ensues because black defendants who did not reoffend were much more likely to have been labeled high-risk than white defendants who did not reoffend; the opposite held for low-risk labels among those who did reoffend. But as Jennifer Doleac and Megan Stevenson point out, these frequency tabulations carry only limited value. Why? If underlying recidivism rates differ by race, then these ratios necessarily will be misleading.

The reason is purely mathematical. As Doleac & Stevenson remind us, “[i]n a group with high recidivism rates, the numerator will be larger because the pool of people labeled high risk is bigger and the denominator will be smaller because there are fewer people who do not reoffend. The result is that the ratio of these numbers is always larger than it is for low-recidivism groups.” (emphasis added) Much like the inherent problems with reporting odds ratios in empirical work, reporting false positive/negative rates can obscure the issue and impede interpretation. Christopher Lowenkamp and his colleagues adopted a better approach. In part, they ran regressions using variables that reflect precisely the test I mentioned above: “interaction terms between an individual’s race and the [COMPAS] decile score.” Using this framework, Lowenkamp et al. found “no significant differences in . . . relationship between the COMPAS and general recidivism for White and Black defendants. A given COMPAS score translates into roughly the same likelihood of recidivism, whether a defendant is Black or White.” (emphasis added)
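The arithmetic behind the Doleac and Stevenson point can be worked through directly. In the sketch below, all numbers are invented: the score is perfectly calibrated for both groups (a “high” label means a 70% chance of reoffending and a “low” label a 30% chance, regardless of group), and the groups differ only in what share of defendants receive the high label. The false positive rate still diverges.

```python
# Deterministic sketch: a perfectly calibrated score still produces a
# higher false positive rate for the group with the higher underlying
# recidivism rate. All probabilities here are invented for illustration.

P_FAIL_HIGH = 0.7   # P(reoffend | labeled high risk), both groups
P_FAIL_LOW = 0.3    # P(reoffend | labeled low risk), both groups

def false_positive_rate(share_high):
    """FPR = P(labeled high risk | did not reoffend)."""
    share_low = 1 - share_high
    # Overall recidivism (base) rate implied by the label mix.
    base_rate = share_high * P_FAIL_HIGH + share_low * P_FAIL_LOW
    # Numerator: labeled high risk but did not reoffend.
    high_no_fail = share_high * (1 - P_FAIL_HIGH)
    # Denominator: everyone who did not reoffend.
    return high_no_fail / (1 - base_rate)

# Group A: 30% labeled high risk; Group B: 60% labeled high risk.
print(round(false_positive_rate(0.3), 3))  # 0.155
print(round(false_positive_rate(0.6), 3))  # 0.391
```

Same score, same conditional accuracy, yet the higher-base-rate group’s false positive rate is more than double the other’s, just as the quoted numerator/denominator logic predicts.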

I do not mean to suggest, far from it, that we should ignore differential sentencing outcomes by race. (The same is true for all prior procedural decisions in a criminal case.) It is important, though, to distinguish differential outcomes (e.g., different risk assessment scores) by race from bias on account of race. The two are not necessarily equivalent. In addition, measuring bias itself depends on the counterfactual reference point that the researcher or policymaker identifies for comparative purposes. As with the need for counterfactual thinking, puzzling through the relationship between race and risk assessments raises tough questions. The Lab will continue to ask them through our PSA field work and hopefully generate useful evidence in response.

2 thoughts on “Fear and Loathing over Risk Assessments Part 2”

  • Is the tool being used to a) predict recidivism or b) reduce recidivism? These are different goals/aims. Isn’t it harder to create reduction-oriented risk tools? For example, holding race fixed, a person who is released into a community with better support for reentry (read: a wealthier community with good public transport, lots of support groups, and lots of nonprofits that work with people coming back into society and help with common issues like child support, job counseling, and affordable housing options) will do better than a person of the same race released into a neighborhood with limited or no reentry support systems, limited jobs, limited transportation, and little nonprofit support to deal with ancillary needs like child support, job counseling, and affordable housing. So do recidivism tools take into account the type of neighborhood to which those who have served their time are released? If race impacts the resources in a region, does that not feed back into the recidivism risk assessment tool? This story on NPR illustrated this http://www.npr.org/2017/05/16/528587632/after-6-prison-terms-a-former-inmate-helps-other-women-rebuild-their-lives when Susan talks about how different it is between South LA and Santa Monica b/c Santa Monica has more resources. So if the tool is one to prevent recidivism, how do you take race out of it, if race is such a strong factor in community resources and approach to crime?

    • These are important questions.

      1. The tool is designed to classify defendants by the risk they pose to both public safety (via new criminal activity) and the court (via failures to appear). It’s not intended per se to predict the probability of failure after release so much as to provide judges with a relative sense of how “risky” it would be to release the defendant.
      2. The consequential hope among the PSA’s developers is that, by classifying defendants better by their risk of failure (rather than, primarily, the severity of the current charge), the rate of failures to appear and recidivism should both decline (in addition to the length of incarceration before case disposition).
      3. As for community circumstances, this variation is a great reason why the PSA should be evaluated in a diverse array of states and counties. Some counties have pretrial services to increase the chance of a successful post-release outcome. Many do not. The PSA expects that the court will, for higher-risk defendants, at the very least require phone or in-person check-ins, if not more services.

      At the end of the day, the PSA (but not necessarily every risk assessment instrument) was developed using a national sample of cases in an attempt to be as inclusive and representative as possible. Race undoubtedly correlates with criminal justice outcomes, but a good risk assessment tool should a) not use race as a risk factor; b) be neutral with respect to how the risk scores predict the relative likelihood of failing post-release; and c) drive all adopting jurisdictions to create more robust pretrial services divisions.
