Of Scores and Simulations
Evidence-based practices in criminal law proceedings are rapidly gaining traction. Some high-level officials and scholars have voiced concerns about their use in sentencing, as the empirical turn in judicial decision-making continues to gain momentum. A significant portion of the A2J Lab’s mission is to take the practice community’s concerns seriously and then rigorously evaluate solutions to procedural problems.
One of our signature studies, soon to be launched in the field, will examine a pretrial risk assessment tool. The stakes are no less serious at that stage than at sentencing. Unwarranted incarceration before conviction can have a number of staggering consequences, including interfering with arrestees’ livelihoods, disrupting family relationships, and straining local law enforcement resources. Risk assessment tools hold the promise of mitigating those harms by scoring an arrestee’s propensity for failing to appear or recidivating. Only those prone to misbehave should be kept behind bars; everyone else should be released (perhaps with supervision) until disposition.
In the forthcoming RCT, we will randomize whether judges at initial appearance receive the results of a pretrial scoring mechanism. In an RCT under consideration at another site, we would double randomize risk score provision: some judges receive it in half of their cases, whereas others never or always receive it.
The risk scoring mechanism doesn’t just spit out numbers; it also offers a recommended course of action. Our objective is to determine whether failure rates decrease significantly conditional on using the scores. As we look ahead to eventual data analysis, I have conducted several simulations that speak to a threshold question: what is the chance that we would observe a significant effect assuming that the risk score mechanism generates one? This is not a practical question. It’s a statistical one and leads us on a quest for power. (No, not this kind.)
The simulations, which are the basis for future scholarship, have already taught us valuable lessons. The first is that pretrial risk scores should be most useful when the arrestee population is divided starkly into two groups, one not at risk of failing and one extremely likely to foul up. Such a population might look like this histogram.
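For readers who want to play with the idea, here is a minimal sketch of one way to generate a starkly divided population. It is not the code behind our simulations. The 50/50 split, the Beta(2, 18) and Beta(18, 2) shapes, and the function names are all assumptions chosen purely for illustration.

```python
import random

def sample_risk(rng, share_high=0.5):
    """Draw one arrestee's probability of failure from a two-group mixture.

    Assumed shapes: Beta(18, 2) for the high-risk group (mean 0.9) and
    Beta(2, 18) for the low-risk group (mean 0.1).
    """
    if rng.random() < share_high:
        return rng.betavariate(18, 2)
    return rng.betavariate(2, 18)

def sample_population(n, seed=0, share_high=0.5):
    """Return a list of n simulated failure probabilities."""
    rng = random.Random(seed)
    return [sample_risk(rng, share_high) for _ in range(n)]

def histogram(risks, bins=10):
    """Count how many risks fall into each equal-width bin on [0, 1]."""
    counts = [0] * bins
    for r in risks:
        counts[min(int(r * bins), bins - 1)] += 1
    return counts

if __name__ == "__main__":
    risks = sample_population(4000)
    for i, count in enumerate(histogram(risks)):
        # One '#' per 40 arrestees gives a crude text histogram.
        print(f"{i / 10:.1f}-{(i + 1) / 10:.1f} {'#' * (count // 40)}")
```

Under these assumed shapes, nearly all of the mass sits at the two ends of the range and the middle bins are close to empty, which is the two-humped picture described above.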
I then model the risk score as helping judges separate the signal from the noise. Even judges, human as they are, will err. Or arrestees (quite obviously) might have better information than judges about their next move. The risk score cuts through the informational thicket so that only the safe bets are released. After the simulation runs (and I mean runs . . . 18,000 times per set of assumptions), we get something like this figure:
Applied statisticians usually declare victory when power is at least 80%. This simulation, with 4000 made-up arrestees, didn’t make the cut. (The highest power achieved is about 22% when we posit that the risk score reduces noise by 90%.) If we keep the same assumptions about risk profiles but increase the number of arrestees to 25,000, the story is much different in the next figure.
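The logic of a power simulation can be sketched in a few lines. What follows is a toy version under stated assumptions, not the model that produced our figures: judges see true risk plus Gaussian noise, the score shrinks that noise by a fixed fraction, anyone whose perceived risk falls below 0.5 is released, and the two arms are compared on failure rates among released arrestees with a two-proportion z-test. Every name and parameter here (`sigma`, `reduction`, `threshold`, the Beta shapes) is hypothetical, so the resulting numbers should not be expected to match the ones reported above.

```python
import math
import random

def draw_risk(rng):
    """Assumed bimodal population: half high risk, half low risk."""
    if rng.random() < 0.5:
        return rng.betavariate(18, 2)
    return rng.betavariate(2, 18)

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for the difference between two proportions."""
    if n1 == 0 or n2 == 0:
        return 0.0
    pooled = (x1 + x2) / (n1 + n2)
    if pooled <= 0.0 or pooled >= 1.0:
        return 0.0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (x1 / n1 - x2 / n2) / se

def run_trial(rng, n, sigma, reduction, threshold=0.5):
    """Simulate one experiment with n arrestees and return its z statistic."""
    failures = [0, 0]
    released = [0, 0]
    for _ in range(n):
        arm = rng.randrange(2)  # 0 = no score, 1 = judge sees the score
        risk = draw_risk(rng)
        sd = sigma * (1 - reduction) if arm == 1 else sigma
        perceived = risk + rng.gauss(0, sd)  # judge's noisy read of risk
        if perceived < threshold:
            released[arm] += 1
            if rng.random() < risk:  # released arrestee fails
                failures[arm] += 1
    return two_proportion_z(failures[0], released[0], failures[1], released[1])

def estimate_power(n, sigma, reduction, reps=200, seed=0, z_crit=1.96):
    """Share of simulated experiments that reject at the two-sided 5% level."""
    rng = random.Random(seed)
    hits = sum(abs(run_trial(rng, n, sigma, reduction)) > z_crit
               for _ in range(reps))
    return hits / reps

if __name__ == "__main__":
    for n in (4000, 25000):
        print(n, estimate_power(n, sigma=0.2, reduction=0.9, reps=100))
```

Power is simply the fraction of simulated experiments in which the test rejects. Changing `sigma` (how noisy judges are without the score), `reduction` (how much the score helps), and `n` (case volume) shows how sensitive that fraction is to each assumption.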
Voilà! We have power. But we would need to randomize a jurisdiction with roughly 8000 arrestees per year for three years and hope that risk scores are incredibly effective in reducing informational static to get the statistics on our side.
The point of this work is not to pass judgment on pretrial risk assessments or on a field experiment involving them. Rather, our simulations point out the sorts of questions that researchers and policymakers should consider before rendering their own verdicts. Based on the math, risk scores will lead to more measurable effects on criminal procedure when: (1) judges are less able to accurately assess risk themselves; (2) arrestees are quite certain to either behave or recidivate after initial appearance; and (3) case volume in a jurisdiction is very high. We have yet to launch the upcoming RCT, much less analyze the results. But these purely hypothetical simulations already have us (and hopefully you) thinking harder about how to assess pretrial risk assessment scores.