Overview
In recent years, algorithm-driven policies and recommendations have become a ubiquitous and integral part of our society, from online shopping to job screening. Driven in part by this transformative change, the academic literature on optimal policy learning has flourished. The increasing availability of granular data about individuals at scale means that the use of these new methodologies will only continue to grow.
This study focuses on safe policy learning for pre-trial risk assessments. It defines a policy as “safe” if it does not lead to worse outcomes than the status quo on average. Pre-trial risk assessment instruments (RAIs) provide simple scores classifying the risk that an arrested individual will fail to appear in court or commit a crime, and may thereby help judges decide whether to detain that individual before trial or to release them under varying levels of conditions. Because arrestees are presumed innocent, among other reasons, it is important to avoid unnecessary incarceration. Pre-trial RAIs have become increasingly prominent in criminal proceedings: they are used in at least 23 states and affect the lives of thousands of people every day.
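One way to make this definition precise (in stylized notation, not necessarily the paper’s own): writing V(\pi) for the expected utility of outcomes under a policy \pi, a new policy is safe relative to the status quo policy \pi_0 if

    V(\pi) = \mathbb{E}\bigl[\, u(\pi(X), Y) \,\bigr] \;\ge\; V(\pi_0),

where X denotes an individual’s observed characteristics, Y the outcome, and u the policy-maker’s utility function.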
The authors argue that existing studies of RAIs have been asking the wrong question: they focus on how RAIs have affected the rate of failure to appear or of pre-trial crime, when they should investigate how RAIs improve or worsen judges’ decision-making. Unlike a pill, the intervention – the RAI in this case – is not given to the person whose behavior it is meant to change, but rather to a third party, the judge. Thus, the effectiveness of the intervention must be measured by changes in judicial decisions. But evaluating such changes in a randomized controlled trial would require randomizing the RAI scores given to judges and measuring their responses, a prohibited act, as it entails deliberately providing judges with wrong information.
How, then, can we learn better algorithmic assessment policies? Unfortunately, prior methods for policy learning are not applicable because they require existing policies to be “stochastic” (meaning they vary randomly) rather than deterministic (meaning each set of inputs produces only one output value). This study develops a robust optimization approach that partially identifies the expected utility of a policy and then finds an optimal policy by minimizing the “worst-case regret” (meaning the largest possible loss in utility, relative to the status quo, that the data cannot rule out). The resulting policy is conservative but carries a statistical safety guarantee, allowing the policy-maker to limit the probability of producing a worse outcome than the existing policy. The researchers extend this approach to common and important settings where humans make decisions with the aid of algorithmic recommendations, such as judges using RAIs. They derive new classification and recommendation rules that retain the transparency and interpretability of the existing instrument while potentially leading to better overall outcomes at a lower cost.
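In the same stylized notation, the regret of a candidate policy \pi relative to the status quo is R(\pi) = V(\pi_0) - V(\pi). Because a deterministic status quo only partially identifies V(\pi), the data are consistent with a whole set \mathcal{F} of underlying distributions, and the approach selects the policy whose regret is smallest in the worst case over that set:

    \hat{\pi} \;=\; \arg\min_{\pi \in \Pi} \; \max_{F \in \mathcal{F}} \; \bigl[\, V(\pi_0; F) - V(\pi; F) \,\bigr],

where \Pi is the class of candidate policies and V(\,\cdot\,; F) denotes expected utility computed under distribution F. A policy chosen this way does as well as possible under the least favorable distribution the data cannot rule out, which is what makes it conservative.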
Pre-trial Risk Assessment
The study’s methodology is motivated by a popular pre-trial risk assessment instrument called the Public Safety Assessment (PSA), which is used in Dane County, Wisconsin, among many other jurisdictions.
The PSA consists of classification scores based on the risk that each arrestee will engage in three types of risky behavior: (i) failing to appear in court (FTA), (ii) committing a new criminal activity (NCA), and (iii) committing a new violent criminal activity (NVCA). Judges balance these risks against the cost of incarceration when making their pre-trial release decisions.
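As a stylized illustration of this trade-off (the notation and cost parameters are illustrative, not the paper’s), the utility of a release-or-detain decision a for an arrestee whose eventual behavior is y could be written as

    u(a, y) \;=\; -\,c_{\mathrm{FTA}}\,\mathbf{1}\{\mathrm{FTA}\} \;-\; c_{\mathrm{NCA}}\,\mathbf{1}\{\mathrm{NCA}\} \;-\; c_{\mathrm{NVCA}}\,\mathbf{1}\{\mathrm{NVCA}\} \;-\; c_{\mathrm{detain}}\,\mathbf{1}\{a = \mathrm{detain}\},

where the c terms encode the relative costs the policy-maker assigns to each type of failure and to detention itself.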
The PSA consists of separate scores for FTA, NCA, and NVCA risks, based on nine risk factors. Importantly, the only demographic factor used is the arrestee’s age; other characteristics, such as gender and race, are not used. The remaining risk factors cover the current offense and pending charges as well as criminal history, which is based on prior convictions and prior failures to appear. Each score is constructed by taking a linear combination of the underlying risk factors and thresholding the integer-weighted sum. For the sake of transparency, policy makers have made these weights and thresholds publicly available (see https://advancingpretrial.org/psa/factors).
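This construction can be sketched in a few lines of Python. The factor names, weights, and cut-points below are hypothetical placeholders chosen only to illustrate the weighted-sum-and-threshold structure; the actual published values are available at the URL above.

    # PSA-style scoring sketch: an integer-weighted sum of risk factors
    # mapped to an ordinal score via fixed thresholds. All weights and
    # cut-points here are hypothetical, NOT the published PSA values.

    def raw_score(factors, weights):
        """Integer-weighted sum of the arrestee's risk factors."""
        return sum(weights[name] * value for name, value in factors.items())

    def to_scale(raw, cutpoints):
        """Map the raw sum to a coarse score (1 = lowest risk)."""
        return 1 + sum(raw >= c for c in cutpoints)

    # Hypothetical FTA-style score for one arrestee.
    weights = {"pending_charge": 1, "prior_conviction": 1, "prior_fta_recent": 2}
    factors = {"pending_charge": 1, "prior_conviction": 0, "prior_fta_recent": 1}
    fta_score = to_scale(raw_score(factors, weights), cutpoints=[1, 3, 5])  # -> 3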
Developing a Safe Policy
The study’s primary goal is to construct new algorithmic scoring and recommendation rules that could lead to a higher overall expected utility than the status quo rules, while retaining a high level of transparency, interpretability, and robustness. Although there are many factors besides the risk assessment instrument that affect the judge’s decision and the arrestee’s behavior, the study focuses on changing the existing algorithms rather than those other factors.
The researchers develop their optimal safe policy approach in two parts. First, to construct a safe policy in the population (i.e., with an infinite number of samples), they analyze the population optimization problem that defines such a policy. They give concrete examples to build intuition before describing the methodology in greater generality, and then establish several theoretical properties of the approach.
Second, because an infinite amount of data is never available in practice, the population safe policy cannot be computed directly. The researchers therefore show how to learn an empirical safe policy from observed data of finite sample size.
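A generic version of this finite-sample step might look like the following sketch. The function names and the form of the slack term are illustrative assumptions, not the paper’s implementation: estimate each candidate’s worst-case regret, inflate it by a concentration term, and keep the status quo unless an improvement can be certified.

    # Sketch of learning an empirical safe policy from n observations.
    # `regret_hat(pi)` is assumed to return an estimate of pi's worst-case
    # regret relative to the status quo; the sqrt(log(1/delta)/n) slack is
    # a generic concentration term, not the paper's exact bound.

    import math

    def deviation(n, delta, scale=1.0):
        """Generic finite-sample slack for a (1 - delta)-level guarantee."""
        return scale * math.sqrt(math.log(1.0 / delta) / n)

    def empirical_safe_policy(candidates, regret_hat, status_quo, n, delta=0.05):
        best = min(candidates, key=regret_hat)
        if regret_hat(best) + deviation(n, delta) <= 0:
            return best        # improvement certified with prob. >= 1 - delta
        return status_quo      # cannot certify safety; keep the existing policy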
Applying the Safe Policy to Pre-trial Risk Assessment
The study applied this optimization methodology to the PSA. Given the highly technical nature of the findings, they are not summarized in detail here. However, they are relevant to policy makers and technologists, as they suggest an opportunity for improving the existing scoring rules.
Future Research
There are several avenues for future research. First, there are implementation choices under the proposed approach: while we consider several representative cases, many other structural assumptions would lead to different forms of extrapolation.
Second, similar statistical tools can be used to create tests of safety for given policies. By constructing a worst-case upper bound on the regret of a policy relative to the status quo, we can test whether a proposed new policy is an improvement over the existing one.
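Concretely (again in stylized notation), given an estimate \widehat{R}_{\max}(\pi) of the worst-case regret of \pi and a one-sided (1 - \alpha)-level confidence term c_\alpha, such a test would declare \pi safe whenever the resulting upper bound is non-positive:

    \widehat{R}_{\max}(\pi) + c_\alpha \;\le\; 0 \quad \Longrightarrow \quad \text{declare } \pi \text{ safe at level } 1 - \alpha.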
Third, there are many ways in which optimal algorithmic recommendations may differ when considering long-term societal outcomes rather than short-term ones. For example, pre-trial detention driven by a risk assessment recommendation may alter the long-term behavior and welfare of an arrestee. Understanding how to design algorithms when they affect long-term future outcomes is key to ensuring that recommendations do not take a myopic view. One potential way to incorporate long-term outcomes may be through the use of surrogate measures.
Finally, within the robust optimization framework, the notion of “safety” can be considerably expanded. In this paper, we consider policies to be safe if they do not lead to worse outcomes on average; however, this does not guarantee that outcomes do not worsen for particular subgroups.
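One natural extension (again in stylized notation, as an assumption about how this might be formalized) would require the safety condition to hold within every subgroup of interest:

    \mathbb{E}\bigl[\, u(\pi(X), Y) \mid G = g \,\bigr] \;\ge\; \mathbb{E}\bigl[\, u(\pi_0(X), Y) \mid G = g \,\bigr] \quad \text{for all subgroups } g,

at the cost of a more conservative policy, since each added constraint shrinks the set of certifiable improvements.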