
Guest Post: Evaluating Make It Right

Today’s guest post is authored by Katy Weinstein Miller, Chief of Programs & Initiatives at the San Francisco District Attorney’s Office.

In 2013, San Francisco District Attorney George Gascón launched a new approach to handling juvenile delinquency.  Rather than prosecute young people accused of certain felony offenses, the office began offering them the opportunity to participate in “restorative community conferencing” – a facilitated, community-based conversation with the person they harmed, leading to an agreed plan for addressing that harm.  This model, called Make It Right, is an important step for San Francisco and for the field of criminal justice.  At a time when our juvenile caseload is at historic lows but our racial and ethnic disparities are at historic highs, we need new ways to address crime, promote healing, and make our community safer.

The implementation of Make It Right presented an opportunity – and in DA Gascón's view, an obligation – to rigorously research the effectiveness of the program through a randomized controlled trial (RCT). Our justice system has long operated based on precedent and gut instinct, with little attention to studying results. While often at odds, prosecutors, defense counsel, and judges have shared a reluctance to engage in research that impacts the way they handle their cases. To be sure, this is understandable for professionals who have been trained to give each case, and each client, individualized consideration.

RCTs present heightened ethical concerns for justice system stakeholders, particularly for diversion programs.  Random assignment requires us to deny the opportunity for some young people, but not others, to avoid prosecution and potentially alter their life course.  Conversely, it denies some victims the established protections of the court system.  Both our defendants and those they have harmed are disproportionately vulnerable populations.  Our gut tells us that restorative models can yield better outcomes than traditional prosecution – for both the young person and the victim – but without research, we just don’t know if that’s true.  The fact that our system disproportionately impacts vulnerable individuals in high stakes situations should underscore, not undercut, the need to employ rigorous methods to determine what works.

While logistical challenges can often derail RCTs in the justice sector, Make It Right's design makes it well-suited for random assignment. Our Juvenile Unit Managing Attorney reviews all of San Francisco's juvenile cases, promoting uniformity in charging decisions and clarity about Make It Right program eligibility. Following a three-step process, she determines (1) whether the case is chargeable; (2) whether the presenting offense is eligible for the program; and (3) whether the youth is ineligible to participate due to certain factors (such as geographic limitations and prior record/current probation status). All cases flagged as eligible for the program are forwarded to our Juvenile Division Office Manager, who uses a randomized block method to assign the case to either the treatment or the control group. In each block of 10 cases, 7 are assigned to treatment and 3 to control. If a case is randomized into the treatment group, our Office Manager directly refers the case to our nonprofit partners, who offer the program to the young person and victim and facilitate the restorative process. If the case is randomized into the control group, the Office Manager prepares the charging documents for filing in court. The randomization process has yielded an unexpected benefit: because our Managing Attorney can only refer cases that she is prepared to prosecute, it ensures that she is not using Make It Right to "widen the net" of young people involved in our justice system – which is often a negative effect of implementing diversion programs.
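For readers curious about the mechanics, a randomized block procedure like the one described above can be sketched in a few lines of code. This is our illustration of the 7:3 blocked design, not the Office Manager's actual system, and the case IDs are invented:

```python
import random

def assign_blocks(case_ids, block_size=10, n_treatment=7, seed=None):
    """Randomized block assignment: within each block of `block_size`
    consecutive cases, exactly `n_treatment` go to treatment and the
    rest to control, in a random order."""
    rng = random.Random(seed)
    assignments = {}
    for start in range(0, len(case_ids), block_size):
        block = case_ids[start:start + block_size]
        labels = (["treatment"] * n_treatment
                  + ["control"] * (block_size - n_treatment))
        rng.shuffle(labels)
        # zip() truncates the label pool for a final partial block.
        for case_id, label in zip(block, labels):
            assignments[case_id] = label
    return assignments

# Thirty cases form three full blocks, so exactly 21 go to treatment.
result = assign_blocks(list(range(1, 31)), seed=42)
print(sum(1 for v in result.values() if v == "treatment"))  # 21
```

Blocking guarantees the 7:3 ratio holds exactly within every group of ten consecutive cases, rather than only in expectation over the whole caseload.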

For us, the hardest part of the Make It Right RCT is waiting for the results.  Preliminary findings are strongly encouraging – but the small scale of Make It Right means it is taking time to yield statistically significant findings.  The patience required to conduct rigorous research stands in direct contrast to our sense of urgency to reform the justice system – but we know that the results of that research will enable all of us to make more meaningful, effective change.

The Make It Right program is a partnership of the San Francisco District Attorney's Office, the nonprofits Community Works West and Huckleberry Youth Programs, and the research and innovation center Impact Justice. The program is under evaluation by the California Policy Lab at UC Berkeley's Goldman School of Public Policy.

Previewing and Reviewing Pretrial Risk Assessment RCTs

On Tuesday, Jan. 16, pretrial staff in Polk County, Iowa entered their offices with a slightly different charge. They had been accustomed to perusing a list of arrestees scheduled for first appearance and searching for individuals who qualified for an interview and pre-disposition release. That morning, some staff members continued this time- and resource-intensive practice. Others reviewed administrative records and entered nine risk factors into a new software system that calculates PSA risk scores (hopefully familiar to readers of this blog). Polk County is the first jurisdiction in Iowa to implement the PSA. Three more counties will join them in the coming months as pilot sites, and eventually the entire state will adopt it.

As the A2J Lab looks ahead to launching its second RCT evaluation of the PSA, we came across a study of its progenitor, the Virginia Pretrial Risk Assessment Instrument (“VPRAI”). When the VPRAI arrived in courtrooms around the state, there was no way to convert risk predictions into actionable release recommendations. (That fact stands in stark contrast to the Decision-Making Framework accompanying the PSA.) The solution was the Praxis, “a decision grid that uses the VPRAI risk level and the charge category to determine the appropriate release type and level of supervision.” Virginia pretrial staff also embraced the so-called Strategies for Effective Pretrial Supervision (“STEPS”) program to “shift the focus . . . from conditions compliance to criminogenic needs and eliciting prosocial behavior.” The combination of these innovations, it seemed, would improve Virginia’s ability to pinpoint risk and reduce failure rates during the pre-disposition period.

Marie VanNostrand of Luminosity and two co-authors were interested in understanding, first, the VPRAI’s predictive value. Second, they assessed the benefits of the Praxis and STEPS program through a randomized study design. Unlike the A2J Lab’s field experiments, which usually take individuals as the units of randomization, the Virginia study randomized entire pretrial services offices to one of four conditions: (1) VPRAI only; (2) VPRAI + Praxis; (3) VPRAI + STEPS; and (4) VPRAI + Praxis + STEPS. The authors then used this exogenous (nerd speak for “completely external”) source of variation to analyze staff, judicial, and defendant responses.
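Cluster randomization of this kind can be sketched simply. The office names and the round-robin balancing below are our assumptions for illustration; the Virginia study's actual assignment mechanism may have differed:

```python
import random

# The four study arms: a 2x2 factorial of Praxis and STEPS.
CONDITIONS = ["VPRAI only", "VPRAI + Praxis",
              "VPRAI + STEPS", "VPRAI + Praxis + STEPS"]

def randomize_offices(offices, seed=None):
    """Cluster-randomize whole offices (not individual defendants) to
    the four conditions, dealing them round-robin after a shuffle so
    the arms stay as balanced as possible."""
    rng = random.Random(seed)
    shuffled = list(offices)
    rng.shuffle(shuffled)
    return {office: CONDITIONS[i % len(CONDITIONS)]
            for i, office in enumerate(shuffled)}

arms = randomize_offices([f"office_{i}" for i in range(8)], seed=3)
```

Because whole offices are the units, everyone processed by a given office experiences the same condition, which keeps the intervention operationally coherent at the cost of a smaller effective sample size.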

The results were quite favorable for the introduction of the Praxis as well as for the VPRAI itself. One estimate suggested that higher VPRAI risk scores correlate strongly with higher actual risk. About two-thirds of the time, if one were to pick two defendants at random (one who failed and one who didn't), the one who failed would have a higher VPRAI score. Pretrial services staff who had access to the Praxis also responded to its recommendations. Their concurrence (agreement) rate was 80%, and they were over twice as likely to recommend release relative to staff who did not have the decision grid. Next, the availability of the Praxis (versus not having it) was associated with a doubling of the likelihood that judges would release defendants before disposition.
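That "two-thirds of the time" figure is the concordance probability, better known as the AUC of the risk score. A toy illustration of computing it directly from the definition (the scores here are invented, not VPRAI data):

```python
from itertools import product

def concordance(scores_failed, scores_ok):
    """P(a randomly chosen defendant who failed has a higher score
    than a randomly chosen one who didn't), counting ties as half.
    This equals the AUC of the score used as a classifier."""
    pairs = list(product(scores_failed, scores_ok))
    wins = sum(1.0 if f > s else 0.5 if f == s else 0.0
               for f, s in pairs)
    return wins / len(pairs)

print(round(concordance([3, 4, 5], [1, 2, 3]), 3))  # 0.944
```

A score with concordance near 0.67 discriminates about as well as the VPRAI estimate described above; 0.5 would be no better than a coin flip.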

What about defendant outcomes? The authors found that the availability of the Praxis was associated with a lower likelihood of failing to appear or being arrested for a new crime. STEPS alone had no discernible effect.

The VPRAI study suggests a few lessons for our ongoing pretrial risk assessment work, including in Iowa. First, we continue to emphasize that the tool under investigation, the PSA, is far from a cold, lawless automaton, as many commentators seem to worry. Yes, algorithms produce scores, and decision matrices generate recommendations. But human beings must still consider that evidence alongside their own human judgment. One hope is that such evidence will enhance the quality of judges' decision-making. For now, we just don't know; that's the reason for our PSA RCTs. Relatedly, we think that final verdicts on actuarial risk assessments should await reports like the VPRAI study and the A2J Lab's growing portfolio of evaluations. There will always be local policy issues deserving of debate and attention. However, we need strong evidence for or against these tools' value before praising or condemning them wholesale. Finally, we should, as always, evaluate this brave new world reliably. That means deploying, where possible, principles of experimental design. RCTs, simply put, represent our best shot at understanding causal relationships.

Stay tuned for more updates from Iowa and beyond!

The Evaluation Feedback Project Has Launched

Discussions about the use of performance standards and metrics to measure the quality, effectiveness, and efficiency of legal services have become common in the access to justice community. Increasingly, legal services programs are being asked to use data to communicate about their effectiveness to funders, community stakeholders, and policy makers. And, more importantly, by grounding decisions about legal assistance in evidence-based approaches, we will all be better prepared to determine how best to assist people in need.

Some service providers are responding to this call to implement better evaluation methods by designating an attorney or other administrative staff to manage surveying, the collection and analysis of administrative data, and collaboration with others to conduct needs assessments and impact analyses. However, many do not have a background in program evaluation, and there currently exists no organized, national resource for facilitating collaboration or the sharing of information across legal programs on this topic.

The access to justice community can do a lot to collaborate rather than each program reinventing the evaluation wheel. To facilitate the sharing of knowledge and expertise in an effort to grow evaluation capacity among our peers, the A2J Lab has partnered with Rachel Perry (Principal, Strategic Data Analytics) and Kelly Shaw-Sutherland (Manager of Research and Evaluation, Legal Aid of Nebraska) to launch a project that seeks to match programs that are working to develop evaluation instruments (e.g., client surveys, interview and focus group protocols) with experts who volunteer to provide feedback on the design of these tools. The volunteers are our own peers from the field who have done work in this arena, as well as a network of trained evaluation experts, many of whom have experience with evaluation in other fields.

Here’s how the project works:
1. A program or individual submits an evaluation tool for feedback;
2. We determine if the submission falls within the scope of this project;
3. We match the submission with 1-3 evaluators from a volunteer database;
4. Volunteers review the evaluation tool and provide feedback to the original submitter.

A secondary goal of this project is to create more of a community of data- and evaluation-oriented folks within the access to justice world. So, we encourage all of you to get involved! Check out the project page to learn more, submit an evaluation instrument to receive feedback, or volunteer to provide feedback to other programs working on developing evaluation tools.

Happy New Year!

Happy (nearly) New Year from all of us here at the Lab!

We’re excited for all we accomplished in 2017. This past year, we’ve seen the Lab grow in size and impact.

We now have over 6,360 participants enrolled in the Lab’s evaluations. We’re collaborating with 38 partners, including court systems, legal aid organizations, and other academic institutions. Over 75 student team members, along with our staff, have developed over 1,850 pages of self-help materials, as well as two digital self-help tools, to test for efficacy as we seek to learn the best way to help pro se defendants.

As the Lab runs more and more studies, our impact increases—and so do our costs.

In 2018, we’re hoping to double the number of studies we have in the field, but we can’t do it without your support.

If you’re thinking about making any final gifts in 2017, would you consider making a contribution to help the Lab continue to learn the best ways to help people with legal problems? Your gift will be put to immediate use in support of the Lab’s mission.

We look forward to sharing more news of our work in 2018!

The Ethics of Randomization

We get a lot of questions about the ethics of randomized controlled trials in the legal profession. The questions from our potential study partners go something like this: Is it ethical to let something other than our professional judgment determine who gets services and who doesn't, even if it is for the sake of research? And, is it ethical to conduct such research when the stakes can be so high for our study participants?

We have answers.

First of all, we take very seriously our commitment to research that meets not only the standards set by Harvard University’s Committee on the Use of Human Subjects, but also a broader set of ethical norms from other fields conducting research on human subjects.

But we get it. Many people have ethical concerns that go beyond these standards. The study participants are often particularly vulnerable populations and many legal service providers are in this profession specifically to help people in need. Many of these cases involve life events for which the stakes are high: safety, shelter, health, and so forth. Access to legal services in many cases can end up being a critical lifeline.

So let’s talk about when the conditions are right for ethical randomization in the legal services context.

One important contextual factor within which most of the A2J Lab's studies exist is resource scarcity. You are all likely acutely aware of the tenuous resources in the legal services world: federal and state funding levels cause significant anxiety every year, revenues from Interest on Lawyer Trust Accounts continue to decline, and courts are facing budget cuts that are driving some of them to reduce hours of operation. The unmet legal needs are significant. Stanford Law professor Deborah Rhode estimates that four-fifths of the civil legal needs of the poor remain unmet.[1] The Legal Services Corporation estimates that 85% of the civil legal problems faced by low-income Americans receive inadequate or no legal help.[2]

In this resource-constrained context we cannot provide services to everyone, and some mechanism must determine who receives services and who does not. This mechanism might involve a human making triage determinations based on her professional judgment, or it might involve the distribution of resources based on a first-come/first-served basis. It also might look like a lottery, where some impartial system allocates resources and determines service recipients randomly. The point is, we are already unable to provide services to everyone. A lottery is one of several options we have for making such determinations, and, in cases where services are already distributed by lottery, is one we already use. So it is frequently ethical to use a lottery to allocate the scarce resource (certain kinds of legal help, court-sponsored mediation, etc.) that one of our studies seeks to evaluate.
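A lottery in this sense is nothing exotic; operationally it is a few lines of code. This sketch (the names are ours, purely for illustration) allocates a fixed number of service slots impartially:

```python
import random

def lottery(applicants, n_slots, seed=None):
    """Impartial lottery: every applicant has the same chance of
    receiving one of `n_slots` service offers."""
    rng = random.Random(seed)
    # If there are fewer applicants than slots, everyone is served.
    k = min(n_slots, len(applicants))
    return set(rng.sample(applicants, k))

winners = lottery(list(range(100)), 10, seed=0)
print(len(winners))  # 10
```

The key property is that no characteristic of the applicant influences the outcome, which is exactly what makes the resulting treatment and control groups comparable for research purposes.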

The other important factor is equipoise. Essentially, the concept of equipoise means that we do not already know whether the way we are allocating resources or providing services is the most effective way. Because we have no established tradition in the law for conducting rigorous research on effectiveness, we have subsisted for years based on policy preferences, professional judgments, or educated guesses rather than evidence. Thus, in all of our studies, we are operating in a state of profound uncertainty that justifies the use of a lottery (randomization) to find out what works.

Consider the chart below to think about how equipoise and resource scarcity interact. The left column shows possible positive outcomes for a person working to solve a legal issue with some less expensive form of legal assistance, say, self-help materials. The right column shows possible positive outcomes for that person were she to receive an expensive form of legal assistance, say, a traditional attorney-client relationship. The third row (in which a higher level of assistance causes a worse outcome) appears in strike-through because, we surmise, it happens too infrequently to matter and is safe to ignore. Specifically, note that in some of these hypotheticals, legal assistance changes the outcome and, in other hypotheticals, the legal intervention does not make a difference.

Note: it’s really hard to tell into which row people belong. We can make educated guesses, and research can try to help us make good ex ante predictions. But it’s hard to tell in advance.

For now, though, suppose we did know what would happen to a particular person if we gave her self-help and what would happen to her if we gave her a traditional attorney-client relationship. In a resource-rich environment, one might simply look at whether this client experiences positive outcomes after receiving maximal legal assistance and stop there. In that case, we could provide an attorney-client relationship to anyone in rows 1 and 2. When resources are scarce, however, it becomes important to determine whether, when, and how those resources are really making a difference. The first row of the chart provides the ideal scenario for legal services: a person who would not succeed without legal assistance. The other rows present cases for which an investment of scarce dollars is not ideal: a person who would have succeeded even with self-help only (row 2), or a person who will not succeed even with maximal legal assistance (row 4). Every dollar spent on a case that falls into one of the scenarios described in rows 2 or 4 is a dollar not invested in one of the scenarios in row 1.

Row | Positive Outcome for Person Receiving Self-Help | Positive Outcome for Person Receiving an Attorney-Client Relationship
1   | No  | Yes
2   | Yes | Yes
3   | Yes | No
4   | No  | No
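One way to see why the first row is the target for scarce dollars is to encode the chart as data. This is a hypothetical encoding; the field names are ours:

```python
# Each row records whether the person gets a positive outcome with
# self-help only and with a full attorney-client relationship.
chart = {
    1: {"self_help": False, "attorney": True},   # assistance changes the outcome
    2: {"self_help": True,  "attorney": True},   # succeeds either way
    3: {"self_help": True,  "attorney": False},  # struck through: assumed too rare
    4: {"self_help": False, "attorney": False},  # fails either way
}

# Full-representation dollars are ideally spent only where the outcome
# actually flips: the attorney helps and self-help alone does not.
ideal_rows = [n for n, r in chart.items()
              if r["attorney"] and not r["self_help"]]
print(ideal_rows)  # [1]
```

The filter makes the scarcity argument mechanical: rows 2 and 4 fail the test because the expensive intervention changes nothing for those people.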

The problem is, because of the lack of research-based information in the law, we don’t actually know how to identify which clients or cases will fall into which rows. Now also imagine there are multiple columns showing different levels of legal assistance, methods for delivering such services, and types of cases. Then it gets even more complicated. That’s what we need rigorous research for.

And really, if you think about it, the fact that the stakes can be high in the law generally is a reason to randomize and test, not to avoid doing so. When stakes are high, we should insist on rigorous evidence of effectiveness, not guesswork.

[1] Deborah L. Rhode, Access to Justice, Fordham Law Review.

[2] Legal Services Corporation, The Justice Gap: Measuring the Unmet Civil Legal Needs of Low-Income Americans, June 2017.

More information on our Default Part II study in four graphs

We’ve been working on some new data representations for our Problem of Default Part II study, which is now in the field in Boston. This Part II study doesn’t have its own non-intervention control group (meaning, all of the groups we’re evaluating are receiving some sort of intervention). This is because Part I already demonstrated that even limited intervention has a statistically significant effect on defendants’ answer and appearance rates compared with no intervention. Part II seeks to build on that knowledge by testing whether some interventions are more effective than others.

That said, we always like to be as thorough as possible as we design our studies. To that end, before we launched Part II, we did some analysis of existing court case data for all small claims cases filed in 2016 to gather some baseline information. We’ve created four graphs, now live on a new study web page. (If you haven’t seen the study volume tracker, that’s worth a look as well.)

The graphs contain a lot of information, and, if you’re not familiar with statistics or the intricacies of programs available in Massachusetts courts, they might be a little difficult to read.

Before we drill into an example, we have a few notes on the definitions of the different variables. One variable is whether or not a hearing for a case was scheduled on a Lawyer for the Day (LFD) program day. The Massachusetts Lawyer for the Day program is a pro bono legal service that provides some pro se advising services in some courts on certain days of the week. Exact services and availability vary between courts. Another is whether a defendant fails to appear (FTAs) at a given hearing.[1] The graphs break down data between these two variables at different courts in four different ways:

  • If a defendant ever failed to appear (FTA’d) at any hearing that was held
  • If a defendant failed to appear at their first hearing that was held
  • If the defendant’s first scheduled hearing was scheduled on a day when the Lawyer for the Day (LFD) program was happening at the court and the defendant appeared at that scheduled hearing
  • If any of the defendant’s scheduled hearings were scheduled on a day when the LFD program was happening at the court and the defendant appeared at one or more such scheduled hearings
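To make these measures concrete, here is a minimal sketch of computing the third one over toy case records. The field names and values are invented for illustration, not the study's actual schema:

```python
# Toy case records; in reality these come from court administrative data.
cases = [
    {"first_hearing_on_lfd_day": True,  "appeared_at_first": True},
    {"first_hearing_on_lfd_day": True,  "appeared_at_first": False},
    {"first_hearing_on_lfd_day": False, "appeared_at_first": True},
    {"first_hearing_on_lfd_day": False, "appeared_at_first": False},
]

# Third measure: the first hearing was scheduled on an LFD program day
# AND the defendant appeared at that hearing.
hits = sum(1 for c in cases
           if c["first_hearing_on_lfd_day"] and c["appeared_at_first"])
proportion = hits / len(cases)
print(proportion)  # 0.25
```

The other three measures are computed the same way, just with different predicates over a case's hearing history.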

Let’s take a look at an example data point:

In this example, the circled dot is the proportion of study-ineligible (noted by color) cases in Cambridge Small Claims Court (y-axis). The dot's size shows that it represents about 40% of the total cases in the court, which here works out to around 325 cases (0.4 of the court's total number of cases in the sample, 811).

The dot shows us that in almost 25% of the study-ineligible cases in Cambridge Small Claims Court, the first hearing was scheduled on a Lawyer for the Day program weekday and the defendant appeared at that hearing.

Our hope is that these graphs, along with the frequently updated study volume information, provide a window into the study’s design and progress as we move forward. Look for more updates on data from this and our other studies in early 2018.

[1] In Boston Municipal Court (Civil), the defendant FTAs if the defendant does not file an answer or does not appear at the first hearing; the defendant does not FTA if the defendant does both of those things.

In The News

Over the past few weeks, we’ve been talking about a few news stories here at the Lab. We thought they might be of interest to you as well. If you’re looking for some reading, consider the following:

Happy reading!

RCTs in law: the Shriver studies

As you may remember from a previous post, the A2J Lab is developing an RCT in Providence, Rhode Island to study the effectiveness of triage in summary eviction cases.

Part of our interest in studying eviction is that it's a topic very much on policymakers' minds. Because housing instability continues to receive a lot of attention (see, e.g., Matthew Desmond's Evicted), more resources and political action tend to follow. The even better news is that those resource allocations and policy changes have been based, at least sometimes, on empirical research.

As the Lab has documented elsewhere, access to justice interventions often aren’t studied at all. If they are, the studies are often observational rather than randomized—and readers know how important we think randomization is here at the Lab!

We’re always very excited to see the work of other legal studies teams who think so as well. One recent example is the evaluation of the California Shriver Civil Counsel Act. Among other things, the Act enhanced tenant representation in eviction cases. As part of a trial for additional funding provided by the Act, the State studied the impact of seven pilot programs designed to increase access to legal representation among low-income populations in California. More than simply moving toward empirical data, the report states that “[i]mportantly, for a limited period of time, three pilot projects randomly assigned litigants to receive Shriver full representation or no Shriver services, and data for these two groups were compared.”

We’re excited to see other researchers embrace randomized evaluations and policymakers appreciate their findings. The ultimate hope is that this progress continues to transform the legal profession into a more evidence-based one.


Welcome to the Lab’s newest members!

We hope that all of our U.S. readers enjoyed a break over Thanksgiving. As we mark the beginning of December, we’re thrilled to introduce you to two new staff members who joined us this fall.

Our new Associate Director of Research Innovations, April Faith-Slaker, joins us after serving in a variety of access to justice positions, most recently as the Director of the Resource Center for Access to Justice Initiatives at the American Bar Association. April is working with current and potential new partners across the country to develop RCTs on a variety of access to justice topics. If you’ve been considering designing an RCT for one of your programs, please let her know. She’d love to connect with you.

Sandy North, the Lab’s Associate Director for Administration, is responsible for a variety of projects, including, usually, the blog! If there’s a story you’d like to share or any feedback you’d like to offer, she’d love to hear from you.

We’re excited to have the Lab grow as we seek to fulfill our mission, and we look forward to sharing updates on the work of all of our staff in the coming months!

Why RCTs? Recent study on stents is one example

This past week, we’ve been avidly watching reactions to a new study, published in The Lancet, about the efficacy of using stents to help patients with chest pain. The New York Times ran an article on the study; so did The Atlantic.

If you haven’t been following this (potential) bombshell of an RCT, the study found no value in using stents to combat heart pain. Why is this such big news? Partially because using stents for cardiac pain is big business. According to the study’s authors, more than 500,000 patients receive the procedure annually for chest discomfort.

It’s also big news because it goes against intuition, even the sort that medical laypeople possess. Without evidence to the contrary, it might seem logical that opening blocked arteries with a stent would reduce chest pain. No wonder doctors adopted the practice with vigor! Now there are data that don’t back up that perception. Even in medicine, a field long conditioned to accepting the validity of empirical research, studies will bump up against the fallacy of conventional wisdom.

That fact doesn’t surprise us at the A2J Lab. What did grab our attention is that the authors received permission to run the study at all. As we mentioned in a recent post, all RCTs in the U.S. need to receive institutional approval before human subjects can enroll in a study. Based on our experience, it would be fairly startling if this type of study, which flies so baldly in the face of “conventional wisdom,” were to receive approval in the United States. An ethical review committee could have responded that this evaluation would prevent some participants from receiving a “benefit,” namely the treatment they “need.” The deeper held the belief, the harder it is to accept or allow the introduction of contrary evidence. That’s why we need to test interventions rigorously, particularly when resources are scarce and lives are at stake.

One final note on the study's design. Some medical researchers have critiqued the study as vulnerable to "Type II error." In short, they contend that the sample size (in this case, about 200) is too small to rule out false negatives. Having a sufficient sample size is an important component of any RCT. The Lab, for example, uses power analysis to maximize the chance that a study will have enough observations to detect an effect, should that effect really exist. But a study's sample size isn't the only factor that determines its validity; it's also important to know how generalizable the results are, regardless of their statistical significance.
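For two-arm studies with a binary outcome, a standard normal-approximation power calculation gives a feel for why a couple hundred subjects can be too few. The failure rates below are invented for illustration; they are not taken from the stent study or from the Lab's work:

```python
import math

def n_per_arm(p1, p2, z_alpha=1.96, z_beta=0.8416):
    """Approximate sample size per arm to detect a difference between
    two proportions p1 and p2, using the usual normal approximation.
    The default z-values correspond to two-sided alpha = 0.05 and
    80% power."""
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)

# Detecting a drop from a 30% to a 20% event rate needs roughly 300
# subjects per arm, well beyond 100 per arm.
print(n_per_arm(0.30, 0.20))  # 294
```

Smaller true effects push the required sample size up quickly, which is why a study of about 200 total subjects can fail to detect a real but modest benefit.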

This is just one more example of why RCTs are important. Have you seen others recently? Share them with us in the comments or on social media.
