COMPAS recidivism data set | Net Phi: Internet, Society, and Philosophy

Introduction

Our project analyzed a sample of data from the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) system for a group of 10,000 defendants from Broward County, Florida. COMPAS is used more widely across the U.S. to aid judges in sentencing decisions. The scores in our data set predict the likelihood that a defendant will become a recidivist. Each defendant receives three scores: Risk of Violence, Risk of Recidivism, and Risk of Failure to Appear.

Methods

We used Lindsay Poirier’s denotative reading to determine the meaning of the variables, and Colin Koopman’s micro-level analysis to examine how those variables are calculated and derived before they are entered into the data set. We focused on the Risk of Recidivism and Risk of Violence scores. The denotative reading established a baseline understanding of what those scores mean, which allowed us to use micro-level analysis to examine the variables with more granularity, in particular the metrics and methods that go into producing a risk score.

Work Performed

To perform the denotative reading, we researched the specifics of the database to better understand what the Risk of Recidivism and Risk of Violence variables are. One main source was a guide to the COMPAS system by the database’s creator, Northpointe. The guide explains in more detail how the recidivism scores are calculated, and how a raw score is converted to a decile score and then to a rating of high, medium, or low. We were also able to learn how the data is used in real-world contexts. The denotative reading gave us the meaning of the variables and a general idea of how they are formulated.

We then applied micro-level analysis to the formulation of the scores, which let us look more deeply into how they are created. We found the list of questions used to calculate the scores. By analyzing this form, we saw how the COMPAS system produces the variables whose numeric values are entered into the database. We also examined the system’s results for evidence of bias or inaccuracy.
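The raw score to decile score to rating conversion can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Northpointe’s implementation: the norm-group scores are made up, the decile is assumed to come from a score’s rank within that norm group (each decile holding 10% of the group), and the rating cutoffs (deciles 1 to 4 low, 5 to 7 medium, 8 to 10 high) are assumed to match the bands commonly reported for COMPAS. The names `NORM_SCORES`, `decile_score`, and `risk_label` are hypothetical.

```python
from bisect import bisect_right

# Hypothetical norm group: raw scores 1..100. The real COMPAS normative
# sample and raw-score scale are not reproduced here.
NORM_SCORES = list(range(1, 101))


def decile_score(raw_score: float, norm_scores: list[float]) -> int:
    """Return a decile (1-10) for raw_score based on its rank in the norm group.

    Assumes each decile holds 10% of the norm group, so the decile is the
    share of norm-group scores at or below raw_score, rounded up to tenths.
    """
    if not norm_scores:
        raise ValueError("norm group must not be empty")
    ordered = sorted(norm_scores)
    n = len(ordered)
    at_or_below = bisect_right(ordered, raw_score)
    # Integer ceiling of at_or_below * 10 / n, avoiding floating-point error.
    decile = (at_or_below * 10 + n - 1) // n
    # A score below the whole norm group still lands in decile 1.
    return max(1, decile)


def risk_label(decile: int) -> str:
    """Map a decile to a rating, assuming bands 1-4 Low, 5-7 Medium, 8-10 High."""
    if not 1 <= decile <= 10:
        raise ValueError(f"decile must be between 1 and 10, got {decile}")
    if decile <= 4:
        return "Low"
    if decile <= 7:
        return "Medium"
    return "High"


if __name__ == "__main__":
    for raw in (12, 55, 93):
        d = decile_score(raw, NORM_SCORES)
        # Prints "12 2 Low", "55 6 Medium", "93 10 High", one per line.
        print(raw, d, risk_label(d))
```

Because the decile depends on where a raw score falls within the comparison group, the same raw score could receive a different decile against a different group, which is one way to see why the rating is relative rather than absolute.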

Results

Through our analysis, we found the specific meanings of the risk scores. Risk of Recidivism is the predicted likelihood that the defendant will be taken to jail for a new crime within two years of release. Risk of Violent Recidivism is the predicted likelihood that the defendant will be taken to jail for a violent crime within two years of release. We found that the level of risk is relative to other people in the database rather than an absolute risk. We also found the specific questions used to create the scores. They covered family history, prior criminal history, residence status, social environment, education history, and work history, followed by a series of statements about criminal personality, criminal attitudes, and anger, each rated on a scale from strongly agree to strongly disagree. Finally, according to the sources we read on the system’s accuracy, its predictions were only about 60 percent accurate, and black defendants who did not go on to reoffend were mislabeled as high risk at about twice the rate of white defendants who did not reoffend.
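Overall accuracy and the rate of mislabeling measure different things, and a small sketch can make the distinction concrete. The Python below computes overall accuracy and a per-group false positive rate (the share of people who did not reoffend but were labeled high risk) on a tiny, entirely made-up set of records; it does not use the Broward County data, and its numbers are not findings. Treating a "high risk" label as a positive prediction is an assumption, since an analysis could draw that line at a different rating. The names `Record`, `make`, `TOY_RECORDS`, `accuracy`, and `false_positive_rate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    group: str            # hypothetical group label, e.g. "A" or "B"
    predicted_high: bool  # True if rated high risk (assumed cutoff)
    reoffended: bool      # True if the person actually reoffended


def make(group: str, predicted_high: bool, reoffended: bool, count: int) -> list[Record]:
    """Build `count` identical toy records."""
    return [Record(group, predicted_high, reoffended)] * count


# Entirely made-up toy data; not drawn from the COMPAS data set.
TOY_RECORDS = (
    make("A", True, False, 2) + make("A", False, False, 2)
    + make("A", True, True, 1) + make("A", False, True, 1)
    + make("B", True, False, 1) + make("B", False, False, 3)
    + make("B", True, True, 1) + make("B", False, True, 1)
)


def accuracy(records: list[Record]) -> float:
    """Share of records where the high-risk label matches the actual outcome."""
    if not records:
        raise ValueError("no records")
    correct = sum(r.predicted_high == r.reoffended for r in records)
    return correct / len(records)


def false_positive_rate(records: list[Record], group: str) -> float:
    """Among people in `group` who did NOT reoffend, the share labeled high risk."""
    negatives = [r for r in records if r.group == group and not r.reoffended]
    if not negatives:
        raise ValueError(f"no non-reoffenders in group {group!r}")
    return sum(r.predicted_high for r in negatives) / len(negatives)


if __name__ == "__main__":
    print(f"accuracy: {accuracy(TOY_RECORDS):.2f}")  # accuracy: 0.58
    for g in ("A", "B"):
        # Prints "A 0.50", then "B 0.25".
        print(g, f"{false_positive_rate(TOY_RECORDS, g):.2f}")
```

In this toy example, group A’s false positive rate is twice group B’s, yet the tool still reports a single overall accuracy figure. This is why a disparity in mislabeling can sit alongside one headline accuracy number.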

Term and Year

Winter 2026