Predicting Students' Dropout and Academic Success | Net Phi: Internet, Society, and Philosophy

The Dataset:

This dataset is used to build models that predict whether university students will drop out or succeed, based on information collected about them. It includes demographic information known at the time of enrollment and students' academic performance at the end of their first and second terms. It could be used to identify and support students who have a higher likelihood of dropping out.

Problem Statement:

The Predict Students' Dropout and Academic Success dataset was designed to help reduce academic dropout and failure in higher education. What biases could affect students when this dataset is put to use, and which variables might point to those biases?

Research Elements:

Research Databases Element: Structural Inequality

Structural inequality refers to the systemic inequalities that groups face when biases and stereotypes are so heavily rooted in a group or society that the inequalities they face are inherent to the systems around them. This element helps with analyzing the biases that could come out of the use of this dataset and how those could influence or worsen inequalities that certain student demographics already face.

Format Anatomies Elements: Denotative, deconstructive, and connotative reading

Using denotative reading, we can look at the data dictionary for this dataset to analyze the values' literal meanings, which helps us understand what factors contribute to academic success and failure. Using deconstructive reading, we can analyze which variables are missing from the dataset and what information about students is being overlooked or not considered. Finally, using connotative reading, we can analyze what cultural influences shape the variables and the meaning of their values, and how that influences the data produced.

Dataset Analysis:

Structural inequality:

If this dataset were used to create a prediction model for students' academic success and failure, it could harm students who belong to demographic groups that the model labels as higher risk for dropping out.

For example, if this prediction model were used in school admissions, students labeled as higher risk for dropping out might be unfairly rejected by more schools, while students the model rates as more likely to succeed might be unfairly preferred.
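One way to make this concern concrete is to compare how often a model flags students as "high risk" in different groups. The sketch below is only an illustration: the records are made-up toy values, not rows from the dataset, and the field names `scholarship_holder` and `flagged_high_risk` are hypothetical. A large gap between groups does not prove unfairness on its own, but it is the kind of signal worth investigating before a model influences admissions.

```python
from collections import defaultdict

def flag_rate_by_group(records, group_key, flag_key="flagged_high_risk"):
    """Return the share of records flagged as high risk within each group."""
    totals = defaultdict(int)
    flagged = defaultdict(int)
    for record in records:
        group = record[group_key]
        totals[group] += 1
        if record[flag_key]:
            flagged[group] += 1
    return {group: flagged[group] / totals[group] for group in totals}

# Toy, made-up records for illustration only (NOT taken from the dataset).
toy_records = [
    {"scholarship_holder": True, "flagged_high_risk": False},
    {"scholarship_holder": True, "flagged_high_risk": False},
    {"scholarship_holder": False, "flagged_high_risk": True},
    {"scholarship_holder": False, "flagged_high_risk": False},
]

rates = flag_rate_by_group(toy_records, "scholarship_holder")
# Difference between the most-flagged and least-flagged group.
gap = max(rates.values()) - min(rates.values())
print(rates, gap)
```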

The dataset, however, doesn't include information about students' race. Because no association can be drawn between a student's race and academic success or failure, the dataset is less likely to reinforce biases against students of certain races. Instead, it looks at more concrete factors in academic performance, such as income, age at enrollment, and other outside factors.

Denotative reading:

Using denotative reading, we analyzed the values' literal meanings to assess which variables, such as academic path, demographics, and socioeconomic factors, influenced academic performance the most. The dataset contains 2,209 graduating students, 1,421 dropouts, and 794 currently enrolled students. From this reading we gathered that first-semester performance, gender, financial status, and age are strong predictors of academic success or failure. Understanding the literal meanings of the encoded values also helped with the rest of our analysis.
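To put those three counts in proportion, the short sketch below computes each outcome's share of the total. The counts come from the text above; the label names ("Graduate", "Dropout", "Enrolled") and the function name are our own choices for illustration.

```python
# Outcome counts as reported in the text above; label names are assumed.
counts = {"Graduate": 2209, "Dropout": 1421, "Enrolled": 794}

def class_shares(counts):
    """Return each class's share of the total number of students."""
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

total = sum(counts.values())
shares = class_shares(counts)

for label, share in shares.items():
    print(f"{label}: {counts[label]} of {total} ({share:.1%})")
```

Because the three groups are uneven, with graduates making up roughly half of the students, a model that always predicted "Graduate" would already be right about half the time. Accuracy alone is therefore a weak measure of how well a model identifies students at risk of dropping out.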

Deconstructive reading:

Using deconstructive reading, we analyzed the dataset to find which variables are missing, especially information about students that could contribute to their success in school.

The dataset doesn't include many other factors that can contribute to a student's success, such as mental health, family income and support, veteran status, or whether the student has children or other dependents. In general, it says little about whether a student is non-traditional beyond age, immigrant status, and first-generation student status. This seems like a fairly big oversight, considering that non-traditional students have a higher likelihood of dropping out. Adding these variables would give a clearer view of why many students drop out of school or have a harder time succeeding.

The special education needs variable also has a binary value of "yes" or "no," which leaves out any nuance or data about specific special education needs and their effects on students' academic performance. Expanding this variable would give a better view of how specific special education needs affect students and how we can better support them.

Connotative reading:

Using connotative reading, we analyzed what cultural influences shaped the variables and their values. In the dataset, gender is marked with a binary 0 or 1: 1 for male and 0 for female. Cultural understanding of gender has evolved over time, and gender is now understood as more diverse than two options. The dataset also uses gender and sex interchangeably, but even sex is not strictly binary. Expanding the options could make the data and the resulting prediction models more accurate.

Summary:

The dataset cannot give an accurate prediction of academic success and failure because of its missing variables, and because it cannot capture the specific reasons that make non-traditional students more likely to drop out. In addition, implementing this dataset could introduce harms and biases, specifically against students who end up being labeled as more likely to drop out or "fail." The dataset also does not take into account mental health or the variety of special education needs.

There is an overall concern that if this dataset is not implemented properly, it will cause harm. It could harm students who are now stereotyped as more likely to fail in higher education settings. If the dataset is used to allocate resources and to identify which students need the most help, it could also harm struggling students who are labeled as more likely to succeed. Students may struggle with school for reasons not included in the dataset, and if it is implemented improperly, these students will be left behind.

Term and Year

Winter 2026