
Members:
Robert Cronin, Jonathan Olson, Jacob Moreno
Summary:
The dataset we chose to analyze comes from the Home Mortgage Disclosure Act (HMDA), a 1975 US law that requires banks to publish relevant information about their mortgage lending activities, in an effort to identify systematic discrimination within the US home mortgage system. We filtered the data to the lending activities of Bank of America within Oregon during 2022. After filtering, the data consisted of 75 columns, each representing a specific type of data (e.g., applicant race), and 3,533 rows, each representing an individual mortgage.
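A minimal sketch of this filtering step in Python with pandas is shown below; the file name, the column names (activity_year, state_code, lei), and the Bank of America LEI value are assumptions about the public HMDA Loan/Application Register layout, not a record of our exact workflow.

    import pandas as pd

    # Load the public HMDA Loan/Application Register for 2022.
    # File and column names are assumptions; adjust them to the actual release.
    lar = pd.read_csv("hmda_2022_lar.csv", dtype=str)

    BOFA_LEI = "<Bank of America LEI>"  # placeholder for the bank's Legal Entity Identifier

    # Keep only Bank of America's Oregon records for 2022.
    subset = lar[
        (lar["activity_year"] == "2022")
        & (lar["state_code"] == "OR")
        & (lar["lei"] == BOFA_LEI)
    ]

    print(subset.shape)  # in our filtered data: 3,533 rows by 75 columns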
We noticed that the fields concerning race and ethnicity could be problematic, so we framed our analysis around two questions: are these data fields equitable, and if they are not, what is going untracked that could otherwise reveal discriminatory practices?
We performed a deconstructive analysis of the data to identify where its representation was lacking. Because the purpose of the dataset is to reveal discriminatory lending practices, we focused specifically on race and ethnicity, parts of identity that tie directly into discriminatory housing practices. After analyzing the components of the data, we discovered issues with the descriptors for an applicant's race. The applicant_race field classifies the applicant's race and has 18 possible values, 11 of which describe an Asian subgroup. That level of detail is not itself a problem; the problem is that no other race (white, Black, Hispanic, etc.) receives the same level of representation. Discrimination based on race is one of the most common forms of systematic oppression, and such a coarse descriptor for race could create a blind spot in the data, undermining our ability to identify unethical practices. For example, if a particular group that falls under the broad category of Black is discriminated against more than other groups in that category, this data would not capture the difference.
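As an illustration, this tally of race codes could be reproduced with pandas; the column name applicant_race-1, the file hmda_race_codes.csv, and its columns are assumptions for the sketch rather than our exact workflow, and subset is the filtered frame from the earlier sketch.

    import pandas as pd

    # Race codes actually observed in the Oregon subset
    # (column name "applicant_race-1" is an assumption about the LAR layout).
    observed = subset["applicant_race-1"].value_counts(dropna=False)
    print(observed)

    # Of the codes listed in the data dictionary, count those whose description
    # mentions an Asian subgroup. "hmda_race_codes.csv" with columns
    # code, description is a hypothetical export of the data dictionary.
    race_dict = pd.read_csv("hmda_race_codes.csv", dtype=str)
    asian_codes = race_dict[race_dict["description"].str.contains("Asian", case=False, na=False)]
    print(f"{len(asian_codes)} of {len(race_dict)} race codes describe an Asian subgroup")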
The broad categories used for races other than Asian are not accurate descriptors for individuals, especially in a multicultural place like America. The dataset also excludes certain ethnic or racial backgrounds entirely; for example, it includes no categories for people from the Middle East or North Africa, groups that have faced intense discrimination over the past two decades. The solution to this issue would simply be to expand the allowed values for race and ethnicity to better represent those within the dataset, using the variety of responses for Asian races as a blueprint.
Using Koopman’s method of looking at the micro level of the dataset (examining the data dictionaries), we identified further issues. From the data dictionary we determined which values are allowed in each data field, and then identified what data was not being represented. This expanded and reinforced the conclusions of our deconstructive analysis, since the two methods are similar. Macro structures were not necessarily present in this issue, because we were working with common definitions of race and ethnicity, so we drew our conclusions without using that level of granularity.
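This micro-level check can be sketched as a comparison between the values the data dictionary allows for each field and the values actually observed; the file data_dictionary.csv and its columns (field, allowed_value) are assumptions for illustration, and subset is the filtered frame from the first sketch.

    import pandas as pd

    # Hypothetical export of the data dictionary: one row per allowed value per field.
    dictionary = pd.read_csv("data_dictionary.csv", dtype=str)

    for field, group in dictionary.groupby("field"):
        if field not in subset.columns:
            continue
        allowed = set(group["allowed_value"])
        observed = set(subset[field].dropna().unique())
        missing = allowed - observed
        if missing:
            # Allowed values that never appear point to populations the data
            # could represent but, in this subset, does not.
            print(field, "allowed but never observed:", sorted(missing))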
This analysis identified a misrepresentation of race for Black and Middle Eastern individuals, which is further supported by circumstances outside the dataset in the loan process. People of African American or Middle Eastern descent are much less likely to be offered mortgage options, and it has been quantitatively found that the options offered to them are inferior to those offered to white borrowers. Poorly defined race and ethnicity fields can obscure these systematic discriminatory practices and must be corrected if the dataset is to achieve its aim.
The Home Mortgage Disclosure Act was intended to collect data for the purpose of reducing bias. However, our analysis found that the allowed values for race and ethnicity are not broad enough to capture certain forms of bias.