This project focuses on Diabetes 130, a dataset of US hospital records for patients with diabetes from 1999 to 2008. The dataset consists of 101,767 rows and 50 columns and contains sensitive information such as the age, gender, and race of patients. Our project aimed to make visible how the dataset’s handling of this information infringes upon the autonomy of diabetic patients in their self-identification and selfhood.
We used the connotative and deconstructive research methods described by Poirier in “Reading Datasets.” These methods helped us inquire into the socio-political context behind the confining data labels, which may not describe every individual accurately. We also analyzed the data at the micro (cell) level, as described by Colin Koopman in “Format Anatomies,” which helped make visible and intelligible that which is obscure. We examined the data dictionary and considered the permissible values of each variable, the data types, and both present and missing data. In contrast to the contextual reading described above, this analysis looked closely at the information already present, specifically the fields that describe individuals.
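A cell-level reading of this kind can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure we followed: the column names `race`, `gender`, and `age`, the use of `?` as the missing-value marker, the helper `profile_columns`, and the sample rows are all assumptions made for the example.

```python
import csv
import io
from collections import Counter

# Hypothetical rows for illustration only; not taken from the real dataset.
SAMPLE_CSV = """race,gender,age
Caucasian,Female,[70-80)
AfricanAmerican,Male,[50-60)
?,Female,[60-70)
Other,Unknown/Invalid,[20-30)
Caucasian,Male,[70-80)
"""

MISSING = "?"  # assumption: missing cells are coded as "?"


def profile_columns(csv_file, columns):
    """Count each distinct value per column, tallying missing cells separately."""
    observed = {col: Counter() for col in columns}
    missing = Counter()
    for row in csv.DictReader(csv_file):
        for col in columns:
            value = row[col]
            if value == MISSING:
                missing[col] += 1
            else:
                observed[col][value] += 1
    return observed, missing


observed, missing = profile_columns(
    io.StringIO(SAMPLE_CSV), ["race", "gender", "age"]
)
for col, counts in observed.items():
    print(col, dict(counts), "missing:", missing[col])
```

Listing the distinct values in each descriptor column shows which labels a patient could be given, and the separate missing tally shows how often no label was recorded at all.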
We found that the race, gender, and age fields offer little variety in the ways patients can identify themselves. Limiting these options could skew the data and exclude or misrepresent different groups of people. The race field has only five allowed values: Caucasian, Asian, African-American, Hispanic, and Other. This excludes many people (for example, those of mixed-race or Indigenous descent) and either forces them to identify in a way that is not fully accurate or groups anyone whose identity doesn’t fit cleanly into one of these categories as “Other,” rendering their race insignificant. Similarly, the gender field has only three allowed values: Male, Female, and Unknown/Invalid. This categorization does not recognize a difference between sex and gender, with sex being biological and gender being a social and psychological construct that exists on a spectrum and may change throughout one’s life. It reflects the social ideology regarding gender of the early 2000s, when this dataset was created. Finally, we found that the allowed values of the age field are grouped into increments of 10 years. This is limiting because a patient’s exact age is not recorded. It can also hide differences between ages that are grouped together, particularly in the youngest and oldest age ranges, where development and changes in health status are large and rapid.
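The effect of ten-year age groups can be illustrated with a short sketch. The function `age_bracket` and the bracket label format (for example `[70-80)`) are assumptions made for this illustration, not part of the dataset’s documentation.

```python
def age_bracket(age, width=10):
    """Map an exact age to a half-open bracket label such as "[70-80)"."""
    lower = (age // width) * width
    return f"[{lower}-{lower + width})"


# A newborn and a nine-year-old receive the same label,
# as do a 71-year-old and a 79-year-old.
print(age_bracket(0), age_bracket(9))
print(age_bracket(71), age_bracket(79))
```

Once only the bracket is stored, the exact age cannot be recovered, so any difference between patients within a bracket is invisible to later analysis.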
The lack of all-encompassing permissible values also has technical consequences for what can then be done with the data. The dataset is intended for predicting whether a patient will be readmitted within 30 days of discharge, and overly general categories will likely lead to inaccurate predictions.
This project focuses on the Diabetes 130 Dataset, which details US hospital records for patients with diabetes from 1999 to 2008 across 130 hospitals. It contains sensitive information regarding their age, gender, and race; our project aims to make visible how this infringes upon the autonomy of diabetic patients in their self-identification and selfhood.