Data Critique

Where is the dataset from?

Our dataset was pulled from the US Census Bureau website, and all of the data is from 2021. The data was collected through the American Community Survey, which uses the Census Bureau’s Population Estimates Program to produce state population estimates.

What is included in the dataset?

Within the dataset are estimates of educational attainment for each state in the United States of America (as well as Washington, D.C. and Puerto Rico), broken down by age and gender. Numerical and percent estimates for the total population, female population, and male population, along with margins of error, are given for each level of educational attainment. For example, the data for California includes, for each level of education in each age group, a numerical estimate and a percent estimate for the total population at that level, the same two estimates for the female population and for the male population, and the margin of error for each of these estimates.

Much of the dataset is organized by age group, with each age group split into education levels.

Age groups:
18 years to 24 years, 25 years and older, 25 to 34 years, 35 to 44 years, 45 to 64 years, 65 years and older

Educational attainment levels for the 18 to 24 years age group:
“less than high school graduate”, “high school graduate (includes equivalency)”, “some college or associate’s degree”, “bachelor’s degree or higher”

Educational attainment levels for the 25 years and older age group:
“less than 9th grade”, “9th to 12th grade, no diploma”, “high school graduate (includes equivalency)”, “some college, no degree”, “associate’s degree”, “bachelor’s degree”, “graduate or professional degree”, “high school graduate or higher”, and “bachelor’s degree or higher”

Remaining age groups:
“high school graduate or higher” and “bachelor’s degree or higher” 
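
The layout described above can be sketched in a few lines of Python. This is only an illustration built from the lists in this section: the names `LEVELS_BY_AGE`, `POPULATIONS`, `MEASURES`, `STATISTICS`, and `fields_for_one_state` are hypothetical and do not correspond to the column names in the Census Bureau's files, which may also contain rows not listed here.

```python
from itertools import product

# Attainment levels available for each age group, as listed in this section.
LEVELS_BY_AGE = {
    "18 to 24 years": [
        "less than high school graduate",
        "high school graduate (includes equivalency)",
        "some college or associate's degree",
        "bachelor's degree or higher",
    ],
    "25 years and older": [
        "less than 9th grade",
        "9th to 12th grade, no diploma",
        "high school graduate (includes equivalency)",
        "some college, no degree",
        "associate's degree",
        "bachelor's degree",
        "graduate or professional degree",
        "high school graduate or higher",
        "bachelor's degree or higher",
    ],
}

# The remaining age groups only report the two summary levels.
SUMMARY_LEVELS = ["high school graduate or higher", "bachelor's degree or higher"]
for age in ["25 to 34 years", "35 to 44 years", "45 to 64 years", "65 years and older"]:
    LEVELS_BY_AGE[age] = list(SUMMARY_LEVELS)

POPULATIONS = ["total", "female", "male"]
MEASURES = ["number", "percent"]
STATISTICS = ["estimate", "margin of error"]


def fields_for_one_state():
    """List every (age, level, population, measure, statistic) value for one state."""
    fields = []
    for age, levels in LEVELS_BY_AGE.items():
        for level, pop, measure, stat in product(levels, POPULATIONS, MEASURES, STATISTICS):
            fields.append((age, level, pop, measure, stat))
    return fields


if __name__ == "__main__":
    fields = fields_for_one_state()
    print(len(fields), "age-by-attainment values per state")
    print(fields[0])
```

Written this way, the uneven coverage is easy to see: only two of the six age groups carry the detailed attainment levels, while the other four are limited to the two summary levels.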

-----
Also included is race or ethnicity by educational attainment data for each state. Using the same estimate categories as the age by educational attainment data described above, “high school graduate or higher” and “bachelor’s degree or higher” data are given for racial and ethnic groups rather than age groups.

Racial and ethnic groups:
White, White not Hispanic or Latino, Black, American Indian or Alaska Native, Asian, Native Hawaiian or Other Pacific Islander, Hispanic or Latino origin

-----
Following the race or ethnicity by educational attainment data is the category “poverty rate for the population 25 years and over for whom poverty status is determined by educational attainment level”.

Populations:
total population, male population, female population

Educational attainment levels:
“less than high school graduate”, “high school graduate (includes equivalency)”, “some college or associate’s degree”, “bachelor’s degree or higher”

-----
The final data category included in our dataset is “median earnings in the past 12 months (in 2021 inflation-adjusted dollars)” for the 25+ population.

Populations:
total population, male population, female population

Educational attainment levels:
“less than high school graduate”, “high school graduate (includes equivalency)”, “some college or associate’s degree”, “bachelor’s degree”, “graduate or professional degree”

What does the data reveal?

Our dataset sheds light on trends in educational attainment by gender and race. The main finding is that, in every state, women have higher educational attainment than men at each level. However, men consistently have higher median earnings than women, which points to a disconnect between educational attainment and earnings.

What are some data silences?

At a basic level, age groups other than 18 to 24 years and 25 years and older only have data for “high school graduate or higher” and “bachelor’s degree or higher.” Additionally, in the race/ethnicity by educational attainment data, the listed racial/ethnic groups do not include multiracial people, whose data falls under nonspecific categories like “some other race alone” and “two or more races.” Because there wasn’t enough information on these groups, we had to exclude them from most parts of our narrative.
 
Next, since responses to the American Community Survey are self-reported, inaccurate data and nonresponse bias may be present in our dataset. This can affect the statistical significance of some comparisons and lower the accuracy of our insights. Naturally, people who chose not to participate in the 2021 survey, or not to answer all of its questions, are silenced.
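
The margins of error included in the dataset offer one way to check whether a difference between two estimates is larger than sampling variability alone would explain. The sketch below follows the Census Bureau's general guidance that ACS margins of error are published at the 90 percent confidence level, so a standard error can be recovered by dividing by 1.645. The function names and the numbers in the example are hypothetical and are not taken from our dataset. Note that this kind of test only addresses sampling error; it cannot detect misreporting or nonresponse bias.

```python
import math

# ACS margins of error are published at the 90% confidence level.
Z_90 = 1.645


def standard_error(moe):
    """Convert a 90% margin of error into a standard error."""
    return moe / Z_90


def differ_significantly(est_a, moe_a, est_b, moe_b):
    """Return True if two estimates differ at the 90% confidence level.

    Assumes the two estimates are approximately independent.
    """
    se_diff = math.sqrt(standard_error(moe_a) ** 2 + standard_error(moe_b) ** 2)
    if se_diff == 0:
        return est_a != est_b
    z = (est_a - est_b) / se_diff
    return abs(z) > Z_90


if __name__ == "__main__":
    # Hypothetical percent estimates with margins of error (not real data).
    print(differ_significantly(35.2, 0.4, 33.9, 0.5))  # small margins, larger gap
    print(differ_significantly(35.2, 1.0, 34.8, 1.2))  # large margins, smaller gap
```

A comparison that fails this test is not necessarily wrong, but it should be described more cautiously in the narrative.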

Furthermore, despite our findings, our dataset cannot reveal anything more about the disconnect between educational attainment and earnings, since it offers no insight into the sociocultural factors that disadvantage women in the workplace. Our dataset also shows that Asian people have the highest levels of bachelor’s degree or higher attainment, followed by White people. Again, our dataset is strictly numbers and is unable to convey the social pressures behind these educational decisions.