The IPEDS package
contains data on postsecondary institution statistics for 2021. Some
datasets have been filtered to exclude imputation variables, while other
datasets are included in full. Details are given below.
We wanted to create a package that prospective undergraduate and graduate
students can use with just a basic understanding of R. The IPEDS package
allows easy access to a wide variety of information about postsecondary
institutions, their current students and faculty, and their
demographics, financial aid, educational and recreational offerings, and
completions. College search websites are sometimes vague in
their statistics for an institution; this package aims to give users a
closer idea of what their institution of interest is really like.
All the datasets are taken from [IPEDS](https://nces.ed.gov/ipeds/use-the-data).
- adm2021: dataset of Admissions and Test Scores for Fall 2021
- complete2021: dataset of Completions in 2021
- conference: dataset of Conferences for sports (from offerings2021)
- dir_info2021: dataset of Directory Information for 2021
- fall_enroll2021: dataset of Fall Enrollment for 2021
- fin_aid1920: dataset of Financial Aid Statistics for 2019-2020
- offerings2021: dataset of Institutional offerings for 2021
- relig_aff: dataset of Religious Affiliations (from offerings2021)
- staff2021: dataset of Fall Staff for 2021
- staff_cat: dataset of Staff Categories based on staff2021$staff_cat
This package can be used by students, college counselors, or involved parents who are exploring higher education options and working toward admission to a school of choice. Additionally, anyone interested in educational statistics can use this data for their research.
Here are the first six rows of the complete2021 dataset:
library(IPEDS)
head(complete2021)
#> INSTITUTION_ID AWARD_LVL TOTAL TOTAL_M TOTAL_W TOTAL_NATIVE TOTAL_ASIAN TOTAL_BLACK TOTAL_HISP TOTAL_NHPI TOTAL_WHITE TOTAL_MULT TOTAL_UNKNOWN TOTAL_NRA UND18 AGE18_24 AGE25_39 AGE40PLUS AGE_UNKNOWN
#> 1 100654 5 562 191 371 1 0 507 6 1 12 8 19 8 0 367 185 8 2
#> 2 100654 7 251 71 180 1 1 168 0 0 12 3 58 8 0 13 203 31 4
#> 3 100654 9 7 2 5 0 0 3 0 0 0 0 0 4 0 0 7 0 0
#> 4 100654 10 1 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0
#> 5 100663 2 71 34 37 1 3 10 1 0 52 2 0 2 0 39 21 11 0
#> 6 100663 5 2870 1047 1823 7 194 688 131 0 1630 131 13 76 0 2074 666 130 0
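Each row of this dataset describes one institution at one award level, so an institution appears several times. As a minimal sketch of how the rows could be combined, the example below copies a few columns from the six rows printed above into a toy data frame (the name `completions` is ours, not part of the package) and sums completions per institution with base R. With the package loaded, `complete2021` could presumably be used in place of the toy data.

```r
# Toy copy of the six rows printed above (a subset of columns only).
# `completions` is a hypothetical name used for illustration.
completions <- data.frame(
  INSTITUTION_ID = c(100654, 100654, 100654, 100654, 100663, 100663),
  AWARD_LVL      = c(5, 7, 9, 10, 2, 5),
  TOTAL          = c(562, 251, 7, 1, 71, 2870),
  TOTAL_M        = c(191, 71, 2, 0, 34, 1047),
  TOTAL_W        = c(371, 180, 5, 1, 37, 1823)
)

# Sum completions across award levels for each institution
totals <- aggregate(TOTAL ~ INSTITUTION_ID, data = completions, FUN = sum)
totals

# Sanity check: in these rows, men plus women add up to the total
all(completions$TOTAL_M + completions$TOTAL_W == completions$TOTAL)
```

Because only the first six rows are used here, the sums cover just the award levels shown, not every award level the two institutions report.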
We can use this package to address many questions, such as those explored in the examples below.
To answer these questions, we can make use of the existing functionality the package provides, as well as data wrangling and data visualization techniques.
Which institutions have the qualities I’d like in an institution?
Let’s say Sophia, a high school senior, is interested in going to a relatively small private college in the New England area that will accept the AP credits she’s earned, is somewhat diverse, and helps its students afford college.
Using the school_preferences function, Sophia can find
schools that fit her preferences.
school_preferences(size = 2, region = "New England", alt_credits = "Yes", diversity_students = 36, financial_aid = 70, affiliation = 3)
#> Institution Institution ID % of Students Recieved Aid Institution Size Student Diversity Staff Diversity % of Students Disabled Region Type of Institution Religious Affiliation Calendar System Open Admissions Policy Years Required For Entering Vet Programs Alternative Credit Alternative Tuition Payment Distance Education Counseling Services Employment Services Daycare Services Live On-Campus Room Price Board Price Undergraduate Application Fee Graduate Application Fee
#> 1 University of Bridgeport 128744 78 2 68 26 1 New England 3 -2 1 2 -2 Programs Available Takes alternate credit Takes alternate tuition plans Offers distance education opportunities Offers counseling services Offers employment services Offers no daycare services 2 . . 0 0
#> 2 Goodwin University 129154 88 2 51 21 1 New England 3 -2 1 1 -2 Programs Available Takes alternate credit Takes alternate tuition plans Offers distance education opportunities Offers counseling services Offers employment services Offers no daycare services 2 4500 1700 50 50
#> 3 American International College 164447 97 2 54 18 2 New England 3 -2 1 2 -2 Programs Available Takes alternate credit Takes alternate tuition plans Offers distance education opportunities Offers counseling services Offers employment services Offers no daycare services 2 8044 7310 0 50
#> 4 Bay Path University 164632 81 2 41 11 1 New England 3 -2 1 2 -2 Programs Available Takes alternate credit Takes alternate tuition plans Offers distance education opportunities Offers counseling services Offers employment services Offers no daycare services 2 . . 25 0
#> 5 Clark University 165334 91 2 40 34 2 New England 3 -2 1 2 -2 Programs Available Takes alternate credit Takes alternate tuition plans Offers distance education opportunities Offers counseling services Offers employment services Offers no daycare services 2 6000 4150 60 100
#> 6 Mount Holyoke College 166939 76 2 49 39 2 New England 3 -2 1 2 -2 Programs Available Takes alternate credit Takes alternate tuition plans Offers distance education opportunities Offers counseling services Offers employment services Offers no daycare services 2 8320 8260 60 50
#> 7 Smith College 167835 71 2 48 31 2 New England 3 -2 1 2 -2 Programs Available Takes alternate credit Takes alternate tuition plans Offers no distance education opportunities Offers counseling services Offers employment services Offers no daycare services 2 9700 9720 0 60
#> 8 Wentworth Institute of Technology 168227 84 2 37 28 2 New England 3 -2 1 2 -2 Programs Available Takes alternate credit Takes alternate tuition plans Offers distance education opportunities Offers counseling services Offers employment services Offers no daycare services 2 12120 3300 50 50
The output is a data frame that includes the institution name, ID, the % of students that receive aid, the size of the institution, the percent of non-white students and staff, the % of disabled students, the region of the institution, its type, and other relevant information about the institution.
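Some values in the printed output, such as Room Price and Board Price for two of the schools, appear as a `.` rather than a number. A minimal sketch of handling this is shown below. It assumes the column is stored as text and that `.` marks a missing value, which is what the printed output suggests but is not confirmed here. The values are copied from the eight rows above.

```r
# Room Price values copied from the eight rows printed above.
# Assumption: the column is text and "." marks a missing value.
room_price <- c(".", "4500", "8044", ".", "6000", "8320", "9700", "12120")

# Replace the "." placeholder with NA, then convert to numeric
room_price[room_price == "."] <- NA
room_price <- as.numeric(room_price)

# Average room price over the schools that report one
avg_room <- mean(room_price, na.rm = TRUE)
avg_room
```

Converting the placeholder to `NA` first avoids the coercion warning that `as.numeric(".")` would otherwise raise.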
We can select the columns Sophia is most interested in:
library(dplyr)
school_preferences(size = 2, region = "New England", alt_credits = "Yes", diversity_students = 36, financial_aid = 70, affiliation = 3) %>%
  select(`Institution`, `Institution Size`, `Region`, `Alternative Credit`, `Student Diversity`, `% of Students Recieved Aid`, `Type of Institution`)
#> Institution Institution Size Region Alternative Credit Student Diversity % of Students Recieved Aid Type of Institution
#> 1 University of Bridgeport 2 New England Takes alternate credit 68 78 3
#> 2 Goodwin University 2 New England Takes alternate credit 51 88 3
#> 3 American International College 2 New England Takes alternate credit 54 97 3
#> 4 Bay Path University 2 New England Takes alternate credit 41 81 3
#> 5 Clark University 2 New England Takes alternate credit 40 91 3
#> 6 Mount Holyoke College 2 New England Takes alternate credit 49 76 3
#> 7 Smith College 2 New England Takes alternate credit 48 71 3
#> 8 Wentworth Institute of Technology 2 New England Takes alternate credit 37 84 3
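Once the list is narrowed down, Sophia might want to rank the remaining schools by the criterion she cares about most. The sketch below is one way to do that in base R. It rebuilds three of the columns above as a toy data frame (the name `schools` is ours) and sorts by the share of students receiving aid. The column name keeps the spelling used in the printed output.

```r
# Toy copy of three columns from the output above.
# check.names = FALSE keeps the spaces and "%" in the column names.
schools <- data.frame(
  Institution = c(
    "University of Bridgeport", "Goodwin University",
    "American International College", "Bay Path University",
    "Clark University", "Mount Holyoke College",
    "Smith College", "Wentworth Institute of Technology"
  ),
  `Student Diversity` = c(68, 51, 54, 41, 40, 49, 48, 37),
  `% of Students Recieved Aid` = c(78, 88, 97, 81, 91, 76, 71, 84),
  check.names = FALSE
)

# Sort from the highest to the lowest share of students receiving aid
ranked <- schools[order(-schools$`% of Students Recieved Aid`), ]

# Show the three schools where the most students receive aid
head(ranked, 3)
```

The same ordering could be applied to any other numeric column, for example `Student Diversity`.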
What are the admission requirements for my preferred institution?
If Sophia is interested in what it takes to apply to one of her
preferred schools, she can use the admission_reqs function, which
provides her with a list of the application requirements.
admission_reqs(167835)
#> # A tibble: 9 × 2
#> Requirements Priority
#> <chr> <chr>
#> 1 High School Record Required
#> 2 Recommendations Required
#> 3 High School GPA Recommended
#> 4 High School Rank Recommended
#> 5 Completion of College-Prepatory Program Recommended
#> 6 Test of English as a Foreign Language Recommended
#> 7 Formal Demonstration of Competencies Neither_required_nor_recommended
#> 8 Other Tests Neither_required_nor_recommended
#> 9 Admission Test Scores Considered_but_not_required
Now Sophia knows which application materials are required and recommended, and which ones are not necessary at all.
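To turn this table into a checklist, the requirements can be grouped by priority. The following is a minimal sketch that copies the printed output into a toy data frame (the name `reqs` is ours) rather than calling the package, so it runs on its own. The labels are reproduced exactly as printed.

```r
# Toy copy of the output above; `reqs` is a hypothetical name.
reqs <- data.frame(
  Requirements = c(
    "High School Record", "Recommendations", "High School GPA",
    "High School Rank", "Completion of College-Prepatory Program",
    "Test of English as a Foreign Language",
    "Formal Demonstration of Competencies", "Other Tests",
    "Admission Test Scores"
  ),
  Priority = c(
    "Required", "Required", "Recommended", "Recommended",
    "Recommended", "Recommended",
    "Neither_required_nor_recommended",
    "Neither_required_nor_recommended",
    "Considered_but_not_required"
  )
)

# Keep only the materials that must be submitted
required <- reqs$Requirements[reqs$Priority == "Required"]
required

# Group every requirement under its priority level
by_priority <- split(reqs$Requirements, reqs$Priority)
by_priority
```

With the package loaded, the same two steps could presumably be applied directly to the result of `admission_reqs(167835)`.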
What’s the relationship between the diversity of students and the diversity of staff?
In another scenario, an educational statistician is interested in the
potential relationship between how diverse a student body is and the
diversity of its staff. We’ll visualize the diversity percentages from
the data frame returned by the school_preferences function.
library(ggplot2)
# With no arguments, school_preferences() applies no filters
data <- school_preferences()
ggplot(data, aes(x = `Staff Diversity`, y = `Student Diversity`)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Student Diversity vs. Staff Diversity",
       y = "Student Diversity (%)",
       x = "Staff Diversity (%)")
#> `geom_smooth()` using formula = 'y ~ x'
Because school_preferences accepts filters, the statistician could also restrict the analysis to schools located in the New England area:
data <- school_preferences(region = "New England")
ggplot(data, aes(x = `Staff Diversity`, y = `Student Diversity`)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Student Diversity vs. Staff Diversity in New England Institutions",
       y = "Student Diversity (%)",
       x = "Staff Diversity (%)")
#> `geom_smooth()` using formula = 'y ~ x'
In both cases, we can see a moderate to strong positive relationship between student and staff diversity. After noting this relationship, the statistician could go further by examining how the size of an institution might influence it.
# Drop rows whose Institution Size is coded -1 or -2
data <- school_preferences(region = "New England") %>%
  filter(`Institution Size` != -1 & `Institution Size` != -2)
# Treat size as a category so each size gets its own color and trend line
data$`Institution Size` <- as.factor(data$`Institution Size`)
ggplot(data, aes(x = `Staff Diversity`, y = `Student Diversity`, color = `Institution Size`)) +
  geom_point() +
  # Discrete color scale, since size is a factor mapped to color (not fill)
  scale_color_viridis_d(option = "magma") +
  geom_smooth(method = "lm", aes(color = `Institution Size`), se = FALSE) +
  labs(title = "Student Diversity vs. Staff Diversity in New England Institutions by Size",
       y = "Student Diversity (%)",
       x = "Staff Diversity (%)")
#> `geom_smooth()` using formula = 'y ~ x'
They can conclude that there doesn’t seem to be much of a difference by institution size among New England institutions.
What are the main similarities and differences between my two top college choices?
Amanda, a high school senior, must soon decide where she will attend college but is still debating between her top two choices.
Using the compare_int function, Amanda can take the two
schools she is interested in and compare them side by side in a table
that lists some of the major qualities of each institution.
compare_int(100654, 100663)
#> Alabama A & M University University of Alabama at Birmingham
#> Size 3 5
#> Full Time Students 1459 2361
#> Part Time Students 75 54
#> Average Aid Awarded 9872 9344
#> Average Award Size 9679 10435
#> City Normal Birmingham
#> State AL AL
#> Region Southeast Southeast
#> Urbanization 12 12
#> Calendar System 1 1
#> Admission Test Scores Considered_but_not_required Considered_but_not_required
#> Room & Board Cost . .
#> Degrees Offered Yes Yes
#> AP Credit Accepted Yes Yes
#> Dual Enrollment Credit Accepted Yes Yes
#> Study Abroad Programs Yes Yes
#> Freshman Required to Live on Campus No No
#> Meals per Week 19 .