Journal of Education & Social Policy

ISSN 2375-0782 (Print) 2375-0790 (Online) DOI: 10.30845/jesp

Exploring Four Different Data Mining Models to Predict Community College First-Year Retention
Camille Gasaway Pace, Ed.D; Lantry L. Brockmeier, Ph.D; Michael J. Bochenko, Ed.D; Daesang Kim, Ph.D

This study aimed to create a predictive model of student retention from background, academic, and financial factors that other community colleges can use as a guide when investigating institutional retention. Four data mining models (neural networks, random forests, support vector machines, and logistic regression) were used to identify significant factors for retention. Across the models, the number of credit hours was consistently the most important variable for retention. In addition, interactions among the number of credit hours, GPA, and financial aid variables were significant for first-year student retention. No variable consistently predicted nonretention in the freshman year across the retention models. Background predictors (age, gender, race, or ethnicity) were not significant in predicting retained or nonretained students. A comparison of the retention models found that the random forest model performed best at classifying nonretained and retained students overall and at classifying retained students specifically.
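The distinction between classifying students "overall" and classifying the retained students "individually" can be illustrated with a small sketch. It assumes the comparison rests on a two-class confusion matrix with retained students as the positive class; the abstract does not state which metrics the study used. The function name `classification_rates` and all counts are hypothetical and are not taken from the study.

```python
def classification_rates(tp, fn, tn, fp):
    """Compute overall and per-class classification rates.

    Assumes "retained" is the positive class:
      tp: retained students classified as retained
      fn: retained students classified as nonretained
      tn: nonretained students classified as nonretained
      fp: nonretained students classified as retained
    """
    total = tp + fn + tn + fp
    if total == 0 or (tp + fn) == 0 or (tn + fp) == 0:
        raise ValueError("each class needs at least one student")
    return {
        # Share of all students classified correctly.
        "overall": (tp + tn) / total,
        # Share of retained students classified correctly.
        "retained": tp / (tp + fn),
        # Share of nonretained students classified correctly.
        "nonretained": tn / (tn + fp),
    }


# Hypothetical counts for illustration only (not results from the study).
rates = classification_rates(tp=80, fn=20, tn=30, fp=20)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```

A model can score well overall while classifying one group poorly, which is why a comparison of this kind would report the per-class rates alongside the overall rate.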
