Call for Proposals: Bringing New Solutions to the Challenges of Predicting and Countering Student Dropout in Higher Education

When a student loses a year of learning, the loss is permanent for the student, the institution, and society. High dropout rates are costly for all three [1, 2], and the cost cannot be measured only in days, weeks, months, or years. Students who drop out of college often face personal regret, financial hardship, and diminished career aspirations.

Because of COVID-19, most colleges have been closed or only partially open. As a result, an average of two-thirds of an academic year has been lost overall. The closure of higher education institutions puts students and young people at risk of long-term consequences. According to a UNESCO study [3], approximately 24 million students are at risk of dropping out of school or college, and over 100 million students will fall below the minimum proficiency level in reading because of the abrupt closure of schools and colleges. More research in this area is therefore needed.

Student attrition was traditionally viewed through the lens of psychology. Students who did not stay in college were considered less talented, less motivated, and less willing to defer gratification in pursuit of a college degree [4]. Retention, or a college's failure to retain students, was therefore attributed to the individual student's skills, abilities, and motivation. In other words, when students dropped out in their first year of enrollment, the students were said to have failed, not the institutions. This is what we now know as victim-blaming [4].

Developing the education sector is essential to improving the quality of human resources. Knowledge is power. Students enter an institution with many dreams and expectations, so the institution's greatest responsibility should be to help each student fulfill them. To that end, everything required must be planned and organized along a defined learning pathway. The institution should therefore be equipped with a tool that identifies potential dropouts early in the semester, so that appropriate support can be provided well in advance.

Many studies have applied Machine Learning in Higher Education out of concern for the student, focusing on academic performance, at-risk status, and attrition [5]. To predict student performance and identify at-risk students, most papers use traditional Machine Learning algorithms such as logistic regression, k-nearest neighbors, and decision tree-based ensemble models [6, 7]. Similarly, the naive Bayes classification algorithm [8] and Support Vector Machines [9] have been applied to predict student dropout and retention, respectively.
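As a rough illustration of one of the traditional algorithms named above, the following Python sketch implements a k-nearest neighbors classifier using only the standard library. The two features (previous-level average and first-term average), the toy records, and the function name knn_predict are assumptions made for this example; they are not taken from the call's dataset or from the cited papers.

```python
from collections import Counter
from math import dist

# Hypothetical toy records, invented for illustration only:
# (previous-level average, first-term average) -> dropout flag (1 = dropped out).
TRAIN = [
    ((92.0, 90.0), 0),
    ((88.0, 85.0), 0),
    ((85.0, 88.0), 0),
    ((90.0, 80.0), 0),
    ((70.0, 60.0), 1),
    ((75.0, 62.0), 1),
    ((68.0, 70.0), 1),
    ((72.0, 58.0), 1),
]


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Sort training records by Euclidean distance to the query point.
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    # Count the labels of the k nearest records and take the most common one.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    print(knn_predict(TRAIN, (71.0, 61.0)))  # query close to the dropout cluster
    print(knn_predict(TRAIN, (89.0, 86.0)))  # query close to the retained cluster
```

In practice, features on different scales would usually be standardized before computing distances, and an odd k avoids ties in a two-class problem.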

In addition, students' demographic and socioeconomic characteristics [10-12], as well as academic history and admission test scores [6, 13, 14], have been shown to be key variables for predicting student dropout in higher education [8, 9]. Other factors have also been found to add incremental predictive power for academic performance and retention, such as first-semester university performance indicators [15] and psychological factors [16]. Furthermore, dimensionality reduction techniques have been used to identify the main factors that affect early dropout [17].

An accurate model for predicting student dropout matters because its results can be used to improve and develop retention strategies in universities [9]. Students would benefit from timely, personalized strategies from their institution that help them remain in their degree program, and the institution would benefit from higher degree completion rates and better use of the resources it invests in students.

Therefore, the purpose of this Call for Proposals is to implement new solutions for predicting student dropout in a higher education institution using Machine Learning models built on a curated educational dataset. Authors of accepted proposals will be invited to submit an article for inclusion in a Special Issue of a high-impact Journal that we are preparing for this call. Articles must meet the Journal's scope and quality requirements.

Audience: This call is open to all researchers, faculty, analysts, and graduate students with an interest in educational data and knowledge of Machine Learning algorithms. Joint proposals are welcome, provided the team includes at least one researcher.

Keywords: Educational Innovation; Student Dropout; Student Attrition; Machine Learning; Data Hub; Higher Education.

  1. Raisman, N. (2013). The Cost of College Attrition at Four-Year Colleges & Universities. Policy Perspectives. Educational Policy Institute.
  2. Latif, A., Choudhary, A. I., & Hammayun, A. A. (2015). Economic effects of student dropouts: A comparative study. Journal of Global Economics.
  3. UNESCO. One year into COVID-19 education disruption: Where do we stand? Available online: https://en.unesco.org/news/one-year-covid-19-education-disruption-where-do-we-stand (accessed on 12 October 2021).
  4. Tinto, V. (2006). Research and practice of student retention: What next? Journal of College Student Retention: Research, Theory & Practice, 8(1), 1-19.
  5. Fahd, K., Venkatraman, S., Miah, S. J., & Ahmed, K. (2021). Application of machine learning in higher education to assess student academic performance, at-risk, and attrition: A meta-analysis of literature. Education and Information Technologies, 1-33.
  6. Nagy, M., & Molontay, R. (2018). Predicting dropout in higher education based on secondary school performance. In 2018 IEEE 22nd international conference on intelligent engineering systems (INES) (pp. 389-394). IEEE.
  7. Rastrollo-Guerrero, J. L., Gomez-Pulido, J. A., & Durán-Domínguez, A. (2020). Analyzing and predicting students’ performance by means of machine learning: A review. Applied Sciences, 10(3), 1042.
  8. Hegde, V., & Prageeth, P. P. (2018). Higher education student dropout prediction and analysis through educational data mining. In 2018 2nd International Conference on Inventive Systems and Control (ICISC) (pp. 694-699). IEEE.
  9. Cardona, T. A. & Cudney, E. A. (2019). Predicting student retention using support vector machines. Procedia Manufacturing, 39, 1827-1833.
  10. Suresh, A., Rao, H. S., & Hegde, V. (2017). Academic dashboard—prediction of institutional student dropout numbers using a naïve Bayesian algorithm. In Computing and network sustainability (pp. 73-82). Springer, Singapore.
  11. Zwick, R., & Himelfarb, I. (2011). The effect of high school socioeconomic status on the predictive validity of SAT scores and high school grade‐point average. Journal of Educational Measurement, 48(2), 101-121.
  12. Freitas, F. A. D. S., Vasconcelos, F. F., Peixoto, S. A., Hassan, M. M., Dewan, M., & Albuquerque, V. H. C. D. (2020). IoT System for School Dropout Prediction Using Machine Learning Techniques Based on Socioeconomic Data. Electronics, 9(10), 1613.
  13. Lázaro Alvarez, N., Callejas, Z., & Griol, D. (2020). Predicting Computer Engineering students' dropout in Cuban Higher Education with pre-enrollment and early performance data. JOTSE: Journal of Technology and Science Education, 10(2), 241-258.
  14. Varga, E. B., & Sátán, Á. (2021). Detecting at-risk students on Computer Science bachelor programs based on pre-enrollment characteristics. Hungarian Educational Research Journal.
  15. Kiss, B., Nagy, M., Molontay, R., & Csabay, B. (2019). Predicting Dropout Using High School and First-semester Academic Achievement Measures. In 2019 17th International Conference on Emerging eLearning Technologies and Applications (ICETA) (pp. 383-389). IEEE.
  16. Séllei, B., Stumphauser, N., & Molontay, R. (2021). Traits versus Grades—The Incremental Predictive Power of Positive Psychological Factors over Pre-Enrollment Achievement Measures on Academic Performance. Applied Sciences, 11(4), 1744.
  17. Hegde, V. (2016). Dimensionality reduction technique for developing undergraduate student dropout model using principal component analysis through R package. In 2016 IEEE International Conference on Computational Intelligence and Computing Research (ICCIC) (pp. 1-6). IEEE.
  • Pre-launch Seminar

More information in our LL&DH Seminar: How can data science assist decision-making in higher education?

Higher Education Student Dataset for Predicting Dropout

The dataset provided to applicants contains anonymized information on undergraduate students who enrolled in and attended at least one semester at Tecnologico de Monterrey in Mexico between 2014 and 2020. The dataset includes the following categories of information:

  • Sociodemographic information (age, gender, place of origin).
  • Enrollment information (program/school, region).
  • Academic information related to the student (previous level average, current average, periods completed).
  • Information associated with scores on admission tests (PAA, TOEFL, other initial evaluations).
  • Academic history (type of school, region, national/international, Tec system).
  • Student life (participation in sports, cultural, entrepreneurial activities).
  • Financial information (type of scholarship, percentage of scholarship).

You can download a simplified version of the data dictionary and a data teaser of the dataset below.

  • Data Dictionary

    Call on student dropout
  • Data teaser

    Call on student dropout

More information about the available dataset here.

Given the sensitivity of the data, the dataset to be released follows the purposes and specifications stipulated in our Data Policy.

  • Living Lab & Data Hub

    Data Policy

Advisory Board

For this Call for Proposals, the IFE Living Lab & Data Hub team counts on the valuable expertise of five academic advisors from Hungary, India, Chile, Portugal, and Mexico.

  • Roland Molontay
    Budapest University of Technology and Economics
  • Vinayak Hegde
    Amrita Vishwa Vidyapeetham, Mysuru Campus
  • Isabel Hilliger Carrasco
    Pontificia Universidad Católica de Chile
  • João Sarraipa
    UNINOVA - Instituto de desenvolvimento de novas tecnologias
  • Luis Alberto Muñoz Ubando
    Tecnologico de Monterrey

Important dates

  • January 5, 2022 10:00 (GMT-6): Q&A session, view the recording here.
  • January 17, 2022 23:59 (GMT-6): Abstract submission (optional): one page briefly describing the proposal, mainly the techniques, models, algorithms, or methods to be used and, if required, a description of any data to be added to the dataset provided for this call.
  • January 31, 2022 23:59 (GMT-6): Full proposal submission (extended deadline).
  • March 3, 2022: Result notification.
  • March 14, 2022: Sending the curated dataset to the selected applicants.
  • August 15, 2022: Full paper submission.
  • December 5, 2022: Implementation and testing of the best model selected in the call.

Proposal Submission

  1. Submit your proposal as a PDF document that includes the author information.
  2. The proposal must use the template defined by the IFE Living Lab & Data Hub.
  3. Extended abstracts and proposals are handled by email at ife.datahub@servicios.tec.mx
  • Living Lab & Data Hub

    Student Dropout Call Template

Overall Process

  1. The research and development proposal document must follow the structure defined by the IFE Living Lab & Data Hub.
  2. We recommend basing your experimentation on the data described in the section "Higher Education Student Dataset for Predicting Dropout". However, applicants may propose additional data for experimentation, based on the literature. If new variables are proposed, the proposal should include references to the articles that use them, the purpose of using the data, and how the data is calculated or generated. The IFE Living Lab & Data Hub collaborators will evaluate this information internally, and the acceptance response will indicate which of the proposed fields were selected. These fields will also be included in the dataset sent to the applicant.

  1. If the proposal is accepted, it is mandatory to acknowledge the support of the Tecnologico de Monterrey's IFE Living Lab & Data Hub in any publications, presentations, and technical reports that result from this call.
  2. The IFE Data Hub provides the dataset to researchers whose proposals are accepted, on the condition that their codebooks be made available for disclosure.
  3. The academic advisors will mentor the fifteen best-evaluated proposals throughout this stage (three proposals per advisor).
  4. The results of the experimentation should be documented in the form of an article according to the guidelines of the Journal.

  1. Authors are requested to include the following legend in the acknowledgments of the papers they produce as a result of this call:
    The authors would like to acknowledge the Living Lab & Data Hub of the Institute for the Future of Education, Tecnologico de Monterrey, Mexico, for the data published through the Call "Bringing New Solutions to the Challenges of Predicting and Countering Student Dropout in Higher Education" used in the production of this work. Additionally, please cite the data descriptor published on the call dataset (https://www.mdpi.com/2306-5729/7/9/119), which describes in detail the dataset and the methodology followed for its collection and curation.
  2. To document the experimentation carried out by the applicant, the articles produced by the team should be sent to the Data Hub email, ife.datahub@servicios.tec.mx. You are encouraged to submit your article(s) by August 2022 to the Special Issue of a high-impact Journal that we are preparing for this call. Note that publication is not guaranteed, since acceptance depends on the Journal's review process. If you decide to submit your work to another Q1/Q2 Journal, please share your pre-print and the acceptance notification with us. If you are unable to submit your paper to the call's Special Issue, please send us a letter describing your publication plans by November 2022 at the latest.
  3. The dataset processed and enriched by the applicant for experimentation can be deposited in the IFE's Educational Innovation collection of the Tecnologico de Monterrey's Data Hub. Please, contact us at ife.datahub@servicios.tec.mx for more information.
  4. The solution with the highest model quality (according to the evaluation metrics precision, recall, and accuracy) will be deployed in the Department of Institutional Intelligence and Analytics.
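For reference, the three evaluation metrics named in the last item can be computed from a confusion matrix. The sketch below is only an illustration in standard-library Python: the function name classification_metrics, the choice of label 1 for "dropout", and the toy labels are assumptions for this example, and the call does not specify how the metrics will be weighted or combined.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, and accuracy for a binary classifier.

    precision = TP / (TP + FP)
    recall    = TP / (TP + FN)
    accuracy  = (TP + TN) / (TP + TN + FP + FN)
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    total = tp + fp + fn + tn
    return {
        # Guard against division by zero when a class is never predicted or absent.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


if __name__ == "__main__":
    # Invented toy labels: 3 dropouts (1) among 10 students.
    y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
    print(classification_metrics(y_true, y_pred))

    # A model that never predicts dropout still reaches 70% accuracy here.
    print(classification_metrics(y_true, [0] * 10))
```

The second call shows why accuracy alone can mislead when dropouts are a minority of the records: a model that flags no one scores 0.7 accuracy but 0.0 recall, which is one reason to report precision and recall alongside it.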

For any questions or comments regarding this Call for Proposals, please write to us at the following email addresses:

  • Dr. Joanna Alvarado-Uribe
    IFE Living Lab Coordinator and coordinator of this call
    joanna.alvarado@tec.mx
  • Dr. Héctor G. Ceballos
    Director of the IFE Living Lab & Data Hub
    ceballos@tec.mx
  • Ing. Paola Mejía
    IFE Data Hub Coordinator
    gabriela.almada@tec.mx

Living Lab & Data Hub | Institute for the Future of Education | 
Tecnológico de Monterrey | Av. Eugenio Garza Sada 2501 Sur Col. Tecnológico C.P. 64849 |
Monterrey, Nuevo Leon, Mexico.
