This repository was archived by the owner on Sep 29, 2025. It is now read-only.

Commit d68b307

committed
New parts and clean the SoA section. New ML added inside the SoA to write about.
1 parent 5dc186b commit d68b307

19 files changed

+289
-200
lines changed

bib/references.bib

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,16 @@
11

2+
@book{durkheim_suicide_1951,
3+
title = {Suicide, a Study in Sociology},
4+
isbn = {978-0-02-908660-5},
5+
abstract = {There would be no need for sociology if everyone understood the social frameworks within which we operate. That we do have a connection to the larger picture is largely thanks to the pioneering thinker Émile Durkheim. He recognized that, if anything can explain how we as individuals relate to society, then it is suicide: Why does it happen? What goes wrong? Why is it more common in some places than others? In seeking answers to these questions, Durkheim wrote a work that has fascinated, challenged and informed its readers for over a hundred years. Far-sighted and trail-blazing in its conclusions, Suicide makes an immense contribution to our understanding of what must surely be one of the least understandable of acts. A brilliant study, it is regarded as one of the most important books Durkheim ever wrote.},
6+
pagetotal = {418},
7+
publisher = {Free Press},
8+
author = {Durkheim, Émile},
9+
date = {1951},
10+
langid = {english},
11+
note = {Google-Books-{ID}: {ZoraAAAAMAAJ}},
12+
}
13+
214
@inreference{noauthor_bootstrap_2023,
315
title = {Bootstrap aggregating},
416
rights = {Creative Commons Attribution-{ShareAlike} License},

main.tex

Lines changed: 6 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -43,9 +43,7 @@ \section*{abbreviation}
4343
\item[] SMOTE: Synthetic Minority Over-sampling Technique
4444
\end{itemize}
4545

46-
4746
\vspace{16pt}
48-
4947
\begin{abstract}
5048
In the wake of the COVID-19 pandemic and the new \textit{baccalaureate} reform, French higher-education authorities face a surge in enrolments and a higher number of dropouts. Higher baccalaureate grades have led the French registration system to admit more and more students into higher-degree paths. Unfortunately, these reforms did not take into account the gap in difficulty between secondary and higher studies, which increases the number of dropouts among students who lack the capacity, motivation and/or will to continue on their path.
5149

@@ -79,7 +77,12 @@ \section{Conceptual implementation}
7977
\subfile{sections/conceptual_proposal}
8078

8179
\vspace{16pt}
82-
\section*{Acknowledgment}
80+
\section{Acknowledgment}
81+
I would like to thank all the faculty for their welcome, help and support throughout this year, both on my paper and on my work at the lab. I would like to personally thank Dr. Ernesto Exposito, Dr. Mamadou Lamine Gueye and Dr. Houssam Kansso, as well as PhD student Nicolas Evain, for their help and welcome. Special thanks go to Dr. Mamadou Lamine Gueye for following me as a tutor during this last year of my Master's.
82+
83+
I would also like to thank the entire 2023-2024 SIGLIS and Industry 4.0 cohort, with whom I have shared great times, knowledge and some fun moments during the year. Special thanks go to the SIGLIS cohort for supporting me for two years, especially my flatmates Dorian Cazabat, Lise Laville and Gabriel Das Neves Rodrigues.
84+
85+
Finally, I would like to thank my friends and family for their support, their motivation and for being there in times of need.
8386
\vspace{12pt}
8487

8588
\bibliographystyle{plain}

sections/introduction.tex

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,11 @@
2424
But what can be categorized as student success, and how can we tell whether a student fits the profile of excellence? Most of the time, for an institution, success is the number of students who complete their degree \cite{weatherton_success_2021}. However, even if this measure can indicate a good rate of success for an institute or university, is it really indicative of real success? For our research, we have chosen to keep the simple definition of success as graduation, so that our ML models remain binary. However, to really address the problem of identifying an excellent student, we will take other factors into account.
2525
Another definition we have to pin down is that of the excellent student. It could be written as follows: “an excellent student is one with good grades, a good understanding of the concepts and a general interest in the field of study.” Setting aside students who simply learn with ease, an excellent student may not perform well in a standard curriculum, yet stand out in a specific field he or she is interested in.
2626

27-
This research endeavours to explore and validate the potential of ML and data analytics in revolutionizing the admission process. The motivation is twofold: to enhance the success rate of students by ensuring they are placed in programs where they are most likely to excel, and to reduce dropout rates by minimizing mismatches between students and programs. We also thrive to found more excellence students within the mass of registration.
27+
This research endeavours to explore and validate the potential of ML and data analytics to revolutionize the admission process. The motivation is twofold: to enhance the success rate of students by ensuring they are placed in programs where they are most likely to excel, and to reduce dropout rates by minimizing mismatches between students and programs. We also strive to find more excellent students among the mass of applicants.
28+
29+
We base our experiments and results on the French academic system. However, we start from the principle that any academic system could use this research to build such registration-support systems. Because of a lack of literature on the French system, we have extended our review to the entire world, including academic systems from different countries. Because the academic process matters less than the actual needs and hopes of students and institutions, we can exploit these different data for our research. Yet, as researchers and as readers, we suggest drawing a line and remembering that a large part of any such system is the culture of the country in which it is based. We use universal factors to feed our system, but some may vary from country to country.
30+
31+
Another point we need to clarify is that this research is not meant to discriminate against students, nor to help the ``elite'' by widening an existing societal divide. It is in fact a way to reduce this gap and give each student a chance of entering higher education and earning a diploma that suits their needs and hopes.
2832

2933
This research will explore and propose different approaches, starting with a detailed methodology, followed by a case study conducted at the University of Pau et des Pays de l'Adour.
3034
We will then conclude, taking into account our findings and the results of our experiment.

sections/soa.tex

Lines changed: 2 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -6,10 +6,6 @@
66

77
We have divided our review into two main sections. The first explores the importance of student choice for success; the second reviews the field of student dropout prediction using ML. This second section is itself divided into two parts: the first focuses on the analytical approach to the problem, and the second on the predictive models used in the literature to predict student dropout.
88

9-
\subsection{Importance of student's choice for success in higher education}
10-
\label{subsec:soa_importantestudentchoice}
11-
\subfile{soa/subsec:soa_importantestudentchoice}
12-
139
\subsection{Predicting student dropout}
1410
\label{subsec:soa_predictingstudentdropout}
1511
\subfile{soa/subsec:soa_predictingstudentdropout}
@@ -20,8 +16,8 @@ \subsubsection{Analytical approach}
2016

2117

2218
\subsubsection{Predictive approaches}
23-
\label{subsec:soa_predictiveapproach}
24-
\subfile{soa/subsec:soa_predictiveapproach}
19+
\label{subsubsec:soa_predictiveapproach}
20+
\subfile{soa/subsubsec:soa_predictiveapproach}
2521

2622
\vspace{8pt}
2723
\subsection{State of the art conclusion}
Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
\documentclass[../../../main.tex]{subfiles}
2+
\graphicspath{{\subfix{../../../res/}}}
3+
\begin{document}
4+
The Decision Tree model stands out in ML as a fundamental and versatile algorithm. It is widely recognized for its simplicity and efficacy for classification and regression tasks alike.
5+
The algorithm takes its name from its tree-like structure, composed of nodes and branches (symbolizing decisions and their possible consequences). This structure is easy to visualize and interpret, which makes complex decision-making processes more intuitive for the user. It is an invaluable tool in various applications, ranging from data mining to advanced research in AI.
6+
7+
Decision trees have been used in a wide range of fields, such as medicine, game theory, weather prediction and many more \cite{quinlan_induction_1986}.
8+
From the literature, decision trees appear to reach a precision of around 83\% across multiple papers (73\% \cite{viloria_integration_2019}, 87.27\% \cite{ramirez_prediction_2018}, 83\% \cite{kemper_predicting_2020} and around 90\% \cite{tenpipat_student_2020}). One study, in a domain similar to ours (predicting student dropout in South Korean schools), evaluated multiple decision-tree algorithms and plotted a ROC curve for each of them \cite{lee_machine_2019}. Their ROC curves showed that, even at a low false positive rate, the true positive rate is outstanding, approaching its plateau at a false positive rate of around 0.2. This study shows the predictive efficiency of the decision tree model.
9+
One downside of these studies is the granularity of the prediction. They show which variables can be used to determine which kinds of students may be at risk of dropping out, but they lack a micro-level view that would alert staff about a specific student at risk.
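
For reference, the ROC curves discussed above plot the true positive rate against the false positive rate as the decision threshold varies. Writing $TP$, $FP$, $TN$ and $FN$ for the counts of true positives, false positives, true negatives and false negatives (our notation, not that of the cited study), the two rates are
\begin{equation}
\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}.
\end{equation}
Similarly, the figure of around 83\% quoted above is consistent with the unweighted mean of the four reported values, assuming the last one is taken as exactly 90\%:
\begin{equation}
\frac{73 + 87.27 + 83 + 90}{4} \approx 83.3.
\end{equation}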
10+
\end{document}
Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
\documentclass[../../../main.tex]{subfiles}
2+
\graphicspath{{\subfix{../../../res/}}}
3+
\begin{document}
4+
5+
\end{document}
Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
\documentclass[../../../main.tex]{subfiles}
2+
\graphicspath{{\subfix{../../../res/}}}
3+
\begin{document}
4+
5+
\end{document}
Lines changed: 45 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,45 @@
1+
\documentclass[../../../main.tex]{subfiles}
2+
\graphicspath{{\subfix{../../../res/}}}
3+
\begin{document}
4+
K-Nearest Neighbors (KNN) is a fundamental algorithm in the field of ML. As a non-parametric, instance-based learning algorithm, it is particularly renowned for its simplicity and effectiveness for tasks such as classification and regression.
5+
The core principle of KNN is to predict the label of a new data point by looking at its $K$ closest labelled data points and taking a majority vote for classification, or an average for regression. As an instance-based method, it has no explicit training phase: it stores the labelled data, so its predictions draw on more examples as more data is fed to it.
6+
It can be really powerful in our case, as it excels in scenarios where the decision boundary is irregular and blurry.
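
To make the principle above concrete, the prediction rule can be sketched as follows, assuming unweighted votes and writing $N_K(x)$ for the set of the $K$ labelled points closest to a new point $x$ under some chosen distance (the notation and the choice of distance are ours for illustration, not taken from the cited studies):
\begin{equation}
\hat{y}(x) = \arg\max_{c} \sum_{(x_i, y_i) \in N_K(x)} [\,y_i = c\,]
\qquad \mbox{(classification)},
\end{equation}
\begin{equation}
\hat{y}(x) = \frac{1}{K} \sum_{(x_i, y_i) \in N_K(x)} y_i
\qquad \mbox{(regression)},
\end{equation}
where $[\,y_i = c\,]$ equals 1 if $y_i = c$ and 0 otherwise.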
7+
8+
In the following study \cite{shiful_machine_2021}, KNN was the second most accurate of the models compared, behind only the Random Forest algorithm.
9+
\begin{table}[H]
10+
\centering
11+
\caption{Training and testing accuracy \cite{shiful_machine_2021}}
12+
\begin{tabular}{|c|c|c|}
13+
\hline
14+
\textbf{Model name} & \textbf{Training accuracy} & \textbf{Testing accuracy}\\
15+
\hline
16+
Decision Tree & 80\% & 80\% \\
17+
\hline
18+
KNN & 83\% & 84\% \\
19+
\hline
20+
Random forest & 94\% & 86\% \\
21+
\hline
22+
\end{tabular}
23+
\label{tab:training_testing_acc_shiful}
24+
\end{table}
25+
26+
This review shows us that, even though KNN might not be the most accurate model (at least in this study \cite{shiful_machine_2021}), it is an interesting model to choose for comparing students with one another and potentially discovering groups of students who may be at risk of dropping out.
27+
28+
In the same study, the researchers compared precision and recall per class for each classifier. For KNN, they report the following results:
29+
\begin{table}[H]
30+
\centering
31+
\caption{Comparison of all classifiers \cite{shiful_machine_2021}}
32+
\begin{tabular}{|c|c|c|c|}
33+
\hline
34+
\textbf{Classifier} & \textbf{Class} & \textbf{Precision} & \textbf{Recall}\\
35+
\hline
36+
KNN & Not Dropout & 86\% & 84\% \\
37+
\cline{2-4}
38+
& Dropout & 69\% & 26\% \\
39+
\hline
40+
\end{tabular}
41+
\label{tab:compar_classifier_shiful}
42+
\end{table}
43+
44+
These results suggest that the model is fairly accurate overall, with an estimated accuracy of about 85\%, although this figure is driven mainly by the Not Dropout class.
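
For reference, with $TP$, $FP$ and $FN$ denoting true positives, false positives and false negatives for a given class (our notation), the two measures in the table are defined as
\begin{equation}
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}.
\end{equation}
A recall of 26\% on the Dropout class therefore means that roughly one actual dropout in four is detected. As an illustration (this value is our own computation, not one reported in \cite{shiful_machine_2021}), the corresponding $F_1$ score for that class is
\begin{equation}
F_1 = \frac{2 \times 0.69 \times 0.26}{0.69 + 0.26} \approx 0.38.
\end{equation}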
45+
\end{document}
Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
\documentclass[../../../main.tex]{subfiles}
2+
\graphicspath{{\subfix{../../../res/}}}
3+
\begin{document}
4+
5+
\end{document}
Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
\documentclass[../../../main.tex]{subfiles}
2+
\graphicspath{{\subfix{../../../res/}}}
3+
\begin{document}
4+
Logistic regression differs from linear regression in its ability to handle binary outcomes (scenarios where the outcome is dichotomous) and is an important statistical method in the field of ML. It is particularly interesting for its role in classification tasks. This method works by modelling the probability of a binary response based on one or more predictor variables (independent variables).
5+
Our scenario of student dropout is not dichotomous by nature. However, this method can be a first step for data cleansing and for classifying students as at risk or not at risk.
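
As a sketch of the model described above, writing $x_1, \dots, x_p$ for the predictor variables and $\beta_0, \dots, \beta_p$ for the fitted coefficients (our notation, introduced for illustration), logistic regression models the probability of the positive class, here dropout, as
\begin{equation}
P(y = 1 \mid x) = \frac{1}{1 + \exp\left(-\left(\beta_0 + \sum_{j=1}^{p} \beta_j x_j\right)\right)}.
\end{equation}
A student would then be flagged as at risk when this probability exceeds a chosen threshold; 0.5 is a common default, but the value is an assumption to be tuned rather than part of the method.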
6+
7+
An interesting approach is taken by \cite{lan_sparse_2014}, which changes the granularity of the prediction to the individual student. It builds a “grade book” (a source of information commonly used in the context of classical test theory \cite{novick_axioms_1966}), filled with 1 if the student answers question \textit{i} correctly and 0 if not. Then, weights are added to each question depending on its difficulty, to determine whether a student is at risk of failing and dropping out. This approach requires some form of e-learning method, and it does address the problem at a per-student level. Yet it does not take into account the other factors cited above, which paint a bigger picture of a student's life, when deciding whether this student is at risk or not. The student could simply be failing this course, and even if a pattern of failing arises for a specific student, can we conclude that he or she is at risk of dropping out?
8+
\end{document}
