This repository was archived by the owner on Sep 29, 2025. It is now read-only.

Commit 7ae04d3: "Implementation started" (1 parent 9d1cb7f)

10 files changed: 124 additions, 40 deletions

bib/glossaries/abbreviation.tex

Lines changed: 4 additions & 0 deletions

@@ -14,6 +14,10 @@
 
 \newacronym{rf}{RF}{Random Forest}
 
+\newacronym{if}{IF}{Isolation Forest}
+
+\newacronym{lr}{LR}{Lasso Regression}
+
 \newacronym{smote}{SMOTE}{Synthetic Minority Oversampling TEchniques}
 
 \newacronym{roc}{ROC}{Receiver operating characteristic}

main.tex

Lines changed: 4 additions & 6 deletions

@@ -11,6 +11,7 @@
 \usepackage{float}
 \usepackage{subfiles}
 \usepackage[toc]{glossaries}
+\usepackage{listings}

 %Style coding
 %\restylefloat{table}
@@ -76,12 +77,9 @@ \section{Conceptual proposal}
 \label{sec:conprop}
 \subfile{sections/conprop}

-%Didn't had the time to recieve and experiment with the different datasets.
-%So the implementation will have to be added in a second time (version)
-%TODO: Add the implementation when experimentations are done
-% \section{Implementation}
-% \label{sec:imp}
-% \subfile{sections/imp}
+\section{Implementation}
+\label{sec:imp}
+\subfile{sections/imp}

 \section{Conclusion}
 \label{sec:conclusion}
4 binary image files changed (86.5 KB, 194 KB, 13.7 KB, 129 KB)

sections/conclusion.tex

Lines changed: 11 additions & 12 deletions

@@ -10,20 +10,19 @@ \subsection{Summary of Literature Review Findings}
 First, let us summarize our literature review and conclude on the state of our state of the art on the subject.
 We saw that many articles and much research had been produced on the subject, or similar subjects, of predicting student success or dropout using statistical analysis and/or \acrfull{ml} algorithms and \acrfull{ai}.
 We consider the need to predict a student's success or failure to stem from the same human factors. When looking at the analytical part of the literature (not focusing on any machine learning model or algorithm), we were able to derive a list of factors commonly proven to be identifiable in someone's success or failure, in particular factors targeting students. As put in our analytical predictive approach from the state of the art \ref{subsubsec:soa_analyticalapproach}, the factors were:
-\cite{opazo_analysis_2021,tinto_dropout_1975,caspersen_teachers_2015,lidia_problema_2006,bejarano_caso_2017,sinchi_acceso_2018,cavero_voluntad_2011,velasco_alisis_nodate}:

 \begin{itemize}
-\item Family : Does that person got support from their family? Do they still have a family, are they in good term, are they living with them?
-\item Previous educational background : What is this individual background on an educational level? What was their last diploma, which level are they on?
-\item Academic potential : Do they have already been approached as potential excellent student?
-\item Normative congruence : Does the individual conform to societal rules?
-\item Friendship support : Does the individual have good support from friends? Do they have friends? How are they social life with other person (preferably from within their age range)?
-\item Intellectual development : Has the individual been able to process and have a \textit{regular} intellectual development? Do they have a condition impacting this factor?
-\item Educational performance : Have they proven performant on an educational level already? How were they previous performance?
-\item Social integration : Have they integrated fine with other student, staff and their new academic environment?
-\item Satisfaction : Are they satisfied with their life's choice (More precisely, are they happy with their study choice?)
-\item Institutional commitment : Do they commit to their success and to the institutional life? Or do they only go in class and do the bare minimum?
-\item Student adaptation : Just like \textbf{Social integration} and \textbf{Normative congruence}, how does that individual adapt to its new environment and life?
+\item Family
+\item Previous educational background
+\item Academic potential
+\item Normative congruence
+\item Friendship support
+\item Intellectual development
+\item Educational performance
+\item Social integration
+\item Satisfaction
+\item Institutional commitment
+\item Student adaptation
 \end{itemize}

 Once we had gathered enough data to determine which factors we needed to extract from our datasets to feed our models, we needed to search for which models had already been tested and proven within the literature.

sections/conprop.tex

Lines changed: 2 additions & 19 deletions

@@ -14,29 +14,11 @@
 \subsection{Feeding data}
 \label{subsec:conprop_feedingdata}
 Our literature survey \ref{subsubsec:soa_analyticalapproach} has identified several key factors influencing student retention and success. We can extrapolate and hypothesize that such broad factors could be used to determine a student's success.
-These factors, hypothesized to be critical in predicting student trajectories, are: \cite{opazo_analysis_2021,tinto_dropout_1975,caspersen_teachers_2015,lidia_problema_2006,bejarano_caso_2017,sinchi_acceso_2018,cavero_voluntad_2011,velasco_alisis_nodate}:
-
-\begin{itemize}
-\item Family : Does that person got support from their family? Do they still have a family, are they in good term, are they living with them?
-\item Previous educational background : What is this individual background on an educational level? What was their last diploma, which level are they on?
-\item Academic potential : Do they have already been approached as potential excellent student?
-\item Normative congruence : Does the individual conform to societal rules?
-\item Friendship support : Does the individual have good support from friends? Do they have friends? How are they social life with other person (preferably from within their age range)?
-\item Intellectual development : Has the individual been able to process and have a \textit{regular} intellectual development? Do they have a condition impacting this factor?
-\item Educational performance : Have they proven performant on an educational level already? How were they previous performance?
-\item Social integration : Have they integrated fine with other student, staff and their new academic environment?
-\item Satisfaction : Are they satisfied with their life's choice (More precisely, are they happy with their study choice?)
-\item Institutional commitment : Do they commit to their success and to the institutional life? Or do they only go in class and do the bare minimum?
-\item Student adaptation : Just like \textbf{Social integration} and \textbf{Normative congruence}, how does that individual adapt to its new environment and life?
-\end{itemize}
-

 \subsection{Data workflow}
 \label{subsec:concimp_dataworkflow}
 Our workflow, as depicted in Figure \ref{fig:dataworkflow}, is designed to systematically transform raw data into actionable insights. Even though we are looking to identify excellence at registration time, this model could also be used and/or improved as a safety measure to detect students at risk of dropping out.

-
-
 Each component of the workflow serves a strategic purpose:

 \begin{enumerate}
@@ -60,6 +42,7 @@ \subsection{Available dataset}
 \item Institutional commitment : Do they commit to their success and to institutional life, or do they only attend class and do the bare minimum?
 \end{itemize}

+
 \subsection{Validation and Expected Outcomes}
 \label{subsec:concimp_validexcpecoutcomes}
 We anticipate that this workflow will yield a robust model capable of identifying excellent students. We will gauge the efficiency of our model through rigorous validation techniques such as \acrfull{roc} analysis, \acrfull{pca}, etc., to ensure the reliability of our predictions.
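The \acrfull{roc} validation this hunk refers to could be sketched with scikit-learn as below. This is a hedged illustration only: the labels and model scores are invented toy values, since the commit contains no model outputs.

```python
# Toy ROC-AUC check: do the model's scores rank "excellent" students above the rest?
# All values below are invented for illustration; they are not from the thesis dataset.
from sklearn.metrics import roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                    # 1 = excellent student
y_score = [0.9, 0.2, 0.7, 0.6, 0.4, 0.1, 0.8, 0.3]  # hypothetical model scores

# Every positive outranks every negative here, so the AUC is 1.0.
print(roc_auc_score(y_true, y_score))
```

An AUC of 0.5 would mean the scores rank students no better than chance.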
@@ -93,7 +76,7 @@ \subsection{Usage on the field}
 \label{fig:imp_fonc}
 \end{figure}

-This diagram\ref{fig_imp_fond} shows a possible way for institution on one implementation possibility following a pretty basic process. It starts from the data collection at the beginning, from one or multiple database from the institution. A pre-cleanup (and if needed data aggregation) should be deployed by the institution. We have left this part free of choice at the moment.
+This diagram (Figure \ref{fig:imp_fonc}) shows one possible implementation path for an institution, following a fairly basic process. It starts with data collection from one or more of the institution's databases. A pre-cleanup step (and, if needed, data aggregation) should be deployed by the institution; we have left this part open for now.
 Then, whenever this new dataset is constructed from the institution's data following our factor list, we can send it through our framework model and wait for the output(s). Depending on the institution's \textbf{need} and \textbf{definition of success}, we can provide one or more outputs. We can also output model evaluation metrics if wanted or needed.
 The dataset fed to the machine should include, in some form, the factors seen in subsection \ref{subsec:conprop_feedingdata}, for our framework to be able to create its student profiles and evaluate them.
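The field-usage process described in this hunk (institution data, then pre-cleanup/aggregation, then the framework model, then one or more outputs) could be sketched as below. Every function and field name here is a hypothetical placeholder, not part of the actual framework.

```python
# Hypothetical sketch of the institutional pipeline: collect -> pre-clean -> predict.
# "grade", "id", pre_cleanup and run_framework are illustrative names only.
from typing import Callable

def pre_cleanup(rows: list[dict]) -> list[dict]:
    """Institution-side step (left open by the authors): drop incomplete records."""
    return [r for r in rows if all(v is not None for v in r.values())]

def run_framework(rows: list[dict], predict: Callable[[dict], str]) -> list[str]:
    """Send the cleaned dataset through a model; collect one output per student."""
    return [predict(r) for r in rows]

# Toy data and toy model: flag excellence from an overall grade.
rows = [{"id": 1, "grade": 16.5}, {"id": 2, "grade": None}, {"id": 3, "grade": 12.0}]
cleaned = pre_cleanup(rows)
outputs = run_framework(cleaned,
                        lambda r: "excellent" if r["grade"] >= 16 else "average")
print(outputs)  # ['excellent', 'average']
```

The pre-cleanup step is deliberately minimal, mirroring the text's choice to leave that stage to each institution.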

sections/imp.tex

Lines changed: 97 additions & 0 deletions

@@ -2,5 +2,102 @@
 \graphicspath{{\subfix{../res/}}}
 \begin{document}

+Let's begin this implementation by studying our (cleaned) raw dataset and defining the outcomes we want from our system. Once these two points have been discussed and the choices made, we will explore the construction and results of our experimental system.
+
+\subsection{Analysing the raw dataset}
+We made a first study of our raw dataset (after some light cleaning) to try to understand the data we are working with. First, we wanted to understand the data universe itself: how many entries do we have available? This is the result of this first analysis:
+\begin{table}[h]
+\centering
+\begin{tabular}{|c|c|}
+\hline
+Academic Year & Number of Students \\
+\hline
+2018-2019 & 14 \\
+2019-2020 & 13 \\
+2020-2021 & 13 \\
+2021-2022 & 17 \\
+2022-2023 & 32 \\
+\hline
+Total & 90 \\
+\hline
+\end{tabular}
+\caption{Number of Students per Academic Year}
+\label{tab:students_per_year}
+\end{table}
+
+Our dataset is small for our needs, so we will have to exclude some parts of our conceptual model and test at least our basic hypothesis: being able to split our dataset into excellent, average, and at-risk students.
+Here is the model we are going to work with for this first implementation, awaiting more data to test further modules of our system:
+
+\begin{figure}[H]
+\includegraphics[width=1\linewidth]{res/diagram/simplifiedmdl-Imp_mdl.drawio.png}
+\caption{Simplified algorithmic workflow used in this first implementation.}
+\label{fig:dataworkflow_simp}
+\end{figure}
+
+Continuing our analysis, we wanted to estimate the average age of our sample using the year of birth available in our dataset. Here is a heatmap of the different academic years and the hotspots by year of birth:
+\begin{figure}[H]
+\includegraphics[width=1\linewidth]{res/graph/data_analysis/raw/heatymap_year.png}
+\caption{Heatmap of year of birth by academic year}
+\label{fig:heatmap_dob_acayear}
+\end{figure}
+
+As we can see, the strongest hotspot is the year of birth 1998, with a frequency of 16 out of 90 samples. For the academic year 2022/2023, we have another strong hotspot at the year of birth 2000 (with a frequency of 9 out of 9 for this year of birth), which indicates that all students born in 2000 registered in the same year (2022/2023).
+
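The matrix underlying such a heatmap can be built with a pandas crosstab. A hedged sketch follows; the column names and records are invented, with only the 9-out-of-9 hotspot for year of birth 2000 in 2022/2023 mirrored from the text.

```python
# Year-of-birth by academic-year contingency table (the heatmap's raw matrix).
# Column names and the small record set are illustrative assumptions.
import pandas as pd

df = pd.DataFrame({
    "academic_year": ["2022-2023"] * 3 + ["2018-2019"] * 2,
    "year_of_birth": [2000, 2000, 2000, 1998, 1998],
})
matrix = pd.crosstab(df["year_of_birth"], df["academic_year"])
print(matrix)
```

A plotting library (e.g. seaborn's `heatmap`) can then render `matrix` directly.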
+Another variable we wanted to study is the academic mention obtained by students in each academic year, giving us a view of the academically best and worst years in the dataset.
+\begin{figure}[H]
+\includegraphics[width=1\linewidth]{res/graph/data_analysis/raw/academicmentions_academic_year.png}
+\caption{Histogram of academic mentions by academic year}
+\label{fig:hist_acament_acayear}
+\end{figure}
+
+We can detect one \textit{outlier} among these five academic years. The year 2022/2023 has been the best year so far, with an outstanding 4 \textbf{Très bien} (very good) and 15 \textit{Bien} (good) mentions, while 2018/2019 lags behind with 3 students out of 14 holding only a \textit{Passable} (average) mention. However, both 2021/2022 and 2022/2023 also count more students with no mention at all.
+
+Finally, we wanted to see the number of students admitted each year, and to build a comparative table of the mean of students' final grades for each year.
+\begin{figure}[H]
+\includegraphics[width=1\linewidth]{res/graph/data_analysis/raw/nbadmissions_year.png}
+\caption{Evolution of the number of admissions by year}
+\label{fig:evol_nb_admis}
+\end{figure}
+\begin{table}[h]
+\centering
+\begin{tabular}{|c|c|c|c|}
+\hline
+Academic Year & Admitted & Adjourned & Total \\
+\hline
+2018-2019 & 12.89 & 0.00 & 12.89 \\
+2019-2020 & 13.82 & 0.00 & 13.82 \\
+2020-2021 & 13.53 & 0.00 & 13.53 \\
+2021-2022 & 13.98 & 11.49 & 13.83 \\
+2022-2023 & 14.41 & 8.67 & 14.05 \\
+\hline
+\end{tabular}
+\caption{Admission statistics (mean overall grade) per Academic Year}
+\label{tab:admission_statistics}
+\end{table}
+
+Because our data is not normalized for an \textit{at-risk} model, we will only consider predicting excellence in our dataset with our model.
+As we can see, the global mean is quite similar for each year, orbiting around 13.5/20.
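A table of this shape is typically a grouped mean: overall grade averaged by academic year and admission status, plus a per-year global mean. The sketch below shows the computation on invented records; the column names (`year`, `status`, `grade`) are assumptions, not the real schema.

```python
# Mean grade per (year, status) plus a per-year "Total" column, as in the
# admission-statistics table. The records are synthetic illustrations.
import pandas as pd

df = pd.DataFrame({
    "year":   ["2021-2022"] * 4 + ["2022-2023"] * 3,
    "status": ["Admitted", "Admitted", "Admitted", "Adjourned",
               "Admitted", "Admitted", "Adjourned"],
    "grade":  [14.0, 13.5, 14.5, 11.49, 15.0, 13.8, 8.67],
})

by_status = df.pivot_table(index="year", columns="status",
                           values="grade", aggfunc="mean")
by_status["Total"] = df.groupby("year")["grade"].mean()
print(by_status.round(2))
```

With the real dataset, the same two lines would reproduce the reported 13.5/20-centred means.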
+
+\subsection{Defining the outcomes}
+
+As discussed in our State of the Art (\ref{sec:soa}) and Analysis (\ref{sec:analysis}) sections, one difficult point in this kind of study is the definition of success. This broad question needs to be answered before constructing the system, as the system's parameters will be shaped by the desired outcomes.
+For this first implementation, and according to the variables and data available in our dataset, we will define success simply on the basis of the student's final overall grade. Thus, we can define \textbf{our} success as follows:
+\begin{quote}
+\textbf{Success:} A student who has followed through the registration correctly, is not barred from the diploma (forbidden students), and who has an overall grade
+\begin{equation}
+grade \geq 16
+\end{equation}
+\end{quote}
+
+We chose 16 as the minimum value for success based on our dataset, which has a maximum of 17 for this variable, and in order to keep as much data as possible available to train our system.
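The three-part success definition above (registration completed, not forbidden, grade at least 16) can be stated as one predicate. A minimal sketch follows, assuming three illustrative fields per student record; `registered`, `forbidden` and `grade` are hypothetical names, not the dataset's real columns.

```python
# Hypothetical success predicate matching the commit's definition of success.
def is_success(registered: bool, forbidden: bool, grade: float) -> bool:
    """Success: registration completed, not forbidden, and overall grade >= 16."""
    return registered and not forbidden and grade >= 16.0

print(is_success(True, False, 16.5))  # True
print(is_success(True, True, 17.0))   # False: forbidden students are excluded
print(is_success(True, False, 15.9))  # False: below the grade-16 threshold
```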
+
+\subsection{Building and setting up the system}
+
+From figure \ref{fig:dataworkflow_simp}, we must build a system composed of several \acrshort{ml} algorithms (\acrfull{knn}, \acrfull{if}, \acrfull{lr}), set up to find our definition of success within our dataset. This means creating a profile of the excellent student: an average grade of at least 16, not forbidden, and a fully completed registration process.
+Then, after training our model, we have to teach it the correlations underlying the profile we have set up, so that it can find excellence within our registration datasets.

 \end{document}
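One way the named algorithms could be composed is: Isolation Forest to drop outlier registrations, Lasso to select informative features, and KNN to classify excellence. The sketch below is a hedged guess at such a pipeline, not the authors' code: the synthetic data, feature count, and every hyperparameter are invented assumptions.

```python
# Hypothetical IF -> Lasso -> KNN pipeline over a synthetic stand-in for the
# 90-student registration dataset (grades out of 20). Illustrative only.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(loc=13.5, scale=2.0, size=(90, 4))     # 4 invented numeric features
grades = X.mean(axis=1)
y = (grades >= np.quantile(grades, 0.8)).astype(int)  # top ~20% = "excellent"

# Step 1 (Isolation Forest): drop outlier registrations before training.
mask = IsolationForest(contamination=0.1, random_state=0).fit_predict(X) == 1
X_clean, y_clean = X[mask], y[mask]

# Step 2 (Lasso Regression): keep only features with non-zero coefficients.
lasso = Lasso(alpha=0.05).fit(X_clean, grades[mask])
selected = np.flatnonzero(lasso.coef_)
if selected.size == 0:
    selected = np.arange(X_clean.shape[1])  # fall back to all features

# Step 3 (KNN): classify students as excellent / not excellent.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_clean[:, selected], y_clean, test_size=0.3, random_state=0,
    stratify=y_clean)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print("held-out accuracy:", round(knn.score(X_te, y_te), 2))
```

With only 90 real samples, cross-validation rather than a single split would be advisable; this sketch keeps the simpler split for readability.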

sections/soa/subsec:soa_predictingstudentdropout.tex

Lines changed: 6 additions & 3 deletions

@@ -12,9 +12,9 @@
 \end{figure}

 If we dive deeper into the analytical search on these platforms (we will concentrate on Scopus for now), using this search term:
-\begin{verbatim}
+\begin{lstlisting}[breaklines]
 TITLE-ABS-KEY ( student AND dropout ) AND ( LIMIT-TO ( SUBJAREA , "SOCI" ) OR LIMIT-TO ( SUBJAREA , "COMP" ) OR LIMIT-TO ( SUBJAREA , "PSYC" ) OR LIMIT-TO ( SUBJAREA , "ENGI" ) OR LIMIT-TO ( SUBJAREA , "MATH" ) )
-\end{verbatim}
+\end{lstlisting}
 We can follow the trend in the number of publications each year on the subject of student dropout prediction, and we can once again notice the longevity of the subject in research, dating all the way back to the 1950s.
 Here is the analysis as a graph extracted from Scopus:
 \begin{figure}[H]
@@ -41,7 +41,10 @@
 We thus understand how universal this problem is, given all the different top countries publishing on the subject since the 1950s. At least one country from each of the five continents has a publication on this subject. Moreover, many fields have looked into the subject, giving us many interesting points of view to analyze.

-Now, if we look for the same subject but adding the \acrshort{ml} or \acrshort{ai} to it :TITLE-ABS-KEY ( student AND dropout AND ( machine AND learning OR artificial AND intelligence ) ) AND ( LIMIT-TO ( SUBJAREA , "SOCI" ) OR LIMIT-TO ( SUBJAREA , "COMP" ) OR LIMIT-TO ( SUBJAREA , "PSYC" ) OR LIMIT-TO ( SUBJAREA , "ENGI" ) OR LIMIT-TO ( SUBJAREA , "MATH" ) OR LIMIT-TO ( SUBJAREA , "DECI" ) )|
+Now, if we look for the same subject but add \acrshort{ml} or \acrshort{ai} to it:
+\begin{lstlisting}[breaklines]
+TITLE-ABS-KEY ( student AND dropout AND ( machine AND learning OR artificial AND intelligence ) ) AND ( LIMIT-TO ( SUBJAREA , "SOCI" ) OR LIMIT-TO ( SUBJAREA , "COMP" ) OR LIMIT-TO ( SUBJAREA , "PSYC" ) OR LIMIT-TO ( SUBJAREA , "ENGI" ) OR LIMIT-TO ( SUBJAREA , "MATH" ) OR LIMIT-TO ( SUBJAREA , "DECI" ) )
+\end{lstlisting}

 We obtain the following graph:
 \begin{figure}[H]
