Student's Online Learning Evaluation Prediction Model

Project Overview

Massive open online courses (MOOCs) have been on the rise amid the recent pandemic. Because most students received their education online during COVID-19, there is strong motivation to improve the quality of online classes. Pedagogy experts suggest that one of the most effective ways to strengthen student performance is to reform the evaluation technique. This project uses MOOCCube, a large-scale open data repository for massive online courses, to develop a model that predicts student performance from students' online behavior by employing k-means clustering, fuzzy logic, and multiple regression.

GitHub

Objectives

To design an academic performance prediction model from students' online learning behaviors

To apply K-means clustering to classify students’ online learning behavior.
To develop a fuzzy inference system in order to predict a student’s academic performance based on rules regarding learning behaviors.
To compare the effectiveness of fuzzy logic and multiple regression in predicting students’ academic performance.

Data Preparation

MOOCCube contains 706 courses, 38,181 videos, 114,563 concepts, and 199,199 users.

WHAT

From the roughly 700 courses, 38K videos, 114K concepts, 199K users, and additional resources such as concept graphs and academic papers stored in MOOCCube, this project extracts four features of student online learning behavior:

the average watching count: the average number of times each student opens a course
the average completion percentage: the average time duration to complete each course
courses count: the number of courses that each student registered
average enroll time: the average time of day at which each student registered for courses

HOW

The user id serves as the primary key that uniquely identifies each record. There are 199,199 user ids in 'user.json' and 48,640 user ids in 'user_video_act.json'. An inner join on the user id extracts the 48,640 user records common to the two JSON files. Two features, average watching count and average completion percentage, are obtained from 'user_video_act.json', while the other two, courses count and average enroll time, come from 'user.json'.
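The inner join described above can be sketched with pandas. This is only an illustration: the two small DataFrames and their column names (`user_id`, `courses_count`, `avg_enroll_hour`, `avg_watch_count`, `avg_completion_pct`) are hypothetical stand-ins, not the actual MOOCCube schema, and the per-user aggregation is assumed to have been done already.

```python
import pandas as pd

# Hypothetical miniature stand-in for the per-user features taken from user.json.
users = pd.DataFrame({
    "user_id": ["U1", "U2", "U3", "U4"],
    "courses_count": [4, 2, 7, 1],
    "avg_enroll_hour": [15.0, 9.5, 21.0, 13.0],
})

# Hypothetical stand-in for the per-user features aggregated from the video activity file.
video_acts = pd.DataFrame({
    "user_id": ["U1", "U3"],
    "avg_watch_count": [5.8, 2.1],
    "avg_completion_pct": [61.67, 35.0],
})

# An inner join keeps only the users that appear in both tables.
# validate="one_to_one" raises an error if user_id is duplicated on either side.
master = users.merge(video_acts, on="user_id", how="inner", validate="one_to_one")
print(master)
```

With this toy input only `U1` and `U3` survive the join, in the same way that only the users present in both files are kept in the project.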

Since MOOCCube does not provide students' grades, they had to be generated artificially. Each grade was generated as follows: the four online behaviors are each normalized using the min-max method and then summed. The sum is multiplied by 100 to scale it up, and 50 is added on the assumption that the minimum grade a student can receive is 50. The data, together with the generated grades, are stored in 'master_data.csv' (Table 1).

Table 1
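The grade generation steps (min-max normalization, sum, scale by 100, add 50) can be sketched as follows. The function names and the three-student feature matrix are hypothetical and chosen only to make the arithmetic easy to follow; the project's real code may differ.

```python
import numpy as np

def min_max(column):
    """Scale a 1-D array to [0, 1]; a constant column maps to all zeros."""
    column = np.asarray(column, dtype=float)
    span = column.max() - column.min()
    if span == 0:
        return np.zeros_like(column)
    return (column - column.min()) / span

def synthetic_grade(features):
    """Sum the min-max normalized columns, multiply by 100, then add 50."""
    features = np.asarray(features, dtype=float)
    normalized = np.column_stack(
        [min_max(features[:, j]) for j in range(features.shape[1])]
    )
    return 100.0 * normalized.sum(axis=1) + 50.0

# Hypothetical rows: watching count, completion %, courses count, enroll hour.
features = np.array([
    [1.0, 10.0, 1.0, 8.0],
    [3.0, 55.0, 2.0, 12.0],
    [5.0, 100.0, 5.0, 20.0],
])
grades = synthetic_grade(features)
print(grades)
```

Taken literally, this formula yields values from 50 (all four behaviors at their minimum) up to 450 (all four at their maximum). Whether the project applies any further scaling is not stated here, so the sketch follows the description as written.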

Multiple Regression Model

Diagram 1

The online lecture completion rate has the biggest impact on the grade

This project assumes that the four online learning behaviors are closely and linearly related to one another. The multiple regression model takes all four online learning behavior features at the same time, as shown in the model architecture diagram above. After training the multiple regression model to predict students’ academic performance, all four parameters (online learning behaviors) turn out to be positively correlated with the student's academic performance. The result is summarized below:

One more second watching the video: 0.191 INCREASE in grade
One more percent of completing the video: 1.215 INCREASE in grade
One more registered course: 0.160 INCREASE in grade
One hour later course enrollment time: 1.087 INCREASE in grade

These coefficients suggest that, among the four online learning behaviors, the video completion rate affects students’ academic performance the most.

The accuracy (R² score) is 99.9%, and the percent error on the test data is 5.11%.
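A minimal sketch of fitting a four-feature multiple regression and computing its R² score is shown below. It runs on synthetic data, not on master_data.csv: the feature values, the coefficients in `true_coef`, and the noise level are all made up for illustration, and ordinary least squares via NumPy is an assumption about how such a model could be fitted rather than a description of the project's code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_students = 200

# Synthetic stand-ins for the four behavior features (not MOOCCube data).
X = rng.uniform(0.0, 1.0, size=(n_students, 4))
true_coef = np.array([0.5, 2.0, 1.0, 1.5])  # made-up coefficients
y = 50.0 + X @ true_coef + rng.normal(0.0, 0.01, size=n_students)

# Hold out the last student as test data.
X_train, y_train = X[:-1], y[:-1]
X_test, y_test = X[-1:], y[-1:]

# Add an intercept column and solve the least-squares problem.
A = np.column_stack([np.ones(len(X_train)), X_train])
beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
intercept, coef = beta[0], beta[1:]

# R^2 on the training data: 1 - (residual sum of squares / total sum of squares).
pred_train = A @ beta
ss_res = np.sum((y_train - pred_train) ** 2)
ss_tot = np.sum((y_train - y_train.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

# Prediction for the held-out student.
pred_test = intercept + X_test @ coef
print("coefficients:", coef)
print("R^2:", r2)
print("held-out prediction:", pred_test[0], "actual:", y_test[0])
```

Each fitted coefficient is read the same way as the list above: the expected change in grade for a one-unit increase in that feature while the other three are held fixed.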

Let's assume that there is a student who

watched a video lecture 5.8 times on average

completed 61.67% of an online lecture on average

registered for 4 courses, typically at around 3 p.m.

The multiple regression model predicts a grade of 64.5 for this student. This student is the last record in master_data.csv and was excluded from the training data. The student's generated grade is 67.97. The accuracy (R² score) of the multiple regression model is 99.9%, and the percent error for this student is 5.11%.

Fuzzy Model

Model Architecture

The fuzzy model in this project takes only two online learning behaviors at a time; therefore, two fuzzy models have been developed. Fuzzy model 1 relates to online learning behaviors regarding watching lectures: average video watching count and average video completion rate. On the other hand, Fuzzy model 2 accounts for course registration habits: course count and average enrollment time. Diagram 2 describes Fuzzy model 1's architecture as a reference.

Diagram 2
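To make the two-input structure concrete, here is a small fuzzy inference sketch for a model shaped like Fuzzy model 1. It is an assumption-heavy illustration: it uses a zero-order Sugeno-style weighted average instead of whatever inference and defuzzification method the project actually uses, and the membership ranges, the two levels per input, and the grade attached to each rule are hypothetical values, not the rules from the project's rule tables.

```python
import numpy as np

def low_high(value, lo, hi):
    """Return (low, high) membership degrees for two shoulder-shaped fuzzy sets."""
    high = float(np.clip((value - lo) / (hi - lo), 0.0, 1.0))
    return 1.0 - high, high

# Hypothetical rule base: (watching level, completion level) -> crisp grade.
RULES = {
    ("low", "low"): 55.0,
    ("low", "high"): 70.0,
    ("high", "low"): 65.0,
    ("high", "high"): 90.0,
}

def fuzzy_grade(watch_count, completion_pct):
    """Weighted average of rule outputs, weighted by each rule's firing strength."""
    w_low, w_high = low_high(watch_count, 0.0, 10.0)       # assumed input range
    c_low, c_high = low_high(completion_pct, 0.0, 100.0)   # assumed input range
    watch = {"low": w_low, "high": w_high}
    completion = {"low": c_low, "high": c_high}

    numerator = 0.0
    denominator = 0.0
    for (w_level, c_level), grade in RULES.items():
        # AND of the two conditions is modeled with the minimum.
        strength = min(watch[w_level], completion[c_level])
        numerator += strength * grade
        denominator += strength
    return numerator / denominator

print(fuzzy_grade(5.8, 61.67))
```

Because the low and high memberships of each input sum to 1, at least one rule always fires with strength 0.5 or more, so the division is safe. A second model of the same shape would take course count and average enrollment time as its two inputs.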

K-means Clustering

K-means clustering is used to determine how many levels the fuzzy system should use to classify students' performance. Based on the Elbow Method, this project classifies the students' grades into four levels. Chart 1 shows the clustering results for Fuzzy model 1, and Chart 2 shows those for Fuzzy model 2. Both have four clusters.

Chart 1

Chart 2
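The Elbow Method mentioned above can be sketched as follows: run k-means for several values of k, record the within-cluster sum of squares (inertia), and look for the k after which the decrease flattens out. The code is a plain NumPy implementation of Lloyd's algorithm on synthetic two-dimensional blobs; the data, the function name `kmeans_inertia`, and the number of restarts are assumptions for illustration, and a library implementation could be used instead.

```python
import numpy as np

def kmeans_inertia(points, k, n_iter=50, seed=0):
    """Run Lloyd's k-means and return the within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    # Initialize the centers with k distinct data points.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center, shape (n, k).
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

# Synthetic data: four blobs standing in for two normalized behavior features.
rng = np.random.default_rng(42)
blob_centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [5.0, 5.0]])
points = np.vstack([c + rng.normal(0.0, 0.5, size=(50, 2)) for c in blob_centers])

# For each k, keep the best of a few random restarts.
ks = range(1, 9)
inertias = [min(kmeans_inertia(points, k, seed=s) for s in range(5)) for k in ks]
for k, inertia in zip(ks, inertias):
    print(k, round(float(inertia), 1))
```

Plotting `inertias` against `ks` gives the elbow curve; the number of clusters is read off at the bend.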

Fuzzy Rules

Table 2

Table 3

The percent error is only 3.66%

Let's predict the grade of the same student who was used to test the multiple regression model.

According to Fuzzy model 1, this student's lecture-watching habits result in average performance (grade = 57.34), as shown in Chart 3. Fuzzy model 2, which addresses this student's course registration habits, also results in average performance (grade = 73.62), as shown in Chart 4. The average of these two grades is 65.48. Again, this student's generated grade is 67.97. Therefore, the percent error for the fuzzy model is 3.66%, which is lower than that of the multiple regression model.

Chart 3

Chart 4
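The comparison between the two models comes down to a short piece of arithmetic, reproduced here from the numbers reported in the text. The helper name `percent_error` is introduced only for this sketch.

```python
def percent_error(predicted, actual):
    """Absolute error as a percentage of the actual value."""
    return abs(predicted - actual) / abs(actual) * 100.0

actual_grade = 67.97     # generated grade of the held-out student
regression_pred = 64.5   # multiple regression prediction

# The fuzzy prediction is the mean of the two fuzzy model outputs.
fuzzy_pred = (57.34 + 73.62) / 2

regression_err = percent_error(regression_pred, actual_grade)
fuzzy_err = percent_error(fuzzy_pred, actual_grade)

print(f"fuzzy prediction: {fuzzy_pred:.2f}")
print(f"regression percent error: {regression_err:.2f}%")
print(f"fuzzy percent error: {fuzzy_err:.2f}%")
```

Note that both figures are computed for a single held-out student, so they compare the models on one example rather than on a full test set.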

Limitation

Randomly Generated Grade

MOOCCube does not provide students' grades. Therefore, this project generated the grade data from the normalized online learning behavior data. Because the grades are derived from the same features used as model inputs, this could have affected the prediction models' performance during training.

High Variability of Online Learning Behaviors

In addition to the artificially generated grades, another limitation of this project is that it neglects unpredictable variables that might affect students’ online learning behaviors. Unlike in face-to-face courses, students’ learning behavior in online courses cannot be supervised. For example, even if a student has a perfect video completion rate, there is no guarantee that the student has fully digested the course contents, because he or she may not have concentrated fully on the video lectures. It is impractical for educational institutions to control students' behavior directly. This project assumes that statistical data on students' behavior perfectly reflects their actual learning attitudes. Since this assumption is far from reality, it is expected to introduce errors into the results of the data analysis.

SUMMARY