Difference between revisions of "ANLY482 AY2016-17 T2 Group15 Analysis & Findings"

From Analytics Practicum

Revision as of 04:23, 24 February 2017



Data Source

All of the data we obtained was provided by Edufy Secondary School. In total, we received data covering three batches of students, from 2014 to 2016. Each batch's data covers the four years of secondary school that the students went through. To be clear, the data consists of the following:

Batch of 2014      | Batch of 2015      | Batch of 2016
Secondary 1 (2011) | Secondary 1 (2012) | Secondary 1 (2013)
Secondary 2 (2012) | Secondary 2 (2013) | Secondary 2 (2014)
Secondary 3 (2013) | Secondary 3 (2014) | Secondary 3 (2015)
Secondary 4 (2014) | Secondary 4 (2015) | Secondary 4 (2016)

For each year, we are also given a breakdown of the examinations that each student takes in that year:

  • Secondary 1: CA1, SA1, CA2, SA2, Overall (5 sets of data)
  • Secondary 2: CA1, SA1, CA2, SA2, Overall (5 sets of data)
  • Secondary 3: CA1, SA1, CA2, SA2, Overall (5 sets of data)
  • Secondary 4: CA1 or CA2, SA1, SA2 (also known as Prelims), Overall (4 sets of data)


'Overall' refers to the overall score a student receives for the entire academic year. It combines two components: a CA1 & SA1 score (37.5% CA1, 62.5% SA1), which makes up 40% of the total, and a CA2 & SA2 score (25% CA2, 75% SA2), which makes up the remaining 60%.
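The weighting above can be sketched in a few lines of Python. This is only an illustration, not the school's actual calculation: it assumes all four marks are on the same percentage scale, the function name and the example marks are hypothetical, and it covers only the years in which all four examinations are taken (Secondary 4 has CA1 or CA2 only, and the source does not say how its 'Overall' is adjusted).

```python
def overall_score(ca1, sa1, ca2, sa2):
    """Weighted overall score for one academic year (marks assumed to be percentages)."""
    # First component: 37.5% CA1 and 62.5% SA1.
    first_half = 0.375 * ca1 + 0.625 * sa1
    # Second component: 25% CA2 and 75% SA2.
    second_half = 0.25 * ca2 + 0.75 * sa2
    # The first component is 40% of the total, the second the remaining 60%.
    return 0.40 * first_half + 0.60 * second_half


# Hypothetical marks, for illustration only.
print(round(overall_score(ca1=60, sa1=70, ca2=80, sa2=50), 2))
```

Multiplying out the weights shows that CA1, SA1, CA2 and SA2 contribute 15%, 25%, 15% and 45% of the overall score respectively, so SA2 carries the most weight.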


Edufy sample data.png


The image above shows a small sample of the data we received from our sponsor: the first few columns of the Batch of 2016 CA1 file. This file mainly contains the Secondary 1, Secondary 2, Secondary 3 and Secondary 4 CA1 results for the Batch of 2016.


Each student's name is replaced with a code. For example, in the image shown, the first student is a Secondary 4 student from class S4-1 with index number 1. This protects the identity of the students we are analyzing. Besides the main academic results, the data also includes columns such as the student's second language, the results of PSLE and 'O' Levels (our main objective), the student's gender, and the student's class in Secondary 1 and Secondary 2 (for inter-class analysis). Each file has approximately 800 columns; the exact number varies with the subjects offered in the particular year.


After asking our sponsor for more data, we also obtained the students' CCA data, but only for their graduating year. Here is a sample of the CCA data for the 'Batch of 2016':


Edufy sample data cca.png


As you can see, we are given the name of each student's CCA, along with the number of points and the corresponding grade that the student received at the end of their four years of secondary school. We are not given CCA records for the end of each academic year.


Data Preparation

For all of our data preparation and analysis, we use the following software:

Edufy excel.png Edufy jmppro.png


Removed unnecessary columns from the data

Before: Edufy before remove columns.png
After: Edufy after remove columns.png


Some columns are unnecessary: they only add to the size of the data and make it more confusing. One example is the letter grade for each subject, which can be derived from the numerical score, so we did not keep the grade columns. The name of the subject teacher is also unnecessary for our analysis, and removing it protects the teacher's privacy.

Another reason to remove a column is that the subject was not offered at all in that academic year. One sign of this is that the subject's column is entirely empty. After clarifying with our sponsor which subjects were not offered in each academic year, we could safely remove those columns.
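The two removal rules above can be expressed programmatically. The cleaning for this project was done in Excel and JMP Pro 13, so the Python below is only a minimal sketch of the same idea, not the procedure actually used. The column names, student codes and values are hypothetical, and an all-blank column is treated here as removable without the confirmation step with the sponsor.

```python
def is_blank(value):
    """True for None or a string containing only whitespace."""
    return value is None or str(value).strip() == ""


def drop_columns(rows, unwanted=()):
    """Drop the named columns plus any column that is blank in every row."""
    if not rows:
        return []
    columns = list(rows[0].keys())
    # Columns with no data at all suggest a subject that was not offered.
    all_blank = {c for c in columns if all(is_blank(r.get(c)) for r in rows)}
    keep = [c for c in columns if c not in unwanted and c not in all_blank]
    return [{c: r.get(c) for c in keep} for r in rows]


# Hypothetical column names and values, for illustration only.
rows = [
    {"ID": "S4-1-01", "EL Score": "68", "EL Grade": "B3", "EL Teacher": "T01", "Music Score": ""},
    {"ID": "S4-1-02", "EL Score": "74", "EL Grade": "A2", "EL Teacher": "T01", "Music Score": ""},
]
cleaned = drop_columns(rows, unwanted=("EL Grade", "EL Teacher"))
print(list(cleaned[0].keys()))
```

In practice the all-blank check is best used only to flag candidate columns, which are then confirmed with the data owner before deletion.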


Reorganized and restructured the data

Before: Edufy before reorganize.png
After: Edufy after reorganize.png


The original column format of the data is not suited to analysis software: both the column names and the structure need to change. If we were to load the raw data into JMP Pro 13, the columns would simply appear as 'Column 65', 'Column 66', 'Column 67' and so on. After reorganizing and restructuring, the data is clearer and the file can be passed into JMP Pro 13 for analysis.


Removed rows with missing data

Before: Edufy before remove rows.png
After: Edufy after remove rows.png


As we require the GCE 'O' Levels L1R4 and L1R5 scores for our analysis, any row without these fields was removed. In addition, the data included a few students who were retained and did not take their GCE 'O' Levels in the same year as their cohort, which resulted in missing data. To prevent these rows from skewing the results, we removed them, as we cannot make use of them.
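As a rough illustration of this filtering step (the actual work was done in Excel and JMP Pro 13), the sketch below keeps only rows in which both L1R4 and L1R5 are present. The student codes and values are hypothetical.

```python
def has_value(value):
    """True when a cell holds something other than None or whitespace."""
    return value is not None and str(value).strip() != ""


def drop_incomplete(rows, required=("L1R4", "L1R5")):
    """Keep only rows in which every required field is filled in."""
    return [row for row in rows if all(has_value(row.get(col)) for col in required)]


# Hypothetical records, for illustration only.
rows = [
    {"ID": "A", "L1R4": "12", "L1R5": "15"},
    {"ID": "B", "L1R4": "", "L1R5": ""},      # e.g. a retained student with no 'O' Levels result
    {"ID": "C", "L1R4": "10", "L1R5": None},  # one of the two required scores is missing
]
complete = drop_incomplete(rows)
print([row["ID"] for row in complete])
```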


Replaced hyphens with blanks

Before: Edufy before replace hyphens.png
After: Edufy after replace hyphens.png


In JMP Pro 13, a column containing hyphens is treated as a nominal variable even when the column is numerical (e.g. subject scores). To make these columns appear as numerical variables so that we can use them in plots, we replaced the hyphens with blanks.
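The same problem appears in most tools: a single non-numeric placeholder forces the whole column to be read as text. The Python sketch below shows the equivalent conversion, under the assumption that a lone hyphen or an empty cell means "no score". It is an illustration only, not the JMP procedure used here, and the sample values are hypothetical.

```python
def clean_score(value):
    """Convert a raw cell to a float, or None when it is a hyphen or blank placeholder."""
    if value is None:
        return None
    text = str(value).strip()
    if text in ("", "-"):
        return None
    return float(text)


# Hypothetical raw column as it might be read from a spreadsheet.
raw_column = ["72.5", "-", " 60 ", ""]
numeric_column = [clean_score(cell) for cell in raw_column]
print(numeric_column)
```

Mapping placeholders to a missing value rather than to zero matters: a zero would be counted as a real score and would pull averages down.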


Exploratory Data Analysis

For our Exploratory Data Analysis (EDA), we produced some general descriptive statistics to better understand the data before analyzing it in depth. Here are some of them:

Composition of Students

Edufy eda table.png
2014: Edufy eda composition 2014.png
2015: Edufy eda composition 2015.png
2016: Edufy eda composition 2016.png


This table and the three graphs help us understand the composition of each class in each batch. Knowing which subject combinations the students in each class are taking tells us roughly what scores to expect when we look at the later descriptives and analysis.

'O' Levels Performance by Subject Combination

Moving on, we looked at 'O' Levels performance (L1R4 & L1R5) by subject combination for all three batches. The trend is generally the same across batches: students in 'Triple Science' perform best, followed by students in 'Double Science', then '1 Pure 1 Combined', then 'Combined Science'.


2014: Edufy eda olevel subjectcombi 2014r4.png
2014: Edufy eda olevel subjectcombi 2014r5.png
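A comparison of this kind amounts to grouping students by subject combination and summarizing their scores. The sketch below computes a per-group mean in Python. The charts above were produced in JMP Pro 13, and the records, field names and numbers here are entirely hypothetical and are not taken from the school's data.

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(rows, group_key, value_key):
    """Return the mean of value_key for each distinct value of group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {group: mean(values) for group, values in groups.items()}


# Hypothetical records, for illustration only (not real results).
rows = [
    {"Combination": "Triple Science", "L1R4": 8},
    {"Combination": "Triple Science", "L1R4": 10},
    {"Combination": "Combined Science", "L1R4": 14},
    {"Combination": "Combined Science", "L1R4": 18},
]
print(mean_by_group(rows, "Combination", "L1R4"))
```

A mean is only one possible summary; a distribution view such as a box plot per group shows the spread as well as the centre.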

Prelims & 'O' Levels Performance by Class

We also attempted to compare the Prelims and 'O' Levels performance by class to see if there is any deviation in trend. However, the general trend remains that the class with students taking the 'Triple Science' subject combination tends to do better than students in other classes taking other subject combinations.


2014: Edufy eda prelims olevel 2014r4.png
2014: Edufy eda prelims olevel 2014r4.png


Time-series Analysis

To further analyze the performance of students, we selected a few students from each batch with similar PSLE scores and similar overall Secondary 2 scores. We drew overlay plots in JMP Pro 13 of these students' Secondary 1 to Secondary 4 scores. We wanted to see whether these students, who ended up with different subject combinations, performed above or below what would be expected of them.
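One way to make "similar" concrete is to pick target scores and keep students within a tolerance of both. The sketch below does this in Python. The tolerances, field names and records are hypothetical assumptions for illustration; the text does not state what criteria were actually used for the selection.

```python
def similar_students(rows, psle_target, sec2_target, psle_tol=3, sec2_tol=2.0):
    """Return students whose PSLE and Secondary 2 overall scores are both near the targets."""
    return [
        row for row in rows
        if abs(row["PSLE"] - psle_target) <= psle_tol
        and abs(row["Sec2 Overall"] - sec2_target) <= sec2_tol
    ]


# Hypothetical students, for illustration only.
rows = [
    {"ID": "A", "PSLE": 230, "Sec2 Overall": 65.2, "Combination": "Triple Science"},
    {"ID": "B", "PSLE": 232, "Sec2 Overall": 64.1, "Combination": "Double Science"},
    {"ID": "C", "PSLE": 245, "Sec2 Overall": 66.0, "Combination": "Triple Science"},
    {"ID": "D", "PSLE": 229, "Sec2 Overall": 58.0, "Combination": "Combined Science"},
]
matched = similar_students(rows, psle_target=230, sec2_target=65.0)
print([row["ID"] for row in matched])
```

Students who pass both filters but have different subject combinations are the ones worth comparing in an overlay plot.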

This example is from the 'Batch of 2014': We chose the following students as shown in this table.


Edufy timeseries table.png


Here are the overlay plots of their results from Secondary 1 to Secondary 4, ordered top-left, top-right, bottom-left, bottom-right. 'Exam' on the x-axis refers to the CA1, SA1, CA2, SA2 and Overall scores described in the Data Source section.


Edufy timeseries sec1.png Edufy timeseries sec2.png
Edufy timeseries sec3.png Edufy timeseries sec4.png
Edufy timeseries legend.png


We observe that the 'Double Science' student here performed better than the 'Triple Science' student, while the 'Triple Science' student performed as well as the student taking the '1 Pure 1 Combined' science subject combination. We conclude that students with similar PSLE scores and Secondary 2 overall scores who take different subject combinations can end up with very different scores.