T15 Final Delivery

Dataset

Data Retrieval

The data used in this project is the questionnaire results from the latest PISA survey, conducted in 2012. All raw data files are publicly available on the PISA website (https://pisa2012.acer.edu.au/downloads.php). However, the raw data is stored as fixed-width text files, in which a fixed number of characters represents each value (e.g. the first 3 characters hold the country code), as follows:

3 1.1.png
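
To illustrate what a fixed-width layout means in practice, the following is a minimal Python sketch of slicing one record by character position. It is not the conversion method used in this project. Only the 3-character country code at the start of the line comes from the description above; the `school_id` field, its offsets and the sample record are hypothetical.

```python
from typing import Dict, List, Tuple

# Each entry is (field name, start offset, end offset), 0-based, end exclusive.
# ASSUMPTION: only "country_code" (first 3 characters) is taken from the text;
# "school_id" and its position are invented for illustration.
LAYOUT: List[Tuple[str, int, int]] = [
    ("country_code", 0, 3),
    ("school_id", 3, 10),
]


def parse_fixed_width(line: str, layout: List[Tuple[str, int, int]]) -> Dict[str, str]:
    """Cut one fixed-width record into named fields by character position."""
    return {name: line[start:end].strip() for name, start, end in layout}


# Made-up record, not real PISA data.
sample = "SGP0000001"
record = parse_fixed_width(sample, LAYOUT)
print(record["country_code"])  # prints "SGP"
```

A real PISA file has far more fields per line, which is why the official conversion scripts are the practical route.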

Raw data in this form is not ready for cleaning and analysis. The PISA database provides SAS scripts that convert the raw text data into tables. The conversion steps are:

  • Download the raw questionnaire results (zipped text files) from the PISA 2012 website and extract them
  • Retrieve the SAS programs for the appropriate data files
  • Open the SAS scripts in SAS Enterprise Guide
  • Ensure that the paths to the raw text files are correct

3.1.2.png

  • Run the programs in SAS Enterprise Guide to produce the output SAS data tables
  • Export the output SAS data tables in the desired format (.sas7bdat, .csv and so on), displaying labels as column names for easier interpretation later on

Data Extraction

Only Singapore data falls within the scope of this project, so only the records with country code ‘SGP’ are extracted. This yields the following for analysis:

  • School survey results - 172 secondary schools
  • Student survey results - 5,546 records, approx. 35 students per school
  • Student test scores in Math, Science, Reading and the computer-based assessment
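
The extraction step can be sketched in Python as a simple filter on the country-code column of an exported .csv table. This is an illustrative sketch, not the procedure used in this project. The column names `CNT` and `SCHOOLID` and the sample rows are assumptions; if labels were exported as column names, the actual header will differ and should be passed in as `country_column`.

```python
import csv
import io
from typing import Dict, Iterable, List


def filter_by_country(
    rows: Iterable[Dict[str, str]], country_column: str, code: str = "SGP"
) -> List[Dict[str, str]]:
    """Keep only the rows whose country-code column equals `code`."""
    return [
        row for row in rows
        if (row.get(country_column) or "").strip() == code
    ]


# Made-up sample standing in for an exported table; not real PISA data.
# ASSUMPTION: "CNT" and "SCHOOLID" are hypothetical column names.
sample_csv = io.StringIO(
    "CNT,SCHOOLID\n"
    "SGP,0000001\n"
    "AUS,0000002\n"
    "SGP,0000003\n"
)
rows = list(csv.DictReader(sample_csv))
singapore_rows = filter_by_country(rows, country_column="CNT")
print(len(singapore_rows))  # prints 2
```

The same filter would be applied to each of the exported tables, replacing the in-memory sample with a file opened via `open(path, newline="")`.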

Each of the three data tables above has a rich set of attributes. The summary below gives an overview of the aspects that the data covers.
3.2.1.png

Data Preparation

Methodology

Framework of analysis

Techniques of analysis and variable selection

K-means clustering

Partition analysis for school profiling

Constructing regression model

School-level Findings

Exploratory Data Analysis

Profiling high performance and low performance schools

Multiple regression model

Student-level Findings

Exploratory Data Analysis

Multiple regression model

Discussion

General recommendation

Gaps identified in data for future research efforts

Conclusion