Difference between revisions of "T15 Overview"

From Analytics Practicum
  
 
== Methodology ==
 
=== Technology ===
<p>As KTPH prefers a versatile tool that the Health Population team can use without complex setup or installation, D3.js was used to develop a web application hosted on an Apache server. D3.js is a JavaScript library for building visualizations on the web. It renders visualizations as SVG elements, which gives developers fine-grained control over each visual element. SVG graphics are also scalable, so the visualizations display well on mobile devices. Because D3.js is a JavaScript library, it runs in all modern browsers without users having to install additional software.</p>
<p>In addition, we propose to explore dc.js, a charting library built on top of D3.js. dc.js supports efficient cross-filtering across multiple linked charts and is designed for fast filtering of large datasets, because it delegates filtering to the crossfilter library. This addition will strengthen the storytelling capability of the current dashboard and let users formulate their own queries during data discovery.</p>
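The core idea of cross-filtering can be illustrated without dc.js itself. The Python sketch below only imitates the concept and is not the dc.js or crossfilter API. Each chart groups records on one field. When a user brushes one chart, the other charts recount using that filter, while the brushed chart keeps its own counts so the unselected bars remain visible. The records and field names (`region`, `age_band`, `bmi_status`) are hypothetical.

```python
from collections import Counter

# Hypothetical screening records; each dashboard chart groups on one field.
records = [
    {"region": "North", "age_band": "40-59", "bmi_status": "Overweight"},
    {"region": "North", "age_band": "60+", "bmi_status": "Normal"},
    {"region": "East", "age_band": "40-59", "bmi_status": "Overweight"},
    {"region": "East", "age_band": "60+", "bmi_status": "Overweight"},
]


def group_counts(rows, field, filters):
    """Count rows per value of `field`, applying filters set on every *other* field."""
    # A chart ignores its own filter, so the brushed chart still shows all its bars.
    active = {f: allowed for f, allowed in filters.items() if f != field}
    kept = [r for r in rows if all(r[f] in allowed for f, allowed in active.items())]
    return dict(Counter(r[field] for r in kept))


# Brushing "North" on the region chart updates the BMI chart but not the region chart.
filters = {"region": {"North"}}
print(group_counts(records, "bmi_status", filters))  # {'Overweight': 1, 'Normal': 1}
print(group_counts(records, "region", filters))      # {'North': 2, 'East': 2}
```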
  
=== Visualization ===
==== Treemap ====
<p>A treemap is a powerful tool that simultaneously shows the big picture, compares related items and allows navigation to the details. One important aspect of healthcare visual analytics is the ability to drill down to details for further investigation. Using a treemap to show the health indicators, as in the example below, gives users a bird’s-eye view, so that they can observe patterns among the indicators before drilling down to study the details. This technique will be used in the Screening Result module.</p>
<br>
<center> [[file: treemap.png|500px]] </center>
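A treemap layout expects hierarchical data in which each parent rectangle's size is the sum of its children's sizes. The minimal Python sketch below shows how counts can be rolled up the hierarchy before they are passed to a treemap layout. It assumes hypothetical screening records grouped by region and then by indicator. The region names, indicator names and counts are invented for illustration.

```python
from collections import defaultdict

# Hypothetical records: (region, health indicator, number of abnormal results).
records = [
    ("North", "BMI", 120),
    ("North", "Cholesterol", 80),
    ("East", "BMI", 95),
    ("East", "Blood pressure", 60),
    ("North", "Blood pressure", 40),
]


def build_tree(rows):
    """Roll leaf counts into {region: {indicator: count}} plus per-region totals."""
    tree = defaultdict(dict)
    for region, indicator, count in rows:
        tree[region][indicator] = tree[region].get(indicator, 0) + count
    # Each region's total sizes its rectangle at the top level; its children
    # subdivide that rectangle when the user drills down.
    totals = {region: sum(children.values()) for region, children in tree.items()}
    return dict(tree), totals


tree, totals = build_tree(records)
print(totals)         # {'North': 240, 'East': 155}
print(tree["North"])  # {'BMI': 120, 'Cholesterol': 80, 'Blood pressure': 40}
```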
 
  
==== Parallel Coordinates ====
<p>This technique can be used to analyze multiple clinical variables. Each axis represents one numerical clinical variable (e.g. BMI, cholesterol level, systolic and diastolic blood pressure). Users can scan the lines and quickly spot a sample whose line falls outside the normal range. A separate line representing the national average could be used as a benchmark; alternatively, expert-defined healthy levels for each indicator could be used. This technique should be used at the intermediate level of drill-down, so that the number of lines does not grow large enough to clutter the chart.</p>
<br>
 
<center>[[file: parallel_coordinates.png|500px]]</center>
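On a parallel coordinates chart, "spotting a line outside the normal range" means checking each axis value against a benchmark band. The Python sketch below makes that check explicit. The reference ranges are hypothetical placeholders for illustration only. They are not clinical guidance, and they are not the expert-defined levels the team would actually use.

```python
# Hypothetical reference bands per axis (illustrative only, not clinical guidance).
REFERENCE_RANGES = {
    "bmi": (18.5, 23.0),
    "cholesterol": (0.0, 5.2),
    "systolic": (90, 130),
    "diastolic": (60, 85),
}

# Hypothetical samples; each dict becomes one polyline across the axes.
patients = [
    {"id": "P1", "bmi": 22.0, "cholesterol": 4.8, "systolic": 118, "diastolic": 76},
    {"id": "P2", "bmi": 27.5, "cholesterol": 6.1, "systolic": 142, "diastolic": 91},
]


def out_of_range(patient, ranges=REFERENCE_RANGES):
    """Return the axes on which a patient's line leaves the reference band."""
    flagged = []
    for variable, (low, high) in ranges.items():
        if not low <= patient[variable] <= high:
            flagged.append(variable)
    return flagged


for p in patients:
    print(p["id"], out_of_range(p))
# P1 []
# P2 ['bmi', 'cholesterol', 'systolic', 'diastolic']
```

In a chart, the flagged lines could be drawn in a highlight colour, while the others stay muted.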
 
 
 
==== Chord Visualization ====
 
<p>This chart is used to study the association between clinical variables. Clinical variables often have some relationship with one another; for example, a patient whose BMI is in the overweight range is more likely to have a high cholesterol level and a higher risk of diabetes. Chord visualization supports data exploration that reveals such patterns, and can potentially help identify individuals at risk of diseases such as diabetes based on their other health indicators.</p>
 
<br>
 
<center>[[file: chord_visualisation.png|300px]]</center>
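A chord diagram is drawn from a square matrix in which cell [i][j] holds the strength of the link between items i and j. D3's chord layout, for example, takes such a matrix as input. The Python sketch below builds a co-occurrence matrix from per-patient risk flags. The indicator names and flags are hypothetical.

```python
from itertools import combinations

# Hypothetical per-patient flags (True = indicator outside the healthy range).
INDICATORS = ["overweight", "high_cholesterol", "diabetes"]
patients = [
    {"overweight": True, "high_cholesterol": True, "diabetes": False},
    {"overweight": True, "high_cholesterol": True, "diabetes": True},
    {"overweight": False, "high_cholesterol": False, "diabetes": True},
    {"overweight": True, "high_cholesterol": False, "diabetes": False},
]


def co_occurrence_matrix(rows, names):
    """Square matrix whose cell [i][j] counts patients flagged on both names[i] and names[j]."""
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for row in rows:
        flagged = [name for name in names if row[name]]
        for a, b in combinations(flagged, 2):
            # Links are undirected, so fill both halves of the matrix.
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return matrix


print(co_occurrence_matrix(patients, INDICATORS))
# [[0, 2, 1], [2, 0, 1], [1, 1, 0]]
```

A thick chord between "overweight" and "high_cholesterol" would surface the kind of association described above.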
 
 
 
==== Funnel Plot ====
 
<p>A funnel plot is essentially a scatter plot with two sets of boundary lines: one set for 95% confidence and one for 99.8% confidence. The boundaries narrow as the population behind each point grows, because larger groups vary less by chance; this gives the plot its funnel shape. Points that lie outside the boundaries are highlighted as non-random variation that would be extremely rare by chance and should be examined more closely, whereas points inside the boundaries reflect random variation that happens by chance. In our case, the data points will represent households, the x-axis is the population above a certain age, and the y-axis is the percentage of that population that responds to the alignment program. This chart can therefore show the penetration rate of KTPH health initiatives to improve public health.</p>
 
<br>
 
<center>[[file:funnel_plot.png|500px]]</center>
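The boundary lines of a funnel plot for a proportion are commonly computed with a normal approximation around the pooled rate p̄:

p̄ ± z · sqrt(p̄(1 − p̄) / n)

Here n is the population behind each point, and z is about 1.96 for 95% and about 3.09 for 99.8%. The Python sketch below assumes this approximation. The data points and their counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def funnel_limits(p_bar, n, coverage):
    """Two-sided limits for a proportion observed on denominator n (normal approximation)."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)  # ~1.96 for 95%, ~3.09 for 99.8%
    half_width = z * sqrt(p_bar * (1 - p_bar) / n)
    return max(0.0, p_bar - half_width), min(1.0, p_bar + half_width)


def classify(rate, p_bar, n):
    """Label a point by the outermost funnel boundary it falls outside."""
    lo, hi = funnel_limits(p_bar, n, 0.998)
    if not lo <= rate <= hi:
        return "outside 99.8%"
    lo, hi = funnel_limits(p_bar, n, 0.95)
    if not lo <= rate <= hi:
        return "outside 95%"
    return "within limits"


# Hypothetical points: name -> (population above the age cut-off n, number who responded).
units = {"A": (500, 125), "B": (60, 36), "C": (800, 200)}
p_bar = sum(r for _, r in units.values()) / sum(n for n, _ in units.values())

results = {name: classify(r / n, p_bar, n) for name, (n, r) in units.items()}
print(results)  # {'A': 'within limits', 'B': 'outside 99.8%', 'C': 'within limits'}
```

Point B's 60% response rate on a small population falls outside even the 99.8% limits. Points A and C are consistent with chance variation around the pooled rate.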
 
 
 
==== Geospatial Intelligence ====
 
<p>The current version does not show the percentage of households participating in KTPH health initiatives; instead, it shows the absolute number of households reached. We will modify the current OpenStreetMap view of the module to show the percentage of households reached (the penetration rate) by region.</p>
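The difference between absolute counts and penetration rate can be made concrete with a small calculation. The Python sketch below uses hypothetical region names and household counts. It shows how a region that reaches more households can still have the lower penetration rate once regional size is taken into account.

```python
# Hypothetical per-region counts: households reached and total households.
regions = {
    "Region A": {"reached": 1200, "total": 30000},
    "Region B": {"reached": 900, "total": 15000},
}


def penetration_rates(counts):
    """Share of households in each region reached by the initiatives (0 to 1)."""
    return {
        region: c["reached"] / c["total"] if c["total"] else 0.0
        for region, c in counts.items()
    }


print(penetration_rates(regions))  # {'Region A': 0.04, 'Region B': 0.06}
```

Region A reaches more households in absolute terms, but Region B has the higher penetration rate. The map's colour scale would then be driven by these rates rather than by raw counts.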
 
  
 
== Scope of Work==
 

Revision as of 22:49, 28 February 2016



Project Introduction

Introduction of PISA

The Programme for International Student Assessment (PISA) is an international survey that aims to evaluate education systems worldwide by testing the skills and knowledge of 15-year-old students. To date, students representing more than 70 economies have participated in the assessment. The most recently published results are from the 2012 assessment.

Around 510,000 students in 65 economies took part in the PISA 2012 assessment of reading, mathematics and science, representing about 28 million 15-year-olds globally. Because PISA is an ongoing triennial survey, countries and economies that participate in successive surveys can compare their students' performance over time and assess the impact of education policy decisions.

Since 2000, every three years, 15-year-old students from randomly selected schools worldwide have taken tests in the key subjects of reading, mathematics and science, with a focus on one subject in each assessment year. Each student takes a test that lasts two hours. The tests are a mixture of open-ended and multiple-choice questions, organized in groups based on a passage describing a real-life situation. In total, about 390 minutes of test items are covered, so different students take different combinations of test items. Students and their school principals also answer questionnaires that provide information about the students' backgrounds, schools and learning experiences, and about the broader school system and learning environment.

Project Introduction

Our project uses the PISA data for Singapore collected in the latest survey, conducted in 2012. The aim of the project is to explore the relationship between computer use in school and secondary-school student performance in reading and mathematics. Building on PISA's existing international work, our project brings the analysis to the Singapore national level and studies various aspects of student performance relative to students' access to computers in and outside of school, in order to provide insights for education policy makers at the Singapore Ministry of Education (MOE).

Business Problem

The Ministry of Education (MOE) of Singapore collects and analyses data from schools island-wide to continually improve education policies and practices. However, most of these data are not publicly available for research and analysis by those outside the Ministry. Hence, the sponsor seeks to gain insights about education in Singapore from the publicly available data collected by the OECD through its “Programme for International Student Assessment” (PISA) survey. PISA is a triennial international survey that aims to evaluate education systems worldwide by testing the skills and knowledge of 15-year-old students. The most recently published results are from the 2012 assessment.

Project Objectives

Business Objectives

The project aims to investigate the effect of various personal and environmental factors, such as family background and academic environment, on the cognitive abilities of secondary school students. The key findings can serve as insights for policy makers and educators to refine current practices and improve the learning environment and process.

Analytical Problems

To achieve the above business objective, the following questions need to be addressed:

  • In terms of school resource allocation:
  1. How are resources in schools allocated to different aspects, such as teachers, computers, network, learning facilities and activities?
  2. What is the state of the student-teacher relationship, and how might it affect students’ emotional well-being at school?
  3. What are the most prominent problems that schools in Singapore face, and how are they related to student academic performance?
  4. Does the student-teacher ratio matter to student well-being and teacher morale? Overall, does it affect student performance in school?
  • In terms of student profile:
  1. Is student performance correlated with students’ socio-economic status and parents’ education level?
  2. Is student performance affected by emotional well-being, or vice versa?
  3. What lifestyle factors might contribute to students’ emotional well-being in school?

Data

Data Preparation

Text data are retrieved from the PISA 2012 database (https://pisa2012.acer.edu.au/downloads.php). The data comprise five data sets: the student, school and parent questionnaires, the cognitive items and the scored cognitive items. Only the Singapore data are of interest.

The text data are not in a format that SAS or other analytics tools can read directly. Each record is stored as one long string of characters, in which each value occupies a fixed number of character positions. To convert the data into a format that SAS Enterprise Guide can read, a SAS program reads each character string, splits it into fixed-width fields and assigns each field to the appropriate attribute.

Next, only the records for Singapore are selected and exported to a separate set of tables for further analysis.
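The fixed-width parsing and country filtering described above can be sketched in Python for readers unfamiliar with SAS column input. Several things here are assumptions. The field names are modelled on PISA 2012 variable names (CNT as the three-letter country code, SCHOOLID, STIDSTD, ST04Q01), but the column positions and sample records are invented. The real layouts come from the control files distributed with the PISA data, and the team's actual conversion is done in SAS.

```python
# Hypothetical layout: (field name, start column, end column), 1-based and inclusive,
# in the style of a SAS column-input statement. Positions are invented for illustration.
LAYOUT = [
    ("CNT", 1, 3),        # three-letter country code (assumed)
    ("SCHOOLID", 4, 10),
    ("STIDSTD", 11, 15),
    ("ST04Q01", 16, 16),
]


def parse_record(line, layout=LAYOUT):
    """Split one fixed-width record into a dict of stripped field values."""
    return {name: line[start - 1:end].strip() for name, start, end in layout}


# Hypothetical raw records, one per line of the text file.
raw_lines = [
    "SGP0000001000011",
    "SGP0000001000022",
    "AUS0000007000031",
]

records = [parse_record(line) for line in raw_lines]
singapore = [r for r in records if r["CNT"] == "SGP"]  # keep Singapore only
print(len(singapore), singapore[0])
# 2 {'CNT': 'SGP', 'SCHOOLID': '0000001', 'STIDSTD': '00001', 'ST04Q01': '1'}
```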

Tables of data to be used in the analysis

  • Student questionnaire data (stu): this table contains student demographic information, parents’ education, interest in school subjects, engagement in activities outside school, access to information and communication technology (ICT), familiarity with academic concepts and sense of belonging at school.
  • School questionnaire data (sch): this table contains information about the school’s sources of funding, size of student population, staff headcounts, availability of ICT for student and teacher use, shortage of resources (if any), co-curricular activities, parent participation, teacher morale and hindrances to learning (if any).
  • Score cognitive item data (cogs): this table records the scores of students who took the PISA tests in Reading, Mathematics and Science. Since student grades are not available within the project scope, we will use this table as the measure of student academic performance.

Integrating tables to support more queries

  • SchoolID and StudentID can be used to link the three tables above. This integration is crucial for discovering possible correlations between various aspects of the school environment and student performance, which offers valuable insights to education policy makers such as MOE and school management.
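The table integration can be sketched as two key lookups: each student row is matched to its school row by SchoolID and to its scores by student. The Python below is a minimal sketch. It assumes that StudentID may only be unique within a school, so scores are matched on the (SchoolID, StudentID) pair. The column names other than SchoolID and StudentID (ESCS, STRATIO, math_score) and all values are illustrative.

```python
# Hypothetical rows from the three Singapore tables.
stu = [
    {"SchoolID": "S1", "StudentID": "1", "ESCS": 0.4},
    {"SchoolID": "S1", "StudentID": "2", "ESCS": -0.2},
    {"SchoolID": "S2", "StudentID": "3", "ESCS": 1.1},
]
sch = [
    {"SchoolID": "S1", "STRATIO": 14.0},
    {"SchoolID": "S2", "STRATIO": 12.5},
]
cogs = [
    {"SchoolID": "S1", "StudentID": "1", "math_score": 560},
    {"SchoolID": "S2", "StudentID": "3", "math_score": 610},
]


def integrate(stu, sch, cogs):
    """Inner-join each student row to its school row and its score row."""
    schools = {row["SchoolID"]: row for row in sch}
    scores = {(row["SchoolID"], row["StudentID"]): row for row in cogs}
    merged = []
    for student in stu:
        key = (student["SchoolID"], student["StudentID"])
        # Students without a school record or without scores are dropped (inner join).
        if student["SchoolID"] in schools and key in scores:
            merged.append({**student, **schools[student["SchoolID"]], **scores[key]})
    return merged


merged = integrate(stu, sch, cogs)
print(len(merged))  # 2
```

Each merged row then carries student, school and score attributes side by side, which is the shape a correlation analysis needs.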

Preliminary Findings

  • Public secondary schools in Singapore receive different levels of funding from the Singapore government
  • The number of scoring students differs significantly across secondary schools
  • Secondary schools in Singapore do not allocate resources efficiently by PISA standards, with a mean index of -0.36797 (range: -0.80 to 9999.0). There is also a slight negative correlation between the percentage of funding provided by the government and the index of school responsibility for resource allocation.

Methodology

Frequency Analysis

  • To examine resources given to different secondary schools in Singapore
  • To understand the purposes for which school resources are used
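A frequency analysis of this kind produces a one-way table of counts and percentages per response category. The Python sketch below computes such a table. The responses are hypothetical answers to a single school-questionnaire shortage item, and the category labels are illustrative. The team's actual tables are produced in SAS Enterprise Guide.

```python
from collections import Counter

# Hypothetical principal responses to one shortage item, e.g. whether instruction
# is hindered by a lack of qualified teachers.
responses = ["Not at all", "Very little", "Not at all", "To some extent", "Not at all", "Very little"]


def frequency_table(values):
    """Return (category, count, percent) rows, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(cat, n, round(100 * n / total, 1)) for cat, n in counts.most_common()]


for row in frequency_table(responses):
    print(row)
# ('Not at all', 3, 50.0)
# ('Very little', 2, 33.3)
# ('To some extent', 1, 16.7)
```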

Correlation Analysis

  • To examine the extent to which resource availability in schools affects student performance, and to evaluate whether shortages in schools (teaching staff, facilities) are associated with low performance
  • To examine whether family background and socio-economic status are linked to student performance
  • To examine whether emotional well-being is necessarily associated with good performance
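A Pearson correlation, which measures the strength of a linear association, can be computed as follows. The Python sketch below assumes that values 9997 to 9999 are missing or not-applicable codes, as is common in PISA data files, and excludes any pair containing them. The range reaching 9999.0 in the Preliminary Findings may indicate such codes. All data values are hypothetical.

```python
from math import sqrt

# Assumption: PISA-style missing / not-applicable codes to exclude before analysis.
MISSING_CODES = {9997.0, 9998.0, 9999.0}


def pearson(xs, ys, missing=MISSING_CODES):
    """Pearson correlation over pairs in which neither value is a missing code."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x not in missing and y not in missing]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)


# Hypothetical school-level values: % government funding and a resource-allocation index.
funding = [90.0, 95.0, 80.0, 99.0, 85.0]
resp_index = [-0.5, -0.6, -0.2, 9999.0, -0.3]

r = pearson(funding, resp_index)
print(round(r, 3))  # -0.99
```

Without the exclusion step, the single 9999.0 value would dominate both the mean and the correlation.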

Tools

The main analysis tool is SAS Enterprise Guide. In particular, the project uses features such as data exploration, correlation analysis, frequency analysis, regression, sorting and filtering, Query Builder, Table Analysis and graph functions.

Scope of Work

The visualization should allow KTPH users in the Health Population team to see an overview of public health conditions based on screening results, and then drill down to the region and patient-group levels to investigate the factors that contribute to the status quo. Users can also examine possible correlations between these factors.

The components to be examined and improved are therefore:
Screening Result Module

  • Stratification & Visual Presentation of Health Screening Results

Health Classification Module

  • Health Classification
  • Risk Analysis for Disease
  • Summary of Unhealthy Screening Results

Geospatial Intelligence Module

  • Public Health Screening Penetration Rate
  • Public Health Status Ratio

Repeat Analysis Module (secondary)

  • Flow Analysis of Population Health Screening Results
  • Trend Analysis of Key Health Indicators

Patient Journey Module (secondary)

  • Individual Resident Progress View
  • Temporal Event Sequence Analysis

References

http://www.oecd.org/pisa/aboutpisa/