Difference between revisions of "T15 Overview"

From Analytics Practicum
 
(17 intermediate revisions by 3 users not shown)

Latest revision as of 11:05, 14 April 2016



Project Introduction

Introduction of PISA

The Programme for International Student Assessment (PISA) is an international survey that aims to evaluate education systems worldwide by testing the skills and knowledge of 15-year-old students. To date, students representing more than 70 economies have participated in the assessment. The most recently published results are from the 2012 assessment.

Around 510,000 students in 65 economies took part in the PISA 2012 assessment of reading, mathematics and science, representing about 28 million 15-year-olds globally. Because PISA is an ongoing triennial survey, countries and economies that participate in successive rounds can compare their students' performance over time and assess the impact of education policy decisions.

Every three years since 2000, fifteen-year-old students from randomly selected schools worldwide have taken tests in the key subjects of reading, mathematics and science, with one subject as the focus of each assessment round. Each student sits a 2-hour test made up of open-ended and multiple-choice questions, grouped around passages that describe real-life situations. The assessment covers about 390 minutes of test items in total, so students take different combinations of test items rather than the full set. Students and their school principals also answer questionnaires about the students' backgrounds, schools and learning experiences, and about the broader school system and learning environment.

Project Introduction

Our project uses PISA data for Singapore collected in the latest survey, conducted in 2012. The aim of the project is to explore the relationship between computer use in school and secondary school students' performance in reading and mathematics. Building on the international work already done by PISA, our project brings the analysis down to the Singapore national level. It studies various aspects of student performance relative to students' access to computers in and outside of school, in order to provide insights for education policy makers at the Singapore Ministry of Education (MOE).

Business Problem

The Ministry of Education (MOE) of Singapore collects and analyses data from schools island-wide to continually improve policies and practices in education. However, most of this data is not publicly available for research and analysis by those outside the Ministry. Hence, the sponsor seeks to gain insights about education in Singapore from the publicly available data collected by the OECD through its Programme for International Student Assessment (PISA) survey. PISA is a triennial international survey that aims to evaluate education systems worldwide by testing the skills and knowledge of 15-year-old students. The most recently published results are from the 2012 assessment.

Project Objectives

Business Objectives

The project aims to investigate the effect of various personal and environmental factors, such as family background and academic environment, on the cognitive abilities of secondary school students. The key findings can serve as insights for policy makers and educators to refine current practices and improve the learning environment and process.

Analytical Problems

To achieve the above business objective, the following questions need to be addressed:

  • In terms of school resource allocation:
  1. How are resources in schools allocated to different aspects, such as teachers, computers, network, learning facilities and activities?
  2. What does the student-teacher relationship look like, and how might it affect students’ emotional well-being at school?
  3. What are the most prominent problems facing schools in Singapore, and how are they related to student academic performance?
  4. Does the student-teacher ratio matter to student well-being and teacher morale? Overall, does it affect student performance in school?
  • In terms of student profile:
  1. Is student performance correlated with their socio-economic status and parents’ education level?
  2. Is student performance affected by emotional well-being or vice-versa?
  3. What lifestyle factors might contribute to students’ emotional well-being in school?

Data

Data Preparation

The text data was retrieved from the PISA2012 database (https://pisa2012.acer.edu.au/downloads.php). It comprises five data sets: the student, school and parent questionnaires, the cognitive item responses and the scored cognitive items. Of these, only the Singapore data is of interest.

The text data is not in a format that SAS or other analytics tools can read directly. Each record is stored as one long sequence of characters in which every value occupies a fixed number of characters (a fixed-width layout). To convert the data into a format that SAS Enterprise Guide can read, a SAS program was written to read each character sequence, split it into its fixed-width parts and assign each part to the appropriate attribute.
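The conversion step can be sketched as follows. This is a minimal Python illustration of fixed-width parsing, not the team's SAS program. The field names and column positions in `LAYOUT` are hypothetical placeholders. The real positions would come from the layout documentation distributed with the PISA 2012 data.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (field name, 1-based start column, width in characters).
# Placeholder positions only; the real ones are defined by the PISA 2012 layout files.
LAYOUT: List[Tuple[str, int, int]] = [
    ("CNT", 1, 3),        # country code (assumed field)
    ("SchoolID", 4, 7),
    ("StudentID", 11, 5),
    ("AGE", 16, 5),       # hypothetical numeric field
]


def parse_record(line: str, layout: List[Tuple[str, int, int]] = LAYOUT) -> Dict[str, str]:
    """Slice one fixed-width record into named, whitespace-trimmed fields."""
    record: Dict[str, str] = {}
    for name, start, width in layout:
        begin = start - 1  # convert the 1-based column to a 0-based index
        record[name] = line[begin:begin + width].strip()
    return record


def parse_lines(lines: List[str]) -> List[Dict[str, str]]:
    """Parse every non-blank line of a fixed-width text file."""
    return [parse_record(line.rstrip("\n")) for line in lines if line.strip()]


example = parse_record("SGP00000010000115.60")
```

Keeping the layout as data rather than hard-coding the slices means the same parser can be reused for the student, school and score files by swapping in each file's layout.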

Afterwards, only the records for Singapore were selected and exported to a separate set of tables for further analysis.
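Below is a minimal sketch of this selection and export step. It makes two assumptions that should be checked against the PISA 2012 codebook: each parsed record carries a three-letter country code in a field named `CNT`, and Singapore is coded as `SGP`. For illustration it produces CSV text rather than SAS tables.

```python
import csv
import io
from typing import Dict, List


def select_country(records: List[Dict[str, str]], country: str = "SGP",
                   field: str = "CNT") -> List[Dict[str, str]]:
    """Keep only records whose country-code field matches `country`."""
    return [r for r in records if r.get(field) == country]


def to_csv_text(records: List[Dict[str, str]], fieldnames: List[str]) -> str:
    """Serialise the selected records as CSV text, header row first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


rows = [
    {"CNT": "SGP", "SchoolID": "0000001", "StudentID": "00001"},
    {"CNT": "HKG", "SchoolID": "0000002", "StudentID": "00002"},
]
singapore = select_country(rows)
csv_text = to_csv_text(singapore, ["CNT", "SchoolID", "StudentID"])
```

To write a real file instead, the same `csv.DictWriter` can wrap a file opened with `open(path, "w", newline="")`.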

Tables of data to be used in the analysis

  • Student questionnaire data (stu): this table contains student demographic information, parents’ education, interest in school subjects, engagement with activities outside school, access to information and communication technology (ICT), familiarity with academic concepts and sense of belonging in school.
  • School questionnaire data (sch): this table contains information about school sources of funding, size of student population, staff headcounts, availability of ICT for student and teacher use, shortage of resources (if any), co-curricular activities, parent participation, teacher morale and learning hindrance (if any).
  • Scored cognitive item data (cogs): this table records students’ scores on the PISA tests in Reading, Mathematics and Science. Since students’ school grades are not available within the scope of this project, we will use this table as the measure of student academic performance.

Integrating tables to support more queries

  • SchoolID links the school table to the student and score tables, and StudentID links each student's questionnaire record to the same student's test scores. This integration is crucial for discovering possible correlations between various aspects of the school environment and student performance, which offers valuable insights to education policy makers such as MOE and school management.
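The linkage can be sketched as an inner join. This Python sketch assumes every record is a dictionary with `SchoolID` and, for student-level tables, `StudentID` keys. It keys students on the (SchoolID, StudentID) pair as a precaution, in case student IDs are only unique within a school. The other field names in the example (`gov_funding_pct`, `math_item_1`, `escs`) are hypothetical.

```python
from typing import Dict, List, Tuple


def integrate(stu: List[Dict[str, str]], sch: List[Dict[str, str]],
              cogs: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Inner-join student, school and score records into one row per student."""
    schools = {row["SchoolID"]: row for row in sch}
    scores: Dict[Tuple[str, str], Dict[str, str]] = {
        (row["SchoolID"], row["StudentID"]): row for row in cogs
    }
    merged = []
    for student in stu:
        school = schools.get(student["SchoolID"])
        score = scores.get((student["SchoolID"], student["StudentID"]))
        if school is None or score is None:
            continue  # drop students lacking a matching school or score record
        # Later dictionaries win on duplicate keys; the shared ID fields hold equal values.
        merged.append({**school, **score, **student})
    return merged


stu = [{"SchoolID": "S1", "StudentID": "1", "escs": "0.5"},
       {"SchoolID": "S2", "StudentID": "2", "escs": "-0.1"}]
sch = [{"SchoolID": "S1", "gov_funding_pct": "90"}]
cogs = [{"SchoolID": "S1", "StudentID": "1", "math_item_1": "1"},
        {"SchoolID": "S2", "StudentID": "2", "math_item_1": "0"}]
merged = integrate(stu, sch, cogs)
```

An inner join keeps only students with both a school record and scores, which suits correlation analysis; a left join would instead keep every student and leave gaps to be handled later.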

Preliminary Findings

  • Public secondary schools in Singapore receive different levels of funding from the Singapore government.
  • Student scores differ significantly across secondary schools.
  • Secondary schools in Singapore score below the OECD average on the PISA index of school responsibility for resource allocation, with a mean index of -0.36797 (range -0.80 to 9999.0). There is also a slight negative correlation between the percentage of funding given by the government and the index of school responsibility for resource allocation.

[Fig 1: Finding1.png]

From Fig 1 above, we observe that there seems to be a significant discrepancy in the government funding provided to public schools. This is a potentially useful observation, as differences in funding affect the resources available to schools and hence contribute to different learning environments and opportunities for students. Student performance may therefore be affected by such differences in the allocation of education resources.


[Fig 2: Finding2.png]

From Fig 2, we observe that student scores differ significantly across secondary schools. In addition, secondary schools in Singapore score below the OECD average on the PISA index of school responsibility for resource allocation, with a mean index of -0.36797 (range -0.80 to 9999.0). There is also a slight negative correlation between the percentage of funding given by the government and this index.
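The upper bound of 9999.0 reported for this index may be a missing-data code rather than a real value. PISA data files typically reserve values such as 9997, 9998 and 9999 for missing or invalid responses. Below is a minimal sketch of excluding such codes before computing a mean and range. It assumes those three sentinels, which should be confirmed against the PISA 2012 codebook.

```python
import statistics
from typing import Iterable, List, Optional, Tuple

# Assumed PISA missing-value sentinels; confirm against the PISA 2012 codebook.
MISSING_CODES = {9997.0, 9998.0, 9999.0}


def valid_values(values: Iterable[Optional[float]]) -> List[float]:
    """Drop empty entries and sentinel codes, keeping genuine index values."""
    return [v for v in values if v is not None and v not in MISSING_CODES]


def summarize(values: Iterable[Optional[float]]) -> Tuple[float, float, float]:
    """Return (mean, minimum, maximum) computed over valid values only."""
    valid = valid_values(values)
    if not valid:
        raise ValueError("no valid values to summarize")
    return statistics.mean(valid), min(valid), max(valid)


# Hypothetical index values, one of which is a sentinel code.
mean_index, low, high = summarize([-0.8, -0.2, 0.4, 9999.0])
```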

Methodology

Frequency Analysis

  • To examine resources given to different secondary schools in Singapore
  • To understand the purposes for which school resources are used
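A frequency analysis of this kind tabulates how often each response category occurs, both as a count and as a percentage. Below is a minimal Python sketch. It uses a hypothetical list of principals' answers to a question on shortage of teaching staff; it is not output from the PISA data.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Tuple


def frequency_table(values: Iterable[Hashable]) -> List[Tuple[Hashable, int, float]]:
    """Return (category, count, percentage) rows, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(category, n, 100.0 * n / total) for category, n in counts.most_common()]


# Hypothetical answers to a question on shortage of teaching staff.
answers = ["Not at all", "Very little", "Not at all", "To some extent", "Not at all"]
table = frequency_table(answers)
for category, n, pct in table:
    print(f"{category:<15} {n:>3} {pct:5.1f}%")
```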

Correlation Analysis

  • To examine the extent to which resource availability in schools affects student performance, and to evaluate whether shortages in schools (teaching staff, facilities) are associated with low performance
  • To examine whether family background and socio-economic status are linked to student performance
  • To examine whether emotional well-being is necessarily associated with good performance
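A correlation analysis of this kind typically computes Pearson's correlation coefficient between two paired variables. The Python sketch below is unweighted, so it ignores PISA's sampling weights, which a full analysis would need to account for. The variable names and values in the example are hypothetical.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Unweighted Pearson correlation coefficient between two paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two paired samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical school-level values: % government funding vs. a resource-responsibility index.
funding_pct = [95.0, 90.0, 85.0, 80.0]
responsibility = [-0.6, -0.5, -0.2, -0.3]
r = pearson(funding_pct, responsibility)
```

A value of r near -1 or 1 indicates a strong linear association and a value near 0 indicates little linear association. Correlation alone does not show that one variable causes the other.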

Tools

The main tool for analysis is SAS Enterprise Guide. In particular, the project uses features such as data exploration, correlation analysis, frequency analysis, regression, sorting and filtering, Query Builder, table analysis and graph functions.

Scope of Work

The scope of this project is largely determined by the data available from the PISA2012 survey, specifically for Singapore. The survey sampled 172 secondary schools in Singapore, with 35-40 randomly selected students in each. Data collected during the survey includes family background, parents’ education, student possessions at home, school funding, staff headcounts and profiles, facilities and prominent issues (truancy and shortage of resources). A 2-hour test was conducted to measure students’ competency in Mathematics, Science and Reading. Overall, the PISA2012 survey results are a rich source of data for analysing the education landscape in Singapore.

At the end of our project, we aim to deliver a storyboard of all key findings from the PISA2012 data, along with possible recommendations for improving school resource management and practices based on insights from our analysis.

References

http://www.oecd.org/pisa/aboutpisa/