Difference between revisions of "IS428 2016-17 Term1 Assign2 Zheng Xiye"

From Visual Analytics for Business Intelligence
Line 17: Line 17:
 
===Data Preprocessing===
 
===Data Preprocessing===
 
====1. Data Selection====
 
====1. Data Selection====
 +
Out of the 48 data variables given, only those relevant to the context of this data exploration are selected and grouped into 4 distinct data categories based on their nature.
 
{| class="wikitable" style="background-color:#FFFFFF; width: 1000px;" align="center"
 
{| class="wikitable" style="background-color:#FFFFFF; width: 1000px;" align="center"
 
|-
 
|-
Line 27: Line 28:
 
|Accident Time Data
 
|Accident Time Data
 
|
 
|
#Reported Date
+
*Reported Date
#Accident Date
+
*Accident Date
  
 
|-
 
|-
Line 34: Line 35:
 
|Accident Industry Data
 
|Accident Industry Data
 
|
 
|
#Major Industry
+
*Major Industry
#Sub Industry
+
*Sub Industry
  
 
|-
 
|-
Line 41: Line 42:
 
|Accident Type Data
 
|Accident Type Data
 
|
 
|
#Accident Type Category
+
*Accident Type Category
  
 
|-
 
|-
Line 47: Line 48:
 
|Accident Victims Data
 
|Accident Victims Data
 
|
 
|
#Victims' Gender
+
*Victims' Gender
#Victims' Age
+
*Victims' Age
#Occupation
+
*Occupation
#Injuried (Overtime)
+
*Injured (Overtime)
#Injuried (Official Work Duties)
+
*Injured (Official Work Duties)
 
|}
 
|}
  
 +
====2. Lowercase Formatting - Sub Industry & Occupation ====
 +
After browsing through the data, I noticed that both Sub Industry and Occupation contain values written in inconsistent letter cases, so values that differ only in case are treated as separate categories, resulting in 'double-counting' due to duplication. I therefore used JMP Pro's formula function Character - Lowercase to standardize the casing of all values in both columns. The steps are:
 +
*Import the data into JMP Pro and create a new column, 'Sub Industry LC'.
 +
*Right-click the header of the 'Sub Industry LC' column and choose 'Formula'.
 +
*In the 'Formula' panel, click 'Character', then 'Lowercase', and select 'Sub Industry' as the argument of the 'Lowercase' function.
 +
*Repeat the same process to convert all 'Occupation' values to lower case.
 +
[[File: Lowercase Pre-processing.JPG]]
 +
 +
====
  
 
==Conclusion==
 
==Conclusion==

Revision as of 17:36, 25 September 2016

Theme of Interest

Workplace security and safety have long been essential considerations in Singapore when government policies are structured. The Singapore government's strong concern for its workforce's physical well-being has not only reassured its people of a conducive working environment but also attracted talent from nearby countries, helping to sustain the growth of Singapore's economy. However, the SMRT track accident on 22 March this year sparked another round of discussion about the extent of the safety measures that should be implemented in Singapore's workplaces. The debate went beyond SMRT's operational safety protocols to every aspect of the measures needed to prevent workplace injuries. In addition, according to the WSHI National Statistics Report 2014, although the number of fatal workplace injury cases decreased from 73 to 60, the numbers of major and minor injuries kept rising, from 640 and 11,740 in 2013 to 672 and 12,863 in 2014 respectively. From this perspective, exploring potential correlations between various factors and the injury rate may help in structuring preventive and mitigating actions.
Workplace Stats.JPG
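
To put the figures quoted above in proportion, the relative changes can be worked out directly. The snippet below is only a minimal sketch: the numbers are the ones cited from the report in the paragraph above, while the function name and the dictionary layout are illustrative assumptions.

```python
def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


# (earlier figure, later figure) as quoted in the paragraph above.
figures = {
    "fatal": (73, 60),
    "major": (640, 672),
    "minor": (11740, 12863),
}

# Round to one decimal place for reporting.
changes = {kind: round(percent_change(old, new), 1)
           for kind, (old, new) in figures.items()}

for kind, change in changes.items():
    print(f"{kind}: {change:+.1f}%")
```

Fatal cases fell by roughly 18%, while major and minor injuries rose by about 5% and 10%, which is the contrast that motivates the exploration.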

Analytical & Investigation Questions

The questions below target potential correlations that this exploration hopes to substantiate with concrete data exploration outputs:

  1. Which periods of the year have the highest work injury rates?
  2. Which sub-industries account for most of the work injuries?
  3. Which groups of victims, defined by a combination of characteristics, experience the largest number of work injuries?
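
Each of these questions comes down to counting injury records per group. The sketch below illustrates that idea in plain Python rather than in the tools used for this assignment; the records are made-up values, and only the column names 'Accident Date' and 'Sub Industry' come from the dataset variables used in this write-up.

```python
from collections import Counter
from datetime import date

# Hypothetical records for illustration only; the values are invented.
records = [
    {"Accident Date": date(2014, 1, 5), "Sub Industry": "construction"},
    {"Accident Date": date(2014, 1, 20), "Sub Industry": "marine"},
    {"Accident Date": date(2014, 3, 2), "Sub Industry": "construction"},
]

# Question 1: number of injuries per calendar month.
by_month = Counter(r["Accident Date"].month for r in records)

# Question 2: number of injuries per sub-industry.
by_sub_industry = Counter(r["Sub Industry"] for r in records)

print(by_month.most_common(1))
print(by_sub_industry.most_common(1))
```

Question 3 would follow the same pattern, counting on a tuple of several victim attributes instead of a single column.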

Tools

  1. Tableau Desktop
  2. JMP Pro 12
  3. Microsoft Excel

Approaches

Data Preprocessing

1. Data Selection

Out of the 48 data variables given, only those relevant to the context of this data exploration are selected and grouped into 4 distinct data categories based on their nature.

S/D | Data Categories | Data Variables
I. | Accident Time Data | Reported Date, Accident Date
II. | Accident Industry Data | Major Industry, Sub Industry
III. | Accident Type Data | Accident Type Category
IV. | Accident Victims Data | Victims' Gender, Victims' Age, Occupation, Injured (Overtime), Injured (Official Work Duties)
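
The selection above can also be written down as a simple mapping from category to variables. This is a hedged sketch only: the category and variable names are taken from the table, while the helper function, the sample record and its 'Unused Column' field are hypothetical.

```python
# Category -> selected variables, as listed in the table above.
SELECTED = {
    "Accident Time Data": ["Reported Date", "Accident Date"],
    "Accident Industry Data": ["Major Industry", "Sub Industry"],
    "Accident Type Data": ["Accident Type Category"],
    "Accident Victims Data": [
        "Victims' Gender",
        "Victims' Age",
        "Occupation",
        "Injured (Overtime)",
        "Injured (Official Work Duties)",
    ],
}


def keep_selected(record):
    """Return a copy of one record containing only the selected variables."""
    wanted = [name for names in SELECTED.values() for name in names]
    return {name: record[name] for name in wanted if name in record}


# Hypothetical record; the values and the extra column are invented.
sample = {
    "Reported Date": "2014-01-05",
    "Sub Industry": "Construction",
    "Unused Column": "x",
}
print(keep_selected(sample))
```

Keeping the mapping in one place makes it easy to see that 10 of the 48 variables are retained.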

2. Lowercase Formatting - Sub Industry & Occupation

After browsing through the data, I noticed that both Sub Industry and Occupation contain values written in inconsistent letter cases, so values that differ only in case are treated as separate categories, resulting in 'double-counting' due to duplication. I therefore used JMP Pro's formula function Character - Lowercase to standardize the casing of all values in both columns. The steps are:

  • Import the data into JMP Pro and create a new column, 'Sub Industry LC'.
  • Right-click the header of the 'Sub Industry LC' column and choose 'Formula'.
  • In the 'Formula' panel, click 'Character', then 'Lowercase', and select 'Sub Industry' as the argument of the 'Lowercase' function.
  • Repeat the same process to convert all 'Occupation' values to lower case.

Lowercase Pre-processing.JPG
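
The effect of this step can be illustrated outside JMP Pro as well. The following is a minimal Python sketch, not the JMP formula itself: the column names 'Sub Industry' and 'Sub Industry LC' come from the steps above, while the function name and the sample values are assumptions made for illustration.

```python
from collections import Counter


def add_lowercase_column(rows, source, target):
    """Add `target` to every row as the lower-cased value of `source`."""
    for row in rows:
        row[target] = row[source].lower()
    return rows


# Hypothetical values; the same sub-industry typed in three different cases.
rows = [
    {"Sub Industry": "Construction"},
    {"Sub Industry": "CONSTRUCTION"},
    {"Sub Industry": "construction"},
]

# Before: each spelling is counted as its own category.
before = Counter(row["Sub Industry"] for row in rows)

add_lowercase_column(rows, "Sub Industry", "Sub Industry LC")

# After: all three rows fall into a single category.
after = Counter(row["Sub Industry LC"] for row in rows)

print(len(before), len(after))
```

The original column is left untouched and the cleaned values go into a new column, mirroring the 'Sub Industry LC' approach described above.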


Conclusion

Reference

  • WSHI National Statistics Report 2014
  • SMRT Track Accident

Comments