Difference between revisions of "IS428 2016-17 Term1 Assign2 Lim Ting Zhi"

From Visual Analytics for Business Intelligence
Jump to navigation Jump to search
Line 101: Line 101:
  
 
== Visualisation Process ==
 
== Visualisation Process ==
 +
<font size=3><h3>Schedule Metrics</h3></font>
  
 +
'''1.How did the workers get injured?'''
 +
2. Where did the workers get injured at? What are the injuries?
 +
3. What is the profile of the worker?
 +
4. When did the workers get injured?
 +
 +
'''2. Check through missing data patterns'''
 +
{| class="wikitable" style="text-align:left; background: white; margin: 0px; width: 100%"
 +
|-
 +
|[[File:LTZ_VA2_DP_1_Missing_Data.PNG|500px|center]]
 +
|There is a total of 33 rows that have some missing data. The data attributes that have missing data are Supposed Start Work Timing on Day Injured, Victim's Employment Start Date and Months Worked. As the missing data is relatively small, I had chosen to ignore them.
 +
|}
  
 
== Final Visualisation ==
 
== Final Visualisation ==

Revision as of 06:05, 26 September 2016

Theme of Interest

Regardless of the organisation or industry, having a safe work environment is very important. Any injury could cause a major issue to the victim’s family, co-workers or the organisation itself. Other than the mental and physical suffering, the financial burden could be placed on other family members. For the organisation, having a safe environment would lessen unnecessary costs. If an employee is injured, the organisation’s productivity will decline. Other co-workers' workload will increase to cover for the victim. Safety also improves the quality of work produced as employees would be focusing on their work than their safety. Therefore, the following analysis would be to find out the factors that contribute to workplace injuries by using the dataset provided, Workplace Injuries Data 2014. So that related organisations or industries could keep in mind to reduce the number of workplace injuries occurred.

Questions for Investigation

  1. How did the workers get injured?
  2. Where did the workers get injured at?
  3. What are the injuries?
  4. What is the profile of the worker?
  5. When did the workers get injured?

Tools Utilized

The following tools were utilized during the analysis:

  1. Microsoft Excel 2016
  2. JMP Pro 12 – Data preparation
  3. Tableau 10.0 – Create visualizations and interactive dashboards, and publish final visualization online

Data Attributes Used

  1. Accident Agency Level 1 Desc
  2. Accident Agency Level 2 Desc
  3. Accident Day
  4. Accident Time
  5. Accident Type level 2 Category
  6. Accident Type Level 2 Desc
  7. Accident Weekday
  8. Body Parts Injured
  9. Cause
  10. Injured When Working Overtime
  11. Injured While Performing Official Work Duties
  12. Major Industry (SSIC 2010)
  13. Major Injury Indicator
  14. Months worked
  15. Nature of Injury
  16. Occupation
  17. Pct Manual Work
  18. Sub Industry (SSIC 2010)
  19. Supposed Start Work Timing on Day Injured
  20. Victim's Age (Reported Yr - Birth Yr)
  21. Victim's Employment Start Date
  22. Victim's Gender

Data Preparation

1. Remove unnecessary data attributes
2. Check through missing data patterns

LTZ VA2 DP 1 Missing Data.PNG
There is a total of 33 rows that have some missing data. The data attributes that have missing data are Supposed Start Work Timing on Day Injured, Victim's Employment Start Date and Months Worked. As the missing data is relatively small, I had chosen to ignore them.

3. Transform text formats

LTZ VA2 DP 2 Text Formats 1.PNG
For the three attributes, Body Parts Injured, Nature of Injury and Major Injury Indicator, some of the data have uppercasing, some lowercasing and some used underscores as spaces. To make the data standardised for nicer representation, there is a need to transform those values.
LTZ VA2 DP 2 Text Formats 2.PNG
For Major Injury Indicator, I have created a new column and used the formula as shown on the left. The formula is to change the values to title case.
LTZ VA2 DP 2 Text Formats 3.PNG
For Body Parts Injured and Nature of Injury, I have created new columns and used the formula as shown on the left. The formula is to change the values to title case and replace underscores with spaces.

4. Remove unnecessary words

LTZ VA2 DP 3 Split Words 1.PNG
For the two attributes, Sub Industry (SSIC 2010) and Accident Agency Level 2 Desc, they both contain values of their above hierarchy which are Major Industry (SSIC 2010) and Accident Agency Level 1 Desc respectively. To reduce the repeated words and for better representation, some transformation will be done.
LTZ VA2 DP 2 Split Words 2.PNG
For Sub Industry (SSIC 2010), I have created a new column and used the formula as shown on the left. The formula will first check if it contains a hyphen. This is because not all of the data have the repeated words of the Major Industry (SSIC 2010) attribute. If it contains a hyphen, it will then remove the words and the hyphen of Major Industry (SSIC 2010) in the Sub Industry (SSIC 2010).
LTZ VA2 DP 2 Split Words 3.PNG
For Accident Agency Level 2 Desc, I have created a new column and used the formula as shown on the left. The formula will first check if it contains the values in Accident Agency Level 1 Desc and then, remove the repeated words and hyphen. This is because not all of the data follows the same format whereby the Accident Agency Level 1 Desc attribute is in front of the Accident Agency Level 2 Desc. An example will be the “Moving Vehicles” as shown in the first image.

5. Recode occupation attribute

LTZ VA2 DP 4 Recode 1.png
For the Occupation attribute, some occupations have different formats, spelling errors or in plural forms. However, they all are related to the same occupation. An example is shown the left. To clean up this attribute, I used the recode function in JMP.

Reference: Using Recode in JMP for data preparation – http://blogs.sas.com/content/jmp/2016/07/27/19692/

LTZ VA2 DP 4 Recode 2.png
Firstly, I used the “Convert to Titlecase”, “Trim Whitespace” and “Collapse Whitespace” functions. “Collapse Whitespace” is to replace two spaces with one. Then, I used “Group Similar Values” function and change the “Max Character Difference” to 1. This is to group the occupations that are in plural forms.
LTZ VA2 DP 4 Recode 3.PNG
The example on the left shows the result of what I have done in the previous step. However, it can be seen there is still a value that has yet to be grouped together. I briefly check the values and manually group the occupations if there is a need. Then, I save it as a formula column.

6. Transform time formats

LTZ VA2 DP 5 Time 1.PNG
When exporting over to Tableau for visualisation, the attributes, Accident Time and Supposed Start Work Timing on Day Injured, are not in the format that Tableau required.
LTZ VA2 DP 5 Time 2.PNG
To represent properly, I did a quick fix by first changing the attributes’ data type to “character” and then, using the formula on the left. After exporting to Tableau, I change the data type to Date & Time.

7. Setting up Hierarchy

LTZ VA2 DP 6 Hierarchy.PNG
In Tableau, I set up some hierarchy that I know from the start. This is to make it easier to use the attributes for visualisation.

Visualisation Process

Schedule Metrics

1.How did the workers get injured? 2. Where did the workers get injured at? What are the injuries? 3. What is the profile of the worker? 4. When did the workers get injured?

2. Check through missing data patterns

LTZ VA2 DP 1 Missing Data.PNG
There is a total of 33 rows that have some missing data. The data attributes that have missing data are Supposed Start Work Timing on Day Injured, Victim's Employment Start Date and Months Worked. As the missing data is relatively small, I had chosen to ignore them.

Final Visualisation

Final Visualisation Link: https://public.tableau.com/profile/lim.ting.zhi#!/vizhome/LimTingZhi-VAAssignment2/FinalVisualisation

The main question stated at the start of this documentation is to find out the factors that contribute to workplace injuries. Thus, the final visualisation consists of two tabs, Victim Profile and Injury Period. First, for Victim Profile, the dashboard shows an example of an occupation, cook. The factors that are highly contributing to workplace injuries would be the occupation, industry and age. This will also determine the cause and type of injuries occurred. The number of months worked is not a significant factor as no matter how experienced the worker, he/she is still relatively prone to injuries. The gender is also not quite a significant factor as it could be due to the victim’s occupation. For example, the top few occupations are mainly male dominant. Second, for Injury Period, the dashboard shows the period of injuries occurred as a whole. From the overview, the time factors are not significant factors. This is because accidents will still occur, regardless of the time period and there are not many distinct patterns can be seen.