IS428 2016-17 Term1 Assign2 Lim Zi Yu Jouta
Introduction
I chose to work on the Workplace Injuries Data 2014. Looking through the WSHI National Statistics Report 2014, I noted that, compared with 2013, the total number of workplace injuries increased. Every type of injury occurred more often except fatal injuries, which decreased. I found it striking that injuries did not increase or decrease uniformly. Beyond workplace injuries, Occupational Diseases (OD) attributed to the workplace also increased. By investigating this data, I hope to uncover correlations, or possible causal links, between variables that could help explain the occurrence of workplace injuries, so that effective steps can be taken to reduce them.
Data Visualization System Design Process
Step 1: Identify a theme of interest
Given the workplace injuries dataset, my theme of interest is the reasons for workplace injuries. I believe that Singapore, as a safe country, has a number of regulations in place to ensure workplace safety. Yet the incidence of workplace injuries continues to increase, which raises the question of why this is the case.
Step 2: Define questions for investigation
1. Is there a relationship between severity of injury and industry?
2. Is there a relationship between time and day of the week with injury occurrence rates?
3. Are there any outliers with high incidence rates of injuries?
4. Is there a relationship between gender and injuries?
Step 3: Find appropriate data attributes
After looking through the data, I noted that some data formats could be improved by first making some simple alterations in Microsoft Excel.
The Accident Day column has a mix of dates, some saved in Excel format as serial numbers and others saved as strings. To make the data format in the column uniform, I used Text to Columns and set the column data format to Date in DMY format.
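For readers who prefer to script this step instead of using Text to Columns, the same normalisation can be sketched in Python. This is only an illustration, not what I actually did: the function name `normalise_accident_day` is hypothetical, and it assumes the strings are written as D/M/Y with a four-digit year and that the serial numbers use Excel's default 1900 date system.

```python
from datetime import date, datetime, timedelta

# Assumption: Excel's 1900 date system. Using 30 Dec 1899 as day zero gives
# correct results for any serial number after 28 Feb 1900.
EXCEL_EPOCH = date(1899, 12, 30)


def normalise_accident_day(value):
    """Return a date from either an Excel serial number or a D/M/Y string."""
    text = str(value).strip()
    # A cell holding only digits (optionally with one decimal point) is
    # treated as a serial number; anything else is parsed as a date string.
    if text.replace(".", "", 1).isdigit():
        return EXCEL_EPOCH + timedelta(days=int(float(text)))
    return datetime.strptime(text, "%d/%m/%Y").date()
```

With this sketch, a cell holding the serial number 41640 and a cell holding the string "01/01/2014" both resolve to the same date.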
The Accident Month column stores the month of the accident inside a full date, such as 15/04/1904 and 15/06/1904. To make it clearer, I added another column, “Accident Month Num”, that takes only the month number from the date. I did the same for the accident day. The accident date column appears to contain only dates that fall on the first day of the month, which I don’t think accurately reflects the actual accident date. Hence, I created a new column, Accident Date Combined, with the date built from the day, month and year numbers, which I believe is a more accurate reflection of the actual accident date.
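The month extraction and the combined date can be sketched as follows. This is a minimal illustration under assumptions: the helper names `month_number` and `combined_date` are hypothetical, the month is stored as a D/M/Y string like the examples above, and the day and year are already available as plain numbers.

```python
from datetime import date, datetime


def month_number(placeholder):
    """Take only the month from a placeholder date such as '15/04/1904'."""
    return datetime.strptime(placeholder, "%d/%m/%Y").month


def combined_date(day_num, month_placeholder, year_num):
    """Build the Accident Date Combined value from day, month and year parts."""
    return date(year_num, month_number(month_placeholder), day_num)
```

For example, `month_number("15/04/1904")` gives 4, and the placeholder year 1904 is discarded in favour of the real year passed in.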
The Sub Industry column has data in the format “major industry – sub industry”, so I used Text to Columns to split it on the - delimiter. I made sure that the “Non-metallic Mineral Pdts” sub industry, which contains a hyphen of its own, was still formatted correctly after the split. I also ran the TRIM function on the results to remove redundant spacing in the strings. I did something similar for the Accident Agency Level 2 Desc column, but this time I used the SEARCH function to find the character position of the hyphen and used it to get the strings to the left and right of it. It was only much later that I realized you can right-click the column in Tableau and choose “Split”. However, that also split “Non-metallic Mineral Pdts” into two, so I did not use that function.
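The SEARCH-based approach, splitting only at the first hyphen, can be sketched in Python as below. The function name `split_sub_industry` is hypothetical, and the major industry value in the usage note is made up for illustration. Because only the first hyphen is treated as the delimiter, the hyphen inside “Non-metallic Mineral Pdts” is left alone.

```python
def split_sub_industry(value):
    """Split 'major industry - sub industry' at the first hyphen only."""
    major, separator, sub = value.partition("-")
    if not separator:
        # No hyphen at all: treat the whole value as the major industry.
        return value.strip(), ""
    # strip() plays the role of Excel's TRIM on the leading and trailing spaces.
    return major.strip(), sub.strip()
```

Calling `split_sub_industry("Manufacturing - Non-metallic Mineral Pdts")` returns `("Manufacturing", "Non-metallic Mineral Pdts")`, keeping the sub industry in one piece.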
I added a column, Major Injury Boolean, based on the Major Injury Indicator column: if the value is “MAJOR INJURIES”, the value in Major Injury Boolean is TRUE, otherwise FALSE. I did the same for the columns Pct Manual Work, Hospitalized for at least 24 hours, Injured when Working Overtime, and Injured While Performing Official Work Duties. I am not yet sure whether I will use those variables or whether the transformations are needed, but I think it is better to prepare them in advance so that exploring the data later goes smoothly.
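The indicator-to-boolean rule is simple enough to express as a one-line check. The sketch below uses a hypothetical helper `to_boolean`; only the “MAJOR INJURIES” label comes from the data described above, and the label that counts as TRUE in the other columns is an assumption that would have to be passed in explicitly.

```python
def to_boolean(value, true_label="MAJOR INJURIES"):
    """Return True when the indicator matches true_label, else False.

    The comparison ignores case and surrounding spaces.
    """
    return str(value).strip().upper() == true_label.strip().upper()
```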
I added a calculated field to split victim age into 3 groups: under 30, 31-55 and over 55. I chose these 3 groups based on the attitudes I have seen at the workplace from personal experience. Those below 30 are the young people, those aged 31 – 55 are the main age group at work, and those over 55 are the more elderly group at work. Given that people of similar age tend to group together and engage in similar activities, I believe this grouping will aid the data exploration by giving insights into the possible habits and behaviors that give rise to the risk of injuries at work.
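The grouping logic of the calculated field can be sketched in Python. Note one assumption: the three groups as described leave age 30 itself unassigned, so this sketch places it in the youngest group. The function name `age_group` and the label text are hypothetical.

```python
def age_group(age):
    """Assign a victim age to one of the three groups used in the analysis."""
    if age <= 30:  # assumption: age 30 itself is counted with the youngest group
        return "30 and under"
    if age <= 55:
        return "31-55"
    return "Over 55"
```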
Data Exploration
Initial steps to explore the data
As I was just starting out, I began by playing around with the data based on the questions listed above. One of the first graphs I created was a bar chart showing the count of injuries per employer name, sorted from largest to smallest. To add more detail, I created a filter with a range of values, used color to show the injuries by major industry, and made major industry an attribute of the data in the tree map. Adding the chart to a dashboard allowed me to link it through an action to another view, so that as you hover over the bar for a specific employer, a table below gives the breakdown of major/minor injuries and the nature of the injury.
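The aggregation behind the bar chart is a count of rows per employer, sorted from largest to smallest. A rough Python equivalent is sketched below, outside Tableau. It assumes each row is a dictionary with an "Employer Name" key (a hypothetical field name), and it groups rows with a blank employer under a "(blank)" label.

```python
from collections import Counter


def injuries_per_employer(records):
    """Count injury rows per employer, sorted from largest to smallest."""
    counts = Counter(
        (row.get("Employer Name") or "").strip() or "(blank)"
        for row in records
    )
    # most_common() returns (employer, count) pairs in descending count order.
    return counts.most_common()
```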
I constructed a tree map to illustrate the percentage of injury types by Sub Industry. I set the color gradient for the tree map to use only shades of blue and made it a 7-step color scale. I think a stepped scale gives a clearer representation of the difference between values like 202 and 125, because they fall into visibly distinct color steps rather than blending along a smooth gradient. I added filters for Sub Industry 1 (the sub industry header) and Major Injury Indicator so that viewers can drill into more detail as they like.
I created two graphs, one showing the distribution of injury counts by weekday within each month and another showing the distribution by month, with color indicating major or minor injury.
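The counting behind the weekday graph can be sketched as follows. This is only an illustration of the grouping, not the Tableau view itself: the function name `count_by_weekday` is hypothetical, and each record is assumed to be a pair of an accident date and a severity label.

```python
from collections import Counter
from datetime import date

# Fixed English names, indexed by date.weekday() (Monday is 0).
WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def count_by_weekday(records):
    """Count injuries per (weekday, severity) pair."""
    counts = Counter()
    for accident_date, severity in records:
        counts[(WEEKDAYS[accident_date.weekday()], severity)] += 1
    return counts
```

Swapping the weekday lookup for `accident_date.month` would give the corresponding per-month counts.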
Visuals
Injury count per Employer
Visit this link for the full interactive visual https://public.tableau.com/views/Assignment2_248/EmployerxInjuries?:embed=y&:display_count=yes
From this dashboard, we can see the injury count per employer. Some values are null because the employer field was left blank in some data rows. By hovering over each employer row, we can see the breakdown of injuries by Major/Minor and by the Nature of the Injury. We can also filter by the number of injuries per employer to narrow down to specific employers.
From this graph, we see that the majority of employers had only 1 injury, and only a handful had more than 6 injuries in 2014. Those with 6 or more injuries are from the major industries of Construction and Others, with some employers in Others spanning more than 1 major industry. Most of the injuries at employers with multiple injuries are minor rather than major. This may hint at issues in the working environment that make minor injuries easy to incur, such as equipment with sharp edges placed near a walkway, where people who are not careful could easily injure themselves.
% Injury Type per Industry
Visit this link for the full interactive visual https://public.tableau.com/shared/7NPDCT2SN?:display_count=yes