IS428 2016-17 Term1 Assign2 Chua Feng Ru
Theme of Interest
The theme of interest for this IDEAL assignment is to understand workplace accidents reported in Singapore. It aims to explore and analyse the reported accidents in four general aspects:
- What is the distribution of accidents by industry?
- What is the distribution of accidents by injury?
- When did the accidents happen?
- What are the relationships between the variables?
Questions for Investigation
From the questions above, I derived the following specific questions:
- What is the relationship between number of accidents reported and number of MC days, in terms of each major or sub-industry?
- What is the relationship between number of accidents reported and number of MC days, in terms of the nature of injury?
- At which hour of the day is the number of accidents reported highest or lowest?
- What are the patterns of accidents reported for each hour of the day?
- What contributes to the patterns of accidents reported for each hour of the day?
- What is the prevalence of the injuries in each sub-industry?
- Does working experience reduce the number of MC Days?
- Does the size of company affect the number of MC Days?
IDEAL Process
I started from the initial questions:
- What is the distribution of accidents by industry?
- What is the distribution of accidents by injury?
These two questions led me to plot a graph showing the distributions, as below:
As I explored further, I realised that I could generate scatter plots of No. of MC Days against No. of Accidents Reported.
These let me examine the relationship between No. of Accidents Reported and No. of MC Days for each industry and injury,
and so the questions evolved into:
- What is the relationship between number of accidents reported and number of MC days, in terms of each major or sub-industry?
- What is the relationship between number of accidents reported and number of MC days, in terms of the nature of injury?
Graphs Here...
From the graphs above, I am able to see how strongly the two variables are correlated, both by industry and by nature of injury.
They also show which industries and injuries have the greatest effect on workplace safety.
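The two measures plotted in the scatter plots are both per-group aggregates: a count of accident records and a sum of MC days. The graphs themselves were built in Tableau; the following is only a minimal Python sketch of that aggregation. The record layout, the function name `summarise_by_group`, and the sample values are all made up for illustration and do not come from the actual dataset.

```python
from collections import defaultdict

# Hypothetical records: (group, mc_days). The group could be a sub-industry
# or a nature of injury. Values are invented for illustration only.
records = [
    ("Construction", 5),
    ("Construction", 12),
    ("Manufacturing", 3),
    ("Construction", 0),
    ("Manufacturing", 7),
]


def summarise_by_group(rows):
    """Return {group: (accident_count, total_mc_days)}."""
    totals = defaultdict(lambda: [0, 0])
    for group, mc_days in rows:
        totals[group][0] += 1        # one record = one reported accident
        totals[group][1] += mc_days  # accumulate MC days for the group
    return {group: (count, days) for group, (count, days) in totals.items()}


summary = summarise_by_group(records)
for group, (count, days) in sorted(summary.items()):
    print(group, count, days)
```

Each resulting pair (count, total MC days) corresponds to one point in a scatter plot of this kind.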
Through the scatter plots, I realised that adding the question "What is the prevalence of the injuries in each sub-industry?" would give a fuller picture of the relevant variables.
Graphs Here...
The treemap gives me a bigger-picture view of the prevalence (in terms of number of records and MC Days) of the incidents in each sub-industry.
In this phase of the investigation, I proceeded to answer the "When did the accidents happen?" question. Initially, I plotted a time-series graph across the months of 2014, but I realised that I could gain more insight by narrowing the timeframe.
Thus, I drilled down further to look at the Accident Time in terms of hour of the day, and this prompted me to change the question to:
- At which hour of the day is the number of accidents reported highest or lowest?
- What are the patterns of accidents reported for each hour of the day?
Graphs Here ...
The first graph allows me to scan the general trend of accidents happening at each hour of the day. Combined with the trellis plot in the second graph, it also lets me
understand what contributes to the rise and fall of the general trend.
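Drilling down from months to hours amounts to extracting the hour from each accident timestamp and counting records per hour. The sketch below shows that step in Python rather than Tableau. The timestamp format, the function name `accidents_per_hour`, and the sample times are assumptions made for illustration, not values from the dataset.

```python
from collections import Counter
from datetime import datetime

# Hypothetical accident timestamps, invented for illustration only.
accident_times = [
    "2014-01-03 09:15",
    "2014-01-03 09:40",
    "2014-02-11 14:05",
    "2014-03-20 09:55",
    "2014-05-02 23:30",
]


def accidents_per_hour(timestamps, fmt="%Y-%m-%d %H:%M"):
    """Return a 24-element list of accident counts, indexed by hour (0-23)."""
    counts = Counter(datetime.strptime(t, fmt).hour for t in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]


hourly = accidents_per_hour(accident_times)
# Hour with the most reported accidents (first one wins on ties).
peak_hour = max(range(24), key=lambda hour: hourly[hour])
print(peak_hour, hourly[peak_hour])
```

Running the same count separately for each industry or injury type would give the per-panel series of a trellis plot.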
This part of the investigation arises from my personal beliefs.
I believed that the more experienced a worker is, the less likely they are to be badly hurt. Here, the severity of an injury is measured by the No. of MC Days.
With this in mind, I plotted visualisations to answer "Does experience reduce injury?".
Graph Here...
Contrary to my belief that experienced workers would suffer less severe injuries, the coefficient of determination (R-squared) shows essentially no linear correlation between experience and MC Days.
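For reference, the R-squared of a simple linear trend line equals the square of the Pearson correlation between the two variables. The sketch below computes it in plain Python, assuming the reported value comes from a straight-line fit with an intercept. The function name `r_squared` and the sample numbers are hypothetical and are not taken from the dataset.

```python
def r_squared(xs, ys):
    """R-squared of a simple linear fit of ys on xs (squared Pearson r)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        # A constant variable explains no variance; treat as zero by convention.
        return 0.0
    return (sxy * sxy) / (sxx * syy)


# Invented example: years of experience vs. MC days with no linear trend.
experience = [1, 2, 3, 4]
mc_days = [1, -1, -1, 1]  # centred toy values, for illustration only
print(r_squared(experience, mc_days))
```

A value near 0 means the straight line explains almost none of the variation in MC days, while a value near 1 means it explains almost all of it.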
My other belief was that a company with a large number of employees would also be more likely to have accidents. This prompted the question "Does the size of company affect the number of MC Days?".
To answer it, I plotted No. of Accidents Reported against Informant No. of Employees, where each data point is an Informant Company Name.
Graph Here...
The result shows a positive correlation, as indicated by the R-squared value. The correlation is highest in the Construction and Others industries (R-squared of approximately 30%), while the Manufacturing industry shows only a slight positive correlation. Although a correlation exists, it should be noted that correlation does not imply causation.
Tools Utilized
- Excel 2013 for data preparation
- Tableau for visualization