IS428 2016-17 Term1 Assign2 Chua Feng Ru

From Visual Analytics for Business Intelligence
Revision as of 04:24, 26 September 2016 by Fengru.chua.2013

Theme of Interest

The theme of this IDEAL assignment is to understand workplace accidents reported in Singapore. It aims to explore and analyse the reported accidents in 4 general aspects:

  • How are the accidents distributed across industries?
  • How are the accidents distributed across types of injury?
  • When did the accidents happen?
  • What are the relationships between the variables?


Questions for Investigation

From these aspects, I generated the following specific questions:

  • What is the relationship between number of accidents reported and number of MC days, in terms of each major or sub-industry?
  • What is the relationship between number of accidents reported and number of MC days, in terms of the nature of injury?
  • What is the prevalence of injuries and accidents across industries?
  • At which hour of the day are reported accidents highest or lowest?
  • What are the patterns of reported accidents for each hour of the day?
  • What contributes to the patterns of reported accidents for each hour of the day?
  • Does working experience reduce the number of MC Days?
  • Does the size of company affect the number of MC Days?



IDEAL Process

From the initial questions:

  • How are the accidents distributed across industries?
  • How are the accidents distributed across types of injury?


These 2 questions led me to plot the distributions shown below:

MA2-1.png
MA2-2.png
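The distribution graphs above are essentially counts of accident records per category. As a minimal Python sketch (not the Tableau workflow actually used), assuming each record is a dict with hypothetical "industry" and "injury" fields, the same counts can be computed with collections.Counter. The records are made-up toy values, not the assignment's data.

```python
from collections import Counter

# Toy accident records for illustration only; the "industry" and "injury"
# field names are assumptions, not the dataset's actual column headers.
records = [
    {"industry": "Construction", "injury": "Crushing"},
    {"industry": "Construction", "injury": "Cut & Bruises"},
    {"industry": "Manufacturing", "injury": "Cut & Bruises"},
    {"industry": "Others", "injury": "Sprain & Strains"},
    {"industry": "Construction", "injury": "Crushing"},
]

def distribution(rows, field):
    """Count how many accident records fall under each value of `field`."""
    return Counter(row[field] for row in rows)

industry_counts = distribution(records, "industry")
injury_counts = distribution(records, "injury")

# Print categories from most to least frequent, like a sorted bar chart.
for name, count in industry_counts.most_common():
    print(f"{name}: {count}")
```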


As I explored further, I realised that I could plot No. of MC Days against No. of Accidents Reported as scatter plots. This let me examine the relationship between No. of Accidents Reported and No. of MC Days for each industry and nature of injury, and the questions thus evolved into:

  • What is the relationship between number of accidents reported and number of MC days, in terms of each major or sub-industry?
  • What is the relationship between number of accidents reported and number of MC days, in terms of the nature of injury?


MA2-3.png
MA2-4.png


From the graphs above, I can see which industries and natures of injury score high on both variables, and therefore which ones most affect workplace safety. For example, I can focus on the data points towards the upper right of each plot, as these are the industries or injuries with above-average numbers of both Accidents Reported and MC Days.
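The upper-right reading of the scatter plots can be made explicit: aggregate accidents and MC Days per group, then keep the groups above both averages. A minimal Python sketch, assuming one (group, MC days) pair per accident record; taking the averages over groups is an assumption about how the reference lines were drawn, and the values are toy numbers.

```python
from collections import defaultdict

# Toy (group, mc_days) pairs, one per accident record; values are made up.
records = [
    ("Construction", 30), ("Construction", 12), ("Construction", 25),
    ("Manufacturing", 4), ("Manufacturing", 10),
    ("Others", 2),
]

def summarise(rows):
    """Return {group: (accidents_reported, total_mc_days)}."""
    totals = defaultdict(lambda: [0, 0])
    for group, mc_days in rows:
        totals[group][0] += 1
        totals[group][1] += mc_days
    return {g: (n, days) for g, (n, days) in totals.items()}

def above_average_on_both(summary):
    """Groups in the upper-right quadrant formed by the two average lines."""
    avg_n = sum(n for n, _ in summary.values()) / len(summary)
    avg_days = sum(d for _, d in summary.values()) / len(summary)
    return sorted(g for g, (n, d) in summary.items() if n > avg_n and d > avg_days)

summary = summarise(records)
print(above_average_on_both(summary))
```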




The scatter plots made me realise that adding the question "What is the prevalence of injuries and accidents across industries?" would give a fuller picture of the relevant variables.

MA2-5.png


The treemap gives a bigger picture of the prevalence of incidents (in terms of number of records and MC Days) for each sub-industry. The color shows severity as total MC Days, while the size of each box shows the number of accidents reported. From the treemap, Crushing in Construction is the most prevalent, as it has the biggest box and the deepest color shade. In Manufacturing, Cut & Bruises in Metalworking is the most prevalent, and in Others, Crushing and Cut & Bruises are the most prevalent.
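The treemap encodes two measures per leaf: box size is the record count and color is total MC Days. A hedged Python sketch of that aggregation, assuming a major industry > sub-industry > injury hierarchy and a hypothetical tuple layout, picks the biggest box in each major industry and breaks ties by the deeper color. The records are toy values only.

```python
from collections import defaultdict

# Toy records: (major_industry, sub_industry, injury, mc_days).
# Field layout and values are illustrative assumptions only.
records = [
    ("Construction", "Building", "Crushing", 20),
    ("Construction", "Building", "Crushing", 35),
    ("Construction", "Building", "Cut & Bruises", 5),
    ("Manufacturing", "Metalworking", "Cut & Bruises", 8),
    ("Manufacturing", "Metalworking", "Cut & Bruises", 6),
    ("Manufacturing", "Food", "Burns", 9),
]

def treemap_leaves(rows):
    """Map each leaf to [box_size, color_value] = [record count, total MC days]."""
    leaves = defaultdict(lambda: [0, 0])
    for major, sub, injury, mc_days in rows:
        leaf = leaves[(major, sub, injury)]
        leaf[0] += 1
        leaf[1] += mc_days
    return leaves

def most_prevalent_per_industry(leaves):
    """Pick the leaf with the biggest box, breaking ties by the deeper color."""
    best = {}
    for (major, sub, injury), (count, days) in leaves.items():
        if major not in best or (count, days) > best[major][1]:
            best[major] = ((sub, injury), (count, days))
    return {major: leaf for major, (leaf, _) in best.items()}

print(most_prevalent_per_industry(treemap_leaves(records)))
```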




In this phase of the investigation, I proceeded to answer the question "When did the accidents happen?". Initially, I plotted a time series across the months of 2014; however, I realised that I could gain more insight by narrowing the timeframe.


I therefore drilled down to look at Accident Time by hour of the day, which prompted me to refine the question into:

  • At which hour of the day are reported accidents highest or lowest?
  • What are the patterns of reported accidents for each hour of the day?
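Answering these questions amounts to bucketing Accident Time by hour. A minimal Python sketch, assuming (hypothetically) that times are stored as "HH:MM" strings; hours with no reports are kept as zero so that the lowest hour is not silently skipped. The times are toy values.

```python
from collections import Counter
from datetime import datetime

# Toy accident times; the "HH:MM" string format is an assumption about
# how the Accident Time field is stored.
accident_times = ["10:15", "10:40", "11:05", "10:55", "14:30", "02:10", "11:45"]

def accidents_per_hour(times, fmt="%H:%M"):
    """Count accidents in each of the 24 hours, keeping hours with zero reports."""
    counts = Counter(datetime.strptime(t, fmt).hour for t in times)
    return {hour: counts.get(hour, 0) for hour in range(24)}

per_hour = accidents_per_hour(accident_times)
peak = max(per_hour, key=per_hour.get)
quietest = min(per_hour, key=per_hour.get)  # earliest hour among ties
print(f"Highest: {peak}:00 ({per_hour[peak]}), lowest: {quietest}:00 ({per_hour[quietest]})")
```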


MA2-6.png
MA2-7.png


The first graph lets me scan the general trend of accidents across the hours of the day. Through the interactive visualisation, the trellis plot in the second graph then shows which industries and injuries contribute to the rise and fall of that trend. From the interactive visualisations, I conclude that the rise in overall accidents reported from 10 to 11AM can be attributed to Crushing and Cut & Bruises in all 3 major industries, Sprain & Strains in the Others industry, and Other injuries in the Others industry.
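The trellis plot attributes a change in the overall hourly trend to individual industry and injury panels. A hedged Python sketch of that attribution, assuming one (hour, industry, injury) tuple per accident: it compares counts between two chosen hours for every panel and ranks panels by their change. The function name hourly_change and the toy records are illustrative assumptions, and which pair of hours to compare is left as a parameter.

```python
from collections import Counter

# Toy records: (hour, industry, injury). Values are made up for illustration.
records = [
    (9, "Construction", "Crushing"),
    (10, "Construction", "Crushing"), (10, "Construction", "Crushing"),
    (10, "Manufacturing", "Cut & Bruises"),
    (9, "Others", "Sprain & Strains"), (10, "Others", "Sprain & Strains"),
    (9, "Manufacturing", "Burns"),
]

def hourly_change(rows, from_hour, to_hour):
    """Change in accident count per (industry, injury) panel between two hours,
    sorted so that the largest contributors to a rise come first."""
    before = Counter((ind, inj) for h, ind, inj in rows if h == from_hour)
    after = Counter((ind, inj) for h, ind, inj in rows if h == to_hour)
    panels = set(before) | set(after)
    deltas = {p: after[p] - before[p] for p in panels}
    return sorted(deltas.items(), key=lambda item: (-item[1], item[0]))

for panel, delta in hourly_change(records, 9, 10):
    print(panel, delta)
```

Because the per-panel changes sum to the change in the overall count, the panels at the top of the list are the largest contributors to a rise in the general trend.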




This part of the investigation arises from my personal beliefs.

I believed that the more experienced a worker is, the less likely they are to be badly hurt. Here, the degree of injury is measured by the No. of MC Days. With this, I plotted visualisations to answer "Does experience reduce injury?".

MA2-8.png


Contrary to my belief that experienced workers would suffer less severe injuries, the coefficient of determination (R-Square) value shows essentially no linear correlation between experience and MC Days.
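The R-Square value here is the coefficient of determination of a least-squares trend line. Assuming the trend line is linear with an intercept, R-squared equals the square of Pearson's r, that is Sxy² / (Sxx · Syy), where Sxx, Syy and Sxy are the sums of squared and cross deviations from the means. A minimal Python sketch; the experience and MC Days values are toy numbers, not the assignment's data.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("R-squared is undefined when either variable is constant")
    return sxy * sxy / (sxx * syy)

# Toy data: years of experience vs MC days for five hypothetical workers.
experience = [1, 2, 3, 4, 5]
mc_days = [3, 10, 2, 9, 4]
print(f"R-squared = {r_squared(experience, mc_days):.3f}")
```

A value close to 0 means a straight line explains almost none of the variation in MC Days; it does not rule out a non-linear relationship.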




My other belief was that a company with a large number of employees would be more likely to have accidents. This prompted the question "Does the size of company affect the number of MC Days?". To answer it, I plotted No. of Accidents Reported against Informant No. of Employees, where each data point represents one company (Informant Company Name).

MA2-9.png


The result shows a positive correlation, with the R-Square value indicating its strength. The strongest relationship is in the Construction and Others industries (R-Square of 32% and 37% respectively), while Manufacturing shows only a slight positive correlation (R-Square of 10%). Even where there is a correlation, it does not imply causation.
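The per-industry R-Square values can be reproduced in outline: count accidents per company, pair each count with the company's employee count, and fit one line per industry. A hedged Python sketch with a hypothetical field layout and toy data. It returns the slope as well as R-squared, because R-squared is never negative and so cannot by itself distinguish a positive correlation from a negative one; the sign of the slope does.

```python
from collections import Counter, defaultdict

# Toy accident rows: (industry, company_name, no_of_employees).
# Names, sizes and counts are made up for illustration only.
rows = [
    ("Construction", "A", 50), ("Construction", "A", 50),
    ("Construction", "B", 200), ("Construction", "B", 200),
    ("Construction", "B", 200), ("Construction", "C", 10),
    ("Manufacturing", "D", 30), ("Manufacturing", "E", 300),
    ("Manufacturing", "E", 300), ("Manufacturing", "F", 80),
]

def fit_line(xs, ys):
    """Least-squares slope and R-squared for y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("trend line is undefined for a constant variable")
    return sxy / sxx, sxy * sxy / (sxx * syy)

def trend_by_industry(accident_rows):
    """One point per company (employees, accidents reported), one fit per industry."""
    accidents = Counter((ind, company) for ind, company, _ in accident_rows)
    employees = {(ind, company): size for ind, company, size in accident_rows}
    points = defaultdict(list)
    for key, count in accidents.items():
        points[key[0]].append((employees[key], count))
    results = {}
    for industry, pts in points.items():
        xs, ys = zip(*pts)
        results[industry] = fit_line(xs, ys)
    return results

for industry, (slope, r2) in trend_by_industry(rows).items():
    print(f"{industry}: slope={slope:.4f}, R-squared={r2:.0%}")
```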

Tools Utilized

  1. Excel 2013 for data preparation
  2. Tableau for visualization