IS428 2016-17 Term1 Assign3 Bong Jun Hao

From Visual Analytics for Business Intelligence
Revision as of 18:02, 25 October 2016 by Junhao.bong.2014 (talk | contribs)
Jump to navigation Jump to search

Data Exploration/Cleanup

Employees.xlsx

By itself, the dataset provided was not enough to provide any insights, except for to identify the department and floor/zone of a particular employee. Issue: It is mentioned that an employee could possible hold multiple proxID as they would pick up additional prox card(s) if they have misplaced their old one. This is not reflected in the employees file provided.

Solution: Upon further investigation of the proxOut file, it can be seen that the proxID of an employee is based on the combination of the first character of an employee's First Name followed by his or her entire Last Name as well as a 3 number digit code that is in running order from 001 onwards.

An example would be Anda Ribera, where Anda is the first name and Ribera is the last name. Hence he would have the id Aribera001 and any subsequent id issued would be aribera002, aribera003 so on and so forth. As such, an additional column is added to the employees.xlsx file based on the proxOut file to see what are the various ids that each employee holds.

CaptureOfCleanedEmployee.JPG

Mobile and Fixed Proximity Data

Issue: The problem with the mobile and fixed proximity data is that ProxOut (Fixed) shows the floor and zone of each area whereas for the mobile ProxOut, it shows the X and Y coordinates in replacement of the zone. Hence without a common set of values as well as mappings, it is not possible to connect the fixed and mobile proximity sensor on the map to give us an indication of the location of the employee.

Solution: The solution is to find a way to be able to map out the areas on the proximity zones images provided and give a X and Y coordinate mapping which can be then added to the mobile Proxout data file. I referred to Gwendoline's work where she used this online tool, from Bryant Howell to draw out custom polygons on the map provided and to get the centre point X and Y coordinates of these zones.

The online tools is available at this link: https://github.com/bryantbhowell/tableau-map-pack

Following Gwendoline's idea, i cropped out the map images provided so as to remove white spaces and to show only the map area only as well. Other points to note when plotting the polygons are:
1) For zones that encompasses a large area of the map such as Zone 1 of Floor 1 which refers to all the corridors and white spaces in that area, I only plotted the polygon of the corridor.
2) For zones that includes multiple areas such as Zone 4 of Floor 1, only one area is drawn, as it refers to all the staircases and stairwells and it makes sense to just map the X and Y coordinate to one staircase as it is assumed that it is not important to track the exact staircase location for each employee who are at it.

With the polygon mappings of the cropped image, i then proceeded to calculate the center point of the coordinate using Excel. What i did was to use the formula: "=MEDIAN(MIN(C$2:C$5),MAX(C$2:C$5))" to find the exact center X and Y value for each polygon. Calculation of Center Point.JPGOffCenteredMidPointsFloor2.JPG

This was repeated for floor 2 and 3. However, due to floor 2 and 3 having irregular shapes, some of the mid points were off and had to be manually changed. OffCenteredMidPointsFloor2.JPGCorrectedMidPointFloor2.JPG

The mid point mappings for each floor and zone is then combined as seen.
Midpoints Mapping.JPG

  • NOTE* I realized i made a mistake when initially counting the midpoints as i've sent the range of X and Y to 3085 and 1019, without rescaling it to the required dimensions of 189 (X) and 111 (Y). I have since resize the map images and redone/readjust the plotting and calculation of the mid points according.

With this mapping we then proceed to combine the Proximity Data together. As the mobile proximity data already has the x and y coordinates mapped, only the fixed proximity sensor will be mapped, using a V-lookup function. Prior to that function, a column called identifier which combines the floor and zone is created so that the correct x and y coordinates can be determined from the mapping table. This is then combined with the mobile proxOut data.

MappingFixProxData.JPGCombinedProxData.JPG

Building Data

For the building data that was provided, i split it into various segments, namely: 1. The building with all its unique attributes alone. 2. By floor when it contains all the attributes of a building floor without attributes from zones. 3. Lastly, all the elements which contains both floor and zone attributes are grouped together.

After splitting the data up, the data were reshaped for easier processing in Tableau using the Tableau plug in.

Hazium Concentrations

For the hazium dataset, it was processed similarly to that of the building data. It is slightly different in the sense that instead of splitting them up, i proceeded to join all the datasets together and perform a data reshaping after that.

Data Import

The data sets used are as follows: 1.BuildingCleaned - This is the cleaned up data file containing all attributes and data pertaining only to the building.
2.BuildingReshaped - This is the reshaped data file based on BuildingCleaned for easier processing in Tableau.
3.combinedProxData - This is the cleaned up combined proximity data which also contains the x and y coordinate mapping for the fixed proximity sensor.
4.EmployeeCleaned - This is the list of employees with the id added in so that they can be used in conjunection with the promixity data.
5.FloorCleanReshaped - This is the cleaned up and reshaped data file containing all attributes and data pertaining only to the floor of the building.
6.FloorZoneCleanedReshaped - This is the cleaned up and reshaped data file containing all attributes and data pertaining only to the floor by zones of the building.
7.HaziumCleanedReshaped - This is the cleaned up and reshaped data file containing the Hazium data. It has the combined data of all the hazium floor data.

I encountered a difficulty in which instead of importing each of the data set used as a new data source, i imported only 2 data sources, the first one being the file that combines the fixed and mobile proximity sensor data, and adding a new connection and doing a left join to the employee list so as to map the sensor data with the employees.

The second one was to use the building data as a primary data source and adding all the other data such as floor, floorzone and the hazium concentration excel as a data connection. This was a HUGE mistake as it resulted in me being unable to publish the workbook as in the end all the data had to first be extracted, and due to the data connection joining, it still could not be extracted after more than 15 hours as shown:
TakingTooLong2.JPG


In the end the faster way of importing the data set would be to add each excel/csv file as a new data source in Tableau, and if a relationship needs to be established, then to go to "Edit Relationships" to link them if required. The combined promixity sensor data is left untouched as it is required to do a joining with the employee list in order to successfully track and plot their movement later.

Tableau Link

The tableau workbook is published here: https://public.tableau.com/profile/bong.jun.hao#!/vizhome/BongJunHaoAssignment3/Visualization

Task 1

This is based on the proximity data for by combining both mobile and fixed promixity sensors data and linking it to the employeeId and proximityId to see if there is any patterns within departments or movement patterns.

The typical patterns in the proximity card data set is that: 1. From 12am - 6am, the only data collected by the fixed sensor is from the level 1 fixed proximity sensor, indicating that is it quite likely that no employees are moving around level 2 and 3 between this period of time. 2. Outside of the time frame between 7am-5pm, there is no movement of any staff from the security department.

A typical day of an employee of GASTech would be:
1. Employees reach the office from 7am onwards.
2. By 9am most employees are already in their office.
3. Lunch time is about 12 noon - 1 where we see quite a bit of people at the Deli after which there is movement around the offices.
4. Employees tend to end work/knock off at 5pm with a few exceptions who end work from 4pm onwards.
5. Employees from the IT and Engineering department tend to stay and work till much later than the others, going home at about 11-12 midnight.

Task 2

This is with regards to the building data, where i split it into data concerning only the building itself, data that involves the various floors of the building and lastly data in connection to each respective zone and floor of the building.

1. The water heater temperature set point is at a consistent level of 60 degrees, and water temperature entering the heater(inlet) tend to decrease after 5am and increase after 12 noon, with a consistent inlet mass flow rate.
2. There was a noted peak in the total electric power demand on the weekend over June 10th - 13th. This is peculiar as there should be lesser demand for energy due to lesser staff being in the office instead of experiencing a sudden spike in demands.
3. The level of carbon dioxide concentration requires attention at level 1, at the zone 5 and 7 level. It is considered acceptable for the other levels.
4. The supply fan consume higher levels of power between 7am to 10pm, the outlet mass flow rate does not fluctuate as the day pass. This is in line with the new building being energy efficient by consuming higher levels of power when there is more people during the working hours.
5. It seems that the HVAC fan demand for power increases or is higher during the weekends rather than during the normal working week.
6. The deliFan consumes least power between 6-7am and 5-9pm. It also seems to consumer quite a bit of power on Sunday.
7. The bathroom exhaust fan is consistently consuming power one level 1 and 2 throughout the day. On level 3, the fan consumes power between 7am to 10pm, but this is peculiar as most employees should have left level 3 after 7pm.

Task 3

The notable anomalies/unusual events are as follows:
1.The fan supply uses high power at 4am, as well as over the weekend on floor 3. This is abnormal as no one or very little people would be working there over the weekend.
2.The hazium gas concentration was abnormally high for all levels of the building on 11 June from 1pm till late. Even though the effects are not known, this makes it a possible health issue for employees.
3.The carbon dioxide concentration level spiked dangerously high on the 5th and 6th of Jun, reaching above 1800ppm.
4. On the 7th and 8th of June the supply fans consumed lower than normal levels of power between 7am - 10pm, resulting in a drop in the supply outlet mass flow rate, and an increase in temperature.The supply fan then consumed higher than average levels of power of 10pm.
5. There is a high supply fan outlet mass flow rate on 11 and 12 June for floors. Coincentally the HVAC system fan also consumed high amounts of fan power on both days. This pattern continues until the early morning of 13 June.
6. From June 2 onwards. the CEO's office had high temperatures of 32 to 40 degrees from 1pm till 4am the next morning.
7. Following the high temperature in the CEO office, on June 3 the CEO office had the highest concentration level of Hazium gas for the whole day.
8. The Hazium gas concentration spiked again at the CEO office on the 9th and 11th June, and these spikes were rather sudden.

Task 4

There is cause of suspicion that the high levels of Hazium gases does not occur naturally. and it seems that the Hazium spikes are targeted at the CEO, which raises alarm and that it could potentially be one of the employees wanting to cause harm to the CEO.

Knowing the time where the CEO's office had large spikes in the Hazium concentration level gives us an indication of where or who to look at, based on the prox sensors data. Knowing that the CEO office experienced the spike on the 3rd of June, we can deduce that anyone wanting to plant or release the gas would be gone before 7am (Height of spike), and the person should be present just before the Hazium levels start to increase. (i.e from 4pm onwards on 11 June)
Secondly, the attacker would not want to be present when the gas is released to avoid harming himself. Hence we are looking for an employee who:
1. Is present at level 3 before the increase in Hazium levels (I.e 2nd June, 4pm onwards)
2. Away throughout the period from the spike (i.e 7am on 3rd June) in Hazium

Suspect Reduction

Using the two criteria as mentioned above, I decided to reduce the pool of employees possible by using excel and filtering out employees who were present at Floor 3 on 9th June as well as on the 3rd of June based on the spikes shown. Employees who were present are assumed not to be the suspect as they would not want to be poisoning themselves, reducing the number of suspects to 97 in total out of a initial 125.


It is with this in mind that more investigation would have to be carried out to determine if the spike in Hazium gas is indeed man made and cause by perhaps a disgruntled employee of GasTech. PresentEmployees9June.JPGSuspectCount.JPG

References

https://tableauandbehold.com/2015/04/13/creating-custom-polygons-on-a-background-image
http://www.tableau.com/learn/tutorials/on-demand/polygon-maps-8.2

Comments