ANLY482 AY2017-18T2 Group03 Data Analysis Old
Methodology
This section explains the methodology our team will use to analyse the data provided by our sponsor.
We will use JMP Software and Microsoft Excel for Exploratory Data Analysis (EDA) to better understand the dataset and its characteristics. As part of data preprocessing, our team will perform the steps below to obtain a clean dataset. These steps will eventually be converted into a script that cleans the data uploaded to the dashboard we will develop for our sponsor.
Data Preprocessing
With every new dataset, we first clean the data to remove records that should not be included in our analysis. The data cleaning steps include:
- Handling missing values. If a record has any missing values, the entire row will be excluded, because including an incomplete record would make the analysis inaccurate.
- Handling duplicate data. Duplicates can occur when employees scan a barcode twice during the inbound of goods. As with missing values, the duplicated row will be removed.
- Resolving redundancies caused by data integration.
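The first two cleaning steps can be sketched as follows. This is only a minimal illustration in Python using the standard library, not our final script. The column names (`barcode`, `product`, `scanned_at`) and the sample rows are hypothetical, since the actual report layout is not described here. The sketch also assumes that handling a duplicate means keeping the first copy of a repeated row and dropping the later ones.

```python
import csv
import io


def clean_rows(csv_text):
    """Drop rows with any missing value, then drop repeated rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    seen = set()
    cleaned = []
    for fields in reader:
        values = tuple(field.strip() for field in fields)
        # Missing values: exclude the whole row if a field is absent or empty.
        if len(values) != len(header) or "" in values:
            continue
        # Duplicates (e.g. a double barcode scan): keep the first copy only.
        if values in seen:
            continue
        seen.add(values)
        cleaned.append(dict(zip(header, values)))
    return cleaned


# Hypothetical sample; the real report columns are an assumption.
sample = """barcode,product,scanned_at
A001,Widget,2018-01-05 09:15
A001,Widget,2018-01-05 09:15
A002,,2018-01-05 09:20
A003,Gadget,2018-01-05 10:02
"""

rows = clean_rows(sample)
print([row["barcode"] for row in rows])  # ['A001', 'A003']
```

Here the second `A001` row is dropped as a duplicate and the `A002` row is dropped because its product field is empty.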
With the clean dataset, we will explore the data further to identify the visualizations and analyses that will make the dashboard more in-depth and more useful for our sponsor.
Visualizations
The final product of our project is an Operations Dashboard that visualizes the following KPIs:
- Operations Productivity Chart
- Product Ranking Chart
- Product Seasonality Chart
- Actual Overtime Hours Chart
- Overtime Performance Analysis Chart
At the start of each day, the Operations Manager will upload 3 CSV files for the previous day: the Handling In Report, the Handling Out Report and the Overtime Hours Report. Once these files are uploaded, the data cleaning script will run and the relevant data will be stored in the database that feeds the visualizations on the dashboard.
The visualizations to be used on our dashboard include a Bar Chart, a Treemap and Time Series Line Charts. The Operations Manager will also be able to select the time period to view.
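As a rough illustration of the storage step, the sketch below inserts rows that have already been cleaned into a database table. It assumes SQLite (via Python's built-in `sqlite3` module) purely for demonstration, as the actual database has not been specified. The table name `handling_in` and the column names are hypothetical.

```python
import sqlite3


def store_rows(connection, table, rows):
    """Insert cleaned rows (dicts sharing the same keys) into a table."""
    if not rows:
        return 0
    columns = list(rows[0])
    # Table and column names come from our own code, never from user input.
    column_list = ", ".join(f'"{name}"' for name in columns)
    placeholders = ", ".join("?" for _ in columns)
    connection.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({column_list})')
    connection.executemany(
        f'INSERT INTO "{table}" ({column_list}) VALUES ({placeholders})',
        [tuple(row[name] for name in columns) for row in rows],
    )
    connection.commit()
    return len(rows)


# Hypothetical cleaned rows from one uploaded report.
cleaned = [
    {"barcode": "A001", "product": "Widget"},
    {"barcode": "A003", "product": "Gadget"},
]

connection = sqlite3.connect(":memory:")
inserted = store_rows(connection, "handling_in", cleaned)
count = connection.execute('SELECT COUNT(*) FROM "handling_in"').fetchone()[0]
print(inserted, count)
```

Values are passed through `?` placeholders so the uploaded data is never interpolated into the SQL text.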
Bar Chart
Operations Productivity Chart
Using a bar chart, we can visualize the operators' productivity and performance level during each hour of the day to see whether there is a general trend throughout the day, and whether productivity levels are in line with the break timings allocated to the operators. From here we can identify which times of the day could be better used to increase overall productivity, and whether there is an unexpectedly unproductive period in the day.
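The data behind such a bar chart could be prepared along the lines of the sketch below. It assumes, for illustration only, that productivity is measured as the number of scans recorded in each hour and that timestamps follow the `YYYY-MM-DD HH:MM` format. Neither assumption is confirmed by the reports themselves.

```python
from collections import Counter
from datetime import datetime


def scans_per_hour(timestamps):
    """Count records per hour of the day (0-23), including empty hours."""
    hours = Counter(
        datetime.strptime(stamp, "%Y-%m-%d %H:%M").hour for stamp in timestamps
    )
    return {hour: hours.get(hour, 0) for hour in range(24)}


# Hypothetical scan timestamps.
stamps = ["2018-01-05 09:15", "2018-01-05 09:40", "2018-01-05 13:05"]
hourly = scans_per_hour(stamps)
print(hourly[9], hourly[12], hourly[13])  # 2 0 1
```

Hours with no activity are kept as zeros so that an unproductive period shows up as a visible gap in the chart.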
Treemap
Product Ranking Chart
A treemap can be used to identify which product is the best-selling item and which is the worst-selling item. Products are ranked by the size of their sales relative to one another, so each product's rectangle is proportional to its share of total sales. As seen from the example below, we can identify the best-selling product by the proportion of its sales.
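The proportions that size each treemap rectangle can be computed as in this minimal sketch. The product names and quantities are made up, and the use of outbound quantity as the measure of sales is an assumption.

```python
from collections import defaultdict


def sales_share(records):
    """Return (product, share of total sales) pairs, largest share first."""
    totals = defaultdict(float)
    for product, quantity in records:
        totals[product] += quantity
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(product, total / grand_total) for product, total in ranked]


# Hypothetical (product, quantity) records.
records = [("Widget", 40), ("Gadget", 30), ("Widget", 20), ("Gizmo", 10)]
shares = sales_share(records)
print([product for product, _ in shares])  # ['Widget', 'Gadget', 'Gizmo']
```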
Time Series Line Chart
Product Seasonality Chart
After identifying the ranking of the products by their sales volume through a Treemap, we can drill down into the seasonality of an individual product through a Time Series Line Chart, where we can observe the sales of the product over the months or years.
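A monthly series for one product could be derived as sketched below. The record layout (ISO date string, product name, quantity) is an assumption made for illustration.

```python
from collections import defaultdict


def monthly_sales(records, product):
    """Sum the quantity of one product per calendar month ("YYYY-MM")."""
    totals = defaultdict(int)
    for date, name, quantity in records:
        if name == product:
            totals[date[:7]] += quantity  # first 7 characters are "YYYY-MM"
    return sorted(totals.items())


# Hypothetical (date, product, quantity) records.
history = [
    ("2018-01-05", "Widget", 10),
    ("2018-01-20", "Widget", 5),
    ("2018-02-03", "Widget", 7),
    ("2018-02-03", "Gadget", 4),
]
series = monthly_sales(history, "Widget")
print(series)  # [('2018-01', 15), ('2018-02', 7)]
```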
Actual Overtime Hours Chart
The Actual Overtime Hours Chart is a dual-axis chart made up of 2 lines, one for the number of overtime hours and one for the number of containers.
The purpose of this chart is to show the overtime hours for the month in a timeline view, so that managers can see the overtime performance of the operations in relation to the number of additional containers that need to be serviced each day.
Overtime Performance Analysis Chart
With this chart, we can tell from historical data how many hours it takes to handle the containers that exceed the threshold limit of a daily operation.
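One simple way to summarise this relationship is the average number of overtime hours per container above the threshold, as sketched below. The threshold of 10 containers and the daily figures are hypothetical, and a plain average is our own simplification rather than a confirmed method for this chart.

```python
def overtime_rate(history, threshold):
    """Average overtime hours per container above the daily threshold."""
    excess_total = 0
    hours_total = 0.0
    for containers, overtime_hours in history:
        excess = containers - threshold
        # Only days that exceed the threshold contribute to the rate.
        if excess > 0:
            excess_total += excess
            hours_total += overtime_hours
    return hours_total / excess_total if excess_total else 0.0


# Hypothetical (containers handled, overtime hours) per day.
daily = [(10, 0.0), (12, 3.0), (14, 6.0)]
rate = overtime_rate(daily, threshold=10)
print(rate)  # 1.5
```

With these figures, 6 containers above the threshold took 9 overtime hours in total, giving 1.5 hours per additional container.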