ANLY482 AY2017-18T2 Group11 Project Overview Old

From Analytics Practicum


Motivation


Technology has opened the floodgates for users worldwide to generate and transmit data, providing millions with unprecedented opportunities and benefits. This has given rise to the field of data analytics, which has become an important tool for companies seeking to improve the efficiency of their operations. However, some companies still lack data analysis capabilities and often find it time-consuming to visualise, modify and interrogate data, especially when the data comes from more than one source, such as machinery sensors, mobile devices, wearables and weblogs.

While big data presents opportunities for many companies, capitalising on it requires a certain level of technical skill. The lack of in-house technical capability to derive operational solutions from data is a problem that resonates strongly with the sponsor company, as it does with many others. As the company slowly gains a foothold in the industry, it is imperative that it analyses its data to find ways to counter its high operational costs.


Data Provided


The data is drawn from Company ABC’s database server, which is updated whenever parcels are received and delivered. It comprises two years of delivery data for the Central and Western parts of Singapore, amounting to approximately 750,000 rows. Each record contains details such as the delivery address, quantity and date.


Project Objective & Goal


Based on the problem statement our sponsor has given us, we have derived three main objectives for this project:

  1. Identify other possible ways to minimise operational costs for the company
  2. Identify the optimal number of drivers that Company ABC would require
  3. Minimise failed deliveries by identifying erroneous forms before goods are dispatched

The objectives and problems listed can be summarised as follows: T.W.O Objective.JPG


Methodology


To provide operational recommendations from the given dataset, we will thoroughly examine the dataset via the following four-step approach:

1. Data Exploration

As the dataset is provided in Excel format, little data preparation is required. The team will then use methods such as summary statistics to determine whether the dataset contains any inconsistencies, missing values or invalid values.
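A minimal sketch of this exploration step in pandas, using a hypothetical sample in place of the Excel export (the real column names from Company ABC's database may differ):

```python
import pandas as pd

# Hypothetical stand-in for the Excel export; column names are assumptions.
df = pd.DataFrame({
    "delivery_address": ["Blk 1 Clementi", None, "Blk 3 Jurong"],
    "quantity": [2, 5, -1],          # -1 stands in for an invalid value
    "delivery_date": ["2017-01-05", "2017-01-06", "2017-01-06"],
})

print(df.describe(include="all"))        # summary statistics per column
print(df.isna().sum())                   # missing values per column
print(int((df["quantity"] < 0).sum()))   # count of invalid quantities
```

Running these three checks on each column gives a quick inventory of the inconsistencies the cleaning step must address.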

2. Data Cleaning

Errors such as outliers and invalid values can lead to inaccurate results, so the data must be cleaned to ensure it is suitable for further analysis. Based on the dataset, the two most probable types of error are inconsistent data and missing or invalid values.
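A sketch of how these two error types could be handled, assuming a hypothetical `quantity` column and the common 1.5 × IQR rule for outliers:

```python
import pandas as pd

# Hypothetical column with a missing value, an invalid value and an outlier.
df = pd.DataFrame({"quantity": [2, 5, -1, 400, 3, None]})

# Drop rows with missing or non-positive quantities (invalid values).
clean = df.dropna(subset=["quantity"])
clean = clean[clean["quantity"] > 0]

# Remove extreme outliers using the 1.5 * IQR rule.
q1, q3 = clean["quantity"].quantile([0.25, 0.75])
iqr = q3 - q1
clean = clean[(clean["quantity"] >= q1 - 1.5 * iqr) &
              (clean["quantity"] <= q3 + 1.5 * iqr)]
```

In practice the team may prefer to flag rather than delete outliers, since an unusually large parcel count can be genuine.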

3. Data Analysis

After cleaning the relevant data, an in-depth analysis will be performed to gain meaningful insights. Based on preliminary discussion, we will apply the following four analytical methods:

a) Time Series Analysis – As the data contains time-series variables, the team will perform time series analysis to identify trends, such as those in daily delivery volume. This will help the team forecast the optimal number of drivers required for future deliveries.
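A simple sketch of this idea: smooth a daily delivery series with a moving average and convert the smoothed demand into a driver estimate. The series, the 7-day window and the 40-parcels-per-driver capacity are all illustrative assumptions, not figures from the sponsor:

```python
import pandas as pd

# Hypothetical daily delivery counts; the real series would come from
# aggregating the two years of delivery records by date.
idx = pd.date_range("2018-01-01", periods=14, freq="D")
deliveries = pd.Series([90, 95, 100, 98, 110, 130, 125,
                        92, 97, 103, 99, 112, 135, 128], index=idx)

# A 7-day moving average smooths out the weekly cycle; its last value is
# a naive one-step-ahead forecast of daily demand.
trend = deliveries.rolling(window=7).mean()
forecast = trend.iloc[-1]

# Assume (illustratively) one driver handles ~40 parcels a day.
drivers_needed = -(-forecast // 40)   # ceiling division
```

A production model would replace the moving average with a proper forecasting method (e.g. exponential smoothing), but the demand-to-drivers translation stays the same.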

b) Frequency Distribution & Maximum Likelihood – During data cleaning, the causes of invalid values will be recorded. A frequency distribution will be used to determine how often each cause occurs, after which maximum likelihood estimation will identify the reasons that contribute most significantly to the invalid values. These factors will be analysed in greater depth, as they are the primary reasons for failed delivery attempts.
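Under a multinomial model of failure causes, the maximum-likelihood estimate of each cause's probability is simply its relative frequency, so both steps reduce to frequency counts. A sketch with hypothetical failure reasons:

```python
import pandas as pd

# Hypothetical failure reasons recorded during data cleaning.
reasons = pd.Series(["wrong address", "recipient absent", "wrong address",
                     "damaged parcel", "wrong address", "recipient absent"])

freq = reasons.value_counts()                      # frequency distribution
# MLE of each cause's probability under a multinomial model
# = observed relative frequency.
mle_probs = reasons.value_counts(normalize=True)
top_reason = mle_probs.idxmax()                    # most significant cause
```

The causes with the largest estimated probabilities are the ones to investigate first when trying to reduce failed deliveries.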

c) Cluster Analysis – Using variables such as delivery area, number of parcels and parcel size, the team will profile the company's consumer segments and analyse how they change over time. With this information, the team can better estimate the number of drivers required for each region, helping the company optimise driver allocation by area and potentially reduce operational costs.
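A minimal clustering sketch using k-means from scikit-learn on two assumed features, daily parcel count and parcel weight; the feature values are invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical per-customer features: [parcels per day, parcel weight (kg)].
X = np.array([[5, 1.0], [6, 1.2], [5, 0.9],      # light, frequent senders
              [40, 8.0], [42, 7.5], [38, 9.0]])  # heavy, bulk senders

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_   # segment assignment for each customer
```

With more features (e.g. delivery area encoded numerically), the same fit yields segment profiles per region that feed the driver estimate.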

d) Correlation & Regression – Correlation analysis will be performed to identify relationships between the explanatory variables. The team will also regress outcome variables on the explanatory variables. This will allow the team to quantify the relationships between variables and draw conclusions about operational costs, for example how variables such as parcel quantity and weight affect operating cost.
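A sketch of both steps with NumPy, on invented numbers: the correlation between two explanatory variables (quantity and weight), followed by a simple least-squares regression of cost on quantity:

```python
import numpy as np

# Hypothetical records: parcel quantity, weight (kg) and operating cost ($).
quantity = np.array([1, 2, 3, 4, 5, 6])
weight   = np.array([0.5, 1.1, 1.4, 2.2, 2.4, 3.1])
cost     = np.array([4.0, 6.1, 8.2, 10.0, 12.3, 14.1])

# Correlation between explanatory variables; a high value warns of
# multicollinearity if both are used in the same regression.
r = np.corrcoef(quantity, weight)[0, 1]

# Simple least-squares fit: cost ≈ slope * quantity + intercept.
slope, intercept = np.polyfit(quantity, cost, 1)
```

Here the slope is the estimated marginal cost per additional parcel, which is exactly the kind of figure the operational-cost objective needs.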

4. Data Visualisations

Lastly, we will create a dashboard whose visualisations will help the team present its recommendations. Some of the visualisations to be produced are:

a) Spider Chart – To visualise the reasons that contribute to data inconsistency

b) Time Series Line Chart – To forecast the optimal number of drivers needed in the future based on past data

c) Heatmap – To identify the number of drivers needed at various locations
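The heatmap's underlying data is just a region-by-time matrix of delivery volume. A sketch of building that matrix with a pandas pivot table, on invented regions and counts (the rendering itself would be done in the dashboard tool):

```python
import pandas as pd

# Hypothetical deliveries by region and hour of day.
df = pd.DataFrame({
    "region":  ["Clementi", "Clementi", "Jurong", "Jurong"],
    "hour":    [9, 14, 9, 14],
    "parcels": [120, 80, 200, 150],
})

# Rows = regions, columns = hours, cells = parcel volume: this matrix
# is exactly what the heatmap colours encode.
heat = df.pivot_table(index="region", columns="hour", values="parcels")
```

Dividing each cell by an assumed per-driver capacity would turn the same matrix into the drivers-needed heatmap described above.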


Technology Used


T.W.O Toolused.png