IS480 Team wiki: 2016T1 IPMAN Final Wiki
- 1 Project Progress Summary
- 2 Project Management
- 3 Quality of Product
- 4 Reflection
Project Progress Summary
Note: We have deployed live on Hook Coffee's existing application. However, as Team IPMAN has signed an NDA with Hook Coffee, the above link to the deployed site is only available to existing administrators in Hook Coffee. To allow others to view the application, we have created a staging server; its credentials are available in the deployment section below.
- Implemented new chart technology such as D3.js to display customer segmentation in Sprint 11
- Implemented the coffee demand prediction using Stochastic Knapsack Model in Sprint 11
- Utilised KMeans Clustering to obtain customer segments that would be added into mailing lists on Mailchimp automatically in Sprint 11
- Implemented access rights control using Role-Based Access Control in Sprint 12
- Conducted our User Testing 3 with 10 business owners to validate the functions we have implemented for sponsors
As seen below, Team IPMAN has delivered 100% of the agreed-on project scope to our sponsor.
Project Schedule (Plan vs. Actual)
No major changes were made to the project schedule since IPMAN's Mid-Term presentation; the three adjustments are described below. Firstly, the team shifted User Testing 3 from 18 October 2016 to 3 November 2016. We made this decision to ensure that all functions of our platform would be covered and tested, so the test was postponed from Sprint 11 to Sprint 12, by which time all functions would have been completed. Next, Team IPMAN added a new function in Sprint 13, the integration of the gifts tab, because it was a new feature that the freelance developers had built on the old application. So that sponsors could use the new application that Team IPMAN has built, Team IPMAN ported the freelance developers' code onto the new application as soon as it was built. Lastly, the team added the project handover procedure and its implementation as a new task in Sprint 14 to ensure that there would be a proper handover timeline that the team would adhere to closely. This will allow all documentation to be properly handed over to the sponsors and freelance developers. We believe proper documentation and instructions are vital for our sponsors and their freelance developers, both now and in the future.
Planned Project Schedule
The planned project schedule shows Team IPMAN's timeline as of the Mid-Term presentation:
Actual Project Schedule
The actual project schedule reflects the changes made to Team IPMAN's timeline between Mid-Term presentation and Final presentation:
Team IPMAN has applied the following 4 metrics to enable greater efficiency and effectiveness in project planning. No new project metrics were introduced between the Mid-Term presentation and the Final presentation.
Below are the graphical representations of the various metrics that IPMAN has applied:
1. Sprint Velocity
Sprint velocity is a key metric that measures the amount of work Team IPMAN has completed during each sprint. Velocity is calculated at the end of each sprint by summing the total points of completed user stories. Only stories that are completed at the end of the iteration are counted. As seen from the Sprint Velocity chart above, from mid-terms onwards, the team has completed all planned story points.
2. Scrum Burndown Chart
The Scrum Burndown Chart is a visual measurement tool to display the amount of work completed each day by Team IPMAN against the ideal projected rate of completion for the current sprint.
Ideal: Total planned story points to be completed over number of days in a sprint
Actual: Actual story points remaining after completion of story points each day in a sprint
The team would like to highlight several key points for the sprints after the Mid-Term presentation, as shown below:
Team IPMAN completed all the planned story points before the end of the sprint. This shows that Team IPMAN has been consistent in the implementation and delivered the product within the desired schedule.
3. Bug Metrics
4. Mean Time to Recover (MTTR)
MTTR = (Hours spent on analysing the issue + Hours spent on implementing the changes) / Number of issues in the sprint
Based on the MTTR in Sprint 10, Team IPMAN spent 5.5 hours implementing changes based on the feedback given by our supervisor on the mid-term slides draft. Learning from this incident, Team IPMAN has discussed the final slides at the weekly internal meeting ever since the Mid-Term presentation. As a result, we took only 2 hours to implement the changes based on our supervisor's feedback on the final slides draft (S/N 15 below).
|S/N||Sprint||Issue / Change||Description||Suggested by||Hours spent on analysing the issue / brainstorming||Hours spent on implementing the changes||Total Hours to Recover|
|12||11||Sponsor requested for additional inputs in the dashboard summary page||Sponsors would like to view the numbers of customers who cancelled subscriptions and number of active customers on the dashboard||Sponsors||0.1||1.25||1.35|
|13||12||Sponsor requested for the additional feature to be integrated||Sponsor requested for the freelancer's codes (gift tab) implemented on the old application to be integrated with our new application||Sponsors||1||2||3|
|14||13||Sponsor informed team about the issue with the order processing (gifts tab)||After team integrated the gifts tab, there was an issue with the customer's gift history; gifts are showing up even though it had been sent previously||Sponsors||1||0.5||1.5|
|15||13||Make changes to the final slides draft based on supervisor's comments||Supervisor suggested providing examples when we mention the different models used, providing a one-page summary to show the value we have provided for sponsors, and showing how we used the metrics to mitigate issues in PM.||Supervisor||1||1||2|
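As a worked example of the MTTR formula, the sketch below applies it to the four rows listed in the table above (S/N 12 to 15). Earlier rows are not reproduced on this page, so the per-sprint values are only illustrative of the calculation and may not match the team's full chart; the function name `mttr_by_sprint` is hypothetical.

```python
from collections import defaultdict

# (sprint, hours analysing, hours implementing) for S/N 12-15 in the table above
issues = [
    (11, 0.1, 1.25),
    (12, 1.0, 2.0),
    (13, 1.0, 0.5),
    (13, 1.0, 1.0),
]


def mttr_by_sprint(rows):
    """Return {sprint: (analysis hours + implementation hours) / number of issues}."""
    hours = defaultdict(float)
    counts = defaultdict(int)
    for sprint, analyse, implement in rows:
        hours[sprint] += analyse + implement
        counts[sprint] += 1
    return {sprint: hours[sprint] / counts[sprint] for sprint in hours}


mttr = mttr_by_sprint(issues)
for sprint in sorted(mttr):
    print(f"Sprint {sprint}: MTTR = {mttr[sprint]:.2f} h")
```

Sprint 13 has two issues, so its total recovery time of 3.5 hours is averaged over both.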
From Mid-Terms until the Final Presentation, our main concerns were 1) Technical Risk and 2) Stakeholder Management Risk, as described below:
|S/N||Risk Type||Risk Event||Likelihood||Impact Level||Category||Strategy Adopted||Actions|
|11||Technical Risk||Team IPMAN risked having to reformulate the analysis models because of sparse data sets.||High||High||A||Mitigate||Team IPMAN consulted one of our Professors on the feasibility of the models created and also did extensive research before implementing the analysis modules. Subsequently, the team created a quick mock-up for the prototype so that changes could be made easily.|
|12||Stakeholder Management Risk||The freelance developers created a new feature on the old dashboard. Sponsors were inclined to use the old dashboard if the code was not ported over to the new application.||High||High||A||Mitigate||Team IPMAN ported the freelance developers' code onto the new application as soon as it was built. Team IPMAN will schedule a handover ceremony to present all the documentation we have created to the freelance developers and answer any questions from them, so as to ensure the continuity of the application.|
The diagram below illustrates our project handover procedure to our sponsors and freelance developers.
We also foresee a potential integration risk that could occur after the handover of our project, and have therefore planned our mitigation in advance:
|S/N||Risk Type||Risk Event||Likelihood||Impact Level||Category||Mitigation|
|13||Integration Risk||The new application that we have built is using the database models. If the database models change (such as when the internationalization functionality was built by the freelancers), these changes would impact our system.||Low||High||B||Team IPMAN took the initiative to understand what changes have been made and whether the changes are necessary by discussing with the freelancers early on. We also work to communicate with freelancers to modify models in a way such that any impact is minimized, e.g. adding fields instead of changing fields in the model.|
Note: Comparison Using Hashes and Fortnightly Live Deployment were documented prior to our Mid-Term presentation and are accessible via our Mid-Term Wiki.
Monte Carlo Stochastic Knapsack and Understanding Customer Segments with K-Means Clustering [Work in Progress] are Team IPMAN's new technical complexities after mid-terms.
1. Deciding how much budget for each marketing component with Monte Carlo Stochastic Knapsack Problem
(Reference: Morton, D. P., & Wood, R. K. (1997). On a Stochastic Knapsack Problem and Generalizations. In ResearchGate (pp. 149–168). https://doi.org/10.1007/978-1-4757-2807-1_5)
Right now, HookCoffee has a range of marketing activities (Google AdWords, Facebook campaigns, introducing coffees, writing blog articles, sending email campaigns and doing roadshows), but they do not know how much to spend on each component, and they have no way of keeping track of how well their campaigns have done vis-à-vis their Key Performance Indicators, namely new signups, active customers, churn and, most importantly, the number of orders for the month. In this analysis module, we hope to solve the problem of deciding how much to spend on each component so as to maximize their growth in orders.
Here, because the full marketing budget varies and is not known, we make certain assumptions so that the problem at hand can be modelled as a Knapsack problem with a fixed budget. We assume that their marketing budget grows in proportion with the weighted moving average percentage increase in number of orders, as below:
In layman’s terms, the budget for this month is equal to the budget for last month multiplied by the weighted average percentage change in coffee demand.
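The budget assumption can be sketched as follows. This is a minimal illustration and not the team's actual formula: the window length, the weights `(1, 2, 3)` (most recent month weighted highest), the helper names, and the reading of "multiplied by the percentage change" as a growth factor of `1 + change` are all assumptions.

```python
def pct_changes(orders):
    """Month-on-month fractional change in the number of orders."""
    return [(curr - prev) / prev for prev, curr in zip(orders, orders[1:])]


def next_budget(last_budget, orders, weights=(1, 2, 3)):
    """Scale last month's budget by the weighted moving average change in orders.

    `weights` are hypothetical; the last weight applies to the most recent change.
    """
    changes = pct_changes(orders)
    if len(changes) < len(weights):
        raise ValueError("not enough order history for the chosen window")
    recent = changes[-len(weights):]
    wma = sum(c * w for c, w in zip(recent, weights)) / sum(weights)
    return last_budget * (1 + wma)


# Hypothetical order history growing about 10% per month
orders_history = [100, 110, 121, 133.1]
budget = next_budget(1000.0, orders_history)
print(round(budget, 2))
```

Fixing the budget this way is what lets the allocation problem be treated as a knapsack with a known capacity for the month.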
The distribution of demand is very similar to that of stock prices, and can be said to be an Ito Process, following the Geometric Brownian Motion i.e.
This means that we can model demand at each time period under the lognormal distribution, as below (Reference: Magee, J. F., Copacino, W. C., & Rosenfield, D. B. (1985). Modern Logistics Management: Integrating Marketing, Manufacturing and Physical Distribution. John Wiley & Sons.):
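To illustrate what modelling demand as Geometric Brownian Motion implies, the sketch below simulates monthly demand paths with the standard exact update D_t = D_{t-1} · exp((μ − σ²/2) + σZ), where Z is standard normal, so each period's demand is lognormally distributed. The starting demand, drift `mu` and volatility `sigma` are hypothetical values and do not come from the team's parameter estimation.

```python
import math
import random


def simulate_demand(d0, mu, sigma, periods, rng):
    """Simulate one demand path under GBM with a time step of one period."""
    path = [d0]
    for _ in range(periods):
        z = rng.gauss(0.0, 1.0)
        growth = math.exp((mu - 0.5 * sigma ** 2) + sigma * z)
        path.append(path[-1] * growth)
    return path


rng = random.Random(42)  # seeded for repeatability
paths = [simulate_demand(1000.0, 0.03, 0.10, 12, rng) for _ in range(5000)]

# Under GBM the expected demand after T periods is d0 * exp(mu * T)
mean_final = sum(p[-1] for p in paths) / len(paths)
expected_final = 1000.0 * math.exp(0.03 * 12)
print(f"simulated mean: {mean_final:.1f}, theoretical mean: {expected_final:.1f}")
```

Sampling many such paths is the Monte Carlo step: each sampled demand scenario can then be fed to the allocation problem.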
Finding the Parameters
The Monte Carlo Stochastic Knapsack Problem
We reject the null hypothesis at the 10% level of significance.
There is sufficient evidence to show our MCSKP agent performs better than the random agent (p-value = 0.05679) over the 12-month period at the 10% level of significance.
Empirically, however, our agent has yet to be tested in the real world. We hope that it will perform similarly, but we have put preventive measures in place to allow our sponsor to validate whether the recommendations still make sense. We have provided a table below for this purpose: if they follow the recommendations exactly (i.e. with low deviation) and still do not hit the expected demand, our model may not be working as expected in practice, and they should look at the historical data to see whether any other conclusions can be drawn.
2. Understanding Customer Segments with Clustering
We wanted to deliver real business value, providing our sponsor with insights they didn’t know about their customers. We suggested doing automatic segmentation of customers, because we realized they had difficulty doing segmentation manually using external tools (like Intercom). Even if they manually found out about their segments, there was little data and they still had to process the data manually to find out if there’s anything that they can use to target that specific segment.
Understanding their Customers
We first tried to find which metrics we could use to cluster customers. From previous visualization of the data, we found that a large portion of their customer base was extremely loyal to them, and these are the customers who order more. We also understood from our sponsor that there was an issue with churn and with people exploiting the voucher codes, so we wanted to find out who these customers were as well.
We settled on the following segmentation variables:
The following are the profiling variables used, which are things that our sponsor can act on to target the segments involved:
Finding Number of Segments
We use two popular methods of clustering: hierarchical clustering and k-means clustering. We first use hierarchical clustering to find the number of clusters, k, which we then feed into the k-means clustering algorithm.
We first perform Principal Component Analysis to reduce the dimensionality of our segmentation variables to an n × 2 matrix (where n = 2207), and then perform hierarchical clustering using Ward's method, which works well for quantitative variables. This is described in scipy's documentation as follows:
The resulting dendrogram is as follows:
We truncate the dendrogram to get more readable results, as follows:
At first, we attempted to use the elbow method to find the clustering step where the acceleration of distance growth is the biggest (the "strongest elbow") of the blue line graph below, which is the highest value of the orange graph, as below:
Here, we get k = 2. However, we found that doing so would ignore a large group of high-value customers; the threshold is shown in grey and the lost segment in purple below:
We set our threshold at 200 and will be taking these 3 segments instead, and will feed k=3 into our k-means clustering.
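The two ways of choosing k described above can be sketched on a list of merge distances, such as the distance column of a hierarchical-clustering linkage matrix. The distances below are hypothetical and chosen only to reproduce the behaviour described (elbow gives k = 2, a threshold of 200 gives k = 3); the helper names are illustrative.

```python
def second_differences(values):
    """Discrete 'acceleration' of a sequence: v[i+2] - 2*v[i+1] + v[i]."""
    return [values[i + 2] - 2 * values[i + 1] + values[i]
            for i in range(len(values) - 2)]


def elbow_k(merge_distances):
    """Pick k at the strongest elbow, given merge distances in ascending order."""
    acceleration = second_differences(merge_distances)
    reversed_acc = acceleration[::-1]  # index 0 now refers to the final merge
    return reversed_acc.index(max(reversed_acc)) + 2


def clusters_at_threshold(merge_distances, threshold):
    """Cutting the dendrogram at `threshold` undoes every merge above it."""
    return sum(1 for d in merge_distances if d > threshold) + 1


# Hypothetical distances of the last six merges (ascending)
last_merges = [60.0, 75.0, 90.0, 150.0, 260.0, 900.0]

k_elbow = elbow_k(last_merges)
k_threshold = clusters_at_threshold(last_merges, 200)
print(k_elbow, k_threshold)
```

The elbow rule reacts only to the single largest jump, which is why a manually chosen threshold can recover a segment that the elbow would merge away.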
We use the scikit-learn library to run k-means. The k-means algorithm aims to choose centroids that minimise the inertia, or within-cluster sum-of-squares criterion:
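Inertia is the sum, over all points, of the squared distance from each point to its nearest centroid. The sketch below is a small pure-Python version of Lloyd's algorithm written only to make that objective concrete; it is not the scikit-learn implementation the team used. The data points are hypothetical, and the initial centroids are passed in explicitly (an assumption made for repeatability; library implementations choose them automatically).

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inertia(points, centroids):
    """Within-cluster sum of squares: each point counted against its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def kmeans(points, initial_centroids, iterations=50):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = list(initial_centroids)
    k = len(centroids)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        new_centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members))
            if members else centroids[j]  # keep the old centroid if a cluster empties
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


# Three hypothetical, well-separated groups in the 2-D PCA space
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
start = [(0, 0), (10, 10), (20, 0)]

centroids = kmeans(points, start)
print(f"inertia before: {inertia(points, start):.2f}, after: {inertia(points, centroids):.2f}")
```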
We set k=3, and the following table shows the profiles of the different clusters.
From the number of orders and ratio, we conclude the following:
Segment 1 are the "Surfers". They tend to jump from one new thing to the next, trying out new products for a while before moving on to the next hip thing.
Segment 2 are the "Benefit Seekers". They churn quickly after their first order, after using the voucher provided on signup.
Segment 3 are the "Committed Customers". They remain loyal to HookCoffee even after a long time, making as many as 14 orders, and are the biggest average spenders among the 3 segments. HookCoffee's job is to keep them committed and to move more customers into this segment.
We calculated the ratio for each variable of each segment and compared it against the population; we see that there is low similarity between the profiling attributes of the segments found, as shown below:
The closer the value is to 1, the less different it is from the population, and the less significant it is. The further it is from 1, for example one-offs for segment 3, the more significant and different it is from the population. From this, we can see that we can target segment 3 with new coffees that are one-off exclusive, for example.
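The ratio comparison can be sketched as below, assuming the ratio is the segment's mean for a profiling variable divided by the population's mean for the same variable (the exact definition used by the team is not shown on this page). The customer records and variable names are hypothetical.

```python
def segment_ratios(customers, segment, variables):
    """Ratio of the segment mean to the population mean for each variable."""
    members = [c for c in customers if c["segment"] == segment]
    ratios = {}
    for var in variables:
        segment_mean = sum(c[var] for c in members) / len(members)
        population_mean = sum(c[var] for c in customers) / len(customers)
        ratios[var] = segment_mean / population_mean
    return ratios


# Hypothetical customers with binary profiling variables
customers = [
    {"segment": 1, "one_off": 0, "uses_voucher": 0},
    {"segment": 1, "one_off": 1, "uses_voucher": 1},
    {"segment": 2, "one_off": 0, "uses_voucher": 1},
    {"segment": 2, "one_off": 0, "uses_voucher": 0},
    {"segment": 3, "one_off": 1, "uses_voucher": 1},
    {"segment": 3, "one_off": 1, "uses_voucher": 0},
]

ratios = segment_ratios(customers, 3, ["one_off", "uses_voucher"])
print(ratios)
```

In this toy data, segment 3 buys one-offs at twice the population rate (ratio 2.0), while its voucher use matches the population (ratio 1.0) and so carries no signal.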
Robustness of Clusters
We validate the robustness of our clusters by comparing our results against those of hierarchical clustering using Ward's method. We get the following overlap in the clusters:
We see that our method is fairly robust, especially for Segments 1 and 2.
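One simple way to quantify such an overlap is, for each k-means segment, the largest share of its members that land in a single hierarchical cluster. The sketch below assumes that measure (the team's exact overlap calculation is not shown on this page) and uses hypothetical labels.

```python
from collections import Counter


def best_overlap(labels_a, labels_b):
    """For each cluster in A, the largest fraction of it found in one cluster of B."""
    pair_counts = Counter(zip(labels_a, labels_b))
    sizes = Counter(labels_a)
    best = {}
    for (a, _b), count in pair_counts.items():
        best[a] = max(best.get(a, 0), count)
    return {a: best[a] / sizes[a] for a in sizes}


# Hypothetical cluster labels for the same ten customers
kmeans_labels = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
ward_labels = ["a", "a", "a", "a", "b", "b", "b", "c", "c", "b"]

overlap = best_overlap(kmeans_labels, ward_labels)
print(overlap)
```

Values near 1 mean both methods agree on that segment; a lower value flags a segment whose membership depends on the clustering method.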
Quality of Product
Note: Contents for Software Architecture Redesign can be found in our Mid-Term Wiki. The Use of Adapter Pattern and Capturing Mailchimp statistics illustrated here has been refined after mid-terms to ensure clearer and better representation.
1. Use of Adapter Pattern
In order to communicate with Mailchimp services from our web application, we have to make calls to Mailchimp’s RESTful API services. However, we do not want to call these APIs directly from our controller, as it would mean that API calls to fetch similar data would be repeated.
How do we ensure that our main application can reuse the code that calls these RESTful services, as well as protect our main application from changes to Mailchimp’s API?
We solve this by using the Adapter Pattern. The high-level overview is shown below:
Python organises code into modules rather than requiring classes, and each file is a module. This is analogous to utility classes in Java, and a module diagram showing how it is represented as an Adapter is provided below.
Using the above as an example, we delegate the responsibility of calling MailChimp’s RESTful API to the module mailchimp_api.py. In doing so, we can centralize all of the Mailchimp API related code. If Mailchimp decides to change their API interface (e.g. add a new field), only one module will be affected (Adapter – mailchimp_api.py), and we would only have to make changes there. Below is an example of the code from mailchimp_api.py module.
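The original listing is not reproduced on this page, so the following is only a minimal sketch of what such an adapter module might look like. The function names, the placeholder configuration and the returned dictionary shape are hypothetical. The `/reports/{campaign_id}` path and the `emails_sent` and `opens.unique_opens` fields reflect my reading of Mailchimp's API v3.0 and should be checked against its reference. The transport is injectable so that callers (and tests) never depend on the HTTP details.

```python
import base64
import json
from urllib.request import Request, urlopen

API_KEY = "your-api-key"   # placeholder, not a real key
DATA_CENTER = "us1"        # placeholder data-centre prefix
API_ROOT = "https://{dc}.api.mailchimp.com/3.0"


def _http_get(url, api_key):
    """Default transport: GET the URL using HTTP Basic auth and parse the JSON body."""
    token = base64.b64encode(f"anystring:{api_key}".encode()).decode()
    request = Request(url, headers={"Authorization": f"Basic {token}"})
    with urlopen(request) as response:
        return json.load(response)


def get_campaign_summary(campaign_id, transport=_http_get):
    """Adapter function: hides the URL layout and raw payload from the views."""
    url = API_ROOT.format(dc=DATA_CENTER) + f"/reports/{campaign_id}"
    raw = transport(url, API_KEY)
    # Only this mapping needs to change if Mailchimp renames or adds fields.
    return {
        "campaign_id": campaign_id,
        "emails_sent": raw.get("emails_sent", 0),
        "unique_opens": raw.get("opens", {}).get("unique_opens", 0),
    }


def fake_transport(url, api_key):
    """Stand-in for the network call, used here so the example runs offline."""
    return {"emails_sent": 120, "opens": {"unique_opens": 45}}


summary = get_campaign_summary("abc123", transport=fake_transport)
print(summary)
```

Views would call `get_campaign_summary` and never build Mailchimp URLs themselves, which is what confines an API change to this one module.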
When the actual call is made from the browser to our application, the resulting call graph is as below. Because all the methods in our app call Mailchimp's RESTful services through the mailchimp_api.py module, we are able to reduce change effort by reducing coupling, and a change in Mailchimp's API would only impact mailchimp_api.py. This can be seen from a modified call graph generated using pycallgraph, as below:
2. Decorator Pattern: Capturing MailChimp Statistics
Hookcoffee uses email marketing extensively to push out their products. However, while Mailchimp makes it possible to track who has opened an email and who has clicked through for any given campaign, the activities of the prospective customer on the website are invisible to the owners.
Mailchimp provides an eCommerce tracking option that enables web developers to record revenue generated from their campaigns, but it fails to provide granularity in the sales funnel. For instance, did the customer register their interest on the site? Did they make a purchase, or engage with the campaign in a positive way (e.g. switching to a different type of coffee)?
Previously, what happened at that level was invisible to HookCoffee. However, by building on top of the eCommerce function, we are able to fetch who the user is and which campaign they came from by interacting with Mailchimp's RESTful APIs. We also record what the user did AFTER they clicked through the email campaign in the local database.
However, an issue arose when we found out that Mailchimp passes critical details after the clickthrough as parameters on the request itself, as shown below.
This meant that for every end point, we have to capture these parameters and store them in session, as shown below.
The naïve method is to go into every single URL view endpoint and add a method to capture such data, but this proved to be infeasible due to the large number of possible endpoints. Furthermore, as the project went on, we could not guarantee that all these endpoints would capture the data, as other freelancers were also working on the same website.
We used the Decorator pattern to handle this. The decorator is an additional functionality we can add without affecting current implementations. Think of it as a plugin for a web browser.
Django handles decorators differently from an object-oriented language. Each decorator is actually a list of functions that are invoked (before it returns the original function). Here, we use an analogous model to illustrate it.
The decorator function that we have defined is shown below:
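The original listing is not reproduced on this page; the sketch below shows the general shape such a view decorator could take. It assumes the tracking parameters are `mc_cid` (campaign ID) and `mc_eid` (email ID), which Mailchimp's e-commerce link tracking appends to URLs, and it omits resolving the email address through the API. A `SimpleNamespace` stands in for Django's request object so that the example runs without Django.

```python
from functools import wraps
from types import SimpleNamespace


def processMailchimpParams(view):
    """Store Mailchimp tracking parameters in the session, then call the view."""
    @wraps(view)
    def wrapper(request, *args, **kwargs):
        campaign_id = request.GET.get("mc_cid")  # assumed parameter name
        email_id = request.GET.get("mc_eid")     # assumed parameter name
        if campaign_id:
            request.session["mc_cid"] = campaign_id
        if email_id:
            request.session["mc_eid"] = email_id
        return view(request, *args, **kwargs)
    return wrapper


@processMailchimpParams
def product_page(request):
    """Hypothetical view; it needs no knowledge of Mailchimp."""
    return "ok"


# Stand-in request with query parameters and an empty session
request = SimpleNamespace(GET={"mc_cid": "c1", "mc_eid": "e1"}, session={})
response = product_page(request)
print(response, request.session)
```

Because the capture logic lives in the wrapper, existing views keep their signatures and bodies unchanged.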
There exists a plugin to bind decorators to a urls.py file; however, there was no indication of how to do this within the same file. We dug deeper and realized that the URL patterns in urls.py are exposed as a list in Django. We now store all URL patterns as a list of unwrapped (or undecorated) URL patterns. We then hook our own decorator (processMailchimpParams, which takes a request, stores the Mailchimp campaign ID and resolves the email address if it exists) onto the list of unwrapped URL patterns, thereby causing it to execute for every endpoint (with the exception of static content, e.g. images). This gives an easy-to-use solution that is easily extensible by other developers on the team, as modifications are made in the same file.
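The idea of decorating a whole list of URL patterns at once can be sketched without Django as follows. Here each pattern is a plain `(regex, view)` tuple, whereas real Django URL entries are objects, so this is an analogy rather than the team's code; `decorate_patterns`, `record_path` and the sample views are hypothetical names.

```python
from functools import wraps

seen = []  # paths observed by the decorator, for demonstration only


def record_path(view):
    """Hypothetical decorator standing in for processMailchimpParams."""
    @wraps(view)
    def wrapper(path):
        seen.append(path)
        return view(path)
    return wrapper


def decorate_patterns(patterns, decorator, skip_prefixes=("^static/",)):
    """Wrap every view in the list, except patterns that serve static content."""
    decorated = []
    for regex, view in patterns:
        if regex.startswith(skip_prefixes):
            decorated.append((regex, view))
        else:
            decorated.append((regex, decorator(view)))
    return decorated


def home(path):
    return "home"


def shop(path):
    return "shop"


def logo(path):
    return "logo"


unwrapped_patterns = [("^$", home), ("^shop/$", shop), ("^static/logo.png$", logo)]
urlpatterns = decorate_patterns(unwrapped_patterns, record_path)

# Simulate one request per endpoint
results = [view(regex) for regex, view in urlpatterns]
print(results, seen)
```

A developer who adds a new endpoint only appends to `unwrapped_patterns`; the decorator is applied automatically.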
Issues arising from ordering of decorators
We were contacted by one of the freelancers working on the internationalization, who was trying to figure out how to make an external library that uses decorators for translating between languages (solid_i18n_patterns) work alongside our decorator.
We worked together with the freelancer to resolve the issue. The first issue we found was that solid_i18n_patterns was being applied before our decorator, causing the internationalized URL to be lost. In this case, the Chinese version of the site could not be accessed.
We then made sure that the decorator was applied only to the URLs that we needed to capture. The regular expression pattern that matches URLs in the internationalization library had side effects that prevented us from applying the decorator to all URLs using a regular expression. By specifying the customer-facing endpoints in a separate list (here, unwrapped_translated_patterns), we managed to solve the issue in a timely fashion.
|Project Management||Project Schedule||Project Schedule|
|Meeting Minutes||Internal, Sponsor and Supervisor Meeting Minutes|
|Risk Management||Risk Management|
|Change Management||Change Management|
|Requirements||Project Overview||Project Overview|
|Team's Motivation||Team's Motivation|
|Project Scope||Project Scope|
|User Stories||User Stories|
|Analysis||Personas and Scenarios||Personas & Scenarios|
|Design||Prototypes||Low & Mid Fidelity Prototypes|
|Project Implementation||Technologies Used||Technologies Implemented|
|Testing||User Test Plan and Results||Testing Documentation|
|Handover||Handover Procedure Timeline||See project handover timeline in 'Project Risks' section of Final wiki page|
|User Manual||Delivered via Private Folder on Dropbox|
|Developers Manual||Delivered via Private Folder on Dropbox|
|Setup Manual||Delivered via Private Folder on Dropbox|
Note that due to the NDA signed with Hook Coffee, the user manual, developers manual and setup manual will be provided via private access.
Because we could not optimize the Python PDF libraries (after trying both PyPDF and pdfrw), we had to render the labels with customer information, in .pdf format, in a way that ensured high performance during regular business hours. We generated a new task for every order to pre-process the PDFs using a task scheduler built on top of Python (Celery), so that new orders would have their order label PDFs ready by the time they must be printed. This sped up the process for each label from 0.47s to 0.05s in our experiment with one label, and led to an overall reduction of 68.09% in the time for printing all labels.
In pages that require extended or large use of data, long page loading times are expected. Asynchronous calls help to alleviate this issue, making the longest-loading endpoint the critical path for page load. Also, to minimise the user's idle time, we display data as it loads, without waiting for the other endpoints. This allows the user to start using parts of the page while waiting for endpoints with longer loading times to return. Apart from improving user experience, this approach also builds fault tolerance into each page: even if a particular endpoint fails to load, other parts of the page can continue to function.
With constant communication between our front-and-back end developers, we have also managed to minimise the use of promises/callbacks on the frontend to process data. Our endpoints allow for batch-updates, where foreseen, to allow at most one HTTP call to process each user action. This reduces the amount of round-trip time (RTT) for communication between the client and server, saving bandwidth and loading times for both parties.
We ensure high maintainability for our code base in various ways. For example, we maximize the use of functionality that is built into Django itself, from the Role-Based Access Control system to the decorator functionality that we use for each endpoint. This is because our code will have to be maintained by a third party in the future, so following conventions will enable them to take over our work easily. We also configured PyCharm to use PEP8, a style guide for Python, and ensured that the majority of our code is PEP8-compliant. We have documentation for each API endpoint in the developer's manual so that the endpoints can be easily reused. We also keep the development and production branches separate during development so that any urgent bug fixes in the production branch can be quickly made and redeployed, as the system is deployed live and in use. We also made sure that the message in each commit was succinct and documented what changes were made to the codebase, and isolated each change to its own commit, so that it is easy for any party to refer to later on.
For every functionality that we build, we have ensured maximum usability by using the following design methodology:
We improved the usability of their system, for instance in the resend order functionality - from 20 clicks down to 4 clicks - an 80% improvement.
We meet our sponsor at the start and end of every sprint for the following reviews, meeting every fortnight to ensure that any problems with our system can be made known to us immediately. We also communicate remotely via Telegram to validate our work and to communicate urgent matters to our sponsor. This is also a medium for us to gather feedback on our system if our sponsor is not available to meet physically, so that we still have timely feedback to continue our work that will also meet his business needs.
Note: Facebook login will not work as it is tied to the hostname: hookcoffee.com.sg
To view application, visit: http://220.127.116.11/manager
User Acceptance Tests
Team IPMAN has conducted a total of 3 user tests which allowed us to better manage sponsor expectations as well as improve on usability of our application interface.
For a more detailed version of Team IPMAN's user acceptance test results, access it here:
Sponsors' Testimonial (Ernest and Faye)
"Team IPMAN produces work of great quality, values constant communication and provides regular updates (every 2 weeks) on the progress of product development. The team is dedicated, possess strong technical capabilities and is willing and flexible with requests to improve the user experience design on the product."