IS480 Team wiki: 2016T1 MonoChrome Midterms
- 1 Project Progress Summary
- 2 Project Management
- 3 Quality of product
- 4 Reflection
Project Progress Summary
- Achieved X-Factor (Successfully deployed and monitored 20 sensors across 2 buildings on 15 September 2016)
- Our platform is a monitoring solution that supports communication with devices that have no public IP address.
- The current platform is ready to monitor any device or server running a Linux-based OS
- The current platform has a built-in command prompt terminal for each device
- Notification module successfully completed
- Conducted 3 UATs successfully
- Rescheduled 2-way communication to be completed before new live deployment date
- 2-way communication is more complicated than we estimated, which resulted in a delay in Iteration 8
- After live deployment, many unexpected bugs and issues occurred, so we scheduled sessions to optimize most of the functionalities
- Planning better analytical work for monitoring sensors to reduce false positives
|S/N||FEATURES||STATUS||CONFIDENCE LEVEL(0 - 1.0)||COMMENT|
|1||Database Module||Fully deployed and tested 100%||1||Completed|
|2||Data Collection Module||Fully deployed and tested 100%||1||Completed|
|3||Analytics Module||Fully deployed and tested 100%||1||Completed|
|4||Dashboard Module I||Fully deployed and tested 100%||1||Completed|
|5||Sensor Module||Fully deployed and tested 100%||1||Completed|
|6||Notification Module||Fully deployed and tested 100%||1||Completed|
|7||Two-way Communication Module||Fully deployed and tested 100%||1||Completed|
|8||Dashboard Module II||In progress||1||In progress|
|9||Database Collection Module II||In progress||1||In progress|
|10||Mobile Responsive Module||In progress||1||Mobile responsiveness is revised with every change in design|
|11||Optimization Module||In progress||1||New scope proposed by Monochrome: Archiving; Half of Security Module moved to this new module.|
|12||Account Management Module||In progress||1||"Register new user account" to be completed in Iteration 11|
|13||Dashboard Module III||Scheduled for Future Development||1||To Be Completed|
|14||Downtime Scheduler Module||Scheduled for Future Development||1||To Be Completed|
|15||Security Module||Removed upon negotiation||N.A||Removed upon negotiation|
Project Schedule (Plan Vs Actual):
Project Scope (Plan Vs Actual):
|7||UAT 2||22 Aug 2016||UAT 2||22 Aug 2016||Newly added UAT after acceptance feedback. Rationale: the company has only a small group of users, so more UATs are needed to gather more feedback|
|UAT 3||8 Sept 2016||-||Scheduled to complete UAT 3 before live deployment on 15 Sept|
|2-way communication||Iteration 8,9||2-way communication||Iteration 7,8||Rescheduled earlier to deliver the 2-way communication module before live deployment on 15 Sept. Dashboard Module I & Database Module were pushed back|
|8||UAT 3||8 Sept 2016||UAT 3||13 Sept 2016||2-way communication was unexpectedly difficult and could not be finished in time, so UAT 3 had to be pushed back.|
|-||Display detailed information of the servers (Canvas Mode)||29 Aug 2016||New scope proposed by sponsor|
|9||-||Optimization Module (Archiving, Caching of charts & Enhance web application & reduce vulnerability)||Sept 2016||New scope proposed by Monochrome|
|-||Ping Sensors||Sept 2016||New scope proposed by Sponsor|
|Security Module (Hardening Raspberry Pi), Database Module (Sharding), Dashboard Module III (Display metrics for customer based only)||Sept 2016||-||Removal proposed by Monochrome|
- The task metric is used to determine whether we are still in the healthy zone for completing the project.
- To date, most of the task metrics fall within the green zone.
- Iteration 8 exceeded the green zone, and mitigating action has been taken. The schedule is on track.
- Metrics page: Click here to visit metrics page
|TASK METRICS||BUG METRICS|
Link to view full list of project risks: Monochrome Risk Assessment
- We learnt that different risks arise at different stages of the project.
- Because of the limited time and the constraints that we face, we have become very critical and review every new request. Through this experience, we realized the importance of prioritization. Therefore, we have revamped our change request management process to standardize how we receive a new request from the sponsor.
- Below is the process that we have come up with for mitigating client management risk:
Below are the top 3 risks from Acceptance to Midterms Risk Management
|S/N||RISK TYPE||RISK EVENT||LIKELIHOOD||IMPACT||CATEGORY||MITIGATION|
|1||Technical Risk||2-way communication is difficult to implement, as there are no commercially available APIs||High||High||A||
|2||Client Management Risk||Project scope changes as the project progress||High||High||A||
|3||Technical Risk||Front end lacks the manpower to code the charts||High||High||A||
Sensors (Raspberry Pi)
Installed on deployed sensors are Fluentd and a Python Script. The Python script runs every 10 seconds, collecting data and sending it to Fluentd. Fluentd collects this information and forwards it to MongoDB, which sits on the Server.
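The collection loop described above could look roughly like the sketch below. It is not the team's actual script: it assumes Fluentd is listening locally through its HTTP input on port 9880, and the tag `sensor.metrics`, the field names, and the helper names are all hypothetical.

```python
import json
import time
import urllib.request

# Assumptions: Fluentd's HTTP input listens on localhost:9880 and the
# tag "sensor.metrics" is made up for this example.
FLUENTD_URL = "http://localhost:9880/sensor.metrics"
INTERVAL_SECONDS = 10  # collection interval stated in the text


def build_record(sensor_id, metrics, now=None):
    """Combine a sensor id, a timestamp and the collected metrics."""
    record = {
        "sensor_id": sensor_id,
        "timestamp": time.time() if now is None else now,
    }
    record.update(metrics)
    return record


def build_request(record, url=FLUENTD_URL):
    """Wrap one record in an HTTP POST with a JSON body."""
    body = json.dumps(record).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_forever(sensor_id, read_metrics):
    """Collect and forward one record every INTERVAL_SECONDS."""
    while True:
        request = build_request(build_record(sensor_id, read_metrics()))
        try:
            urllib.request.urlopen(request, timeout=5).close()
        except OSError:
            # Network or Fluentd failure: skip this cycle and try again.
            pass
        time.sleep(INTERVAL_SECONDS)
```

A caller would pass its own metric reader, for example `run_forever("pi-01", lambda: {"cpu_load": 0.42})`.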
Server (Softlayer - Ubuntu)
A server on Softlayer runs on Ubuntu and hosts MongoDB, a NoSQL Database that is fast and flexible. PHP, running on Apache Server, is used to pull raw data from the database and turn it into a format that is easily readable by our dashboard.
There are 2 main ways for data to be transported from our Server to our Dashboard. These are websockets and a web-based terminal. The former is handled by Ratchet, a PHP library for websockets, and ZMQ, for distributed messaging, while the latter is made possible by Wetty, a terminal emulator based on ChromeOS' hterm.
|1||Connecting to a device without a public IP address (Reverse-ssh)||Typically, terminal access to remote servers or, in this case, Raspberry Pis, is achieved through SSH. However, this requires that the Raspberry Pi have a publicly available IP address, something that most deployed sensors will not have.
As traditional SSH is not an option, we use a technique known as reverse-SSH to achieve remote access to sensors without a public IP address.
This involves having the Raspberry Pi make a persistent connection to one of the server's ports. With this tunnel in place, the server will be able to have remote access to the Raspberry Pi by creating an SSH connection to itself, on the same port.
|2||Pass information from backend to frontend via websockets||We needed something that was fast and scaled well, regardless of the number of users. Traditional API calls can be slow, as a new connection has to be made for every request. Websockets, on the other hand, allow for a client to make a single connection and get all information through that connection. However, there were a few problems with our first implementation of websockets.
Our dashboard deals with 5 sets of data, and the first version of our websocket involved the client sending a request for which set of data it wanted. The server would then generate this data and return it for the dashboard. However, as more users came online, the server got slower and slower, because it was essentially generating a copy of the data for every user. In addition, data displayed on the dashboard was not synced: if a sensor goes down in the health overview, it might not be immediately reflected in the watch list.
The second implementation of websockets adopted a publish-subscribe model. The server generates data at a constant rate, and anyone connected to the websocket receives that data. As it no longer depends on input from the clients, this system is not affected by the number of concurrent online users. In addition, all 5 sets of data were combined into a single JSON string, ensuring that information displayed on the dashboard is synchronized.
|3||Flapping Detection||The sensor is configured to send data to the database every 10 seconds. If the time difference between 2 consecutive data records for a particular sensor is > 15 seconds (10 seconds data retrieval + 5 seconds buffer), we know that the sensor was down (disconnected/powered off). We check for such "gaps" in the 10-minute data set to determine how many times the sensor has been down. If the number of such "gaps" exceeds a preset number, the sensor is considered to be flapping. The preset number needs tuning: too low produces false positives, too high produces false negatives.|
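The flapping check in the table above can be sketched as follows. This is a minimal illustration, not the team's implementation: timestamps are assumed to be seconds, the function names are hypothetical, and `MAX_GAPS = 3` is a made-up value because the text does not state the preset number.

```python
GAP_THRESHOLD_SECONDS = 15  # 10 s reporting interval + 5 s buffer, as described
WINDOW_SECONDS = 600        # the 10-minute data set
MAX_GAPS = 3                # hypothetical preset; the real value is not given


def count_gaps(timestamps, threshold=GAP_THRESHOLD_SECONDS):
    """Count consecutive record pairs that are more than `threshold` apart."""
    ordered = sorted(timestamps)
    return sum(1 for a, b in zip(ordered, ordered[1:]) if b - a > threshold)


def is_flapping(timestamps, now, window=WINDOW_SECONDS, max_gaps=MAX_GAPS):
    """Return True if the sensor went down more than `max_gaps` times
    within the last `window` seconds."""
    recent = [t for t in timestamps if now - window <= t <= now]
    return count_gaps(recent) > max_gaps
```

For example, a sensor that reports every 10 seconds without interruption yields zero gaps, while one that drops out four times within the window is flagged once `MAX_GAPS` is 3.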
Quality of product
To ensure a quality platform, we performed non-functional testing to measure things such as performance. This is especially important in a monitoring platform where data has to be close to real-time. Over the course of the project we did a number of things to boost performance:
- With proper indexing, the number of documents that MongoDB has to scan to execute a query is greatly reduced. This improves query performance and efficiency.
- PHP does not have a built-in function for multi-threading. This means that the 5 different sets of data passed to the dashboard through the websocket have to be generated one at a time.
- To overcome this, we used multi-curl, a tool that allowed us to create multiple PHP processes that could generate these 5 sets of data all at once, nearly halving total execution time.
- Initially, all of the raw data sat on a single collection in the database. With only 2 sensors in the beginning of the project, the system was fast at the time. However, after deploying more than 20 sensors, we noticed significant loss in speed. This was due in part to the size of the data collection. More than 20 sensors meant that the collection was growing too large too quickly for our current system to be sustainable.
- Our solution was to place older entries into an archive collection, which limits the size of the collection and query time. The server has been configured to automatically archive records older than a day every midnight.
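The archiving rule in the last point can be illustrated with the sketch below. It uses plain in-memory lists as stand-ins for the live and archive MongoDB collections, and the `timestamp` field name and the function name are assumptions made for this example, not details from the actual system.

```python
from datetime import datetime, timedelta


def split_for_archive(records, now, max_age=timedelta(days=1)):
    """Separate records into those that stay live and those older than
    `max_age`, which would be moved to the archive collection."""
    cutoff = now - max_age
    live = [r for r in records if r["timestamp"] >= cutoff]
    archived = [r for r in records if r["timestamp"] < cutoff]
    return live, archived


# Example run, as a midnight job might do it.
midnight = datetime(2016, 9, 16, 0, 0, 0)
records = [
    {"sensor_id": "pi-01", "timestamp": datetime(2016, 9, 14, 23, 59, 50)},
    {"sensor_id": "pi-01", "timestamp": datetime(2016, 9, 15, 12, 0, 0)},
]
live, archived = split_for_archive(records, midnight)
```

Keeping the live collection bounded to roughly one day of data is what keeps query time from growing with the number of sensors; the archived records remain available for historical lookups.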
|TOPIC OF INTEREST||LINK|
|Project Overview||Project Overview|
|Project Documentation||Use Case|
|Tech & System Architecture|
- The deployed site is live with company confidential data, so we are unable to show the dashboard itself; however, we are able to show the dashboard components and site map.
- Link to use case diagram: Use Case Diagram
- To date, we have completed 3 UATs.
- The objectives, test cases and results can be viewed on the UAT page
Reaching the mid-term milestone, we are still going strong and we are proud of what we have achieved so far. We are glad that we have each other to pull through this together.
Write code that is easy to understand: As a programmer, I have a tendency to focus simply on code that works, but just as important is code that is understandable. As we progressed, I found many instances where code written weeks or months prior had to be revisited, to be revised or optimized. Another possibility is that another programmer might have to look at my code. Easily understandable code then becomes a necessity.
To be unafraid to try something new: There can be multiple ways to solve a problem, and when researching these solutions, the best one might not be immediately obvious. When we finally decide to implement a solution, we may find out later that it is not the best one. It is not easy to give up on our current solution in favor of a better one. One example is the terminal in the browser, which needs to be persistent and support multiple connections. The original solution, shellinabox, worked well at first, but we encountered stability issues soon after. It did not seem to be able to support multiple connections at once. In the end we had to give up on shellinabox and implemented a more stable solution, wetty.
Drawing component diagrams helped to frame my thought processes: I was able to visualize parent-child component relationships. This was essential in managing data flow.
Revising code periodically made code maintenance easier: I managed to shorten a script by a significant number of lines by simply converting a switch case to an associative array. In some instances, it eliminated redundant or repeated code.
Yong Jin's Reflection
As a backend developer, code enhancement is a never-ending process. Writing code that produces the correct output is easy, but writing fast code is hard. In order to push the limits of the proc rate on the dashboard, the code is reviewed over and over again. As the proc rate decreases, it feels good to know that the performance of the various functions I have developed is slowly approaching this ideal.
This has also taught me about being overconfident, as I had always been extremely confident that my code was error-free without any additional code reviews. The time spent over multiple reviewing and debugging sessions has made me realize that no code is error-free from the start; it is improved slowly over countless debugging sessions.
Through this FYP process I have learned to be more meticulous to ensure that the application is constantly improving. Wanting our users to have a good user experience has taught me not to overlook the minute details and to really understand things from the users' perspective.
Being open to feedback from our testers without being defensive has also allowed us to greatly improve our application.
Wen Da's Reflection
Being a project manager has been an enjoyable process. I have learned so much from the acceptance feedback until today. There is no perfect plan; plans always change due to unforeseen circumstances, as we found when we started rolling out the functionalities after building the backend. It is a constant reprioritization of functionalities, re-evaluating the business value of each function with the sponsor and meeting the expectations of different stakeholders.
For instance, the live deployment magnified the loopholes in our code. On the day of live deployment, there was a surge of sensors and the amount of data collected was huge. It slowed down the whole querying time and the performance of the dashboard was greatly impacted. We had to resolve this problem immediately by proposing new scope to optimize the querying.
I am looking forward to more challenges ahead and to strengthening my project management skills.