IS480 Team wiki: 2012T1 M.O.O.T/Midterm Wiki
- 1 Project Progress Summary
- 2 Project Management
- 2.1 Project Status
- 2.2 Project Schedule
- 2.3 Project Metrics
- 2.4 Project Risks
- 2.5 Technical Complexity
- 3 Quality of Product
- 4 Reflection
Project Progress Summary
Team M.O.O.T has progressed steadily since Iteration 1, which was kick-started after the Acceptance presentation, and is currently in Iteration 4, which will last until the end of the midterm week. We encountered a major obstacle in Iteration 1 in coming up with an alpha version of the Artificial Neural Network used to determine gender. A change in the client's requirements also contributed to the delay. The final requirements were confirmed in Iteration 2, which allowed the team to focus, and the team managed to catch up within that iteration.
By midterm, the team is confident of completing 70% of AlterSense's features, including gender recognition and basic photo-taking functionality. After midterm, the team will implement the narrative interaction features and refine AlterSense in general. The team is therefore confident of delivering the complete AlterSense solution by the end of week 13.
- Inclusion of machine learning
- Internal addition of stakeholders: requirement to liaise with the Marcom team
- Requirement changes:
- AlterSense to enable photo taking
- Gender-related content will not be limited to promotions only
- Interaction with Techy & Marlon to be replaced with a photo-taking-related narrative, which means starting over from scratch
- Analytics possibility provided by Microsoft Tag will be independently explored by CapitaMalls Asia
- Took 3 weeks to set up the alpha version of the Neural Network instead of the planned 2 weeks
- Omitted Waist-to-Hip Ratio and bag detection for gender recognition, as the Kinect does not detect the hip, and the depth measurement used to detect a bag interferes with the detection of the arm joint
| S/N | Feature | Status | Confidence Level (0-1) | Comments |
|-----|---------|--------|------------------------|----------|
| 1 | Detect stationary shopper | Deployed & tested | 1 | |
| 2 | Display shopper's silhouette | Iteration 5 | 0.9 | Implemented for Acceptance version |
| 3 | Display doors with enticing scene | Iteration 5 | 0.8 | Inserted image for Acceptance, but its activation may require more work |
| 4 | Detect shopper's gesture (reaching for a particular door) | Iteration 5 | 0.7 | |
| 5 | Display change in chosen door | Iteration 5 | 0.7 | |
| 6 | Display augmented reality scenery background | Deployed & tested | 1 | Refinement completed as well |
| 7 | Inserting instruction thought bubble | Deployed | 0.99 | |
| 8 | Detect shopper raising right hand | Deployed | 0.99 | |
| 9 | Countdown timer | Deployed & tested | 1 | |
| 10 | Take a photo | Deployed & tested | 1 | |
| 11 | Display photo | Deployed & tested | 1 | |
| 12 | Measure height | Deployed & tested | 1 | |
| 13 | Measure shoulder & hip width | Deployed & tested | 1 | |
| 14 | Detect presence of long hair | Ongoing | 0.5 | |
| 15 | Detect presence of skirt | Deployed | 0.6 | |
| 16 | Determine gender of shopper | Deployed & tested | 1 | |
| **Promotions, Photo Gallery & Microsoft Tag** | | | | |
| 17 | Determine type of promotions to display | Ongoing | 0.75 | |
| 19 | Overlay of tag | Deployed | 0.85 | Position to be adjusted |
| 20 | Exit screen | Deployed & tested | 1 | |
| 22 | Read tag | Deployed | 0.95 | Implemented for Acceptance |
| 23 | Update tag | Iteration 6 | 0.9 | Implemented for Acceptance |
| 24 | Delete tag | Iteration 6 | 0.9 | |
| 25 | Integrate tag with AlterSense | Iteration 6 | 0.9 | |
| 26 | Retrieve analytics raw values from Microsoft Tag web service | Iteration 6 | 0.95 | Implemented for Acceptance |
| 27 | Parse analytics values | Iteration 6 | 0.95 | Implemented for Acceptance |
| 28 | Display parsed analytics values | Iteration 6 | 0.95 | Implemented for Acceptance |
| Iteration | Task | Planned | Actual | Comments |
|-----------|------|---------|--------|----------|
| 1 | Explore Classifier Algorithm | Week 1 | Week 1 | Found out about Neural Network |
| | Primary research: measurement collection | 24/08/12 | 24/08/12 | 58 participants (30 males, 28 females) |
| | Analysis of measurement collection | Week 2 | Week 2 | |
| | Gender differences trend establishment | Week 2 | Week 2 | |
| | Gender recognition based on Waist-to-Hip Ratio | Week 2 | Dropped | Decided to drop Waist-to-Hip Ratio as Kinect cannot detect waist |
| | Neural Network classification to determine gender based on Waist-to-Hip Ratio | Week 2 | Week 2 | Replaced by height measurement & face tracking on Kinect |
| 2 | Neural Network classification & scoring system | Week 3 | Week 3 | Delivered as gender classification based on height & shoulder width |
| | Physical parameter expansion: neck & shoulder width | Week 3 | Week 3 | Shoulder width & height passed in as parameters, but not neck; behavioural parameters hasBag, hasSkirt, hasLongHair (related to neck) coded but not passed in as parameters |
| 3 | Gender recognition based on physical & behavioural parameters | Week 4 | Ongoing | Skirt & bag behavioural parameters displayed, but not used for gender recognition |
| | Countdown timer | Week 4 | Week 4 | |
| | Capturing shopper's outline | Week 4 | Week 4 | Edges refined as well |
| | Capturing of photo | Week 5 | Week 5 | |
| | Backward propagation | Week 5 | Week 5 | |
| | Saving learning state | Week 5 | Pushed to Iteration 6 | |
| | Integration of photo taking with physical gender recognition | Week 5 | Week 5 | |
| 4 | Microsoft Tag creation | Week 6 | Week 6 | |
| | Advertisement Management System page | Week 6 | Week 6 | |
| | Refining image overlaying | Week 6 | Week 6 | |
| | Incorporate tag into photo | Week 6 | Week 6 | |
| | Gender recognition: physical + behavioural | Week 6 | Week 7 | |
| | Integration of gender recognition (physical + behavioural) with photo taking | Week 6 | Week 7 | |
Schedule & Bug Metrics
Team M.O.O.T's schedule and bug metrics can be accessed here.
Schedule metrics collected from Iterations 1 to 3 indicate that the team has been progressing as expected in terms of the number of tasks completed. However, it is important to note that although scheduled tasks were completed, they may not directly contribute to the deliverable. For example, although the team was highly prolific in Iteration 1, we only managed to estimate height through the Kinect by the end of the iteration. This was because the waist could not be measured, and some of the completed tasks were related to gender recognition using the Waist-to-Hip Ratio. Difficulties in implementing the Neural Network (NN) also resulted in more tasks for further exploration.
Nonetheless, the velocity of 23 in Iteration 1 indicated that the team would be able to complete a significant number of tasks within 2 weeks. Hence, Iteration 2 was confidently scheduled to be short and intense, lasting only one week. The team had to rapidly attempt to set up the NN for gender recognition; otherwise, the team would have had to look for an alternative. The velocity obtained was 10 for one week; had the iteration lasted 2 weeks, the projected velocity would be 20. This is still within the accepted zone and comparable to that of Iteration 1. This was borne out when the team managed to predict gender using the NN by the end of Iteration 2.
Iteration 3 resulted in a velocity of 18, which indicates that we are still progressing at an acceptable rate. Although there seems to be a trend of decreasing velocity, this is likely due to the more focused nature of the tasks as the project progresses. We have set the ideal velocity to be the average velocity of the 3 iterations, calculated automatically by Pivotal Tracker. Hence, velocity is expected to stabilize after Iteration 3.
Bug metrics collected in Iteration 3 raised a red flag that the team needed to spend more time debugging before proceeding. Time was spent eliminating the bugs, which resulted in pushing the "saving learning state" functionality to Iteration 6.
Gender Recognition Metrics
Our gender recognition metrics can be accessed here.
Based on metrics collected from User Testing 1, we decided to drop the bag behavioural parameter due to its unacceptable level of accuracy. The skirt behavioural parameter has been integrated into gender recognition due to its promising performance, whereas the long hair parameter still has to be refined before integration.
All risks mentioned during the Acceptance Presentation have been mitigated, particularly the technical risks.
The team has also foreseen several new risks:
- storage and transferring of photos
- development delay due to members' unavailability
- inability to respond appropriately to AlterSense
- granular overlaying of augmented background
For more information on these risks, access team's risk management page.
Detection of Joints
The Kinect for Windows SDK provides skeletal tracking features that allow the Kinect to recognize people and follow their actions. Within each skeleton, there is an array of 20 specific joints that tracks the user's movement in real time. The values given by these joints are listed as x, y, z in 3-dimensional space. X & Y values are listed in pixels determined by the camera resolution (in this case, 640 x 480 pixels), while the Z values are provided in millimeters after applying a bit-shifting formula. A bit-shift formula is required because the Kinect framework stores both depth and player information within a single short datatype.
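The bit-shift can be sketched as follows (Python for illustration; our actual code is in C#). In the Kinect v1 depth stream, the player index occupies the low 3 bits of each 16-bit value and the depth in millimetres occupies the upper bits:

```python
# Unpacking a Kinect v1 depth value: the SDK packs the player index
# into the low 3 bits and the depth (in millimetres) into the upper
# bits of each 16-bit short, hence the bit-shifting formula above.
PLAYER_INDEX_BITS = 3

def unpack_depth(raw):
    """Return (depth_mm, player_index) from a packed 16-bit depth value."""
    depth_mm = raw >> PLAYER_INDEX_BITS            # upper bits: depth in mm
    player = raw & ((1 << PLAYER_INDEX_BITS) - 1)  # lower 3 bits: player 0-7
    return depth_mm, player
```

For example, a raw value of 16009 decodes to a depth of 2001 mm for player 1, since 2001 << 3 | 1 == 16009.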
To recognize user gestures, we have to develop an understanding of how the joint values change in relation to the user's gesture.
Prior to this, it is important to have a firm understanding of the Windows Presentation Foundation (WPF), as well as how images are recorded and referenced in a method.
On a 3-dimensional view, the framework returns pixels that have associated x, y, and z values. The z value measures depth to determine how far that pixel is perceived to be from the IR sensor.
On a 2-dimensional view, the framework references the first pixel as x:0, y:0, the middle point of the image as x:320, y:240, and the last pixel as x:639, y:479.
On a 1-dimensional view, the image is flattened into a single array of byte values. The first pixel is referenced as index 0, the middle point of the image as index 153,920, and the last pixel as index 307,199.
To aid development, we had to create a library that lets developers work with raw values (1 dimension), 2-dimensional images, and the third dimension (z value). Coming up with methods to translate between these values is important in subsequent initiatives of determining the user's physical and behavioural traits, as well as post-image processing. The Kinect SDK does not provide these methods.
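The translation methods in that library boil down to simple index arithmetic for a row-major 640 x 480 image. A minimal sketch (Python for brevity; the actual library is in C#):

```python
# Converting between the 1-dimensional array index and the
# 2-dimensional pixel coordinate of a 640 x 480 row-major image.
WIDTH, HEIGHT = 640, 480

def to_index(x, y):
    """2-D pixel coordinate -> 1-D array index."""
    return y * WIDTH + x

def to_coords(index):
    """1-D array index -> 2-D pixel coordinate."""
    return index % WIDTH, index // WIDTH
```

For example, to_index(0, 0) is 0, to_index(320, 240) is 153920, and to_index(639, 479) is 307199, the last valid index of the 307,200-element array.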
In gesture recognition, we have to develop an understanding of how the values of joints change in relation to the gesture movement. For instance, raising a hand would imply a decrease in y values. Pushing your hand forward would imply a decrease in z values. Identifying the appropriate change of x, y and z values of specific joint(s) in a certain sequence would imply that the user is executing the gesture.
Because of people's different builds and positions (e.g. standing at the side of the Kinect), we deduced that hard-coding x, y, z values is not adaptable to a user's physical build. For instance, stretching out one's right hand would be an effective gesture for a taller person, since he can easily reach beyond a certain y value on the screen; a shorter person, on the contrary, might have difficulty hitting that specific y value. In this case, we have to develop our methods based on the relative positions of certain joints. The complexity of such gestures increases because we have to determine what would be an appropriate joint to reference. We have other considerations such as "Would the user invoke the gesture if his joints are not detected properly?" and "Would everyone raise their hand above their head?". Such subtle considerations introduce complexity into gesture recognition accuracy.
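As a toy illustration of such reference-relative gestures (a Python sketch, not our C# code; the joint names follow the Kinect skeleton, but the dictionary layout is hypothetical):

```python
def is_right_hand_raised(joints):
    """Treat the hand as raised when it is above the head joint.
    Comparing against the head rather than a fixed y value makes the
    gesture work for both tall and short users, since both joints move
    together with the body. Image y values grow downwards, so 'above'
    means a smaller y."""
    return joints["HandRight"]["y"] < joints["Head"]["y"]
```

The same idea extends to other gestures by choosing a suitable reference joint and tracking the sequence of relative changes over several frames.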
The Kinect provides 3 separate streams, namely the Audio, Color and Depth streams, retrieved from the built-in sensors and cameras of the Kinect device. These 3 streams are retrieved as 1-dimensional data: a single array of byte values. The Kinect SDK does not provide real-world values of the user's height or clothing, or image processing methods that refine the display output.
Our challenge is to determine how to interpret these byte values and associate them with real-world elements. For instance, "how do we determine the top of a person's head in the Kinect?" and "how do we determine if the person is wearing a long skirt?" Although the Kinect does provide the head as a joint, it references the midpoint of a person's head, which introduces a degree of inaccuracy for our purpose of finding a person's height. Hence, we developed a method to find the end of an object (i.e. person or skirt), based on the finding that there is a significant difference in z values between a pixel on the person and a pixel in the background.
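The "end of an object" idea can be sketched as follows (Python for illustration; the 200 mm threshold is an assumed value, not the one calibrated for AlterSense):

```python
def find_top_of_object(depth, width, start_x, start_y, threshold_mm=200):
    """Scan upwards from (start_x, start_y) in a row-major depth array
    until the depth jumps by more than threshold_mm, i.e. the next pixel
    no longer belongs to the object. Returns the y of the topmost pixel
    that still belongs to the object."""
    y = start_y
    while y > 0:
        here = depth[y * width + start_x]
        above = depth[(y - 1) * width + start_x]
        if above < 0 or abs(above - here) > threshold_mm:
            break  # reached the background, or an invalid depth reading
        y -= 1
    return y
```

Starting the scan at the head joint and walking upward in this way yields the true top of the head, from which height can be derived.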
Gender Recognition Inputs
Using tutorials from various sources, we used the sample code to determine a user's height. However, the results it gave were not accurate. Instead, we refined the method to determine the person's height by using the "end of an object" method and calibrating the values to match real-world measurements. This method was refined over a series of tests to emulate real-world measurements as closely as possible. We also noted that the height given by the Kinect tends to be greater than the shopper's height, as shoes and heels affect the height values. Overall, the measurements were accurate enough as an input for gender recognition. Similarly, we developed a method to detect whether the user is wearing a long skirt, given a similar depth value of the pixels between the knees.
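The skirt check rests on the same depth reasoning; a simplified sketch (Python; the tolerance and the 80% cut-off are illustrative assumptions, not our calibrated values):

```python
def has_long_skirt(depth, width, left_knee, right_knee, tolerance_mm=100):
    """If the pixels between the knees sit at roughly the same depth as
    the knees themselves, fabric is likely spanning the gap (a skirt);
    with trousers, those pixels belong to the further-away background."""
    (lx, ly), (rx, ry) = left_knee, right_knee
    knee_depth = (depth[ly * width + lx] + depth[ry * width + rx]) / 2
    y = (ly + ry) // 2  # scan along the row midway between the knees
    between = [depth[y * width + x] for x in range(min(lx, rx) + 1, max(lx, rx))]
    if not between:
        return False
    close = sum(1 for d in between if abs(d - knee_depth) <= tolerance_mm)
    return close / len(between) > 0.8  # most gap pixels at knee depth
```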
Image Overlay (Green Screen)
Using the Kinect sample code as a reference, we attempted to integrate it into AlterSense. The green screen feature works on the basis of 2 layers of images: the backdrop and the person. The application filters the pixels that belong to the person and removes the rest, which are assumed to be the background. The person's outline is then superimposed on the backdrop, creating an interesting result known as the green screen effect.
However, the overall display output was jagged and unclear. The user's hair did not appear, and some parts of the user were flickering. Upon further investigation into the possible causes, we concluded the following:
- Missing hair: While the Kinect is able to tell which pixels belong to a skeleton, it is only effective for body parts such as the head, arms and legs. This shortcoming resulted in pixels containing hair information being omitted from the skeleton structure; as a result, the output did not display the user's hair
- Flickering parts: As the Kinect uses IR rays to detect depth and track skeletons, reflective surfaces such as hair and shiny objects reflect the rays away from the sensor, which results in a depth of -1 yet can give off a valid depth value at certain intervals. A depth of -1 implies that the pixel does not belong to the skeleton. Simply put, this noise was responsible for the flickering of images at a playback of 30 frames per second.
To circumvent the above, we created a post-processing method to determine whether a pixel could possibly belong to the user's hair. This approach checks from the depth value whether a pixel belongs to the hair, and whether the pixel's colour information (RGB) closely mimics the hair colour. We also zoned the scanning area around the head to prevent unnecessary processing and to avoid introducing artifacts in other areas of the image. We also had to manage error values such as the depth value of -1. This negative value can result from other causes; for instance, distances over 8m also produce depth values of -1, the same value as reflective surfaces. Hence, we have to ensure that the pixels to be displayed belong to the user and not the background: any introduction of background pixels would distort the output greatly.
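A simplified sketch of that hair-recovery check (Python for illustration; the tolerance values and data layout are assumptions, not our actual C# implementation):

```python
def recover_hair_pixels(depth, color, head_depth, hair_rgb,
                        depth_tol=150, color_tol=60):
    """Return indices of pixels (within the zoned head area) to keep.
    A pixel is kept when its colour is close to the sampled hair colour
    AND either its depth is near the head's depth or its depth is -1
    (a reflective surface such as hair). Background pixels at a valid
    far-away depth fail the depth test and are excluded even when their
    colour happens to match."""
    kept = []
    for i, (d, rgb) in enumerate(zip(depth, color)):
        color_ok = all(abs(c - h) <= color_tol for c, h in zip(rgb, hair_rgb))
        depth_ok = d >= 0 and abs(d - head_depth) <= depth_tol
        if color_ok and (depth_ok or d < 0):
            kept.append(i)
    return kept
```

Note that a -1 pixel whose colour matches the hair is still accepted, which is exactly why dark backgrounds beyond the sensor's range can introduce artifacts.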
With this method implemented, the hair is displayed on the screen with far less flickering. However, the method is only effective on light-coloured backgrounds; dark-coloured backgrounds that closely mimic hair colour would introduce many artifacts.
In conclusion, a firm understanding of all 3 core elements (the raw data streams, the WPF framework elements, and the Kinect framework) is required to produce a coherent output that the user can see and interact with.
A general overview of the Neural Network (NN) can be viewed at the IS480 Knowledge Base. The team initially had a hard time figuring out how to implement an NN for gender recognition, as there was no sample code available for the same purpose and methodology. Gender recognition using an NN is usually done via face tracking or voice recognition, so the team's method of using physical measurements estimated by the Kinect is genuinely exploratory.
After numerous futile attempts at passing in raw Kinect values and scratching our heads over why the parameters did not seem to converge, the team decided to try normalising the values. Our Statistics classes really came in handy, for results started coming out after normalisation of the inputs! The next challenge was to systematically conduct trial-and-error to fine-tune the learning rate (eta), momentum (alpha), and error threshold. Such fine-tuning required a lot of time, as the whole cycle of learning and testing had to be repeated whenever one of the variables was altered.
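The normalisation step that unblocked training can be illustrated in a few lines (a Python sketch of standard z-score normalisation; our NN itself is written in C#):

```python
def normalise(samples):
    """Scale each input column (e.g. height in mm, shoulder width in cm)
    to zero mean and unit variance, so the network's weighted sums stay
    in a range where the activation function does not saturate and the
    weights can converge."""
    cols = list(zip(*samples))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5
            for c, m in zip(cols, means)]
    return [[(v - m) / s if s else 0.0
             for v, m, s in zip(row, means, stds)]
            for row in samples]
```

After this step, raw measurements on very different scales become small, comparable numbers, which is what allowed the weights to start converging.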
The error threshold is also counter-intuitive. Ideally, we would like to lower the error threshold to zero. However, along the way we learnt that reducing it to zero would defeat the purpose of machine learning. Eliminating the error threshold can be compared to rote learning, where a certain output is expected for a certain input, and a different input will simply return no output, as the network will not be able to handle it. Trapped in a dilemma between "generalisation" and "specialisation", we are still striving to achieve the optimum error threshold.
Microsoft Tag is a high-capacity colour barcode, formed by clusters of coloured triangles, and is believed to be more responsive than a QR code. It was also chosen over the QR code because a Microsoft Tag is customisable: a company logo can be incorporated into the tag. The ease of changing the landing page linked to the tag is another strength, and the data retrieved from the Microsoft Tag Web Server upon scanning also promises the possibility of analytics. The challenge encountered in this area was in designing our own tag and integrating it with the server and AlterSense, as we had never worked on such a thing before.
Quality of Product
| Category | Deliverable | Link(s) |
|----------|-------------|---------|
| Project Management | Metrics | Schedule & Bug Metrics, Bug Log |
| | Minutes | Client Meeting, Supervisor Meeting |
| Content | Interaction flow chart | Flow Chart |
| | Narrative flow | Interactive Content Flow |
| Design | Use Case | Use Case |
| | Architectural diagram | Architectural Diagram |
| Testing | User Testing 1 documentation | UT1 |
The latest version of the code is always available on the SVN server and can be easily deployed to any machine, as long as the supporting software is installed. The hardware we are using is provided by CapitaMalls Asia and will be used for the actual deployment. Hence, our application is ready to be deployed at any time and will be usable by the target users (shoppers) with relative ease.
| Date | 24 September 2012, 12pm to 4.30pm |
|------|------------------------------------|
| Testers | 70 (31 females & 39 males) |
1. Get tester to stand at a specific distance away from the Kinect
In hindsight, testing a Software Engineering product seems so much simpler than what we have to do for AlterSense. It took me quite a while to come up with the gender recognition metrics, for we have to measure things that we are not very familiar with. Proper documentation of the metrics collected from User Testing 1 has been particularly useful in helping us improve AlterSense. The results helped us gauge which behavioural parameters to keep, refine, and/or discard to improve gender recognition. We have become more focused in the refinement of gender recognition and development, thanks to the metrics!
As we have to present so much of what we have done in the relatively short time of one hour during IS480 presentations, I am also learning to sieve the important things from the plethora of information. The amount of time we spent discussing the best way to present data, what to include in the presentation slides, and how to make our slides more visual has also made me conversant in coming up with appealing presentation slides.
Working on Project AlterSense is really about venturing into new terrain. I was initially not very comfortable working with C# (a new language), a new development environment (the Kinect with its data stream values and other properties), and definitely the Neural Network black box! We had to do a copious amount of research on the Neural Network just to grasp the concept. Implementing it was another immense challenge, for the professors we consulted had head knowledge of the concept, but none had actually implemented a Neural Network themselves. This project has pushed me to really "learn to learn".
There were times when things seemed bleak. This journey has not been a bed of roses, but truly all of us have managed to hold on this far because of one another. We encourage one another in various ways, from buying food to readily stepping in to help with another team member's task when that member seems to hit a cul-de-sac.
Six weeks of Project AlterSense have trained me to really multi-task. On top of completing my own share of the work, I have to keep an eye on the rest of the team members' tasks, to track progress and be conscientious in following up. I need to be able to fit together the jigsaw pieces from all the iterations to form the big picture and communicate it to team members. I have also really experienced the strength of iterative development, namely the flexibility and ability to respond quickly to changes, which has enabled the team to progress steadily despite scope changes and unexpected delays in setting up the Neural Network.
I am also exercising stakeholder management and, in fact, even team member management skills. I have to figure out what makes a certain member tick and leverage it to influence that member towards accomplishing the team's common goal. Even allocating tasks has not been an easy feat, for each team member's capability and commitment have to be taken into account. In summary, I have become more sensitive to different personalities and working styles, and have witnessed how proper documentation really helped me in planning and reviewing progress. On a more technical note, I was thrilled to explore artificial intelligence in the form of the Neural Network. For the coming 7 weeks, I hope to motivate the team to achieve more consistent, tighter communication, which will help all members comprehend the state of the project and therefore respond quickly and appropriately.
The FYP journey has been an enriching experience. I have learnt many things about the Kinect from development and experimentation, and hope to explore further as the project progresses. I have dabbled in new areas and languages, such as C#, and created post-processing methods that involve manipulating RGB pixels and raw data streams from a 3-dimensional perspective.
In addition, my involvement with stakeholders from various departments has provided me with real-life experience in project management and requirements gathering. IS480 has allowed me to experience a full project lifecycle, leveraging the skill sets and knowledge honed in the early years of my studies.
Project AlterSense is my first real encounter with Artificial Intelligence, in the form of the Neural Network (NN). We have come a long way from disentangling the cobweb of questions regarding the NN. Exploring the black box and tweaking it, predicting the outcome and watching it materialize, or getting surprised by the results evokes a certain sense of joy (curiosity, more like it).
I have also picked up tips on presenting in a corporate setting from Ethan, the Manager of Business Process at CapitaMalls. Interacting with professionals from CapitaMalls has trained my networking skills and will definitely give me the confidence to interact with business partners in the future.