IS480 Team wiki: 2009T2 Mob5tars
Our Team and Project Roles
Members | Roles | Project Management Responsibilities
Saurav | Project Manager |
Sunita | Lead Developer Front End |
Shaun | Lead Developer Back End |
Palak | Document Manager |
Nayeem | Quality Assurance Manager |
Stakeholders
Stakeholders | Name | Company | Interest
Client | Mr. Paul Lim | Consultant, iForce Consulting Pte Ltd. | iForce Consulting Pte Ltd is a business technology firm that enables companies to achieve sustained competitive advantage through services ranging from strategic IT recommendations to solution implementation. Our client is keen to understand the emerging concept of search engine optimization that leverages cloud computing, which could later be incorporated into new software for its own customers.
Supervisor | Mr. Kevin Steppe | Lecturer of Information Systems, SMU | Project supervisor who will advise and supervise the team.
Technical Advisor | Mr. Paul Lim | Consultant, iForce Consulting Pte Ltd. | Mr. Lim has extensive knowledge of the Python programming language and the Django framework, on which the new search engine will be developed.
Developers | Mob5tars | SMU SIS | A group of SIS students who want to learn more about new technologies and deliver a comprehensive, high-quality search engine.
Project Overview
Project Description
The ever-changing World Wide Web is growing too fast for any single search engine to crawl and index in its entirety. Each search engine crawls only a small fraction of the web and indexes that content in a database. Owing to the immense size of the web, a user may have to use multiple search engines to find what he or she wants. Because these search engines have different interfaces and different advanced search capabilities, users have to spend time learning and accessing each one individually in order to gather comprehensive search results. Meta-search engines have been introduced to make this task simpler for users.
A meta-search engine harnesses the power of different search engines to provide better coverage of the web. For a single query, a meta-search engine transmits the query to multiple search engines and aggregates the results before they are displayed to the user.
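The fan-out-and-aggregate behaviour described above can be illustrated with a minimal sketch. It uses only the Python standard library; `engine_a`, `engine_b` and `meta_search` are hypothetical names, and the two engine functions return canned results in place of real search engine calls. It is not the team's implementation and does not use any Google App Engine specific API.

```python
from concurrent.futures import ThreadPoolExecutor


def engine_a(query):
    # Hypothetical stand-in for a real search engine client (canned results).
    return [
        {"url": "http://example.com/a", "title": "A: " + query},
        {"url": "http://example.com/shared", "title": "Shared page"},
    ]


def engine_b(query):
    # Second hypothetical engine; it shares one URL with engine_a.
    return [
        {"url": "http://example.com/shared", "title": "Shared page"},
        {"url": "http://example.com/b", "title": "B: " + query},
    ]


def meta_search(query, engines):
    """Send one query to every engine and merge the results by URL."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # pool.map keeps the order of `engines`, so the merge is deterministic.
        result_lists = list(pool.map(lambda engine: engine(query), engines))

    merged, seen = [], set()
    for results in result_lists:
        for hit in results:
            if hit["url"] not in seen:  # drop duplicates across engines
                seen.add(hit["url"])
                merged.append(hit)
    return merged


if __name__ == "__main__":
    for hit in meta_search("python", [engine_a, engine_b]):
        print(hit["url"], "-", hit["title"])
```

Running the engines in parallel means the total wait is roughly that of the slowest engine rather than the sum of all of them, which matters once the stubs are replaced with real network calls.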
Our meta-search engine aims to provide users with a comprehensive search experience through a web application written in the Python programming language and hosted on a cloud computing platform, Google App Engine. With an intuitive UI that promotes the use of advanced search features, such as Boolean operators and domain-name restricted search, we aim to enable users to fully harness the capabilities of modern search engines. Search results are filtered and ranked by an algorithm that takes into account the ranking each search engine gives to each page, providing users with not just a combination of search results but a ranking of pages whose relevancy is determined from a combination of search engine algorithms.
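One simple way to combine per-engine rankings into a single order is a Borda-style count, sketched below. This is an illustrative technique chosen for the example, not the project's actual ranking algorithm (which is described in the appendices of the proposal). The function name `fuse_rankings` and the sample URLs are assumptions.

```python
from collections import defaultdict


def fuse_rankings(rankings):
    """Combine ranked URL lists from several engines with a Borda-style count.

    `rankings` maps an engine name to its list of URLs, best result first.
    A URL at position p in a list of n results earns n - p points from that
    engine; URLs an engine did not return earn nothing from it.
    """
    scores = defaultdict(int)
    for ranked_urls in rankings.values():
        n = len(ranked_urls)
        for position, url in enumerate(ranked_urls):
            scores[url] += n - position
    # Highest score first; ties are broken alphabetically to stay deterministic.
    return sorted(scores, key=lambda url: (-scores[url], url))


if __name__ == "__main__":
    sample = {
        "engine_a": ["u1", "u2", "u3"],
        "engine_b": ["u2", "u4", "u1"],
    }
    print(fuse_rankings(sample))
```

In the sample, `u2` is placed first because both engines rank it highly, even though neither the first nor the second engine alone would settle the order of the remaining pages.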
Scope
The project will focus on these key areas:
- Search Logic: The search logic involves running a search query on the Bing, Yahoo and Google search engines from Google App Engine. Refer to Appendix C for a detailed explanation of the Search Logic functionality.
- Screen Scraping: Extracting all the search results that appear on the Bing, Yahoo and Google search engines for a particular search query. Refer to Appendix C for a detailed explanation of the Screen Scraping functionality.
- Filtering/Ranking: Ranking the search results retrieved from the Bing, Yahoo and Google search engines using ranking algorithms. This also involves filtering the retrieved search results. Refer to Appendices C and D for a detailed explanation of the Filtering and Ranking functionality.
- Caching: Results for repeated search queries are cached for improved efficiency and faster response times.
- GUI: A user-friendly GUI will be designed to promote the use of advanced search features and to improve the efficiency and accuracy of the search process.
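The Screen Scraping and Caching items above can be sketched together in a few lines of standard-library Python. Everything here is an assumption made for illustration: the names `LinkExtractor`, `scrape_links` and `QueryCache`, the 300-second lifetime, and the idea that results are plain `<a>` links. Real result pages need engine-specific parsing, and the detailed design is in Appendix C rather than in this sketch.

```python
import time
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs for every <a href=...> in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> currently being read, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def scrape_links(html):
    """Return all (url, text) pairs found in an HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


class QueryCache:
    """In-memory cache that reuses the results of a repeated query."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}    # query -> (time stored, results)

    def get_or_compute(self, query, compute):
        now = time.monotonic()
        entry = self._store.get(query)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]             # fresh hit: skip the expensive search
        results = compute(query)
        self._store[query] = (now, results)
        return results


if __name__ == "__main__":
    page = '<div><a href="http://example.com/1">First <b>result</b></a></div>'
    cache = QueryCache()
    print(cache.get_or_compute("demo", lambda q: scrape_links(page)))
```

A second call to `get_or_compute` with the same query within the lifetime returns the stored list without calling `compute` again, which is the saving the Caching item refers to.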
Milestones and Major Deliverables
Team Motivation
Our team selected this project for the following reasons:
- Challenging: We are motivated to learn technologies, both new and old, that we are not familiar with. We believe that it is good and essential to be equipped with the skills and experience of working with fresh technologies while also being well versed in older programming languages such as Python. Furthermore, the project deals with unfamiliar technology such as Google App Engine, which will certainly put our technical skills as well as our project management skills to the test.
- Emerging Technology: Cloud computing is gradually being adopted by various businesses, ranging from banks to online businesses like Amazon.com. Hence, this project will help the team to become familiar with and gain a strong understanding of the concept of cloud computing, which will be useful in our future careers.
- Advanced Search Engine: There are small-scale search engines that currently leverage cloud computing to retrieve results from existing traditional search engines like Yahoo.com and Google.com. However, none of these search engines are equipped with advanced search features such as searching for images and videos, and most retrieve only the top 10 results for a particular search query. Hence, this project aims to bridge the gap between traditional meta-search engines and operating platforms to produce an improved search engine while reaping the benefits of cloud computing.
- Project Management: This project involves learning new technologies and exposure to a new coding language and framework with which the team is unfamiliar. Hence, the risks involved are high since the team will be venturing into uncharted territory. However, this situation presents an opportunity to improve our project management skills through effective risk mitigation and time management strategies.
Project Proposal
Media:IS480_FYP Proposal_Mob5tars.pdf
Risks and Mitigation
Project Risk Analysis
Risk Factor | Comments/Description | Impact | Probability | Risk Exposure | Risk Treatment
Time Management | Work on our final year project started early in December; however, not all members were present during the holiday period. As our project involves new technologies, it is important to factor in the time needed to transfer skills and knowledge to group members who were not present. | 3 | 3 | 5 | A project tutor has been appointed to consolidate all the required study notes and to facilitate information sharing.
Misunderstanding of project requirements | Inexperience and unfamiliarity with the new technologies used may lead team mates to understand the project requirements differently. | 3 | 3 | 5 | Ensure that the project requirements are constantly checked with the client. Hold frequent meetings with the client to update them on our progress.
Amendments and modifications | While working on the application, changes may be made to the design. As such, all affected parts of the project must be modified. | 2 | 3 | 4 | Ensure that documentation and commenting are done properly and can accommodate changes. Being far-sighted and thinking ahead during the design stage would also help to minimize errors.
Improper documentation SOP | As an iterative approach is adopted, some functions may exist in multiple files, so the wrong version of a particular function may be updated. Confusion may arise and result in inefficiency. | 3 | 2 | 4 | Make sure that all files are backed up, and comment the files with dates so that confusion is avoided when updating files.
Team Oriented | The whole of December is allocated to getting familiar with the technologies. The team is divided into 2 groups that concentrate on different technologies in the beginning. Halfway through December, the groups will cross-teach each other. Towards the 3rd week of December, there will also be a tutorial session with the sponsor to improve familiarity with the technologies. | 2 | 3 | 5 | We hold study sessions in the month of December. We research good books that will give us a good head start on the new technologies. The sessions are led by Shaun to help everyone pick up the skills.
Team Metrics
Team Confidence Level Metric
Our team has decided to do a weekly peer evaluation to assess our performance at both the team and the individual level. We have created a Google Form that each member fills in every Saturday night, reflecting on the week's progress. The results are then collated on a weekly basis.
Meeting Productivity and Satisfaction Confidence Levels
Individual Commitment and Contribution Confidence Level
Project Progress So Far
Project Status
The project is on schedule, as both the Inception and Elaboration phases were completed on time. We are currently in Construction Phase 2. Our deliverable at the end of Construction Phase 2 will be an integrated meta-search engine that can perform basic web queries using at least 2 search engines. This will be followed by Construction Phases 3 and 4, which will focus on building the audio and video functionalities. Subsequently, Transition Phases 1 and 2 will focus on improving the user interface, along with User Acceptance Testing.
Project Progress
Midterm Reflection
Media:IS480_FYP_Mid Term Reflection_Mob5tars.pdf
Meeting Minutes
- Meeting Minutes 1 (Media:FYP_Meeting_Minutes_1_051209.pdf)
- Meeting Minutes 2 (Media:FYP_Meeting_Minutes_2_121209.pdf)
- Meeting Minutes 3 (Media:FYP_Meeting_Minutes_3_181209.pdf)
- Meeting Minutes 4 (Media:FYP_Meeting_Minutes_4_221209.pdf)
- Meeting Minutes 5 (Media:FYP_Meeting_Minutes_5_281209.pdf)
- Meeting Minutes 6 (Media:FYP_Meeting_Minutes_6_291209.pdf)
- Meeting Minutes 7 (Media:FYP_Meeting_Minutes_7_070110.pdf)
- Meeting Minutes 8 (Media:FYP_Meeting_Minutes_8_200110.pdf)
- Meeting Minutes 9 (Media:FYP_Meeting_Minutes_9_230110.pdf)
- Meeting Minutes 10 (Media:FYP_Meeting_Minutes_10_270110.pdf)
- Meeting Minutes 11 (Media:FYP_Meeting_Minutes_11_030210.pdf)
- Meeting Minutes 12 (Media:FYP_Meeting_Minutes_12_100210.pdf)
- Meeting Minutes 13 (Media:FYP_Meeting_Minutes_13_220210.pdf)
- Meeting Minutes 14 (Media:FYP_Meeting_Minutes_14_010310.pdf)
Resources and Training Guides
Websites:
- Screen Scraping Tutorial: http://tinyurl.com/yhhxvtt
The above website contains videos on screen scraping, which will be used to retrieve the search results from the Google and Yahoo search engines.
- Python Online Tutorial: http://tinyurl.com/yjjgmey
The above website contains tutorial videos on learning Python, which will be used extensively for coding our algorithms.
- Building a Search Engine with GAE and Yahoo: http://tinyurl.com/cu6nrm
This website presents a rough guide to building a meta-search engine using GAE.
- Using GAE as your Own Content Delivery Network: http://tinyurl.com/5ecxvp
This website contains information on setting up your application on GAE.
Training Aids:
- Regular Expressions Cookbook: http://tinyurl.com/yhs5t75
The regular expressions cookbook will be invaluable for string handling (e.g. stripping text from a document) and will save a lot of programming time.
- Programming Google Application Engine 2009
This document presents a detailed overview of building a web application on Google App Engine.
- Dive into Python: http://diveintopython.org/
This website contains detailed instructions and explanations on coding in the Python programming language.
- Regular expressions cheat sheet: http://tinyurl.com/5p6v4r
The regular expressions cheat sheet is a guide to regular expression patterns that will make programming easier and more efficient, saving time.
- Python cheat sheet: http://tinyurl.com/bfs5ja
The Python cheat sheet serves as a quick reference guide to the Python programming language, which will help to save time.
Research:
- Using a Ranking Algorithm based on a Markov Model
Information on designing and building effective ranking algorithms to rank the search results
- Segmentation of Search Engine Results for Effective Data fusion
An approach to improve the ranking of search results
- Search Engine Prototype based on cloud computing
Presents the advantages of cloud computing in relation to search engines
- A meta search engine that learns which search engines to query
A dynamic approach to improving the process of querying a search engine.
- Tadpole: Meta Search Engine: Evaluation of Meta Search ranking strategies
An evaluative guide to existing search engine ranking strategies
- Evaluation of Result merging strategies for Metasearch Engines.
A guide to merging search results effectively and subsequently ranking the results
- A study of Blog Search
A guide to understanding how searching on a blog works
- A meta search engine with Hits redundancies Filtering
A research article on the filtering process for meta-search engines
- A Fuzzy Search engine weighted approach to result merging for Metasearch
A guide to combining search results from different search engines and ranking them effectively to improve efficiency and accuracy.