IS480 Team wiki: 2018T1 analyteaka research
Under the PDPA’s guidance, we are not legally obligated to safeguard personal data. However, we follow its best-practice tips by exploring how to:
1. Set out how the personal data in our custody may be well protected.
2. Classify the personal data to better manage housekeeping.
3. Set clear timelines for retaining the various kinds of personal data, and cease to retain documents containing personal data that is no longer required for business or legal purposes.
4. For transfers of personal data overseas, use contractual agreements with the organisations involved in the transfer to provide a comparable standard of protection overseas.
The classification below is based on our interpretation of Federal Information Processing Standards (FIPS) Publication 199, published by the National Institute of Standards and Technology (as summarised by Carnegie Mellon University), and reflects the level of impact on the company if confidentiality, integrity, or availability is compromised.
Potential Impact table

Security Objective | Low | Moderate | High |
---|---|---|---|
Confidentiality | Leakage of information could be expected to have a limited adverse effect on the company’s operations, assets, or individuals. | Leakage of information could be expected to have a serious adverse effect on the company’s operations, assets, or individuals. | Leakage of information could be expected to have a severe or catastrophic adverse effect on the company’s operations, assets, or individuals. |
Integrity | Unauthorized modification or destruction of information could be expected to have a limited adverse effect on the company’s operations, assets, or individuals. | Unauthorized modification or destruction of information could be expected to have a serious adverse effect on the company’s operations, assets, or individuals. | Unauthorized modification or destruction of information could be expected to have a severe or catastrophic adverse effect on the company’s operations, assets, or individuals. |
Availability | Disruption of the information or system could be expected to have a limited adverse effect on the company’s operations, assets, or individuals. | Disruption of the information or system could be expected to have a serious adverse effect on the company’s operations, assets, or individuals. | Disruption of the information or system could be expected to have a severe or catastrophic adverse effect on the company’s operations, assets, or individuals. |
Based on the above tips and impact levels, we decided to split the data into five classes:
- Class 1 contains at least two high-impact ratings
- Class 2 contains at least one high-impact rating
- Class 3 contains at least one moderate-impact rating
- Class 4 contains no high- or moderate-impact ratings
- Class 5 contains no high- or moderate-impact ratings and is easily accessible public data
Class Level | Description | Example | Action |
---|---|---|---|
1 | Highly confidential data | CVV code, credit card number | Never stored or processed. |
2 | Uniquely personally identifiable information | Fingerprints, eye scans, session tokens, NRIC, passwords | Never stored; processed and then discarded. |
3 | Personally identifiable information | Date of birth, email, address | Store only the hashed value. |
4 | Non-personally identifiable information | State, city, region, subzone | Can be stored as is. |
5 | Publicly available website content | Item details, category, item price | Can be stored as is. |
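To make the classification rules concrete, here is a minimal Python sketch of how a field’s class level could be derived from its impact ratings, and what may be persisted for each class. The `classify` and `store_value` functions, the SHA-256 choice for class-3 hashing, and the example ratings are our own illustrative assumptions, not a prescribed implementation.

```python
import hashlib

def classify(impacts, public=False):
    """Assign a data class (1-5) from a field's impact ratings.

    impacts: ratings for (confidentiality, integrity, availability),
             each "low", "moderate", or "high".
    public:  True if the data is easily accessible public data.
    """
    highs = sum(1 for i in impacts if i == "high")
    moderates = sum(1 for i in impacts if i == "moderate")
    if highs >= 2:
        return 1                    # highly confidential: never store or process
    if highs >= 1:
        return 2                    # process then discard; never store
    if moderates >= 1:
        return 3                    # store only a hashed value
    return 5 if public else 4       # can be stored as is

def store_value(value, data_class):
    """Return what may be persisted for a value of the given class."""
    if data_class in (1, 2):
        return None                 # never stored
    if data_class == 3:
        # illustrative only: real PII hashing should also use salting
        return hashlib.sha256(value.encode()).hexdigest()
    return value                    # classes 4 and 5: stored as is

# hypothetical field: email is PII with a moderate confidentiality impact
email_class = classify(("moderate", "low", "low"))   # class 3
```

The thresholds mirror the bullet rules above; any field with even one high-impact rating is kept out of storage entirely.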
Data Analytics is the process of examining data sets in order to draw conclusions about the information they contain, increasingly with the aid of specialized systems and software.
Typical mechanisms: Database (data only)
Typical timeframe: Offline
The outcome of analytics is informed business decisions that verify or disprove scientific models, theories, and hypotheses. The typical goals are to improve efficiency, optimise processes, increase revenue, etc.
The hardest part of an analytics project is asking the right question. As Robert Half once said, "Asking the right questions takes as much skill as giving the right answers."
Analytics type | Focus |
---|---|
Descriptive analytics | Insight into the past |
Predictive analytics | Understanding the future |
Prescriptive analytics | Advice on possible outcomes |
Based on the above details, our modules are split into the respective sections:
Descriptive:
- Customer profile module
- Store profile module
- Staff profile module
Predictive:
- Machine Learning
Prescriptive:
- Data visualisation module
- Analytics and reporting module
What's Machine Learning?
In a nutshell, machine learning means using algorithms to parse data, learn from it, and then make a determination or prediction. More specifically, it is a field of computer science that uses statistical techniques to give computer systems the ability to “learn” (i.e. progressively improve performance on a specific task) from data, without being explicitly programmed.
Misconceptions
However, there are some misconceptions about machine learning:
- It is not logic-based; it is statistics-based.
- It is not a solution without proper understanding and expectations.
- AI vs ML vs deep learning (Source): AI is human intelligence exhibited by machines; ML is an approach to achieving AI; deep learning is a technique for implementing ML.
- Lastly, there is nothing new about the concept of machine learning (it has existed since as early as the 1950s). It has just become much more relevant due to the rise of IoT devices and the potential to store endless data.
The types of machine learning algorithms (Source)
Machine learning algorithms can be broken down into four main categories.
1. Supervised learning. In supervised learning, the training dataset contains both the inputs and the desired outputs, called labels. In other words, the dataset contains examples of the answers we desire. An example would be a spam filter: it is provided with many example emails along with their labels (spam or not spam) as it learns to classify new emails.
Supervised learning can also be used to predict a numeric value based on a set of features (potential variables that affect the end result). An example would be a sales price: it depends on the season, location, targeted segments, and cost, so the training set would need to contain both the sales price and the features. The training process continues until the model achieves the desired level of accuracy on the training data. For our case, we split the dataset 70/15/15 into training/testing/validation sets.
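The 70/15/15 split mentioned above can be sketched in plain Python as follows; the fixed shuffle seed and the toy dataset are illustrative assumptions (in practice a library utility would typically be used).

```python
import random

def split_dataset(records, seed=42):
    """Shuffle and split records into 70% train, 15% test, 15% validation."""
    rng = random.Random(seed)       # fixed seed so the split is reproducible
    shuffled = records[:]           # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * 0.70)
    n_test = int(n * 0.15)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validate = shuffled[n_train + n_test:]   # remainder, roughly 15%
    return train, test, validate

# toy dataset of 100 records
train, test, validate = split_dataset(list(range(100)))
# len(train), len(test), len(validate) == 70, 15, 15
```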
Example of supervised learning algorithms:
1. Linear regression
2. Logistic regression
3. Support vector machines
4. Decision trees and random forest
5. k-nearest neighbors
6. Neural network
2. Unsupervised learning. The training data is unlabelled, and the system tries to discover structure in it on its own (e.g. clustering customers into segments).
3. Semi-supervised learning. The training data is only partially labelled, typically a large amount of unlabelled data combined with a small amount of labelled data.
4. Reinforcement learning. An agent learns by interacting with an environment, receiving rewards or penalties for its actions and refining its strategy over time.
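As a small taste of the supervised algorithms listed above, here is a minimal k-nearest-neighbors classifier in plain Python. The two made-up 2-D clusters and the default k=3 are illustrative assumptions, not our actual model.

```python
from collections import Counter
import math

def knn_predict(train, point, k=3):
    """Classify `point` by majority vote among its k nearest training points.

    train: list of ((x, y), label) pairs.
    """
    # sort labelled points by Euclidean distance to the query point
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], point))
    # majority vote among the k closest labels
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

# made-up labelled points forming two clusters
train = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
         ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]
print(knn_predict(train, (0.5, 0.5)))  # -> a
print(knn_predict(train, (5.5, 5.5)))  # -> b
```

Note that k-NN has no real training phase: it simply stores the labelled examples and defers all work to prediction time, which is why it is often the first supervised algorithm taught.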
Other references
Lets talk about Machine learning (30 mins)
Google machine learning crash course (15 hours)
Artificial Intelligence for Business – Doug Rose
Machine Learning for Absolute Beginners
Deep Learning with Python (Highly Technical)
Stanford Article on 100 years of AI
AI, Deep learning and ML : A Primer