SMT483G2: AThings Overview
Revision as of 15:29, 26 September 2020
Project Background
Recent advances in the Internet of Things (IoT) have enabled a myriad of smart applications, such as smart homes, smart transportation, smart environments, and smart healthcare. According to Statista (2017), the number of smart devices worldwide is estimated to reach 75.44 billion by 2025. These devices are typically equipped with sophisticated sensors, such as temperature, humidity, light, face, and motion sensors. Because the data these devices generate and the operations they can perform tend to be privacy-, security-, and safety-sensitive, the applications that operate and interact with them have become a highly attractive attack surface for attackers. Kaspersky (2019) reported more than 100 million cybersecurity attacks on IoT devices and applications in the first six months of 2019 alone.
Security issues in IoT applications can easily lead to serious physical, financial, and psychological harm. For example, a malicious IoT app could take control of a smart car and threaten people's lives. This project contributes to the advancement of information technology by detecting anomalies in IoT applications that could cause such catastrophic effects.
After detecting anomalous behaviors of IoT applications and generating random test cases, it is important to identify the sequences of events involved, so as to systematically deal with the inter-dependencies and diverse nature of the IoT ecosystem.
Summary
Scope: Applications on the web-based SmartThings IDE2 platform
Aim: To identify anomalous sensitive operations performed by IoT applications, i.e. operations that require access to sensitive information and actions, such as permission to send SMS or opening smart locks
Phase I
During Phase I of the project, program analysis is used to generate valid inputs and track the coverage of sinks. The output, in the form of log files, shows the sequences of generated events and the corresponding actions performed by the application, for the tester to review for possible anomalies.
In this stage, an automatic testing tool was built to generate tests for the sensitive operations performed by apps. Compared with an ad-hoc testing approach, the tool improves the coverage of sinks (the points that sensitive operations tap on, so higher sink coverage widens the scope of the test cases performed) by 184% while producing 20% fewer test cases.
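To make the two Phase I figures concrete, the arithmetic below reads "improves the coverage of sinks by 184%" as a relative increase over the ad-hoc baseline. This is an assumption: the page does not state whether the figure is relative or in percentage points, and the symbols C (sink coverage) and N (number of test cases) are introduced here only for illustration.

$$
C = \frac{|\text{sinks reached}|}{|\text{sinks}|}, \qquad
\frac{C_{\text{tool}} - C_{\text{ad hoc}}}{C_{\text{ad hoc}}} = 1.84 \;\Rightarrow\; C_{\text{tool}} = 2.84\, C_{\text{ad hoc}}, \qquad
N_{\text{tool}} = (1 - 0.20)\, N_{\text{ad hoc}} = 0.8\, N_{\text{ad hoc}}
$$

Under this reading, the tool reaches roughly 2.8 times as many sinks as the ad-hoc approach while generating four test cases for every five the ad-hoc approach would.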
Phase II
week 1 - week 7
In part 1 of Phase II, we process the log files handed over from Phase I and prepare them for topic modelling, which extracts the most dominant topics across the apps. Each topic generated by topic modelling is a combination of words, and each app can be viewed as a probability distribution over topics. These probability distributions are then fed into unsupervised high-dimensional clustering algorithms, which produce clusters based on the similarity between the apps' topic distributions.
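Since each app is treated as a probability distribution over topics, "similarity between topic distributions" needs a distance between distributions. The sketch below uses the Jensen-Shannon distance, one common choice for comparing probability distributions; the page does not say which measure the project uses, so this is an assumption, and the app names and distributions are hypothetical.

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_distance(p, q):
    """Jensen-Shannon distance: square root of the base-2 JS divergence, bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture; m_i > 0 wherever p_i or q_i > 0
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))  # guard against tiny negative rounding error

# Hypothetical topic distributions for three apps over K = 3 topics.
apps = {
    "app_a": [0.80, 0.15, 0.05],
    "app_b": [0.75, 0.20, 0.05],
    "app_c": [0.05, 0.10, 0.85],
}

names = list(apps)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, round(js_distance(apps[a], apps[b]), 3))
```

Apps that put most of their probability mass on the same topics (here `app_a` and `app_b`) end up close together, which is the property a clustering step relies on.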
In summary, we are building our clustering models in 3 steps:
- Data Analysis: generating statistics from the raw data (log files) to understand its parameters and data format.
- Natural Language Processing: the log files are processed to remove noise, and LDA (Latent Dirichlet Allocation) is performed for topic modelling across all apps, producing the dominant topics and each app's probability distribution over these topics.
- App Clustering: each app's probability distribution over topics is fed into a clustering algorithm, which groups apps based on the similarity of their probability distributions.
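The three steps can be sketched end to end in plain Python. This is a minimal illustration under several assumptions: the noise filter (`preprocess`, `STOPWORDS`) and the sample log strings are hypothetical, LDA is fitted here with collapsed Gibbs sampling (one standard fitting method; the project may use a library instead), and k-means stands in for the unspecified clustering algorithm.

```python
import random
import re

# Hypothetical noise filter: lowercase, keep alphabetic tokens of length >= 3, drop stopwords.
STOPWORDS = {"the", "and", "for", "with", "was", "event", "app"}

def preprocess(log_text):
    tokens = re.findall(r"[a-z]+", log_text.lower())
    return [t for t in tokens if len(t) >= 3 and t not in STOPWORDS]

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one topic distribution (theta) per document."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    doc_topic = [[0] * K for _ in docs]       # n_{d,k}: tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]  # n_{k,w}: times word w is assigned to topic k
    topic_total = [0] * K                     # n_k: tokens assigned to topic k
    assignments = []
    for d, doc in enumerate(docs):            # random initial topic for every token
        z_doc = []
        for w in doc:
            k = rng.randrange(K)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], assignments[d][i]
                doc_topic[d][k] -= 1          # remove the current token from the counts
                topic_word[k][wid] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = j | all other assignments), up to a constant.
                weights = [(doc_topic[d][j] + alpha) * (topic_word[j][wid] + beta)
                           / (topic_total[j] + V * beta) for j in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][wid] += 1
                topic_total[k] += 1
    # Point estimate of each document's topic distribution from the final sample.
    return [[(doc_topic[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
            for d, doc in enumerate(docs)]

def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means with squared Euclidean distance; returns a cluster label per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
                  for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical log excerpts, one string per app.
logs = {
    "lock_app": "unlock door lock unlock front door lock",
    "door_app": "lock door unlock door lock front",
    "sms_app": "send sms message notify phone sms",
    "notify_app": "notify phone message send sms alert",
}
docs = [preprocess(text) for text in logs.values()]  # step 2a: noise removal
theta = lda_gibbs(docs, num_topics=2)                 # step 2b: LDA topic distributions
labels = kmeans(theta, k=2)                           # step 3: app clustering
for name, dist, label in zip(logs, theta, labels):
    print(name, [round(x, 2) for x in dist], "cluster", label)
```

Because every row of `theta` sums to 1, the apps live on a probability simplex; a distribution-aware distance such as Jensen-Shannon could replace the Euclidean distance inside `kmeans` if that suits the data better.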
week 8 - week 13
About Our Data