Difference between revisions of "SMT483G2: AThings Overview"

From SMT Project Experience

Revision as of 15:27, 29 September 2020






Project Background


Recent advances in the Internet of Things (IoT) have enabled a myriad of smart applications such as smart home, smart transportation, smart environment, and smart healthcare. According to Statista (2017), the number of smart devices around the world is estimated to reach 75.44 billion in 2025. These devices are typically equipped with sophisticated sensors that detect temperature, humidity, light, faces, and motion. The amount of data these devices generate and the kind of operations they can perform tend to be privacy-, security-, and safety-sensitive. Thus, applications operating and interacting with these devices have become a highly attractive attack surface. Kaspersky (2019) reported more than 100 million cybersecurity attacks on IoT devices and applications in the first six months of 2019 alone.

Security issues in IoT applications could easily lead to serious physical, financial, and psychological harm. For example, a malicious IoT app can take over control of a smart car and threaten people's lives. This project contributes to the advancement of information technology by detecting anomalies in IoT applications that could cause such catastrophic effects.

After detecting anomalous behaviors of IoT applications and generating random test cases, it is important to find the sequence of events in order to deal systematically with the inter-dependencies and diverse nature of the IoT ecosystem.

Summary


Scope: Applications on web-based SmartThings IDE2 platform

Aim: To identify anomalous sensitive operations performed by IoT applications. Sensitive operations are those that require access to sensitive information and actions, such as the permission to send SMS or the opening of smart locks.

Phase I

During Phase I of the project, program analysis is used to generate valid inputs and track coverage of sinks. The output, in the form of log files, shows sequences of generated events and the corresponding actions performed by the application, to be reviewed by the tester for possible anomalies.

In this stage, an automatic testing tool was built to generate tests for the sensitive operations performed by apps. Compared to an ad-hoc testing approach, it improves coverage of sinks (the points that sensitive operations tap on) by 184%, which widens the scope of the test cases performed, and it produces 20% fewer test cases.

Phase II

week 1 - week 7

In part 1 of Phase II, we process the log files handed over from Phase I and prepare them for topic modeling, from which we obtain the most dominant topics across the apps. Each topic generated by topic modeling is a combination of words, and each app can be viewed as a probability distribution over topics. These probability distributions are then fed into unsupervised high-dimensional clustering algorithms, which produce clusters based on the similarity between the topic distributions of the apps.

In summary, we are building our clustering models in 3 steps:

  1. Data Analysis: generating statistics and an understanding of the raw data (log files), and finding the noise in the data set.
  2. Natural Language Processing: the log files are processed to remove the noise in the data, and LDA is performed for topic modeling across all apps, producing the dominant topics and, for each app, a probability distribution over these topics.
  3. App Clustering: the probability distribution over topics for each app is fed into the clustering algorithm, which groups apps based on the similarity of their probability distributions.
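
As a rough illustration of step 3, the sketch below clusters apps by their topic distributions. The page does not name the clustering algorithm used, so plain k-means with Euclidean distance is only a stand-in here. The app names and topic probabilities are made up for the example, and the LDA step that would produce them is not shown.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two topic distributions."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def kmeans(points, k, iterations=20):
    """Plain k-means. The first k points serve as initial centroids,
    which keeps the example deterministic (an assumption, not a tuned choice)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its closest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Recompute each centroid; keep the old one if a cluster is empty.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            new_centroids.append(mean_vector(members) if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical per-app topic distributions (each row sums to 1).
topic_distributions = {
    "lock_app": [0.80, 0.15, 0.05],
    "sms_app": [0.10, 0.85, 0.05],
    "door_app": [0.75, 0.20, 0.05],
    "alert_app": [0.05, 0.90, 0.05],
}

names = list(topic_distributions)
labels, centroids = kmeans([topic_distributions[n] for n in names], k=2)

# Group app names by cluster label.
clusters = {}
for name, label in zip(names, labels):
    clusters.setdefault(label, []).append(name)
print(clusters)
```

With real data, the number of clusters and the distance measure would need to be chosen experimentally; a divergence designed for probability distributions may suit topic vectors better than Euclidean distance.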



week 8 - week 13

In the second part of phase II, we will be experimenting with the clustering model, exploring other algorithms, and testing our model with more data if time permits.

The main focus here is to compare malicious app behavior with trusted app behavior in order to identify malicious apps. For any new app, we will first classify it into one of the existing clusters. Next, we will examine the operations performed by the app and compare them to the range of operations that the legitimate apps in that cluster perform, in order to identify malicious behavior.
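
The comparison described above could look roughly like the following sketch. It assumes each cluster is summarized by a centroid over topics and by the set of operations observed in its trusted apps. The cluster names, operation names, and numbers are all hypothetical, and nearest-centroid assignment is one possible way to classify a new app, not necessarily the one the project uses.

```python
def nearest_cluster(distribution, centroids):
    """Return the name of the cluster whose centroid is closest (Euclidean)."""
    def squared_distance(name):
        return sum((a - b) ** 2 for a, b in zip(distribution, centroids[name]))
    return min(centroids, key=squared_distance)


def unexpected_operations(app_operations, trusted_operations):
    """Operations the app performs that no trusted app in the cluster performs."""
    return sorted(set(app_operations) - set(trusted_operations))


# Hypothetical cluster summaries built from trusted apps.
centroids = {
    "locks": [0.80, 0.15, 0.05],
    "messaging": [0.10, 0.85, 0.05],
}
trusted_operations = {
    "locks": {"lock", "unlock"},
    "messaging": {"sendSms", "sendPush"},
}

# Hypothetical new app: its topic distribution and the operations it performed.
new_app = {
    "topics": [0.70, 0.20, 0.10],
    "operations": ["lock", "unlock", "sendSms"],
}

cluster = nearest_cluster(new_app["topics"], centroids)
suspicious = unexpected_operations(new_app["operations"], trusted_operations[cluster])
print(cluster, suspicious)
```

An operation flagged this way is only a candidate for review; whether it is actually malicious still has to be judged by the tester.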


About Our Data


Test cases have been automatically generated for each of the apps that we are interested in (60 in total) using the tool built in Phase I, and the test cases for each app are compiled into a single log file.

Each log file is a txt file that records, as key-value pairs, the identity of the app, its operations, the test case duration, and the events that are triggered during the test.
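
A minimal sketch of reading such a log is given below. The exact layout of the files is not documented on this page, so the `key: value` line format and the field names in the sample are assumptions made purely for illustration.

```python
def parse_log(text):
    """Parse 'key: value' lines into a dict of lists.

    Repeated keys (for example, several triggered events) are collected
    in order. Lines without a colon are skipped.
    """
    record = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        record.setdefault(key.strip(), []).append(value.strip())
    return record


# Hypothetical log excerpt; the real field names may differ.
sample = """\
app_name: door-lock-helper
operation: unlock
duration_ms: 120
event: motion.active
event: contact.open
"""

record = parse_log(sample)
print(record["app_name"], record["event"])
```

Keeping every value in a list avoids losing repeated keys, at the cost of single-valued fields also being wrapped in a one-element list.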