SMT483G2: AThings Overview

From SMT Project Experience
 
<!--/HEADER-->
 
  
<br>
 
<!--MENU-->
 
 
<!--rax-->
 
 
[[SMT483G2: AThings_Documentation|<font color="#F5F5F5" size=2.5 family ="helvetica"><b>DOCUMENTATION</b></font>]]
 
 
|}  
 
 
 
 
  
 
<!--/MENU-->
 
 
 
<!--Sub Header Start-->
<!--rax-->
{| style="background-color:white; color:white padding: 5px 0 0 0;" width="100%" height=50px cellspacing="0" cellpadding="0" valign="top" border="0" |
| style="vertical-align:top;width:15%;" | <div style="padding: 3px; font-weight: bold; text-align:center; line-height: wrap_content; font-size:13px; border-bottom:3px solid #7c0a02; font-family:helvetica"> [[SMT483G2: AThings Overview| <font color="#232D34"><b>Project Overview</b></font>]]</div>
| style="vertical-align:top;width:15%;" | <div style="padding: 3px; font-weight: bold; text-align:center; line-height: wrap_content; font-size:13px; border-bottom:1px solid #7c0a02; font-family:helvetica"> [[SMT483G2: A-Things Phase I | <font color="#232D34"><b>Phase I </b></font>]]</div>
| style="vertical-align:top;width:15%;" | <div style="padding: 3px; font-weight: bold; text-align:center; line-height: wrap_content; font-size:13px; border-bottom:1px solid #7c0a02; font-family:helvetica"> [[SMT483G2: A-Things Phase II| <font color="#232D34"><b>Phase II </b></font>]]</div>
| style="vertical-align:top;width:15%;" | <div style="padding: 3px; font-weight: bold; text-align:center; line-height: wrap_content; font-size:13px; border-bottom:1px solid #7c0a02; font-family:helvetica"> [[SMT483G2: A-Things Data Required| <font color="#232D34"><b>About Our Data</b></font>]]</div>
|}
<!--Sub Header End-->

<!-- Body -->

= Project Background =


<br>

<br>
Recent advances in the Internet of Things (IoT) have enabled a myriad of smart applications, such as smart homes, smart transportation, smart environments, and smart healthcare. According to Statista (2017), the number of smart devices around the world is projected to reach 75.44 billion by 2025. These devices are typically equipped with sophisticated sensors, such as temperature, humidity, light, face-recognition, and motion sensors. The data these devices generate and the operations they can perform are often privacy-, security-, and safety-sensitive. As a result, applications that operate and interact with these devices have become a highly attractive attack surface. Kaspersky (2019) reported that there were more than 100 million cyberattacks on IoT devices and applications in the first six months of 2019 alone.


Security issues in IoT applications can easily lead to serious physical, financial, and psychological harm. For example, a malicious IoT app could take control of a smart car and threaten people's lives. This project contributes to the advancement of information technology by detecting anomalies in IoT applications that could cause such catastrophic effects.

After anomalous behaviors of IoT applications have been detected and random test cases generated, it is important to find the sequences of events that systematically account for the inter-dependencies and diverse nature of the IoT ecosystem.
<br>

= Summary =
<br>

<b>Scope:</b> Applications on the web-based SmartThings IDE2 platform

<b>Aim:</b> To identify anomalous sensitive operations (operations that require access to sensitive information or actions, such as permission to send SMS messages or to open smart locks) performed by IoT applications
<br><br>

== Phase I ==

During Phase I of the project, program analysis was used to generate valid inputs and to track the coverage of sinks. The output, in the form of log files, shows the sequences of generated events and the corresponding actions performed by each application, which the tester then reviews for possible anomalies.

In this stage, an automatic testing tool was built to generate tests for the sensitive operations performed by apps. Compared with an ad-hoc testing approach, it improved the coverage of sinks (the points where sensitive operations take place) by 184%, widening the scope of the test cases performed, while producing 20% fewer test cases.
<br><br>

== Phase II ==

=== <b>Week 1 - Week 7</b> ===

In the first part of Phase II, we process the log files handed over from Phase I and prepare them for topic modeling, which extracts the most dominant topics across the apps. Each topic is a combination of words, and each app can be viewed as a probability distribution over these topics. These per-app distributions are then fed into unsupervised high-dimensional clustering algorithms, which group the apps by the similarity of their topic distributions.

{|style="background-color:#fff" width="100%"  |
| style="background-color:#fff;  border-bottom:0px solid #fff; border:solid #fff; color:#fff"|
[[File:3.jpg|650px|frameless|center]]
|}
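The clustering step could look roughly like the following minimal sketch. It assumes each app has already been reduced to a topic-probability vector (for example, one row of an LDA document-topic matrix) and uses plain k-means with Euclidean distance and a deterministic farthest-first initialisation; the project may use a different high-dimensional clustering algorithm, and the app names and numbers are hypothetical.

```python
import math

# Hypothetical per-app topic distributions, e.g. rows of an LDA
# document-topic matrix. App names and values are illustrative only.
APP_TOPICS: dict[str, list[float]] = {
    "door_lock_a": [0.8, 0.1, 0.1],
    "door_lock_b": [0.7, 0.2, 0.1],
    "thermostat_a": [0.1, 0.1, 0.8],
    "thermostat_b": [0.2, 0.1, 0.7],
}


def farthest_first_init(points: dict[str, list[float]], names: list[str], k: int) -> list[list[float]]:
    """Deterministic initialisation: start from the first app, then repeatedly
    add the app farthest from all centroids chosen so far."""
    centroids = [list(points[names[0]])]
    while len(centroids) < k:
        farthest = max(names, key=lambda n: min(math.dist(points[n], c) for c in centroids))
        centroids.append(list(points[farthest]))
    return centroids


def kmeans(points: dict[str, list[float]], k: int, max_iter: int = 100) -> dict[str, int]:
    """Plain k-means with Euclidean distance; returns app name -> cluster index."""
    names = sorted(points)
    centroids = farthest_first_init(points, names, k)
    assignment: dict[str, int] = {}
    for _ in range(max_iter):
        # Assignment step: every app joins its nearest centroid.
        new_assignment = {
            n: min(range(k), key=lambda c: math.dist(points[n], centroids[c]))
            for n in names
        }
        if new_assignment == assignment:
            break  # no app changed cluster, so the algorithm has converged
        assignment = new_assignment
        # Update step: move each centroid to the mean of its member apps.
        for c in range(k):
            members = [points[n] for n in names if assignment[n] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


labels = kmeans(APP_TOPICS, k=2)
print(labels)
```

Euclidean distance on probability vectors is only one choice; a divergence such as Jensen-Shannon could be swapped in without changing the overall structure.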
  
In summary, we build our clustering models in three steps:

# <b>Data Analysis:</b> generating statistics from the raw data (log files) to understand it and to identify the noise in the data set.
# <b>Natural Language Processing:</b> the log files are processed to remove this noise, and LDA is performed for topic modeling across all apps, producing the dominant topics and each app's probability distribution over these topics.
# <b>App Clustering:</b> each app's topic distribution is fed into the clustering algorithm, which groups apps based on the similarity of their distributions.
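Steps 1 and 2 could begin with something like the following minimal sketch, which turns each app's log lines into a bag of words and reports basic corpus statistics. The log format, the `NOISE_TOKENS` stop-list and the app names are hypothetical; the project's actual noise-removal rules are not described on this page.

```python
import re
from collections import Counter

# Hypothetical noise tokens (log levels, filler words). A real stop-list
# would come out of the data-analysis step.
NOISE_TOKENS = {"info", "debug", "event", "to", "the", "a", "of"}

# Keep alphabetic runs only, so timestamps and numeric IDs are dropped.
TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(line: str) -> list[str]:
    """Lower-case a log line and keep its non-noise alphabetic tokens."""
    return [t for t in TOKEN_RE.findall(line.lower()) if t not in NOISE_TOKENS]


def bag_of_words(logs: dict[str, list[str]]) -> dict[str, Counter]:
    """Build one term-count 'document' per app from all of its log lines."""
    return {app: Counter(t for line in lines for t in tokenize(line))
            for app, lines in logs.items()}


def corpus_stats(bags: dict[str, Counter]) -> dict[str, int]:
    """Basic statistics: number of apps, vocabulary size and token count."""
    vocabulary = set().union(*bags.values())
    return {
        "apps": len(bags),
        "vocabulary": len(vocabulary),
        "tokens": sum(sum(bag.values()) for bag in bags.values()),
    }


# Invented log lines; the real Phase I log format may differ.
logs = {
    "door_lock_a": [
        "2020-09-29 22:08:01 INFO event: door unlock requested",
        "2020-09-29 22:08:02 INFO send sms to owner",
    ],
    "thermostat_a": [
        "2020-09-29 22:10:00 INFO event: temperature changed to 30",
        "2020-09-29 22:10:01 INFO set heating mode",
    ],
}
bags = bag_of_words(logs)
stats = corpus_stats(bags)
print(stats)
```

Per-app term counts like these are the usual input to an LDA implementation; for instance, scikit-learn's `CountVectorizer` with `LatentDirichletAllocation`, or gensim's `Dictionary.doc2bow` with `LdaModel`, could take over from here.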
 
<br><br>
 
=== <b>Week 8 - Week 13</b> ===


In the second part of Phase II, we will experiment further with the clustering model, explore other algorithms, and, if time permits, test the model on more data.


The main focus will be to identify malicious apps by comparing their behavior with that of trusted apps. Any new app will first be classified into one of our existing clusters. We will then examine the operations the app performs and compare them with the range of operations performed by the legitimate apps in that cluster to identify malicious behavior.
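A minimal sketch of this two-stage check is shown below. It assumes that each cluster is summarised by a centroid in topic space and that "the range of operations" of a cluster is simply the set of operations observed among its trusted apps; both are assumptions, and the cluster names, centroids and operation names are hypothetical.

```python
import math

# Hypothetical cluster centroids in topic space (e.g. from the clustering
# step) and the operations observed for trusted apps in each cluster.
# All names and numbers are illustrative.
CENTROIDS: dict[str, list[float]] = {
    "locks": [0.75, 0.15, 0.10],
    "climate": [0.15, 0.10, 0.75],
}
TRUSTED_OPERATIONS: dict[str, set[str]] = {
    "locks": {"lock_door", "unlock_door", "send_sms"},
    "climate": {"read_temperature", "set_heating"},
}


def nearest_cluster(topics: list[float]) -> str:
    """Step 1: assign the new app to the cluster with the closest centroid."""
    return min(CENTROIDS, key=lambda name: math.dist(topics, CENTROIDS[name]))


def check_app(topics: list[float], operations: list[str]) -> tuple[str, set[str]]:
    """Step 2: flag operations never seen among the cluster's trusted apps."""
    cluster = nearest_cluster(topics)
    return cluster, set(operations) - TRUSTED_OPERATIONS[cluster]


# A new app whose topics resemble a climate app but which also unlocks doors.
cluster, flagged = check_app([0.2, 0.1, 0.7], ["set_heating", "unlock_door"])
print(cluster, flagged)
```

A set difference is the simplest notion of "outside the normal range"; frequency thresholds or event-sequence context could refine it.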
 
 
<br>
 
 
= About Our Data =
 
<br>
 
<br>
 
  
 
<!-- /Body -->
 

Latest revision as of 22:08, 29 September 2020