Difference between revisions of "Group01 Report"
Yc.lim.2016 (talk | contribs) |
Yc.lim.2016 (talk | contribs) |
||
Line 81: | Line 81: | ||
|- | |- | ||
| <div style="font-family:Verdana; border-radius: 1px "> | | <div style="font-family:Verdana; border-radius: 1px "> | ||
− | In designing our R Shiny application to visualise suspicious Honeypot activity, we consulted Stephen Few’s whitepaper on the common pitfalls of dashboard design[1]. We found that our initial wireframe suffered from some of the pitfalls, namely displaying excessive detail, arranging the data poorly and misusing colour. Thus, we went back to the drawing board and arrived at a new three-tiered design that avoided those pitfalls. <br><br> | + | In designing our R Shiny application to visualise suspicious Honeypot activity, we consulted Stephen Few’s whitepaper on the common pitfalls of dashboard design<sup>[1]</sup>. We found that our initial wireframe suffered from some of the pitfalls, namely displaying excessive detail, arranging the data poorly and misusing colour. Thus, we went back to the drawing board and arrived at a new three-tiered design that avoided those pitfalls. <br><br> |
The three tiers are arranged sequentially to create a narrative to help users pinpoint and identify suspicious activity. The first tier utilises a control chart to quickly separate time periods with highly abnormal activity from those without. From there, the user can move on to the second tier to explore the sources of those activity, through a Trellis plot of the connections in the form of networks. Finally, the user can interact with the third tier to visualise the connections as flows in a Sankey graph, to pinpoint the exact minute and quantity of the flows. <br><br> | The three tiers are arranged sequentially to create a narrative to help users pinpoint and identify suspicious activity. The first tier utilises a control chart to quickly separate time periods with highly abnormal activity from those without. From there, the user can move on to the second tier to explore the sources of those activity, through a Trellis plot of the connections in the form of networks. Finally, the user can interact with the third tier to visualise the connections as flows in a Sankey graph, to pinpoint the exact minute and quantity of the flows. <br><br> | ||
− | By structuring our visualizations as a narrative, the dashboard will show only the necessary and useful information. Furthermore, we referred to Stephen Few’s article on the practical use of colours in charts[2] to ensure that our dashboard is appropriately coloured. <br><br> | + | By structuring our visualizations as a narrative, the dashboard will show only the necessary and useful information. Furthermore, we referred to Stephen Few’s article on the practical use of colours in charts<sup>[2]</sup> to ensure that our dashboard is appropriately coloured. <br><br> |
==== Control Chart ==== | ==== Control Chart ==== | ||
Line 91: | Line 91: | ||
==== Sankey Chart ==== | ==== Sankey Chart ==== | ||
To this point, the user has identified the day which has significant suspicious activities using the control chart, and the countries where the activities originate using the network graphs. The final step of this visualization journey is to find out the exact timing of the activities, which will allow cybersecurity experts to perform further investigations and take targeted measures. For instance, the experts could explore the data to see if the attackers were targeting certain TCP or UDP ports which by extension would suggest that certain applications running at those timings contained exploitable loopholes or backdoors.<br><br> | To this point, the user has identified the day which has significant suspicious activities using the control chart, and the countries where the activities originate using the network graphs. The final step of this visualization journey is to find out the exact timing of the activities, which will allow cybersecurity experts to perform further investigations and take targeted measures. For instance, the experts could explore the data to see if the attackers were targeting certain TCP or UDP ports which by extension would suggest that certain applications running at those timings contained exploitable loopholes or backdoors.<br><br> | ||
− | The Sankey charts in our application were created using the Plotly package[3]. We chose Plotly as it is designed to quickly create aesthetically-pleasing and reactive D3 plots that are especially powerful in web applications e.g. dashboards. It supports a wide range of chart types and in the case of Sankey, there are many different programmable attributes that allow for a high level of control over the visual design. <br><br> | + | The Sankey charts in our application were created using the Plotly package<sup>[3]</sup>. We chose Plotly as it is designed to quickly create aesthetically-pleasing and reactive D3 plots that are especially powerful in web applications e.g. dashboards. It supports a wide range of chart types and in the case of Sankey, there are many different programmable attributes that allow for a high level of control over the visual design. <br><br> |
The data were also transformed to fit the requirements of Plotly’s Sankey algorithm. The various source countries and Singapore Honeypot were designated as nodes while connections were defined as links, with the number of connections as the value of said links. Nodes were also assigned colours using RGB values. | The data were also transformed to fit the requirements of Plotly’s Sankey algorithm. The various source countries and Singapore Honeypot were designated as nodes while connections were defined as links, with the number of connections as the value of said links. Nodes were also assigned colours using RGB values. | ||
In our dataset, there is a clear-cut attempt from Iran on 6th May 2013. This attempt was detected and caught by Singapore’s AWS Honeypot. The Sankey chart for this Iranian effort (Figure 1) shows both the exact timing and quantity of the connections. There was a huge surge of 405 connections at 18:41 hours, with 107 connections in the minute before.<br><br> | In our dataset, there is a clear-cut attempt from Iran on 6th May 2013. This attempt was detected and caught by Singapore’s AWS Honeypot. The Sankey chart for this Iranian effort (Figure 1) shows both the exact timing and quantity of the connections. There was a huge surge of 405 connections at 18:41 hours, with 107 connections in the minute before.<br><br> |
Revision as of 16:08, 13 August 2018
LINK TO PROJECT GROUPS:
Please Click Here -> [1]
Cybersecurity
|
|
|
|
Contents
Introduction
PUT YOUR CONTENT HERE |
Objective and Motivations
PUT YOUR CONTENT HERE |
Previous Works
PUT YOUR CONTENT HERE |
Dataset and Data Preparation
PUT YOUR CONTENT HERE |
Design Framework and Visualization Methodologies
In designing our R Shiny application to visualise suspicious Honeypot activity, we consulted Stephen Few’s whitepaper on the common pitfalls of dashboard design[1]. We found that our initial wireframe suffered from some of the pitfalls, namely displaying excessive detail, arranging the data poorly and misusing colour. Thus, we went back to the drawing board and arrived at a new three-tiered design that avoided those pitfalls. Control ChartPUT HERE Network ChartPUT HERE Sankey ChartTo this point, the user has identified the day which has significant suspicious activities using the control chart, and the countries where the activities originate using the network graphs. The final step of this visualization journey is to find out the exact timing of the activities, which will allow cybersecurity experts to perform further investigations and take targeted measures. For instance, the experts could explore the data to see if the attackers were targeting certain TCP or UDP ports which by extension would suggest that certain applications running at those timings contained exploitable loopholes or backdoors. Fig. 1. The number of connections from Iran by the minute on 6 May Contrast the Sankey chart of connections from Iran to that of connections from Japan (Figure 2). While Japanese connections were more frequent (i.e. occurring at more timings across the day) than Iranian ones, there were less of them for each given minute. This stark contrast is further proof that the Iranian connections were indeed carried out as an attack. Fig. 2. The number of connections from Japan by the minute on 6 May |
Insights and Implications
It is impossible to tell if a single network connection is an attack or not. While the intent could be malicious, it could also be benign, for instance when a person makes a typo in the URL and thereby establishing a connection by mistake. |
Limitation and Future Work
The app in its current iteration is not designed for real-time monitoring. Future work would include adapting the code to ingest real-time data and create a loop to refresh the analysis periodically. The time taken to refresh the analysis would be shorter than the interval in which network traffic is analysed for suspicious activity. The app could be deployed within Big Data Architecture that use Apache Spark for analysis, which is a common solution, as the Spark engine comes with APIs for R. In fact, with enough data points, the dashboard could even be expanded to include a predictive module that anticipates where and when the next cyber-attack will take place. |
Conclusion
This project attempts to tackle the complexity of cybersecurity and visualise suspicious attacks that are highly likely to be actual attacks in a meaningful and intuitive manner. That is not an easy task given that cyber-attacks can take place at any time, from anywhere, at any intensity (e.g. number of connections) and in many different forms. Hence tools to aid cybersecurity experts in detecting and defending against cyber-attacks need to continually be refined and upgraded. This project is a first step in that direction. |
References
[1] Shneiderman, B. (2005) “The eyes have it: A task by data type taxonomy for information visualization” IEEE Conference on Visual Languages (VL96), pp. 336-343 |