Difference between revisions of "Group01 Report"

From Visual Analytics and Applications
Jump to navigation Jump to search
Line 129: Line 129:
 
|}
 
|}
 
<!--Limitation and Future Work-->
 
<!--Limitation and Future Work-->
 +
 
<!--Conclusion-->
 
<!--Conclusion-->
  

Revision as of 15:25, 13 August 2018

LINK TO PROJECT GROUPS:
Please Click Here -> [1]

1718T3 Group1 hacking.png

Cybersecurity

Proposal

Poster

Application

Report


Introduction

PUT YOUR CONTENT HERE

Objective and Motivations

PUT YOUR CONTENT HERE

Previous Works

PUT YOUR CONTENT HERE

Dataset and Data Preparation

PUT YOUR CONTENT HERE

Design Framework and Visualization Methodologies

In designing our R Shiny application to visualise suspicious Honeypot activity, we consulted Stephen Few’s whitepaper on the common pitfalls of dashboard design[1]. We found that our initial wireframe suffered from some of the pitfalls, namely displaying excessive detail, arranging the data poorly and misusing colour. Thus, we went back to the drawing board and arrived at a new three-tiered design that avoided those pitfalls.

The three tiers are arranged sequentially to create a narrative to help users pinpoint and identify suspicious activity. The first tier utilises a control chart to quickly separate time periods with highly abnormal activity from those without. From there, the user can move on to the second tier to explore the sources of those activity, through a Trellis plot of the connections in the form of networks. Finally, the user can interact with the third tier to visualise the connections as flows in a Sankey graph, to pinpoint the exact minute and quantity of the flows.

By structuring our visualizations as a narrative, the dashboard will show only the necessary and useful information. Furthermore, we referred to Stephen Few’s article on the practical use of colours in charts[2] to ensure that our dashboard is appropriately coloured.

Control Chart

PUT HERE

Network Chart

PUT HERE

Sankey Chart

To this point, the user has identified the day which has significant suspicious activities using the control chart, and the countries where the activities originate using the network graphs. The final step of this visualization journey is to find out the exact timing of the activities, which will allow cybersecurity experts to perform further investigations and take targeted measures. For instance, the experts could explore the data to see if the attackers were targeting certain TCP or UDP ports which by extension would suggest that certain applications running at those timings contained exploitable loopholes or backdoors.

The Sankey charts in our application were created using the Plotly package[3]. We chose Plotly as it is designed to quickly create aesthetically-pleasing and reactive D3 plots that are especially powerful in web applications e.g. dashboards. It supports a wide range of chart types and in the case of Sankey, there are many different programmable attributes that allow for a high level of control over the visual design.

The data were also transformed to fit the requirements of Plotly’s Sankey algorithm. The various source countries and Singapore Honeypot were designated as nodes while connections were defined as links, with the number of connections as the value of said links. Nodes were also assigned colours using RGB values. In our dataset, there is a clear-cut attempt from Iran on 6th May 2013. This attempt was detected and caught by Singapore’s AWS Honeypot. The Sankey chart for this Iranian effort (Figure 1) shows both the exact timing and quantity of the connections. There was a huge surge of 405 connections at 18:41 hours, with 107 connections in the minute before.

1718T3 IRAN Sankey Singapore Honeypot.jpeg


Fig. 1. The number of connections from Iran by the minute on 6 May

Contrast the Sankey chart of connections from Iran to that of connections from Japan (Figure 2). While Japanese connections were more frequent (i.e. occurring at more timings across the day) than Iranian ones, there were less of them for each given minute. This stark contrast is further proof that the Iranian connections were indeed carried out as an attack.

1718T3 JAPAN Sankey Singapore Honeypot.jpeg


Fig. 2. The number of connections from Japan by the minute on 6 May


Insights and Implications

It is impossible to tell if a single network connection is an attack or not. While the intent could be malicious, it could also be benign, for instance when a person makes a typo in the URL and thereby establishing a connection by mistake.

To diagnose a cyber-attack requires us to analyse network traffic by specific temporal intervals. An abnormal amount of traffic during a interval would call for further investigation and intervention. From the dataset that has been explored in this project, intervals of one minute would be a good starting point for detecting unusual network activity.

Cyber-attacks could happen at any time, on any day. Attackers could programme their attempts to be carried out at specific timings by machines, or even to be triggered by certain events (e.g. phishing emails where an unsuspecting user clicks on a link that then triggers the installation of malware). This dataset contains connections across all twenty-four hours of any given day. While the Iranian attack took place between 18:40 to 18:41 hours, it could easily have had taken place in the wee hours instead. Therefore, it is important for defenders to possess real-time capabilities to detect and fend off cyber-attacks.

Cyber-attacks are also getting more and more complex. While this dataset does not give a full account of the complexities involved, the high variability in the IP addresses and TCP/UDP ports of the connections already proved to be quite challenging for us to interpret and manipulate for analysis. Furthermore, the volume of cyber-attacks is also growing as techniques become more sophisticated and the cost of machines become lower. Hence defenders would need to invest correspondingly in both powerful hardware and software, to implement advanced cyber-defence techniques.

Limitation and Future Work

The app in its current iteration is not designed for real-time monitoring. Future work would include adapting the code to ingest real-time data and create a loop to refresh the analysis periodically. The time taken to refresh the analysis would be shorter than the interval in which network traffic is analysed for suspicious activity. The app could be deployed within Big Data Architecture that use Apache Spark for analysis, which is a common solution, as the Spark engine comes with APIs for R. In fact, with enough data points, the dashboard could even be expanded to include a predictive module that anticipates where and when the next cyber-attack will take place.

The interactivity of the app could also be enhanced. More control elements could be implemented for users to perform their own exploration of the data. For example, the current Sankey visualisation only allows users to examine the connections by source country e.g. Iran but does not allow users to specify specific timings to inspect.

While the Sankey and Network visualisations currently perform specific and distinct functions within the app, they could potentially overlap in terms of the type of information that can be conveyed to users. Hence future work would include tweaking the coding to see if either one could be omitted for an even simpler App interface.


Conclusion

This project attempts to tackle the complexity of cybersecurity and visualise suspicious attacks that are highly likely to be actual attacks in a meaningful and intuitive manner. That is not an easy task given that cyber-attacks can take place at any time, from anywhere, at any intensity (e.g. number of connections) and in many different forms. Hence tools to aid cybersecurity experts in detecting and defending against cyber-attacks need to continually be refined and upgraded. This project is a first step in that direction.


References

[1] Shneiderman, B. (2005) “The eyes have it: A task by data type taxonomy for information visualization” IEEE Conference on Visual Languages (VL96), pp. 336-343

[2] About BP: https://en.wikipedia.org/wiki/BP

[3] Data: http://www.bp.com/statisticalreview

[4] http://ryanhafen.com/blog/geofacet

[5] https://hafen.github.io/geofacet/

[6] https://github.com/timelyportfolio/sunburstR

[7] R Packages Description: https://cran.r-project.org