Group01 Dataset Overview

From Visual Analytics and Applications

What is a Honeypot?

In simple terms, a honeypot is a trap for network attacks that records the IP addresses of the attackers.

As described by Amazon Web Services (AWS)[2], a honeypot is a security mechanism intended to lure and deflect an attempted attack. AWS’s honeypot is a trap point that one can insert into a website to detect inbound requests from content scrapers and bad bots. When a source accesses the honeypot, its IP address is recorded.
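
As a rough illustration of the trap idea (not AWS's actual implementation), the sketch below records the source IP of any request that touches a hidden path. The path `/trap-link`, the function name `handle_request` and the example IP addresses are all hypothetical, chosen only for this example.

```python
# Minimal honeypot sketch: a hidden URL that no legitimate visitor should request.
TRAP_PATH = "/trap-link"  # hypothetical path, assumed for illustration


def handle_request(path, source_ip, recorded_ips):
    """Record source_ip if the request hits the trap; return True when trapped."""
    if path == TRAP_PATH:
        recorded_ips.add(source_ip)
        return True
    return False


recorded = set()
handle_request("/index.html", "198.51.100.7", recorded)  # normal page, ignored
handle_request("/trap-link", "203.0.113.9", recorded)    # trap hit, IP recorded
print(sorted(recorded))
```

Only the second request touches the trap, so only its address ends up in the recorded set.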

Overview of the AWS Honeypot Cyberattack

The AWS Honeypot Cyberattack test dataset was retrieved from Kaggle: https://www.kaggle.com/casimian2000/aws-honeypot-attack-data/data

We use Tableau Prep to run an overview of the data before any analysis.

[Figure: AWS Honeypont Dataset Overview.jpg]

Analysis of Data fields

Field # | Dataset Field | Comments | Details
1 | datetime | Packet Arrival Date | From 03/03/2013 09:53:00PM to 07/24/2013 07:47:00AM; 185k entries
2 | host | Honeypot Server | 9 categories: Eu, Oregon, Sa, Singapore, Sydney, Tokyo, US East, Groucho-norcal, Zeppo-norcal
3 | src | Packet Source | 70k different sources
4 | proto | Packet Protocol Type | ICMP, TCP, UDP
5 | type | Packet Type | 8 different types
6 | spt | Source Port | 46k ports
7 | dpt | Destination Port | 4k ports
8 | srcstr | Source IP Address | 70k addresses
9 | cc | Source Country Code | 178 countries
10 | country | Source Country | 178 countries
11 | locale | Source Location | 1k locations
12 | localeabbr | Location Abbreviation | 614k entries. Note: the grouping is not effective, making this field redundant for reference
13 | postalcode | Postal Code | 3k entries
14 | Latitude | Source Latitude | 5k entries
15 | Longitude | Source Longitude | 5k entries
16 | F16 | A dummy field for those with longitude > 20,000 | 6 types in this category: Null is for everything else; those with numbers refer to Washington D.C.

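
The per-field counts in the table above came from Tableau Prep. A similar overview could be sketched in Python as shown below, assuming the Kaggle download is a CSV file whose header uses the field names listed in the table. The sample rows here are made up for illustration and are not taken from the dataset.

```python
import csv
import io

# Tiny made-up sample using a few column names from the table above.
# In practice, read the CSV file downloaded from Kaggle instead.
SAMPLE = """datetime,host,proto,srcstr,country
3/3/13 21:53,Oregon,TCP,203.0.113.9,China
3/3/13 21:57,Tokyo,UDP,198.51.100.7,Germany
3/3/13 21:58,Oregon,TCP,203.0.113.9,China
"""


def distinct_counts(csv_text):
    """Return {column: number of distinct non-empty values}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = {name: set() for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            if value:  # skip empty cells so blanks are not counted as a value
                seen[name].add(value)
    return {name: len(values) for name, values in seen.items()}


counts = distinct_counts(SAMPLE)
print(counts)
```

Counting distinct values per column is a quick way to spot fields such as localeabbr, where the number of distinct values suggests the grouping is not useful.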

Acknowledgement: the interpretation of the dataset was assisted by the references below.
https://emreovunc.com/projects/honeypots_data_analysis.pdf
https://www.kaggle.com/jonathanbouchet/aws-honeypot/notebook


What to analyse?

The dataset will certainly require cleaning as the work proceeds. However, based on the analysis of the fields, it is clear that:

  • The targets/destinations are 9 different servers (host)
  • The attackers are from various sources around the world, identified by:
    1. IP addresses
    2. Countries + cities
    3. Postcode + geographic data
  • A time log is available


We can run a few analyses:

  • Basic statistics of the data
  • Advanced visualisation of the data
  • Animation of attacks showing “Origin vs Destination” over the time log
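
As a starting point for the “Origin vs Destination” animation, attacks could be aggregated into one count per day, source country and honeypot host, so that each day becomes one animation frame. The sketch below is only an illustration: the records are made up, and the timestamp format `%m/%d/%y %H:%M` is an assumption that should be checked against the actual file.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records: (timestamp string, source country, honeypot host)
RECORDS = [
    ("3/3/13 21:53", "China", "Oregon"),
    ("3/3/13 22:10", "China", "Oregon"),
    ("3/4/13 01:05", "Germany", "Tokyo"),
]


def daily_flows(records, fmt="%m/%d/%y %H:%M"):
    """Count attacks per (day, origin country, destination host)."""
    flows = Counter()
    for stamp, country, host in records:
        day = datetime.strptime(stamp, fmt).date()  # fmt is an assumed format
        flows[(day.isoformat(), country, host)] += 1
    return flows


flows = daily_flows(RECORDS)
for key in sorted(flows):
    print(key, flows[key])
```

The same counts also serve as basic statistics, for example the busiest day per host.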