Group01 Dataset Overview
Latest revision as of 22:55, 22 July 2018
What is a Honeypot?
In simple terms, a honeypot is a trap for network attacks that records the IP addresses of the attackers.
As described by Amazon Web Services (AWS) (https://docs.aws.amazon.com/solutions/latest/aws-waf-security-automations/architecture.html), a honeypot is a security mechanism intended to lure and deflect an attempted attack. AWS’s honeypot is a trap point that one can insert into a website to detect inbound requests from content scrapers and bad bots. The IP address of any source that accesses the honeypot is recorded.
Overview of the AWS Honeypot Cyberattack Dataset
The AWS Honeypot Cyberattack test dataset was retrieved from Kaggle: https://www.kaggle.com/casimian2000/aws-honeypot-attack-data/data
We use Tableau Prep to get an overview of the data before any analysis.
Analysis of Data fields
Field # | Dataset Field | Comments | Details |
---|---|---|---|
1 | datetime | Packet Arrival Date | From 03/03/2013 09:53:00PM to 07/24/2013 07:47:00AM; 185k entries |
2 | host | Honeypot Server | 9 categories: Eu, Oregon, Sa, Singapore, Sydney, Tokyo, US East, Groucho-norcal, Zeppo-norcal |
3 | src | Packet Source | 70k different sources |
4 | proto | Packet Protocol Type | ICMP, TCP, UDP |
5 | type | Packet Type | 8 different types |
6 | spt | Source Port | 46k ports |
7 | dpt | Destination Port | 4k ports |
8 | srcstr | Source IP Address | 70k addresses |
9 | cc | Source Country Code | 178 countries |
10 | country | Source Country | 178 countries |
11 | locale | Source Location | 1k locations |
12 | localeabbr | Location Abbreviation | 614k entries. Note that the grouping is not effective, which makes this field redundant for reference |
13 | postalcode | Postal Code | 3k entries |
14 | Latitude | Source Latitude | 5k entries |
15 | Longitude | Source Longitude | 5k entries |
16 | F16 | A dummy field for those with longitude > 20,000 | There are 6 types in this category: Null is for everything else; those with numbers refer to Washington D.C. |
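The "Details" column above is essentially a count of distinct values per field. A minimal sketch of how such a profile could be reproduced outside Tableau Prep is shown below. The sample rows are invented for illustration (the IP addresses come from documentation ranges), only a subset of the columns is used, and the function name `profile_columns` is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; the real data is the Kaggle CSV, not reproduced here.
SAMPLE_CSV = """\
host,proto,dpt,srcstr,cc,country
Oregon,TCP,22,192.0.2.1,US,United States
Tokyo,TCP,3389,198.51.100.7,CN,China
Oregon,UDP,53,192.0.2.1,US,United States
Sydney,ICMP,,203.0.113.5,DE,Germany
"""


def profile_columns(rows):
    """Return {column: number of distinct non-empty values}."""
    distinct = defaultdict(set)
    for row in rows:
        for column, value in row.items():
            if value:  # empty strings are treated as missing
                distinct[column].add(value)
    return {column: len(values) for column, values in distinct.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
summary = profile_columns(rows)
for column, count in summary.items():
    print(f"{column}: {count} distinct values")
```

To run this on the real file, the `io.StringIO(SAMPLE_CSV)` argument would be replaced by an open file handle for the downloaded CSV.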
Acknowledgement: the interpretation of the dataset was assisted by the references below.
https://emreovunc.com/projects/honeypots_data_analysis.pdf
https://www.kaggle.com/jonathanbouchet/aws-honeypot/notebook
What to analyse?
The dataset will need cleaning as the work proceeds. However, the analysis of the fields already makes the following clear:
- The targets/destinations are 9 different servers (host)
- The attackers are from various sources around the world, identified by
  - IP addresses
  - Countries + cities
  - Postcode + geographic data
- Time log is available
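As an illustration of the cleaning step mentioned above, the sketch below parses the time log and drops records whose coordinates are missing or out of range (the field table notes some rows with longitude above 20,000). The timestamp layout is an assumption based on how the field table displays the date range, the column names are assumed to match the table, and `clean_record` is a hypothetical helper.

```python
from datetime import datetime

# Assumed timestamp layout, e.g. "07/24/2013 07:47:00AM"; the raw CSV may differ.
TIME_FORMAT = "%m/%d/%Y %I:%M:%S%p"


def clean_record(record):
    """Return a cleaned copy of the record, or None if it cannot be used."""
    try:
        text = record["datetime"].replace(",", "").strip()
        when = datetime.strptime(text, TIME_FORMAT)
        lat = float(record["Latitude"])
        lon = float(record["Longitude"])
    except (KeyError, ValueError):
        return None  # unparseable timestamp or missing coordinates
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None  # coordinates outside the valid geographic range
    return {**record, "datetime": when, "Latitude": lat, "Longitude": lon}


# Invented sample records for illustration only.
raw = [
    {"datetime": "03/03/2013, 09:53:00PM", "host": "Tokyo",
     "Latitude": "35.685", "Longitude": "139.7514"},
    {"datetime": "07/24/2013 07:47:00AM", "host": "Oregon",
     "Latitude": "", "Longitude": ""},
    {"datetime": "03/04/2013 01:00:00AM", "host": "Sydney",
     "Latitude": "38.9", "Longitude": "25000"},
]
cleaned = [c for c in map(clean_record, raw) if c is not None]
print(len(cleaned), "of", len(raw), "records kept")
```

Dropping bad rows is only one option; depending on the analysis, they could instead be kept with the coordinates set to empty.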
We can run a few analyses:
- Basic statistics of the data
- Advanced visualisation of the data
- Animation of attacks showing “Origin vs Destination” over the time log
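The basic statistics and the "Origin vs Destination" animation both reduce to counting events by group. A minimal sketch, assuming each event has already been reduced to a (timestamp, source country, honeypot host) tuple, is shown below. The events are invented and `attacks_per_day` is a hypothetical name.

```python
from collections import Counter
from datetime import datetime


def attacks_per_day(events):
    """Count events per (date, source country, honeypot host)."""
    counts = Counter()
    for when, country, host in events:
        counts[(when.date(), country, host)] += 1
    return counts


# Invented events for illustration only.
events = [
    (datetime(2013, 3, 3, 21, 53), "China", "Tokyo"),
    (datetime(2013, 3, 3, 22, 10), "China", "Tokyo"),
    (datetime(2013, 3, 3, 23, 5), "United States", "Oregon"),
    (datetime(2013, 3, 4, 0, 15), "China", "Tokyo"),
]

# Basic statistic: number of attacks received by each honeypot host.
by_host = Counter(host for _, _, host in events)
print(by_host.most_common())

# One animation frame per day: origin -> destination with a count.
daily = attacks_per_day(events)
for (day, country, host), n in sorted(daily.items()):
    print(day, country, "->", host, n)
```

Each distinct date in `daily` would correspond to one frame of the animation, with the count driving the size or colour of the origin-to-destination link.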