Difference between revisions of "Group01 Dataset Overview"
| (5 intermediate revisions by the same user not shown) | |||
| Line 62: | Line 62: | ||
In simple terms, Honeypot is a trap for network attacks, and it records the IP addresses of such attacks.<br /> | In simple terms, Honeypot is a trap for network attacks, and it records the IP addresses of such attacks.<br /> | ||
| − | As described by Amazon Web Service (AWS)[https://docs.aws.amazon.com/solutions/latest/aws-waf-security-automations/architecture.html], a | + | As described by Amazon Web Service (AWS)[https://docs.aws.amazon.com/solutions/latest/aws-waf-security-automations/architecture.html], a honeypot is a security mechanism intended to lure and deflect an attempted attack. AWS’s honeypot is a trap point that one can insert into website to detect inbound requests from content scrapers and bad bots. The IP addresses are recorded if a source accesses the honeypot. |
==Overview of the AWS Honeypot Cyberattack== | ==Overview of the AWS Honeypot Cyberattack== | ||
| − | The test dataset of AWS Honeypot Cyberattack is retrieved from Kaggle | + | The test dataset of AWS Honeypot Cyberattack is retrieved from Kaggle [https://www.kaggle.com/casimian2000/aws-honeypot-attack-data/data%20 https://www.kaggle.com/casimian2000/aws-honeypot-attack-data/data ] <br /> |
<br /> | <br /> | ||
We use Tableau Prep to run an overview of the data before any analysis. | We use Tableau Prep to run an overview of the data before any analysis. | ||
| + | |||
| + | <p>[[File:AWS Honeypont Dataset Overview.jpg|800px |left]]</p><br /> | ||
==Analysis of Data fields== | ==Analysis of Data fields== | ||
| − | + | {| class="wikitable sortable" | |
| − | + | |- | |
| − | + | ! Field # !! Dataset Field !! Comments !! Details | |
| + | |- | ||
| + | | 1 || datatime || Packet Arrival Date || From 03/03/2013, 09:53:00PM | ||
| + | To 07/24/2013 07:47:00AM | ||
| + | 185k entries | ||
| + | |- | ||
| + | | 2 || host || Honeypot Server || 9 Categories: | ||
| + | # Eu | ||
| + | # Oregon | ||
| + | # Sa | ||
| + | # Singapore | ||
| + | # Sydney | ||
| + | # Tokyo | ||
| + | # US East | ||
| + | # Groucho-norcal | ||
| + | # Zeppo-norcal | ||
| + | |- | ||
| + | | 3 || src || Packet Source|| 70k different sources | ||
| + | |- | ||
| + | | 4 || proto || Packet Protocol Type || ICMP, TCP, UDP | ||
| + | |- | ||
| + | | 5 || type || Packet Type || 8 different types | ||
| + | |- | ||
| + | | 6 || spt || Source Port || 46k ports | ||
| + | |- | ||
| + | | 7 || dpt || Destination Port || 4k ports | ||
| + | |- | ||
| + | | 8 || srcstr || Source IP Address || 70k addresses | ||
| + | |- | ||
| + | | 9 || cc || Source Country Code || 178 countries | ||
| + | |- | ||
| + | | 10 || country || Source Country || 178 countries | ||
| + | |- | ||
| + | | 11 || locale || Source Location || 1k locations | ||
| + | |- | ||
| + | | 12 || localeabbr || Location Abbreviation || 614k entries<br /> | ||
| + | Note the grouping is not effective, making this field redundant for reference | ||
| + | |- | ||
| + | | 13 || postalcode || Postal Code || 3k entries | ||
| + | |- | ||
| + | | 14 || Latitude || Source Latitude || 5k entries | ||
| + | |- | ||
| + | | 15 || Longitude || Source Longtitude || 5k entries | ||
| + | |- | ||
| + | | 16 || F16 || A dummy field for those with longitude > 20,000 || There are 6 types in this category | ||
| + | * Null is for everything else | ||
| + | * Those with numbers refer to Washington D.C | ||
| + | |||
| + | |} | ||
| + | |||
| + | |||
| + | Acknowledgement:the interpretation of the datasets have been assisted with below reference.<br /> | ||
| + | [https://emreovunc.com/projects/honeypots_data_analysis.pdf%20 https://emreovunc.com/projects/honeypots_data_analysis.pdf ]<br /> | ||
| + | https://www.kaggle.com/jonathanbouchet/aws-honeypot/notebook<br /> | ||
| Line 87: | Line 158: | ||
# Postcode + Geographic data | # Postcode + Geographic data | ||
* Time log is available | * Time log is available | ||
| + | <br /> | ||
We can run a few analyses | We can run a few analyses | ||
Latest revision as of 22:55, 22 July 2018
What is Honeypot?
In simple terms, a honeypot is a trap for network attacks that records the IP addresses the attacks come from.
As described by Amazon Web Services (AWS)[2], a honeypot is a security mechanism intended to lure and deflect an attempted attack. AWS’s honeypot is a trap point that one can insert into a website to detect inbound requests from content scrapers and bad bots. If a source accesses the honeypot, its IP address is recorded.
Overview of the AWS Honeypot Cyberattack
The AWS Honeypot Cyberattack test dataset was retrieved from Kaggle: https://www.kaggle.com/casimian2000/aws-honeypot-attack-data/data
We use Tableau Prep to run an overview of the data before any analysis.
Analysis of Data fields
| Field # | Dataset Field | Comments | Details |
|---|---|---|---|
| 1 | datetime | Packet Arrival Date | From 03/03/2013, 09:53:00PM to 07/24/2013, 07:47:00AM; 185k entries |
| 2 | host | Honeypot Server | 9 categories: Eu, Oregon, Sa, Singapore, Sydney, Tokyo, US East, Groucho-norcal, Zeppo-norcal |
| 3 | src | Packet Source | 70k different sources |
| 4 | proto | Packet Protocol Type | ICMP, TCP, UDP |
| 5 | type | Packet Type | 8 different types |
| 6 | spt | Source Port | 46k ports |
| 7 | dpt | Destination Port | 4k ports |
| 8 | srcstr | Source IP Address | 70k addresses |
| 9 | cc | Source Country Code | 178 countries |
| 10 | country | Source Country | 178 countries |
| 11 | locale | Source Location | 1k locations |
| 12 | localeabbr | Location Abbreviation | 614k entries. Note that the grouping is not effective, which makes this field redundant for reference |
| 13 | postalcode | Postal Code | 3k entries |
| 14 | Latitude | Source Latitude | 5k entries |
| 15 | Longitude | Source Longitude | 5k entries |
| 16 | F16 | A dummy field for those with longitude > 20,000 | There are 6 types in this category: Null is for everything else; those with numbers refer to Washington D.C. |
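The overview above was run in Tableau Prep. As a rough cross-check, a similar per-field profile (number of distinct values per column) could be sketched in Python. This is only a sketch: the `distinct_counts` helper, the sample rows, and the IP addresses (taken from the reserved documentation ranges) are illustrative assumptions, not values from the dataset. Only the column names come from the field table.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows; a real run would read the Kaggle CSV file instead.
SAMPLE_CSV = """host,proto,srcstr,country
Oregon,TCP,192.0.2.10,China
Oregon,UDP,198.51.100.7,Germany
Tokyo,TCP,192.0.2.10,China
Sydney,ICMP,203.0.113.5,
"""


def distinct_counts(csv_text):
    """Return {column name: number of distinct non-empty values}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = defaultdict(set)
    for row in reader:
        for field, value in row.items():
            if value:  # skip empty cells so missing data is not counted
                seen[field].add(value)
    return {field: len(values) for field, values in seen.items()}


if __name__ == "__main__":
    for field, count in distinct_counts(SAMPLE_CSV).items():
        print(f"{field}: {count} distinct values")
```

Comparing such counts with the Details column is one way to spot fields that need cleaning before further analysis.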
Acknowledgement: the interpretation of the dataset was assisted by the references below.
https://emreovunc.com/projects/honeypots_data_analysis.pdf
https://www.kaggle.com/jonathanbouchet/aws-honeypot/notebook
What to analyse?
Without a doubt, the dataset will require data cleaning as the work proceeds. However, based on the analysis of the fields, it is clear that:
- The targets/destinations are the 9 honeypot servers (host)
- The attackers come from various sources around the world, identified by
  - IP addresses
  - Countries + cities
  - Postcode + geographic data
- A time log is available
We can run a few analyses:
- Basic statistics of the data
- Advanced visualisation of the data
- Animation of attacks showing “Origin Vs Destination” over the time log
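
As a minimal sketch of the basic statistics and of the aggregation an “Origin Vs Destination” animation would need, attacks can be counted per day, source country, and honeypot host. The event tuples, the `daily_flows` and `top_origins` helpers, and the ISO 8601 timestamps are assumptions made for illustration; the real timestamps would first have to be normalised during data cleaning.

```python
from collections import Counter
from datetime import datetime

# Hypothetical cleaned events: (ISO timestamp, source country, honeypot host).
SAMPLE_EVENTS = [
    ("2013-03-03T21:53:00", "China", "Oregon"),
    ("2013-03-03T22:10:00", "China", "Oregon"),
    ("2013-03-03T23:05:00", "Germany", "Tokyo"),
    ("2013-03-04T01:20:00", "China", "Tokyo"),
]


def daily_flows(events):
    """Count attacks per (day, source country, honeypot host)."""
    flows = Counter()
    for timestamp, country, host in events:
        day = datetime.fromisoformat(timestamp).date()
        flows[(day, country, host)] += 1
    return flows


def top_origins(events, n=3):
    """Return the n most frequent source countries with their attack counts."""
    return Counter(country for _, country, _ in events).most_common(n)


if __name__ == "__main__":
    for (day, country, host), count in sorted(daily_flows(SAMPLE_EVENTS).items()):
        print(day, country, "->", host, count)
    print(top_origins(SAMPLE_EVENTS, 1))
```

Each `(day, country, host)` count corresponds to one origin-to-destination link in one animation frame, so the same table can feed both the summary statistics and the visualisation.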