Group01 Dataset Overview

What is a Honeypot?

As described by Amazon Web Services (AWS) [2], a honeypot is a security mechanism intended to lure and deflect an attempted attack. AWS's honeypot is a trap point that one can insert into a website to detect inbound requests from content scrapers and bad bots. If a source accesses the honeypot, its IP address is recorded.
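The trap-point idea can be illustrated with a minimal sketch. This is not AWS's implementation; the trap path, the function name `record_request` and the sample IP addresses (taken from documentation-reserved ranges) are all hypothetical and only show the principle that any source touching the trap gets recorded.

```python
# Hypothetical trap path; a real site would hide this link from human visitors.
TRAP_PATH = "/trap-do-not-follow"


def record_request(path, source_ip, flagged):
    """Add source_ip to the flagged set if the request hit the trap path."""
    if path == TRAP_PATH:
        flagged.add(source_ip)
        return True
    return False


# Made-up requests for illustration: (requested path, source IP).
flagged_ips = set()
requests = [
    ("/index.html", "198.51.100.7"),
    (TRAP_PATH, "203.0.113.5"),
    ("/about.html", "198.51.100.7"),
    (TRAP_PATH, "203.0.113.5"),
]
for path, ip in requests:
    record_request(path, ip, flagged_ips)

print(sorted(flagged_ips))
```

Only the source that requested the trap path ends up in the set; ordinary page requests are ignored.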

Overview of the AWS Honeypot Cyberattack

The AWS Honeypot Cyberattack test dataset was retrieved from Kaggle: https://www.kaggle.com/casimian2000/aws-honeypot-attack-data/data

We use Tableau Prep to get an overview of the data before any analysis.

Analysis of Data fields

Field #	Dataset Field	Comments	Details
1	datetime	Packet Arrival Date	From 03/03/2013 09:53:00 PM to 07/24/2013 07:47:00 AM; 185k entries
2	host	Honeypot Server	9 categories: Eu, Oregon, Sa, Singapore, Sydney, Tokyo, US East, Groucho-norcal, Zeppo-norcal
3	src	Packet Source	70k different sources
4	proto	Packet Protocol Type	ICMP, TCP, UDP
5	type	Packet Type	8 different types
6	spt	Source Port	46k ports
7	dpt	Destination Port	4k ports
8	srcstr	Source IP Address	70k addresses
9	cc	Source Country Code	178 countries
10	country	Source Country	178 countries
11	locale	Source Location	1k locations
12	localeabbr	Location Abbreviation	614k entries. Note: the grouping is not effective, making this field redundant for reference
13	postalcode	Postal Code	3k entries
14	Latitude	Source Latitude	5k entries
15	Longitude	Source Longitude	5k entries
16	F16	A dummy field for those with longitude > 20,000	There are 6 types in this category. Null is for everything else. Those with numbers refer to Washington D.C.
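The distinct-value counts in the table came from Tableau Prep. As a rough cross-check, a similar profile can be sketched in Python with only the standard library. The sample rows below are made up for illustration and are not taken from the real dataset; `distinct_counts` is a hypothetical helper name.

```python
import csv
import io

# Made-up sample rows for illustration only; not taken from the real dataset.
SAMPLE = """host,proto,country
Tokyo,TCP,China
Tokyo,UDP,United States
Zeppo-norcal,TCP,China
Sydney,ICMP,Japan
"""


def distinct_counts(rows):
    """Return {field: number of distinct non-empty values} for dict rows."""
    seen = {}
    for row in rows:
        for field, value in row.items():
            if value:  # treat empty strings as missing values
                seen.setdefault(field, set()).add(value)
    return {field: len(values) for field, values in seen.items()}


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
profile = distinct_counts(rows)
print(profile)
```

To profile the downloaded CSV instead, pass an open file handle to `csv.DictReader` in place of the `io.StringIO` sample.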

Acknowledgement: the interpretation of the dataset was assisted by the references below.
https://emreovunc.com/projects/honeypots_data_analysis.pdf
https://www.kaggle.com/jonathanbouchet/aws-honeypot/notebook

What to analyse?

The dataset will certainly need cleaning as the work proceeds. However, the analysis of the fields already makes the following clear:

The targets/destinations are 9 different servers (host)
The attackers come from various sources around the world, identified by:

IP addresses
Countries + cities
Postcode + geographic data

A time log is available

We can run a few analyses:

Basic statistics of the data
Advanced visualisation of the data
Animation of attacks showing "Origin vs Destination" over the time log
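As a rough sketch of the first and last items, the snippet below counts attacks per host and per source country, then groups origin/destination pairs by calendar day so that each day could serve as one animation frame. The records are made up for illustration, the timestamp format in `TIME_FORMAT` is an assumption that must be adjusted to the actual file, and `daily_frames` is a hypothetical helper name.

```python
from collections import Counter
from datetime import datetime

# Assumed timestamp format; adjust it to match the actual file.
TIME_FORMAT = "%m/%d/%Y %H:%M"

# Made-up records (datetime, source country, honeypot host), illustration only.
records = [
    ("03/03/2013 21:53", "China", "Tokyo"),
    ("03/03/2013 22:10", "China", "Tokyo"),
    ("03/04/2013 01:15", "United States", "Oregon"),
    ("03/04/2013 09:30", "China", "Sydney"),
]

# Basic statistics: number of attacks per honeypot host and per source country.
per_host = Counter(host for _, _, host in records)
per_country = Counter(country for _, country, _ in records)


def daily_frames(rows):
    """Count (origin, destination) pairs per calendar day, sorted by day."""
    frames = {}
    for stamp, country, host in rows:
        day = datetime.strptime(stamp, TIME_FORMAT).date()
        frames.setdefault(day, Counter())[(country, host)] += 1
    return dict(sorted(frames.items()))


frames = daily_frames(records)
for day, pairs in frames.items():
    print(day, dict(pairs))
```

Daily buckets are only one possible choice; hourly or weekly frames would follow the same pattern by changing how the timestamp is truncated.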
