Difference between revisions of "Group05 Proposal"

From Visual Analytics and Applications

Revision as of 14:41, 24 November 2018

Visual Application for Time Series Clustering



Abstract

Time series clustering partitions time series data into groups based on similarity or distance, so that time series in the same cluster are similar to one another. Time-series datasets contain valuable information that can be obtained through pattern discovery, and clustering is a common way to uncover these patterns. Representing the time-series cluster structures as visual images (visualization of time-series data) can help users quickly understand the structure of data, clusters, anomalies, and other regularities in datasets.

Time series clustering covers a wide variety of strategies, including a set specific to the Dynamic Time Warping (DTW) distance. dtwclust is a package for the R statistical software that implements many algorithms specifically tailored to DTW. A great amount of effort went into implementing them as efficiently as possible, and the functions were designed with flexibility and extensibility in mind. As such, the functions in dtwclust are comparable, if not superior, to those of expensive commercial off-the-shelf analytical toolkits such as SAS Enterprise Miner. However, to date, use of the dtwclust package has tended to be confined to academic research because it requires intermediate R programming skills.
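To make the DTW distance mentioned above concrete, the following is a minimal sketch of the classic dynamic-programming formulation. It is written in plain Python for illustration only and is not the dtwclust implementation; the function name `dtw_distance`, the absolute-difference local cost, and the sample series are all assumptions.

```python
from math import inf


def dtw_distance(a, b):
    """Unconstrained DTW with absolute difference as the local cost (illustrative)."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


series_a = [0, 1, 2, 1, 0]
series_b = [0, 0, 1, 2, 1, 0]  # same shape as series_a, delayed by one step
series_c = [2, 2, 2, 2, 2]     # flat series

print(dtw_distance(series_a, series_b))  # 0.0: warping absorbs the delay
print(dtw_distance(series_a, series_c))  # 6.0
```

Because the warping path may repeat points, two series with the same shape but different timing can end up at distance zero, which is why DTW is often preferred over a point-by-point distance for this kind of clustering.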

The project aims to provide a user-friendly interface to the dtwclust package using the R Shiny framework. The interface allows casual users to import and manage data and to explore, calibrate, visualise and evaluate clusters without having to type a single line of code. In addition, the application aims to incorporate graph visualization to enhance data exploration, to aid the interpretability of the clustering outputs, and to investigate the similarities or dissimilarities within each cluster.

Background on Time Series Clustering

What is Time Series Clustering?

Clustering is a data analysis technique for organizing observed data (e.g. people, things, events, brands, companies) into meaningful taxonomies, groups or clusters without advance knowledge of the groups’ definitions. Clusters are formed based on combinations of input variables so as to maximize the similarity of cases within each cluster while maximizing the dissimilarity between groups, which are initially unknown. Time-series clustering is a special type of clustering designed to handle dynamic data, that is, data whose feature values change as a function of time. Time series pose some challenging issues because of the large size and high dimensionality commonly associated with them.

Key Parameters of Time-Series Clustering

Parameters | Algorithm
Type
  • Hierarchical Clustering
  • Partitional Clustering
Distances
  • Dynamic Time Warping (DTW)
  • Global Alignment Kernels (GAK)
  • Shape-Based Distance (SBD)
Centroid
  • DTW Barycenter Averaging (DBA)
  • Partitioning Around Medoids (PAM)
  • Shape Averaging (Shape)
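As an illustration of how a partitional type, a DTW distance and a medoid-based centroid fit together, here is a small sketch in plain Python. It uses a simplified alternating k-medoids loop (assign, then update), not the full PAM swap procedure and not the dtwclust code; the helper names `dtw`, `medoid` and `k_medoids`, the sample series and the initial medoids are assumptions made for the example.

```python
from math import inf


def dtw(a, b):
    """Compact DTW with absolute difference as the local cost (illustrative)."""
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf]
        for j, y in enumerate(b, start=1):
            curr.append(abs(x - y) + min(prev[j], curr[j - 1], prev[j - 1]))
        prev = curr
    return prev[-1]


def medoid(members, dist):
    """Index of the member with the smallest total distance to its cluster."""
    return min(members, key=lambda i: sum(dist[i][j] for j in members))


def k_medoids(series, initial_medoids, max_iter=20):
    """Alternate assignment and medoid update until the medoids stop changing."""
    n = len(series)
    dist = [[dtw(series[i], series[j]) for j in range(n)] for i in range(n)]
    medoids = list(initial_medoids)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each series joins the cluster of its nearest medoid.
        labels = [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
                  for i in range(n)]
        # Update step: each cluster's medoid becomes its most central member.
        new_medoids = []
        for c in range(len(medoids)):
            members = [i for i in range(n) if labels[i] == c]
            new_medoids.append(medoid(members, dist) if members else medoids[c])
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids


series = [
    [0, 1, 2, 1, 0],     # peak
    [0, 0, 1, 2, 1, 0],  # delayed peak
    [0, 1, 3, 1, 0],     # taller peak
    [2, 1, 0, 1, 2],     # valley
    [2, 2, 1, 0, 1, 2],  # delayed valley
    [2, 1, 0, 0, 1, 2],  # wide valley
]
labels, medoids = k_medoids(series, initial_medoids=[0, 3])
print(labels, medoids)
```

A medoid is always one of the observed series, so it works with any distance; an averaging centroid such as DBA instead constructs a new series and is tied to the DTW alignment.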

Key Methods of Hierarchical Clustering

Agglomeration Method | Methods in R
Single-Linkage (single)
  • single - Nearest Neighbour clustering
Complete-Linkage (complete)
  • complete - Furthest Neighbour Sorting
Average Agglomerative Clustering
  • average - Unweighted Arithmetic Average Clustering (UPGMA)
  • mcquitty - Weighted Pair Group Method with Arithmetic Mean (WPGMA)
  • centroid - Unweighted Centroid Clustering (UPGMC)
  • median - Weighted Centroid Clustering (WPGMC)
Ward’s Minimum Variance
  • ward.D – Does not implement Ward’s (1963) clustering criterion
  • ward.D2 – Implements that criterion (Murtagh and Legendre 2014)
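The single, complete and average criteria listed above differ only in how the distance between two clusters is derived from the pairwise distances of their members. The sketch below shows this with a naive agglomerative loop in plain Python; it is an illustration, not R's hclust, and the function names, the Manhattan distance and the toy series are assumptions.

```python
from itertools import combinations


def manhattan(x, y):
    """Sum of absolute differences between two equal-length series."""
    return sum(abs(p - q) for p, q in zip(x, y))


def linkage_distance(cluster_a, cluster_b, method):
    """Distance between two clusters under a given agglomeration method."""
    pairwise = [manhattan(x, y) for x in cluster_a for y in cluster_b]
    if method == "single":    # nearest pair of members
        return min(pairwise)
    if method == "complete":  # furthest pair of members
        return max(pairwise)
    if method == "average":   # unweighted mean over all pairs (UPGMA)
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unsupported method: {method}")


def agglomerate(items, k, method):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[item] for item in items]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage_distance(clusters[p[0]], clusters[p[1]], method),
        )
        merged = clusters.pop(j)  # j > i, so index i stays valid after the pop
        clusters[i].extend(merged)
    return clusters


a, b, c, d = [0, 0], [1, 1], [3, 3], [5, 6]
print(linkage_distance([a, b], [c, d], "single"))    # 4
print(linkage_distance([a, b], [c, d], "complete"))  # 11
print(linkage_distance([a, b], [c, d], "average"))   # 7.5
print(agglomerate([a, b, c, d], 2, "single"))    # [[a, b, c], [d]]
print(agglomerate([a, b, c, d], 2, "complete"))  # [[a, b], [c, d]]
```

On the same four series, single linkage chains c onto the first cluster while complete linkage pairs c with d, which shows how the choice of agglomeration method can change the resulting grouping.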

Packages Used

This dashboard mainly uses the dtwclust package for R.

dtwclust:

The dtwclust package provides the functionality to choose the time-series representation, preprocessing and clustering algorithm, and includes implementations of recently developed time-series clustering algorithms and optimizations. It serves as a bridge between classical clustering algorithms and time-series data, additionally providing visualization and evaluation routines that can handle time-series.
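One common preprocessing step before time-series clustering is z-normalization, which rescales each series to zero mean and unit standard deviation so that series are compared by shape rather than by level or scale. The sketch below is a plain Python illustration under that assumption, not the dtwclust routine; the function name `z_normalize` and the use of the sample standard deviation are choices made for the example.

```python
from statistics import mean, stdev


def z_normalize(series):
    """Rescale a series to zero mean and unit (sample) standard deviation."""
    mu = mean(series)
    sigma = stdev(series)
    if sigma == 0:
        # A constant series has no shape to preserve; map it to zeros.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


small = [1, 2, 3]
large = [10, 20, 30]  # same shape, ten times the scale

print(z_normalize(small))
print(z_normalize(large))
```

After normalization the two series coincide, so a distance-based clustering would treat them as the same pattern even though their raw values differ by an order of magnitude.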

Objective

To build an application for Time-Series Clustering

Time-series data are of interest because of their ubiquity in areas ranging from science, engineering, business, economics and healthcare to government. This dashboard aims to let users perform time series clustering on their data to uncover patterns that have potential use cases in the respective domain.

Reference

  1. Time Series Clustering A Decade Review.pdf
  2. The Arules