Difference between revisions of "Group05 Proposal"

Revision as of 18:59, 20 November 2018

Visual Application for Time Series Clustering

Abstract

Time series clustering partitions time series data into groups based on similarity or distance, so that time series in the same cluster are similar. Time-series datasets contain valuable information that can be obtained through pattern discovery, and clustering is a common way to uncover these patterns. Representing the resulting cluster structures as visual images (visualization of time-series data) can help users quickly understand the structure of the data, its clusters, anomalies, and other regularities.

Time series clustering can follow a wide variety of strategies, a number of which are specific to the Dynamic Time Warping (DTW) distance. dtwclust is an R package that implements many algorithms specifically tailored to DTW. A great amount of effort went into implementing them as efficiently as possible, and the functions were designed with flexibility and extensibility in mind. As such, the functions of dtwclust are comparable, if not superior, to those of expensive commercial off-the-shelf analytical toolkits such as SAS Enterprise Miner. However, to date, use of the dtwclust package has tended to be confined to academic research because it requires intermediate R programming skills.

In view of this limitation, our project seeks to provide a user interface to the dtwclust package using the R Shiny framework. The user-friendly interface design allows casual users to manage, explore, calibrate and visualise time-series clustering models without having to type a single line of code. Besides providing a user-friendly interface, our application also incorporates an interactive visualisation method to enhance the interpretability of the outputs of the clustering algorithms.

This presentation consists of five main sections. First, the motivation and objectives of the project are discussed. This is followed by a detailed discussion of the principles and concepts of time-series clustering and of dtwclust, the R package used to perform it. Third, the application and visualization design is discussed. We then demonstrate the flexible use of our application with two different use cases. The presentation concludes by sharing insights gained through working on the project and potential application areas of our application.

Background on Time Series Clustering

What is Time Series Clustering?

Clustering is a data analysis technique for organizing observed data (e.g. people, things, events, brands, companies) into meaningful taxonomies, groups or clusters without advance knowledge of the groups’ definitions. Clusters are formed from combinations of input variables so as to maximize the similarity of cases within each cluster while maximizing the dissimilarity between groups, which are initially unknown. Time-series clustering is a special type of clustering designed to handle dynamic data, that is, data whose feature values change as a function of time. Time series pose challenging issues because of the large size and high dimensionality commonly associated with them.

Key Parameters of Time-Series Clustering

Parameters	Algorithm
Type	Hierarchical Clustering; Partitional Clustering
Distances	Dynamic Time Warping (DTW); Global Alignment Kernels (GAK); Shape-Based Distance (SBD)
Centroid	DTW Barycenter Averaging (DBA); Partitioning Around Medoids (PAM); Shape Averaging (Shape)
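
To make the DTW entry in the table above more concrete, the following is a minimal sketch of the textbook dynamic-programming formulation of the DTW distance. It is written in Python purely for illustration and is not the dtwclust implementation; the function name `dtw_distance` and the choice of absolute difference as the local cost are assumptions made for this example.

```python
def dtw_distance(a, b):
    """Textbook DTW between two numeric sequences.

    Illustrative sketch only: uses absolute difference as the local cost
    and no window constraint (both are assumptions for this example).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b repeats
                                     cost[i][j - 1],      # b advances, a repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# The second series repeats one value; warping absorbs the repeat.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

Because the warping path may repeat elements, two series with the same shape but different lengths or local speeds can still have a small DTW distance, which is what makes the measure attractive for time-series clustering.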

Key Methods of Hierarchical Clustering

Agglomeration Method	Method in R
Single-Linkage (single)	single - Nearest Neighbour clustering
Complete-Linkage (complete)	complete - Furthest Neighbour Sorting
Average Agglomerative Clustering	average - Unweighted Arithmetic Average Clustering (UPGMA); mcquitty - Weighted Pair Group Method with Arithmetic Mean (WPGMA); centroid - Unweighted Centroid Clustering (UPGMC); median - Weighted Centroid Clustering (WPGMC)
Ward’s Minimum Variance	ward.D – Does not implement Ward’s (1963) clustering criterion; ward.D2 – Implements that criterion (Murtagh and Legendre 2014)
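
The difference between the single, complete and average agglomeration methods listed above can be sketched as follows. This is a naive Python illustration working on a precomputed distance matrix, not the R `hclust` implementation; the function name `agglomerate` and its parameters are hypothetical, and only three of the linkage rules are covered.

```python
def agglomerate(dist, k, linkage="average"):
    """Naive agglomerative clustering on a symmetric distance matrix.

    Illustrative sketch only. Starts with one cluster per item and
    repeatedly merges the two closest clusters until k clusters remain.
    """
    if not 1 <= k <= len(dist):
        raise ValueError("k must be between 1 and the number of items")

    def between(c1, c2):
        # all pairwise distances between members of the two clusters
        pairs = [dist[i][j] for i in c1 for j in c2]
        if linkage == "single":      # nearest neighbour
            return min(pairs)
        if linkage == "complete":    # furthest neighbour
            return max(pairs)
        if linkage == "average":     # unweighted average (UPGMA)
            return sum(pairs) / len(pairs)
        raise ValueError("unknown linkage: " + linkage)

    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = between(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Five items placed on a line at 0, 1, 2, 10 and 11.
positions = [0, 1, 2, 10, 11]
dist = [[abs(p - q) for q in positions] for p in positions]
print(agglomerate(dist, 2, "single"))  # [[0, 1, 2], [3, 4]]
```

For time series, the distance matrix would hold pairwise distances between series (for example DTW distances) rather than differences between points on a line.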

Packages Used

This dashboard mainly uses the dtwclust package for R.

dtwclust:

The dtwclust package provides the functionality to choose the time-series representation, preprocessing and clustering algorithm, and includes implementations of recently developed time-series clustering algorithms and optimizations. It serves as a bridge between classical clustering algorithms and time-series data, additionally providing visualization and evaluation routines that can handle time-series.

Objective

To build an application for Time-Series Clustering

Time-series data are of interest because of their ubiquity in areas ranging from science, engineering, business, economics and healthcare to government. This dashboard aims to let users perform time series clustering on time-series data to uncover patterns with potential use cases in the respective domain.

Reference

File:Time Series Clustering A Decade Review.pdf (a journal article by Saeed Aghabozorgi, Ali Seyed Shirkhorshidi and Teh Ying Wah)
The Arules

@@ Line 27: / Line 27: @@
 ==Abstract==
-XXXXX
+Time series clustering partitions time series data into groups based on similarity or distance, so that time series in the same cluster are similar. Time-series datasets contain valuable information that can be obtained through pattern discovery, and clustering is a common way to uncover these patterns. Representing the resulting cluster structures as visual images (visualization of time-series data) can help users quickly understand the structure of the data, its clusters, anomalies, and other regularities.
+Time series clustering can follow a wide variety of strategies, a number of which are specific to the Dynamic Time Warping (DTW) distance. <b>dtwclust</b> is an R package that implements many algorithms specifically tailored to DTW. A great amount of effort went into implementing them as efficiently as possible, and the functions were designed with flexibility and extensibility in mind. As such, the functions of <b>dtwclust</b> are comparable, if not superior, to those of expensive commercial off-the-shelf analytical toolkits such as SAS Enterprise Miner. However, to date, use of the <b>dtwclust</b> package has tended to be confined to academic research because it requires intermediate R programming skills.
+In view of this limitation, our project seeks to provide a user interface to the <b>dtwclust</b> package using the R Shiny framework. The user-friendly interface design allows casual users to manage, explore, calibrate and visualise time-series clustering models without having to type a single line of code. Besides providing a user-friendly interface, our application also incorporates an interactive visualisation method to enhance the interpretability of the outputs of the clustering algorithms.
+This presentation consists of five main sections. First, the motivation and objectives of the project are discussed. This is followed by a detailed discussion of the principles and concepts of time-series clustering and of <b>dtwclust</b>, the R package used to perform it. Third, the application and visualization design is discussed. We then demonstrate the flexible use of our application with two different use cases. The presentation concludes by sharing insights gained through working on the project and potential application areas of our application.
 ==Background on Time Series Clustering==
 ===What is Time Series Clustering?===
 Clustering is a data analysis technique for organizing observed data (e.g. people, things, events, brands, companies) into meaningful taxonomies, groups or clusters without advance knowledge of the groups’ definitions. Clusters are formed from combinations of input variables so as to maximize the similarity of cases within each cluster while maximizing the dissimilarity between groups, which are initially unknown.
-A special type of clustering is time-series clustering, which is essentially dynamic data as its feature values changes as a function of time.
+Time-series clustering is a special type of clustering designed to handle dynamic data, that is, data whose feature values change as a function of time. Time series pose challenging issues because of the large size and high dimensionality commonly associated with them.
 ===Key Parameters of Time-Series Clustering===
 {| class="wikitable"
 |-
@@ Line 43: / Line 51: @@
 |-
 | Distances||
-* Dynamic Time Wrapping (DTW)
+* Dynamic Time Warping (DTW)
 * Global  Alignment Kernels (GAK)
 * Shape-Based Distance (SBD)
@@ Line 52: / Line 60: @@
 * Shape Averaging (Shape)
 |}
+===Key Methods of Hierarchical Clustering===
+{| class="wikitable"
+|-
+! Agglomeration Method !! Method in R
+|-
+| Single-Linkage (single)||
+* single - Nearest Neighbour clustering
+|-
+| Complete-Linkage (complete)||
+* complete - Furthest Neighbour Sorting
+|-
+| Average Agglomerative Clustering ||
+* average - Unweighted Arithmetic Average Clustering (UPGMA)
+* mcquitty - Weighted Pair Group Method with Arithmetic Mean (WPGMA)
+* centroid - Unweighted Centroid Clustering (UPGMC)
+* median - Weighted Centroid Clustering (WPGMC)
+|-
+| Ward’s Minimum Variance ||
+* ward.D – Does not implement Ward’s (1963) clustering criterion
+* ward.D2 – Implements that criterion (Murtagh and Legendre 2014)
+|}
 ==Packages Used==
 This dashboard mainly uses the dtwclust package for R.
 <b><big>dtwclust:</big></b><br>
-XXXX
+The <b>dtwclust</b> package provides the functionality to choose the time-series representation, preprocessing and clustering algorithm, and includes implementations of recently developed time-series clustering algorithms and optimizations. It serves as a bridge between classical clustering algorithms and time-series data, additionally providing visualization and evaluation routines that can handle time-series.
 ==Objective==
-===To build an application for Time Series Clustering===
+===To build an application for Time-Series Clustering===
 Time-series data are of interest because of their ubiquity in areas ranging from science, engineering, business, economics and healthcare to government. This dashboard aims to let users perform time series clustering on time-series data to uncover patterns with potential use cases in the respective domain.
 ==Reference==
 [[File:Time Series Clustering A Decade Review.pdf|none|A journal by Saeed Aghabozorgi, Ali Seyed Shirkhorshidi and Teh Ying Wah]]
+[[ISSS608_2016-17_T3_Group8_Arules_Project Proposal|The Arules]]
