Difference between revisions of "ANLY482 AY2016-17 T2 Group 2 Project Overview Data Source"

From Analytics Practicum
Line 46: Line 46:
 
==<div style="background: #6A8D9D; line-height: 0.3em; font-family:helvetica;  border-left: #466675 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>Preliminary Data Source</strong></font></div></div>==
 
==<div style="background: #6A8D9D; line-height: 0.3em; font-family:helvetica;  border-left: #466675 solid 15px;"><div style="border-left: #FFFFFF solid 5px; padding:15px;font-size:15px;"><font color= "#F2F1EF"><strong>Preliminary Data Source</strong></font></div></div>==
 
<div style="margin:20px; padding: 10px; background: #ffffff; text-align:left; font-size: 95%;-webkit-border-radius: 15px;-webkit-box-shadow: 7px 4px 14px rgba(176, 155, 121, 0.96); -moz-box-shadow: 7px 4px 14px rgba(176, 155, 121, 0.96);box-shadow: 7px 4px 14px rgba(176, 155, 121, 0.96);">
 
<div style="margin:20px; padding: 10px; background: #ffffff; text-align:left; font-size: 95%;-webkit-border-radius: 15px;-webkit-box-shadow: 7px 4px 14px rgba(176, 155, 121, 0.96); -moz-box-shadow: 7px 4px 14px rgba(176, 155, 121, 0.96);box-shadow: 7px 4px 14px rgba(176, 155, 121, 0.96);">
To facilitate our initial analysis, GovTech provided us with dataset that consists of job postings for January 2016. The dataset contains on every instance’s job title, description and requirements. Relevant skills can be found in all 3 columns. <br><br>
+
To facilitate our initial analysis, Kaisou has provided us with sample datasets consisting of transaction data from November 2016. The three datasets are the musical transaction data, the concert transaction data and the customer profile data.
'''Data Dictionary'''
+
<br><br>
 +
'''Musical data'''
 
{|class="wikitable" width="60%"
 
{|class="wikitable" width="60%"
 
|-
 
|-
Line 54: Line 55:
 
 
 
         |-  
 
         |-  
         ! [empty]
+
         ! Currency
| Serial number of the job postings
+
| The currency the transaction was purchased in. Should be “SGD” for all transactions.
 
          
 
          
 
         |-  
 
         |-  
         ! jobtitle
+
         ! AccDummy
| This shows the hiring post for the job. In the job title, it displays whether is it an engineering or finance role.
+
| The account number that purchased this transaction. This is being anonymised.
  
 
         |-  
 
         |-  
         ! Description
+
         ! TicketStatus
| This describes in detail the company’s profile, candidate’s characteristics they are looking for and the expected work scope of the candidate.
+
| S is for Single and M is for Multiple.
  
 
         |-  
 
         |-  
         ! requirements
+
         ! TicketType
| This lists out the certifications, experiences, skills required for the job post.
+
| The kind of ticket type.
 +
 
 +
        |-
 +
        ! Channel
 +
| I is for Internet and P is for Phone. We will use this column to differentiate which channel the transaction is purchased from.
 +
 
 +
        |-
 +
        ! MusicalDate
 +
| The date where the musical is held.
 +
 
 +
        |-
 +
        ! QuickPick
 +
| Y means that the machine picked the number while N means that the customer picked the number.
 +
 
 +
        |-
 +
        ! DrawNumber
 +
| Unique number for each musical.
 +
 
 +
        |-
 +
        ! Product
 +
| 23 is local production, 9 is overseas production
 +
 
 +
        |-
 +
        ! SettleDate
 +
| The settlement date for the purchase.
 +
 
 +
        |-
 +
        ! Selection
 +
| Seat number that the customer selected.
 +
 
 +
        |-
 +
        ! TicketDate
 +
| The date of purchase.
 +
 
 +
        |-
 +
        ! TicketTime
 +
| The time of purchase
 +
 
 +
        |-
 +
        ! TicketAmount
 +
| The total amount from the purchase
 
|}
 
|}
 
<br>
 
<br>
 +
'''Concert data'''
 +
{|class="wikitable" width="60%"
 +
|-
 +
! width="15%" | Data Field
 +
! Description
 +
 +
        |-
 +
        ! Currency
 +
| The currency the transaction was purchased in. Should be “SGD” for all transactions.
 +
       
 +
        |-
 +
        ! AccountDummy
 +
| The account number that purchased this transaction. This is being anonymised.
 +
 +
        |-
 +
        ! TicketStatus
 +
| A is for Active.
 +
 +
        |-
 +
        ! TicketType
 +
| The kind of ticket type.
 +
 +
        |-
 +
        ! Channel
 +
| I is for Internet and P is for Phone. We will use this column to differentiate which channel the transaction is purchased from.
 +
 +
        |-
 +
        ! LiveInd
 +
| Y means that the purchase was on a live concert while N means otherwise
 +
 +
        |-
 +
        ! TicketType
 +
| The type of ticket
 +
 +
        |-
 +
        ! LegStatus
 +
|
 +
 +
        |-
 +
        ! MarketName
 +
|
 +
 +
        |-
 +
        ! Odds
 +
|
 +
 +
        |-
 +
        ! SettleDate
 +
| The settlement date for the tickets.
 +
 +
        |-
 +
        ! SettleInfo
 +
|
 +
 +
        |-
 +
        ! TicketDate
 +
| The date of purchase.
 +
 +
        |-
 +
        ! TicketTime
 +
| The time of purchase.
 +
 +
        |-
 +
        ! ArtistCode
 +
| The concert artist that the ticket belongs to
 +
 +
        |-
 +
        ! TicketAmount
 +
| The total amount from the purchase
 +
|}
 
Using this data, we can gather labelled data by identifying the words from the ‘description’ and ‘requirements’ fields and scraping websites for common skillsets. This labelled data will then be used to build a model.
 
Using this data, we can gather labelled data by identifying the words from the ‘description’ and ‘requirements’ fields and scraping websites for common skillsets. This labelled data will then be used to build a model.
 
Subsequently, data will be scraped from jobsbank.gov.sg and used to train the model.
 
Subsequently, data will be scraped from jobsbank.gov.sg and used to train the model.
  
 
</div>
 
</div>

Revision as of 16:16, 8 January 2017



Preliminary Data Source

To facilitate our initial analysis, Kaisou has provided us with sample datasets consisting of transaction data from November 2016. The three datasets are the musical transaction data, the concert transaction data and the customer profile data.



Musical data

{|class="wikitable" width="60%"
|-
! width="15%" | Data Field
! Description
|-
! Currency
| The currency in which the transaction was made. Should be “SGD” for all transactions.
|-
! AccDummy
| The account number that made the purchase. Values are anonymised.
|-
! TicketStatus
| S is for Single and M is for Multiple.
|-
! TicketType
| The type of ticket.
|-
! Channel
| I is for Internet and P is for Phone. We will use this column to identify the channel through which each transaction was made.
|-
! MusicalDate
| The date on which the musical is held.
|-
! QuickPick
| Y means that the machine picked the number; N means that the customer picked the number.
|-
! DrawNumber
| Unique number for each musical.
|-
! Product
| 23 is local production; 9 is overseas production.
|-
! SettleDate
| The settlement date for the purchase.
|-
! Selection
| Seat number that the customer selected.
|-
! TicketDate
| The date of purchase.
|-
! TicketTime
| The time of purchase.
|-
! TicketAmount
| The total amount of the purchase.
|}
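
The coded values in the musical data dictionary lend themselves to a simple sanity check before any analysis. The sketch below is a minimal illustration in Python, assuming each transaction has been loaded as a dictionary keyed by the field names above. The helper `find_invalid_fields`, the `ALLOWED_VALUES` lookup and the sample rows are hypothetical and are not part of the data provided by Kaisou.

```python
# Allowed codes taken from the musical data dictionary above.
# Assumption: values are compared as strings, so numeric codes such as
# Product 23 are converted with str() before the check.
ALLOWED_VALUES = {
    "Currency": {"SGD"},
    "TicketStatus": {"S", "M"},
    "Channel": {"I", "P"},
    "QuickPick": {"Y", "N"},
    "Product": {"23", "9"},
}


def find_invalid_fields(row):
    """Return the sorted names of coded fields whose value is missing
    or is not one of the codes listed in the data dictionary."""
    return sorted(
        field
        for field, allowed in ALLOWED_VALUES.items()
        if str(row.get(field, "")).strip() not in allowed
    )


# Hypothetical sample rows, for illustration only.
sample_rows = [
    {"Currency": "SGD", "TicketStatus": "S", "Channel": "I",
     "QuickPick": "N", "Product": "23"},
    {"Currency": "USD", "TicketStatus": "M", "Channel": "X",
     "QuickPick": "Y", "Product": "9"},
]

for index, row in enumerate(sample_rows):
    problems = find_invalid_fields(row)
    if problems:
        print(f"Row {index}: unexpected values in {problems}")
```

Rows flagged in this way would need to be reviewed with the sponsor before being dropped or corrected, since the dictionary may not list every valid code.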


Concert data

{|class="wikitable" width="60%"
|-
! width="15%" | Data Field
! Description
|-
! Currency
| The currency in which the transaction was made. Should be “SGD” for all transactions.
|-
! AccountDummy
| The account number that made the purchase. Values are anonymised.
|-
! TicketStatus
| A is for Active.
|-
! TicketType
| The type of ticket.
|-
! Channel
| I is for Internet and P is for Phone. We will use this column to identify the channel through which each transaction was made.
|-
! LiveInd
| Y means that the purchase was for a live concert; N means otherwise.
|-
! TicketType
| The type of ticket.
|-
! LegStatus
|
|-
! MarketName
|
|-
! Odds
|
|-
! SettleDate
| The settlement date for the tickets.
|-
! SettleInfo
|
|-
! TicketDate
| The date of purchase.
|-
! TicketTime
| The time of purchase.
|-
! ArtistCode
| The concert artist that the ticket belongs to.
|-
! TicketAmount
| The total amount of the purchase.
|}
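
Since the Channel column is intended to distinguish Internet purchases from Phone purchases, one possible first step is to total TicketAmount per channel. The following is a rough sketch only, assuming the concert transactions are available as dictionaries keyed by the field names above and that TicketAmount parses as a plain decimal number. The function `total_amount_by_channel` and the sample rows are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal

# Channel codes as described in the concert data dictionary above.
CHANNEL_LABELS = {"I": "Internet", "P": "Phone"}


def total_amount_by_channel(rows):
    """Sum TicketAmount per purchase channel.

    Codes not listed in the data dictionary are grouped under 'Unknown'.
    """
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for row in rows:
        label = CHANNEL_LABELS.get(row["Channel"], "Unknown")
        # Assumption: TicketAmount is a plain decimal number or numeric string.
        totals[label] += Decimal(str(row["TicketAmount"]))
    return dict(totals)


# Hypothetical sample rows, for illustration only.
sample_rows = [
    {"Channel": "I", "TicketAmount": "120.00"},
    {"Channel": "P", "TicketAmount": "80.50"},
    {"Channel": "I", "TicketAmount": "45.00"},
]

print(total_amount_by_channel(sample_rows))
```

`Decimal` is used here instead of floating point so that currency amounts add up without rounding drift.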

Using this data, we can gather labelled data by identifying the words from the ‘description’ and ‘requirements’ fields and scraping websites for common skillsets. This labelled data will then be used to build a model. Subsequently, data will be scraped from jobsbank.gov.sg and used to train the model.