Difference between revisions of "ANLY482 AY2016-17 T2 Group 2 Project Overview Data Source"
Revision as of 16:16, 8 January 2017
Preliminary Data Source
To facilitate our initial analysis, Kaisou has provided us with sample datasets that consist of transaction data from November 2016. The three datasets are the musical transaction data, the concert transaction data and the customer profile data.
Musical data
Data Field | Description |
---|---|
Currency | The currency of the transaction. Should be “SGD” for all transactions. |
AccDummy | The account number that made the purchase. This value is anonymised. |
TicketStatus | S is for Single and M is for Multiple. |
TicketType | The type of ticket. |
Channel | I is for Internet and P is for Phone. We will use this column to identify the channel through which each transaction was made. |
MusicalDate | The date on which the musical is held. |
QuickPick | Y means that the machine picked the number; N means that the customer picked the number. |
DrawNumber | Unique number for each musical. |
Product | 23 is a local production; 9 is an overseas production. |
SettleDate | The settlement date for the purchase. |
Selection | Seat number that the customer selected. |
TicketDate | The date of purchase. |
TicketTime | The time of purchase. |
TicketAmount | The total amount of the purchase. |
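Several fields in the musical data are coded values with a documented set of allowed codes, so a quick consistency check is a natural first step before analysis. The following is a minimal sketch, assuming each record has already been loaded as a dictionary keyed by the field names in the table; the function name `find_invalid_codes` and the sample rows are hypothetical and not part of the dataset provided.

```python
# Allowed codes taken from the field descriptions in the musical data table.
# The function name and the row-as-dict layout are illustrative assumptions.
ALLOWED_CODES = {
    "Currency": {"SGD"},
    "TicketStatus": {"S", "M"},
    "Channel": {"I", "P"},
    "QuickPick": {"Y", "N"},
    "Product": {"23", "9"},
}


def find_invalid_codes(rows):
    """Return (row_index, field, value) for every value outside the documented codes."""
    problems = []
    for index, row in enumerate(rows):
        for field, allowed in ALLOWED_CODES.items():
            # Compare as stripped strings so 23 and "23 " are treated alike.
            value = str(row.get(field, "")).strip()
            if value not in allowed:
                problems.append((index, field, value))
    return problems


# Hypothetical sample rows, made up for illustration only.
sample_rows = [
    {"Currency": "SGD", "TicketStatus": "S", "Channel": "I", "QuickPick": "Y", "Product": 23},
    {"Currency": "USD", "TicketStatus": "M", "Channel": "P", "QuickPick": "N", "Product": 9},
]

for problem in find_invalid_codes(sample_rows):
    print(problem)
```

A missing field is reported as an empty string, so the same check also surfaces incomplete records.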
Concert data
Data Field | Description |
---|---|
Currency | The currency of the transaction. Should be “SGD” for all transactions. |
AccountDummy | The account number that made the purchase. This value is anonymised. |
TicketStatus | A is for Active. |
TicketType | The type of ticket. |
Channel | I is for Internet and P is for Phone. We will use this column to identify the channel through which each transaction was made. |
LiveInd | Y means that the purchase was for a live concert; N means otherwise. |
TicketType | The type of ticket. |
LegStatus | |
MarketName | |
Odds | |
SettleDate | The settlement date for the tickets. |
SettleInfo | |
TicketDate | The date of purchase. |
TicketTime | The time of purchase. |
ArtistCode | The concert artist that the ticket belongs to. |
TicketAmount | The total amount of the purchase. |
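Since the Channel field is intended to separate Internet purchases from Phone purchases, one simple way to use it is to total TicketAmount per channel. This is only a sketch under stated assumptions: records are dictionaries keyed by the field names above, TicketAmount parses as a decimal number, and the function name `total_amount_by_channel` and the sample values are hypothetical.

```python
from collections import defaultdict
from decimal import Decimal

# Channel codes as documented in the tables: I = Internet, P = Phone.
CHANNEL_LABELS = {"I": "Internet", "P": "Phone"}


def total_amount_by_channel(rows):
    """Sum TicketAmount per channel; undocumented codes are grouped as 'Unknown'."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for row in rows:
        label = CHANNEL_LABELS.get(row["Channel"], "Unknown")
        # Go through str() so float inputs do not carry binary rounding noise.
        totals[label] += Decimal(str(row["TicketAmount"]))
    return dict(totals)


# Hypothetical sample rows, made up for illustration only.
sample_rows = [
    {"Channel": "I", "TicketAmount": "10.50"},
    {"Channel": "P", "TicketAmount": "5"},
    {"Channel": "I", "TicketAmount": "4.50"},
]

for channel, total in total_amount_by_channel(sample_rows).items():
    print(channel, total)
```

`Decimal` is used instead of `float` so that monetary totals are not affected by floating-point rounding.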
Using this data, we can gather labelled data by identifying the words from the ‘description’ and ‘requirements’ fields and scraping websites for common skillsets. This labelled data will then be used to build a model. Subsequently, data will be scraped from jobsbank.gov.sg and used to train the model.