Difference between revisions of "ANLY482 AY2016-17 T2 Group 2 Project Overview Data Source"
Revision as of 16:02, 29 December 2016
Preliminary Data Source
To facilitate our initial analysis, GovTech provided us with a dataset of job postings from January 2016. For each posting, the dataset contains the job title, description and requirements, and relevant skills can appear in any of these three columns.
Data Dictionary
Data Field | Description |
---|---|
(no header) | Serial number of each job posting |
jobtitle | Title of the advertised position; it indicates, for example, whether the role is in engineering or finance. |
Description | A detailed description of the company’s profile, the characteristics sought in a candidate, and the expected scope of work. |
requirements | The certifications, experience and skills required for the position. |
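The data dictionary can be expressed as a small record type. The sketch below is illustrative only. It assumes, hypothetically, that the GovTech extract is a CSV file whose header row matches the field names in the table, with a blank header for the serial-number column. `JobPosting`, `load_postings` and `skill_text` are illustrative names and are not part of the dataset. Because skills may appear in any of the three text columns, `skill_text` simply joins them.

```python
import csv
from dataclasses import dataclass
from typing import Iterable


@dataclass
class JobPosting:
    serial: int        # unnamed first column: serial number of the posting
    jobtitle: str
    description: str
    requirements: str

    def skill_text(self) -> str:
        """Join the three text fields, since skills may appear in any of them."""
        return " ".join([self.jobtitle, self.description, self.requirements])


def load_postings(lines: Iterable[str]) -> list[JobPosting]:
    """Parse CSV lines into JobPosting records.

    ASSUMPTION (hypothetical format): the header row is
    ",jobtitle,Description,requirements", i.e. the serial-number
    column has an empty header, as in the data dictionary.
    """
    postings = []
    for row in csv.DictReader(lines):
        postings.append(JobPosting(
            serial=int(row[""]),            # blank header -> key ""
            jobtitle=row["jobtitle"],
            description=row["Description"],
            requirements=row["requirements"],
        ))
    return postings
```

`csv.DictReader` accepts an open file or any iterable of strings, so the same function works on a file handle or on a list of lines in a quick test.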
Using this data, we can generate labelled data by identifying skill-related words in the ‘description’ and ‘requirements’ fields and by scraping websites for common skill sets. This labelled data will then be used to build a model.
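Dictionary matching is one simple way to turn the ‘description’ and ‘requirements’ text into labelled data. Every token sequence that appears in a list of known skills is tagged as a skill. The sketch below makes two assumptions. First, a hypothetical `SKILLS` vocabulary stands in for the skill sets scraped from websites. Second, it uses BIO tags (`B-SKILL`, `I-SKILL`, `O`), a common labelling scheme for sequence models. This is one possible approach, not necessarily the one the project will adopt.

```python
import re

# HYPOTHETICAL skill vocabulary; in practice it would be built from
# websites listing common skill sets.
SKILLS = {"python", "sql", "machine learning", "project management"}

# Simple word tokens: lowercase letters, digits, '+' and '#' (e.g. "c++", "c#").
TOKEN_RE = re.compile(r"[a-z0-9+#]+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return TOKEN_RE.findall(text.lower())


def bio_labels(tokens: list[str], skills: set[str]) -> list[str]:
    """Tag each token as B-SKILL, I-SKILL or O by longest dictionary match."""
    skill_seqs = {tuple(tokenize(s)) for s in skills}
    max_len = max((len(s) for s in skill_seqs), default=0)
    labels = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest candidate phrase first so multi-word skills
        # such as "machine learning" are kept together.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(tokens[i:i + n]) in skill_seqs:
                match_len = n
                break
        if match_len:
            labels[i] = "B-SKILL"
            for j in range(i + 1, i + match_len):
                labels[j] = "I-SKILL"
            i += match_len
        else:
            i += 1
    return labels
```

For example, with the text "Experience in Python and machine learning is required.", the sketch tags “python” as `B-SKILL` and “machine learning” as `B-SKILL`, `I-SKILL`, leaving the other tokens as `O`.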
Subsequently, data will be scraped from jobsbank.gov.sg and used to train the model.