Team Flair Data Source

From Analytics Practicum
Revision as of 08:37, 19 January 2018 by Jmleong.2014

Return to ANLY482 AY2017-18 Home Page



Dataset Description

ARTistique’s dataset, the National Survey of the Arts, captures information about arts and culture in Singapore, such as respondents’ attendance at or participation in arts and cultural activities and their reasons for attending or not attending such activities. ARTistique has conducted approximately 5 surveys over the last 10 years, with a total of around 5,000 respondents. It has currently provided us with a sample extract of the dataset that excludes demographic information. The extract consists of 458 variables and 2,041 data points.


For our proposed scope, the relevant variables include:

1. Dependent variables

  • Number of events physically attended in the past 12 months (numeric - integer)
  • Of those events attended, how many did you pay an entry fee for (numeric - integer)
  • How regularly did you attend these arts and cultural events and activities (Likert scale)


2. Independent variables (including, but not limited to)

  • I find arts and cultural events/activities enjoyable (yes | no)
  • I heard positive reviews from friends/colleagues/relatives/media (yes | no)
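One way to make the variable list above concrete is a typed record for a single respondent. The sketch below is a hypothetical illustration: the class and field names are invented (they are not the survey's actual column names), yes/no answers are held as booleans rather than raw survey codes, and the Likert scale's range is not stated in the source, so it is not validated. The one check it does enforce follows from the question wording: paid events are a subset of attended events.

```python
from dataclasses import dataclass


# Hypothetical record type for one respondent, covering only the variables
# listed above. Field names are illustrative, not the survey's column names.
@dataclass(frozen=True)
class Respondent:
    # Dependent variables
    events_attended: int        # events physically attended in the past 12 months
    events_paid: int            # of those attended, how many required an entry fee
    attendance_regularity: int  # Likert-scale code; range not given, so not checked
    # Independent variables (yes/no answers stored as booleans)
    finds_arts_enjoyable: bool
    heard_positive_reviews: bool

    def __post_init__(self) -> None:
        # Event counts cannot be negative.
        if self.events_attended < 0 or self.events_paid < 0:
            raise ValueError("event counts must be non-negative")
        # Paid events are a subset of attended events.
        if self.events_paid > self.events_attended:
            raise ValueError("events_paid cannot exceed events_attended")
```

For example, `Respondent(4, 2, 3, True, False)` is accepted, while `Respondent(1, 2, 3, True, True)` raises `ValueError` because it reports more paid events than attended ones. Catching such inconsistencies early is useful before the counts are used as dependent variables.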


Most of the variables are binary, coded as 1 or 2 (1 = yes, 2 = no). Some variables take values from a numeric range, while open-ended questions allow free-text (string) responses. A quick analysis of the data shows that most open-ended questions were left unanswered. Because these variables carry little information, they can be removed, which reduces the dimensionality of the data.
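The two cleaning steps described above (decoding the 1/2 yes/no coding and dropping mostly unanswered open-ended columns) can be sketched as follows. This is a minimal illustration, assuming the extract is read as rows of strings, for example with Python's `csv.DictReader`. The function names, the 5% answered-share threshold, and the column names in the demo are all hypothetical; the source does not say how "mostly unanswered" was measured.

```python
import csv
import io
from typing import Dict, List, Optional

Value = Optional[str]  # raw cell from the extract; None or blank means unanswered


def decode_yes_no(code: Value) -> Optional[bool]:
    """Map the survey's binary coding (1 = yes, 2 = no) to True/False.

    Blank or unexpected codes become None so they are treated as missing
    rather than silently miscoded.
    """
    if code is None:
        return None
    code = code.strip()
    if code == "1":
        return True
    if code == "2":
        return False
    return None


def sparse_columns(rows: List[Dict[str, Value]],
                   max_answered: float = 0.05) -> List[str]:
    """Return columns whose share of non-blank answers is at most max_answered.

    The 0.05 default is a hypothetical threshold. Column names are taken
    from the first row, so all rows are assumed to share the same columns.
    """
    if not rows:
        return []
    result = []
    for col in rows[0].keys():
        answered = sum(1 for row in rows if (row.get(col) or "").strip())
        if answered / len(rows) <= max_answered:
            result.append(col)
    return result


def drop_columns(rows: List[Dict[str, Value]],
                 columns: List[str]) -> List[Dict[str, Value]]:
    """Return copies of the rows without the given columns."""
    drop = set(columns)
    return [{k: v for k, v in row.items() if k not in drop} for row in rows]


if __name__ == "__main__":
    # Tiny made-up extract: two yes/no columns and one empty open-ended column.
    sample = io.StringIO("enjoyable,reviews,other_reason\n1,2,\n2,1,\n1,1,\n")
    rows = list(csv.DictReader(sample))
    print(sparse_columns(rows))  # ['other_reason']
    print([decode_yes_no(r["enjoyable"]) for r in rows])  # [True, False, True]
```

Mapping unexpected codes to `None` instead of `False` keeps non-responses distinct from genuine "no" answers, which matters if the binary variables are later used as predictors.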