Difference between revisions of "ISSS608 2016-17 T1 Assign2 Li Nanxun"
Revision as of 12:43, 26 September 2016
Abstract
There is a rough relationship between the academic level of university faculty (observed via PhD and Position) and their view of Wikipedia: the deeper their domain expertise, the lower the scores they give. However, the most important factor affecting the scores is whether the respondent is a registered Wikipedia user. This is largely driven by the items related to behaviour and habit, such as contribution and Wikipedia usage. A likely explanation is that some functions are only available to registered users (raising the behaviour scores), that registering usually signals a positive view of Wikipedia and a wish to contribute to it (raising the subjective scores), and that after contributing, users also rate the quality higher.
Interesting findings are:
- The highest scores are in the item section SA (Sharing Attitude), which relates less to Wikipedia itself than to the respondents’ own value system. Fortunately, most university faculty think it is important to share through media other than journals and books. However, the scores related to contributing online material (Exp4, Pf1 and Pf3) are all relatively low.
- There is only one clearly decreasing line in the graph (Avg. Peu1), which means registered users rate Wikipedia’s user-friendliness lower than unregistered users do. This may be because Wikipedia pages are hard to edit, and editing is the main function used mostly by registered users. Even so, the Peu1 score is still very high.
- The Law & Politics domain has relatively low scores in almost all score sections. This may be due to the nature of the domain: it is hard to learn and write about, so the quality can hardly be improved by people from other domains.
Motivation and Problems
Wikipedia, a leading free online encyclopedia, has helped a great many people and enjoys a high reputation. From a student’s perspective, the quality of its pages is quite good. But because it is an open platform that anyone can edit, academic accuracy and quality may be a concern for readers who lack the specialized domain knowledge to spot mistakes or incorrect descriptions. So let’s look at survey data from university faculty to find out what they think about Wikipedia in their domains, and how Wikipedia could be made better.
Problems addressed:
- Is the academic quality of Wikipages an existing concern?
- What is the pattern of the scores given by university faculty? What factors are involved (e.g. position, experience, PhD or not)?
- How to make Wikipedia better (Recommendations)?
Approaches
The data was obtained from https://archive.ics.uci.edu/ml/datasets/wiki4HE
It contained the following information that can be utilized:
- Age - The age of the faculty member.
- Gender - Male or Female.
- Yearsexp - The length of university teaching experience.
- PhD - Whether the faculty member holds a PhD.
- Domain - The academic domains of the faculty.
- Position - the academic position of faculty (i.e. Professor, Associate Professor).
From the above data, extra fields were derived:
- Position - Created by combining the position information into a single field.
- Total average - Derived by averaging all the scores. Since a higher score means better in all categories, we can use it as the overall score of Wikipedia.
Type of chart used: Treemap, Slopegraph, Box Plot.
Interactive Data Visualization
Here is the link to the Tableau Public display. You can view my work and explore the data yourself!
Tools Utilized
The tools used in this report are:
- Tableau 10.0 (for data analysis)
- JMP 12.0 (for data preparation)
- Excel 2016 (for data preparation)
Data Preparation
Import Data into Excel
After downloading the data, we notice that it does not display correctly in Excel, so we need to change the original file before loading it into Excel and checking its patterns.
Opening the original data in Notepad shows that all fields are separated by “;”, a delimiter that Excel does not seem to recognize automatically. The solution is to replace every “;” with a tab: type a tab and copy it, open Replace with “Ctrl” + “H”, put “;” in “Find what” and the copied tab in “Replace with”, click “Replace all”, then save the document and open it in Excel. Everything is now in good order.
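The same delimiter fix can also be scripted instead of done by hand in Notepad. The following is only a minimal sketch in Python, not part of the original workflow; the two-row sample text and the function names are made up for illustration.

```python
import csv
import io

# Hypothetical two-row sample in the same ";"-separated layout as the raw file.
raw_text = "AGE;GENDER;PhD;PU1\n40;0;1;4\n35;1;0;?\n"

def read_semicolon_rows(text):
    """Parse ';'-separated text into a list of dicts keyed by column name."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return list(reader)

def to_tab_separated(text):
    """Rewrite ';'-separated text as tab-separated text (the manual Replace step)."""
    rows = csv.reader(io.StringIO(text), delimiter=";")
    out = io.StringIO()
    csv.writer(out, delimiter="\t", lineterminator="\n").writerows(rows)
    return out.getvalue()

rows = read_semicolon_rows(raw_text)
tabbed = to_tab_separated(raw_text)
```

Reading the file with an explicit delimiter avoids editing the raw file at all, while `to_tab_separated` mirrors the Find and Replace approach for anyone who still wants to open the result in Excel.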
Create New Score Factors
To compare the section scores, we need to create new columns that calculate each section score (e.g. PU = average(Pu1, Pu2, Pu3)) and the Total Average. For each section:
1. Create a new column and put the section name in its first row.
2. Use the AVERAGE function to calculate the section score.
Of course, we could do this in Tableau or JMP, but it is much more time-consuming because every formula has to be written out by hand.
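For readers who prefer a script to spreadsheet formulas, the section averages can be sketched in Python as below. This is an illustration only: it lists just two sections, the column names follow the Data Dictionary, and the total is assumed to be the mean over all individual items rather than over the section averages.

```python
from statistics import mean

# Item-to-section grouping taken from the Data Dictionary; only two sections shown.
SECTIONS = {
    "PU": ["PU1", "PU2", "PU3"],
    "ENJ": ["ENJ1", "ENJ2"],
}

def add_section_scores(row):
    """Return a copy of row with one average per section plus a total average."""
    scored = dict(row)
    all_values = []
    for section, items in SECTIONS.items():
        values = [float(row[item]) for item in items]
        scored[section] = mean(values)
        all_values.extend(values)
    # Assumption: the total average is taken over all individual items.
    scored["TOTAL_AVG"] = mean(all_values)
    return scored

example = add_section_scores({"PU1": 4, "PU2": 3, "PU3": 5, "ENJ1": 2, "ENJ2": 4})
```

Adding the remaining sections only requires extending `SECTIONS`, which is the part that is tedious to repeat as separate formulas.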
Correct Wrong Column Names
The OTHERSTATUS and OTHER_POSITION column names are swapped. We need to fix this by exchanging the two column headers.
Missing Data
When checking the data, we find a lot of “?” entries, which denote missing values. We can use JMP to replace every “?” with a blank.
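Outside JMP, the same replacement is a one-line mapping. The sketch below is in Python and uses a made-up row; it turns the “?” placeholder into `None` so that missing values are explicit.

```python
MISSING = "?"

def blank_missing(row):
    """Replace the '?' placeholder with None so missing values are explicit."""
    return {key: (None if value == MISSING else value) for key, value in row.items()}

# Hypothetical row for illustration.
cleaned = blank_missing({"AGE": "40", "PU1": "?", "OTHER_POSITION": "?"})
```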
After that, we should delete all the rows with missing values. Before doing so, however, we should handle the position columns first, since many of the missing values in those columns are legitimate.
Create Position
I assume that a position means the same regardless of which university it is held at, so I take each respondent’s position to be the higher of “UOC_POSITION” and “OTHER_POSITION”.
According to the data description, a higher position has a smaller number. We can therefore create a new column “Position” with a formula that takes the smaller of the two codes.
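The formula itself is not reproduced in the text, so the following Python sketch is only my reading of the rule described above: take the numerically smaller code, and fall back to whichever code is present when the other is missing. The function name and the use of `None` for missing values are assumptions.

```python
def combined_position(uoc_position, other_position):
    """Pick the more senior (numerically smaller) of two position codes.

    Either argument may be None when that position is missing.
    Returns None when both are missing.
    """
    known = [p for p in (uoc_position, other_position) if p is not None]
    return min(known) if known else None

# 1=Professor ... 6=Adjunct, as listed in the Data Dictionary.
both = combined_position(3, 1)
only_other = combined_position(None, 4)
neither = combined_position(None, None)
```

Handling the missing case here is what lets the position columns be resolved before rows with missing scores are deleted.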
Then clear the formula from the new column and delete the columns “University”, “UOC_POSITION”, “OTHERSTATUS” and “OTHER_POSITION”, which are no longer needed in this analysis.
Change Labels
According to the data description, the data creators used numbers to encode many categorical variables (e.g. for PhD, 1 means Yes and 0 means No). We should therefore change the categorical variables back to their original meanings so that the following analysis is easier and more user-friendly.
Here is how to do it in JMP: “Cols” > “Utilities” > “Recode”. The picture shows an example of how to recode one of the columns.
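As a rough equivalent of the Recode step, the mapping can be written as a lookup table. This Python sketch copies the codes from the Data Dictionary; the function name and the sample row are illustrative only.

```python
# Code-to-label mappings copied from the Data Dictionary section.
LABELS = {
    "GENDER": {"0": "Male", "1": "Female"},
    "PhD": {"0": "No", "1": "Yes"},
    "USERWIKI": {"0": "No", "1": "Yes"},
    "DOMAIN": {
        "1": "Arts & Humanities",
        "2": "Sciences",
        "3": "Health Sciences",
        "4": "Engineering & Architecture",
        "5": "Law & Politics",
    },
}

def recode(row):
    """Replace numeric codes with labels; unmapped columns and values pass through."""
    return {key: LABELS.get(key, {}).get(value, value) for key, value in row.items()}

# Hypothetical row for illustration.
labelled = recode({"GENDER": "1", "PhD": "0", "DOMAIN": "5", "AGE": "40"})
```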
Delete Rows With Missing Data
OK, now it is time to get rid of the missing values. For missing values in the score sections, I decided to delete every row that has at least one missing value.
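This listwise deletion can be sketched in Python as follows. The column list is a hypothetical subset of the score items, and a value counts as missing if it is `None`, an empty string or the original “?” placeholder.

```python
# Hypothetical subset of score columns; the real list would cover every survey item.
SCORE_COLUMNS = ["PU1", "PU2", "PU3"]

def drop_incomplete(rows, columns):
    """Keep only the rows that have a value in every one of the given columns."""
    missing_markers = (None, "", "?")
    return [
        row for row in rows
        if all(row.get(col) not in missing_markers for col in columns)
    ]

sample_rows = [
    {"PU1": "4", "PU2": "3", "PU3": "5"},
    {"PU1": "4", "PU2": None, "PU3": "5"},
    {"PU1": "?", "PU2": "2", "PU3": "1"},
]
complete = drop_incomplete(sample_rows, SCORE_COLUMNS)
```

Restricting the check to the score columns keeps rows whose only gaps are in fields that were handled earlier, such as the position columns.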
The data preparation is now done. Export the file in .xlsx format for further analysis with other data analytics software.
Data Analysis
With the questions kept in mind, let's explore the data now!
Because we want to see the relationships among more than three categorical dimensions, one of the best ways to start is a treemap, which gives us a feel for the data.
Treemap
Gender-Domain-PhD
- According to the split in the treemap, regardless of domain or gender (except in the Health Science domain), those who hold a PhD gave Wikipedia a lower total average score. In other words, those who have reached the highest academic degree do not admire Wikipedia as much as others do. A possible explanation is that their deeper domain knowledge lets them spot the weaknesses of Wikipedia pages contributed by other people.
- The pair with the biggest difference is the Male Health Science section, which is the exception noted above. The differences between the pairs in the Female section are smaller than those of the corresponding pairs in the Male section.
Gender–Position–Registered
- We find a pattern similar to the previous treemap: regardless of position or gender, registered users gave Wikipedia a higher total average score. A possible explanation is that registered users presumably like using Wikipedia (otherwise they would not have registered), and that they have contributed to Wikipedia in their own domains, so they perceive the quality of its pages as high.
- The sections’ average scores are also related to position. Given what the positions mean, it seems that the higher the position, the lower the score. The likely reason is the same as before, since position also reflects mastery of domain knowledge.
Slopegraph
Because we have identified one important factor that strongly affects the scores (Userwiki, i.e. whether the faculty member is a registered Wikipedia user), and the question of how academic level affects the scores is still open, I decided to use two similar slopegraphs to get a deeper and more detailed view.
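The two endpoints of each slopegraph line are the average score of one item in each group, which Tableau computes for us. As a rough illustration of the same calculation, here is a Python sketch; the sample rows and function names are made up and are not results from the survey.

```python
from collections import defaultdict
from statistics import mean

def slope_endpoints(rows, group_col, item_cols):
    """For each item, return {group value: mean score} across all rows."""
    buckets = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for item in item_cols:
            buckets[item][row[group_col]].append(float(row[item]))
    return {
        item: {group: mean(values) for group, values in groups.items()}
        for item, groups in buckets.items()
    }

def slope(endpoints, item, start, end):
    """Change in an item's mean score from the start group to the end group."""
    return endpoints[item][end] - endpoints[item][start]

# Hypothetical rows for illustration only.
sample = [
    {"USERWIKI": "No", "PEU1": 5, "EXP4": 1},
    {"USERWIKI": "No", "PEU1": 4, "EXP4": 1},
    {"USERWIKI": "Yes", "PEU1": 4, "EXP4": 3},
]
endpoints = slope_endpoints(sample, "USERWIKI", ["PEU1", "EXP4"])
peu1_change = slope(endpoints, "PEU1", "No", "Yes")
```

A negative value from `slope` corresponds to a decreasing line in the graph, and a positive value to an increasing one.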
Slopegraph: How scores change from Userwiki=No to =Yes
Main Findings
- The highest scores are in the item section SA (Sharing Attitude), which relates less to Wikipedia itself than to the respondents’ own value system. Fortunately, they all think it is important to share through media other than journals and books.
- The only clearly decreasing line in the graph is Avg. Peu1, which means registered users find Wikipedia less user-friendly than unregistered users do. This may be because Wikipedia pages are hard to edit, and editing is the main function used mostly by registered users. Even so, the Peu1 score is still very high.
- The items at the bottom (Pf1, Use1, Vis3, Use2, Exp4) are mostly about using Wikipedia functions that require registration, so the reason is quite straightforward. However, Vis3, which means the user cites Wikipedia in academic papers, also increases a lot when Userwiki changes from No to Yes.
Slopegraph: How scores change from PhD=No to =Yes
Main Findings
- The obvious drops are in JR and Vis3. JR is Job Relevance, so PhD holders see Wikipedia as less relevant to their jobs. PhD holders also cite Wikipedia less, as expected, since its quality is not assured.
- However, Pf3, which means publishing academic content on open platforms, increases slightly, which is good.
SlopeGraph: Domain – Section Scores
Main Findings
- The Law & Politics domain has relatively low scores in almost all sections. This may be due to the nature of the domain: it is hard to learn and write about, so the quality can hardly be improved by people from other domains.
- The Use and Pf scores are the lowest. Use means use behaviour and Pf means participation in open platforms. This means Wikipedia still has room to improve, particularly in getting these expert users more involved.
- The highest score is in the SA section, as discussed above.
Finding Summary
General observations
The entire resale public housing market is relatively stable with price dropping slightly and volume rebounding from 2013's dive.
For Supply:
- Most of the transactions are located in the new towns far away from the CBD, such as Jurong West, Tampines and Woodlands.
- On the other hand, the three least traded regions are Bukit Timah, Marine Parade and Central Area, which are all located in the centre of the country.
- Within one region, the most traded flat type is normally “4 ROOM” for highly transacted regions and “3 ROOM” for less transacted regions. “1 ROOM” and “MULTI-GENERATION” flats are traded so rarely that we can hardly recognize them in the chart above, and “2 ROOM” flats are also traded much less than the other types.
- The Supply Volume experienced a big drop in 2013 and rebounded in 2015.
- The volume distribution across flat types is very stable, and the volume ranking never changes.
- In terms of total supply each year, the most traded type is “4 ROOM”, followed by “3 ROOM” and “5 ROOM”. These three flat types account for more than 90% of the market.
- The transaction patterns of the major flat types do not change much; they all fluctuate slightly.
- For the most transacted type, “4 ROOM”, the share of the total is on an uptrend. In contrast, the second most transacted type, “3 ROOM”, has been dropping slightly since 2014.
- The fluctuation shows that the supply has a seasonal pattern: normally there is a peak in October and another around April and May. There is no apparent difference between flat types.
For Price:
- The trading price per m2 largely follows a normal distribution with a fat right tail. The average price in 2015 is S$4817/m2, which is pulled upward by the fat right tail.
- The resale prices are affected by human preference: the most frequently traded prices are multiples of 100, especially multiples of 500.
- The price distribution patterns of 2015 and 2016 1H are very similar.
- The more rooms a flat has, the lower its average price per m2, which is consistent with economies of scale: more rooms mean a bigger floor area, which makes the building cost and other costs allocated per m2 cheaper.
- The three most transacted types in the market are “4 ROOM”, “3 ROOM” and “5 ROOM” respectively, but their price ranking (low to high) is not consistent with the transaction volume ranking, which means the most economical type is not always the most traded. Other factors, such as people’s usage needs and the market supply pattern, also affect property preferences, which is very reasonable in real-life property transactions.
- The town price ranking is relatively stable, and prices in the central areas are higher than those in the remote new towns, which is consistent with common sense.
- Starting from 2015, the price per m2 in the expensive areas (especially Central Area) goes even higher, while the price per m2 in the economical areas goes lower. To sum up, the high get higher and the low get lower, and the rise at the high end is more pronounced than the drop at the low end.
- From 2012 to 2016 1H, the central area’s price changed much more dramatically than that of other areas, with a big jump in 2015 followed by a slight drop in the first half of 2016.
- The transaction prices are dropping during the years.
- The average prices of each type are very stable during the years when compared with the average prices of the respective years.
Data Dictionary
- AGE: numeric
- GENDER: 0=Male; 1=Female
- DOMAIN: 1=Arts & Humanities; 2=Sciences; 3=Health Sciences; 4=Engineering & Architecture; 5=Law & Politics
- PhD: 0=No; 1=Yes
- YEARSEXP (years of university teaching experience): numeric
- UNIVERSITY: 1=UOC; 2=UPF
- UOC_POSITION (academic position of UOC members): 1=Professor; 2=Associate; 3=Assistant; 4=Lecturer; 5=Instructor; 6=Adjunct
- OTHER (main job in another university for part-time members): 1=Yes; 2=No
- OTHER_POSITION (work as part-time in another university and UPF members): 1=Professor; 2=Associate; 3=Assistant; 4=Lecturer; 5=Instructor; 6=Adjunct
- USERWIKI (Wikipedia registered user): 0=No; 1=Yes
The following survey items are Likert scale (1-5) ranging from strongly disagree / never (1) to strongly agree / always (5)
Perceived Usefulness
- PU1: The use of Wikipedia makes it easier for students to develop new skills
- PU2: The use of Wikipedia improves students' learning
- PU3: Wikipedia is useful for teaching
Perceived Ease of Use
- PEU1: Wikipedia is user-friendly
- PEU2: It is easy to find in Wikipedia the information you seek
- PEU3: It is easy to add or edit information in Wikipedia
Perceived Enjoyment
- ENJ1: The use of Wikipedia stimulates curiosity
- ENJ2: The use of Wikipedia is entertaining
Quality
- QU1: Articles in Wikipedia are reliable
- QU2: Articles in Wikipedia are updated
- QU3: Articles in Wikipedia are comprehensive
- QU4: In my area of expertise, Wikipedia has a lower quality than other educational resources
- QU5: I trust in the editing system of Wikipedia
Visibility
- VIS1: Wikipedia improves visibility of students' work
- VIS2: It is easy to have a record of the contributions made in Wikipedia
- VIS3: I cite Wikipedia in my academic papers
Social Image
- IM1: The use of Wikipedia is well considered among colleagues
- IM2: In academia, sharing open educational resources is appreciated
- IM3: My colleagues use Wikipedia
Sharing attitude
- SA1: It is important to share academic content in open platforms
- SA2: It is important to publish research results in other media than academic journals or books
- SA3: It is important that students become familiar with online collaborative environments
Use behaviour
- USE1: I use Wikipedia to develop my teaching materials
- USE2: I use Wikipedia as a platform to develop educational activities with students
- USE3: I recommend my students to use Wikipedia
- USE4: I recommend my colleagues to use Wikipedia
- USE5: I agree my students use Wikipedia in my courses
Profile 2.0
- PF1: I contribute to blogs
- PF2: I actively participate in social networks
- PF3: I publish academic content in open platforms
Job relevance
- JR1: My university promotes the use of open collaborative environments in the Internet
- JR2: My university considers the use of open collaborative environments in the Internet as a teaching merit
Behavioral intention
- BI1: In the future I will recommend the use of Wikipedia to my colleagues and students
- BI2: In the future I will use Wikipedia in my teaching activity
Incentives
- INC1: To design educational activities using Wikipedia, it would be helpful: a best practices guide
- INC2: To design educational activities using Wikipedia, it would be helpful: getting instruction from a colleague
- INC3: To design educational activities using Wikipedia, it would be helpful: getting specific training
- INC4: To design educational activities using Wikipedia, it would be helpful: greater institutional recognition
Experience
- EXP1: I consult Wikipedia for issues related to my field of expertise
- EXP2: I consult Wikipedia for other academic related issues
- EXP3: I consult Wikipedia for personal issues
- EXP4: I contribute to Wikipedia (editions, revisions, articles improvement...)
- EXP5: I use wikis to work with my students