Qui Vivra Verra - Data Exploration

From Analytics Practicum
Revision as of 23:42, 10 October 2016 by Cxpong.2013 (talk | contribs)

Summary of Anomalies & Errors

The team performed exploratory data analysis and identified the following anomalies:

  • Anomaly 1

In the 2013 Patron Dataset, 2.534% of all records have Locale Planning ADZID set to “Bad Value” or “Missing Value”. In the 2014 Patron Dataset, the figure is 2.876%. A snapshot of the anomaly is shown below.

Anomaly 1.png
  • Anomaly 2

In the 2013 Transaction Dataset, 19,828 records have Patron_UID set to ‘0’. In the 2014 Transaction Dataset, 641 records have Patron_UID set to ‘0’. A snapshot of the anomaly is shown below.

Anomaly 2.png
  • Anomaly 3

In the 2013 Transaction Dataset, 3 records have Branch Code set to “Bad Value” and 3,065 records have it set to “Missing Value”. In the 2014 Transaction Dataset, 9,467 records have Branch Code set to “Bad Value” and 1,600 records have it set to “Missing Value”. A snapshot of the anomaly is shown below.

Anomaly 3.png
  • Anomaly 4

In the 2013 Transaction Dataset, 893 patrons have an Avg. No. of Books Borrowed (an aggregated value) exceeding 32. In the 2014 Transaction Dataset, 779 patrons exceed the same threshold. Most of these records have the attribute Patron Borrower Category Code set to “DEAR”. They also have unrealistic values in the attribute Patron Birthyear, i.e. “1900”, “2015”, and “2016”, and have Patron Citizenship, Patron Race, and Patron Gender set to “Others”. A snapshot of the anomaly is shown below.

Anomaly 4.png
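The check behind Anomaly 4 can be sketched as a per-patron aggregation. This is a minimal illustration only, not the team's actual workflow: it assumes each transaction row carries a Patron_UID and a numeric field, here given the hypothetical name Books_Borrowed, and that the average is taken over a patron's rows. The sample rows are made up.

```python
from collections import defaultdict


def patrons_over_threshold(rows, threshold=32):
    """Return {patron_uid: average} for patrons whose average exceeds threshold.

    Assumes every row is a dict with "Patron_UID" and a numeric
    "Books_Borrowed" (hypothetical column name).
    """
    totals = defaultdict(lambda: [0, 0])  # patron -> [sum, row count]
    for row in rows:
        entry = totals[row["Patron_UID"]]
        entry[0] += row["Books_Borrowed"]
        entry[1] += 1
    return {
        uid: total / count
        for uid, (total, count) in totals.items()
        if total / count > threshold
    }


# Made-up sample rows, for illustration only.
sample = [
    {"Patron_UID": "A1", "Books_Borrowed": 40},
    {"Patron_UID": "A1", "Books_Borrowed": 30},
    {"Patron_UID": "B2", "Books_Borrowed": 4},
    {"Patron_UID": "B2", "Books_Borrowed": 6},
]
flagged = patrons_over_threshold(sample)
print(flagged)  # {'A1': 35.0}
```

The comparison is strict, matching the wording "exceeding 32", so a patron averaging exactly 32 would not be flagged.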
  • Anomaly 5

In the 2013 and 2014 Transaction Datasets, some records have Branch Codes that do not exist in Collection_Dataset_FY13 and FY14, e.g. ‘07LKCRL’ and ‘08LKCRL’. A snapshot of the anomaly is shown below.

Anomaly 5.png
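The counts and percentages reported for Anomalies 1, 2, 3 and 5 could be reproduced with a few small helpers along the following lines. This is a sketch under stated assumptions: rows are loaded as dicts keyed by column name, all values are strings, and the branch code "BR01" and the sample rows are invented for illustration.

```python
SENTINELS = {"Bad Value", "Missing Value"}


def sentinel_share(rows, column):
    """Percentage of rows whose `column` holds a sentinel value."""
    if not rows:
        return 0.0
    hits = sum(1 for row in rows if row[column] in SENTINELS)
    return 100.0 * hits / len(rows)


def count_equal(rows, column, value):
    """Number of rows whose `column` equals `value`."""
    return sum(1 for row in rows if row[column] == value)


def unknown_codes(rows, column, valid_codes):
    """Distinct non-sentinel values of `column` missing from `valid_codes`."""
    return {row[column] for row in rows} - set(valid_codes) - SENTINELS


# Made-up sample data; "BR01" is a hypothetical valid branch code.
transactions = [
    {"Patron_UID": "0", "Branch Code": "BR01"},
    {"Patron_UID": "17", "Branch Code": "Bad Value"},
    {"Patron_UID": "23", "Branch Code": "07LKCRL"},
    {"Patron_UID": "0", "Branch Code": "Missing Value"},
]
collection_codes = {"BR01"}

share = sentinel_share(transactions, "Branch Code")
zero_uids = count_equal(transactions, "Patron_UID", "0")
unknown = unknown_codes(transactions, "Branch Code", collection_codes)
print(share, zero_uids, unknown)  # 50.0 2 {'07LKCRL'}
```

The same sentinel_share helper would apply to Locale Planning ADZID in the Patron Datasets, since Anomaly 1 is expressed as a percentage of all records.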

During Sponsor Meeting 01 with the NLB, held on 06 October 2016, the team consulted the NLB Analytics Team about the anomalies above and noted the following:

  • The Patron Borrower Category Code “DEAR” does not identify a unique patron per se; it refers to institutional partnership programmes. NLB suggested removing all records with Patron Borrower Category Code set to “DEAR” from further analysis.
  • Patron Birthyear values of ‘1900’, ‘2015’ and ‘2016’ occur because the field was set to the year the institutional programme was set up.
  • NLB suggested excluding records whose Branch Codes are not listed in the Collection Dataset.
  • NLB agreed to the removal of all records that contain the anomalies described above.
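
The exclusions agreed with NLB could be applied to the Transaction Datasets roughly as follows. This is a hedged sketch, not the team's pre-processing code: it assumes each transaction row is a dict holding Patron_UID, Branch Code and Patron Borrower Category Code as strings, and the values "BR01" and "ADULT" are invented for the example.

```python
def clean_transactions(rows, valid_branch_codes):
    """Drop rows that match the anomalies discussed with NLB.

    A row is removed when its category code is "DEAR", its Patron_UID is "0",
    or its Branch Code is not in `valid_branch_codes`. The last test also
    removes "Bad Value" and "Missing Value", since neither is a valid code.
    """
    kept = []
    for row in rows:
        if row["Patron Borrower Category Code"] == "DEAR":
            continue
        if row["Patron_UID"] == "0":
            continue
        if row["Branch Code"] not in valid_branch_codes:
            continue
        kept.append(row)
    return kept


def make_row(uid, branch, category):
    """Build one sample row (helper for this example only)."""
    return {
        "Patron_UID": uid,
        "Branch Code": branch,
        "Patron Borrower Category Code": category,
    }


# Made-up rows; "BR01" and "ADULT" are hypothetical values.
rows = [
    make_row("17", "BR01", "ADULT"),
    make_row("88", "BR01", "DEAR"),
    make_row("0", "BR01", "ADULT"),
    make_row("23", "07LKCRL", "ADULT"),
    make_row("31", "Missing Value", "ADULT"),
]
cleaned = clean_transactions(rows, valid_branch_codes={"BR01"})
print([row["Patron_UID"] for row in cleaned])  # ['17']
```

Filtering against the list of valid branch codes, rather than listing the bad ones, means any other unlisted code is excluded without further changes.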

Data Pre-processing

Coming Soon!

External Data Sources

Coming Soon!