Group14 Project Findings

From Analytics Practicum
Revision as of 13:18, 16 March 2017 by Gaurib.2013 (talk | contribs)

Data

We have received the raw data from our sponsor, comprising the following files:

  • Collection_Dataset_FY13 and FY14.xlsx
  • Patron_Headers.csv
  • Patron_Dataset_FY13.csv & Patron_Dataset_FY14.csv
  • TXN_Headers.csv
  • TXN_FY13.csv & TXN_FY14.csv

Summary of Data Cleaning

1. Both the TXN_Y13 & TXN_Y14 datasets cover a 12-month transaction timeline, running from April of one year to April of the next.

2. In both the TXN_Y13 & TXN_Y14 datasets, some rows have the Patron UID set to “0”, whereas neither the Patron_Dataset_FY13 nor the Patron_Dataset_FY14 dataset contains a Patron with UID “0”. On exploring the data, we found 1,335 such rows in TXN_Y13 and 65 in TXN_Y14.

PatronUID_0
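The check in point 2 could be reproduced along the following lines. This is only a minimal sketch in plain Python: it assumes the rows have been loaded as dictionaries (for example with `csv.DictReader`) and that the UID column is called `PatronUID`, which is a hypothetical name; the real headers are in TXN_Headers.csv and Patron_Headers.csv. The sample rows are made up for illustration.

```python
def count_orphan_uid(txn_rows, patron_rows, uid="0", column="PatronUID"):
    """Count transaction rows carrying `uid` and report whether any patron has it."""
    txn_count = sum(1 for row in txn_rows if row[column] == uid)
    in_patrons = any(row[column] == uid for row in patron_rows)
    return txn_count, in_patrons


# Made-up sample rows, for illustration only (not taken from the NLB data).
sample_txn = [{"PatronUID": "0"}, {"PatronUID": "A17"}, {"PatronUID": "0"}]
sample_patrons = [{"PatronUID": "A17"}, {"PatronUID": "B42"}]

orphan_count, uid_known = count_orphan_uid(sample_txn, sample_patrons)
print(orphan_count, uid_known)
```

A non-zero count together with `uid_known` being `False` corresponds to the situation described above: transactions that cannot be linked to any Patron record.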



3. In the TXN_Y13 dataset, the Patron Borrower Category Code includes values such as “DEAR”, “DEARC”, “DEARS” & “Deceased”. From the previous team’s data cleaning, we learned that “DEAR” refers to institutional partnership programmes, and NLB suggested removing all records with this code from further analysis. The other codes, “DEARC”, “DEARS” & “Deceased”, were not mentioned, so for the time being our team has decided to remove those records as well. The frequencies of these codes are shown below for the TXN_Y13 & TXN_Y14 datasets respectively.

DEAR
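The removal described in point 3 could be sketched as follows. This is an illustrative sketch rather than the team's actual procedure: the column name `PatronBorrowerCategoryCode` and the code "ADULT" in the sample rows are assumptions, and the rows are made up.

```python
from collections import Counter

# "DEAR" is removed on NLB's suggestion; the other three are the team's interim decision.
EXCLUDED_CODES = {"DEAR", "DEARC", "DEARS", "Deceased"}


def split_by_category(rows, column="PatronBorrowerCategoryCode"):
    """Return the rows to keep and a frequency count of the removed codes."""
    kept = [row for row in rows if row[column] not in EXCLUDED_CODES]
    removed = Counter(row[column] for row in rows if row[column] in EXCLUDED_CODES)
    return kept, removed


# Made-up sample rows; "ADULT" is a hypothetical category code.
sample_txn = [
    {"PatronBorrowerCategoryCode": "ADULT"},
    {"PatronBorrowerCategoryCode": "DEAR"},
    {"PatronBorrowerCategoryCode": "DEARC"},
    {"PatronBorrowerCategoryCode": "DEAR"},
]

kept_rows, removed_counts = split_by_category(sample_txn)
print(len(kept_rows), dict(removed_counts))
```

Returning the removed frequencies alongside the kept rows makes it easy to compare against a frequency table such as the one above before the records are dropped.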



4. In the TXN_Y13 Transaction Dataset, 3 records have Branch Code set to “Bad Value” and 3,065 records set to “Missing Value”. In the TXN_Y14 Transaction Dataset, 9,467 records have Branch Code set to “Bad Value” and 1,600 records set to “Missing Value”.
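Counts like those in point 4 could be obtained with a simple tally. The sketch below assumes the branch column is named `BranchCode` (a hypothetical name) and uses made-up rows with a made-up branch code "AAA".

```python
from collections import Counter

FLAGS = ("Bad Value", "Missing Value")


def tally_flagged_branches(rows, column="BranchCode"):
    """Count how many rows carry each of the flagged Branch Code values."""
    counts = Counter(row[column] for row in rows)
    return {flag: counts[flag] for flag in FLAGS}


# Made-up sample rows, for illustration only.
sample_txn = [
    {"BranchCode": "AAA"},
    {"BranchCode": "Bad Value"},
    {"BranchCode": "Missing Value"},
    {"BranchCode": "Missing Value"},
]

flag_counts = tally_flagged_branches(sample_txn)
print(flag_counts)
```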

5. The updated Collection dataset confirmed that NLB currently has 26 active libraries. Transactions in both the TXN_Y13 & TXN_Y14 datasets that did not belong to these 26 branch codes were therefore removed, for example:

a. 07LKCRL
b. 08LKCRL
c. 11LKCRL
d. ORN
e. RN
f. RU
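The branch filter in point 5 amounts to keeping only transactions whose code is on an allow-list. The sketch below is an assumption-laden illustration: the column name `BranchCode` is hypothetical, and the set of active codes is a placeholder, since the actual 26 codes come from the updated Collection dataset and are not listed on this page.

```python
# Placeholder allow-list: "AAA" and "BBB" are hypothetical codes standing in
# for the 26 active branch codes from the updated Collection dataset.
ACTIVE_BRANCH_CODES = {"AAA", "BBB"}


def keep_active_branches(rows, active_codes, column="BranchCode"):
    """Keep only the rows whose branch code is in the allow-list."""
    return [row for row in rows if row[column] in active_codes]


# Made-up sample rows mixing placeholder codes with codes removed above.
sample_txn = [
    {"BranchCode": "AAA"},
    {"BranchCode": "07LKCRL"},
    {"BranchCode": "RU"},
    {"BranchCode": "BBB"},
]

active_txn = keep_active_branches(sample_txn, ACTIVE_BRANCH_CODES)
print([row["BranchCode"] for row in active_txn])
```

Filtering against an allow-list rather than a list of codes to drop means that any value outside the active set, including “Bad Value” and “Missing Value”, is excluded in the same pass.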


6. Removed any transactions in both the TXN_Y13 & TXN_Y14 datasets where the Patron’s birth year was recorded as “1900”, “2015” or “2016”, as these values had been set to the year in which the institutional programme was set up.

Invalid BirthYear
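The birth-year rule in point 6 could be applied as shown below. This is a minimal sketch assuming a column named `PatronBirthYear` (a hypothetical name) whose values may arrive as either strings or integers; the sample rows are made up.

```python
# Birth-year values treated as invalid in point 6.
INVALID_BIRTH_YEARS = {"1900", "2015", "2016"}


def drop_invalid_birth_years(rows, column="PatronBirthYear"):
    """Remove rows whose birth year is one of the invalid placeholder values."""
    return [
        row for row in rows
        if str(row[column]).strip() not in INVALID_BIRTH_YEARS
    ]


# Made-up sample rows; the integer 2016 shows that mixed types are handled.
sample_txn = [
    {"PatronBirthYear": "1985"},
    {"PatronBirthYear": "1900"},
    {"PatronBirthYear": 2016},
    {"PatronBirthYear": "2001"},
]

valid_txn = drop_invalid_birth_years(sample_txn)
print([row["PatronBirthYear"] for row in valid_txn])
```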