ANLY482 AY2016-17 T2 Group10 Project Overview: Methodology
Revision as of 18:01, 21 February 2017
Data Preparation
Data preparation involves cleaning, transformation and integration. These are standard procedures for reconciling datasets that differ in format, contain data-entry errors, and are recorded at different levels of granularity. We will first examine each data file and determine the best way to standardize its format, then aggregate the more granular data so that the datasets can be integrated.
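As an illustration of the standardize-then-aggregate approach described above, here is a minimal Python sketch. It is not the project's actual pipeline: the field names (`customer_id`, `invoice_date`, `amount`), the two date formats in `DATE_FORMATS`, and the choice of customer-month as the target granularity are all hypothetical assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical: date formats assumed to appear across the source files.
# Slash-separated dates are assumed to be day-first.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def to_month(raw_date):
    """Parse a date string in any of DATE_FORMATS and return it as 'YYYY-MM'."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw_date.strip(), fmt).strftime("%Y-%m")
        except ValueError:
            continue  # try the next candidate format
    raise ValueError(f"Unrecognized date format: {raw_date!r}")


def aggregate_monthly(lines):
    """Sum line-level amounts per (customer_id, month) so the result can be
    joined to a dataset recorded at monthly granularity."""
    totals = defaultdict(float)
    for line in lines:
        key = (line["customer_id"], to_month(line["invoice_date"]))
        totals[key] += line["amount"]
    return dict(totals)


# Hypothetical invoice lines with mixed date formats.
invoice_lines = [
    {"customer_id": "C001", "invoice_date": "2016-01-05", "amount": 120.0},
    {"customer_id": "C001", "invoice_date": "20/01/2016", "amount": 80.0},
    {"customer_id": "C002", "invoice_date": "2016-02-03", "amount": 50.0},
]

monthly = aggregate_monthly(invoice_lines)
print(monthly)
```

Standardizing the dates before grouping matters: if the raw strings were grouped directly, the two January invoices for `C001` would land under different keys and the totals would not line up with a monthly dataset.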
MCCP
Invoice Details
Data Cleaning
A brief scan of the entire Invoice Details table revealed three main areas that need cleaning:
- Missing values in the Price$ column
- Negative values in the Sales Qty and Amount$ columns
- Some postal codes with only 5 digits, because their leading 0 was dropped
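The page does not yet say how each of these three issues will be resolved, so the following Python sketch shows only one possible treatment under stated assumptions. It assumes that a missing Price$ can be recovered as Amount$ / Sales Qty, that negative rows should be flagged (via a hypothetical `is_negative` column) rather than deleted because they may be returns or credit notes, and that valid postal codes are 6 digits long, as the 5-digit cases imply. The function name `clean_row` and the sample rows are hypothetical.

```python
def clean_row(row):
    """Return a cleaned copy of one Invoice Details row (input is not modified)."""
    cleaned = dict(row)

    # 1. Missing Price$: assumed recoverable as Amount$ / Sales Qty.
    #    Left missing when Sales Qty is 0, since no price can be derived.
    if cleaned.get("Price$") in (None, "") and cleaned["Sales Qty"] != 0:
        cleaned["Price$"] = round(cleaned["Amount$"] / cleaned["Sales Qty"], 2)

    # 2. Negative Sales Qty / Amount$: flagged in a hypothetical is_negative
    #    column instead of being dropped (they may be returns or credit notes).
    cleaned["is_negative"] = cleaned["Sales Qty"] < 0 or cleaned["Amount$"] < 0

    # 3. Postal Code: restore the leading 0 lost when the code was stored as a
    #    number (int, float or string), assuming valid codes have 6 digits.
    cleaned["Postal Code"] = str(int(float(cleaned["Postal Code"]))).zfill(6)

    return cleaned


# Hypothetical sample rows covering the three issues.
rows = [
    {"Sales Qty": 2, "Amount$": 30.0, "Price$": None, "Postal Code": 18956},
    {"Sales Qty": -1, "Amount$": -15.0, "Price$": 15.0, "Postal Code": "529510"},
    {"Sales Qty": 0, "Amount$": 0.0, "Price$": "", "Postal Code": 18956.0},
]
cleaned_rows = [clean_row(r) for r in rows]
for r in cleaned_rows:
    print(r)
```

Flagging instead of deleting keeps the negative rows available in case they turn out to be legitimate returns; they can still be excluded later with a single filter on `is_negative`.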