ISSS608 2016-17 T1 Assign2 CHIA Yong Jian
Dataset Chosen
The dataset chosen is the US Stocks Fundamental Data (XBRL).
Theme of Interest and Motivation
Besides allowing companies to raise equity from a pool of investors (https://www.theguardian.com/sustainable-business/stock-markets-no-longer-fit-purpose), stock markets also allow fund managers and retail investors to grow their capital through short-term or long-term investments in those companies.
For this dataset, I will explore three main questions, drilling down from the macro to the micro:
- Overview of the US Market - Understand what is "out there" for fund managers and investors to invest in, taking into account the sector and market capitalisation of each company
- Crucial financial ratios - Compare, across companies, the financial ratios that are crucial for value investing, that is, buying stocks of companies that trade for less than their intrinsic value and that show stability in their cash flows and debt servicing, in order to profit from their long-term performance (http://www.goodreads.com/book/show/75893.The_Little_Book_of_Value_Investing).
- Understanding the make-up of a firm's cash inflows and outflows by financing, investing and operating activities.
Data Sources
The following data sources are used:
- From https://www.kaggle.com/usfundamentals/us-stocks-fundamentals - (a) indicators_by_company.csv - Provides the core information of indicators as reported by companies to the U.S. Securities and Exchange Commission, (b) companies.csv - Provides the mapping of the company name to the company ID.
- From http://usfundamentals.com/ - (a) companies-names-industries.csv - Provides the NAICS industry sector (http://www.census.gov/eos/www/naics/) information for each company
- From http://www.fasb.org/jsp/FASB/Page/SectionPage&cid=1176164335312 - (a) Taxonomy_2016.xlsx - Provides some information on labels
Data Visualization Link
Discussion of Results
Rough Workings/Discussion on Process
Data Sources and Analysis
Data Challenges
- Information scattered across multiple files
- Incomplete information for each file
Original preparations performed in JMP:
- Open indicators_by_company.csv, perform a transpose
- Join to companies.csv
- Join to companies-names-industries.csv. Some records have missing NAICS industry names; these gaps are inherent in the source data and are recoded to "Not Available". A sample of the column names is shown below. Save to a SAS7BDAT file
- Open in Tableau
However, the above steps generated a huge export file (a few GB), which caused loading issues in Tableau, so the process was reviewed.
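For readers without JMP, the original preparation steps (transpose, two joins, recoding of missing NAICS names) can be approximated in pandas. This is only a minimal sketch on toy data, not the workflow actually used for this assignment. It assumes that "transpose" means turning year columns into rows, and every column name (company_id, indicator_id, name_latest, naics_sector) is an assumption rather than a confirmed header from the source files.

```python
import pandas as pd

# Toy stand-ins for the three CSV files. All column names are assumptions.
indicators = pd.DataFrame({
    "company_id": [1, 1, 2],
    "indicator_id": ["Assets", "Liabilities", "Assets"],
    "2014": [100.0, 40.0, 250.0],
    "2015": [120.0, 45.0, None],
})
companies = pd.DataFrame({
    "company_id": [1, 2],
    "name_latest": ["Alpha Inc", "Beta Corp"],
})
industries = pd.DataFrame({
    "company_id": [1],                # company 2 has no NAICS record
    "naics_sector": ["Manufacturing"],
})


def prepare(indicators, companies, industries):
    """Reshape the indicators and attach company and industry details."""
    # Transpose step: one row per (company, indicator, year).
    long = indicators.melt(
        id_vars=["company_id", "indicator_id"],
        var_name="year",
        value_name="value",
    )
    # Dropping empty cells keeps the reshaped table smaller.
    long = long.dropna(subset=["value"])
    # Left joins keep every indicator row even when a lookup is missing.
    merged = long.merge(companies, on="company_id", how="left")
    merged = merged.merge(industries, on="company_id", how="left")
    # Recode missing NAICS names, as in the JMP step.
    merged["naics_sector"] = merged["naics_sector"].fillna("Not Available")
    return merged


prepared = prepare(indicators, companies, industries)
print(prepared)
```

Dropping the empty cells before joining is one way to limit the size of the output, which was the problem encountered with the JMP export.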
For Tree Map:
- The files are joined in Tableau instead. The Taxonomy file is not joined, as not all labels in the original indicators_by_company file are available in the Taxonomy file.
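The coverage gap that ruled out the Taxonomy join can be checked with a left join that flags unmatched rows. The sketch below uses toy data; the column names indicator_id, name and label, and the tag CustomTagXyz, are hypothetical and would need to be replaced with the real headers from indicators_by_company.csv and Taxonomy_2016.xlsx.

```python
import pandas as pd

# Toy stand-ins; column names and values are hypothetical.
indicators = pd.DataFrame(
    {"indicator_id": ["Assets", "Liabilities", "CustomTagXyz", "Assets"]}
)
taxonomy = pd.DataFrame({
    "name": ["Assets", "Liabilities"],
    "label": ["Assets", "Liabilities"],
})


def unmatched_indicators(indicators, taxonomy):
    """Return the indicator IDs that have no entry in the taxonomy."""
    used = indicators[["indicator_id"]].drop_duplicates()
    # indicator=True adds a _merge column marking where each row was found.
    check = used.merge(
        taxonomy,
        left_on="indicator_id",
        right_on="name",
        how="left",
        indicator=True,
    )
    return sorted(check.loc[check["_merge"] == "left_only", "indicator_id"])


missing = unmatched_indicators(indicators, taxonomy)
print(missing)
```

A non-empty result means an inner join on the taxonomy would silently drop those indicators, which is consistent with the decision to leave the Taxonomy file out.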
Tools Utilised
- SAS JMP Pro 12
- Tableau 10