Network Analysis of Interlocking Directorates/Findings & Insights

From Analytics Practicum
Revision as of 22:30, 3 March 2015 by Tw.zheng.2011


Data Files

The dataset extracted from OneSource consists of two files: a list of companies in Singapore and a list of the executives of those companies.

List of Companies

The company list table contains 71,265 records, each representing a company and its relevant information. To keep the results complete, we extracted the whole dataset without removing any duplicates. After extraction, we observed that several columns had missing data. As some attributes are not critical to our project, we plan to drop those columns later. We cleaned the selected columns as described below:

Step 1: Categorizing companies

The initial extraction from OneSource assigned companies to 192 detailed industry categories. Because this level of detail would prevent us from seeing the broader picture of Singapore’s corporate environment, we grouped them into 29 broader categories based on OneSource’s Industrial Classification definitions. The result is stored in the column “Category” in the file “Company List Processed”.
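The grouping step above amounts to applying a lookup table from detailed to broad categories. The sketch below is illustrative only: the industry names in `CATEGORY_MAP` are invented placeholders (not OneSource’s actual labels), and the field names `Company` and `Industry` are assumed column names.

```python
# Hypothetical excerpt of a lookup table from detailed industries to broad
# categories; the real table would cover all 192 detailed categories.
CATEGORY_MAP = {
    "Grocery Wholesalers": "Wholesale",
    "Electronic Parts Wholesalers": "Wholesale",
    "Semiconductor Manufacturing": "Manufacturing",
    "Management Consulting": "Business and Management Services",
}


def categorize(companies, category_map, detail_key="Industry", broad_key="Category"):
    """Return copies of the company records with a broad 'Category' field added.

    Industries missing from the lookup table are labelled 'Unmapped' so they
    can be reviewed by hand instead of being silently dropped.
    """
    result = []
    for company in companies:
        row = dict(company)  # copy so the input records are not modified
        row[broad_key] = category_map.get(row.get(detail_key), "Unmapped")
        result.append(row)
    return result


companies = [
    {"Company": "A Pte Ltd", "Industry": "Grocery Wholesalers"},
    {"Company": "B Pte Ltd", "Industry": "Space Tourism"},
]
for row in categorize(companies, CATEGORY_MAP):
    print(row["Company"], "->", row["Category"])
# A Pte Ltd -> Wholesale
# B Pte Ltd -> Unmapped
```

Labelling unknown industries explicitly makes gaps in the lookup table easy to spot with a simple filter.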

Step 2: Filling up missing data

A quick scan of the data showed that many companies had empty cells in the parent company and parent country columns. Because this information is useful to our analysis, we searched the Internet for each company’s profile and filled in the missing values accordingly. For companies we could not find information on, we assumed that they had no parent company and that their parent country was Singapore. Such companies therefore have their “Parent Company” cell filled with their own name and their “Parent Country” cell filled with “Singapore”.

We also found that the postal codes provided by OneSource were inconsistent. Postal codes matter to us because we plan to explore a location-based approach in later analysis. A Singapore postal code normally consists of six digits, so we flagged every postal code that was not a six-digit value and searched the Internet for the correct code. If a search returned no result, we filled the cell with “NA”. Because this meant manually searching the Internet for more than 1,200 records, filling in the missing data took considerable time and effort.
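The fallback rules above can be expressed as two small functions. This is a minimal sketch, assuming each record is a dictionary with a `Company` field (an assumed name) alongside the `Parent Company` and `Parent Country` columns, and that the results of the manual Internet searches are collected in a hypothetical `manual_results` dictionary keyed by company name. The searches themselves remain a manual step.

```python
import re

# Strict six-digit pattern (ASCII digits only).
SIX_DIGITS = re.compile(r"[0-9]{6}")


def fill_parent_defaults(company):
    """Apply the fallback used when no parent information could be found.

    A company without a parent company is treated as its own parent, and its
    parent country defaults to Singapore if that cell is also empty.
    """
    row = dict(company)
    if not row.get("Parent Company"):
        row["Parent Company"] = row["Company"]
        row["Parent Country"] = row.get("Parent Country") or "Singapore"
    return row


def clean_postal_code(company_name, postal_code, manual_results):
    """Keep valid six-digit codes; otherwise use a manually found code or 'NA'.

    Codes are handled as strings so that leading zeros are preserved.
    manual_results maps a company name to the code found by an Internet search.
    """
    code = (postal_code or "").strip()
    if SIX_DIGITS.fullmatch(code):
        return code
    found = (manual_results.get(company_name) or "").strip()
    return found if SIX_DIGITS.fullmatch(found) else "NA"
```

Treating postal codes as text rather than numbers matters in spreadsheets, where a code with a leading zero can otherwise be shortened to five digits.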


List of Executives

The executive list contains 16 attributes, but some columns were unnecessary and some attributes were missing. We therefore cleaned the data as described below:

Step 1: Filling in the Executives’ Full names

The data extracted from OneSource did not include the executives’ full names. Instead, names were split across three columns: First, Middle and Last. These values were inconsistent: some cells contained a single name, while others contained the executive’s full name. We therefore created a new column that merges these values into a single full name.
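Because some cells already hold the whole name, a naive concatenation of the three columns can repeat words. The sketch below is one possible merge rule, not necessarily the one the team applied: a part is skipped when all of its words already appear in a longer part, and exact duplicates are dropped.

```python
def full_name(first, middle, last):
    """Merge First/Middle/Last values into one full-name string.

    Whitespace is normalized, empty parts are ignored, and a part is dropped
    if a longer part already contains all of its words (case-insensitive).
    """
    parts = [" ".join(p.split()) for p in (first, middle, last) if p and p.strip()]
    kept = []
    for i, part in enumerate(parts):
        words = set(part.lower().split())
        # Redundant if another, longer part already contains all of its words.
        redundant = any(
            j != i
            and len(other.split()) > len(part.split())
            and words <= set(other.lower().split())
            for j, other in enumerate(parts)
        )
        if not redundant and part not in kept:
            kept.append(part)
    return " ".join(kept)


print(full_name("John", "", "Lim"))        # John Lim
print(full_name("Tan Ah Kow", None, "Tan"))  # Tan Ah Kow
```

The names above are made-up examples. Rows where the rule still produces an odd result can be reviewed manually.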

Step 2: Correcting Executive Names

Another recurring issue was multiple variations of the same executive’s name. Some companies record an executive only by the name given at birth, while others record both the birth name and the name the executive more commonly goes by. To address this, our team used a pivot table to identify duplicated values and then manually checked and updated each full name. This step gives us cleaner data, so that the nodes and connections in our later visualizations correctly represent each executive.
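A scripted counterpart to the pivot-table check could count normalized names and list candidate variant pairs for manual review. This is a sketch under an assumed heuristic (one name’s words are all contained in another’s, with at least `min_shared` words), not the team’s actual procedure; it only proposes candidates, and each pair still needs a manual check before names are merged.

```python
from collections import Counter
from itertools import combinations


def normalize(name):
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def name_counts(full_names):
    """Pivot-table style count of how often each normalized name appears."""
    return Counter(normalize(n) for n in full_names)


def variant_pairs(full_names, min_shared=2):
    """Pairs of distinct names where one name's words all appear in the other."""
    distinct = sorted(set(normalize(n) for n in full_names))
    pairs = []
    for a, b in combinations(distinct, 2):
        wa, wb = set(a.split()), set(b.split())
        small, large = (wa, wb) if len(wa) <= len(wb) else (wb, wa)
        if len(small) >= min_shared and small <= large:
            pairs.append((a, b))
    return pairs


names = ["Tan Ah Kow", "Michael Tan Ah Kow", "TAN AH KOW", "Lim Boon Heng"]
print(name_counts(names)["tan ah kow"])  # 2
print(variant_pairs(names))  # [('michael tan ah kow', 'tan ah kow')]
```

The `min_shared` threshold keeps single shared surnames from flagging every pair of relatives or namesakes.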

Initial Findings

To gain a broader perspective of Singapore’s corporate environment, our team first conducted several univariate analyses on our dataset.

First, Singapore is the most common country of origin for both parent and ultimate parent companies: 85.7% of parent companies are based in Singapore, while the US is a distant second with only 3.3%. The frequencies by parent country and by ultimate parent country also do not differ much, staying within 1% of each other. However, this pattern may partly result from our default assumption that a company’s parent country is Singapore whenever OneSource provided no data. Future research could examine whether this assumption holds.
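The comparison above can be computed by turning each column into percentage shares and taking the largest gap between them. The sketch below assumes the two columns are available as lists of country names; the toy data is invented for illustration and is not the project’s data.

```python
from collections import Counter


def shares(values):
    """Percentage share of each distinct value, e.g. parent country."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: 100 * n / total for value, n in counts.items()}


def max_share_gap(values_a, values_b):
    """Largest absolute difference, in percentage points, between two distributions.

    Assumes both inputs are non-empty.
    """
    a, b = shares(values_a), shares(values_b)
    return max(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in set(a) | set(b))


# Toy example: 10 companies, invented values.
parent = ["Singapore"] * 8 + ["US", "Japan"]
ultimate = ["Singapore"] * 7 + ["US", "US", "Japan"]
print(shares(parent))                    # {'Singapore': 80.0, 'US': 10.0, 'Japan': 10.0}
print(max_share_gap(parent, ultimate))   # 10.0
```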

Second, we identified the top 10 industries in Singapore by number of companies in our dataset. The largest industry by number of companies is the Wholesale industry. As seen in Annex B, wholesale companies make up 16.7% of Singapore companies, followed by manufacturing (12.8%) and business and management services (8.9%). The team also observed that Singapore’s corporate environment is dominated by private independent companies: 66.0% of Singapore companies are private independents, followed by private subsidiaries, which form 27.5% of the dataset.
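A top-N ranking like the one in Annex B can be produced with `collections.Counter.most_common`. This minimal sketch assumes the broad categories are available as a list of strings; the sample values are invented.

```python
from collections import Counter


def top_categories(categories, n=10):
    """Return the n most frequent categories with their percentage share (1 d.p.)."""
    counts = Counter(categories)
    total = sum(counts.values())
    return [(cat, round(100 * k / total, 1)) for cat, k in counts.most_common(n)]


sample = ["Wholesale"] * 5 + ["Manufacturing"] * 3 + ["Retail"] * 2
print(top_categories(sample, n=2))  # [('Wholesale', 50.0), ('Manufacturing', 30.0)]
```

The same function applies unchanged to an ownership-type column to obtain the split between private independents and private subsidiaries.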

Moving Forward

In the next iterations, the team will complete data processing and then move on to analyzing quantitative metrics and creating a visualization of the directorate network. We will also review existing literature on relevant quantitative metrics, such as centrality and density, to support a more in-depth analysis of our visualization.
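As a starting point for those metrics, the sketch below assumes the executive list can be reduced to (executive, company) pairs and links two companies whenever they share an executive, which is the usual definition of an interlock. It then computes density (the share of possible company pairs that are linked, 2E / (N(N − 1))) and normalized degree centrality (links per company divided by N − 1). The function names and sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def interlock_edges(board_seats):
    """Project (executive, company) pairs onto a company-company network.

    Two companies are linked if at least one executive sits in both.
    """
    companies_by_exec = defaultdict(set)
    for executive, company in board_seats:
        companies_by_exec[executive].add(company)
    edges = set()
    for companies in companies_by_exec.values():
        for a, b in combinations(sorted(companies), 2):
            edges.add((a, b))
    return edges


def density(nodes, edges):
    """Share of all possible undirected company pairs that are linked."""
    n = len(nodes)
    return 2 * len(edges) / (n * (n - 1)) if n > 1 else 0.0


def degree_centrality(nodes, edges):
    """Number of interlocked companies per company, normalized by n - 1."""
    degree = {node: 0 for node in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n = len(nodes)
    return {node: d / (n - 1) if n > 1 else 0.0 for node, d in degree.items()}


seats = [("Alice", "A"), ("Alice", "B"), ("Bob", "B"), ("Bob", "C"), ("Carol", "D")]
nodes = {company for _, company in seats}
edges = interlock_edges(seats)
print(sorted(edges))          # [('A', 'B'), ('B', 'C')]
print(density(nodes, edges))  # 0.3333333333333333
```

Companies without any shared executive (here "D") still count as nodes, which lowers density. For larger networks, a library such as NetworkX offers `density` and `degree_centrality` with the same definitions for simple undirected graphs.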

Findings & Insights will be added once the analysis is complete.
Please check back later.
