Difference between revisions of "Network Analysis of Interlocking Directorates/Findings Insights"

Revision as of 22:19, 3 March 2015



Data Files

The data set extracted from OneSource consists of two files: a list of companies in Singapore and a list of executives in those companies.

List of Companies

The company list table contains 71,265 records, each representing a company and its relevant information. To keep the extract complete, we pulled the whole dataset without removing any duplicates. After extraction, we observed that data was missing in several columns. As some attributes are not critical to our project, we plan to drop those columns later. Data cleaning was carried out on the selected columns, as described below:

Step 1: Categorizing companies

From the initial extraction, OneSource provided 192 detailed industry categories. As this level of detail would prevent us from getting a broader picture of Singapore’s corporate environment, we regrouped them into 29 broader categories based on OneSource’s definition of Industrial Classification. The result is stored in the column named “Category” in the file “Company List Processed”.
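The regrouping in Step 1 can be sketched as a simple lookup. This is only an illustration, not our actual mapping: the category names in `BROAD_CATEGORY`, the source field name `Industry`, and the function name are hypothetical placeholders. Only the output column name “Category” comes from the description above.

```python
# Hypothetical excerpt of a detailed-to-broad lookup. The real table would
# cover all 192 OneSource categories and map them onto 29 broad ones.
BROAD_CATEGORY = {
    "Commercial Banks": "Financial Services",
    "Investment Services": "Financial Services",
    "Software": "Technology",
    "Computer Hardware": "Technology",
}


def add_category(rows, source_field="Industry"):
    """Return copies of the rows with a "Category" field added.

    Detailed categories missing from the lookup are labelled "Unmapped"
    so they can be reviewed by hand instead of being silently dropped.
    """
    result = []
    for row in rows:
        detailed = (row.get(source_field) or "").strip()
        new_row = dict(row)  # leave the input row untouched
        new_row["Category"] = BROAD_CATEGORY.get(detailed, "Unmapped")
        result.append(new_row)
    return result


# Made-up sample rows, for illustration only.
companies = [
    {"Company Name": "Alpha Bank", "Industry": "Commercial Banks"},
    {"Company Name": "Beta Soft", "Industry": "Software"},
    {"Company Name": "Gamma Mining", "Industry": "Coal"},
]
categorized = add_category(companies)
```

Keeping an explicit “Unmapped” label makes it easy to count how many detailed categories still need a broad category assigned.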

Step 2: Filling in missing data

A quick scan of the data showed that many companies had empty cells in the parent company and parent country columns. As this information is useful to our analysis, we searched the Internet for the companies’ profiles and filled in the empty values accordingly. For companies on which we could not find any information, we assumed that they had no parent company and that their parent country was Singapore. These companies therefore have their “Parent Company” cell filled with their own name and their “Parent Country” cell filled with “Singapore”.

We also faced an issue with inconsistent postal code data from OneSource. As our team aims to explore a position-based approach in our future analysis, postal codes are important to us. A Singapore postal code normally consists of six digits, so we checked for postal code values that were not six-digit numbers and performed Internet searches to fill in the cells. Where a search returned no result, we filled in the cell with “NA”.

As this step involved manually searching the Internet for more than 1,200 records, it required considerable time and effort.
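The fallback rules in Step 2 can be sketched as follows. This is a minimal illustration under stated assumptions: the field name “Company Name” and all function names are hypothetical, and the manual Internet search is represented only by the optional `looked_up` argument. Postal codes are assumed to be stored as text so that leading zeros survive.

```python
import re

# Exactly six ASCII digits.
SIX_DIGITS = re.compile(r"[0-9]{6}")


def fill_parent_fields(row):
    """Apply the fallback used when no parent information could be found:
    the company becomes its own parent and the parent country is Singapore."""
    filled = dict(row)
    if not (filled.get("Parent Company") or "").strip():
        filled["Parent Company"] = filled["Company Name"]
    if not (filled.get("Parent Country") or "").strip():
        filled["Parent Country"] = "Singapore"
    return filled


def is_valid_postal_code(value):
    """True when the value is exactly six digits (ignoring outer spaces)."""
    return SIX_DIGITS.fullmatch((value or "").strip()) is not None


def resolve_postal_code(value, looked_up=None):
    """Keep a valid code, else use the manually looked-up one, else "NA"."""
    for candidate in (value, looked_up):
        if is_valid_postal_code(candidate):
            return candidate.strip()
    return "NA"
```

For example, `resolve_postal_code("18956", "018956")` keeps the looked-up value, while `resolve_postal_code("", None)` falls back to “NA”.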


List of Executives

This file contains the personal details and titles of executives currently working in the companies above. The initial extraction produced 117,370 rows of data; after reviewing duplicate entries, we reduced it to 79,330 rows. The list contains 16 attributes, of which only five will be used. Three of these make up the executive’s name and will be the basis of the edges in our social network analysis (SNA). The company name is used to join the two datasets, and the executive titles may assist us in drawing inferences for our conclusion.
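One common way to turn such a list into interlock edges is to link two companies whenever the same executive appears in both. The sketch below illustrates that idea and is not necessarily the construction our analysis will use. The field names (“First Name”, “Middle Name”, “Last Name”, “Company Name”) and function names are hypothetical. Matching people on name alone is also an assumption, since it can merge different individuals who share a name.

```python
from collections import defaultdict
from itertools import combinations


def person_key(row):
    """Combine the three name fields into one normalized identifier."""
    parts = (row.get("First Name"), row.get("Middle Name"), row.get("Last Name"))
    return " ".join(p.strip().lower() for p in parts if p and p.strip())


def interlock_edges(executive_rows, known_companies):
    """Return {(company_a, company_b): number_of_shared_executives}."""
    companies_by_person = defaultdict(set)
    for row in executive_rows:
        key = person_key(row)
        company = row["Company Name"]
        # Join on company name; skip rows with no usable name.
        if key and company in known_companies:
            # A set absorbs duplicate (person, company) rows.
            companies_by_person[key].add(company)

    edges = defaultdict(int)
    for companies in companies_by_person.values():
        for a, b in combinations(sorted(companies), 2):
            edges[(a, b)] += 1
    return dict(edges)


# Made-up sample rows, for illustration only.
rows = [
    {"First Name": "Tan", "Middle Name": "", "Last Name": "Ah Kow", "Company Name": "A"},
    {"First Name": "Tan", "Middle Name": "", "Last Name": "Ah Kow", "Company Name": "B"},
    {"First Name": "Tan", "Middle Name": "", "Last Name": "Ah Kow", "Company Name": "A"},
    {"First Name": "Lim", "Middle Name": None, "Last Name": "Mei", "Company Name": "A"},
    {"First Name": "Tan", "Middle Name": "", "Last Name": "Ah Kow", "Company Name": "Z"},
]
edges = interlock_edges(rows, known_companies={"A", "B", "C"})
```

Sorting each person’s companies before pairing them gives every undirected edge a single canonical key, so (A, B) and (B, A) are never counted separately.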

Although OneSource provided clear options for executive titles at extraction time, the resulting titles varied widely. This is to be expected, as companies differ in how they label their executives. To achieve a clearer analysis, our team further categorized the titles in line with the options provided by OneSource.
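A keyword-based pass is one way such a title categorization could be automated before manual review. The sketch below is an assumption, not a record of how our categorization was done: the category labels and keywords in `TITLE_RULES` are hypothetical and do not reproduce the actual OneSource options.

```python
import re

# Hypothetical rules, checked in order: the first matching category wins.
TITLE_RULES = [
    ("Chief Executive", ("chief executive", "ceo")),
    ("Chairman", ("chairman", "chairperson")),
    ("Director", ("director",)),
    ("Finance", ("chief financial", "cfo", "finance")),
]


def categorize_title(raw_title):
    """Map a free-text executive title to a coarse category."""
    # Lower-case and collapse repeated whitespace before matching.
    title = " ".join((raw_title or "").lower().split())
    for category, keywords in TITLE_RULES:
        for keyword in keywords:
            # Word boundaries avoid matching a keyword inside a longer word.
            if re.search(r"\b" + re.escape(keyword) + r"\b", title):
                return category
    return "Other"
```

Because the rules are ordered, a title such as “Finance Director” lands in “Director” here. Reordering `TITLE_RULES` changes which category takes precedence, so the order should reflect what matters most to the analysis.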

Findings & Insights will be added after the analysis has been done.
Please check back later.
