Network Analysis of Interlocking Directorates/Findings Insights

From Analytics Practicum
Revision as of 21:28, 23 April 2015 by Tw.zheng.2011 (talk | contribs)


Data Files

The data used in this project were extracted from the OneSource database, with access granted by the SMU Li Ka Shing Library. OneSource was chosen because it contains comprehensive information on public and private companies and industries worldwide, including company profiles, news, financial data, executive profiles, analyst reports, and business and trade articles. Besides the lists of companies and executives, OneSource also lets us trace back to a company’s financial profile or related news at any time, which makes it a useful source of information for this project. Details of the data collection and preparation are discussed below.

The data set extracted from OneSource consists of two files: a list of companies in Singapore and a list of executives in those companies.

List of Companies

The company list table contains 50,334 records, where each record represents a company and its relevant information. While extracting, we found occasional discrepancies between the total number of companies shown in OneSource and the number we could extract. However, as such occurrences were rare, we still considered OneSource reliable for data collection. We categorized the companies by industry according to ISIC Rev. 4 (International Standard Industrial Classification of All Economic Activities), a standard classification of economic activities maintained by the United Nations Statistics Division.


Step 1: Filling up missing data

From a quick scan of our data, we observed that many companies had empty cells under the parent company and parent country columns. As this information is useful to our analysis, we searched the Internet for the companies’ profiles to find their parent companies and filled in the empty values accordingly. For companies on which we were unable to find information, we assumed that they had no parent company and that their parent country was Singapore. Therefore, companies with no parent company have their “Parent Company” cell filled with their own name and their “Parent Country” cell filled with “Singapore”.
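The default rule in this step can be sketched in Python as follows. This is only an illustration: the field names `company`, `parent_company` and `parent_country` are assumptions, not the actual OneSource column names, and the sample records are made up.

```python
def fill_missing_parent(records):
    """Return copies of the records with empty parent fields filled in.

    A company with no known parent is treated as its own parent, and its
    parent country then defaults to Singapore (the assumption stated above).
    """
    filled = []
    for rec in records:
        rec = dict(rec)  # copy so the input list is left untouched
        parent = (rec.get("parent_company") or "").strip()
        country = (rec.get("parent_country") or "").strip()
        if not parent:
            rec["parent_company"] = rec["company"]
            if not country:
                rec["parent_country"] = "Singapore"
        filled.append(rec)
    return filled


# Hypothetical sample records, for illustration only.
companies = [
    {"company": "Alpha Pte Ltd", "parent_company": "", "parent_country": ""},
    {"company": "Beta Pte Ltd", "parent_company": "Beta Holdings Inc",
     "parent_country": "United States"},
]
filled_companies = fill_missing_parent(companies)
```

Records that already name a parent company are left unchanged, so the default only applies where the Internet search found nothing.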

Step 2: Editing inconsistent postal codes

We also faced an issue with inconsistent postal code data from OneSource. As our team aims to explore a position-based approach in future analysis, postal codes are important to us. A Singapore postal code consists of six digits; hence, we flagged every postal code value that was not a 6-digit value and performed Internet searches to correct it or to fill in empty cells. If a search did not return a result, we filled in the cell as “NA”. As this required manually searching the Internet for more than 1,200 records, the step took a considerable amount of time and effort.
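A minimal Python sketch of the six-digit check is shown below. It only flags values for manual lookup; it does not attempt to repair them. The sample values are made up for illustration.

```python
import re

# Exactly six ASCII digits, nothing else.
SIX_DIGITS = re.compile(r"[0-9]{6}")


def is_valid_postal_code(value):
    """True when the value is exactly six digits after trimming whitespace."""
    return bool(SIX_DIGITS.fullmatch(str(value).strip()))


def flag_postal_codes(codes):
    """Split codes into those passing the check and those needing manual review."""
    valid, to_review = [], []
    for code in codes:
        (valid if is_valid_postal_code(code) else to_review).append(code)
    return valid, to_review


# Hypothetical sample values.
valid_codes, review_codes = flag_postal_codes(
    ["068898", "18960", "", "S 238859", "049145"]
)
```

One possible cause of five-digit values, which we did not verify for this dataset, is a spreadsheet storing the code as a number and dropping a leading zero; such cases still need to be checked by hand.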

As it is impractical to analyze 50,000 postal codes individually, our team also derived a new attribute named “Postal Sector”. Every Singapore postal code contains 6 digits, with the first 2 digits indicating the address’s postal sector. These 2 digits can range from “01” to “82”, and each sector belongs to one of the 28 postal districts in Singapore. Hence, our team was able to determine the postal sector of every company with a postal code and further classify the companies into the 28 postal districts of Singapore.
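Deriving the sector and district can be sketched as below. The lookup table is deliberately partial (two districts only) and is an assumption to be checked against the official sector-to-district table before use; the function names are hypothetical.

```python
# Partial, illustrative lookup from postal sector to postal district.
# Only districts 1 and 2 are listed; the full table covers all 28 districts.
SECTOR_TO_DISTRICT = {
    "01": 1, "02": 1, "03": 1, "04": 1, "05": 1, "06": 1,
    "07": 2, "08": 2,
}


def postal_sector(postal_code):
    """Return the first two digits of a 6-digit code, or None for 'NA' and other invalid values."""
    code = str(postal_code).strip()
    if len(code) == 6 and code.isascii() and code.isdigit():
        return code[:2]
    return None


def postal_district(postal_code):
    """Return the district for a postal code, or None if the sector is unknown or missing."""
    return SECTOR_TO_DISTRICT.get(postal_sector(postal_code))
```

For example, `postal_sector("068898")` gives `"06"`, which the partial table maps to district 1, while a cell holding “NA” yields no sector and no district.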

Final company 1.PNG Final company 2.PNG

List of Executives

The attributes we extracted from OneSource include: Executive Name, Executive Title, Company Name, the Industry that the company belongs to, the Postal Code of the company’s headquarters, and the Country (Singapore). The executive list in OneSource is very detailed and comprehensive, ranging from the most senior executives to lower levels such as heads of departments. Given the scope of the project, we considered only high-level management executives, namely: Board of Directors, Senior Officers & C-Level, Executive Vice Presidents, Senior Vice Presidents, and Vice Presidents.

The Executive List contains 16 attributes, but we found some unnecessary columns as well as missing attributes. Therefore, we cleaned the data as described below:

Step 1: Filling in the Executives’ Full names

The data extracted from OneSource did not provide the full names of the executives. Instead, there were 3 columns: First, Middle and Last names. However, these values were inconsistent, as some cells contained a single value and others contained the full name of the executive. Hence, we created a new column that holds the merged full name.
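One way to build the merged column is sketched below in Python. The rule for handling a cell that already holds the full name (drop any part whose words all appear in a longer part) is our assumption for illustration, not a documented feature of the original cleaning.

```python
def build_full_name(first, middle, last):
    """Join the non-empty name parts into one full name.

    A part is skipped when all of its words already appear in another,
    longer part, which covers cells that already hold the full name.
    """
    parts = [" ".join(str(p).split()) for p in (first, middle, last)
             if p and str(p).strip()]
    kept = []
    for i, part in enumerate(parts):
        others = parts[:i] + parts[i + 1:]
        words = set(part.lower().split())
        if any(part != o and words <= set(o.lower().split()) for o in others):
            continue  # contained in a longer part
        if part not in kept:
            kept.append(part)  # also drops exact repeats
    return " ".join(kept)
```

For instance, `build_full_name("Adrian", "", "Chan")` gives `"Adrian Chan"`, and `build_full_name("Adrian Chan Pengee", "", "Chan")` keeps only `"Adrian Chan Pengee"`.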

Step 2: Correct Executive Names

Multiple variations of the same executive name were also a recurring issue. As some companies record only an executive’s name given at birth while others record both the birth name and the name the executive more often goes by, multiple variations of the same executive name can occur. For instance, the name “Adrian Chan Pengee” has multiple variations such as “Adrian Chan”, “Chan Pengee”, “Adrian Pengee Chan”, etc. To address this issue, our team used Excel pivot tables to identify the duplicated values, then manually checked and updated them to the full name value, such as “Adrian Chan Pengee”. This part of the data cleaning gives us cleaner data and lets us correctly represent the nodes and connections in our later visualization.
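The team did this step with Excel pivot tables and manual checks. As an alternative illustration only, the Python sketch below lists candidate variant pairs by a simple word-subset rule; the rule and the function names are our assumptions, and every pair it returns still needs a manual check, since short common names can match different people.

```python
from itertools import combinations


def name_tokens(name):
    """Lower-cased set of the words in a name, ignoring word order."""
    return frozenset(name.lower().split())


def candidate_variants(names):
    """Return pairs of distinct names where one name's words are a subset of the other's."""
    unique = sorted(set(names))
    pairs = []
    for a, b in combinations(unique, 2):
        ta, tb = name_tokens(a), name_tokens(b)
        if ta <= tb or tb <= ta:
            pairs.append((a, b))
    return pairs


names = ["Adrian Chan Pengee", "Adrian Chan", "Chan Pengee",
         "Adrian Pengee Chan", "Brian Ang"]
pairs = candidate_variants(names)
```

The pairwise comparison grows quadratically with the number of names, so for a full executive list it would need to be restricted, for example to names sharing a surname.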

Final dataset.png

Initial Findings

To gain a broader perspective of Singapore’s corporate environment, our team first conducted several univariate analyses on our dataset.

Firstly, the team found that Singapore is the most frequent country of origin of both parent and ultimate parent companies. 85.7% of parent companies are from Singapore, while the US is a distant second with only 3.3%. We also observed that the frequencies of parent country and ultimate parent country do not differ much, staying within 1% of each other. However, it should be noted that this pattern may arise partly from our default assumption that the parent country is Singapore whenever no data was provided by OneSource. Future research can therefore examine how well this assumption holds.

Secondly, we identified the top 10 industries in Singapore by number of companies. The largest is the Wholesale industry. As seen in Annex B, wholesale companies make up 16.7% of Singapore companies, followed by manufacturing (12.8%) and business and management services (8.9%). The team also observed that Singapore’s corporate environment is dominated by private independent companies: 66.0% of Singapore companies are private independents, followed by private subsidiaries, which form 27.5% of the dataset.

Duality of Interlocking Directorates

Visualization process:

We used Gephi, an open-source interactive tool for network exploration and visualization, to visualize the interlocking directorates network. We also experimented with another visualization tool, NodeXL, but because Gephi proved better at processing large data, we eventually used Gephi for most of the graphs.

The visualization process in Gephi requires a lot of trial and error in adjusting the properties to come up with meaningful graphs that are useful for fraud risk assessment analysis. The first graph we generated was a group of gray nodes that did not express any insight at all (Graph 4.3.1 - Gephi 1st visualization). Because the network consists of directors and companies, which should be displayed differently, we separated them by color coding: the red nodes represent the companies, and the blue nodes represent the executives (Graph 4.3.2 - Gephi 2nd visualization).

After color coding the nodes, it was still difficult to see the connections between them, so we ranked the nodes by betweenness centrality, so that nodes with higher betweenness centrality scores are displayed larger than those with lower scores. We also carefully considered different layout and display options to arrive at the final layouts, which are discussed in the next section.

Gephi 3.png

Results:

Full Network Visualization:

Graph 5.1.1 represents the whole Singapore interlocking directorates network. The red nodes represent the companies, and the blue nodes represent the directors. Node sizes are ranked according to betweenness centrality: the bigger the node, the higher its betweenness centrality score. A high betweenness centrality suggests that the node lies on many of the shortest paths between other nodes and so connects different parts of the network together; hence, the big nodes are expected to hold more control and influence over the network. Nodes around the edge of the network typically have a low betweenness centrality. This whole-network visualization allows us to see broadly who the most influential executives or firms in Singapore are. In addition, Appendices 2 and 3 show the names of the top 10 influential executives and the top 10 influential companies in the Singapore interlocking directorates network.
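To make the measure concrete, the Python sketch below computes unnormalised betweenness centrality for a small undirected, unweighted network using Brandes’ algorithm. It is not Gephi’s implementation, and the node labels (`E1`, `C1`, ...) are hypothetical; node labels are assumed to be unique across executives and companies.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from (executive, company) pairs."""
    adjacency = {}
    for executive, company in edges:
        adjacency.setdefault(executive, set()).add(company)
        adjacency.setdefault(company, set()).add(executive)
    return adjacency


def betweenness_centrality(adjacency):
    """Unnormalised betweenness for an undirected, unweighted graph (Brandes)."""
    score = {v: 0.0 for v in adjacency}
    for source in adjacency:
        order = []                              # nodes in order of distance
        preds = {v: [] for v in adjacency}      # predecessors on shortest paths
        sigma = {v: 0 for v in adjacency}       # number of shortest paths
        dist = {v: -1 for v in adjacency}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                            # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adjacency}
        while order:                            # accumulate from farthest node back
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: s / 2 for v, s in score.items()}


# Hypothetical chain: E1 - C1 - E2 - C2 - E3
edges = [("E1", "C1"), ("E2", "C1"), ("E2", "C2"), ("E3", "C2")]
centrality = betweenness_centrality(build_adjacency(edges))
```

In this toy chain the executive `E2`, who sits on both boards, gets the highest score, while the two executives at the ends score zero, which mirrors the low scores of nodes at the edge of the full graph.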

Gephi 4.png

The full network graph is composed of multiple smaller components. The smallest component is one company linked with one or more executives. In the sample below (Graph 5.2.1), Tong Lee Company Pte Ltd has only two executives on its board of directors: Brian Ang and Lawrence Ang. As another example, Graph 5.2.2 below shows CPG Investments Pte. Ltd., which has many executives sitting on its board. There is no limit on the number of executives a firm can have on its board of directors; it depends on the size of the firm as well as the structure of its governance. However, from a social network perspective, firms should take into account that the more executives are connected with a company, the more likely it is that information from the company will be widely distributed.
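Splitting a network into components can be sketched in Python with a breadth-first search, as below. The edge list uses only executive and company names mentioned on this page and is far smaller than the real dataset; the function name is hypothetical.

```python
from collections import deque


def connected_components(edges):
    """Return the connected components of an undirected graph, largest first."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(component)
    return sorted(components, key=len, reverse=True)


edges = [
    ("Brian Ang", "Tong Lee Company Pte Ltd"),
    ("Lawrence Ang", "Tong Lee Company Pte Ltd"),
    ("Roger Woods", "Fsl Trust Management Pte. Ltd"),
    ("Roger Woods", "First Ship Lease Trust"),
    ("Philip Tan Eng Lay", "Fsl Trust Management Pte. Ltd"),
    ("Philip Tan Eng Lay", "First Ship Lease Trust"),
    ("Timothy James Reid", "Fsl Trust Management Pte. Ltd"),
    ("Timothy James Reid", "First Ship Lease Trust"),
]
components = connected_components(edges)
```

With these edges the result has two components: one with two firms and three executives, and one with Tong Lee Company Pte Ltd and its two directors.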

Gephi 5.png Gephi 6.png

Graph 5.1.3 is a bigger component that contains interlocking directorships. Fsl Trust Management Pte. Ltd and First Ship Lease Trust are connected by three executives: Roger Woods, Philip Tan Eng Lay and Timothy James Reid. The three executives act as coordinators, tying the two firms together. Components are useful to know because they can be used to identify how meaningful the ties among the actors in the interlocking network are. In Gephi, we can use a filter to show the component related to a specific actor. For instance, if an investor wants to see the connections of executive Roger Woods, he can enter “Roger Woods” into Gephi’s Ego Network filter (under Topology), and Gephi will show only the network around Roger Woods instead of the whole network of Singapore executives.
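An interlock of this kind can be found directly from the executive list by pairing up the companies that share an executive. The Python sketch below does this for the example above; the function name and the (executive, company) input layout are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations


def find_interlocks(edges):
    """Map each pair of companies to the set of executives they share."""
    boards = defaultdict(set)               # executive -> companies served
    for executive, company in edges:
        boards[executive].add(company)
    interlocks = defaultdict(set)           # (company, company) -> executives
    for executive, companies in boards.items():
        for pair in combinations(sorted(companies), 2):
            interlocks[pair].add(executive)
    return dict(interlocks)


edges = [
    ("Roger Woods", "Fsl Trust Management Pte. Ltd"),
    ("Roger Woods", "First Ship Lease Trust"),
    ("Philip Tan Eng Lay", "Fsl Trust Management Pte. Ltd"),
    ("Philip Tan Eng Lay", "First Ship Lease Trust"),
    ("Timothy James Reid", "Fsl Trust Management Pte. Ltd"),
    ("Timothy James Reid", "First Ship Lease Trust"),
]
interlocks = find_interlocks(edges)
```

Each key is an alphabetically ordered pair of company names, so the two firms above appear once, with all three shared executives as the value.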

Gephi 7.png

Networks classified by Industry

Some prior studies have raised the concern that different industries may affect interlocking directorships differently. Hence, in Gephi, we produced visualizations using separate datasets from three different industries. The results are discussed below.

Wholesale & Retail Trade Industry

Table 1.PNG

According to the industry distribution (Figure 5.3.1), Wholesale and Retail Trade has the highest number of companies involved, almost double that of the next industry, Manufacturing. Because of the high number of companies, the graph is very dense toward the edge, where the companies have few or no links with one another. Overall, interlocking directorates are practiced in this sector, and influence is quite evenly distributed among the executives, as the node sizes do not differ much from each other. Only a few people have more influence in the network and are shown as larger blue nodes: David Tan, Siew Tan, James Tan, Lan Sim Lim, Richard Tan, etc. Investors and researchers who are interested in the people involved in the Wholesale and Retail Trade interlocking directorates network can use Gephi filters to analyze only the specific nodes that relate to their interests.

Gephi 8.png

Manufacturing Interlocking Network

Being the second largest industry in Singapore, the Manufacturing industry has many interlocking directorate connections. However, unlike the Wholesale and Retail Trade interlocking network, Manufacturing has more of the “more important” executives, shown as a cluster of big nodes in the centre. The executives in the centre hold board positions in multiple firms concurrently, and thus control more information and more power across the network. As power is concentrated in the hands of these governing elites, there is a higher potential for fraud as well. The hubs (executives who have connections to multiple firms, or firms that have connections to many executives) can be an early indication of accounting fraud that may happen in the future, so these hubs need more attention from investors, researchers and auditors.

Gephi 9.png

Financial & Insurance Industry

Because Financial Services has always been seen as an important industry, we further analyzed the distribution of interlocking directorates in the Financial and Insurance Activities industry. The pattern of interlocking directorates here differs from that in the previous sectors: in Financial and Insurance Activities, more firms sit at the centre of the distribution, and Singapore Exchange Limited (SGX), the largest node, dominates information control and influence in the network. SGX has the highest betweenness centrality score, which places it in a position to control much of the information flow in the financial industry.

To understand how important Singapore Exchange is, we used a Gephi filter to keep only the nodes that are related to or can be reached via SGX, and obtained the big component below, which shows the centre of the Financial & Insurance Activities interlocking directorates network.
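The idea behind this kind of filter, keeping only the nodes reachable from one chosen actor, can be sketched in Python with a breadth-first search and an optional depth limit. This is not Gephi’s code, and the firm and executive labels below are hypothetical rather than taken from the dataset.

```python
from collections import deque


def ego_network(edges, ego, depth=None):
    """Return the nodes reachable from `ego`, optionally within `depth` steps."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    if ego not in adjacency:
        return set()
    dist = {ego: 0}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if depth is not None and dist[node] >= depth:
            continue  # do not expand beyond the depth limit
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return set(dist)


# Hypothetical toy network.
edges = [("Exec 1", "Firm A"), ("Exec 1", "Firm B"),
         ("Exec 2", "Firm B"), ("Exec 3", "Firm C")]
direct = ego_network(edges, "Firm A", depth=1)
reachable = ego_network(edges, "Firm A")
```

With `depth=1` only the firm and its own board members are kept; with no depth limit the result is the whole component containing the chosen firm, and unconnected nodes such as `Firm C` are left out.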

Gephi 10.png Gephi 11.png

Besides SGX, which dominates the industry, a few other firms also have a large impact on other businesses: the banks Oversea-Chinese Banking Corporation (OCBC) and United Overseas Bank (UOB), as well as GIC Pte Ltd, IPS Capital Ltd and Wbl Corporation Ltd. It is interesting to note that, choosing only SGX as the main actor, it is eventually connected to every other actor (firms and executives alike) in the centre of the Financial & Insurance Activities interlocking network. There are also people who hold important positions in these large companies, which results in them having higher betweenness centrality as well. These people are Wee Ghee Quah, Teck Poh Lai, Tao Soon Cham, Song Keng Lau, Davinder Singh, Kok Song Ng, etc. Thus, by conducting the same experiment on a specific actor of interest, the ties linked to that actor can be filtered out to see how influential that actor is.

Nevertheless, we understand that looking at the network one industry at a time may not give a complete picture. Thus, we take a step further and discuss the network based on the geographic locations of the firms’ offices.

Connections in the same building


Spatiality of Interlocking Directorates

Findings & Insights will be added after the analysis has been done.
Please check back later.

Work-in-progress.png