Open source research isn’t only about analysing social media or satellite imagery. Another important area involves investigating company structures and relationships.
However, official company registries can be unwieldy or difficult to navigate.
This is where OpenCorporates – a free repository of company registries aggregated from primary public sources, published in order to promote corporate transparency – can be helpful. OpenCorporates is clear about the provenance of their data, so you know where they collected each record, and when. This is critical for companies and finance research and investigations.
While there are other databases like it, OpenCorporates is the only one that covers so many jurisdictions (145 as at 16 June 2023). To see other sites that offer access to corporate registries, go to the companies and finance tab of the Bellingcat Online Investigation Toolkit.
Still, even user-friendly websites like OpenCorporates can be a daunting prospect – conducting searches one by one takes a great deal of time, and it can involve a lot of copying and pasting of search results. But the Open Corporates API can save a lot of time.
An API is an Application Programming Interface. Broadly speaking, it allows users to obtain data from a database without having to know about the structure or languages used to manage that database. Users can build new and interesting tools with API access or create different ways to look at the data they contain, as many previously did with the Twitter API, for example.
Put simply, if you’re researching data at scale, access to the API of an organisation that stores a lot of data can be a powerful resource. It can allow researchers to explore and compare data in ways that might not otherwise be possible.
Imagine, for example, that you’re looking at a company and all the individuals and other companies related to it within OpenCorporates. If you did individual searches for each company and director, it would take a lot of note-taking and organising. Access to an API can allow you to pull all the data at once and create new methods of searching to more easily identify relationships.
Luckily, OpenCorporates provides free API access if you’re undertaking a public benefit project. We’ll detail how to apply for access later on in this guide.
Some investigative journalists say this method has helped them find leads earlier. David Szakonyi, co-founder of the Anti-Corruption Data Collective, reflects that “by accessing OpenCorporates’ API, we achieved in less than a day what would have taken two people between four and six months to do”. OpenCorporates helped the ICIJ to connect companies and directors for over 240,000 companies as part of the Panama Papers investigation, before releasing it all as open data.
What follows is a guide on how to get the best from Open Corporates. If you have coding or tool-building experience, you will be able to do more. But even those who don’t can learn how to use it and gather all manner of potentially valuable and revealing information. Whatsmore, once you understand how to explore the OpenCorporates API, you can begin to transfer that knowledge to other large datasets that provide API access such as the OpenSanctions database.
Much of what is contained within this guide can also be explored in the below video by Rebecca Lee of OpenCorporates. But this guide aims to build upon this very useful resource.
Editor’s note: Some of the images contained in this guide have been blurred as they include the details of real companies that are featured on OpenCorporates.
Go to the OpenCorporates (OC) service desk to apply for access as a public benefit project. Academics, NGOs, registered journalists, media organisations and registered nonprofits can apply for free access under an open public licence. If you’re an independent journalist, you’ll need to use your own name and email address and provide a list of media sites that have referred to your reporting. OpenCorporates may ask for a copy of a current press pass (or equivalent).
Note that their licence requires you to attribute the use of their database in your reporting or published investigation outputs — see their terms of use.
Based on the needs of your investigation, it is possible to obtain:
If you’re not sure yet, start with the ‘Data for one company’ section as it’s the easiest, then work your way up from there.
OpenCorporates has a knowledge base, which is a good place to find things like the API documentation and data dictionaries (which cover things like what is the ‘normalised_name’ field – you need to understand this when you are drawing data to use in an investigation).
Use this method when you have no programming experience. It’s useful when you want to pull the search results for one company into Excel. But it does have the limitation that Excel is made for viewing data all in one table.
This method involves: Using your browser to query the API; downloading an XML file; and viewing the data in Excel by opening it as an XML table.
Query the API With your Browser
Search for the company that you want to view on opencorporates.com and select it (i.e. click on the company name in the search results to view its page).
Go to the browser’s address bar and adjust the URL:
Hit enter, and the data you have queried via the API will appear in your browser. Right-click and save as page source:
You will need to tidy the table as when it displays the data, Excel will duplicate some of the company information (in columns) as it creates one row per officer. You will also need to remove some columns, like ‘@type’.
Use this method when you have no programming experience but are comfortable in Excel. It’s useful when you know the company name and want to draw data for all the companies with that name across all the jurisdictions that OpenCorporates covers.
This method involves: Using your browser to query the API; downloading an XML file; viewing the data in Excel by using the XML source task pane.
Query the API with your Browser
Search with the company name on opencorporates.com. You should get a URL that looks something like this:
https://opencorporates.com/companies?q=company+name&utf8=✓ (Your search terms will appear in place of ‘company’ and ‘name’.)
Go to the browser’s address bar and adjust the URL:
Hit enter and the data you have queried via the API will appear in your browser. Right-click and save as XML.
Depending on which version of Excel you have, opening the file in Excel will work differently.
It’s also possible to upload to Google Sheets — an explanation of how to do so can be found here.
Open the file and it will open like any Excel file. Then clean the data to make it more usable (e.g. delete unnecessary columns).
Open the XML file, and make sure that you choose ‘Use the XML source task pane’.
Click ‘OK’ on the next box that pops up. You should see something like this:
This is a schema based on the XML data that you have opened — it’s the structure of the data.
Select Which Fields you Want to View
Select the fields by dragging and dropping them onto the Excel sheet:
Drag and drop as many fields as you need as columns in row 1. You’ll end up with something like this:
To populate the sheet with your chosen data, right-click somewhere on row 1, and choose ‘Refresh XML data’:
That will pull in the data for the fields you want. It will look something like this:
Note: You may well end up with less results in your Excel sheet than in your search on opencorporates.com. This is because there is a default page limit in the XML results (see the bottom of the page in your browser):
To get all the results, you need to add this to the URL that you used to query the API:
&per_page=100&page=1
Your URL will look something like this:
https://api.opencorporates.com/companies/searchq=company+name&api_token=XXXX&format=xml&per_page=100&page=1
Hit enter to refresh the page, and save that page as XML.If your search results on the OpenCorporates website were more than 100, just change the end of the URL to &page=2 and hit enter to get the next page of data.
Use this method when you have a list of companies for which you want to find matching legal entities across all jurisdictions on OpenCorporates. This method is more involved and requires the use of OpenRefine, a popular open source data wrangling tool that has built-in functions for accessing data through APIs, something that is not possible with Excel or other spreadsheet programs. Naturally, this will involve more learning if OpenRefine is new to you. But it’s worth it if you want to draw down company data at scale – this is where using the API gets really powerful!
A note: In this method you are using the OpenCorporates reconciliation API (a.k.a. the legal entity resolution API).
This method involves:
Get Approval from OpenCorporates
If you want to access the API via OpenRefine you will need to get on the “approved” list by contacting [email protected].
Install and Set Up in OpenRefine
Reconcile (Match) your List against OpenCorporates
(Note: This URL may change. If it doesn’t work, check the API documentation on api.opencorporates.com.)
If the companies in your list are all in one jurisdiction, you can constrain the reconciliation to only look for companies in that jurisdiction in the OpenCorporates database.
In this image, two jurisdiction-constrained reconciliations have been added – one for the UK and one for the US state of Montana.
Improve your Reconciliation: Add Multiple Parameters
Adding multiple parameters to your reconciliation will improve the matching results. The reconciliation service can handle these two: country (jurisdiction) and date. You have to have those two columns in your CSV list.
To transform your date column, click on the arrow next to ‘Date Active’ (or whatever the column name was in your data), then ‘Edit cells’, then ‘Common transforms’, then select ‘To date’:
Your uploaded CSV will now look this this (don’t worry about the 00:00 times):
(In the case above, a non-jurisdiction specific service was selected.)
(Note: You need to do this to get the full results of the new reconciliation.)
Pull data from OpenCorporates into OpenRefine
Now that you have your CSV list of companies matched to a company in the OpenCorporates database, you can pull in more data to give you a richer data set for your investigation. To do this:
In the ‘Add column based on column Name’ box that pops up, put this into the Expression text box: ‘http://opencorporates.com’+cell.recon.match.id, and give your new column a name:
Click ‘OK’.
You should now see these columns:
The OpenCorporates URLs are also a unique ID for each company, although they have been obscured in the image above.
In the ‘Add column by fetching URLs based on column Name’ box that pops up, put this into the Expression text box: ‘http://api.opencorporates.com’ + cell.recon.match.id + ‘?api_token=XXXX’, give your new column a name and enter 200 into the ‘Throttle delay’ box so that you don’t overload the server:
(The new column has been named ‘OpenCorporates_JSON’ as JSON is used to pull the data out, but you can call it what you want.)
Click ‘OK’. It may take a little time for all the data to come down, especially if you have a long list.
You should end up with something that looks like this:
Parse the Fields that you Want
The data above is not usable and needs to be parsed into columns.
In the above image you can see: name, company_number, jurisdiction_code.
It takes work to learn to use the OpenCorporates API, but it’s worth it – you can save yourself days of manual labour. If you’re going to be researching companies at scale, learn to use OpenRefine. It may be useful for other investigations — see the list of reconciliation services that allow for data to be brought into OpenRefine (e.g. OpenSanctions).
This guide is built on the video ‘CIJ Summer Workshops: OpenCorporates API for Beginners’ on this page.
The author would like to confirm that they have no financial or legal interest in OpenCorporates, nor prior personal relationship with its founders and employees. This article has been undertaken strictly from the perspective of an open source researcher.
Bellingcat is a non-profit and the ability to carry out our work is dependent on the kind support of individual donors. If you would like to support our work, you can do so here. You can also subscribe to our Patreon channel here. Subscribe to our Newsletter and follow us on Instagram here, Twitter here and Mastodon here.