The Pros and Cons of Open Data
Open data is an increasingly important topic in MERL. Many MERL practitioners advocate for open data because sharing data allows others to analyze and reanalyze it and to draw new and beneficial conclusions. However, making data open is not without risks and could have unintended consequences. This guide outlines some of the pros and cons of open data and things to consider when making your data open. For a summary of the key points, see Table 1.
Table 1: Summary of key points
|Pros|Cons / Risk Factors|
|---|---|
|Accessibility of data: increased community engagement, improved efficiency and reduced cost, encourages progress and innovation|Incorrect use of data and the problem of missing information|
|Increased transparency|Privacy and consent|
|Reduced corruption|Mosaic effect|
|Interpretation of data|Costs and sustainability of open data projects|
Pros / Benefits
Accessibility of data
One of the overarching benefits of open data is accessibility within a thematic area or sector. Data collection and cleaning can be expensive, and many projects or organizations are limited in their capacity. Making data open increases the number of datasets available for others to analyze and draw conclusions. This can result in:
Increased community engagement:
Open data has the potential to build a community around the data, bringing together people who are working on similar issues so that they can exchange ideas and findings and discuss challenges. This can encourage collaboration around data rather than competition. Both users and creators of open data have formed communities around a dataset or topic of interest, and these two groups are not mutually exclusive.
- For example, YouthMappers is a network of 260 student-led chapters, each of which leads mapathons in which participants come together to contribute to a common mapping project on OpenStreetMap (OSM). Mappers trace buildings, roads and other infrastructure from satellite images in areas at risk of natural disasters. The data is made available on OSM, where humanitarian organizations and on-the-ground groups use it to create maps and add important local, contextual information to those maps. Open data platforms like OSM and the communities they create through MissingMaps encourage the public to participate in projects aimed at improving the lives of others around the globe.
Improved efficiencies and reduced costs:
Access to open data increases the rate and ease of discovery, giving researchers more resources to strengthen their work across disciplines. Open data can be used to enhance data that is already at the disposal of organizations and companies of all sizes. Small companies in particular can benefit from open data about an industry into which they would like to expand. Open data can also reduce the chance of duplicated data collection efforts, saving organizations time and money.
- One widely known source of demographic information is Census data, which is freely available in the United States at data.census.gov. Individuals and organizations can tap into this data source to save resources that would otherwise be spent on primary data collection. For example, one could use U.S. Census data instead of designing and implementing a household survey in the United States. Census data can be used as baseline data for programs as part of Monitoring & Evaluation, reducing costs for both program stakeholders and the donor.
Progress and innovation:
Because open data is offered without a monetary barrier, more people have access and can use new methods of analysis, which can further the field of study or contribute to programmatic advancements, encouraging innovation and progress.
- New York City maintains NYC Open Data, a repository of datasets created by various city offices and agencies. Site visitors can access the datasets as well as projects that have used them. Example projects include a map showing zoning and building lot data, and research on school bullying in NYC. These projects show the potential of open data to produce important tools and research.
Increased transparency
Open data can also increase transparency around the topics or issues that the data addresses. Because open data is freely and publicly available, it lowers the barrier for the general public (and specific stakeholders) to understand those issues. Having the data at hand also empowers stakeholders to act on it, advocating for themselves and their community.
- For example, Mejora Tu Escuela is an open data platform created by the Mexican Institute for Competitiveness that shows information about individual schools’ performance. The goal of the platform is to “transformar la educación en México,” or transform education in Mexico. This school performance data, which was previously unavailable to families and parents, allows them to enroll their children in the schools that are best for them and to push schools and the government to improve education when a school is underperforming.
Reduced corruption
Open data is an important element in the fight against corruption. It strengthens public integrity and accountability between policymakers, government, companies, and citizens by providing open evidence of maladministration, governance gaps or blatant corruption. While a significant amount of important and useful government data remains inaccessible, there are examples of governments taking a stance in support of open data initiatives.
- The Brazilian Office of the Comptroller General created the Transparency Portal as a government tool that aimed to increase fiscal transparency of the Brazilian Federal Government. In this initiative, the Brazilian government published information such as federal-agency expenditures, the cost of elected officials to the government’s fiscal budget, and a list of companies banned from doing work for the government.
Interpretation of data
Open data allows additional individuals to analyze the data and to interpret and validate the findings in numerous ways. A McKinsey report on the benefits of open data identified three value levers: decision making, innovation and accountability. It also highlighted that these levers benefit a wide range of stakeholders: a single open-data initiative can empower governments, the private sector and NGOs alike, although each derives different value depending on how it uses and interprets the data.
- An example of different stakeholders using the same open dataset to achieve different results is a Singaporean initiative on residential energy consumption. The initiative organized a “hackathon,” or a community meeting where researchers, sustainability experts, tech start-ups and developers came together to analyze the data and explore ways to create technological interventions to mitigate the impact of increasing energy use. Because participants came from different backgrounds but worked with the same open data, their ability to interpret it from their own contexts contributed to the creation of apps that supported decision-making and increased accountability.
Cons / Risk Factors
Incorrect use of data and missing data
When using open data, proper consideration of data collection methods and metadata is paramount for accuracy. When these are misunderstood, erroneous conclusions may be drawn from data.
- For example, an article published by the Institute for Family Studies in 2019 highlighted the results of a study based on the American Time Use Survey (ATUS), an open data survey. The study claimed that childless single people are happier than married ones. However, this claim was based on a misunderstanding of how a survey conducted by the US Census classifies single people, and it contradicted results from other open data sources.
Privacy and consent
- A 2020 study of consent provision options typically offered by large technology companies (Google, Amazon, Facebook, Apple and Microsoft) found several problems with the current frameworks for obtaining consent and concluded that they violated principles of fairness, accountability and transparency. Further, although a 2015 study found no significant differences in consent provision rates with or without open data policies (suggesting that publishing of open data by itself does not influence consent), consent must be care
Mosaic effect
The mosaic effect is a term used when discussing confidentiality. It is derived from the mosaic theory of intelligence gathering, in which disparate pieces of information become significant when combined with other types of information. Applied to data in the MERL sector, it occurs when multiple datasets are linked to reveal new information. Even if data is appropriately anonymized and personal identifiers are removed, multiple datasets containing similar or complementary information can make it possible to determine a person's identity from attributes combined across the datasets, such as gender, location and educational status. Resources are now available to help MERL practitioners think about how their data may contain linkages or risks that require additional levels of security or anonymization. Figure 1 displays an example of how identity theft can occur when the mosaic effect takes place.
Figure 1: Mosaic Effect Example of Identity Theft
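The linkage described above can also be illustrated with a small sketch. The example below is not taken from any real dataset or tool: the two record lists, the field names (`gender`, `district`, `education`, `health_status`) and the helper functions are all hypothetical and chosen only for illustration. It shows how a release with names removed can still be matched against a second public list when a combination of attributes is unique to one person, and it reports the smallest group size sharing a combination of attributes (a simple k-anonymity style check) as one rough indicator of that risk.

```python
from collections import Counter

# Attributes that are not names or IDs but can still narrow down who someone is.
# (Hypothetical choice, mirroring the gender/location/education example in the text.)
QUASI_IDENTIFIERS = ("gender", "district", "education")

# Hypothetical "anonymized" open dataset: names removed, sensitive field kept.
survey_release = [
    {"gender": "F", "district": "North", "education": "Tertiary", "health_status": "chronic illness"},
    {"gender": "M", "district": "North", "education": "Primary", "health_status": "healthy"},
    {"gender": "M", "district": "North", "education": "Primary", "health_status": "chronic illness"},
    {"gender": "F", "district": "South", "education": "Secondary", "health_status": "healthy"},
]

# Hypothetical second public dataset (for example, a participant list) that includes names.
participant_list = [
    {"name": "Person A", "gender": "F", "district": "North", "education": "Tertiary"},
    {"name": "Person B", "gender": "M", "district": "North", "education": "Primary"},
    {"name": "Person C", "gender": "M", "district": "North", "education": "Primary"},
    {"name": "Person D", "gender": "F", "district": "South", "education": "Secondary"},
]


def quasi_key(record):
    """Return the combination of quasi-identifier values for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def group_sizes(records):
    """Count how many records share each combination of quasi-identifiers."""
    return Counter(quasi_key(r) for r in records)


def link_unique(release, external):
    """Match named records to the release where the combination is unique in both."""
    release_sizes = group_sizes(release)
    external_sizes = group_sizes(external)
    # Only looked up for combinations that occur exactly once in the release.
    release_by_key = {quasi_key(r): r for r in release}
    linked = []
    for person in external:
        k = quasi_key(person)
        if release_sizes[k] == 1 and external_sizes[k] == 1:
            linked.append({
                "name": person["name"],
                "health_status": release_by_key[k]["health_status"],
            })
    return linked


# Smallest group sharing one combination; a value of 1 means someone is unique.
smallest_group = min(group_sizes(survey_release).values())
reidentified = link_unique(survey_release, participant_list)

print("Smallest group size:", smallest_group)
for match in reidentified:
    print(match["name"], "->", match["health_status"])
```

In this made-up data, the two records that share the same gender, district and education level cannot be told apart, while the two people with a unique combination can be linked to their sensitive field. Coarsening attributes (for example, reporting region instead of district) or withholding rare combinations before publication are common ways to increase the smallest group size, although the right approach depends on the dataset and context.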
Costs and sustainability of open data projects
Open data has been described as a public good. While the data is offered for free, the organization implementing the open data initiative usually bears a substantial cost. According to recent literature, start-up costs of open data initiatives range from €20,000 to €100,000 per organization. Start-up costs are followed by adaptation, infrastructure, and maintenance/operational costs. Additionally, from an NGO/non-profit perspective, funding for open data projects depends on being able to pitch the usefulness of open data to funders. There is a risk that funders’ priorities will change, which can harm the long-term sustainability of the open data project. Another risk is that if funders’ and users’ agendas don’t align, the open data project may end up not serving the needs of the people who actually use the data. All of these sustainability factors affect decision-making around open data initiatives and often prove insurmountable.