Methodology

How the Subnational Corruption Index is built.

A short, on-site walkthrough of how the SCI is constructed: the source surveys, harmonization, regional matching, imputation, and validation. The full data descriptor in Nature Scientific Data remains the canonical reference.

The 0-100 scale

All CorruptionRadar indices are reported on a 0-100 scale, where 0 is the highest level of corruption and 100 the lowest. Higher scores therefore mean cleaner governance, the same orientation as Transparency International's Corruption Perceptions Index and the World Bank's Control of Corruption Index, in line with international convention.

Question wording differs across the source instruments. We harmonize by mapping each question to a common dimension framework (19 dimensions in total) and rescaling responses to a common 0-100 metric before aggregation.
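As an illustration of the rescaling step, here is a minimal sketch assuming a simple linear (min-max) mapping with an orientation flip. The function name `rescale_to_100` and its parameters are hypothetical, and the exact recoding rules used for each survey item are those in the data descriptor and Stata do-files, not this snippet.

```python
def rescale_to_100(response, low, high, higher_is_cleaner=True):
    """Linearly map an ordinal response in [low, high] onto 0-100.

    Assumption: a plain min-max rescale. Items where a higher raw code
    means MORE corruption are flipped so that 100 is always the cleanest.
    """
    if high <= low:
        raise ValueError("high must be greater than low")
    if not low <= response <= high:
        raise ValueError("response outside the declared range")
    score = 100.0 * (response - low) / (high - low)
    return score if higher_is_cleaner else 100.0 - score


# Hypothetical 4-point item: 1 = "none are corrupt", 4 = "all are corrupt".
print(rescale_to_100(4, 1, 4, higher_is_cleaner=False))  # 0.0
print(rescale_to_100(1, 1, 4, higher_is_cleaner=False))  # 100.0
```

Flipping at the item level keeps every harmonized dimension on the same orientation before any averaging takes place.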

Grand corruption (SGCI)

Grand corruption is measured from perception questions about the abuse of high-level public office by judges, parliamentarians, the executive, and the public sector at large. Source surveys include the Global Corruption Barometer, Afrobarometer, Latinobarometro, the European Quality of Government Index, the Asian Barometer, and a range of national household surveys.

Petty corruption (SPCI)

Petty corruption is measured from experience questions: did a household member actually pay a bribe to access health, education, police, permits, or utility services in the past 12 months? These questions capture reported behaviour rather than perceptions, and form a complementary lens to the SGCI.

From individual responses to a regional index

  1. Standardization. Every survey question used is mapped to one of 19 harmonized corruption dimensions, recoded onto the 0-100 scale, and assigned to a sub-national region using the Global Data Lab's regional definitions where available.
  2. Aggregation to area-years. Individual-level responses are aggregated to region-by-year cells using survey weights when provided. The Baseline Dataset SCI contains 6,701 such area-years from 804 standardized surveys.
  3. Imputation. Where the SPCI is observed but the SGCI is missing (or vice versa), we exploit the within-area co-movement of the two components to predict the missing value, producing the predicted variants petty2, grand2, and SCI2.
  4. Inter- and extrapolation. The Comprehensive Dataset SCI fills out the panel from 1995 to 2022 using methods documented in the Stata syntax shipped alongside the dataset, producing 4,930 country-years suitable for descriptive comparisons.
  5. Anchoring to existing indices. The SUB-CPI and SUB-CCI datasets center the SCI's sub-national variation on Transparency International's CPI (2012-2022) and the World Bank's Control of Corruption (CCI, 1995-2022), making it straightforward to combine the SCI's sub-national resolution with established cross-country measurement.
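Steps 2 and 5 can be sketched in a few lines. This is a hedged illustration, not the released pipeline: the function names `aggregate_area_years` and `anchor_to_national` are hypothetical, the anchoring rule (shift regional scores so their unweighted mean equals the national index, then clip to 0-100) is an assumption, and the national index is assumed to be already expressed on the 0-100 scale. The actual procedure is the one in the Stata syntax shipped with the dataset.

```python
from collections import defaultdict


def aggregate_area_years(responses):
    """Collapse respondent rows into weighted region-by-year means.

    Each row is (region, year, score, weight). A weight of None stands for
    a survey that provides no weights; that respondent then counts as 1.0.
    """
    totals = defaultdict(lambda: [0.0, 0.0])  # cell -> [sum(w * score), sum(w)]
    for region, year, score, weight in responses:
        w = 1.0 if weight is None else weight
        totals[(region, year)][0] += w * score
        totals[(region, year)][1] += w
    return {cell: s / w for cell, (s, w) in totals.items() if w > 0}


def anchor_to_national(regional_sci, national_index):
    """Assumed anchoring rule: keep each region's deviation from the
    country's mean SCI and re-center it on the national index (0-100)."""
    mean_sci = sum(regional_sci.values()) / len(regional_sci)
    return {
        region: min(100.0, max(0.0, national_index + (value - mean_sci)))
        for region, value in regional_sci.items()
    }


# Made-up rows for illustration only.
rows = [("A", 2015, 80.0, 2.0), ("A", 2015, 20.0, 1.0), ("B", 2015, 50.0, None)]
cells = aggregate_area_years(rows)
print(cells)  # {('A', 2015): 60.0, ('B', 2015): 50.0}
print(anchor_to_national({"A": 60.0, "B": 50.0}, 40.0))  # {'A': 45.0, 'B': 35.0}
```

In this sketch the regional ranking and spread come entirely from the SCI, while the country's level comes from the external index. Clipping at 0 and 100 can pull the regional mean slightly away from the national value for countries near either end of the scale.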

Source surveys

The current release (v1.0) draws on the following source instruments:

  • Afrobarometer (rounds 1 to 8)
  • Arab Barometer
  • Asian Barometer
  • Eurobarometer
  • European Quality of Government Index (QoG)
  • International Social Survey Programme (ISSP)
  • Latinobarometro
  • LAPOP - AmericasBarometer
  • The Asia Foundation surveys
  • Transparency International - Global Corruption Barometer
  • World Bank Country Group Surveys
  • World Bank Enterprise Surveys
  • World Values Survey

The next release (v1.1, in development) extends coverage to the Transparency International regional barometers (EU 2021, ECA 2016, LAC 2017-2019, MENA 2016-2019, Asia 2020), Afrobarometer rounds 3 & 9, additional LAPOP waves, and the 2019 Local Governance Performance Index (LGPI).

Validation

Construct validity is assessed by comparing the SCI against established indices at the country level (the CPI and CCI). Convergent validity is high: the SCI captures the cross-country variation of established indices while adding the within-country sub-national structure they lack. We do not benchmark against external sub-national governance proxies because no such measures are currently available in panel form.
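A country-level convergence check of this kind can be sketched as a correlation between country-mean SCI scores and an established index. The snippet assumes a plain Pearson correlation, which may differ from the statistics reported in the data descriptor. The helper `pearson_r` is hypothetical and the values are made up for illustration; they are not SCI, CPI, or CCI data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Illustrative, made-up country-level scores on the 0-100 scale.
country_sci = [30.0, 45.0, 60.0, 80.0]
country_cpi = [28.0, 50.0, 58.0, 85.0]
print(round(pearson_r(country_sci, country_cpi), 3))
```

A value close to 1 would indicate that the country-level SCI orders countries much as the established index does.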

Limitations & transparency

The SCI inherits the limitations of its source surveys: question wording differs, social-desirability bias affects experience questions in some contexts, and survey coverage is uneven across years and regions. Imputation reduces but does not eliminate these gaps. Every step of the construction is documented in the data descriptor and the accompanying Stata do-files, so users can replicate, audit, or modify the pipeline for their own use.