Mapping the people and citations in UK policy

Introduction

Where in the UK is academic engagement with policy coming from?

Mark Geddes' excellent paper "Committee Hearings of the UK Parliament: Who gives evidence and does this matter?" (OA copy here) looks at who gives evidence to House of Commons committees, and where those witnesses are based, using a database of witnesses from the 2013-2014 parliamentary session.

It shows a clear preference for Russell Group universities (accounting for 75% of witnesses) and for universities based close to parliament, with 37% of academic witnesses coming from London and a further 22% from the south of England.

Might this hold true more broadly? In this interactive lab we'll use the citation and people mention data from Overton to check.

Use the tool below to select different sources of UK policy and see where the academics being cited or mentioned by that source are located.

Please note that this kind of data can only ever be a rough indicator - the data recipe below lists some of its limitations and potential biases. You can cite this resource as:

Adie, Euan (2020): Mapping the people and citations in UK policy. figshare. Online resource. https://doi.org/10.6084/m9.figshare.12697850.v1

Choose a policy source



Results from 536,082 matching citations & mentions and 225 universities

Most frequently seen regions (NUTS 1)

NUTS 1 Region As %
UKI London 26.0%
UKJ South East (England) 12.7%
UKM Scotland 10.4%
UKE Yorkshire and The Humber 8.8%
UKD North West (England) 7.6%
UKH East of England 7.4%
UKK South West (England) 6.3%
UKG West Midlands (England) 5.5%
UKF East Midlands (England) 5.3%
UKL Wales 4.6%
UKC North East (England) 3.8%
UKN Northern Ireland 1.6%

Most frequently seen regions (NUTS 2)

NUTS 2 Region As %
UKI3 Inner London - West 26.0%
UKJ1 Berkshire, Buckinghamshire and Oxfordshire 7.8%
UKH1 East Anglia 6.2%
UKM7 Eastern Scotland 5.3%
UKK1 Gloucestershire, Wiltshire and Bath/Bristol area 4.5%
UKD3 Greater Manchester 4.1%
UKE3 South Yorkshire 3.1%
UKF1 Derbyshire and Nottinghamshire 3.0%
UKM8 West Central Scotland 3.0%
UKG3 West Midlands 2.9%
UKE4 West Yorkshire 2.8%
UKL2 East Wales 2.6%
UKE2 North Yorkshire 2.4%
UKC2 Northumberland and Tyne and Wear 2.4%
UKD7 Merseyside 2.4%
UKJ3 Hampshire and Isle of Wight 2.3%
UKF2 Leicestershire, Rutland and Northamptonshire 2.2%
UKM5 North Eastern Scotland 2.1%
UKJ2 Surrey, East and West Sussex 1.9%
UKL1 West Wales 1.9%
UKG1 Herefordshire, Worcestershire and Warwickshire 1.8%
UKK4 Devon 1.6%
UKN0 Northern Ireland 1.6%
UKC1 Tees Valley and Durham 1.4%
UKD4 Lancashire 1.0%

University groupings

Grouping As %
Russell Group 60.6%
Other 39.4%

Gender

Gender determination by first name (please see notes) As %
Female 32.4%
Male 41.4%
Could not be determined 26.2%

Most frequently seen universities

University Count As %
University of Oxford 30,504 5.7%
University College London 29,784 5.6%
King's College London 26,388 4.9%
University of Cambridge 21,403 4.0%
Imperial College London 19,827 3.7%
University of Bristol 17,407 3.2%
University of Manchester 16,397 3.1%
University of Nottingham 13,924 2.6%
London School of Hygiene & Tropical Medicine 13,170 2.5%
University of Sheffield 12,445 2.3%
University of Birmingham 12,433 2.3%
University of Edinburgh 12,235 2.3%
University of Leeds 10,904 2.0%
Cardiff University 10,807 2.0%
University of Glasgow 10,634 2.0%
University of York 10,582 2.0%
Newcastle University 10,524 2.0%
University of Southampton 10,250 1.9%
University of Liverpool 9,898 1.8%
University of Aberdeen 9,724 1.8%
Queen Mary University of London 8,324 1.6%
University of Leicester 7,592 1.4%
University of East Anglia 7,319 1.4%
University of Exeter 7,295 1.4%
University of Warwick 6,641 1.2%

The data - recipe, limitations & biases

Overton is a large database and citation index of policy documents collected from government, IGOs and think tanks worldwide.

We're a small start-up supported entirely by customers and collaborators. We're not externally funded, which gives us the freedom to experiment with a mix of commercial and non-profit data models. Please get in touch if you'd like to use this or similar data in your own research, academic or otherwise.

Step Limitations and biases

We queried the Overton database for relevant documents and retrieved approximately 202k matches from UK government sources, ranging from documents on GOV.UK to committee reports from parliament, Hansard, and clinical guidelines from NICE

There's no time period constraint

The Geddes study linked at the top of this page looked specifically at 2013-2014. We're looking at all of the policy documents available to us in Overton, which are typically but not always from 2015 onwards

It's only public documents

Overton only knows about publicly available documents: it can't see internal Civil Service documents or when interactions haven't been recorded publicly

Some local policy sources aren't tracked

Overton tracks policy documents at the UK and devolved nations level, and from large city councils - London, Greater Manchester, Edinburgh, Leeds, Liverpool etc. - but not from smaller cities, so some local interactions will be missed

Citations of academic books and papers in each document were fetched using the Overton API

The humanities and some social sciences will be underrepresented

Overton works best for citations with DOIs. Many books and older papers, especially in the humanities and social sciences (HSS), don't have these, so they may not be picked up and won't be counted

We're looking at citations of scholarly work

Overton tracks research from think tanks and NGOs too, but in this dataset we're only looking at work in the scholarly record. That means we're ignoring any engagement academics might have through e.g. publishing a report via a think tank or foundation

The affiliations of any UK authors of those books and papers were mapped to GRiD (a standard identifier for research producing institutions)

We don't have good affiliation data for every academic

Affiliation data comes from Microsoft Academic and, while coverage is good, it is not complete. Some academics won't be counted, and there's no obvious pattern to this that we can compensate for

Only educational institutions are covered

We're only including data from UK-based institutions in GRiD classed as 'Education'. GRiD treats healthcare facilities separately, so experts from e.g. university-affiliated teaching hospitals are not counted

Mentions of UK academics in government sources were fetched using the Overton API

Academics whose work hasn't been cited but who have been mentioned won't be counted

Overton only knows about academics who have been cited at least once somewhere in the policy literature, so the set is biased towards people whose work has already made it into policy

It's not just witnesses

Overton can't robustly classify mention types: it can't tell whether somebody is mentioned because they gave evidence, were quoted, were commissioned to write a report or just attended a workshop or discussion

The affiliations of these academics were also mapped to GRiD

GRiD matching is imperfect

Some policy documents use informal names or acronyms in affiliations e.g. Edinburgh University instead of University of Edinburgh. GRiD includes many name variants but not all, so some affiliation strings may not be mapped correctly, especially for smaller institutions
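To make the name variant problem concrete, here is a minimal sketch of alias-based matching. It is not how Overton or GRiD actually do the mapping: the ALIASES table, normalise and match_affiliation are hypothetical names made up for illustration, and a real run would use the variants that ship with GRiD.

```python
import re

# Hypothetical alias table (assumption): variants on the left, canonical names on the right.
ALIASES = {
    "university of edinburgh": "University of Edinburgh",
    "edinburgh university": "University of Edinburgh",
    "university college london": "University College London",
    "ucl": "University College London",
}


def normalise(text):
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def match_affiliation(raw):
    """Return the canonical institution name, or None if no known variant matches."""
    return ALIASES.get(normalise(raw))


print(match_affiliation("Edinburgh University"))
print(match_affiliation("Edinburgh Uni"))
```

Any string that isn't in the table, like "Edinburgh Uni" here, comes back as None and would simply go uncounted, which is the kind of gap described above.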

The first names of mentioned and cited academics were used to guess their gender using the genderize.io web service

Determining gender from first names poses ethical and technical challenges

See Stacy Konkiel's post on the Bibliomagician blog for a fuller overview of why

We're counting people mentioned or cited at least once in policy, rather than gendering each appearance: an academic may be mentioned in three different documents but her name would only be counted once

First names that were just initials (common in Overton's citation metadata) or fewer than three characters long were automatically marked "could not be determined". We used 85% as the cutoff probability score for the guessed gender from genderize.io, following the methodology in Elsevier's Gender in the Global Research Landscape report
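The rules above could be sketched roughly as follows. This is an illustration rather than the code used for this page: the function names are hypothetical, the guess argument is assumed to be a dict with 'gender' and 'probability' keys (the shape of a genderize.io response), and treating the 85% cutoff as inclusive is also an assumption.

```python
import re
from collections import Counter

CUTOFF = 0.85  # cutoff probability from the text; inclusive comparison is an assumption
UNDETERMINED = "could not be determined"


def usable_first_name(first_name):
    """False for initials and for first names shorter than three characters."""
    tokens = re.split(r"[\s\-]+", first_name.replace(".", " ").strip())
    return len(tokens[0]) >= 3


def classify(first_name, guess):
    """guess: dict with 'gender' and 'probability' keys (assumed shape), or None."""
    if not usable_first_name(first_name):
        return UNDETERMINED
    if not guess or guess.get("gender") is None:
        return UNDETERMINED
    if guess.get("probability", 0.0) < CUTOFF:
        return UNDETERMINED
    return guess["gender"]


def gender_shares(appearances):
    """appearances: iterable of (person_id, first_name, guess).

    Each person is counted once, however many documents mention them.
    """
    seen = {}
    for person_id, first_name, guess in appearances:
        seen.setdefault(person_id, classify(first_name, guess))
    counts = Counter(seen.values())
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


# Invented example data: person "a" appears twice but is counted once.
mary = {"gender": "female", "probability": 0.99}
rows = [
    ("a", "Mary", mary),
    ("a", "Mary", mary),
    ("b", "J. R.", None),
    ("c", "Sam", {"gender": "male", "probability": 0.60}),
    ("d", "Mark", {"gender": "male", "probability": 0.90}),
]
print(gender_shares(rows))
```

In this made-up example, "J. R." is rejected as initials and "Sam" falls below the cutoff, so half of the four people end up as "could not be determined".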

GRiD includes NUTS Level 3 codes (a set of standard identifiers for different geographical regions within Europe), and these were then used to total up counts by region and to draw the map

GRiD has mapped some universities to incorrect (but geographically adjacent) NUTS 3 codes

Thanks to @carlbaker on Twitter for pointing this out. An example is the University of East Anglia, which is placed in South Norfolk by GRiD but should actually be in Norwich and East Norfolk. We found out about this after doing the data analysis, so it hasn't yet been fixed.

We had to manually tweak some regions

NUTS has been updated recently and a few regions have been added, removed or changed, mostly in Scotland and Northern Ireland. We've tried to map old NUTS codes to the new ones as best we can

Our map uses the December 2019 boundary data file from the Office for National Statistics
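For anyone who wants to reproduce the regional totals, here is a minimal sketch of rolling NUTS 3 counts up to the NUTS 1 and NUTS 2 tables. It relies on NUTS codes being hierarchical: a two-letter country code followed by one character per level, so the first three characters of a NUTS 3 code give its NUTS 1 region and the first four give its NUTS 2 region. The function name and the counts are invented for illustration and are not the figures behind this page.

```python
from collections import Counter


def share_by_level(nuts3_counts, level):
    """Roll NUTS 3 counts up to NUTS level 1 or 2 and return percentage shares.

    A NUTS code is a 2-letter country code plus one character per level,
    so the parent code at a given level is the first 2 + level characters.
    """
    width = 2 + level
    totals = Counter()
    for code, n in nuts3_counts.items():
        totals[code[:width]] += n
    grand_total = sum(totals.values())
    # most_common() orders regions from most to least frequently seen
    return {
        code: round(100.0 * n / grand_total, 1)
        for code, n in totals.most_common()
    }


# Invented counts keyed by NUTS 3 code (illustrative only)
example = {"UKI31": 30, "UKI32": 20, "UKJ14": 25, "UKM75": 25}

print(share_by_level(example, 1))
print(share_by_level(example, 2))
```

Because the roll-up only looks at code prefixes, a university placed in an adjacent but wrong NUTS 3 region will usually still land in the right NUTS 1 and NUTS 2 totals, as long as both regions share the same parent.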