Mapping the people and citations in UK policy

Introduction

Where in the UK is academic engagement with policy coming from?

Marc Geddes' excellent paper "Committee Hearings of the UK Parliament: Who gives evidence and does this matter?" (OA copy here) looks at who gives evidence to House of Commons committees and where those witnesses are based, using a database of witnesses from the 2013-2014 parliamentary session.

It shows a clear preference for Russell Group universities (accounting for 75% of witnesses) and for universities based close to parliament, with 37% of academic witnesses coming from London and a further 22% from the south of England.

Might this hold true more broadly? In this interactive lab we'll use the citation and people mention data from Overton to check.

Use the tool below to select different sources of UK policy and see where the academics being cited or mentioned by that source are located.

Please note that this kind of data can only ever be a rough indicator - the data recipe below lists some of its limitations and potential biases. You can cite this resource as:

Adie, Euan (2020): Mapping the people and citations in UK policy. figshare. Online resource. https://doi.org/10.6084/m9.figshare.12697850.v1

Choose a policy source



Results from 65,282 matching mentions and 151 universities

Most frequently seen regions (NUTS 1)

NUTS 1 code Region As % of mentions
UKI London 21.3%
UKJ South East (England) 15.0%
UKM Scotland 12.7%
UKE Yorkshire and The Humber 12.0%
UKD North West (England) 7.7%
UKH East of England 7.7%
UKG West Midlands (England) 6.0%
UKK South West (England) 4.9%
UKF East Midlands (England) 4.1%
UKL Wales 3.5%
UKC North East (England) 3.1%
UKN Northern Ireland 1.9%

Most frequently seen regions (NUTS 2)

NUTS 2 code Region As % of mentions
UKI3 Inner London - West 21.3%
UKJ1 Berkshire, Buckinghamshire and Oxfordshire 10.2%
UKM7 Eastern Scotland 7.2%
UKH1 East Anglia 6.3%
UKE3 South Yorkshire 5.0%
UKD3 Greater Manchester 4.4%
UKM8 West Central Scotland 3.7%
UKE2 North Yorkshire 3.7%
UKK1 Gloucestershire, Wiltshire and Bath/Bristol area 3.4%
UKE4 West Yorkshire 2.9%
UKG3 West Midlands 2.8%
UKG1 Herefordshire, Worcestershire and Warwickshire 2.5%
UKF1 Derbyshire and Nottinghamshire 2.3%
UKJ2 Surrey, East and West Sussex 2.1%
UKL2 East Wales 2.1%
UKD7 Merseyside 2.1%
UKJ3 Hampshire and Isle of Wight 2.0%
UKN0 Northern Ireland 1.9%
UKC2 Northumberland and Tyne and Wear 1.7%
UKF2 Leicestershire, Rutland and Northamptonshire 1.6%
UKM5 North Eastern Scotland 1.6%
UKC1 Tees Valley and Durham 1.5%
UKL1 West Wales 1.4%
UKK4 Devon 1.3%
UKD4 Lancashire 1.3%

University groupings

Grouping As % of mentions
Russell Group 70.0%
Other 30.0%

Gender

Gender determination by first name (please see notes) As % of people
Female 32.7%
Male 52.9%
Could not be determined 14.4%

Most frequently seen universities

University Mentions As % of all mentions
University of Oxford 5,887 9.0%
University College London 4,258 6.5%
University of Cambridge 3,275 5.0%
London School of Economics and Political Science 3,056 4.7%
University of Sheffield 2,603 4.0%
University of Manchester 2,458 3.8%
University of York 2,415 3.7%
University of Edinburgh 2,381 3.6%
Imperial College London 2,362 3.6%
University of Birmingham 1,691 2.6%
University of Leeds 1,509 2.3%
University of Glasgow 1,463 2.2%
University of Bristol 1,421 2.2%
University of Warwick 1,356 2.1%
Cardiff University 1,290 2.0%
University of Nottingham 1,263 1.9%
University of Liverpool 1,189 1.8%
University of Southampton 1,089 1.7%
University of Aberdeen 956 1.5%
University of Stirling 940 1.4%
Durham University 898 1.4%
Queen's University Belfast 892 1.4%
Newcastle University 867 1.3%
University of Exeter 803 1.2%
University of East Anglia 803 1.2%

The data - recipe, limitations & biases

Overton is a large database and citation index of policy documents collected from government, IGOs and think tanks worldwide.

We're a small start-up supported entirely by customers and collaborators. We're not externally funded, which gives us the freedom to experiment with a mix of commercial and non-profit data models: please get in touch if you'd like to use this or similar data in your own research, academic or otherwise.

Each step of the recipe is listed below, followed by its limitations and biases.

We queried the Overton database for relevant documents and retrieved approximately 202k matches from UK government sources, ranging from documents on GOV.UK to committee reports from parliament, Hansard, and clinical guidelines from NICE (see the code sketch below)

There's no time period constraint

The Geddes study linked at the top of this page looked specifically at 2013-2014. We're looking at all of the policy documents available to us in Overton, which are typically but not always from 2015 onwards

It's only public documents

Overton only knows about publicly available documents: it can't see internal Civil Service documents, or interactions that haven't been recorded publicly

Some local policy sources aren't tracked

Overton tracks policy documents at the UK and devolved nations level, and from large city councils - London, Greater Manchester, Edinburgh, Leeds, Liverpool etc. - but not from smaller cities, so some local interactions will be missed
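
To make this first step concrete, here is a minimal Python sketch of the kind of query involved. The endpoint URL, parameter names and response fields are placeholders for illustration only, not the real Overton API; check the Overton API documentation for the actual interface.

```python
# Sketch only: the endpoint, parameters and response fields are placeholders,
# not the documented Overton API. Swap in the real ones before running.
import requests

OVERTON_API = "https://api.overton.example/documents"  # hypothetical endpoint
API_KEY = "YOUR-API-KEY"

def fetch_uk_policy_documents():
    """Page through UK government policy documents (assumed schema)."""
    documents, page = [], 1
    while True:
        resp = requests.get(OVERTON_API, params={
            "source_country": "gb",  # hypothetical filter: UK sources only
            "page": page,
            "api_key": API_KEY,
        }, timeout=30)
        resp.raise_for_status()
        results = resp.json().get("results", [])  # assumed response field
        if not results:
            break
        documents.extend(results)
        page += 1
    return documents

# docs = fetch_uk_policy_documents()
# print(len(docs), "policy documents retrieved")
```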

Citations of academic books and papers in each document were fetched using the Overton API (see the code sketch below)

The humanities and some social sciences will be underrepresented

Overton works best for citations with DOIs, and many books and older papers, especially in the humanities and social sciences (HSS), don't have these: they may not be picked up and so won't be counted

We're looking at citations of scholarly work

Overton tracks research from think tanks and NGOs too, but in this dataset we're only looking at work in the scholarly record. That means we're ignoring any engagement academics might have through e.g. publishing a report via a think tank or foundation
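
To illustrate why DOI-less work drops out, the snippet below keeps only cited works that carry a DOI. The record shape ("cited_scholarly_works", "doi") is an assumption made for the example, not Overton's actual response schema.

```python
# Illustrative only: the document record shape below is assumed.
sample_document = {
    "title": "Example select committee report",
    "cited_scholarly_works": [
        {"doi": "10.1234/example.doi", "title": "A journal article"},
        {"doi": None, "title": "A monograph without a DOI"},
    ],
}

def citations_with_dois(document):
    """Keep only cited works with a DOI; DOI-less books and older papers
    drop out here, which is one source of the HSS bias described above."""
    return [w for w in document.get("cited_scholarly_works", []) if w.get("doi")]

print(citations_with_dois(sample_document))  # the monograph is excluded
```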

The affiliations of any UK authors of those books and papers were mapped to GRiD (a standard identifier for research-producing institutions; see the code sketch below)

We don't have good affiliation data for every academic

Affiliation data comes from Microsoft Academic, and while coverage is good it is not complete. Some academics won't be counted, and there's no obvious pattern to this that we can compensate for

Only educational institutions are covered

We're only including data from UK-based institutions in GRiD classed as 'Education'. GRiD treats healthcare facilities separately, so experts from e.g. university-affiliated teaching hospitals are not counted
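
Here's a rough sketch of that GRiD filtering, using the freely downloadable grid.json dump. The field names ("institutes", "types", "addresses", "country_code") reflect our reading of that dump and should be checked against the release you actually use.

```python
# Sketch: filter the GRID dump down to UK institutions classed as 'Education'.
# Field names reflect our reading of grid.json; verify against your copy.
import json

def load_uk_education_institutions(path="grid.json"):
    with open(path) as fh:
        grid = json.load(fh)
    keep = []
    for inst in grid["institutes"]:
        is_education = "Education" in inst.get("types", [])
        in_uk = any(a.get("country_code") == "GB" for a in inst.get("addresses", []))
        if is_education and in_uk:
            keep.append(inst)
    return keep

# unis = load_uk_education_institutions()
# print(len(unis), "UK 'Education' institutions in GRiD")
```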

Mentions of UK academics in government sources were fetched using the Overton API (see the code sketch below)

Academics whose work hasn't been cited but who have been mentioned won't be counted

Overton only knows about academics who have been cited at least once somewhere in the policy literature, so the set is biased towards people whose work has already made it into policy

It's not just witnesses

Overton can't robustly classify mention types: it can't tell whether somebody is mentioned because they gave evidence, were quoted, were commissioned to write a report, or just attended a workshop or discussion
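
As a sketch of how the per-university tallies in the tables above can be produced once mentions have been fetched and mapped, the snippet below counts mention records by institution. The record fields ("person", "grid_id") are assumptions for illustration; the real Overton output will differ.

```python
# Illustrative only: mention records and their fields are assumed.
from collections import Counter

mentions = [
    {"person": "A. Academic",   "grid_id": "grid.1234.0"},
    {"person": "A. Academic",   "grid_id": "grid.1234.0"},
    {"person": "B. Researcher", "grid_id": "grid.5678.9"},
]

mentions_per_institution = Counter(m["grid_id"] for m in mentions)
total = sum(mentions_per_institution.values())
for grid_id, n in mentions_per_institution.most_common():
    print(f"{grid_id}\t{n}\t{n / total:.1%}")
```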

The affiliations of these academics were also mapped to GRiD (see the code sketch below)

GRiD matching is imperfect

Some policy documents use informal names or acronyms in affiliations, e.g. Edinburgh University instead of University of Edinburgh. GRiD includes many name variants but not all, so some affiliation strings may not be mapped correctly, especially for smaller institutions
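
The affiliation-matching problem can be pictured with a small lookup built from GRiD names, aliases and acronyms. The "aliases" and "acronyms" fields match our reading of the GRID dump; the IDs and the normalisation shown here are illustrative only.

```python
# Illustrative only: the GRID IDs below are placeholders.
def build_name_lookup(institutes):
    """Map lower-cased names, aliases and acronyms to GRID IDs."""
    lookup = {}
    for inst in institutes:
        variants = [inst["name"]] + inst.get("aliases", []) + inst.get("acronyms", [])
        for variant in variants:
            lookup[variant.strip().lower()] = inst["id"]
    return lookup

institutes = [
    {"id": "grid.0000.1", "name": "University of Edinburgh",
     "aliases": ["Edinburgh University"], "acronyms": []},
]
lookup = build_name_lookup(institutes)
print(lookup.get("edinburgh university"))  # matched via an alias
print(lookup.get("edin. univ."))           # None: variant not in GRiD, so missed
```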

The first names of mentioned and cited academics were used to guess their gender using the genderize.io web service

Determining gender from first names poses ethical and technical challenges

See Stacy Konkiel's post on the Bibliomagician blog for a fuller overview of why

We're counting people mentioned or cited at least once in policy, rather than gendering each appearance: an academic may be mentioned in three different documents but her name would only be counted once

First names that were just initials (common in Overton's citation metadata) or fewer than three characters long were automatically marked "could not be determined". We used 85% as the cutoff probability score for the guessed gender from genderize.io, following the methodology in Elsevier's Gender in the Global Research Landscape report (see the code sketch below)
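
Here's a minimal sketch of how those rules combine. It assumes genderize.io's documented JSON response fields ("gender", "probability"); names that are just initials or shorter than three characters are never looked up, and guesses below the 0.85 cutoff are discarded.

```python
# Sketch of the gender-guessing rules; response fields assume genderize.io's
# documented JSON ("gender", "probability").
import requests

CUTOFF = 0.85

def guess_gender(first_name):
    name = first_name.strip().rstrip(".")
    if len(name) < 3:  # initials and very short names are never sent
        return "could not be determined"
    resp = requests.get("https://api.genderize.io", params={"name": name}, timeout=30)
    resp.raise_for_status()
    data = resp.json()
    if data.get("gender") and data.get("probability", 0) >= CUTOFF:
        return data["gender"]
    return "could not be determined"

# print(guess_gender("J."))    # could not be determined
# print(guess_gender("Euan"))  # "male", if the probability clears the cutoff
```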

GRiD includes NUTS Level 3 codes (a set of standard identifiers for different geographical regions within Europe) and these were then used to total up counts by region and to draw the map (see the code sketch below)

GRiD has mapped some universities to incorrect (but geographically adjacent) NUTS3 codes

Thanks to @carlbaker on Twitter for pointing this out. An example is the University of East Anglia, which is placed in South Norfolk by GRiD but should actually be in Norwich and East Norfolk. We found out about this after doing the data analysis, so it hasn't yet been fixed.

We had to manually tweak some regions

NUTS has been updated recently and a few regions have been added, removed or changed, mostly in Scotland and Northern Ireland. We've tried to map old NUTS codes to the new ones as best we can

Our map uses the December 2019 boundary data file from the Office for National Statistics
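
Because NUTS codes are hierarchical (UKI32 sits inside UKI3, which sits inside UKI), the regional tables above can be produced by truncating each institution's NUTS 3 code and summing. The codes and counts in this sketch are placeholders, not our data.

```python
# Sketch of the regional roll-up; the NUTS 3 counts here are placeholders.
from collections import Counter

counts_by_nuts3 = Counter({
    "UKI32": 120,  # placeholder: an Inner London institution
    "UKM75": 45,   # placeholder: an Eastern Scotland institution
})

def roll_up(counts, level):
    """Aggregate NUTS 3 counts to NUTS 1 (3-char prefix) or NUTS 2 (4-char)."""
    prefix_length = {1: 3, 2: 4}[level]
    rolled = Counter()
    for code, n in counts.items():
        rolled[code[:prefix_length]] += n
    return rolled

total = sum(counts_by_nuts3.values())
for code, n in roll_up(counts_by_nuts3, level=1).most_common():
    print(f"{code}\t{n / total:.1%}")
```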