
doi:10.1111/j.1435-5957.2012.00462.
Testing for labour pooling as a source of agglomeration economies: Evidence for labour markets in England and Wales*
Patricia C. Melo1, Daniel J. Graham1
1 Centre for Transport Studies, Dept of Civil and Environmental Engineering, Imperial College London, London
SW7 2AZ, UK (e-mail: patricia.melo@imperial.ac.uk, d.j.graham@imperial.ac.uk)
Received: 2 July 2010 / Accepted: 12 July 2012
Abstract. This paper generates new evidence for England and Wales on the importance of
labour pooling as a source of agglomeration economies. Estimates of worker and firm productivity
are obtained from longitudinal worker and firm micro-data and used to test the hypothesis
that denser labour markets increase the quality of the matching between employees and employers
across labour markets. Our findings provide evidence supportive of a positive relationship
between the quality of the employee-employer matching and the economic size of labour
markets.
JEL classification: D24, J24, J31, R12
Key words: Agglomeration economies, labour pooling, matched worker-firm longitudinal micro-data
1 Introduction
In a recent review of the magnitude and causes of agglomeration economies, Puga (2010)
discusses the contrast between the well-established evidence on the magnitude of agglomeration
economies and the gap in the empirical understanding of their causes.1 Little is known about the
actual channels through which the spatial clustering of economic activities affects productivity.
Without an appropriate understanding of these mechanisms, however, regional and urban
policies may have limited success and uncertain outcomes.
* This work was based on data from the Annual Survey of Hours and Earnings (ASHE) and the Annual Respondents
Database (ARD), produced by the Office for National Statistics (ONS) and supplied by the Secure Data Service at the
UK Data Archive. The data are Crown Copyright and reproduced with the permission of the controller of HMSO and
Queen’s Printer for Scotland. The use of the data in this work does not imply the endorsement of ONS or the Secure Data
Service at the UK Data Archive in relation to the interpretation or analysis of the data. This work uses research datasets
which may not exactly reproduce National Statistics aggregates.
1 For recent qualitative reviews of the empirical literature see Rosenthal and Strange (2004) and Puga (2010); for a
quantitative review of the literature see Melo et al. (2009) and de Groot et al. (2009).
© 2013 the author(s). Papers in Regional Science © 2013 RSAI. Published by Blackwell Publishing, 9600 Garsington Road,
Oxford OX4 2DQ, UK and 350 Main Street, Malden MA 02148, USA.
Papers in Regional Science, Volume •• Number •• •• 2013.

The most accepted drivers of increasing returns to the spatial concentration of economic
activity are known as the Marshallian sources, after Marshall (1920), and include linkages between
intermediate and final goods suppliers, labour market pooling, and knowledge spillovers (e.g.,
Fujita et al. 1999; Fujita and Thisse 2002). Input-output linkages occur because firms benefit
from locating close to their suppliers and customers through savings on transport costs. Labour
market pooling externalities arise because larger and denser labour markets allow for greater job
specialization, a more efficient job search and matching between workers and firms, and, among
other possible factors, a greater incentive for workers to invest in their skills. Finally, knowledge
or human capital spillovers arise because spatially concentrated firms and/or workers can learn
from one another more easily than if they were spread out over space.
An alternative but related classification has been provided by Duranton and Puga (2004),
who group the causes of agglomeration economies into three mechanisms: sharing, matching
and learning. Sharing is beneficial because it reduces the cost of accessing and
using common facilities, be those suppliers, customers, or services. Matching refers to the
advantages that both firms and workers derive from being in proximity to each other. These
benefits take the form of a higher probability of finding a productive job-worker
match, and of finding it more easily and at a lower cost. Finally, learning concerns the exchange of information
and knowledge between individuals (i.e., firms, workers), which promotes the emergence and
diffusion of innovation, both producing a positive impact on productivity.
The main distinction between the two frameworks above is that the Duranton and Puga
classification describes the processes through which the sources of agglomeration
economies materialize, while the Marshallian sources are better understood as types of
agglomeration externality. Taking Marshall’s labour pooling as an example, both sharing and
matching mechanisms can be identified as processes underlying the realization of labour pooling
externalities. Labour market pooling effects can arise through the sharing of the gains
from increased worker specialization in larger markets, through a better matching between
employers and employees in larger and denser labour markets, and/or through a greater ability
of firms to respond to idiosyncratic shocks in larger markets (Puga 2010).
In this paper we test for labour market pooling as a Marshallian source of agglomeration
economies by examining the matching mechanism discussed by Duranton and Puga (2004).
The hypothesis being tested is that denser labour markets improve the quality of employee-
employer matches, measured in terms of the correlation between worker and firm productivity.
The idea is that agglomeration makes it easier to match the right workers to the right jobs,
improving the quality of the matching.2 As we discuss in Section 2, this is only one of the
channels through which it is possible to test for labour market pooling as a source of
agglomeration economies.3
This paper generates new evidence for England and Wales on the importance of labour
market pooling as a driver of agglomeration economies. The empirical framework developed,
based on the matching of longitudinal employee and employer surveys, is particularly suitable
to test for the presence of a more productive matching between workers and firms in more
agglomerated areas. This hypothesis has previously been studied by Andersson et al. (2007) for two US
states and Mion and Naticchioni (2009) for Italy. Andersson et al. (2007) find a positive and
significant relationship between agglomeration (employment density) and spatial assortative
matching between firms and workers, whereas Mion and Naticchioni (2009) obtain a negative
association between agglomeration (employment density) and the quality of the matching
2 The term agglomeration economies generally covers both urbanization economies and localization
economies (e.g., Rosenthal and Strange 2004). In this paper, we use the term agglomeration economies
to mean urbanization-type agglomeration economies.
3 For a survey of the main theoretical models supporting a positive contribution of spatial agglomeration, through
labour market pooling, to worker productivity see Duranton and Puga (2004).

between firms and workers. We hope that our analysis can provide some insight into the
contradictory findings obtained by Andersson et al. (2007) and Mion and Naticchioni (2009).
Our study differs from those conducted by Andersson et al. (2007) and Mion
and Naticchioni (2009) in two main respects. First, it makes use of what we believe to be a more
appropriate measure of firm productivity. Second, it uses statistical estimators that address the key
estimation issues in the empirical literature. Andersson et al. (2007) and Mion and Naticchioni
(2009) use measures of firm and worker productivity based on firm and worker fixed-effects
obtained from the estimation of wage models.4 Measures of firm productivity based on firm
fixed-effects obtained from wage regressions are likely to capture sources of firm heterogeneity
that are not related to agglomeration economies (e.g., effects from monopoly mark-up, degree
of unionization, firm wage policies) and hence may provide an incomplete and possibly erro-
neous measure of firm productivity. Our approach is to first obtain a proxy for worker and firm
productivity from the estimation of wage and production function models respectively, and then
relate the matched worker-firm productivity pairs to agglomeration economies through the
estimation of a matching model. Therefore, we argue that our measure of the employee-
employer matching provides an improvement on the measure previously used by Andersson
et al. (2007) and Mion and Naticchioni (2009). In addition, our analysis corrects for simultaneity
bias between agglomeration economies and the quality of the employee-employer matching in
the matching regression, an issue which was not addressed by Andersson et al. (2007) and Mion
and Naticchioni (2009).
We use longitudinal micro-data from the Annual Survey of Hours and Earnings (ASHE) and
the Annual Respondents Database (ARD) to obtain estimates of both employee and employer
productivity, which are then used to test for a positive relationship between agglomeration
economies and the quality of the matching between workers and firms as evidence of labour
market pooling effects across travel to work areas (TTWAs) in England and Wales.5 Our findings
indicate that the quality of the employee-employer matching increases with the size and the
density of labour markets.
The paper is organized as follows. In Section 2 we provide a review of previous evidence on
the importance of labour market pooling to agglomeration economies. Section 3 describes the
empirical methodology, while Section 4 describes the data used in the empirical analyses.
Section 5 reports and discusses the main results. Finally, Section 6 provides some concluding
remarks.
2 Previous research on labour pooling externalities
From our review of the empirical literature on the sources of agglomeration economies, we
found 11 papers that provide evidence on the role of labour market pooling as a driver of
agglomeration economies. Table 1 summarizes the empirical approach, hypothesis tested, and
main results obtained by previous studies examining the relationship between labour market
pooling and agglomeration economies.
We identified two main empirical approaches used in the empirical literature to test for
labour pooling externalities, shown in the second column of Table 1. The first main approach is
labelled as ‘productivity on sources’ and refers to studies that consider the relationship between
4 Besides firm fixed-effects, Mion and Naticchioni (2009) also use firm size to measure differences in productivity
across firms.
5 TTWAs are the best available approximations of self-contained labour markets; they are defined as regions where
the proportion of people who live (work) in the area is at least 75 per cent of the total number of people who work (live)
in the area. We use TTWAs as defined in 1998 using 1991 Census data. See Appendix Figure A1 for a map of TTWAs
in England and Wales.

Table 1. Previous empirical evidence on labour market pooling externalities

| Author | Empirical approach | Labour market pooling: description | Labour market pooling: effect | Country |
|---|---|---|---|---|
| Kim et al. (2000) | Agglomeration on sources | Share of high, medium, low skilled labour | Positive effect of high skilled and low skilled on concentration | US |
| Rosenthal and Strange (2001) | Agglomeration on sources | Managerial share of workers; share of workers with Bachelor, Master and Doctorate; net productivity | Positive effect at all geography levels | US |
| Rigby and Essletzbichler (2002) | Productivity on sources | Labour mix | Positive effect for aggregate analysis, mixed evidence for individual industries | US |
| Wheeler (2006) | Productivity (wage) on sources | Between-job wage growth | Positive and strong effect of agglomeration on between-job wage growth | US |
| Andersson et al. (2007) | Productivity on sources | Matching regression for firms and worker quality | Assortative matching increases productivity | US |
| Amiti and Cameron (2007) | Productivity (wages) on sources | Similarity of labour requirements (education) across firms within region | Positive effect | Indonesia |
| Ellison et al. (2010) | Agglomeration on sources | Similarity of labour requirements (occupation) across firms within region | Positive effect | US |
| Mion and Naticchioni (2009) | Productivity (wage) on sources | Wage equation to estimate matching between worker skills and firm effects | Positive assortative matching increases labour productivity | Italy |
| Overman and Puga (2010) | Agglomeration on sources | Industry-level measure of idiosyncratic volatility experienced by establishments | Positive effect | UK |
| Gabe and Abel (2012) | Agglomeration on source | Dissimilarity between an occupation’s knowledge profile and the knowledge profile of the average US occupation | Positive effect of specialized (similar) knowledge profile on occupational agglomeration (co-agglomeration) | US |
| Di Addario (2011) | Search-matching model | Labour market population | Market size increases job seekers probability of finding a job | Italy |
productivity (measured by firm output, worker productivity, or worker wage rates) and the
sources of agglomeration economies (e.g., Rigby and Essletzbichler 2002; Andersson et al.
2007; Mion and Naticchioni 2009). The second main approach is labelled as ‘agglomeration on
sources’ and includes the studies that test the relationship between the spatial concentration of
economic activity (measured with some index of geographic concentration) and the sources of
agglomeration economies (e.g., Rosenthal and Strange 2001; Overman and Puga 2010). Overall,
and in spite of the differences in the empirical approach, previous studies have found evidence
supporting a positive relationship between labour pooling and agglomeration economies.
Mainly for reasons of data availability, some studies account for the different Marshallian
sources of agglomeration simultaneously (e.g., Kim et al. 2000; Rigby and Essletzbichler 2002;
Amiti and Cameron 2007; Ellison et al. 2010; Overman and Puga 2010), while others focus on
each source separately (e.g., Andersson et al. 2007; Mion and Naticchioni 2009).
Studies can also be distinguished in terms of the hypothesis used to test for a positive
relationship between labour pooling and agglomeration economies. The hypothesis formulated
usually falls under one of three alternatives: (i) using measures of labour force quality as
captured by educational levels and/or type of occupation (e.g., Kim et al. 2000; Rosenthal and
Strange 2001; Gabe and Abel 2012); (ii) using measures of the similarity of workers’ skills
across economic sectors (e.g., Rigby and Essletzbichler 2002; Amiti and Cameron 2007; Ellison
et al. 2010; Gabe and Abel 2012); and, to a lesser extent, (iii) a more efficient job search and
matching between workers and firms in more agglomerated areas (e.g., Andersson et al. 2007;
Mion and Naticchioni 2009).
One of the hypotheses used to test for labour pooling consists of testing for a positive
relationship between the share of skilled and higher education workers and spatial agglomera-
tion in industries employing specialized and skilled workers. This approach has two main
shortfalls: its limited ability to capture industry-specific specialized skills, and the fact that it
also captures agglomeration effects arising from knowledge spillovers. Rosenthal and Strange
(2001) find a positive effect of labour market pooling, represented by the share of management
workers and the share of workers holding a Bachelor, Master, or Doctorate degree, on the spatial
agglomeration of manufacturing industries. To measure the degree of spatial agglomeration of
manufacturing industries they use the Ellison and Glaeser (1997) index of spatial concentration.
Evidence by Kim et al. (2000) is less supportive of a positive contribution of labour pooling to
spatial agglomeration, which is found to be associated with the shares of both high-skill and
low-skill workers. Gabe and Abel (2012) find that occupations with a specialized knowledge base
tend to agglomerate spatially.
Another hypothesis used to test for labour pooling effects is based on the idea that industries
that use workers with similar skills tend to locate close to each other in order to enjoy the
benefits from easy access to a labour force with relevant skills. Rigby and Essletzbichler (2002)
and Amiti and Cameron (2007) test this hypothesis by constructing an index of similarity
between workers’ observable skills, and find a positive effect from increased similarity of the
occupational distribution of the labour force on productivity. Ellison et al. (2010) examine the
relationship between inter-industry employment similarities and pairwise co-agglomeration of
manufacturing industries in the US. Gabe and Abel (2012) examine the role of specialized
knowledge contents on occupational agglomeration and find that occupations with similar
knowledge profiles and more specialized knowledge contents tend to co-agglomerate. Overman
and Puga (2010) test for labour pooling effects by examining the risk sharing hypothesis based
on the idea that firms benefit from co-locating in the same labour market in the presence of
idiosyncratic shocks.
Labour market pooling effects have also been examined using a job search and matching
framework. The hypothesis being tested is that more agglomerated labour markets increase
both the probability and the quality of job-worker matches. Inspired by job search and matching
models from labour economics, Andersson et al. (2007) focus on the importance of the
quality of the job-worker matching to explain urban productivity differentials in two US states
(California and Florida). They show that differences in the patterns of firm-worker matching
affect productivity differentials and that this favours urban areas, where the concentration of
high quality workers and firms is greater. Following a similar approach, Mion and Naticchioni
(2009) use a matched employer-employee data set to test for positive spatial assortative match-
ing as one of the drivers of agglomeration externalities. The results support a positive matching
between the quality of firms (measured by firm size and firm fixed-effects) and the quality of
workers (captured by worker fixed-effects), but the association between this positive assorta-
tive matching and the density of Italian Provinces was found to be negative. In a recent paper,
Di Addario (2011) uses a search-matching model to test for a positive effect of labour market
size (population) on the probability of job seekers finding a job. The paper did not, however,
examine the relationship between urban agglomeration and the quality of the employee-employer
matches.
We also find one study by Wheeler (2006) that tests for the roles of labour market pooling and
knowledge spillovers as reflected in between-job and within-job wage growth respectively.
The hypothesis tested is that greater between-job wage growth is a manifestation of a match-
ing process, whereas greater within-job wage growth is a manifestation of a learning mecha-
nism more related to the presence of knowledge spillovers. The results show that
agglomeration economies have a stronger impact on between-job wage growth, which is
indicative of a greater importance of Marshall’s labour market pooling effects relative to
knowledge spillover effects. However, because workers are likely to move jobs only when the
wage increase compensates for the risks associated with leaving a secure job, the results may be
positively biased.
In this paper we test for a positive relationship between labour market pooling and agglom-
eration economies by focusing on the matching mechanism described by Duranton and Puga
(2004). The models discussed therein show that the improved matching between workers and
jobs results from increased chances of job seekers finding suitable matches and the increased
chances of those employee-employer matches being of higher quality. Testing the hypothesis of
a higher quality of the job-worker matches is difficult because it requires having matched
employee-employer data and the development of empirical measures of that match. The empirical
analyses in this paper are valuable for at least two reasons. First, they explore the
possibilities of matching worker and firm longitudinal datasets in the UK to test for a positive
relationship between agglomeration economies and labour pooling. Second, they improve the
measures of job-worker quality used by Andersson et al. (2007) and Mion and Naticchioni
(2009).
3 Methodology
3.1 Empirical models
The empirical approach adopted in this paper consists of two stages. The first stage involves
the estimation of wage and production functions to obtain estimates of worker and firm
productivity, based on worker and firm fixed-effects respectively. The second stage of the
empirical approach tests for a positive relationship between urban agglomeration economies
and the quality of the worker-firm matching. The dependent variable in the second stage
regression analysis consists of the coefficients of correlation between the worker and
firm fixed-effects obtained from the wage and production functions estimated in the first
stage.

3.1.1 Wage and production functions
The wage and production functions estimated are described below in Equation 1 and Equation
2 respectively.
$$\ln W_{it} = \alpha_0 + \sum_k \beta_k \ln X_{it,k} + \delta_t + \lambda_o + \sigma_s + \eta_i + \varepsilon_{it} \tag{1}$$
where $i$ identifies the worker, $s$ refers to the industry, $o$ denotes the occupational group, and $t$
specifies the year. The dependent variable is the logarithm of real net hourly earnings, that is,
gross hourly earnings net of any overtime pay and adjusted for the price level using the
average earnings index (AEI) provided by the Office for National Statistics (ONS). The term $X_{it,k}$
denotes worker characteristics commonly considered in labour economics, including the worker’s
age, age squared and gender. In addition to these worker level covariates, we also include
a measure of local area educational attainment and average house prices (more details about the
explanatory variables are given in Section 4). Workers’ skills are captured by the time invariant
worker fixed-effects ($\eta_i$). The terms $\delta_t$, $\lambda_o$, and $\sigma_s$ consist of sets of dummy variables that
account for time-specific factors, occupation-specific factors,6 and industry-specific factors7
respectively. $\varepsilon_{it}$ is the residual error term, assumed to be normally distributed while allowing for
heteroscedasticity and clustering on workers.
To represent firms’ productivity we obtain a measure of firm total factor productivity (TFP)
from the estimation of production functions.
$$\ln Y_{ft} = \alpha_0 + \sum_k \beta_k \ln X_{ft,k} + \delta_t + \sigma_s + \eta_f + \varepsilon_{ft} \tag{2}$$
where $f$ identifies the firm, $s$ refers to the industry, and $t$ specifies the year. Firm output is
measured with gross output ($Y_{ft}$) and there are three input factors ($X_{ft,k}$): labour, gross capital
expenditure-disposals, and materials. The terms $\delta_t$ and $\sigma_s$ consist of sets of dummy variables
that account for time-specific factors and industry-specific factors respectively; $\eta_f$ denotes firm
fixed-effects, and $\varepsilon_{ft}$ is the residual error term, assumed to be normally distributed while allowing
for heteroscedasticity and clustering on firms.
The various sets of dummy variables included in the estimation of the wage and production
functions serve the purpose of controlling for sources of unobserved heterogeneity that can lead
to omitted variable bias and inconsistency of the model parameter estimates. In subsection 3.2 we
provide a discussion of the main issues concerning the estimation of Equation 1 and Equation 2.
3.1.2 Matching regression
To test for a positive relationship between the worker-firm matching and agglomeration economies,
we examine the empirical relationship between labour market (TTWA) agglomeration
economies and the correlation between workers’ and firms’ productivity, as measured by the
6 Occupational groups are defined according to the Standard Occupational Classification (SOC). The SOC provides
a detailed characterization of the nature of the job performed by a given worker; it accounts for competences acquired
through non-school qualification, training and work experience. We include controls for one-digit SOC occupation
groups: managers and senior officials; professional occupations; associate professional and technical occupations;
administrative and secretarial occupations; skilled trades occupations; personal service occupations; sales and customer
service occupations; process, plant and machine operatives; and elementary occupations.
7 We use two-digit Standard Industrial Classification (SIC) industry groups, based on the UK Standard Industrial
Classification of Economic Activities 2003. URL: http://www.statistics.gov.uk/methods_quality/sic/downloads/UK_SIC_Vol1(2003).pdf

worker and firm fixed-effects obtained from Equations 1 and 2 respectively. To match workers
to their respective employers, we use the variable for the enterprise reference that is present in
both the ASHE and the ARD surveys (more details in Section 4). The matching regression is
described as follows:
$$\operatorname{corr}(\eta_i, \eta_f)_{rt} = \alpha_0 + \sum_k \beta_k \ln X_{rt,k} + \delta_t + \varepsilon_{rt} \tag{3}$$
where $\operatorname{corr}(\eta_i, \eta_f)_{rt}$ is the average coefficient of correlation between worker and firm
fixed-effects for TTWA $r$ at time $t$, and $X_{rt,k}$ denotes a set of labour market characteristics (described
in Section 4) including measures of urban agglomeration economies (employment density), area
size, market potential and industrial specialization.
The estimation of the matching regression shown in Equation 3 allows us to test for a
positive effect of TTWA agglomeration economies (employment density) on the employee-
employer matching. Obtaining a positive and statistically significant coefficient for labour
market employment density provides evidence in favour of a positive role of labour market
pooling externalities as a source of agglomeration economies, manifested here through a more
efficient matching between employees and employers. The main issues concerning the estima-
tion of Equation 3 are discussed in subsection 3.2 below.
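As a rough illustration of how the dependent variable in Equation 3 could be assembled, the sketch below groups matched worker and firm fixed-effects by TTWA and year and computes one Pearson correlation per group. The record layout `(ttwa, year, eta_worker, eta_firm)` and the function names are assumptions made for this example, and the toy numbers are invented; the paper refers to an average coefficient of correlation, whose exact construction is not reproduced here.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def matching_quality(matches):
    """Correlate worker and firm fixed-effects within each (ttwa, year) cell.

    `matches` is an iterable of (ttwa, year, eta_worker, eta_firm) tuples,
    a hypothetical layout for the matched worker-firm records.
    """
    groups = defaultdict(lambda: ([], []))
    for ttwa, year, eta_i, eta_f in matches:
        groups[(ttwa, year)][0].append(eta_i)
        groups[(ttwa, year)][1].append(eta_f)
    return {key: pearson(w, f) for key, (w, f) in groups.items()}


# Toy data: area "A" has perfectly assortative matches, area "B" the reverse.
toy = [
    ("A", 2005, 0.1, 1.0), ("A", 2005, 0.2, 2.0), ("A", 2005, 0.3, 3.0),
    ("B", 2005, 0.1, 3.0), ("B", 2005, 0.2, 2.0), ("B", 2005, 0.3, 1.0),
]
quality = matching_quality(toy)
```

Each value in `quality` would then serve as one observation of the left-hand side of the matching regression, to be related to the labour market characteristics of the corresponding TTWA and year.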
3.2 Estimation issues
3.2.1 Unobserved heterogeneity
Worker unobserved characteristics, such as worker’s skills and education, affect worker pro-
ductivity and earnings. Similarly, firm unobserved characteristics (e.g., differences in organiza-
tional structures, managerial competences, training schemes, etc.) can also affect firm
productivity. To account for these sources of heterogeneity we use the within groups, or
fixed-effects (FE), estimator to estimate the wage and production functions.
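The mechanics of the within-groups estimator can be sketched on a toy panel. The example below is a simplified, single-regressor version, not the specification estimated in the paper: it omits the time, occupation and industry dummies, and the function name, record layout and data are assumptions for illustration only. It demeans the data within each unit, estimates the slope on the deviations, and then recovers each unit's fixed effect as the unit mean of the outcome minus the slope times the unit mean of the regressor.

```python
from collections import defaultdict


def within_estimator(records):
    """One-regressor within (fixed-effects) estimator.

    `records` is a list of (unit_id, x, y) observations. Returns the slope
    estimated from unit-demeaned data and each unit's recovered fixed effect,
    centred so that the effects average zero across units (the common
    intercept absorbs the overall level).
    """
    by_unit = defaultdict(list)
    for unit, x, y in records:
        by_unit[unit].append((x, y))

    means = {
        u: (sum(x for x, _ in obs) / len(obs), sum(y for _, y in obs) / len(obs))
        for u, obs in by_unit.items()
    }

    # Demean x and y within each unit, then run pooled OLS on the deviations.
    sxy = sxx = 0.0
    for u, obs in by_unit.items():
        mx, my = means[u]
        for x, y in obs:
            sxy += (x - mx) * (y - my)
            sxx += (x - mx) ** 2
    if sxx == 0:
        raise ValueError("x has no within-unit variation")
    beta = sxy / sxx

    raw = {u: my - beta * mx for u, (mx, my) in means.items()}
    centre = sum(raw.values()) / len(raw)
    effects = {u: a - centre for u, a in raw.items()}
    return beta, effects


# Toy panel: y = 0.5 * x + eta, with eta = 1.0 for worker "w1" and 3.0 for "w2".
panel = [
    ("w1", 1.0, 1.5), ("w1", 2.0, 2.0), ("w1", 3.0, 2.5),
    ("w2", 2.0, 4.0), ("w2", 4.0, 5.0), ("w2", 6.0, 6.0),
]
beta, effects = within_estimator(panel)
```

On this toy panel the slope is recovered exactly and the two recovered effects differ by the true gap of 2.0. Only differences between fixed effects are identified, which is why the sketch centres them.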
3.2.2 Simultaneity bias of input factors
This issue can arise from unobserved firm heterogeneity that is correlated with input factors in the
production function and/or from the fact that firms’ decisions on input factor endowments depend on
expected output levels. The main approaches used to correct for this source of bias include the
dynamic generalized method of moments (GMM) estimators, namely the difference GMM estimator proposed
by Arellano and Bond (1991) and the system GMM estimator proposed by Arellano and Bover (1995) and
Blundell and Bond (1998), and the control function approach developed by Olley and Pakes
(1996) and Levinsohn and Petrin (2003).8
3.2.3 Spatial sorting of workers and firms
This issue arises when workers and firms sort spatially based on their (unobserved) character-
istics. Nocke (2006) and Baldwin and Okubo (2006) show that more productive firms self-select
8 The Olley and Pakes (1996) estimator uses investment as a proxy for unobserved time variant productivity shocks,
while the Levinsohn and Petrin (2003) approach uses intermediate inputs (e.g., electricity, materials) as a proxy variable
because they are likely to have fewer zero-value observations at the firm level than investment. In addition, Levinsohn
and Petrin also argue that intermediate inputs are likely to respond more smoothly to productivity shocks than
investment.

into larger areas. Wheeler (2001, 2006), Glaeser and Maré (2001), Yankow (2006), and Combes
et al. (2008, 2010) show that the spatial sorting of workers explains a great part of the differ-
ences in workers’ productivity. Not accounting for such characteristics could therefore result in
inconsistent and biased estimates of productivity. Using worker and firm fixed-effects can
overcome this problem.
3.2.4 Endogeneity of agglomeration economies
This issue results from reverse causality between productivity and agglomeration economies and
is well documented in the empirical literature (e.g., Graham et al. 2010). To address reverse
causality, we use an instrumental variables (IV) estimator, with long-lagged values (based on
nineteenth century population censuses) of the endogenous variable(s) as instruments. This type of
instrument has been widely used in the empirical literature (e.g., Ciccone and Hall 1996;
Combes et al. 2008, 2010). The case for instrument exogeneity rests on the idea that the urban
system in the nineteenth century can explain to some extent the distribution of present urban
densities, but does not explain the distribution of current productivity levels. Our instruments
consist of long lags of population data, as gathered in the 1831 and the 1851 censuses.
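The logic of instrumenting current density with historical population can be illustrated with the simplest just-identified case: one endogenous regressor and one instrument, where the IV slope is the ratio of two covariances. The sketch below is an illustration under that simplifying assumption, not the estimator specification used in the paper, and the variable roles and toy numbers are invented. Here `z` plays the role of the long-lagged instrument, `x` the endogenous agglomeration measure, and `u` an unobserved shock that affects both `x` and `y`.

```python
def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance (the 1/(n-1) factor cancels in the ratios below)."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def ols_slope(x, y):
    """OLS slope of y on x: cov(x, y) / var(x)."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Just-identified IV slope: cov(z, y) / cov(z, x)."""
    czx = cov(z, x)
    if czx == 0:
        raise ValueError("instrument is uncorrelated with the regressor")
    return cov(z, y) / czx


# Toy data with a true slope of 2. The shock u is constructed to be
# uncorrelated with z but enters both x and y, which biases OLS upwards.
z = [1.0, 2.0, 3.0, 4.0]
u = [1.0, -1.0, -1.0, 1.0]
x = [zi + ui for zi, ui in zip(z, u)]
y = [2.0 * xi + 3.0 * ui for xi, ui in zip(x, u)]

beta_ols = ols_slope(x, y)
beta_iv = iv_slope(z, x, y)
```

In this constructed example OLS overstates the slope because `x` and the error share the shock `u`, while the IV ratio recovers the true value because `z` is unrelated to `u`. The same exogeneity requirement is what the argument about the nineteenth century urban system is meant to support.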
Unfortunately, population data from the nineteenth century censuses are not available at the level
of TTWAs. The best geographical aggregation we can use is based on registration counties
(RC).9 To obtain long-lagged values of the agglomeration measures (described in Section 4) for
TTWAs we apportioned the population data in each RC to each TTWA, using as weights the
share of the total land area of a given TTWA inside a given RC.
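One possible reading of this apportionment rule is sketched below. It assumes that the weight for a TTWA and RC pair is the land area of their intersection divided by the total land area of the TTWA, so that the TTWA figure is an area-weighted combination of the RC populations it overlaps. The function name, input layout and numbers are hypothetical, and a different weighting convention (for example, shares of each RC's area) would give different values.

```python
def apportion_population(rc_population, overlap_area):
    """Areal-weighting sketch for moving RC population onto TTWAs.

    rc_population: {rc: population} from a historical census.
    overlap_area:  {(ttwa, rc): land area of the TTWA lying inside the RC}.
    Returns {ttwa: weighted population}, where each weight is the share of
    the TTWA's total land area that falls inside the RC (an assumed reading
    of the rule described in the text).
    """
    ttwa_area = {}
    for (ttwa, _rc), area in overlap_area.items():
        ttwa_area[ttwa] = ttwa_area.get(ttwa, 0.0) + area

    result = {}
    for (ttwa, rc), area in overlap_area.items():
        weight = area / ttwa_area[ttwa]
        result[ttwa] = result.get(ttwa, 0.0) + weight * rc_population[rc]
    return result


# Toy example: TTWA "T1" straddles two registration counties, "T2" lies in one.
rc_population = {"RC1": 1000.0, "RC2": 400.0}
overlap_area = {("T1", "RC1"): 30.0, ("T1", "RC2"): 10.0, ("T2", "RC2"): 50.0}
historic = apportion_population(rc_population, overlap_area)
```

With these toy inputs, three quarters of T1 lies in RC1 and one quarter in RC2, so T1 receives 0.75 × 1000 + 0.25 × 400, while T2 inherits the RC2 figure unchanged.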
4 Data sources and variables
We use data from two main sources. The Annual Survey of Hours and Earnings (ASHE) is used
to estimate worker level wage equations, while the Annual Respondents Database (ARD) is used
to estimate firm level production functions. To ensure that the final ASHE and ARD datasets are
free from non-response, misreporting, and other issues we performed a series of cleaning and
treatment procedures, which are described in the Appendix. Table 2 and Table 3 provide some
Table 2. Summary statistics of explanatory variables – wage function

| Variable | Label | Mean | SD | Min | Max |
|---|---|---|---|---|---|
| Real net hourly wage (£) | W | 8.18 | 4.92 | N/A | N/A |
| Age | Age | 40.03 | 11.81 | 16 | 65 |
| Age squared | Age2 | 1,741.68 | 961.83 | 256 | 4,225 |
| Full-timer (a) | Ft | 1.24 | 0.43 | 1 | 2 |
| Female (b) | Female | 0.48 | 0.50 | 0 | 1 |
| Average house price (c) | AHP | 175,314 | 74,974 | 38,371 | 874,844 |
| Percentage of people aged 16–64 with NVQ level 4 or higher (c) | NVQ4+ | 25.92 | 8.19 | 5.6 | 82 |

Notes: N/A Subject to data disclosure. (a) 1 = full-timer, 2 = part-timer. (b) 0 = male, 1 = female. (c) The values shown in the
table are based on the 376 UA/LAD before being matched to the worker dataset.
9 The registration counties consist of statistical units that were used for the purpose of registering births, deaths, and
marriages. We use geographical boundaries as at 1851, which correspond to 88 divisions. The historic boundary shape
files for the English and Welsh registration counties and Scottish administrative counties are provided by the Great
Britain Historic GIS Project (Portsmouth University) and are available online from the Edina UKborders website:
http://edina.ac.uk/ukborders/.

Table 3. Summary statistics of explanatory variables – production function

| Variable | Label | Mean | SD | Min | Max |
|---|---|---|---|---|---|
| Gross output (£) | Go | 28,375 | 324,570 | N/A | N/A |
| Labour (workers) | Labour | 210.34 | 2,194 | <10 | N/A |
| Capital (gross capital expenditure-disposals) (£) | Capital | 1,218 | 25,878 | N/A | N/A |
| Materials purchased (£) | Materials | 3,468 | 41,343 | N/A | N/A |
| Energy purchased (£) | Energy | 400.33 | 9,446 | N/A | N/A |

Note: N/A Subject to data disclosure.
Table 4. Summary statistics of explanatory variables – matching function
Employment density Dens 230.85 212.98 5.56 1,409.37

Market potential MP 186.13 40.58 68.73 267.98
Hirschmann-Herfindhal index HHI 0.07 0.01 0.05 0.19
Area (km2) Area 731.31 414.60 93.91 2,964.98
basic descriptive statistics of the variables used in the estimation of the wage and production
functions respectively. Table 4 provides basic descriptive statistics of the variables used in the
estimation of the matching function.
The ASHE is an employee level longitudinal survey conducted by the ONS and contains a
rich set of information about workers, including: hourly earnings, hours worked, gender, age,
occupation, industry, whether the worker is part-time or full-time, whether earnings are affected
by absence, home and workplace location, etc. Unfortunately, there is no measure of worker’s
education. To account for differences in education levels we collected data from the ONS
Annual Population Survey (APS). The APS provides labour market data for (workplace) unitary
authorities (UA)/local authority districts (LAD) on national vocational qualification (NVQ)
levels. NVQ levels are work-based awards that reflect worker qualification beyond educational
level. There is a correspondence between each NVQ level and education degrees. We use the
percentage of people aged between 16–64 with NVQ level 4 or higher, which includes higher
education degrees.10
Another important factor that impacts on wage levels is local area living cost, which is also
not available from the ASHE. To account for differences in local living cost we obtained data
from the ONS housing statistics (based on Land Registry data) for mean house prices. However,
regional data were available for LAD in England and Wales only. As a result, we exclude
Scotland from our analyses.
The final dataset used in the estimation of the wage function – Equation 1 – consists of an
unbalanced panel of 190,151 individuals (corresponding to 539,533 observations) over the five
year period from 2002 to 2006. On average, we observe each worker 2.8 times.
The ARD is a longitudinal micro-level dataset with information on businesses in the UK.
The main variables covered in the ARD include turnover, output, employment, capital expen-
ditures, intermediate consumption, industry, owner nationality, acquisitions and disposals of
capital goods, etc. (for more details see Barnes 2002, and Barnes and Martin 2002). The final
dataset used to estimate the production function in Equation 2 consists of an unbalanced panel
of 143,913 reporting units, corresponding to 200,959 observations over the five year period from
2002 to 2006. On average, we observe each reporting unit 1.41 times. We use data at the level
of the reporting units for gross output, labour (employment), net capital expenditure (gross
capital expenditure minus disposals of land, buildings, vehicles, machines, etc.), and materials
purchased.
10 For more details on the composition of the various NVQ levels consult the DirectGov website:
www.direct.gov.uk/en/educationAndlearning/QualificationsExplained/DG_10039017.
5 Results and discussion
In this section we first consider the estimation of the wage and production functions respectively,
and then examine the matching regression estimated to test for a positive relationship between
urban agglomeration economies and labour pooling externalities. The results obtained from the
estimation of the wage and production functions are shown in Table 5 and Table 6 respectively.
Table 7 reports the results obtained from the matching regression.
For the reasons discussed in Section 3, we select the fixed-effects (FE) estimator as our
preferred model because it provides consistent model parameter estimates in the presence of
non-zero correlation between the error term and worker unobserved heterogeneity. The results
indicate that wage rates increase with worker’s age but the increase becomes smaller as workers
become older, suggesting a concave relationship between earnings and age. Full-time workers
have hourly wage rates on average about 3.4 per cent higher than part-timers, and female
workers receive on average nearly 12 per cent less than their male counterparts (POLS; pooled
ordinary least squares).
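As a rough illustration of the concave age profile, the age at which a fitted quadratic b1*Age + b2*Age2 peaks is -b1 / (2*b2). The sketch below plugs in the rounded coefficients printed in Table 5; the function name is ours, and because the Age2 coefficients are reported to four decimals only, the implied peak ages are indicative rather than precise.

```python
def peak_age(b_age: float, b_age2: float) -> float:
    """Age at which b_age*age + b_age2*age**2 is maximised (needs b_age2 < 0)."""
    if b_age2 >= 0:
        raise ValueError("the age profile is not concave")
    return -b_age / (2.0 * b_age2)


# Rounded coefficients as printed in Table 5 (Age, Age2).
table5 = {"POLS": (0.0375, -0.0004), "FE": (0.0441, -0.0006)}

peaks = {name: peak_age(b1, b2) for name, (b1, b2) in table5.items()}
for name, age in peaks.items():
    print(f"{name}: fitted log-wage profile peaks at about age {age:.1f}")
```

With the printed coefficients the peak falls inside the 16 to 65 age range of the sample for both estimators, which is what a concave profile over the working life would suggest.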
Table 5. Results from the wage function
Variable                                          POLS          FE
Age                                               0.0375***     0.0441***
                                                  (0.0004)      (0.0028)
Age2                                              –0.0004***    –0.0006***
                                                  (0.0000)      (0.0000)
Full-timer                                        –0.1009***    0.0344***
                                                  (0.0018)      (0.0023)
Female                                            –0.1163***
                                                  (0.0019)
Log of average house price                        0.1263***     0.0395***
                                                  (0.0023)      (0.0039)
Percentage of people with NVQ level 4 or higher   0.0042***     0.0009***
                                                  (0.0001)      (0.0001)
Observations                                      539,533       539,533
R-squared (Overall)                               0.60          0.41
R-squared (Within)                                              0.08
R-squared (Between)                                             0.43
Notes: Robust standard errors corrected for clustering at the worker level (in parentheses). Significance levels: ***
p-value <0.01. Control variables include: two-digit industry fixed-effects; one-digit occupation fixed-effects; year
fixed-effects.

Table 6. Results from the production function
Variable              POLS         FE           LP1          LP2
Log of labour         0.8041***    0.4156***    0.818***     0.7045***
                      (0.0034)     (0.0138)     (0.0036)     (0.0034)
Log of capital        0.0783***    0.0132***    0.0483***    0.0340***
                      (0.0016)     (0.0011)     (0.0022)     (0.0022)
Log of materials      0.2752***    0.0493***                 0.0697***
                      (0.0023)     (0.0025)                  (0.0020)
Returns to scale      1.16         0.48         0.87         0.81
Observations          200,959      200,959      200,959      200,959
R-squared (Overall)   0.88         0.84         0.90         0.91
R-squared (Within)                 0.27
R-squared (Between)                0.81
Notes: Robust standard errors corrected for clustering at the firm level (in parentheses). Significance levels: *** p-value
<0.01. Control variables include: two-digit industry fixed-effects; year fixed-effects.

Table 7. Results from the matching function
Variable                            POLS        RE          POLS-IV     RE-IV
Log of employment density (Dens)    0.0628**    0.0713**    0.0877*     0.0844
                                    (0.0262)    (0.0347)    (0.0526)    (0.0641)
Log of market potential (MP)        0.0732      0.0959      0.0197      0.0706
                                    (0.1164)    (0.1420)    (0.1109)    (0.1405)
Hirschman-Herfindahl index (HHI)    –0.1069     0.0194      –0.0998     –0.0181
                                    (0.1940)    (0.2285)    (0.2185)    (0.1982)
Log of area (km2)                   0.0875**    0.0740*     0.0893**    0.0828*
                                    (0.0356)    (0.0397)    (0.0380)    (0.0462)
Observations                        542         542         542         542
R-squared (overall)                 0.041       0.04        0.04        0.04
R-squared (within)                              0.01                    0.01
R-squared (between)                             0.06                    0.06
Breusch and Pagan Lagrangian multiplier test for random effects (H0: Var(ui) = 0): 2.57 (0.109)
First stage partial R-squared: 0.25 (POLS-IV); 0.37 (RE-IV)
First stage partial F-statistic: 25.61 (POLS-IV); 34.9 (RE-IV)
Under-identification test – Kleibergen-Paap rank LM statistic (p-value): 23.44 (0.000) (POLS-IV); 154.55 (0.000) (RE-IV)
Weak identification test – Kleibergen-Paap rank Wald F statistic (a): 25.61 (POLS-IV); 87.77 (RE-IV)
Hansen J statistic (p-value) (b): 0.36 (0.546)
Notes: Robust standard errors corrected for clustering at the TTWA level (in parentheses). Significance levels: * p <
0.10, ** p < 0.05. Control variables include year fixed-effects.
(a) Using Stock-Yogo (2005) weak ID test critical values. The test rejects the null of weak ID if the test statistic exceeds
the critical value. 5% maximal IV bias relative to OLS: 13.91. 10% maximal IV size: 19.93.
(b) Instruments: log of employment density in 1831 and 1851.

Looking at the effect of local area human capital, the models indicate that an increase of 1
percentage point in the percentage of population aged 16–64 holding a level 4 NVQ or higher
is associated with an increase of 0.09 per cent in hourly wage rates.11 We also observe that local
area living costs, measured by average house prices, impact wages positively. The results
indicate that a 10 per cent increase in average house prices is associated with an increase of
nearly 0.4 per cent in hourly wages. To look for evidence of spatial sorting of workers based on
unobservable skills (e.g., Combes et al. 2008) we compared the distribution of worker fixed-effects
by labour market size. We discuss the results on worker spatial sorting in the Appendix.
11 This effect is noticeably lower for the FE estimator because the worker fixed-effects capture most of the variation
in the measure of human capital, which changes very little over time.
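The two magnitudes quoted above for the FE wage model can be reproduced from the Table 5 coefficients. The sketch below assumes the dependent variable is the log of hourly wages, so that the house price coefficient (entered in logs) is an elasticity and the NVQ coefficient (entered in percentage points) is a semi-elasticity; the helper names are ours.

```python
import math


def pct_effect_log_log(beta: float, pct_change_x: float) -> float:
    """Per cent change in y for a given per cent change in x (both in logs)."""
    return ((1.0 + pct_change_x / 100.0) ** beta - 1.0) * 100.0


def pct_effect_log_level(beta: float, delta_x: float) -> float:
    """Per cent change in y for a change of delta_x units in x (y in logs)."""
    return (math.exp(beta * delta_x) - 1.0) * 100.0


# FE coefficients from Table 5.
house_price_effect = pct_effect_log_log(0.0395, 10.0)  # 10% rise in house prices
nvq_effect = pct_effect_log_level(0.0009, 1.0)         # +1 percentage point NVQ4+

print(f"10% higher house prices: {house_price_effect:.2f}% higher hourly wages")
print(f"+1 pp NVQ level 4 or higher: {nvq_effect:.2f}% higher hourly wages")
```

Both values round to the figures reported in the text (nearly 0.4 per cent and 0.09 per cent).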
We now turn to the results obtained from the estimation of the production function. As
discussed in Section 3, there are two main estimation issues – unobserved heterogeneity and
simultaneity bias of input factors. The fixed-effects estimator addresses the first issue, while the
Levinsohn and Petrin (2003) control function approach, combined with fixed-effects, can
address both estimation issues. Four models were estimated and their results are reported in
Table 6: pooled OLS (POLS), fixed-effects (FE), and the Levinsohn and Petrin (2003) control
function approach using either materials (LP1) or electricity purchases (LP2) as the proxy
variable for unobserved time-variant productivity shocks.
The labour elasticity ranges between 0.42 (FE) and 0.82 (LP1), while the capital elasticity
ranges between 0.01 (FE) and 0.08 (POLS), and the materials elasticity ranges between 0.05
(FE) and 0.28 (POLS). The POLS estimates tend to be upward biased, while the FE estimates
tend to be downward biased. Moreover, neither of the two estimators can address the issue of
simultaneity bias of the input factors. Models LP1 and LP2 are therefore preferred to the POLS
and the FE estimators. Overall, the results are similar between the two Levinsohn and Petrin
(2003) models, although model LP1 does not estimate an input elasticity for intermediate
materials.
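The returns to scale row of Table 6 is the sum of the estimated input elasticities in each column. A minimal check, using the coefficients as printed (the dictionary layout is ours, and LP1 has no materials entry because that model does not estimate one):

```python
# Input elasticities from Table 6, by estimator.
table6 = {
    "POLS": {"labour": 0.8041, "capital": 0.0783, "materials": 0.2752},
    "FE": {"labour": 0.4156, "capital": 0.0132, "materials": 0.0493},
    "LP1": {"labour": 0.8180, "capital": 0.0483},  # materials is the proxy variable
    "LP2": {"labour": 0.7045, "capital": 0.0340, "materials": 0.0697},
}

# Returns to scale = sum of the input elasticities, rounded as in the table.
returns_to_scale = {
    model: round(sum(elasticities.values()), 2)
    for model, elasticities in table6.items()
}
print(returns_to_scale)
```

The sums match the printed row (1.16, 0.48, 0.87 and 0.81). The LP1 figure is not directly comparable with the others, since it omits the materials elasticity.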
To measure the quality of the employee-employer match we compute coefficients of corre-
lation between the (matchable) worker and firm fixed-effects obtained from the FE wage model
and the LP1 production function model respectively, across labour markets in England and
Wales. The number of matched workers and firms is equal to 14,851 employee-employer
matches. We then compute coefficients of correlation at the level of labour markets (i.e.,
TTWAs). Besides the measure of urban agglomeration (i.e., employment density), we add other
covariates, as discussed in Section 4, to represent the degree of labour market relative industrial
specialization, market potential, and labour market area in the matching regression model. On
average we observe each TTWA 3.2 times over the five year period between 2002 and 2006.
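The construction of the matching measure can be sketched as follows. We assume here that the correlation is computed separately for each TTWA and year, which is consistent with TTWAs being observed several times over the period but is not spelled out above; the record layout, function names and toy values are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; None if undefined (fewer than 2 points or no variation)."""
    n = len(xs)
    if n < 2:
        return None
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def match_quality(matches):
    """matches: iterable of (ttwa, year, worker_fe, firm_fe) tuples."""
    groups = defaultdict(lambda: ([], []))
    for ttwa, year, worker_fe, firm_fe in matches:
        groups[(ttwa, year)][0].append(worker_fe)
        groups[(ttwa, year)][1].append(firm_fe)
    return {key: pearson(w, f) for key, (w, f) in groups.items()}


# Toy data: positive assortative matching in "A", negative in "B".
toy = [
    ("A", 2002, 0.1, 0.2), ("A", 2002, 0.3, 0.6), ("A", 2002, 0.5, 1.0),
    ("B", 2002, 0.1, 0.9), ("B", 2002, 0.3, 0.5), ("B", 2002, 0.5, 0.1),
]
quality = match_quality(toy)
print(quality)
```

Each TTWA-year correlation would then serve as one observation of the dependent variable in the matching regression.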
Table 7 reports the results obtained from the estimation of the matching regression. We
considered four models: pooled OLS (POLS), random-effects (RE), pooled OLS combined with
instrumental variables (POLS-IV), and random effects combined with instrumental variables
(RE-IV). To evaluate whether we should use the pooled OLS or the random effects estimator we
use the Breusch-Pagan Lagrangian multiplier test for random-effects based on the null hypothesis
that the variance across TTWAs is equal to zero. The test indicated that the pooled OLS
estimator (both the non-IV and IV models) should be selected.
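The decision rule behind this choice can be illustrated with the statistic reported in Table 7. The sketch assumes the LM statistic is referred to a chi-squared distribution with one degree of freedom, which reproduces the reported p-value; the 10 per cent threshold used in the last line is our assumption, not one stated in the text.

```python
from math import erfc, sqrt


def chi2_df1_pvalue(stat: float) -> float:
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    # If X ~ chi2(1) then X = Z**2 with Z standard normal,
    # so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return erfc(sqrt(stat / 2.0))


bp_stat = 2.57  # Breusch-Pagan LM statistic from Table 7
p_value = chi2_df1_pvalue(bp_stat)
print(f"LM = {bp_stat}, p-value = {p_value:.3f}")

# Failing to reject H0: Var(u_i) = 0 favours pooled OLS over random effects.
prefer_pooled_ols = p_value > 0.10
print("Prefer pooled OLS:", prefer_pooled_ols)
```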
The validity of the IV estimates depends on the exogeneity and relevance of the instruments
used, and can be evaluated through appropriate diagnostic tests and statistics. Starting with
instrument relevance, the first stage regression indicates that the instruments for TTWA
employment density have considerable explanatory power. The first stage partial R-squared
statistic and F-statistic are 0.25 and 25.61 respectively for the POLS-IV model, and 0.37 and 34.9
respectively for the RE-IV model. Moreover, the Kleibergen-Paap tests of under-identification
and weak identification also indicate that our instruments are not weak.12 To evaluate instrument
exogeneity, we use Hansen’s (1982) J test of over-identifying restrictions. The test fails to
reject the null hypothesis of instrument exogeneity and hence suggests that our instruments can
be treated as exogenous.
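These diagnostics can be checked against the figures in Table 7. The sketch compares the Kleibergen-Paap Wald F statistics with the Stock-Yogo critical values quoted in the table notes, and recomputes the Hansen J p-value assuming one over-identifying restriction (two instruments for one instrumented regressor). That degrees-of-freedom choice is our assumption, and because the J statistic is printed to two decimals the recomputed p-value is only approximate.

```python
from math import erfc, sqrt

# Critical values quoted in the notes to Table 7.
STOCK_YOGO = {"5% maximal IV relative bias": 13.91, "10% maximal IV size": 19.93}

# Kleibergen-Paap rank Wald F statistics from Table 7.
KP_WALD_F = {"POLS-IV": 25.61, "RE-IV": 87.77}


def passes_weak_id(f_stat: float) -> bool:
    """True if the statistic exceeds every quoted critical value."""
    return all(f_stat > critical for critical in STOCK_YOGO.values())


def hansen_j_pvalue_df1(j_stat: float) -> float:
    """Upper-tail chi-squared(1) p-value (assumes one over-identifying restriction)."""
    return erfc(sqrt(j_stat / 2.0))


weak_id_ok = {model: passes_weak_id(f) for model, f in KP_WALD_F.items()}
j_pvalue = hansen_j_pvalue_df1(0.36)

print(weak_id_ok)
print(f"Hansen J p-value (approx.): {j_pvalue:.2f}")
```

A large J p-value means the over-identifying restrictions are not rejected, which is the sense in which the instruments can be treated as exogenous.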
The results indicate that urban agglomeration economies have a positive and statistically
significant impact on the quality of the employee-employer matching. This result also provides
evidence in favour of a positive relationship between labour pooling and agglomeration economies.
A 10 per cent increase in labour market employment density is found to be associated with
an increase of between 0.63 (POLS) and 0.88 (POLS-IV) percentage points in the average
correlation between worker and firm fixed-effects (our measure of the quality of the
employee-employer matching). Correcting for reverse causality between urban agglomeration and the
quality of the employee-employer matching affects the coefficient of employment density only
marginally. However, for the RE-IV model the effect is not statistically significant as the
standard error of the estimate is larger. We choose the POLS-IV estimator as our
preferred model (the Breusch-Pagan test of random-effects indicates that the RE estimator does not
offer an improvement over the POLS estimator), with an estimate for the effect of labour market
agglomeration on the employee-employer matching equal to 0.088.
12 The Kleibergen-Paap rank LM statistic (2006) test of under-identification shows that we reject the null of
under-identification, implying that the instruments are relevant. In addition, the weak identification Kleibergen-Paap
rank Wald F statistic (2006) is higher than the Stock and Yogo (2005) critical value for maximal IV bias relative to OLS,
also suggesting that our instruments are not weak.
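The significance pattern across the four density estimates follows directly from the coefficients and standard errors in Table 7. The sketch below uses a standard normal approximation for the test statistic, which is our simplification. It also evaluates the effect of a 10 per cent rise in density exactly, as b*ln(1.1), where the figures in the text use the linear approximation 0.1*b.

```python
from math import erfc, log, sqrt


def two_sided_p(coef: float, se: float) -> float:
    """Two-sided p-value under a standard normal approximation."""
    return erfc(abs(coef / se) / sqrt(2.0))


def effect_of_pct_increase(coef: float, pct: float) -> float:
    """Change in the correlation for a pct% rise in density (regressor in logs)."""
    return coef * log(1.0 + pct / 100.0)


# (coefficient, standard error) on log employment density, Table 7.
density = {
    "POLS": (0.0628, 0.0262),
    "RE": (0.0713, 0.0347),
    "POLS-IV": (0.0877, 0.0526),
    "RE-IV": (0.0844, 0.0641),
}

p_values = {m: two_sided_p(b, se) for m, (b, se) in density.items()}
effects = {m: 100 * effect_of_pct_increase(b, 10.0) for m, (b, _) in density.items()}

for m in density:
    print(f"{m}: p = {p_values[m]:.3f}, 10% denser -> +{effects[m]:.2f} points")
```

Under this approximation the POLS and RE estimates are significant at 5 per cent, the POLS-IV estimate at 10 per cent, and the RE-IV estimate at neither level, in line with the stars in Table 7. The exact effects (about 0.60 and 0.84 points for POLS and POLS-IV) are slightly below the linear approximations.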
On the other hand, the estimates for the effect of market potential and industrial specializa-
tion are not statistically significant, while there is a positive effect of labour market area. One
possible explanation for the insignificant effect of market potential on the quality of the
employee-employer matching is that the geographic scale of our analysis, referring essentially
to interactions within labour markets (intra-TTWAs), is not appropriate to capture interactions
between different labour markets (inter-TTWAs), which are less related to labour pooling
effects.
This is in agreement with existing empirical evidence on the spatial scale of the productivity
effects from agglomeration economies (e.g., Rosenthal and Strange 2004), and in particular the
understanding that the effects associated with proximity of firms to thick labour markets are
generally thought to dominate within geographic ranges defined by the borders of local labour
markets (typically defined on the basis of daily commuting patterns). Rice et al. (2006) indicate
a value of 45 minutes driving time as the appropriate range for labour market effects.
Although we find a significant positive impact of labour market urban agglomeration on the
employee-employer matching, the average correlation between worker and firm fixed-effects is
rather low, about 0.10. This can be explained, at least partially, by two reasons. First, it can be
argued that individual worker fixed-effects provide only an incomplete measure of worker’s
human capital over time. Knowledge spillovers are likely to accumulate at faster rates in more
agglomerated areas, but our measure of worker skills cannot appropriately account for these
effects. Second, the spatial scale of knowledge spillovers is likely to be more localized in areas
considerably smaller than labour markets (e.g., Fu 2007).
Compared to previous evidence obtained by Andersson et al. (2007) and Mion and Naticchioni
(2009), our results are in agreement with those found by Andersson et al. (2007) for the US
(California and Florida), while in disagreement with the results obtained by Mion and Naticchioni
(2009) for Italy. Andersson et al. (2007) estimated a positive and significant coefficient
for the effect of density on the degree of spatial assortative matching between firms and workers:
the coefficient for (the logarithm of) employment density is found to be equal to 0.020 and 0.019
for California and Florida respectively. Our findings indicate a stronger impact of labour market
urban agglomeration on the quality of the employee-employer matching, between 0.06 and 0.09.
We believe that this difference can be partly explained by the fact that Andersson et al. (2007)
measured urban agglomeration at the level of US census tracts and counties, which are
administrative geographical units much smaller than labour markets. Moreover,
because labour markets (e.g., metropolitan statistical areas in the US) are geographical units
strongly integrated in economic terms we can expect labour market pooling effects measured at
this level to be stronger than if measured at the level of smaller and administrative geographical
units.
The wage and production function models discussed earlier used on average 2.8 and 1.41
observations per worker and firm, respectively, over the five year period between 2002 and 2006.
This degree of ‘unbalancedness’ may affect the results, particularly those based on fixed-effects
estimators. This is a concern particularly in the estimation of the production function, which
uses on average fewer than 2 observations per firm. As a result, we carried out robustness checks
on the unbalanced nature of the worker and firm panel datasets. In particular, we re-estimated the
production function, and matching regression, using a more balanced version of the dataset
restricted to firms observed in at least two years. The Appendix provides a detailed
description of the results obtained for the models re-estimated using a more balanced version of
the firm dataset.
Overall, the effect of urban agglomeration economies was found to be fairly consistent for
the matching models using non-IV estimators. The results indicate that urban agglomeration
economies have a positive and statistically significant impact on the quality of the employee-
employer matching. The POLS (RE) estimate is similar to that obtained in the matching model
based on the full sample of firms – 0.063 against 0.075 (0.071 against 0.075). However, when
we consider the IV estimators, the coefficients for labour market density are no longer statisti-
cally significant.
It is difficult to fully reconcile the two sets of results obtained from the matching model
estimated using the full and restricted sample of firms. If we take the non-IV regressions as our
preferred models, the effect of labour market density on the employee-employer matching is
positive and ranges between 0.063 and 0.075. If we take the IV regressions as our preferred
models, only the POLS-IV model based on the full sample of firms predicts a positive relation-
ship between urban density and the labour pooling effects manifested in a better quality
employee-employer matching.
Although our analyses make use of a unique matched worker and firm dataset, there are data
limitations which constrain our ability to empirically test the hypothesis of a better quality
employee-employer matching in more agglomerated labour markets. The main data limitation
results from the reduced number of employee-employer matches (14,851 when using the full
sample of firms, and 11,323 when using the more balanced panel of firms) obtained from the
matching of the worker (ASHE) and firm (ARD) longitudinal surveys. When considered at the
level of labour markets in England and Wales, the individual matches represent between 50–69
per cent of all the TTWAs when we use the full sample of firms and between 38–57 per cent
when we use the more balanced sample of firms. As a result, our analyses cannot provide a
complete picture of the matching mechanism across labour markets in England and Wales. To
improve the analysis we need to increase the number of individual employee-employer matches
so that we can provide a more complete representation of labour markets. To achieve this we
need to either search for alternative worker and/or firm longitudinal surveys, or attempt to
combine other surveys with the ones used in this analysis.
6 Conclusions
This paper generates new evidence for England and Wales on the importance of labour market
pooling as a source of agglomeration economies. The hypothesis being tested is that denser
labour markets facilitate a more productive matching between workers and firms. We use
worker and firm longitudinal micro-data to obtain estimates of worker and firm productivity,
which are then used to test for a positive relationship between the quality of the employee-
employer matching and agglomeration economies across Travel to Work Areas in England and
Wales.
Compared to previous studies examining the relationship between labour pooling and
agglomeration economies through a matching mechanism between workers and firms (Ander-
sson et al. 2007; Mion and Naticchioni 2009), our analysis makes a contribution to existing
research by using a more appropriate measure of firm productivity and statistical estimators that
address some of the main estimation issues in the empirical literature. Our findings are in
agreement with those obtained by Andersson et al. (2007) for two US states and indicate that
there is a positive relationship between labour market density and the quality of the
employee-employer matching. This provides evidence in favour of labour market pooling as a Marshallian
source of agglomeration economies, and of matching as one of the channels through which these
advantages arise.
Although the availability of matched worker-firm longitudinal data offers a unique opportunity
to test for labour pooling externalities through an employee-employer matching mechanism,
the results can be influenced by data quality issues. One limitation of our analysis results
from the use of unbalanced longitudinal micro-data, which constrains our ability to provide a
comprehensive test of agglomeration economies effects on the efficiency of the employee-
employer matching across all labour market areas in England and Wales. To evaluate the degree
to which the unbalanced structure of the data affects the results we conducted robustness checks
by re-estimating the empirical models using a more balanced version of the data. This in turn
further reduces the number of labour market areas included in the matching regression and the
ability to test for urban agglomeration effects on the efficiency of the employee-employer
matching.
Appendix
The ASHE is an employee level longitudinal survey conducted by the ONS, which replaced the
New Earnings Survey (NES) in October of 2004. The main changes in ASHE consist of
improvements in the coverage of the survey, the use of a better designed questionnaire, and the
use of imputation methods to deal with non-response and weighting of responses to reflect the
population. The NES was mainly based on a 1 per cent sample of national insurance numbers
(NINo) of employees on the Inland Revenue (IR) pay-as-you-earn (PAYE) register in February,
whereas the ASHE also uses other supplementary samples with the purpose of improving the
coverage of the survey and reducing unit non-responses.13 The questionnaires are completed by
the employers, which helps reduce the incidence of both misreported and underreported records.
Additional samples include the IR PAYE register in April and the Inter Departmental
Business Register (IDBR), which includes businesses that are not registered for PAYE but are
registered for VAT. These two additional samples allow for the inclusion of employees that
either entered the job market or moved jobs between the time of selection (February) and the
date of the survey (April), as well as employees that earn below the PAYE limit (Bird 2004).14
In order to be able to use the data we performed a series of cleaning and treatment
procedures, which we describe below.
Missing values and reporting errors
There were some variables in the survey with missing values and coding errors. We deleted all
the observations for which wage rates were negative or zero, restricted the sample to employees
aged between 16 and 65 years, and deleted the observations for which the values in the variables
main job, double job (i.e., second job), and same job were not consistent over time and across
workers. In addition, missing values and reporting mistakes also rendered some potentially
relevant variables unusable. This was the case for date of start, which could have been used to
build a measure for work tenure, but unfortunately had many missing values and reporting
errors.
13
The sample is based on NINo ending with a specific pair of digits generated by a 1 in 100 random sample of all
jobs registered in the PAYE scheme, so each employee has an equal probability of being selected.
14
The data coming from the VAT-only sample is obtained according to a random sample of registered businesses. The
key difference to the IR PAYE scheme is that all employees in a selected business are included in the sample.

Outliers
The original dataset contained a number of outliers with hourly wages unrealistically high or
low. Such outliers can result from reporting mistakes in the weekly wages or reporting errors in
the number of hours worked. To minimize the impact of outliers on the empirical analysis we
excluded, for every year, observations for workers paid below the national minimum wage rate
(which is approximately equivalent to percentile 1 of hourly wages) and observations in the
highest 1 per cent of hourly wages.15
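The trimming rule can be written as a simple per-year filter. The sketch below takes the lower and upper hourly wage cut-offs for each year from footnote 15; the record layout (dictionaries with `year` and `hourly_wage` keys) and the function names are hypothetical.

```python
# Per-year (lower, upper) hourly wage bounds in pounds, as listed in footnote 15.
BOUNDS = {
    2002: (4.01, 52.0),
    2003: (4.39, 57.0),
    2004: (4.55, 59.0),
    2005: (5.57, 66.0),
    2006: (6.10, 70.0),
}


def within_bounds(record: dict) -> bool:
    """Keep a record unless its wage is below the lower or above the upper bound."""
    lower, upper = BOUNDS[record["year"]]
    return lower <= record["hourly_wage"] <= upper


def trim_outliers(records: list) -> list:
    """Return only the records whose hourly wage lies within that year's bounds."""
    return [r for r in records if within_bounds(r)]


# Toy records, for illustration only.
sample = [
    {"year": 2002, "hourly_wage": 3.50},   # below the 2002 lower bound
    {"year": 2002, "hourly_wage": 8.00},   # kept
    {"year": 2006, "hourly_wage": 75.00},  # above the 2006 upper bound
]
trimmed = trim_outliers(sample)
print(trimmed)
```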
Loss of pay
To minimize potential biases from external factors influencing average hourly earnings, we also
excluded the observations referring to employees whose earnings were affected by loss of pay
due to absence.
Junior/trainee rates
We excluded the records referring to workers paid at junior or trainee rates because this refers
mainly to temporary workers whose behaviour is likely to differ from permanent full-time
workers.
Excluded industries
To avoid issues of comparability and productivity differentials related to the non-competitive
nature of some industries we restricted the sample to workers in the private sector (over
two-thirds of the sample) that do not work in the following industries:16
• Public Administration and Defence; Compulsory Social Security (SIC 75);
• Sewage and Refuse Disposal, Sanitation and Similar Activities (SIC 90);
• Activities of Membership Organizations Not Elsewhere Classified (SIC 91);
• Activities of Households as Employers of Domestic Staff (SIC 95);
• Undifferentiated Goods Producing Activities of Private Households for Own Use (SIC 96);
• Undifferentiated Services Producing Activities of Private Households for Own Use (SIC 97); and
• Extra-territorial Organization and Bodies (SIC 99).
The ARD is a longitudinal micro-level dataset with information on businesses in the UK. From
1998 onwards, the dataset is constructed from the Annual Business Inquiry (ABI) survey that
combines data from previous surveys, namely, the Annual Census of Production, the Annual
Census of Construction, and some service sectors (wholesale, retail, motor trades, catering,
property, and service trades, excluding public services, defence and agriculture).
The ABI samples businesses from the IDBR, which gathers the addresses of businesses in
the UK using information from various sources. The IDBR covers businesses from the VAT-based
register and the PAYE-based register, thus providing a more comprehensive coverage of
the businesses included in the survey. The ABI questionnaires are sent out to all ‘reporting units’
above a certain employment threshold (250 workers), whereas smaller units are sampled according
to a size-region-industry criterion. Reporting units are a combination of ‘local units’ (plants)
and are the main unit in the ABI. They can comprise one or more plants and have their own unique
identification number, an ‘enterprise’ and ‘enterprise group’ identification number, and the
identification numbers of the local units they comprise.
15 To obtain the values of the National Minimum Wage for the years between 2002 and 2006 see the link:
http://www.lowpay.gov.uk/. We excluded workers with hourly wage rates below £4.01/£4.39/£4.55/£5.57/£6.10 and
above £52/£57/£59/£66/£70, for 2002/2003/2004/2005/2006 respectively.
16 A similar approach is adopted by Combes et al. (2008).
Fig. A1. Distribution (quintiles) of employment density across TTWAs in England and Wales (2006)
Figure A1 shows the distribution (quintiles) of employment density across TTWAs in
England and Wales (2006). Figure A2 shows the distribution of worker fixed-effects by labour
market size, defined relative to average TTWA employment density into high-density (HD) and
low-density (LD) TTWAs. High-density TTWAs include all TTWAs with employment density
higher than the average employment density and vice versa. The figure suggests that there is a
skewed distribution of worker’s unobservable skills, unveiling a tendency of higher skilled
workers to work in denser labour markets. Part of this pattern may be explained by worker
educational attainment (which is not recorded in ASHE) and the fact that more educated workers
tend to be found in more agglomerated areas where average education levels are also higher. The
fact that we include a measure of local area human capital in the wage model should however
reduce the scope for confoundedness between worker’s unobserved skills and education levels.
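The high-density/low-density split described above amounts to comparing each TTWA with the unweighted average density. A minimal sketch with hypothetical TTWA names and density values:

```python
def classify_by_density(density_by_ttwa: dict) -> dict:
    """Label each TTWA 'HD' if its density exceeds the average, else 'LD'."""
    mean_density = sum(density_by_ttwa.values()) / len(density_by_ttwa)
    return {
        ttwa: ("HD" if density > mean_density else "LD")
        for ttwa, density in density_by_ttwa.items()
    }


# Hypothetical employment densities; the average is 300.
toy_density = {"T1": 50.0, "T2": 150.0, "T3": 700.0}
groups = classify_by_density(toy_density)
print(groups)
```

Because the distribution of employment density is right-skewed, a split at the average will typically place more TTWAs in the low-density group than in the high-density group, as in this toy example.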

[Figure: density (vertical axis) of worker fixed-effects (horizontal axis), shown separately for high-density (HD) and low-density (LD) TTWAs]
Fig. A2. Distribution of worker fixed-effects between HD and LD TTWAs
Table A1. Results from the production function using the restricted sample of firms
Estimator             POLS         FE           LP1          LP2
Log of labour         0.7333***    0.4156***    0.6693***    0.5743***
                      (0.0054)     (0.0139)     (0.0049)     (0.0062)
Log of capital        0.0925***    0.0132***    0.0777***    0.0552***
                      (0.0024)     (0.0011)     (0.0025)     (0.0023)
Log of materials      0.2181***    0.0493***                 0.0589***
                      (0.0036)     (0.0025)                  (0.0032)
Returns to scale      1.04         0.48         0.75         0.69
Observations          91,717       91,717       91,717       91,717
R-squared (within)                 0.27
R-squared (between)                0.74
Notes: Robust standard errors corrected for clustering at the firm level (in parentheses). Significance levels: *** p-value
<0.01. Control variables include: two-digit industry fixed-effects; year fixed-effects.
Furthermore, the unbalanced structure of the worker dataset also limits our ability to investigate
the urban wage premium associated with migration between low-density and high-density work
TTWAs.17
Testing for potential biases
To test for potential biases we re-estimated the production function model using a more balanced
version of the dataset, based on the firms observed in at least two years. This sample is
composed of 34,671 firms observed on average 2.7 times over the five year period between 2002
and 2006. This panel data structure is now much closer to the data structure used in the wage
model, which uses on average 2.8 observations per worker. Table A1 reports the results for the same
models shown in Table 6. Similarly to the findings reported for the full sample of firms, the
POLS estimator produces the highest estimates for the input elasticities, while the FE estimator
produces the lowest estimates.
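The sample restriction can be sketched as a filter on the number of distinct years in which each firm appears. The tuple layout, function names and toy data below are hypothetical.

```python
from collections import defaultdict


def restrict_panel(observations: list, min_years: int = 2) -> list:
    """Keep (firm_id, year) observations of firms seen in at least min_years years."""
    years_seen = defaultdict(set)
    for firm_id, year in observations:
        years_seen[firm_id].add(year)
    return [(f, y) for f, y in observations if len(years_seen[f]) >= min_years]


def mean_obs_per_firm(observations: list) -> float:
    """Average number of observations per distinct firm."""
    firms = {firm_id for firm_id, _ in observations}
    return len(observations) / len(firms)


# Toy panel: f2 is observed only once and is dropped.
panel = [
    ("f1", 2002), ("f1", 2003),
    ("f2", 2004),
    ("f3", 2002), ("f3", 2004), ("f3", 2006),
]
restricted = restrict_panel(panel)
print(mean_obs_per_firm(panel), mean_obs_per_firm(restricted))
```

Dropping singleton firms mechanically raises the average number of observations per firm. It should leave a fixed-effects estimator essentially unchanged, because firms observed only once contribute no within-firm variation.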
17 Also note that we cannot reproduce this analysis for firms because the estimation of the production function was not
carried out at the plant level, but instead at the firm level. This means we cannot match firm productivity with a specific
TTWA other than through its link to a given worker. We are therefore making the assumption that firm fixed-effects are
constant across all the plants that belong to a certain firm (that is, organizational and management effects at the level of
the firm are equally distributed across plants).

Table A2. Results from the matching function based on the restricted sample of firms
Variable                            POLS        RE          POLS-IV     RE-IV
Log of employment density (Dens)    0.0747***   0.0751**    0.0263      0.0270
                                    (0.0296)    (0.0386)    (0.0784)    (0.1070)
Log of market potential (MP)        0.0017      0.1391      0.0120      0.0922
                                    (0.1291)    (0.1808)    (0.1886)    (0.2004)
Hirschman-Herfindahl index (HHI)    0.3598      0.5017*     0.1989      0.2475
                                    (0.2230)    (0.2766)    (0.2958)    (0.2618)
Log of area (km2)                   0.0737*     0.0665      0.0554      0.050
                                    (0.0391)    (0.0510)    (0.0441)    (0.0596)
Observations                        423         423         423         423
R-squared (within)                              0.01                    0.03
R-squared (between)                             0.04                    0.02
Breusch and Pagan Lagrangian multiplier test for random effects (H0: Var(ui) = 0): 5.39 (0.020)
First stage partial R-squared: 0.28 (POLS-IV); 0.38 (RE-IV)
First stage partial F statistic: 26.33 (POLS-IV); 27.89 (RE-IV)
Under-identification test – Kleibergen-Paap rank LM statistic (p-value): 21.00 (0.000) (POLS-IV); 139.24 (0.000) (RE-IV)
Weak identification test – Kleibergen-Paap rank Wald F statistic (a): 26.33 (POLS-IV); 80.5 (RE-IV)
Hansen J statistic (p-value) (b): 0.53 (0.469)
Notes: Robust standard errors corrected for clustering at the TTWA level (in parentheses). Significance levels: * p <
0.10, ** p < 0.05, *** p < 0.01. Control variables include year fixed-effects.
(a) Using Stock-Yogo (2005) weak ID test critical values. The test rejects the null of weak ID if the test statistic exceeds
the critical value. 5% maximal IV bias relative to OLS: 13.91, 10% maximal IV size: 19.93.
(b) Instruments: log of employment density in 1831 and 1851.
We then re-estimated the matching regression based on the firm fixed-effects obtained from
the LP1 estimator using the constrained sample of firms. Table A2 reports results for the pooled
OLS (POLS) and RE estimators, and for their instrumental variables counterparts (POLS-IV and
RE-IV). The number of matched workers and firms is smaller when we use the
constrained sample of firms: 11,323 employee-employer matches, or about 76 per cent of
the employee-employer matches obtained with the full sample of firms. Similarly, the number of
observations included in the matching regression is reduced to 78 per cent of the original
sample size (423 TTWA observations compared to 542 over the five-year period between 2002 and
2006). On average we observe each TTWA 3 times over this period.
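To illustrate why an instrumental variables estimator can give a different answer from pooled OLS, the sketch below compares the two slopes on a small constructed dataset. It is a simplified, just-identified case with a single instrument and no controls, not the over-identified specification with two historical density instruments used in Table A2; all variable names and numbers are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(a, b):
    """Population covariance of two equally long sequences."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / len(a)


def ols_slope(x, y):
    """Slope from regressing y on x (with an intercept)."""
    return covariance(x, y) / covariance(x, x)


def iv_slope(z, x, y):
    """Just-identified IV slope: cov(z, y) / cov(z, x)."""
    return covariance(z, y) / covariance(z, x)


true_beta = 0.5                   # slope used to build the toy data
z = [-1.0, -1.0, 1.0, 1.0]        # hypothetical instrument
u = [-1.0, 1.0, -1.0, 1.0]        # unobserved shock, uncorrelated with z by construction
x = [zi + ui for zi, ui in zip(z, u)]               # regressor is correlated with the shock
y = [true_beta * xi + ui for xi, ui in zip(x, u)]   # outcome

beta_ols = ols_slope(x, y)
beta_iv = iv_slope(z, x, y)
```

In this constructed example the OLS slope is 1.0 because the regressor is correlated with the unobserved shock, while the IV slope recovers the value of 0.5 used to build the data.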
Overall, the findings obtained for urban agglomeration economies from the POLS and the RE
models are consistent between the two samples of firms. The results indicate that urban
agglomeration economies have a positive and statistically significant impact on the quality of the
employee-employer matching. The POLS (RE) estimate is similar to that obtained in the matching
model based on the full sample of firms: 0.063 in the full sample against 0.075 in the restricted
sample (0.071 against 0.075). This means
that a 10 per cent increase in labour market employment density is associated with an increase of
about 0.75 percentage points in the value of the employee-employer matching (POLS and RE).
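The 0.75 figure follows from reading the density coefficient as the change in the matching measure per unit change in log density. The sketch below works through the arithmetic, assuming the dependent variable is measured on the correlation scale, so that a change of 0.0075 is read as 0.75 percentage points. The function name is hypothetical; the coefficient is the OLS estimate from Table A2.

```python
import math


def matching_change_points(beta, pct_increase):
    """Change in the matching measure, in percentage points, for a given
    proportional increase in density, when density enters in logs.

    Returns (linear approximation, value using the exact log change).
    """
    linear = 100 * beta * pct_increase
    exact = 100 * beta * math.log(1 + pct_increase)
    return linear, exact


# OLS coefficient on log employment density from Table A2, 10 per cent increase.
linear, exact = matching_change_points(beta=0.0747, pct_increase=0.10)
```

The linear approximation reproduces the 0.75 figure quoted in the text; evaluating the log change exactly gives a slightly smaller value of about 0.71.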
On the other hand, we observe that both IV models perform poorly. None of the covariates
is found to have a statistically significant effect. Using the same diagnostic tests discussed
in Section 5 to assess instrument exogeneity and relevance, we again conclude that the instrumental
variables can be treated as exogenous and that the instruments are not weak. In the previous
section the POLS-IV estimate for the effect of labour market density on the employee-employer
matching indicated a positive effect of 0.88 percentage points for a 10 per cent increase in
employment density; in the restricted sample of firms, however, the estimated effect is not
statistically different from zero.
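The instrument checks referred to above can be reproduced from the numbers reported in Table A2. The sketch below is illustrative only: it compares the two reported Kleibergen-Paap Wald F statistics with the Stock-Yogo critical values given in the table notes, and recomputes the Hansen J p-value under the assumption of one over-identifying restriction (two instruments for a single endogenous regressor). Function and variable names are hypothetical.

```python
import math

# Stock-Yogo critical values quoted in the notes to Table A2.
STOCK_YOGO_5PCT_BIAS = 13.91
STOCK_YOGO_10PCT_SIZE = 19.93


def instruments_not_weak(f_stat, critical_value=STOCK_YOGO_10PCT_SIZE):
    """The null of weak identification is rejected if F exceeds the critical value."""
    return f_stat > critical_value


def chi2_sf_one_df(stat):
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


kp_wald_f = [26.33, 80.5]   # the two weak-identification statistics in Table A2
hansen_j = 0.53             # Hansen J statistic in Table A2

weak_id_rejected = [instruments_not_weak(f) for f in kp_wald_f]
hansen_p = chi2_sf_one_df(hansen_j)        # assumes 1 over-identifying restriction
exogeneity_not_rejected = hansen_p > 0.05
```

Both F statistics exceed the larger critical value of 19.93, and the recomputed p-value of roughly 0.47 is in line with the reported 0.469, so the null of valid instruments is not rejected at conventional levels.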
References
Amiti M, Cameron L (2007) Economic geography and wages. The Review of Economics and Statistics 89: 15–29
Andersson F, Burgess S, Lane JI (2007) Cities, matching and the productivity gains of agglomeration. Journal of Urban
Economics 61: 112–128
Arellano M, Bond SR (1991) Some tests of specification for panel data: Monte Carlo evidence and an application to
employment equations. The Review of Economic Studies 58: 277–297
Arellano M, Bover O (1995) Another look at the instrumental variable estimation of error-components models. Journal
of Econometrics 68: 29–51
Baldwin RE, Okubo T (2006) Heterogeneous firms, agglomeration and economic geography: Spatial selection and
sorting. Journal of Economic Geography 6: 323–346
Barnes M (2002) Annual respondents database: User guide. Economic Analysis & Satellite Accounts Division, Office
for National Statistics, London
Barnes M, Martin R (2002) Business data linking: An introduction. Business Data Linking Branch and CeRiBA,
Economic Analysis and Satellite Accounts Division, Office for National Statistics, London
Bird D (2004) Methodology for the 2004 annual survey of hours and earnings. Employment Earnings and Productivity
Division, Office for National Statistics, London
Blundell R, Bond S (1998) Initial conditions and moment conditions in dynamic panel data models. Journal of
Econometrics 87: 115–143
Ciccone A, Hall RE (1996) Productivity and the density of economic activity. American Economic Review 86:
54–70
Combes PP, Duranton G, Gobillon L (2008) Spatial wage disparities: Sorting matters! Journal of Urban Economics 63:
723–742
Combes PP, Duranton G, Gobillon L, Roux S (2010) Estimating agglomeration economies with history, geology, and
worker effects. In: Glaeser EL (ed) Agglomeration economics. University of Chicago Press, Chicago, IL
de Groot HLF, Poot J, Smit MJ (2009) Agglomeration, innovation and regional development: Theoretical perspectives
and meta-analysis. In: Capello R, Nijkamp P (eds) Handbook of regional growth and development theories. Edward
Elgar, Cheltenham
Di Addario S (2011) Job search in thick markets. Journal of Urban Economics 69: 303–318
Duranton G, Puga D (2004) Micro-foundations of urban agglomeration economies. In: Henderson JV, Thisse JF (eds)
Handbook of regional and urban economics. Elsevier, Amsterdam
Ellison G, Glaeser EL (1997) Geographic concentration in US manufacturing industries: A dartboard approach. Journal
of Political Economy 105: 889–927
Ellison G, Glaeser EL, Kerr WR (2010) What causes industry agglomeration? Evidence from coagglomeration patterns.
American Economic Review 100: 1195–1213
Fu S (2007) Smart café cities: Testing human capital externalities in the Boston metropolitan area. Journal of Urban
Fujita MM, Krugman P, Venables AJ (1999) The spatial economy: Cities, regions and international trade. MIT Press,
Cambridge, MA
Fujita MM, Thisse JF (2002) Economics of agglomeration: Cities, industrial location and regional growth. Cambridge
University Press, Cambridge
Gabe TM, Abel JR (2012) Specialized knowledge and the geographic concentration of occupations. Journal of
Economic Geography 12: 435–453
Glaeser EL, Mare DC (2001) Cities and skills. Journal of Labor Economics 19: 316–342
Graham DJ, Melo PC, Jiwattanakulpaisarn P, Noland RB (2010) Testing for bi-directional causality between
productivity and agglomeration economies. Journal of Regional Science 50: 935–951
Hansen LP (1982) Large sample properties of generalized method of moments estimators. Econometrica 50: 1029–1054
Kim Y, Barkley DL, Henry MS (2000) Industry characteristics linked to establishment concentrations in
nonmetropolitan areas. Journal of Regional Science 40: 234–259
Kleibergen F, Paap R (2006) Generalized reduced rank tests using the singular value decomposition. Journal of
Econometrics 133: 97–126
Levinsohn J, Petrin A (2003) Estimating production functions using inputs to control for unobservables. Review of
Economic Studies 70: 317–341
Marshall A (1920) Principles of economics. Library of Economics and Liberty. URL:
http://www.econlib.org/library/Marshall/marP.html

Melo PC, Graham DJ, Noland RB (2009) A meta-analysis of estimates of urban agglomeration economies. Regional
Science and Urban Economics 39: 332–342
Mion G, Naticchioni P (2009) The spatial sorting and matching of skills and firms. Canadian Journal of Economics 42:
28–55
Nocke V (2006) A gap for me: Entrepreneurs and entry. Journal of the European Economic Association 4: 929–956
Olley GS, Pakes A (1996) The dynamics of productivity in the telecommunications equipment industry. Econometrica
64: 1263–1297
Overman HG, Puga D (2010) Labour pooling as a source of agglomeration: An empirical investigation. In: Glaeser EL
(ed) Agglomeration economics. University of Chicago Press, Chicago, IL
Puga D (2010) The magnitude and causes of agglomeration economies. Journal of Regional Science 50: 203–219
Rice P, Venables AJ, Patacchini E (2006) Spatial determinants of productivity: Analysis for the regions of Great Britain.
Regional Science and Urban Economics 36: 727–752
Rigby DL, Essletzbichler J (2002) Agglomeration economies and productivity differences in US cities. Journal of
Economic Geography 2: 407–432
Rosenthal SS, Strange WC (2001) The determinants of agglomeration. Journal of Urban Economics 50: 191–229
Rosenthal SS, Strange WC (2004) Evidence on the nature and sources of agglomeration economies. In: Henderson JV,
Thisse JF (eds) Handbook of regional and urban economics. Elsevier, Amsterdam
Stock JH, Yogo M (2005) Testing for weak instruments in linear IV regression. In: Andrews DWK, Stock JH (eds)
Identification and inference for econometric models. Cambridge University Press, Cambridge
Wheeler CH (2001) Search, sorting, and urban agglomeration. Journal of Labor Economics 19: 879–899
Wheeler CH (2006) Cities and the growth of wages among young workers: Evidence from the NLSY. Journal of Urban
Yankow JJ (2006) Why do cities pay more? An empirical examination of some competing theories of the urban wage
premium. Journal of Urban Economics 60: 139–161

Papers in Regional Science, Volume •• Number •• •• 2013.
