Earnings Dynamics and Mobility in Australia: New Evidence from ALife Data 1991-2020

Introduction

This appendix provides detailed statistics on earnings inequality and dynamics in Australia from 1991 to 2020 using ALife data. We follow the methodology developed by the Global Repository of Income Dynamics (GRID), although this project is not officially associated with GRID.

For details on the data source, sample selection and variable construction, please refer to the main paper.

Data

Data description

Our primary data source is the Australian Taxation Office (ATO) Longitudinal Information Files (ALife), which consist of a 10% random sample of individual tax filers in the ATO’s 2016 client register. The data contain tax records for each individual over the period 1991–2020. Each year, a 10% random sample of new tax filers is added to the sample.¹ Our unit of measurement is the individual: in the Australian income tax system, all income tax liabilities are assessed at the individual level, and there is no joint filing of tax returns. Our cross-sectional sample provides a point-in-time snapshot of annual income, tax, and public transfer data for each year from 1991 to 2020.

Variable construction

Our main earnings measure is real total labour income, expressed in 2020 dollars using the CPI. We construct the following measures of earnings for worker \(i\) in year \(t\):

  1. Raw real earnings in levels \(y_{it}\), and logs, \(\log(y_{it})\).

  2. Residualized log earnings, \(\epsilon_{it}\). We regress log real earnings on a full set of age dummies, separately for each year and gender. This removes systematic differences in earnings across workers at different stages of the life cycle and across years of the business cycle.

  3. Permanent earnings, \(P_{it-1} = \left( \sum_{s=t-3}^{t-1} y_{is} \right) / 3\), defined as the average earnings over the previous three years. We compute percentiles of permanent earnings.

  4. Residualized permanent earnings, \(\epsilon_{it}^P\), computed from \(P_{it-1}\) using the same method as we used to compute \(\epsilon_{it}\).

  5. 1-year change in residualized log earnings, \(g_{it}^1 = \Delta \epsilon_{it} = \epsilon_{it+1} - \epsilon_{it}\). This represents the 1-year forward change in \(\epsilon_{it}\).

  6. 5-year change in residualized log earnings, \(g_{it}^5 = \Delta^5 \epsilon_{it} = \epsilon_{it+5} - \epsilon_{it}\). This represents the 5-year forward change in \(\epsilon_{it}\).
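The measures listed above can be sketched in pandas as follows. This is a minimal illustration, not the authors' code: the column names (`id`, `year`, `age`, `gender`, `y`) and the function name are hypothetical, and the panel is assumed to have at most one row per person-year. It relies on the fact that regressing on a full set of age dummies within a year-gender cell is equivalent to subtracting the year-gender-age cell mean.

```python
import numpy as np
import pandas as pd


def build_earnings_measures(df: pd.DataFrame) -> pd.DataFrame:
    """Add eps, g1, g5 and P_lag to a person-year panel.

    Hypothetical input columns: id, year, age, gender, y (real earnings > 0).
    """
    out = df.sort_values(["id", "year"]).copy()
    out["log_y"] = np.log(out["y"])

    # Residual from a regression on a full set of age dummies, run separately
    # by year and gender, equals the deviation from the cell mean.
    cell_mean = out.groupby(["year", "gender", "age"])["log_y"].transform("mean")
    out["eps"] = out["log_y"] - cell_mean

    # Forward changes: match each person-year with the same person's residual
    # k calendar years later (missing when that year is not observed).
    for k in (1, 5):
        future = out[["id", "year", "eps"]].copy()
        future["year"] = future["year"] - k
        future = future.rename(columns={"eps": "eps_fwd"})
        out = out.merge(future, on=["id", "year"], how="left")
        out[f"g{k}"] = out["eps_fwd"] - out["eps"]
        out = out.drop(columns="eps_fwd")

    # Permanent earnings P_{it-1}: mean of y over t-3, t-2 and t-1,
    # missing unless all three years are observed.
    lag_cols = []
    for s in (1, 2, 3):
        past = out[["id", "year", "y"]].copy()
        past["year"] = past["year"] + s
        past = past.rename(columns={"y": f"y_lag{s}"})
        out = out.merge(past, on=["id", "year"], how="left")
        lag_cols.append(f"y_lag{s}")
    out["P_lag"] = out[lag_cols].mean(axis=1, skipna=False)
    return out.drop(columns=lag_cols)
```

Matching on calendar year rather than shifting rows keeps gaps in a worker's history from being treated as consecutive years.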

We use the consumer price index (CPI) to convert variables to 2020 Australian dollars.
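As a small illustration of this conversion, a nominal amount can be rescaled by the ratio of the 2020 CPI to the CPI of the year in which it was earned. The function name and the index values below are hypothetical and are not actual CPI figures.

```python
def to_2020_dollars(nominal: float, cpi_year: float, cpi_2020: float) -> float:
    """Rescale a nominal amount by the ratio of the 2020 CPI to that year's CPI."""
    if cpi_year <= 0 or cpi_2020 <= 0:
        raise ValueError("CPI index values must be positive")
    return nominal * cpi_2020 / cpi_year


# Hypothetical index values for illustration only.
cpi = {1991: 60.0, 2020: 120.0}
real_1991 = to_2020_dollars(30_000.0, cpi[1991], cpi[2020])
```

With these made-up values the price level doubles between 1991 and 2020, so the 1991 amount doubles in 2020 dollars.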

Samples

We construct the following three samples for our analysis.

  1. Cross-sectional (CS) sample: All individuals who satisfy the two sample selection criteria (described in the main paper) in a given year \(t\) form the CS sample for that year. The CS sample includes years 1991-2020. This sample is used to compute cross-sectional inequality statistics.

  2. Longitudinal (LX) sample: In order to study the distribution of earnings changes we restrict our CS sample to those individuals who have 1-year and 5-year forward earnings changes. This forms our longitudinal sample (LX) for the years 1991-2015.

  3. Heterogeneity (H) sample: We further restrict the LX sample to individuals for whom a permanent earnings measure can be constructed (see Variable construction above). This restricts the sample to those who were also in the sample in each of the three previous years. The H sample includes years 1993-2015 and is used to study variation across demographic groups.
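The nesting of the three samples can be sketched as a pair of boolean flags on the CS sample. This is an illustration under assumptions: the input frame is taken to hold the CS sample already, and the column names `g1`, `g5` and `P_lag` (forward changes and permanent earnings) are hypothetical.

```python
import numpy as np
import pandas as pd


def flag_samples(cs: pd.DataFrame) -> pd.DataFrame:
    """Add in_LX and in_H flags to a frame that already holds the CS sample.

    Hypothetical columns: g1, g5 (forward changes), P_lag (permanent earnings).
    """
    out = cs.copy()
    # LX: both the 1-year and the 5-year forward change are observed.
    out["in_LX"] = out["g1"].notna() & out["g5"].notna()
    # H: LX members for whom permanent earnings can be constructed.
    out["in_H"] = out["in_LX"] & out["P_lag"].notna()
    return out
```

Yearly sample sizes then follow from summing the flags within each year, for example `flag_samples(cs).groupby("year")[["in_LX", "in_H"]].sum()`.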

The following table compares the number of observations and the percentage of women in each sample with the raw data.

      Number of individuals                 Percentage of women
Year  Original   CS       LX       H        Original  CS     LX     H
1991  983,476    530,283  378,260  -        44.92     43.14  41.76  -
1992  979,065    527,550  380,470  -        44.96     43.42  42.13  -
1993  977,567    533,715  386,543  320,466  44.91     43.68  42.51  41.25
1994  989,879    545,664  395,941  323,470  45.03     43.96  42.86  41.64
1995  1,012,618  562,889  409,693  331,145  45.40     44.31  43.23  41.99
1996  1,034,423  590,827  426,375  341,581  45.74     44.48  43.51  42.31
1997  1,045,595  600,838  432,706  352,414  46.01     44.78  43.70  42.69
1998  1,048,281  609,306  434,783  362,646  46.01     44.97  43.83  42.88
1999  1,056,571  616,042  439,602  367,774  46.12     45.19  44.08  43.13
2000  1,076,253  626,512  446,972  372,791  46.31     45.48  44.34  43.28
2001  1,095,857  635,920  453,911  376,923  46.64     45.69  44.48  43.46
2002  1,112,807  640,395  457,142  382,168  47.00     45.79  44.63  43.76
2003  1,138,673  643,056  465,497  389,958  47.44     45.90  44.74  43.90
2004  1,171,995  652,977  469,845  391,155  47.79     46.07  44.89  43.97
2005  1,205,964  666,143  477,674  395,078  47.89     46.14  44.97  44.06
2006  1,235,593  679,819  488,542  402,392  47.96     46.34  45.08  44.16
2007  1,269,997  696,736  503,394  412,287  47.98     46.55  45.32  44.31
2008  1,318,165  725,584  516,141  419,451  47.86     46.65  45.45  44.50
2009  1,327,342  733,132  521,422  427,969  48.06     46.70  45.50  44.68
2010  1,340,228  739,348  528,695  439,966  48.05     46.70  45.53  44.81
2011  1,363,749  755,250  538,667  446,686  48.05     46.62  45.65  44.95
2012  1,370,301  771,205  546,236  451,853  47.67     46.70  45.81  45.04
2013  1,370,705  779,184  552,304  459,918  47.58     46.70  45.90  45.14
2014  1,403,134  788,363  559,697  467,106  47.76     46.76  46.18  45.39
2015  1,432,924  798,600  564,879  470,454  47.99     47.01  46.57  45.67
2016  1,467,041  808,594  -        -        48.22     47.24  -      -
2017  1,499,854  819,852  -        -        48.47     47.42  -      -
2018  1,527,016  833,686  -        -        48.58     47.64  -      -
2019  1,556,649  848,159  -        -        48.78     47.92  -      -
2020  1,557,642  854,916  -        -        49.00     48.22  -      -

Summary statistics

The following table presents summary statistics for our cross-sectional sample. y denotes the earnings variable, and log denotes the log of earnings.

The following table presents summary statistics for the 1-year and 5-year changes in residualized log earnings for the longitudinal sample.

The following table presents summary statistics for our heterogeneity sample. y denotes the permanent earnings variable, and log denotes the log of permanent earnings.

Cross-sectional statistics

Earnings by key percentiles

The main paper plots log earnings by key percentiles relative to their respective values in 1991. In this section we display trends in levels.

Earnings at the top

Earnings inequality

We provide some further metrics to measure inequality in addition to those displayed in the main paper.

Earnings dynamics over time

In this section, we use our longitudinal sample to explore earnings dynamics over time. In addition to the statistics on 1-year changes in earnings provided in the main paper, we show the corresponding statistics for 5-year changes.
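The exact dispersion, skewness and kurtosis statistics are not spelled out in this appendix. As a hedged illustration, one common quantile-based choice is the P90-P10 spread, Kelley skewness and Crow-Siddiqui kurtosis; the sketch below assumes those measures and uses a hypothetical function name, so it should not be read as the definition used for the figures.

```python
import numpy as np


def quantile_moments(g) -> dict:
    """Quantile-based dispersion, skewness and kurtosis of earnings changes.

    Assumed measures (not confirmed by the source): P90-P10, Kelley skewness
    and Crow-Siddiqui kurtosis. Missing values are dropped first.
    """
    g = np.asarray(g, dtype=float)
    g = g[~np.isnan(g)]
    p2_5, p10, p25, p50, p75, p90, p97_5 = np.percentile(
        g, [2.5, 10, 25, 50, 75, 90, 97.5]
    )
    return {
        # Dispersion: width of the middle 80% of the distribution.
        "p90_p10": p90 - p10,
        # Kelley skewness: share of the spread above vs. below the median.
        "kelley_skewness": ((p90 - p50) - (p50 - p10)) / (p90 - p10),
        # Crow-Siddiqui kurtosis: tail spread relative to the interquartile range.
        "crow_siddiqui_kurtosis": (p97_5 - p2_5) / (p75 - p25),
    }
```

Applied year by year to \(g_{it}^1\) or \(g_{it}^5\), this gives one time series per statistic. Kelley skewness is zero for a symmetric distribution, and the Crow-Siddiqui measure is about 2.91 for a normal distribution.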

Dispersion of earnings changes

Skewness of earnings change distribution

Kurtosis of earnings change distribution

Earnings dynamics by age and permanent income rank

In this section, we show how earnings dynamics vary by age and permanent income rank. The main paper examined 1-year changes in earnings by age and permanent income rank. Here, we show those for 5-year changes.

Dispersion of earnings changes

Skewness of earnings change distribution

Kurtosis of earnings change distribution

Earnings mobility

In this section, we use our heterogeneity sample to compute some further measures of earnings mobility and examine 5-year mobility in addition to the 10-year mobility examined in the main paper.

Mean rank by quantiles of permanent income

Rank-rank slope

The rank-rank slope (RRS) is the coefficient \(\beta\) of the following regression:

\[ R_{i, t+5} = \alpha + \beta R_{i, t} + \epsilon_{i, t}. \]

Here \(R_{i, t}\) denotes the percentile rank of individual \(i\) in year \(t\). This indicator, also common in the literature on intergenerational mobility (Chetty, Hendren, Kline, and Saez (2014)), measures rank persistence over the life cycle. In addition, we calculate a set of mobility indicators conditional on various initial positions within the distribution.
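A minimal sketch of the rank-rank regression, using the closed-form OLS estimates for a single regressor. The function name and the input arrays are hypothetical; the inputs are assumed to be ranks already computed for the same individuals at \(t\) and \(t+5\).

```python
import numpy as np


def rank_rank_slope(rank_t, rank_t5):
    """Return (alpha, beta) from an OLS regression of rank at t+5 on rank at t."""
    x = np.asarray(rank_t, dtype=float)
    y = np.asarray(rank_t5, dtype=float)
    x_dev = x - x.mean()
    # OLS slope: sample covariance of (x, y) over sample variance of x.
    beta = float(np.dot(x_dev, y - y.mean()) / np.dot(x_dev, x_dev))
    # Intercept: the fitted line passes through the point of means.
    alpha = float(y.mean() - beta * x.mean())
    return alpha, beta
```

For example, if every individual keeps the same rank, the function returns a slope of 1 and an intercept of 0.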

In a world without any rank persistence, \(\beta = 0\), and the indicators of bottom and top mobility would all be equal to 50 (the median rank). In a world with maximum persistence, \(\beta = 1\), and ranks would perpetuate: absolute upward mobility (AUM) would equal 25 and absolute downward mobility (ADM) would equal 75.
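These benchmark values can be checked with a short calculation, assuming ranks are uniformly distributed on \((0, 100]\) (an idealization of percentile ranks that is not stated in the source). With maximum persistence, \(R_{i, t+5} = R_{i, t}\), so the conditional means are the midpoints of the bottom and top halves:

\[ \mathbb{E}[R_{i, t+5} \mid R_{i, t} \leq 50] = \frac{0 + 50}{2} = 25, \qquad \mathbb{E}[R_{i, t+5} \mid R_{i, t} > 50] = \frac{50 + 100}{2} = 75. \]

With no persistence, \(R_{i, t+5}\) is independent of \(R_{i, t}\), so conditioning on the initial rank has no effect:

\[ \mathbb{E}[R_{i, t+5} \mid R_{i, t} \leq 50] = \mathbb{E}[R_{i, t+5} \mid R_{i, t} > 50] = \mathbb{E}[R_{i, t+5}] = 50. \]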

Absolute upward mobility

AUM is the expected rank at \(t + 5\) for individuals who are below the median at time \(t\):

\[ \text{AUM} = \mathbb{E}[R_{i, t+5} | R_{i, t} \leq 50]. \]

Absolute downward mobility

ADM is an index of mobility from the top: the expected rank at \(t + 5\) for individuals who are above the median at time \(t\):

\[ \text{ADM} = \mathbb{E}[R_{i, t+5} | R_{i, t} > 50]. \]

Mobility at top 1%

We also measure mobility at the very top of the earnings distribution with an indicator for those in the top 1% (M99), defined as the expected rank at \(t + 5\) for individuals in the top 1% at time \(t\):

\[ M99 = \mathbb{E}[R_{i, t+5} | R_{i, t} > 99]. \]
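The three conditional indicators (AUM, ADM and M99) are all means of the rank at \(t + 5\) over a subset defined by the rank at \(t\), which can be sketched as below. The function name is hypothetical, and ranks are assumed to be on a 0-100 percentile scale.

```python
import numpy as np


def mobility_indicators(rank_t, rank_t5) -> dict:
    """Mean rank at t+5 conditional on the initial rank at t (0-100 scale)."""
    r0 = np.asarray(rank_t, dtype=float)
    r1 = np.asarray(rank_t5, dtype=float)

    def cond_mean(mask):
        # Mean of the later rank over the selected individuals; NaN if none.
        return float(r1[mask].mean()) if mask.any() else float("nan")

    return {
        "AUM": cond_mean(r0 <= 50),  # started at or below the median
        "ADM": cond_mean(r0 > 50),   # started above the median
        "M99": cond_mean(r0 > 99),   # started in the top 1%
    }
```

With integer percentile ranks 1 to 100 and no rank changes, this returns 25.5 and 75.5 rather than exactly 25 and 75, because the ranks are discrete.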