Cameron D. Campbell 康文林

Family, Social Mobility, and Inequality in China and in Comparative Perspective


Stacked area graphs in STATA

Posted on April 7, 2020 (updated May 9, 2024) by camecamp

A few years ago I began looking for STATA commands to produce stacked area graphs that would allow us to look at the evolution of the distribution of a categorical variable (levels of a factor) over time. Applications included looking at the distributions of chushen (出身) of Qing officials over time, or the positions held by jinshi (进士) degree holders as a function of years since earning their degree. I know that this is straightforward in R, but as far as I can tell there is nothing in STATA that does this easily.

I created two commands, taking as my starting point the examples provided by Andrew Musau in his 2018 post in a thread on stacked area graphs on StataList.

The first is stackedcount, which plots the number of records in the categories of a variable as a function of a cardinal x variable that takes on discrete values, for example, calendar year or age. I have deposited it at SSC and it can be installed with

ssc install stackedcount 

The second is stackedpercent, which plots the percent of records in each category as a function of the x variable. I will deposit it at SSC after I am sure there are no problems with stackedcount.

As far as I could tell, STATA doesn’t have a native command to do this sort of basic descriptive graph. Interestingly, a few months ago when I was dabbling with R it turned out to be very straightforward there, so I am not sure why STATA doesn’t provide this.

If I missed something and there is already a package for this or it has been added to STATA, by all means point it out to me.

stackedcount

stackedcount shows the count of records in each of the categories of a y variable as a function of a discrete numeric x variable.

stackedcount varlist [if] [in] [,options]
where varlist is
y x

y is a categorical numeric variable
x is a numeric variable that takes on discrete values (for example, calendar year)

x can be non-integer, but again, the values should be discrete. The program will not bin values of x.
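For readers without access to the CGED-Q data, here is a minimal sketch of a call that follows the syntax above, using the nlsw88 dataset that ships with STATA. The choice of race as y and age as x is only an illustration, and the call uses nothing beyond the varlist and the xtitle and ytitle options listed in this post.

```stata
* Install the command from SSC (only needed once)
ssc install stackedcount

* Example dataset shipped with Stata: race is a labeled numeric
* categorical variable, age takes on discrete integer values
sysuse nlsw88, clear

* y (categorical) comes first, x (discrete numeric) second
stackedcount race age, xtitle("Age") ytitle("Records")
```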

If the categorical variable y you want to plot is string, run encode beforehand to create a labeled numeric categorical variable and pass that to the command.

The areas will be stacked according to the numeric value for each category, with the lowest on the bottom. I generally do the categorizations manually rather than relying on encode so that I can control the order in which the areas are stacked.
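A small sketch of the difference between the two approaches, using a made-up toy dataset. The variable names degree, degree_enc, degree_cat and the label name degree_lbl are hypothetical and are not from the CGED-Q. encode assigns codes in alphabetical order of the strings, while the manual coding fixes the order explicitly.

```stata
* Toy data: a string category and a year (hypothetical values)
clear
input str10 degree year
"jinshi"    1850
"juren"     1850
"gongsheng" 1850
"jinshi"    1851
"juren"     1851
end

* Option 1: encode assigns codes alphabetically
* (gongsheng = 1, jinshi = 2, juren = 3)
encode degree, generate(degree_enc)

* Option 2: assign codes by hand so that the stacking order is
* jinshi (bottom), juren, gongsheng (top)
generate byte degree_cat = .
replace degree_cat = 1 if degree == "jinshi"
replace degree_cat = 2 if degree == "juren"
replace degree_cat = 3 if degree == "gongsheng"
label define degree_lbl 1 "jinshi" 2 "juren" 3 "gongsheng"
label values degree_cat degree_lbl

* Compare the two codings
tabulate degree_enc degree_cat
```

Either degree_enc or degree_cat could then be passed as y, and the stacking order would follow the numeric codes.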

The areas are drawn as stacked bars: the cumulative height of each category is set at each observed value of x and held fixed until the next available value of x. This is achieved by additional code which, if removed, would allow the cumulative values of y for successive values of x to be connected by diagonal lines instead. I may add an option to turn off the bars and connect the cumulative values of y, but haven’t decided yet.
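To make the idea of stacking cumulative values concrete, here is a rough sketch of one way to build such a plot by hand with built-in commands. It is not the code inside stackedcount, only an illustration under the assumption that the counts are accumulated within each value of x and the largest cumulative total is drawn first so that lower categories are painted over it. It uses the nlsw88 dataset shipped with STATA, and the variable name cum is hypothetical. Run it from a do-file, since it uses /// line continuation.

```stata
sysuse nlsw88, clear

* One row per age-race cell; _freq holds the count (zero fills empty cells)
contract age race, zero

* Running total across categories within each age, lowest code first
bysort age (race): generate cum = sum(_freq)

* Bars: draw the highest category first so lower ones overlay it.
* Plot 1 is race == 3 (other), plot 2 is black, plot 3 is white.
twoway (bar cum age if race == 3, barwidth(1)) ///
       (bar cum age if race == 2, barwidth(1)) ///
       (bar cum age if race == 1, barwidth(1)), ///
       legend(order(1 "other" 2 "black" 3 "white")) ///
       xtitle("Age") ytitle("Records")

* Same data with the cumulative values connected by diagonal lines
twoway (area cum age if race == 3) ///
       (area cum age if race == 2) ///
       (area cum age if race == 1), ///
       legend(order(1 "other" 2 "black" 3 "white")) ///
       xtitle("Age") ytitle("Records")
```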

Most of the important options for twoway are available via pass-through:

xlabel, xtick, xmtick, xtitle, xrange, ylabel, ytick, ymtick, ytitle, yscale, caption, scheme, note, legend

The only option that may need explanation is xrange, which allows for missing values of x to be filled in, so that no bars are plotted for them. The reason for this is that I commonly use stackedcount to plot the distributions of characteristics of officials recorded in quarterly editions of a historical source (缙绅录). Some of these editions are missing, and in that case, I don’t want the values from the most recent preceding edition to be carried forward to the next existing edition. Rather, I would prefer that no bar be plotted for that edition.

xrange will fill in values of x according to the numlist specified with it. numlist works as usual in STATA. In the sample figure below, editions of the 缙绅录 are quarterly, and year is coded as (for example) 1870, 1870.25, 1870.5, 1870.75 and so forth, where 1870 corresponds to spring, 1870.25 to summer, 1870.5 to autumn, and 1870.75 to winter. For the sample below, xrange is specified as 1830(0.25)1912 so every missing season is plotted as empty rather than carried forward from the most recent season.
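As an illustration of the quarterly coding and of how a numlist such as 1870(0.25)1871 expands, here is a small sketch. The variables cal_year and season are hypothetical names for a calendar year and a season code (1 = spring through 4 = winter); the toy data deliberately omit autumn 1870.

```stata
* Toy data: autumn 1870 (season 3) is missing
clear
input cal_year season
1870 1
1870 2
1870 4
1871 1
end

* Quarterly year: spring = .00, summer = .25, autumn = .50, winter = .75
* Quarter steps are exactly representable in binary, so no rounding issues
generate year = cal_year + (season - 1)/4
list

* Expand the numlist to see which values of x xrange would fill in
numlist "1870(0.25)1871"
display "`r(numlist)'"
```

The expanded list includes 1870.5, which does not occur in the toy data. Under the behavior described above, passing this numlist to xrange would leave that season empty rather than carrying the summer values forward.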

Others are easy to add by following the model in the code.

To install:

ssc install stackedcount

As examples, here are a couple of figures from our forthcoming paper in the Journal of Chinese History introducing the China Government Employee Dataset-Qing (CGED-Q).

Chushen of Non-Banner non-central government officials, 1830-1912.

stackedcount chushen year if xuhao < 20000 & !qiren & ming != "" & year >= 1830 & !central & !fangkeben_only & !irregular & !ignore & !new_in_1911, xtitle("Year") xtick(1830(5)1910) xmtick(1830(1)1910) xlabel(1830(10)1910) ytitle("Records of officials") ylabel(0(2000)8000,labsize(small)) ymtick(0(1000)8000) legend(cols(4) size(vsmall)) xrange(1830(0.25)1912)

Chushen of Non-Banner central government officials, 1830-1912.

stackedcount chushen year if xuhao < 20000 & !qiren & ming != "" & year >= `start_year' & central & !fangkeben_only & !irregular & !ignore & !new_in_1911, xtick(1830(5)1910) xmtick(1830(1)1910) xlabel(1830(10)1910) ytitle("Records of officials") ylabel(0(500)2000, labsize(small)) ymtick(0(100)2000) legend(cols(4) size(vsmall))

stackedpercent

stackedpercent shows the percentage of records in each of the categories of a y variable as a function of a continuous x variable.

stackedpercent varlist [if] [in] [,options]

where varlist is

y x

y is a categorical numeric variable
x is a continuous numeric variable (for example, year)

If the categorical variable you want to plot is string, run encode beforehand to create a labeled numeric categorical variable and pass that to the command.

To install, you will need to download the ado file and place it in your personal ado directory. You can find out where that is by typing sysdir. If you have no idea what I am talking about, it is probably best to wait until I deposit it at SSC.
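A short sketch of how to locate the personal ado directory and confirm that the file has been picked up. It assumes the downloaded file is named stackedpercent.ado, since STATA finds a command by looking for an ado file with the same name.

```stata
* List Stata's system directories, including PERSONAL
sysdir

* Print only the personal ado directory
display c(sysdir_personal)

* After copying stackedpercent.ado into that directory,
* clear any programs already loaded in memory
discard

* Report where Stata finds the command (errors if it is not found)
which stackedpercent
```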

The areas will be stacked according to the numeric value for each category, with the lowest on the bottom. I generally do the categorizations manually rather than relying on encode so that I can control the order in which the areas are stacked.

Most of the important options for twoway are available via pass-through:

xlabel, xtick, xmtick, xtitle, xrange, ylabel, ytick, ymtick, ytitle, yscale, caption, scheme, note, legend

Others are easy to add by following the model in the code.

Here is an example in which I use the command to plot the percent distribution of positions held by non-Banner jinshi (进士) degree holders in the Qing bureaucracy according to the number of years since they earned their degree, for the period covered by the China Government Employee Database-Qing (CGED-Q) that I have been working on with Bijia Chen, James Lee, Yuxue Ren and others. I produced two separate plots, one for first and second tier degree holders (一甲,二甲) and another for third tier degree holders (三甲). The results are generally in line with expectations based on the appointment regulations.

stackedpercent guanzhi_js gap if gap >= 0.5 & gap <= 20 & (甲第 == 1 | 甲第 == 2) & !qiren, legend(size(small) cols(4)) xtitle("Years since exam") ytitle("Percent") caption("Positions held by jinshi since years since exam 甲第 1 2 - non-Banner") note("$note_time_stamp")

stackedpercent guanzhi_js gap if gap >= 0.5 & gap <= 20 & (甲第 == 3) & !qiren, legend(size(small) cols(4)) xtitle("Years since exam") ytitle("Percent") caption("Positions held by jinshi since years since exam 甲第 3 - non-Banner") note("$note_time_stamp")
