Our record linkage paper, originally published in 2022 in Historical Life Course Studies, has now been published in Chinese in 大数据与中国历史研究 (Big Data and the Study of Chinese History).
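Here is the abstract of the Chinese version, translated into English:
This article introduces methods for name matching and record linkage of officials in the China Government Employee Dataset-Qing (CGED-Q). CGED-Q consists of two main parts: the Jinshenlu (缙绅录, JSL) and the Examination Records (ER). The former contains quarterly rosters of civil and military officials from officially and commercially printed editions. The latter contains lists of successful examination candidates. The article first evaluates the diversity of each variable in the original sources and its potential for identifying inconsistent records, in order to determine which primary variables can be used for effective disambiguation. For civilian (民人) officials, the primary variables include surname, given name, and province and county of origin. For bannermen (旗人) officials, they include given name and banner affiliation. Second, it evaluates secondary variables that may help with linkage. Finally, it describes how the various problems that arise in matching records on primary and secondary variables were addressed.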
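As a rough illustration of how primary variables can group candidate matches, the sketch below builds a blocking key from surname, given name, and province and county of origin for civilian officials, and from given name and banner for bannermen. The `Record` layout and the function names are hypothetical simplifications, not the CGED-Q schema or the article's actual code. Exact agreement on the key is only a starting point; secondary variables would still be needed to resolve the resulting groups.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class Record(NamedTuple):
    # Hypothetical, simplified record layout; real CGED-Q JSL fields differ.
    record_id: str
    is_banner: bool           # True for bannermen (旗人), False for civilians (民人)
    surname: Optional[str]    # used for civilians only
    given_name: str
    province: Optional[str]   # used for civilians only
    county: Optional[str]     # used for civilians only
    banner: Optional[str]     # used for bannermen only


def primary_key(r: Record) -> tuple:
    """Return the primary-variable key used to group candidate matches."""
    if r.is_banner:
        # Bannermen: given name and banner affiliation.
        return ("banner", r.given_name, r.banner)
    # Civilians: surname, given name, province and county of origin.
    return ("civilian", r.surname, r.given_name, r.province, r.county)


def candidate_groups(records):
    """Group record IDs that agree exactly on their primary variables."""
    groups = defaultdict(list)
    for r in records:
        groups[primary_key(r)].append(r.record_id)
    # Keep only keys shared by two or more records: these are candidate links.
    return {k: ids for k, ids in groups.items() if len(ids) > 1}


# Invented example records, for illustration only.
records = [
    Record("a1", False, "王", "文德", "江苏", "吴县", None),
    Record("a2", False, "王", "文德", "江苏", "吴县", None),
    Record("a3", False, "王", "文德", "浙江", "仁和", None),  # different origin
    Record("b1", True, None, "德福", None, None, "镶黄旗"),
    Record("b2", True, None, "德福", None, None, "镶黄旗"),
]
print(candidate_groups(records))
```

Here "a3" shares a name with "a1" and "a2" but is kept apart because its place of origin differs, which is the kind of disambiguation the primary variables are meant to provide.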
WeChat public account of Central China Normal University History Big Data (华中师大历史大数据): WeChat public account
Download the PDF: 大数据与中国历史研究 (Big Data and the Study of Chinese History), Vol. 5, 康文林 (Cameron Campbell) and 陈必佳 (Chen Bijia)
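Reference: 康文林 (Cameron Campbell), 陈必佳 (Chen Bijia). 2025. 中国历史官员量化数据库——清代(CGED-Q)的人名匹配与官员记录连接 [Name matching and record linkage of officials in the China Government Employee Dataset-Qing (CGED-Q)]. 大数据与中国历史研究 (Big Data and the Study of Chinese History), Vol. 5, pp. 35-72.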
If you arrived here looking for the English-language original, it is available at: https://hlcs.nl/article/view/11902
We have recently been overhauling our linkage process so that it can be adapted easily to sources beyond the CGED-Q JSL. As a result, the bespoke approach described in this article may be somewhat dated. Even so, the article remains useful for its exhaustive documentation of the challenges that arise in Chinese-language record linkage, and for showing how often issues such as the substitution of visually similar characters occur in an important historical source.
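To make the problem of visually similar characters concrete, here is a minimal sketch that normalizes names through a table of look-alike characters before comparing them, so that a name differing only by such a substitution is flagged as a possible match rather than treated as a different person. The character groups, `LOOKALIKE_GROUPS`, and the function names are hypothetical illustrations, not the substitution list or code from the article; a real table would be built from the errors actually observed in the source.

```python
# Hypothetical, hand-picked groups of visually similar characters.
LOOKALIKE_GROUPS = ["己已巳", "戊戌戍", "未末", "土士"]


def build_canonical_map(groups):
    """Map every character in a group to the group's first character."""
    mapping = {}
    for group in groups:
        for ch in group:
            mapping[ch] = group[0]
    return mapping


# Translation table for str.translate, built from the character mapping.
CANON = str.maketrans(build_canonical_map(LOOKALIKE_GROUPS))


def normalize(name: str) -> str:
    """Replace each look-alike character with its canonical form."""
    return name.translate(CANON)


def compare_names(a: str, b: str) -> str:
    """Classify a pair of names as 'exact', 'lookalike', or 'different'."""
    if a == b:
        return "exact"
    if normalize(a) == normalize(b):
        return "lookalike"
    return "different"


print(compare_names("克己", "克已"))
```

Keeping "lookalike" separate from "exact" lets such pairs be reviewed or checked against secondary variables instead of being merged automatically, since two look-alike names can also belong to different people.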
We now have a manuscript describing our new approach to linkage.