Cameron D. Campbell 康文林

Family, Social Mobility, and Inequality in China and in Comparative Perspective

Paper on machine learning approach to nominative record linkage in Chinese historical sources

Posted on March 27, 2026 by camecamp

Our paper introducing a machine-learning approach to nominative record linkage in Chinese historical sources is now online at Historical Methods. The paper was lead-authored by Yue YU and co-authored with Yueran Hou and Yibei Wu. It is titled “A machine learning approach for nominative record linkage in Chinese historical databases,” and it is a revised version of the working paper that we previously uploaded to SocArXiv.

Reference:

YU Yue, Yueran Hou, Yibei Wu, Cameron Campbell. 2026. A Machine Learning Approach for Nominative Record Linkage in Chinese Historical Databases. Historical Methods. Online access, 26 March 2026: https://www.tandfonline.com/doi/full/10.1080/01615440.2026.2641454

Abstract

We introduce a generic machine learning-based pipeline for nominative linkage of records within and across Chinese historical datasets. The pipeline addresses key challenges, including character variations, incomplete data, and scalability issues specific to historical datasets in which names and other attributes are recorded with Chinese characters, not just for China, but potentially for Korea, Japan and Vietnam. Techniques developed for attributes recorded in phonetic alphabets are of limited use for Chinese characters not only because homonyms are common, but characters that are similar enough in appearance to be mistaken for each other may sound different. Our approach integrates stroke-based character embeddings for efficient blocking, supervised classification with active learning for record matching, and graph-based clustering for final linkage. We demonstrate the effectiveness of this pipeline using the career records of officials in the China Government Employee Database-Qing Jinshenlu (CGED-Q JSL). We achieve improved linkage quality compared to standard probabilistic methods, with longer linked sequences of career records and fewer aberrant transitions. To validate the generalizability, we also successfully apply the pipeline to another database and a cross-database linkage task. By minimizing the need for manual tuning, our pipeline offers a more accessible and effective solution for Chinese historical data linkage.
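For readers unfamiliar with the final stage mentioned in the abstract, the following is a minimal sketch of one common form of graph-based clustering: treating each record as a node, each accepted pairwise match as an edge, and taking connected components as the linked groups. This is an illustration under stated assumptions, not the paper's implementation; the function name `cluster_matches`, the record IDs, and the match pairs are all hypothetical, and the clustering used in the paper may be more refined than plain connected components.

```python
from collections import defaultdict


def cluster_matches(record_ids, matched_pairs):
    """Group records into clusters using connected components (union-find).

    record_ids: iterable of hashable record identifiers (hypothetical).
    matched_pairs: iterable of (id_a, id_b) pairs that an upstream
        classifier is assumed to have accepted as the same person.
    """
    record_ids = list(record_ids)
    parent = {r: r for r in record_ids}

    def find(x):
        # Follow parent pointers to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    clusters = defaultdict(list)
    for r in record_ids:
        clusters[find(r)].append(r)

    # Sort for a deterministic result.
    return sorted(sorted(members) for members in clusters.values())


# Toy example: five records, three accepted matches.
records = ["r1", "r2", "r3", "r4", "r5"]
pairs = [("r1", "r2"), ("r2", "r3"), ("r4", "r5")]
linked = cluster_matches(records, pairs)
print(linked)
```

In this toy input, r1 and r3 end up in the same group even though they were never directly matched, because both match r2. That transitivity is what lets pairwise decisions be assembled into longer linked sequences, and it is also why a single false match can merge two otherwise distinct groups.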
