
Diabetes Metab J : Diabetes & Metabolism Journal



Original Article
Guideline/Fact Sheet
Comparison of Operational Definition of Type 2 Diabetes Mellitus Based on Data from Korean National Health Insurance Service and Korea National Health and Nutrition Examination Survey
Jong Ha Baek1,2, Yong-Moon Park3,4, Kyung Do Han5, Min Kyong Moon6, Jong Han Choi7, Seung-Hyun Ko8
Diabetes & Metabolism Journal 2023;47(2):201-210.
Published online: February 8, 2023

1Department of Internal Medicine, Gyeongsang National University Changwon Hospital, Gyeongsang National University College of Medicine, Changwon, Korea

2Institute of Health Science, Gyeongsang National University, Jinju, Korea

3Department of Epidemiology, Fay W. Boozman College of Public Health, University of Arkansas for Medical Sciences, Little Rock, AR, USA

4Winthrop P. Rockefeller Cancer Institute, University of Arkansas for Medical Sciences, Little Rock, AR, USA

5Department of Statistics and Actuarial Science, Soongsil University, Seoul, Korea

6Department of Internal Medicine, Seoul Metropolitan Government Seoul National University Boramae Medical Center, Seoul National University College of Medicine, Seoul, Korea

7Division of Endocrinology and Metabolism, Konkuk University Medical Center, Konkuk University School of Medicine, Seoul, Korea

8Division of Endocrinology and Metabolism, Department of Internal Medicine, St. Vincent’s Hospital, College of Medicine, The Catholic University of Korea, Suwon, Korea

Corresponding author: Seung-Hyun Ko, Division of Endocrinology and Metabolism, Department of Internal Medicine, St. Vincent’s Hospital, College of Medicine, The Catholic University of Korea, 93 Jungbu-daero, Paldal-gu, Suwon 16247, Korea
• Received: October 26, 2022   • Accepted: December 5, 2022

Copyright © 2023 Korean Diabetes Association

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

  • Background
    We evaluated the validity and reliability of the operational definition of type 2 diabetes mellitus (T2DM) based on the Korean National Health Insurance Service (NHIS) database.
  • Methods
    Adult subjects (≥40 years old) included in the Korea National Health and Nutrition Examination Survey (KNHANES) from 2008 to 2017 were merged with those from the NHIS health check-up database, producing a cross-sectional dataset. We evaluated the sensitivity, specificity, accuracy, and agreement of the NHIS criteria for defining T2DM by comparing them with the KNHANES criteria as a standard reference.
  • Results
    In the study population (n=13,006), two algorithms were devised to define T2DM from the NHIS dataset: diagnostic claim codes for T2DM accompanied by prescription codes for anti-diabetic drugs (algorithm 1) or diagnostic codes alone (algorithm 2). Using these algorithms, the prevalence of T2DM was 14.9% (n=1,942; algorithm 1) and 20.8% (n=2,707; algorithm 2). Both algorithms showed good reliability in defining T2DM (Kappa index, 0.73 [algorithm 1] and 0.63 [algorithm 2]). However, accuracy (0.93 vs. 0.89) and specificity (0.96 vs. 0.90) tended to be higher for algorithm 1 than for algorithm 2. The validity (accuracy, range 0.91 to 0.95) and reliability (Kappa index, range 0.68 to 0.78) of the NHIS definition of T2DM were independent of age, sex, socioeconomic status, and accompanying hypertension or dyslipidemia.
  • Conclusion
    The operational definition of T2DM based on population-based NHIS claims data, including diagnostic codes and prescription codes, could be a valid tool to identify individuals with T2DM in the Korean population.
The prevalence of type 2 diabetes mellitus (T2DM) has increased worldwide, and diabetes itself is closely associated with an increased risk of mortality and of atherosclerotic cardiovascular diseases such as myocardial infarction and ischemic stroke. As a result, population-based data have been widely used in epidemiologic studies [1-3] to identify individuals with diabetes and to evaluate diabetes-related comorbidities and risk factors. Population-level classification of T2DM can also help identify and prioritize the populations at greatest risk and those most likely to benefit from interventions and treatment. However, despite the vast amount of data they contain, population-based claims databases (DBs) have a key limitation: accurate diagnoses cannot be established because clinical and laboratory information is limited.
In Korea, two representative population-based DBs have been used: the Korea National Health and Nutrition Examination Survey (KNHANES) DB, which has a cross-sectional design, and the National Health Insurance Service (NHIS) DB, a national claims-based cohort [4]. The Korean NHIS, a single-payer system for all residents, covers 97.1% of Koreans (approximately 50 million individuals), making its DB an efficient resource for diabetes research covering the entire population [5]. These large DBs have different advantages and disadvantages depending on their design.
Clinical measures, including glycosylated hemoglobin (HbA1c) and the oral glucose tolerance test (OGTT), are the gold standards for diagnosing diabetes [6]. However, it is difficult to routinely conduct HbA1c testing or an OGTT in a study of an entire population, especially for subjects with mild hyperglycemia. Instead, operational definitions based on the claims data and national health examination data in the NHIS DB have been adopted to define diabetes. Generally, T2DM is defined by the assignment of an International Classification of Diseases, 10th Revision (ICD-10) code corresponding to T2DM (E11-14), with or without accompanying prescription codes for anti-diabetic drugs, or by a high fasting glucose level (≥126 mg/dL) in the health check-up DB [7]. However, previous studies adopted different operational definitions of diabetes, depending on whether the diagnosis was based only on the corresponding ICD-10 codes [8,9], also required concomitant prescriptions of anti-diabetic drugs [10-15], or included fasting glucose results [16,17].
It is unknown whether diabetes defined from claims data using diagnostic codes (ICD-10), with or without prescription codes (anti-diabetic drug use), accurately reflects actual diabetes in the real world. The quality of such data must first be evaluated for fitness for use. Previous validation studies relied on comparisons with self-reports, telephone-based surveys, or medical chart reviews [18]. These methods are subject to biases, such as recall bias and selection bias, that affect accuracy and concordance. Our study aimed to evaluate the validity and reliability of the NHIS data-based definition of T2DM by comparing it with another population-based dataset, the KNHANES, as the reference standard. The overall sensitivity, specificity, positive and negative predictive values, accuracy, and agreement were analyzed. We also compared the prevalence and concordance of T2DM under two algorithms that differed in whether prescription codes were required in addition to diagnostic codes. To the best of our knowledge, this is the first study to validate the operational definition of T2DM using two large, linked Korean national DBs.
The Institutional Review Board of The Catholic University of Korea (IRB No.: VC18FESI0240) approved this study. The study was conducted in compliance with the Declaration of Helsinki. The requirement for written informed consent was waived because of the retrospective nature of the study and because anonymous, de-identified information was used for analysis.
Data sources
The Korean NHIS DB is a computerized DB containing all claims data, including patient demographics, drug prescriptions, diagnostic codes based on the ICD disease coding system, insurers’ payment coverage, patients’ deductions, and claimed treatment details [7]. Of the datasets in the NHIS DB, the qualification, claims, health check-up, and death information datasets were used. We examined whether fasting glucose levels were available in the health check-up DB, and whether ICD-10 codes corresponding to T2DM and claimed prescriptions for anti-diabetic drugs were present in the claims data of the Health Insurance Review and Assessment Service. All Korean citizens are encouraged to undergo regular biennial or pre-employment health examinations provided by the NHIS. These regular examinations include anthropometric measurements, blood pressure, social history, physical activity levels, and laboratory tests after overnight fasting, including serum glucose, total cholesterol, creatinine, liver function, and urinalysis.
KNHANES is a population-based cross-sectional survey designed to assess the health-related behaviors, health conditions, and nutritional status of Koreans [19]. A retrospective sample of non-institutionalized civilians was obtained from all geographic regions of the country. From the KNHANES data, we analyzed laboratory test results (fasting glucose and HbA1c levels) and questionnaire responses on whether participants took anti-diabetic drugs or had been diagnosed with T2DM. Of the eight phases of the KNHANES, data from phases IV to VII (2008 to 2017) were analyzed, and adults aged 40 years or older were included. The subjects surveyed by the KNHANES each year were matched to their first record in the NHIS health check-up DB.
We identified a cohort of 39,701 subjects in the KNHANES from 2008 to 2017. Subjects with no glucose data in the health check-up DB or who did not undergo blood tests in a fasting state (for more than 8 hours) were excluded (n=1,598). Of the remaining subjects, 14,294 in the NHIS health check-up DB were matched to KNHANES participants. After excluding those with missing values for age, sex, body mass index, household income, alcohol or smoking status, regular exercise, or the presence of dyslipidemia, hypertension, or chronic kidney disease (CKD) in the KNHANES data, 13,006 subjects were included in the study (Fig. 1).
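The linkage and exclusion steps can be sketched as follows. This is a minimal illustration, not the authors’ SAS pipeline: the record types, the `person_id` linkage key, the `covariates_complete` flag, and the rule of taking each person’s earliest check-up record before applying the fasting-glucose exclusion are hypothetical stand-ins for details the text does not specify.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class KnhanesRecord:
    person_id: str             # hypothetical de-identified linkage key
    survey_year: int
    age: int
    covariates_complete: bool  # hypothetical flag: no missing KNHANES covariates


@dataclass(frozen=True)
class CheckupRecord:
    person_id: str
    exam_year: int
    fasting_glucose: Optional[float]  # mg/dL; None if not measured
    fasting_hours: Optional[float]    # None if not recorded


def first_checkups(checkups: List[CheckupRecord]) -> Dict[str, CheckupRecord]:
    """Keep the earliest NHIS check-up record per person (assumed matching rule)."""
    first: Dict[str, CheckupRecord] = {}
    for rec in sorted(checkups, key=lambda r: r.exam_year):
        first.setdefault(rec.person_id, rec)
    return first


def has_valid_fasting_glucose(rec: CheckupRecord, min_hours: float = 8.0) -> bool:
    # The text excludes subjects without glucose data or not fasting "for more than 8 hours".
    return (
        rec.fasting_glucose is not None
        and rec.fasting_hours is not None
        and rec.fasting_hours > min_hours
    )


def build_study_population(
    knhanes: List[KnhanesRecord], checkups: List[CheckupRecord], min_age: int = 40
) -> List[Tuple[KnhanesRecord, CheckupRecord]]:
    first = first_checkups(checkups)
    linked: List[Tuple[KnhanesRecord, CheckupRecord]] = []
    for person in knhanes:
        rec = first.get(person.person_id)
        if rec is None or not has_valid_fasting_glucose(rec):
            continue  # no linked check-up, or no valid fasting glucose
        if person.age < min_age or not person.covariates_complete:
            continue  # outside the age range, or missing covariates
        linked.append((person, rec))
    return linked
```

Applying the filters in a fixed order like this makes each exclusion count reproducible, which is what a flow diagram such as Fig. 1 reports.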
Definition of T2DM
According to the KNHANES criteria, T2DM was defined as the presence of any of the following: (1) a fasting glucose level of ≥126 mg/dL; (2) current use of any anti-diabetic medication; (3) a previous diagnosis of T2DM; or (4) an HbA1c level of ≥6.5%. Information on medication use and medical conditions was collected through the health interview questionnaire, using face-to-face interviews [19]. According to the NHIS criteria, T2DM was identified by at least one of the following: (1) a fasting glucose level of ≥126 mg/dL in the health check-up DB or (2) ICD-10 codes corresponding to T2DM (E11-14), with or without accompanying prescription codes for any anti-diabetic drugs, in the claims data. To define T2DM from the NHIS claims data, two algorithms were applied: one requiring prescription codes in addition to diagnostic codes (algorithm 1) and one requiring diagnostic codes only (algorithm 2).
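The two sets of criteria can be written as boolean rules. The sketch below assumes person-level inputs: it treats "accompanying prescription codes" as the person having any anti-diabetic prescription in the claims data, and ignores claim look-back windows and code-prescription timing, none of which the text specifies. The function and parameter names are illustrative.

```python
from typing import Iterable, Optional

FPG_CUTOFF_MG_DL = 126.0
HBA1C_CUTOFF_PCT = 6.5
# ICD-10 E11-E14 (the "E11-14" range in the text); sub-codes such as "E11.9" or "E119" match by prefix.
T2DM_ICD10_PREFIXES = ("E11", "E12", "E13", "E14")


def knhanes_t2dm(
    fasting_glucose: Optional[float],
    hba1c: Optional[float],
    on_antidiabetic_drug: bool,
    previous_diagnosis: bool,
) -> bool:
    """Reference standard: any one of the four KNHANES criteria."""
    return (
        (fasting_glucose is not None and fasting_glucose >= FPG_CUTOFF_MG_DL)
        or on_antidiabetic_drug
        or previous_diagnosis
        or (hba1c is not None and hba1c >= HBA1C_CUTOFF_PCT)
    )


def has_t2dm_code(icd10_codes: Iterable[str]) -> bool:
    """True if any claim code falls in the E11-E14 range."""
    return any(code.strip().upper().startswith(T2DM_ICD10_PREFIXES) for code in icd10_codes)


def nhis_t2dm(
    checkup_fasting_glucose: Optional[float],
    icd10_codes: Iterable[str],
    has_antidiabetic_prescription: bool,
    algorithm: int,
) -> bool:
    """NHIS definition: high check-up glucose OR a qualifying claims pattern."""
    high_glucose = (
        checkup_fasting_glucose is not None
        and checkup_fasting_glucose >= FPG_CUTOFF_MG_DL
    )
    diagnosed = has_t2dm_code(icd10_codes)
    if algorithm == 1:
        claims_positive = diagnosed and has_antidiabetic_prescription  # code plus prescription
    elif algorithm == 2:
        claims_positive = diagnosed  # diagnostic code alone
    else:
        raise ValueError("algorithm must be 1 or 2")
    return high_glucose or claims_positive
```

For example, a subject with a claim coded E11.9, no anti-diabetic prescription, and a check-up fasting glucose of 100 mg/dL is positive under algorithm 2 but negative under algorithm 1; this is the kind of case that drives the difference in prevalence between the two algorithms.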
Definition of hypertension, dyslipidemia, and socioeconomic variables
Variables were defined based on the KNHANES data. Hypertension was defined as a systolic blood pressure of ≥140 mm Hg, a diastolic blood pressure of ≥90 mm Hg, or use of anti-hypertensive drugs [20]. Dyslipidemia was defined as a total cholesterol level of ≥240 mg/dL or use of lipid-lowering drugs [21]. CKD was defined as an estimated glomerular filtration rate of <60 mL/min/1.73 m2 [22]. Information on household income was obtained through a questionnaire and was either dichotomized at the 25th percentile (lowest quartile vs. the rest) or divided into quartiles. Household income was calculated as an equivalized income by dividing the monthly household income by the square root of the household size. Alcohol intake was classified into three categories: never drinker, mild drinker (0 to 30 g/day), and heavy drinker (>30 g/day) [23]. The final education level was classified as elementary school graduation (education duration ≤6 years), middle school graduation (≤9 years), high school graduation (≤12 years), and university or higher (>12 years). When education level was dichotomized, subjects were classified as middle school graduates or lower (≤9 years) and high school graduates or higher (>9 years). Regular walking was defined as walking for at least 30 minutes per day at least five times a week [24].
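The covariate definitions translate directly into small functions. This is a sketch under stated assumptions: quartile cut points are computed without KNHANES survey weights, a value equal to a cut point is assigned to the lower quartile, and "never drinker" is represented by a hypothetical `drinks_alcohol` flag; all names are illustrative.

```python
import bisect
import math
import statistics
from typing import List, Sequence


def has_hypertension(sbp: float, dbp: float, on_antihypertensive: bool) -> bool:
    return sbp >= 140 or dbp >= 90 or on_antihypertensive  # mm Hg


def has_dyslipidemia(total_cholesterol_mg_dl: float, on_lipid_lowering: bool) -> bool:
    return total_cholesterol_mg_dl >= 240 or on_lipid_lowering


def has_ckd(egfr: float) -> bool:
    return egfr < 60  # mL/min/1.73 m2


def equivalized_income(monthly_household_income: float, household_size: int) -> float:
    """Monthly household income divided by the square root of household size."""
    if household_size < 1:
        raise ValueError("household_size must be at least 1")
    return monthly_household_income / math.sqrt(household_size)


def quartile_cut_points(incomes: Sequence[float]) -> List[float]:
    """Three unweighted cut points (25th, 50th, 75th percentiles)."""
    return statistics.quantiles(incomes, n=4)


def income_quartile(income: float, cut_points: Sequence[float]) -> int:
    """1 (lowest) to 4 (highest); ties go to the lower quartile."""
    return bisect.bisect_left(cut_points, income) + 1


def alcohol_category(drinks_alcohol: bool, grams_per_day: float) -> str:
    if not drinks_alcohol:
        return "never"
    return "mild" if grams_per_day <= 30 else "heavy"


def education_level(years: int) -> str:
    if years <= 6:
        return "elementary"
    if years <= 9:
        return "middle"
    if years <= 12:
        return "high"
    return "university or higher"


def regular_walking(minutes_per_day: float, days_per_week: int) -> bool:
    return minutes_per_day >= 30 and days_per_week >= 5
```

The dichotomized income variable used in Table 3 then corresponds to `income_quartile(...) == 1` versus quartiles 2 to 4.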
Statistical methods
T2DM status was classified according to whether subjects satisfied the NHIS and KNHANES diagnostic criteria, respectively. Accordingly, the subjects were divided into four subgroups (NHIS-/KNHANES-, NHIS+/KNHANES-, NHIS-/KNHANES+, and NHIS+/KNHANES+, where "+" indicates classification as T2DM under the corresponding criteria). We summarized participant characteristics across these four groups. An independent t-test was used for continuous variables and a chi-squared test for categorical variables. The validity of the NHIS definition of T2DM was assessed by estimating the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy, using the KNHANES criteria as the reference standard. Accuracy was defined as the proportion of correctly classified subjects (true positives and true negatives) among all subjects [25]. The Kappa coefficient with its 95% confidence interval (CI) was also calculated to assess the reliability of the two sets of diagnostic criteria for T2DM. In general, a Kappa coefficient above 0.8 indicates excellent consistency, and a value between 0.6 and 0.8 indicates good consistency [26]. Additionally, we evaluated whether agreement between the two sets of T2DM criteria differed by age, sex, household income, educational level, and the presence of hypertension or dyslipidemia. Data analysis was performed using SAS version 9.4 (SAS Institute, Cary, NC, USA).
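The concordance measures follow from the 2×2 table of NHIS versus KNHANES classification. The sketch below recomputes them from the Table 1 counts. The text does not state which CI methods SAS used, so the Wald intervals for proportions and the simple large-sample standard error for Cohen’s kappa are assumptions; the point estimates should match the reported values to two decimals.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple

Z_95 = 1.96


@dataclass(frozen=True)
class TwoByTwo:
    """Cross-classification of the NHIS definition against the KNHANES reference."""
    tp: int  # NHIS+/KNHANES+
    fp: int  # NHIS+/KNHANES-
    tn: int  # NHIS-/KNHANES-
    fn: int  # NHIS-/KNHANES+

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.tn + self.fn


def wald_ci(p: float, denom: int, z: float = Z_95) -> Tuple[float, float]:
    half_width = z * math.sqrt(p * (1 - p) / denom)
    return max(0.0, p - half_width), min(1.0, p + half_width)


def concordance(t: TwoByTwo) -> Dict[str, Tuple[float, float, float]]:
    """(point estimate, lower, upper) for each measure."""
    ratios = {
        "sensitivity": (t.tp, t.tp + t.fn),
        "specificity": (t.tn, t.tn + t.fp),
        "ppv": (t.tp, t.tp + t.fp),
        "npv": (t.tn, t.tn + t.fn),
        "accuracy": (t.tp + t.tn, t.n),
    }
    out: Dict[str, Tuple[float, float, float]] = {}
    for name, (num, den) in ratios.items():
        p = num / den
        out[name] = (p, *wald_ci(p, den))
    # Cohen's kappa: observed agreement corrected for agreement expected from the margins.
    p_obs = (t.tp + t.tn) / t.n
    ref_pos = (t.tp + t.fn) / t.n
    test_pos = (t.tp + t.fp) / t.n
    p_exp = ref_pos * test_pos + (1 - ref_pos) * (1 - test_pos)
    kappa = (p_obs - p_exp) / (1 - p_exp)
    se = math.sqrt(p_obs * (1 - p_obs) / (t.n * (1 - p_exp) ** 2))  # simple large-sample SE
    out["kappa"] = (kappa, kappa - Z_95 * se, kappa + Z_95 * se)
    return out


def kappa_label(kappa: float) -> str:
    """Bands quoted in the text: >0.8 excellent, 0.6 to 0.8 good."""
    if kappa > 0.8:
        return "excellent"
    if kappa >= 0.6:
        return "good"
    return "below good"


ALGORITHM_1 = TwoByTwo(tp=1_462, fp=480, tn=10_683, fn=381)    # Table 1
ALGORITHM_2 = TwoByTwo(tp=1_569, fp=1_138, tn=10_025, fn=274)  # Table 1

if __name__ == "__main__":
    for label, table in (("algorithm 1", ALGORITHM_1), ("algorithm 2", ALGORITHM_2)):
        res = concordance(table)
        print(label, {k: round(v[0], 2) for k, v in res.items()}, kappa_label(res["kappa"][0]))
```

Because kappa subtracts chance agreement, algorithm 2 can keep a high NPV and accuracy near 0.89 while its kappa drops to about 0.63, reflecting the larger number of false positives.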
The prevalence of T2DM according to operational definitions by the NHIS and KNHANES
The overall prevalence of T2DM satisfying the KNHANES criteria was 14.2% (n=1,843). The prevalence of T2DM in the NHIS data was 14.9% (n=1,942) using algorithm 1 and 20.8% (n=2,707) using algorithm 2 (Table 1). When subjects were classified using the NHIS (algorithm 1) and KNHANES criteria, 10,683 (82.1%) met neither set of criteria (true negatives), 381 (2.9%) met only the KNHANES criteria (false negatives), 480 (3.7%) met only the NHIS criteria (false positives), and 1,462 (11.2%) met both (true positives) (Table 2). When the requirement for anti-diabetic drug use was removed from the NHIS criteria (algorithm 2), 10,025 (77.1%) subjects met neither set of criteria, 274 (2.1%) met only the KNHANES criteria, 1,138 (8.7%) met only the NHIS criteria, and 1,569 (12.1%) met both (Supplementary Table 1). Under algorithm 1, the subgroup that satisfied both criteria (NHIS+/KNHANES+) was older and had higher proportions of men, hypertension, and CKD, higher HbA1c levels, and lower income and education levels than the subgroups that satisfied only one set of criteria (NHIS+/KNHANES-, NHIS-/KNHANES+) or the non-diabetic group (NHIS-/KNHANES-) (Table 2).
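The cell percentages and prevalences above are simple proportions of the 13,006 subjects. This short sketch, using only the Table 1 counts, checks that the four cells add up and that each algorithm’s NHIS prevalence and the shared KNHANES prevalence follow from them; the dictionary layout is illustrative.

```python
from typing import Dict

N_TOTAL = 13_006

# Cell counts from Table 1 ("+" = classified as T2DM under that criterion).
CELLS: Dict[str, Dict[str, int]] = {
    "algorithm 1": {"NHIS-/KNHANES-": 10_683, "NHIS-/KNHANES+": 381,
                    "NHIS+/KNHANES-": 480, "NHIS+/KNHANES+": 1_462},
    "algorithm 2": {"NHIS-/KNHANES-": 10_025, "NHIS-/KNHANES+": 274,
                    "NHIS+/KNHANES-": 1_138, "NHIS+/KNHANES+": 1_569},
}


def pct(count: int, total: int) -> float:
    """Percentage rounded to one decimal, as reported in the tables."""
    return round(100 * count / total, 1)


def summarize(cells: Dict[str, int]) -> Dict[str, float]:
    total = sum(cells.values())
    if total != N_TOTAL:
        raise ValueError(f"cells sum to {total}, expected {N_TOTAL}")
    nhis_pos = cells["NHIS+/KNHANES-"] + cells["NHIS+/KNHANES+"]
    knhanes_pos = cells["NHIS-/KNHANES+"] + cells["NHIS+/KNHANES+"]
    summary = {name: pct(count, total) for name, count in cells.items()}
    summary["NHIS prevalence"] = pct(nhis_pos, total)
    summary["KNHANES prevalence"] = pct(knhanes_pos, total)
    return summary


if __name__ == "__main__":
    for algorithm, cells in CELLS.items():
        print(algorithm, summarize(cells))
```

The KNHANES prevalence is identical under both algorithms because only the NHIS rule changes; dropping the prescription requirement moves subjects from the NHIS- rows to the NHIS+ rows, and most of that shift lands in the false-positive cell.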
Concordance measures
The overall sensitivity, specificity, PPV, NPV, accuracy, and Kappa coefficient of the NHIS diagnostic criteria (algorithm 1) compared with the KNHANES criteria were 79% (95% CI, 77 to 81), 96% (95% CI, 95 to 96), 75% (95% CI, 73 to 77), 97% (95% CI, 96 to 97), 93% (95% CI, 93 to 94), and 0.73 (95% CI, 0.72 to 0.75), respectively. When algorithm 2 was adopted for the NHIS criteria, the sensitivity, specificity, PPV, NPV, accuracy, and Kappa coefficient were 85% (95% CI, 84 to 87), 90% (95% CI, 89 to 90), 58% (95% CI, 56 to 60), 97% (95% CI, 97 to 98), 89% (95% CI, 89 to 90), and 0.63 (95% CI, 0.61 to 0.64), respectively (Fig. 2). Across subgroups, the sensitivity (range, 73% to 83%), specificity (93% to 97%), PPV (67% to 82%), NPV (94% to 98%), accuracy (91% to 95%), and agreement (Kappa index, 0.68 to 0.78) of the NHIS definition criteria (algorithm 1) did not differ by age, sex, income level, education status, or accompanying hypertension or dyslipidemia (Table 3).
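The subgroup ranges can be recomputed from the per-stratum counts in Table 3. This sketch transcribes the (TP, FP, TN, FN) counts for algorithm 1 and reports which subgroups give the lowest and highest kappa and accuracy; the labels are illustrative, and individual rows may differ from Table 3 in the last digit depending on rounding and on the kappa estimator used.

```python
from typing import Callable, Dict, Tuple

Counts = Tuple[int, int, int, int]  # (TP, FP, TN, FN)

# Per-subgroup counts transcribed from Table 3 (NHIS algorithm 1 vs. KNHANES).
SUBGROUPS: Dict[str, Counts] = {
    "Age <65": (716, 290, 7_965, 229),
    "Age >=65": (746, 190, 2_718, 152),
    "Male": (799, 174, 4_702, 190),
    "Female": (663, 306, 5_981, 191),
    "Income Q1": (425, 116, 1_681, 103),
    "Income Q2-4": (1_037, 364, 9_002, 278),
    "Education <9 yr": (870, 260, 4_236, 219),
    "Education >=9 yr": (592, 220, 6_447, 162),
    "Hypertension: yes": (909, 203, 3_603, 217),
    "Hypertension: no": (553, 277, 7_080, 164),
    "Dyslipidemia: yes": (203, 72, 1_212, 75),
    "Dyslipidemia: no": (1_259, 408, 9_471, 306),
}


def cohen_kappa(tp: int, fp: int, tn: int, fn: int) -> float:
    n = tp + fp + tn + fn
    p_obs = (tp + tn) / n
    ref_pos = (tp + fn) / n    # KNHANES-positive share
    test_pos = (tp + fp) / n   # NHIS-positive share
    p_exp = ref_pos * test_pos + (1 - ref_pos) * (1 - test_pos)
    return (p_obs - p_exp) / (1 - p_exp)


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    return (tp + tn) / (tp + fp + tn + fn)


def metric_range(
    metric: Callable[[int, int, int, int], float]
) -> Tuple[Tuple[str, float], Tuple[str, float]]:
    """(name, value) of the lowest and highest subgroup for a metric."""
    values = {name: metric(*counts) for name, counts in SUBGROUPS.items()}
    low = min(values, key=values.get)
    high = max(values, key=values.get)
    return (low, values[low]), (high, values[high])


if __name__ == "__main__":
    for label, fn in (("kappa", cohen_kappa), ("accuracy", accuracy)):
        (lo_name, lo_val), (hi_name, hi_val) = metric_range(fn)
        print(f"{label}: {lo_val:.2f} ({lo_name}) to {hi_val:.2f} ({hi_name})")
```

The recomputed extremes correspond to the ranges quoted in the text, with the smallest stratum (dyslipidemia) giving the lowest kappa.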
Overall, the diagnostic criteria based on NHIS data showed good validity and consistency, which did not differ by age, sex, socioeconomic factors, or accompanying hypertension or dyslipidemia. When two diagnostic algorithms were applied to the NHIS data according to whether diagnostic codes were accompanied by prescription codes (algorithm 1) or not (algorithm 2), the prevalence of T2DM was lower with algorithm 1 than with algorithm 2 and closer to the prevalence based on the KNHANES data. In addition, although both algorithms showed good reliability, specificity and accuracy tended to be higher for the algorithm that included both diagnostic and prescription codes (algorithm 1).
The prevalence of T2DM in the NHIS data using algorithm 1 (both diagnostic and prescription codes) was 5.9 percentage points lower than that using algorithm 2 (diagnostic codes only) (14.9% vs. 20.8%). False positives (subjects identified as having T2DM in the NHIS claims data who did not meet the KNHANES criteria) increased when T2DM was defined by diagnostic codes alone (8.7% with algorithm 2 vs. 3.7% with algorithm 1). The overall prevalence of T2DM identified using algorithm 1 in this study was similar to that reported in the 2021 Korea Diabetes Fact Sheet based on KNHANES data (16.7%, approximately 6.05 million people) [15]. The mean HbA1c level in the NHIS+/KNHANES- group (false positives) was 5.9% with algorithm 1 and 5.7% with algorithm 2. Some claims may therefore have been issued for a T2DM diagnosis in subjects with prediabetes or early T2DM who did not require medication. Moreover, although both algorithms (with or without prescription claims) showed good agreement based on the Kappa index, specificity and accuracy for defining T2DM in the NHIS data were higher when diagnostic codes were accompanied by prescription codes. Including both diagnostic and prescription codes in the NHIS definition of T2DM thus helped to distinguish patients in a prediabetic or early diabetic state from those with overt diabetes requiring treatment.
The concordance and consistency of the NHIS criteria (algorithm 1) did not differ according to age, sex, socioeconomic factors, or accompanying hypertension or dyslipidemia. Accuracy and specificity exceeded 90% in all subgroups, and the Kappa index indicated good reliability (range, 0.68 to 0.78). These trends were consistent when algorithm 2 was applied (data not shown). A previous validation study compared accuracy and consistency using self-reports or telephone surveys as the reference standard [18]. Self-reports and telephone surveys are prone to recall bias, social desirability bias, poor understanding of survey questions, and incomplete knowledge of one’s own diagnosis. A literature review demonstrated that participants’ sociodemographic characteristics, such as age, gender, race, setting, and socioeconomic and health status, were associated with incomplete data linkage and the potential for systematic bias in reported outcomes [27]. In contrast, our study used KNHANES data, from a population-based surveillance system, as the reference standard. Because the KNHANES target population comprises nationally representative non-institutionalized civilians in Korea, KNHANES data have the advantage of minimizing selection bias compared with diagnoses based on electronic medical chart reviews or interviews. In addition, including a clinical measure (HbA1c) among the diagnostic criteria of the reference standard can help overcome the potential limitation of systematic bias. Linking the KNHANES and NHIS data also compensated for a shortcoming of the claims data, namely the lack of clinical information such as disease duration or glycemic control status, by adding self-reported survey information and urine or blood measurements from the KNHANES.
The validity of national administrative claims data has also been evaluated in other countries, such as Japan [28], Canada [29], and the USA [18]. In the Japanese national claims DB, an algorithm containing both diagnosis-related codes for diabetes and medication codes had higher specificity (mean, 99.4% vs. 91.6%) and agreement (mean Kappa index, 0.80 vs. 0.49) than an algorithm containing only diagnosis-related codes [28]. In Canadian healthcare administrative data compared with electronic medical records, the algorithm with the best specificity and PPV while maintaining sensitivity above 80% was one hospitalization or physician claim plus either one drug prescription or one diabetes-specific fee code at any time [29]. In the USA, physician claims data based on ICD-9 codes showed sensitivity ranging from 26.9% to 97.0%, specificity from 94.5% to 99.4%, and Kappa index from 0.8 to 0.9 [18]. Compared with these international reports, the algorithm based on Korean NHIS data also demonstrated good validity and reliability.
Several limitations of this study should be considered. First, selection bias may have occurred because approximately two-thirds of the subjects were excluded owing to missing fasting glucose data in the health check-up DB, missing covariates in the KNHANES data, or refusal to provide personal information. In addition, only subjects aged 40 years or older were included because the national health check-up was conducted for this age group. Second, the KNHANES criteria used as the reference standard partly relied on a self-reported questionnaire to classify patients with T2DM. Other tests and data, such as the OGTT or symptoms of hyperglycemia, were not available for diagnosing T2DM. As a result, the KNHANES data may not capture all patients with T2DM in real-world settings. Third, defining T2DM from claims data can miss patients with untreated diabetes or those who did not require treatment. Clinical factors such as disease duration, diabetes management status, or accompanying hypertension or dyslipidemia were not assessed through the NHIS data. Despite these limitations, validating the operational definition of T2DM by linking these two large national DBs, including a clinical measure (HbA1c), is an important and timely approach for future diabetes research in Korea.
In conclusion, population-based NHIS claims data can be useful for identifying subjects with T2DM in epidemiologic studies when both diagnostic and prescription codes are used as diagnostic criteria. The validity and accuracy of these claims data for identifying T2DM were good in this study and did not depend on sociodemographic and metabolic risk factors.
Supplementary materials related to this article can be found online at
Supplementary Table 1.
Characteristics of the study population classified as having type 2 diabetes mellitus according to KNHANES and NHIS criteria (algorithm 2)


Seung-Hyun Ko has been executive editor of the Diabetes & Metabolism Journal since 2022. Yong-Moon Park has been statistical advisor of the Diabetes & Metabolism Journal since 2021. They were not involved in the review process of this article. Otherwise, there was no conflict of interest.


Conception or design: J.H.B., K.D.H., S.H.K.

Acquisition, analysis, or interpretation of data: Y.M.P., K.D.H.

Drafting or revising the work: J.H.B., M.K.M., J.H.C.

Final approval of the manuscript: K.D.H., S.H.K.


This research was supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health and Welfare, Republic of Korea (grant number: HI18-C0275).

Fig. 1.
Study diagram. KNHANES, Korea National Health and Nutrition Examination Survey; NHIS, National Health Insurance Service; HbA1c, glycosylated hemoglobin; BMI, body mass index; CKD, chronic kidney disease. (a) Systolic blood pressure ≥140 mm Hg and/or diastolic pressure ≥90 mm Hg or on medication; (b) total cholesterol ≥240 mg/dL and/or on medication; (c) estimated glomerular filtration rate <60 mL/min/1.73 m2.
Fig. 2.
Sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and agreement according to the different algorithms for defining type 2 diabetes mellitus in Korean National Health Insurance Service data. Algorithm 1: at least one of the following criteria was met: (1) fasting glucose ≥126 mg/dL or (2) International Classification of Diseases, 10th Revision (ICD-10) codes corresponding to type 2 diabetes mellitus (E11-14) with accompanying prescription codes for any anti-diabetic drugs. Algorithm 2: at least one of the following criteria was met: (1) fasting glucose ≥126 mg/dL or (2) ICD-10 codes corresponding to type 2 diabetes mellitus (E11-14).
Table 1.
The prevalence of type 2 diabetes mellitus according to the NHIS and KNHANES diagnostic criteria stratified by the NHIS algorithm used
NHIS criteria KNHANES criteria (no) KNHANES criteria (yes) Total
Algorithm 1
 No 10,683 (82.1) 381 (2.9) 11,064 (85.1)
 Yes 480 (3.7) 1,462 (11.2) 1,942 (14.9)
Algorithm 2
 No 10,025 (77.1) 274 (2.1) 10,299 (79.2)
 Yes 1,138 (8.7) 1,569 (12.1) 2,707 (20.8)
Total 11,163 (85.8) 1,843 (14.2) 13,006 (100)

Values are presented as number (%). Algorithm 1: at least one of the following criteria was met: (1) fasting glucose level ≥126 mg/dL or (2) International Classification of Diseases, 10th Revision (ICD-10) codes corresponding to type 2 diabetes mellitus (E11-14) with accompanying prescription codes for any anti-diabetic drugs; Algorithm 2: at least one of the following criteria was met: (1) fasting glucose level ≥126 mg/dL or (2) ICD-10 codes corresponding to type 2 diabetes mellitus (E11-14).

NHIS, National Health Insurance Service; KNHANES, Korea National Health and Nutrition Examination Survey.

Table 2.
Characteristics of the study population classified as having type 2 diabetes mellitus according to the KNHANES and NHIS criteria (algorithm 1)
Characteristic NHIS+/KNHANES+ NHIS-/KNHANES+ NHIS+/KNHANES- NHIS-/KNHANES- P value
Number 1,462 (11.2) 381 (2.9) 480 (3.7) 10,683 (82.1)
Age, yr 63.5±9.5 61.1±10.9 60.8±10.5 56.3±10.9 <0.001
Age ≥65 years 746 (51.0) 152 (39.9) 190 (39.6) 2,718 (25.4) <0.001
Male sex 799 (54.7) 190 (49.9) 174 (36.3) 4,702 (44.0) <0.001
Height, cm 156.6±9.0 156.4±9.4 156.4±9.4 156.8±9.2 <0.001
Weight, kg 60.3±10.7 60.8±11.3 60.8±11.3 57.5±10.7 <0.001
BMI, kg/m2 25.0±3.0 25.3±3.2 25.3±3.2 23.8±2.9 <0.001
Household income <0.001
Quartile 1 (lowest) 425 (29.1) 103 (27.0) 116 (24.2) 1,681 (15.7)
Quartile 2 416 (28.5) 96 (25.2) 130 (27.1) 2,580 (24.2)
Quartile 3 307 (21.0) 106 (27.8) 118 (24.6) 2,994 (28.0)
Quartile 4 (highest) 314 (21.5) 76 (20.0) 116 (24.2) 3,428 (32.1)
Education duration, yr <0.001
<6 602 (41.2) 157 (41.2) 188 (39.2) 2,757 (25.8)
6–9 268 (18.3) 62 (16.3) 72 (15.0) 1,479 (13.8)
10–12 376 (25.7) 97 (25.5) 151 (31.5) 3,514 (32.9)
≥13 216 (14.8) 65 (17.1) 69 (14.4) 2,933 (27.5)
Occupation (yes) 836 (57.2) 229 (60.1) 281 (58.5) 7,415 (69.4) <0.001
Smoking <0.001
Current 290 (19.8) 78 (20.5) 75 (15.6) 1,761 (16.5)
Ex-smoker 425 (29.1) 87 (22.8) 80 (16.7) 2,339 (21.9)
Non-smoker 747 (51.1) 216 (56.7) 325 (67.7) 6,583 (61.6)
Alcohol consumption <0.001
Heavy 138 (9.4) 31 (8.1) 37 (7.7) 755 (7.1)
Mild 797 (54.5) 213 (55.9) 288 (60.0) 6,986 (65.4)
None 527 (36.1) 137 (36.0) 155 (32.3) 2,942 (27.5)
Hypertension (yes) 909 (62.2) 217 (57.0) 203 (42.3) 3,603 (33.7) <0.001
Dyslipidemia (yes) 203 (13.9) 75 (19.7) 72 (15.0) 1,212 (11.4) <0.001
CKD (yes) 120 (8.2) 25 (6.6) 14 (2.9) 238 (2.2) <0.001
Laboratory findings
Fasting glucose, mg/dL 141±40 123±24 100±11 95±9 <0.001
HbA1c, % 8.3±2.1 7.2±1.3 5.9±0.7 5.6±0.6 <0.001
Total cholesterol, mg/dL 181±41 203±40 199±42 195±34 <0.001
HDL-cholesterol, mg/dL 45±11 47±12 49±12 50±12 <0.001
Creatinine, mg/dL 0.88±0.26 0.86±0.22 0.83±0.36 0.83±0.23 <0.001
eGFR, mL/min/1.73 m2 86.8±19.8 87.9±18.5 89.0±16.9 90.4±16.2 <0.001

Values are presented as number (%) or mean±standard deviation.

KNHANES, Korea National Health and Nutrition Examination Survey; NHIS, National Health Insurance Service; BMI, body mass index; CKD, chronic kidney disease; HbA1c, glycosylated hemoglobin; HDL, high-density lipoprotein; eGFR, estimated glomerular filtration rate.

Table 3.
Sensitivity, specificity, predictive value, accuracy, and agreement of the operational definition of type 2 diabetes mellitus based on NHIS criteria (algorithm 1) compared to KNHANES criteria as a standard reference
Subgroup Total TP FP TN FN Sensitivity Specificity PPV NPV Accuracy Kappa
Overall 13,006 1,462 480 10,683 381 0.79 (0.77–0.81) 0.96 (0.95–0.96) 0.75 (0.73–0.77) 0.97 (0.96–0.97) 0.93 (0.93–0.94) 0.73 (0.72–0.75)
Age, yr
 40–64 9,200 716 290 7,965 229 0.76 (0.73–0.78) 0.96 (0.96–0.97) 0.71 (0.68–0.74) 0.97 (0.97–0.98) 0.94 (0.94–0.95) 0.70 (0.68–0.73)
 ≥65 3,806 746 190 2,718 152 0.83 (0.81–0.86) 0.93 (0.93–0.94) 0.80 (0.77–0.82) 0.95 (0.94–0.96) 0.91 (0.90–0.92) 0.75 (0.73–0.78)
Sex
 Male 5,865 799 174 4,702 190 0.81 (0.78–0.83) 0.96 (0.96–0.97) 0.82 (0.80–0.85) 0.96 (0.96–0.97) 0.94 (0.93–0.94) 0.78 (0.76–0.80)
 Female 7,141 663 306 5,981 191 0.78 (0.75–0.80) 0.95 (0.95–0.96) 0.68 (0.65–0.71) 0.97 (0.96–0.97) 0.93 (0.92–0.94) 0.69 (0.66–0.71)
Household income
 Q1 2,325 425 116 1,681 103 0.80 (0.77–0.84) 0.94 (0.92–0.95) 0.79 (0.75–0.82) 0.94 (0.93–0.95) 0.91 (0.89–0.92) 0.73 (0.70–0.77)
 Q2–4 10,681 1,037 364 9,002 278 0.79 (0.77–0.81) 0.96 (0.96–0.97) 0.74 (0.72–0.76) 0.97 (0.97–0.97) 0.94 (0.94–0.94) 0.73 (0.71–0.75)
Education, yr
 <9 5,585 870 260 4,236 219 0.80 (0.78–0.82) 0.94 (0.94–0.95) 0.77 (0.75–0.79) 0.95 (0.94–0.96) 0.91 (0.91–0.92) 0.73 (0.71–0.75)
 ≥9 7,421 592 220 6,447 162 0.79 (0.76–0.81) 0.97 (0.96–0.97) 0.73 (0.70–0.76) 0.98 (0.97–0.98) 0.95 (0.94–0.95) 0.73 (0.70–0.75)
Hypertension
 Yes 4,932 909 203 3,603 217 0.81 (0.78–0.83) 0.95 (0.94–0.95) 0.82 (0.79–0.84) 0.94 (0.94–0.95) 0.91 (0.91–0.92) 0.76 (0.74–0.78)
 No 8,074 553 277 7,080 164 0.77 (0.74–0.80) 0.96 (0.96–0.97) 0.67 (0.63–0.70) 0.98 (0.97–0.98) 0.95 (0.94–0.95) 0.69 (0.66–0.71)
Dyslipidemia
 Yes 1,562 203 72 1,212 75 0.73 (0.68–0.78) 0.94 (0.93–0.96) 0.74 (0.69–0.79) 0.94 (0.93–0.95) 0.91 (0.89–0.92) 0.68 (0.63–0.73)
 No 11,444 1,259 408 9,471 306 0.80 (0.78–0.82) 0.96 (0.95–0.96) 0.75 (0.73–0.78) 0.97 (0.97–0.97) 0.94 (0.93–0.94) 0.74 (0.73–0.76)

Values are presented as point estimate (95% confidence interval).

NHIS, National Health Insurance Service; KNHANES, Korea National Health and Nutrition Examination Survey; TP, true positive; FP, false positive; TN, true negative; FN, false negative; PPV, positive predictive value; NPV, negative predictive value; Q1, lowest quartile; Q2–4, second to fourth quartiles.

  • 1. Cascini S, Agabiti N, Davoli M, Uccioli L, Meloni M, Giurato L, et al. Survival and factors predicting mortality after major and minor lower-extremity amputations among patients with diabetes: a population-based study using health information systems. BMJ Open Diabetes Res Care 2020;8:e001355.ArticlePubMedPMC
  • 2. Choi Y, Choi JW. Association of sleep disturbance with risk of cardiovascular disease and all-cause mortality in patients with new-onset type 2 diabetes: data from the Korean NHIS-HEALS. Cardiovasc Diabetol 2020;19:61.
  • 3. Jung I, Kwon H, Park SE, Han KD, Park YG, Kim YH, et al. Increased risk of cardiovascular disease and mortality in patients with diabetes and coexisting depression: a nationwide population-based cohort study. Diabetes Metab J 2021;45:379-89.
  • 4. Kim MK, Han K, Lee SH. Current trends of big data research using the Korean National Health Information Database. Diabetes Metab J 2022;46:552-63.
  • 5. Ko SH, Han K, Lee YH, Noh J, Park CY, Kim DJ, et al. Past and current status of adult type 2 diabetes mellitus management in Korea: a National Health Insurance Service Database Analysis. Diabetes Metab J 2018;42:93-100.
  • 6. Jagannathan R, Neves JS, Dorcely B, Chung ST, Tamura K, Rhee M, et al. The oral glucose tolerance test: 100 years later. Diabetes Metab Syndr Obes 2020;13:3787-805.
  • 7. Lee YH, Han K, Ko SH, Ko KS, Lee KU; Taskforce Team of Diabetes Fact Sheet of the Korean Diabetes Association. Data analytic process of a nationwide population-based study using National Health Information Database established by National Health Insurance Service. Diabetes Metab J 2016;40:79-82.
  • 8. Jo SH, Nam H, Lee J, Park S, Lee J, Kyoung DS. Fenofibrate use is associated with lower mortality and fewer cardiovascular events in patients with diabetes: results of 10,114 patients from the Korean National Health Insurance Service Cohort. Diabetes Care 2021;44:1868-76.
  • 9. Jeong JS, Kim JS, Yeom SW, Lee MG, You YS, Lee YC. Prevalence and comorbidities of bronchiolitis in adults: a population-based study in South Korea. Medicine (Baltimore) 2022;101:e29551.
  • 10. Kim J, Yang PS, Park BE, Kang TS, Lim SH, Cho S, et al. Association of proteinuria and incident atrial fibrillation in patients with diabetes mellitus: a population-based senior cohort study. Sci Rep 2021;11:17013.
  • 11. Hong JS, Kang HC. Body mass index and all-cause mortality in patients with newly diagnosed type 2 diabetes mellitus in South Korea: a retrospective cohort study. BMJ Open 2022;12:e048784.
  • 12. Kim JE, Choi J, Park J, Shin A, Choi NK, Choi JY. Effects of menopausal hormone therapy on cardiovascular diseases and type 2 diabetes in middle-aged postmenopausal women: analysis of the Korea National Health Insurance Service Database. Menopause 2021;28:1225-32.
  • 13. Lee CJ, Hwang J, Lee YH, Oh J, Lee SH, Kang SM, et al. Blood pressure level associated with lowest cardiovascular event in hypertensive diabetic patients. J Hypertens 2018;36:2434-43.
  • 14. Lee SE, Kim KA, Son KJ, Song SO, Park KH, Park SH, et al. Trends and risk factors in severe hypoglycemia among individuals with type 2 diabetes in Korea. Diabetes Res Clin Pract 2021;178:108946.
  • 15. Bae JH, Han KD, Ko SH, Yang YS, Choi JH, Choi KM, et al. Diabetes fact sheet in Korea 2021. Diabetes Metab J 2022;46:417-26.
  • 16. Kim J, Bae YJ, Lee JW, Kim YS, Kim Y, You HS, et al. Metformin use in cancer survivors with diabetes reduces all-cause mortality, based on the Korean National Health Insurance Service between 2002 and 2015. Medicine (Baltimore) 2021;100:e25045.
  • 17. Park JH, Ha KH, Kim BY, Lee JH, Kim DJ. Trends in cardiovascular complications and mortality among patients with diabetes in South Korea. Diabetes Metab J 2021;45:120-4.
  • 18. Khokhar B, Jette N, Metcalfe A, Cunningham CT, Quan H, Kaplan GG, et al. Systematic review of validated case definitions for diabetes in ICD-9-coded and ICD-10-coded data in adult populations. BMJ Open 2016;6:e009952.
  • 19. Kweon S, Kim Y, Jang MJ, Kim Y, Kim K, Choi S, et al. Data resource profile: the Korea National Health and Nutrition Examination Survey (KNHANES). Int J Epidemiol 2014;43:69-77.
  • 20. Lee HY, Shin J, Kim GH, Park S, Ihm SH, Kim HC, et al. 2018 Korean Society of Hypertension guidelines for the management of hypertension: part II-diagnosis and treatment of hypertension. Clin Hypertens 2019;25:20.
  • 21. Rhee EJ, Kim HC, Kim JH, Lee EY, Kim BJ, Kim EM, et al. 2018 Guidelines for the management of dyslipidemia. Korean J Intern Med 2019;34:723-71.
  • 22. Levin A, Stevens PE. Summary of KDIGO 2012 CKD guideline: behind the scenes, need for guidance, and a framework for moving forward. Kidney Int 2014;85:49-61.
  • 23. Dufour MC. What is moderate drinking?: defining “drinks” and drinking levels. Alcohol Res Health 1999;23:5-14.
  • 24. Rosenberg DE, Bull FC, Marshall AL, Sallis JF, Bauman AE. Assessment of sedentary behavior with the International Physical Activity Questionnaire. J Phys Act Health 2008;5 Suppl 1:S30-44.
  • 25. Simundic AM. Measures of diagnostic accuracy: basic definitions. EJIFCC 2009;19:203-11.
  • 26. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159-74.
  • 27. Chiu CJ, Huang HM, Lu TH, Wang YW. National health data linkage and the agreement between self-reports and medical records for middle-aged and older adults in Taiwan. BMC Health Serv Res 2018;18:917.
  • 28. Nishioka Y, Takeshita S, Kubo S, Myojin T, Noda T, Okada S, et al. Appropriate definition of diabetes using an administrative database: a cross-sectional cohort validation study. J Diabetes Investig 2022;13:249-55.
  • 29. Lipscombe LL, Hwee J, Webster L, Shah BR, Booth GL, Tu K. Identifying diabetes cases from administrative data: a population-based validation study. BMC Health Serv Res 2018;18:316.
