ABSTRACT
Background
- Diabetic kidney disease (DKD) progresses to end-stage renal disease more rapidly than other forms of chronic kidney disease because of persistent hyperglycemia and early activation of multiple pathogenic pathways. Early detection of DKD is crucial to identify subtle kidney damage before clinical symptoms appear.
Methods
- This study combined human serum proteomics with public single-cell RNA sequencing and spatial transcriptomics data from diabetic kidneys to identify key biomarkers for DKD diagnosis. These biomarkers were validated in multiple organs of db/db mice at early and advanced stages. In a discovery cohort, sera from 173 healthy adults and 444 patients with type 2 diabetes mellitus (T2DM), with or without kidney disease, were analyzed using metabolomics and enzyme-linked immunosorbent assay (ELISA). Multiple machine learning algorithms were applied to integrate synergistic biomarkers and serum metabolites for early DKD detection, and the results were validated in 435 participants from four independent clinical cohorts.
Results
- Matrix metalloproteinase-7 (MMP-7) and tenascin C (TNC) were elevated in human diabetic kidneys at the single-cell and spatial levels. Proteomics indicated upregulation of serum amyloid A1 (SAA1) and TNC in the sera of DKD patients. In db/db mice, all three biomarkers were increased in multiple organs by 18 weeks of age. In DKD patient sera, MMP-7 and TNC levels were consistently elevated across cohorts. The new models combining MMP-7, SAA1, and TNC improved early-stage DKD detection, and adding serum metabolites improved accuracy by about 13% in distinguishing early-stage from advanced-stage DKD.
Conclusion
- Integrating synergistic biomarkers with serum metabolomics enhances early detection of DKD, potentially improving outcomes by slowing disease progression in T2DM patients.
Keywords: Biomarkers; Diabetes mellitus, type 2; Diabetic nephropathies; Machine learning; Metabolomics
GRAPHICAL ABSTRACT
Highlights
- Multi-center cohorts reveal dynamic MMP-7/SAA1/TNC changes across DKD stages.
- Machine learning improves DKD stage prediction with protein biomarkers.
- Integrating metabolites further enhances DKD prediction accuracy.
INTRODUCTION
- In 2021, an estimated 537 million adults were living with diabetes worldwide [1], the vast majority with type 2 diabetes mellitus (T2DM). This alarming figure is set to rise further, with projections of 643 million cases by 2030 and 783 million by 2045 if current trends continue [1]. Among the numerous complications of T2DM, diabetic kidney disease (DKD) is one of the most severe and prevalent [2]. DKD is not only the leading cause of chronic kidney disease (CKD) and end-stage renal disease (ESRD) but also substantially increases the risk of cardiovascular events and mortality among diabetic patients [3,4]. However, a key challenge in DKD management is the early and precise identification of kidney damage.
- The progression of DKD is particularly aggressive, driven by relentless hyperglycemia and the early involvement of harmful pathways such as hyperfiltration, inflammation, and oxidative stress [4,5]. These factors accelerate the decline in kidney function, often outpacing the progression seen in CKD from primary glomerular diseases. This rapid decline underscores the critical need for early DKD diagnosis to detect subtle kidney damage before it becomes irreversible. In clinical practice, DKD is typically screened and monitored through the urinary albumin-to-creatinine ratio (UACR), estimated glomerular filtration rate (eGFR), and, in some cases, renal biopsy. Although valuable, these methods often miss early-stage disease because initial changes are subtle, so diagnosis frequently occurs only after significant progression. Furthermore, not all T2DM patients with abnormal albuminuria or impaired renal function progress to ESRD, highlighting the complexity and variability of DKD progression [6]. This variability has driven a decades-long search for novel blood and urine biomarkers closely linked to DKD pathogenesis [7–9]. Notably, biomarkers associated with extracellular matrix (ECM) turnover, inflammation, and oxidative stress in DKD progression, such as matrix metalloproteinase-7 (MMP-7) [10], serum amyloid A1 (SAA1) [11], and tenascin C (TNC) [12], have emerged as promising tools for improving diagnostic and prognostic accuracy. Accumulating evidence over the past decades, including our own findings, has shown that these proteins are consistently upregulated and play a prominent role in the pathogenesis and progression of various non-diabetic kidney diseases [13–17]. However, the ability of these biomarkers, alone or in combination, to predict DKD early remains unproven in robust clinical trials.
- In addition, serum metabolomics has emerged as a powerful complementary approach for DKD diagnosis [18–20]. It involves the comprehensive analysis of small molecules in biological samples and can detect subtle metabolic changes that precede overt clinical symptoms. Using metabolomics, our previous study profiled the landscape of serum metabolites across different stages of DKD [18], but these metabolites performed unsatisfactorily in predicting the onset of kidney damage in DKD. In this study, by integrating serum metabolomics with established biomarkers, we aimed to develop robust diagnostic machine learning models that can detect DKD at its earliest stages. This integrated approach is intended to enhance the predictive power of conventional biomarkers and enable earlier intervention, ultimately improving outcomes for DKD patients.
METHODS
- Cohort recruitment
- Five cross-sectional cohorts of DKD patients were established, including one discovery cohort in the primary center (the Affiliated Hospital of Nanjing University of Chinese Medicine) and four external testing cohorts in the 3rd Xiangya Hospital of the Central South University (C.S.), Tongji Hospital of Tongji University (T.J.), Nanfang Hospital (N.F.), and Zhongda Hospital (Z.D.). Of note, the discovery cohort (173 healthy controls [HCs], 277 T2DM, 98 diabetic kidney disease at an early stage [DKD-E], and 69 diabetic kidney disease at an advanced stage [DKD-A] inpatients enrolled from the primary center) and two external cross-sectional cohorts, C.S. (29 HCs, 29 T2DM, 20 DKD-E, and 21 DKD-A) and T.J. (34 HCs, 39 T2DM, 25 DKD-E, and 24 DKD-A), were established in our previous study [18]. N.F. (30 HCs, 17 T2DM, 4 DKD-E, and 33 DKD-A) and Z.D. (79 T2DM, 36 DKD-E, and 15 DKD-A) cohorts were newly recruited for the current study. The same inclusion and exclusion criteria were applied as we previously reported [18].
- Ethics compliance statement
- This investigator-initiated clinical trial was conducted in accordance with the principles of the 1975 Declaration of Helsinki. Approval was obtained from the Ethics Committees of the First Affiliated Hospital of Nanjing University of Traditional Chinese Medicine (2019NL-109-02). The clinical trial was registered with the Chinese Clinical Trial Registry (ChiCTR 2000028949). In this cross-sectional study, additional serum samples were collected after routine clinical biochemical tests. All participants provided informed consent for the academic use of these extra samples.
- Inclusion/exclusion criteria
- Inclusion criteria were (1) age between 20 and 75 years; (2) fulfillment of the diagnostic criteria for T2DM (all patients) with a disease duration of more than 10 years; (3) diagnosis of DKD based on the diagnostic criteria for microalbuminuria or macroalbuminuria; (4) eGFR ≥90 mL/min/1.73 m2 for the T2DM group and eGFR >30 mL/min/1.73 m2 for both the microalbuminuria and macroalbuminuria groups; (5) blood pressure <140/90 mm Hg; and (6) signed informed consent. Exclusion criteria were (1) primary kidney disease with a definite diagnosis; (2) other systemic diseases known to cause proteinuria; (3) acute diabetic complications or urinary tract infection within the past month; (4) coexisting serious primary diseases of the cardiovascular, cerebrovascular, hepatic, renal, or hematopoietic systems, or malignancy; (5) mental illness precluding cooperation; (6) pregnancy, lactation, or planned pregnancy; (7) current menstruation; and (8) participation in another clinical trial within the past month.
- Public data mining
- This study analyzed data from two published datasets (GSE131882 and GSE261545) [21–23]. Data from GSE131882 were obtained from the Gene Expression Omnibus (GEO) database, and the single-nucleus RNA-seq data were processed using the Seurat R package (Satija Lab, New York, NY, USA) [24,25]. Based on the gene-by-cell expression matrix, dimension reduction and cell visualization were performed using the uniform manifold approximation and projection (UMAP) algorithm, where UMAP_1 and UMAP_2 represent the first and second reduced dimensions. Each dot on the UMAP plot represents an individual cell, colored by either library or marker expression level. Data from GSE261545, also obtained from GEO, comprise spatial transcriptomics data from the kidney of a patient with CKD due to diabetic and hypertensive nephropathy [23]. Raw data were pre-processed using the Seurat R package, and spatial feature plots were used to visualize gene expression within the tissue slide.
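- Dimension reduction and UMAP in a Seurat workflow are usually preceded by per-cell normalization. The sketch below reimplements Seurat's default LogNormalize step in plain Python to illustrate it. Each count is divided by the cell's total count, multiplied by a scale factor of 10,000, and log1p-transformed. The function name, the dictionary input format, and the toy counts are hypothetical. The exact Seurat parameters used in this study are not reported here.

```python
import math


def log_normalize(counts_by_cell, scale_factor=10_000):
    """Per-cell log-normalization in the style of Seurat's default LogNormalize.

    counts_by_cell maps a cell barcode to {gene: raw count}. Each count is
    divided by that cell's total, multiplied by scale_factor, and transformed
    with the natural log1p.
    """
    normalized = {}
    for cell, gene_counts in counts_by_cell.items():
        total = sum(gene_counts.values())
        if total == 0:
            # An empty cell carries no information; keep it as all zeros.
            normalized[cell] = {g: 0.0 for g in gene_counts}
            continue
        normalized[cell] = {
            g: math.log1p(c / total * scale_factor) for g, c in gene_counts.items()
        }
    return normalized


# Hypothetical toy counts for two nuclei (not data from GSE131882).
toy = {
    "cell_1": {"MMP7": 5, "TNC": 0, "LRP2": 95},
    "cell_2": {"MMP7": 0, "TNC": 3, "LRP2": 7},
}
norm = log_normalize(toy)
```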
- Human kidney biopsy specimens
- Human kidney specimens were obtained from diagnostic renal biopsies performed at the Affiliated Hospital of Nanjing University of Chinese Medicine. Non-tumor kidney tissue samples from patients who had renal cell carcinoma and underwent nephrectomy were used as normal controls. All studies involving human kidney sections were approved by the Institutional Review Board at the Affiliated Hospital of Nanjing University of Chinese Medicine. Immunohistochemical staining was performed using human samples as previously described [26]. The primary antibodies used were anti-MMP-7 (GTX11716, GeneTex, Irvine, CA, USA), anti-TNC (ab108930), and anti-SAA1 (ab199030, Abcam Inc., Cambridge, MA, USA).
- Mouse models and Western blot analysis
- Heart, kidney, and liver tissues from male, 18-week-old and 24-week-old lean control and BKS-Leprem2Cd479/Nju (db/db) mice were provided by Dr. Yuanyuan Wang at Guizhou Medical University (Guiyang, China). The db/db mice were originally purchased as clean-grade 8-week-old animals from the Institute of Model Animals, Nanjing University. These mice are commonly used as a model of T2DM and its organ complications because of their obesity, hyperglycemia, and insulin resistance. All mice were housed in a temperature-controlled room (22°C) under a 12-hour light/12-hour dark cycle with free access to water and food.
- The harvested tissues were lysed on ice in radioimmunoprecipitation assay (RIPA) buffer containing 1% NP-40, 0.1% sodium dodecyl sulfate (SDS), 100 μg/mL phenylmethanesulfonyl fluoride (PMSF), 1% protease inhibitor cocktail, and 1% phosphatase I and II inhibitor cocktail (Cell Signaling Technology, Danvers, MA, USA) in phosphate-buffered saline (PBS). Supernatants were collected after centrifugation at 13,000 ×g at 4°C for 15 minutes. Protein expression was analyzed by Western blot as described previously [27]. Briefly, samples were separated by 12% SDS-polyacrylamide gel electrophoresis and electrotransferred to a polyvinylidene difluoride (PVDF) membrane (Microporous, Piney Flats, TN, USA). The membrane was then blocked for 2 hours and probed with primary antibody at 4°C for 16 hours. The chemiluminescence signal was detected after incubation with the secondary antibody for 1 hour. Band intensities were analyzed using ImageJ software (National Institutes of Health, Bethesda, MD, USA). The primary antibodies used were anti-MMP-7 (GTX11716, GeneTex), anti-TNC (ab108930), anti-SAA1 (ab207445, Abcam Inc.), and anti-β-actin (sc-130065, Santa Cruz Biotechnology, Dallas, TX, USA).
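- ImageJ densitometry of this kind is commonly reduced to a loading-normalized fold change. The minimal sketch below assumes that each target band (for example, MMP-7) is divided by the β-actin band from the same lane and then scaled to the mean of the lean-control lanes. This normalization scheme, the function name, and the band intensities are illustrative assumptions, not the authors' reported procedure.

```python
from statistics import mean


def relative_expression(target, loading, control_idx):
    """Fold change of target/loading ratios relative to the control-lane mean.

    target and loading hold one band intensity per lane (e.g. MMP-7 and
    beta-actin); control_idx lists the lanes used as the reference group.
    """
    if len(target) != len(loading):
        raise ValueError("target and loading must have one value per lane")
    ratios = [t / l for t, l in zip(target, loading)]  # loading-normalized
    control_mean = mean(ratios[i] for i in control_idx)
    return [r / control_mean for r in ratios]


# Hypothetical band intensities: lanes 0-1 lean controls, lanes 2-3 db/db.
mmp7 = [1000.0, 1200.0, 3000.0, 3300.0]
actin = [2000.0, 2000.0, 2000.0, 2200.0]
fold = relative_expression(mmp7, actin, control_idx=[0, 1])
```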
- Serum proteomic analysis
- As described in our previous study [18], proteomics data were collected across 12 libraries, with three libraries for each of the four groups: HC, T2DM, DKD-E, and DKD-A. After filtering out missing values, 581 proteins were ultimately used for analysis.
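- The rule used to filter missing values is not specified. The sketch below assumes that a protein is kept only when its missing fraction across the 12 libraries does not exceed a tolerance. The default tolerance is zero, which keeps complete cases only. The function name, the tolerance parameter, and the toy abundances are hypothetical.

```python
def filter_missing(abundance, max_missing_fraction=0.0):
    """Keep proteins whose fraction of missing values (None) across libraries
    does not exceed max_missing_fraction; 0.0 keeps complete cases only."""
    kept = {}
    for protein, values in abundance.items():
        n_missing = sum(v is None for v in values)
        if n_missing / len(values) <= max_missing_fraction:
            kept[protein] = values
    return kept


# Hypothetical abundances over 12 libraries (3 each for HC, T2DM, DKD-E, DKD-A).
toy = {
    "SAA1": [1.0] * 12,
    "TNC": [1.2] * 11 + [None],
    "PROT_X": [None] * 6 + [0.8] * 6,
}
complete = filter_missing(toy)
```

With a tolerance of 0.1, a protein missing in one of the 12 libraries (1/12 ≈ 0.083) would also be retained.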
- Serum metabolomics
- Serum samples were thawed on ice, and 40 μL of serum was mixed with 225 μL of ice-cold methanol and vortexed. After 750 μL of cold methyl tert-butyl ether (MTBE) was added, the mixture was shaken at 4°C for 10 minutes. Next, 188 μL of liquid chromatography-mass spectrometry (LC-MS)-grade water was added, followed by centrifugation at 14,000 rcf for 2 minutes at 4°C. The lower layer was transferred, mixed with 750 μL of methanol (1:1), shaken for 10 minutes, and centrifuged again. The supernatant (475 μL) was dried in a SpeedVac at 45°C for 2 hours. Dried aliquots were combined with an internal standard and methoxyamine hydrochloride, vortexed, and shaken. Bis-(trimethylsilyl)trifluoroacetamide (BSTFA) was then added, and the mixture was shaken before being transferred to a vial for gas chromatography-mass spectrometry (GC-MS) analysis. Quality control (QC) samples were prepared by pooling aliquots from all samples, and a QC sample was run every 10 injections. The analysis was performed on a Trace 1310 GC coupled to a TSQ 8000 mass spectrometer (Thermo Fisher, Waltham, MA, USA) [28]. A standard n-alkane mixture was used to correct retention time shifts. The extraction of metabolomics raw data was previously reported [18]. In total, 349 metabolites were identified in the discovery cohort. After raw data pre-processing and removal of exogenous metabolites, 207 metabolites were retained for further analysis.
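- One common way to correct retention-time shifts with an n-alkane standard is to convert each peak's retention time into a linear retention index. This is the van den Dool and Kratz formulation for temperature-programmed GC, and the index is then compared across runs instead of raw times. Whether this exact conversion was used here is an assumption. The sketch below, with a hypothetical alkane ladder and peak, only illustrates the calculation.

```python
from bisect import bisect_right


def retention_index(rt, alkanes):
    """Linear retention index (van den Dool and Kratz) for temperature-programmed GC.

    alkanes: (carbon_number, retention_time) pairs for the n-alkane standard,
    sorted by retention time. Requires first_time <= rt < last_time.
    """
    times = [t for _, t in alkanes]
    i = bisect_right(times, rt) - 1  # alkane eluting at or just before rt
    if i < 0 or i >= len(alkanes) - 1:
        raise ValueError("retention time lies outside the alkane ladder")
    (n, t_n), (n_next, t_next) = alkanes[i], alkanes[i + 1]
    # Interpolate linearly between the bracketing alkanes; 100 units per carbon.
    return 100 * (n + (n_next - n) * (rt - t_n) / (t_next - t_n))


# Hypothetical ladder (carbon number, minutes) and a hypothetical peak at 6.6 min.
ladder = [(10, 5.0), (11, 6.0), (12, 7.2)]
ri = retention_index(6.6, ladder)
```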
- Enzyme-linked immunosorbent assay and machine learning prediction
- Enzyme-linked immunosorbent assay (ELISA) kits for human MMP-7, SAA1, and TNC were purchased from R&D Systems (Minneapolis, MN, USA) or Immuno-Biological Laboratories (Fujioka, Japan). The assays used a quantitative sandwich enzyme immunoassay technique. Microplates were pre-coated with monoclonal antibodies specific to these proteins. Standards and samples were added to the wells, where the target proteins bound to the immobilized antibodies. After unbound substances were washed away, an enzyme-linked polyclonal antibody specific to the proteins was added. Following a final wash, a substrate solution was added, and color developed in proportion to the protein levels. The reaction was stopped, and color intensity was measured, with serum protein levels expressed in nanograms per milliliter. For pairwise comparisons, the associations between individual ELISA markers and disease status were analyzed using receiver operating characteristic (ROC) curves, and the area under the curve (AUC) was calculated using the R package ‘pROC’ [29]. For prediction using two or three ELISA markers, 5-fold cross-validation (CV) was first performed on the primary-center data using four machine learning methods: linear discriminant analysis (LDA), support vector machine (SVM), random forest (RF), and logistic regression (Logi) [30], implemented with the R packages ‘MASS,’ ‘e1071,’ and ‘randomForest.’ Next, to evaluate performance on new samples, the models optimized in CV were retrained on the full primary-center dataset and then applied to the four external centers.
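- For a single marker, the empirical ROC AUC equals the probability that a randomly chosen positive case has a higher value than a randomly chosen negative case, with ties counted as one half. This is the Mann-Whitney U statistic divided by the product of the group sizes. The sketch below computes this quantity directly and assumes that higher marker values indicate the positive (more advanced) group. pROC, by contrast, can also infer the direction automatically. The marker values are hypothetical.

```python
def roc_auc(scores_pos, scores_neg):
    """Empirical ROC AUC as P(positive score > negative score), ties counted 1/2.

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg.
    Assumes higher scores indicate the positive group.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical serum MMP-7 values (ng/mL), not study data.
dkd_e = [6.1, 7.4, 5.9, 8.2]
t2dm = [4.8, 5.9, 6.0, 5.1]
auc = roc_auc(dkd_e, t2dm)
```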
- Prediction evaluation
- Prediction results were compared with the true disease status of the patients. A true positive (TP) is a case correctly predicted as positive; a true negative (TN) is a case correctly predicted as negative; a false positive (FP) is a case wrongly predicted as positive; and a false negative (FN) is a case wrongly predicted as negative. Accuracy, sensitivity, specificity, and Youden index are defined as follows:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Sensitivity} = \frac{TP}{TP + FN}$$
$$\text{Specificity} = \frac{TN}{TN + FP}$$
$$\text{Youden index} = \text{Sensitivity} + \text{Specificity} - 1$$
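- These standard definitions translate directly into code. The sketch below computes the four metrics from predicted and true labels. It assumes labels are coded as 1 for the positive (more advanced) group and 0 otherwise. The function name and example labels are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and Youden index from 0/1 labels,
    where 1 denotes the positive (more advanced) disease state."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden": sensitivity + specificity - 1,
    }


# Hypothetical labels: 3 TP, 1 FN, 3 TN, 1 FP.
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 1, 0, 0, 1])
```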
- Integration of ELISA and metabolomics data
- ELISA and metabolite features were pooled for pairwise and multi-outcome prediction analyses. In the primary center, a total of 133 healthy, 276 T2DM, 49 DKD-E, and 68 DKD-A serum specimens had both ELISA and metabolomics markers measured. The same machine learning algorithms were used as described for the ELISA data analysis. The top differentially expressed metabolites were selected as described in our previous study [18]. For all pairwise and multi-outcome predictions, the three ELISA markers, the differentially expressed metabolites, or both feature sets were integrated by the machine learning classifiers in 5-fold CV.
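- The 5-fold CV on the integrated feature set can be sketched as follows. This version assumes stratified folds, with each disease group spread evenly across folds, and simple concatenation of ELISA and metabolite values. Neither detail is stated in the methods, and the helper names are hypothetical. The group sizes are those of the primary-center subset described in this paragraph.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Split sample indices into k folds, spreading each class evenly.

    Indices of each class are shuffled and dealt round-robin across folds;
    a running offset carries over between classes so fold sizes stay balanced.
    Returns a list of k lists of test indices.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        idxs = by_class[label][:]
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            folds[(offset + j) % k].append(idx)
        offset += len(idxs)
    return folds


def combine_features(elisa_values, metabolite_values):
    """Concatenate the ELISA markers and selected metabolites into one vector."""
    return list(elisa_values) + list(metabolite_values)


# Group sizes of the primary-center subset with both ELISA and metabolomics data.
labels = ["HC"] * 133 + ["T2DM"] * 276 + ["DKD-E"] * 49 + ["DKD-A"] * 68
folds = stratified_kfold(labels, k=5)
for i, test_idx in enumerate(folds):
    train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
    # A classifier would be fitted on train_idx and scored on test_idx here.
    assert not set(train_idx) & set(test_idx)  # train and test never overlap
```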
- Association between ELISA and clinical features
- To test the association of the clinical features with disease status, ROC curves and AUC values were analyzed using the R package ‘pROC’ [29]. Additionally, linear regression models were fitted to examine the association between the ELISA markers and the clinical features that were highly correlated with DKD:
$$Y = \alpha X_{\text{MMP-7}} + \beta X_{\text{SAA1}} + \gamma X_{\text{TNC}} + \lambda + \varepsilon$$
- Here, Y denotes a clinical feature. The clinical features include blood glucose, fasting C-peptide (FC), glycosylated hemoglobin (HbA1c), UACR, urinary microalbumin (UmALB), eGFR, total cholesterol (TC), triglycerides (TG), low-density lipoprotein (LDL), high-density lipoprotein (HDL), and albumin. α, β, and γ are the coefficients for the three ELISA markers, λ is the intercept term, and ε is the error term. Higher similarity between the predicted and true values indicates a stronger association of the ELISA markers with these clinical features.
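- A regression of this form can be fitted by ordinary least squares. The plain-Python sketch below solves the normal equations for one clinical feature regressed on the three ELISA markers plus an intercept. Treating the fourth coefficient as an intercept is an assumption. The helper names and the noise-free toy data are hypothetical and serve only to show that the fit recovers known coefficients.

```python
def ols_fit(X, y):
    """Fit y = b0 + b1*x1 + ... + bp*xp by ordinary least squares.

    Solves the normal equations (X'X) b = X'y with Gauss-Jordan elimination
    and partial pivoting. Returns [b0, b1, ..., bp]; b0 is the intercept
    (lambda in the model), b1..b3 correspond to alpha, beta, gamma.
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    for col in range(p):
        pivot = max(range(col, p), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for k in range(p):
            if k != col:
                factor = A[k][col] / A[col][col]
                A[k] = [a - factor * b for a, b in zip(A[k], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(coefs, x):
    """Predicted clinical value for one participant's marker vector x."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))


# Hypothetical, noise-free toy data: columns are MMP-7, SAA1, TNC.
X = [(1, 2, 3), (2, 1, 0), (3, 5, 1), (4, 2, 2), (5, 3, 7), (6, 1, 4)]
y = [2 + 0.5 * a - 1.0 * b + 0.25 * c for a, b, c in X]
coefs = ols_fit(X, y)
```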
- Statistics
- All statistical analyses and data visualization were performed in R with the corresponding packages. For the animal studies, data are expressed as mean±standard error of the mean, and statistical analysis was performed using GraphPad Prism 9 (GraphPad Software, San Diego, CA, USA). Comparisons between two groups were made using a two-tailed Student’s t-test, or the rank-sum test if data failed a normality test. Results are presented in dot plots, with dots denoting individual values. Where applicable, P≤0.05 (or false discovery rate [FDR]=5% for multiple hypothesis testing) was used to define significance. For the metabolomics subgroup, three comparisons were made per ELISA marker: HC vs. T2DM, T2DM vs. DKD-E, and DKD-E vs. DKD-A. A Bonferroni adjustment for each ELISA marker was applied when calculating the sample size. To control the overall type I error rate at 10% across the three comparisons, the type I error rate per pairwise test was set to 0.1/3≈0.033. Sample size estimation indicated that a minimum of 23 samples per group was required to achieve 80% statistical power with an adjusted type I error rate (α) of 0.033, using the non-parametric Mann-Whitney U test (Wilcoxon rank-sum test).
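- The Bonferroni arithmetic and a common approximation for Mann-Whitney sample size can be sketched as follows. First, the per-group n for a two-sided two-sample t-test is computed as n = 2 (z_{1-α/2} + z_{power})^2 / d^2. This n is then divided by the asymptotic relative efficiency of the Wilcoxon test under normality (3/π). The standardized effect size d used in this study is not reported, so effect_size_d below is a hypothetical input. The sketch is therefore not expected to reproduce the figure of 23 per group.

```python
import math
from statistics import NormalDist


def bonferroni_alpha(family_alpha, n_comparisons):
    """Per-test type I error rate under a Bonferroni adjustment."""
    return family_alpha / n_comparisons


def n_per_group_mann_whitney(effect_size_d, alpha, power=0.8):
    """Approximate per-group n for a two-sided Mann-Whitney U test.

    Uses the normal-approximation t-test formula and inflates it by the
    Wilcoxon asymptotic relative efficiency under normality (3/pi).
    effect_size_d is a hypothetical standardized mean difference.
    """
    z = NormalDist()
    n_t = 2 * (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) ** 2 / effect_size_d ** 2
    return math.ceil(n_t / (3 / math.pi))


alpha = bonferroni_alpha(0.1, 3)  # about 0.033 per pairwise test
n_small_effect = n_per_group_mann_whitney(0.5, alpha)
n_large_effect = n_per_group_mann_whitney(1.0, alpha)
```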
RESULTS
- Multi-omics reveal the expression patterns of MMP-7, SAA1, and TNC in DKD patients
- During DKD onset and progression, inflammatory and fibrotic processes, in addition to metabolic and hemodynamic factors, contribute significantly to the progressive deterioration of kidney function [4,5]. Several circulating biomarkers have been recognized as tightly linked to DKD pathogenesis. For example, serum and urine levels of the matrix metalloproteinase MMP-7 are increased in DKD and correlate with both cardiac function and renal filtration [10,31,32]; the ECM glycoprotein TNC is elevated and positively correlated with both UACR and hypertension [33]; and SAA predicts an increased risk of death and progression to ESRD in DKD patients [34,35]. These biomarkers therefore offer potential clues for predicting DKD before significant renal damage occurs. However, the predictive value of any single circulating biomarker in identifying early DKD remains a concern [36,37]. To address this issue, we first mined public data from early human DKD kidneys at the single-cell and spatial levels. As shown in Supplementary Fig. 1A–C, single-nucleus RNA sequencing (GSE131882) showed that MMP-7 and TNC are induced in the kidneys of patients with early DKD [21]. MMP-7 was mainly produced by proximal tubules, whereas TNC was distributed in pericytes, endothelial cells, and podocytes. In a separate histopathology-based analysis of spatial transcriptomics data generated on the 10X Genomics Visium platform from a human CKD kidney caused by diabetes and hypertension (GSE261545) [23], we also observed significantly increased MMP7 and slightly induced TNC (Supplementary Fig. 1D). Interestingly, in our serum tandem mass tag (TMT)-labeled proteomics [18], SAA1 and TNC were increased at the early stage of DKD, whereas SAA1 unexpectedly showed a downward trend in T2DM patients (Supplementary Fig. 1E).
Given the differential expression patterns of these three markers in the diseased kidneys and circulation of DKD patients, we further validated their expression in kidney biopsy specimens from DKD patients. As shown in Supplementary Fig. 1F, immunohistochemical staining showed that MMP-7 and SAA1 were markedly induced in diseased tubules, whereas TNC was substantially induced in the interstitial compartment.
- Global regulation of MMP-7, SAA1, and TNC in multiple organs of db/db mice after kidney damage occurs
- Because MMP-7, SAA1, and TNC are secreted proteins, we next evaluated whether they could serve as circulating biomarkers for predicting DKD onset and whether they are regulated systemically as DKD progresses. Identifying robust circulating markers with multi-organ involvement could substantially improve early detection and risk stratification of DKD in clinical practice, especially in patients without clear symptoms at early stages. Given the limited availability of human tissue samples from patients with diabetes, we assessed their expression in multiple target organs of lean controls and db/db mice at early (18 weeks of age) and advanced (24 weeks of age) stages after kidney damage occurs. As shown in Supplementary Fig. 2A–C, Western blot assays demonstrated that, at both 18 and 24 weeks of age, all three biomarkers were upregulated in all examined T2DM target organs, including the heart, kidney, and liver, of db/db mice compared with lean controls. The densitometry data are presented in Supplementary Fig. 2D. These results support the feasibility of using circulating MMP-7, SAA1, and TNC for predicting DKD in the clinic.
- DKD patient cohort dynamics and multidimensional assessment
- To further assess the capacity of MMP-7, SAA1, and TNC for early DKD prediction, we added two new clinical cohorts to our previously established cohorts [18]. In total, 1,052 serum samples were collected from five independent medical centers, comprising one primary center and four sub-centers (designated C.S., T.J., N.F., and Z.D.), as illustrated in Fig. 1. Among the sub-centers, the N.F. and Z.D. cohorts were newly recruited. In addition, 34 serum samples from healthy adults were newly collected for the T.J. cohort. In each cohort, four groups of participants were enrolled: HC, T2DM patients (UACR <30 mg/g), DKD patients at the early stage (DKD-E, 30≤ UACR ≤300 mg/g), and DKD patients at the advanced stage (DKD-A, UACR >300 mg/g), according to the albuminuria categories defined by the Kidney Disease: Improving Global Outcomes (KDIGO) Diabetes Work Group [38,39]. Of note, all T2DM patients recruited in this study had been clinically diagnosed for at least 10 years. The 617 participants recruited from the primary center were designated as the discovery cohort. The 435 participants recruited from the four sub-centers (C.S., T.J., N.F., and Z.D.) were designated as independent external testing cohorts. The baseline demographic and clinical characteristics of the discovery cohort are described in Table 1. The discovery cohort comprised 323 males and 294 females, aged 20 to 75 years, with body mass index ranging from 18.42 to 39.38 kg/m2. As presented in Table 1, HbA1c, blood glucose, TC, TG, LDL, and HDL levels differed little among the enrolled patients. Compared with T2DM and DKD-E patients, DKD-A patients had significantly lower albumin and eGFR and higher blood urea nitrogen (BUN), serum creatinine (Scr), and UACR.
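- The albuminuria-based group assignment can be expressed as a simple rule. The sketch below maps UACR (mg/g) onto the study groups using the KDIGO cut-points (A1 <30, A2 30 to 300, A3 >300). It deliberately ignores the other inclusion criteria, such as eGFR and diabetes duration. Healthy-control status is treated as a separate input flag, which is a hypothetical simplification.

```python
def study_group(uacr_mg_per_g, healthy_control=False):
    """Map UACR (mg/g) to the study group labels using KDIGO albuminuria cut-points.

    healthy_control is a hypothetical flag: HC status is not defined by UACR.
    Other inclusion criteria (eGFR, diabetes duration) are not checked here.
    """
    if healthy_control:
        return "HC"
    if uacr_mg_per_g < 30:  # KDIGO A1: normal to mildly increased
        return "T2DM"
    if uacr_mg_per_g <= 300:  # KDIGO A2: moderately increased
        return "DKD-E"
    return "DKD-A"  # KDIGO A3: severely increased
```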
- Newly developed machine learning algorithms optimize the combination of serum MMP-7, SAA1, and TNC in predicting early DKD
- To evaluate the ability of these biomarkers to distinguish DKD stages in the clinic, we first measured serum MMP-7, SAA1, and TNC levels in the discovery cohort using ELISA. As shown in Fig. 2A–C, compared with HCs, MMP-7 increased progressively with DKD progression. Interestingly, levels of SAA1, an acute-phase inflammatory marker, were unexpectedly decreased in T2DM patients, consistent with our proteomics findings presented in Supplementary Fig. 1E. However, SAA1 levels remained unchanged as kidney damage progressed, regardless of whether the disease was at an early or advanced stage. In comparison, TNC was initially upregulated in T2DM patients, then decreased at the early stage of DKD and increased again at the advanced stage. Partial least squares regression of these three proteins showed that they could separate the four groups (Fig. 2D). We then applied these three markers to predict DKD stages. Five-fold CV was performed for all pairwise predictions using four machine learning algorithms: LDA, SVM, RF, and logistic regression. As illustrated by the ROC curves in Fig. 2E, the four algorithms performed comparably in distinguishing HC vs. T2DM and DKD-E vs. DKD-A. However, for T2DM vs. DKD-E, LDA and logistic regression outperformed SVM and RF. Compared with individual biomarkers, combinations of two or three markers achieved higher AUC and accuracy values in disease prediction. Specifically, combining MMP-7 and SAA1 yielded predictive accuracies of 0.69 to 0.74 for HC vs. T2DM, 0.68 to 0.74 for T2DM vs. DKD-E, and 0.72 to 0.78 for DKD-E vs. DKD-A. Combining MMP-7 and TNC yielded accuracies of 0.69 to 0.74 for HC vs. T2DM, 0.71 to 0.77 for T2DM vs. DKD-E, and 0.75 to 0.77 for DKD-E vs. DKD-A.
In comparison, predictive performance was relatively lower when combining SAA1 and TNC. Of note, integrating all three biomarkers produced generally robust performance that was equal to or better than the other combinations across all comparisons: 0.73 to 0.77 for HC vs. T2DM, 0.74 to 0.76 for T2DM vs. DKD-E, and 0.72 to 0.78 for DKD-E vs. DKD-A. For DKD-E vs. DKD-A in particular, the combination of MMP-7 and SAA1 achieved an AUC of 0.79 to 0.80 and an accuracy of 0.72 to 0.78, and adding TNC produced a similar AUC range (0.78 to 0.80) and the same accuracy range (0.72 to 0.78). These findings suggest that although the current biomarker panel performs reasonably well, including additional serum markers may further improve predictive performance. The pairwise comparisons for the remaining groups (HC vs. DKD-E, HC vs. DKD-A, and T2DM vs. DKD-A) are presented in Supplementary Fig. 3.
- External validation of MMP-7, SAA1, and TNC in early DKD prediction
- To verify the diagnostic value of these three biomarkers for early DKD prediction, we performed external validation. First, we evaluated the predictive performance of two commonly used clinical markers, BUN and Scr, in distinguishing different stages of DKD. ROC analyses were performed for three pairwise comparisons (T2DM vs. DKD-E, T2DM vs. DKD-A, and DKD-E vs. DKD-A) across the independent validation cohorts (C.S., T.J., and Z.D.). Interestingly, the results indicated limited utility of BUN and Scr alone in detecting early-stage DKD, although both performed better in identifying advanced kidney dysfunction (Supplementary Fig. 4). The variability across cohorts also suggests the need for more robust, multi-marker approaches for early diagnosis. We therefore measured serum MMP-7, SAA1, and TNC levels in participants from the four external medical centers (Fig. 3A–D). As in the primary center, MMP-7 tended to increase with DKD progression, and TNC was induced at the early stage of DKD. By contrast, SAA1 showed inconsistent expression patterns across the four validation cohorts. Next, for each pairwise prediction, classifiers trained on the discovery cohort were applied to the validation cohorts to evaluate model performance. As shown in Fig. 3E–G and Supplementary Fig. 5, the dot plots illustrate the RF-predicted probability for each patient in the discovery and validation cohorts. The dashed-line cutoff indicates the binary split of the prediction, and these cutoff values were applied to predict early-stage DKD in the external validation cohorts. For T2DM vs. DKD-E, as shown in Fig. 3H and Supplementary Table 1, prediction accuracy was 69.4% in the C.S. cohort, 60.9% in the T.J. cohort, and 70.4% in the Z.D. cohort. In the N.F. cohort, accuracy reached 90.5%; this higher value may reflect the small number of DKD-E participants at this center (n=4).
In contrast, for T2DM vs. DKD-A, prediction accuracy reached 82% in the C.S. cohort, 84.1% in the T.J. cohort, 76% in the N.F. cohort, and 85.1% in the Z.D. cohort. For DKD-E vs. DKD-A, prediction accuracy reached 80.5% in the C.S. cohort, 79.6% in the T.J. cohort, and 67.6% in the N.F. cohort. Detailed prediction results for the four external validation cohorts are presented in Supplementary Table 1 and Supplementary Figs. 6–9. Collectively, the three selected protein biomarkers showed a promising trend toward improving the accuracy of early DKD prediction.
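- The method used to place the dashed-line cutoff is not stated. The sketch below assumes that the cutoff maximizes the Youden index on discovery-cohort RF probabilities. The cutoff is then fixed and applied unchanged to an external cohort. The probabilities, labels, and helper names are hypothetical.

```python
def youden_cutoff(probs, labels):
    """Threshold on predicted probability that maximizes sensitivity + specificity - 1.

    A case is called positive when its probability is >= the threshold; labels
    are 1 (positive) or 0 (negative). Ties in the Youden index keep the lowest
    threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = None, float("-inf")
    for t in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p >= t and y == 1)
        tn = sum(1 for p, y in zip(probs, labels) if p < t and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


def accuracy_at(probs, labels, cutoff):
    """Fraction of cases whose thresholded prediction matches the true label."""
    correct = sum(1 for p, y in zip(probs, labels) if (p >= cutoff) == (y == 1))
    return correct / len(labels)


# Hypothetical RF probabilities (T2DM = 0, DKD-E = 1).
disc_probs, disc_labels = [0.1, 0.2, 0.4, 0.3, 0.7, 0.9], [0, 0, 1, 0, 1, 1]
cutoff, j = youden_cutoff(disc_probs, disc_labels)
ext_acc = accuracy_at([0.2, 0.5, 0.45, 0.35], [0, 1, 0, 1], cutoff)
```

Fixing the cutoff on the discovery cohort, rather than re-tuning it per center, mirrors the train-then-apply design described for the external validation.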
- Integrating metabolomics with MMP-7, SAA1, and TNC to enhance the predictive accuracy for DKD progression
- Considering the reliability of serum biomarkers in early DKD diagnosis [40,41], to fully enhance the power of prediction, we further mapped the patients whose serum sample was subjected to MMP-7, SAA1, and TNC measurement and metabolomics. Serum metabolomics was performed by using Trace 1310 Gas Chromatograph equipped with an AS 1310 autosampler connected to a TSQ 8000 triple quadrupole mass spectrometer, as we previously reported [18]. The detailed information of the identified metabolites is presented in Supplementary Table 2. A total of 526 candidates were enrolled with both metabolomics and ELISA measured. As illustrated in Fig. 4A, four different machine learning models were performed based on three protein biomarkers with or without serum metabolites. Their prediction results suggested that the combinational model improved not only the accuracy but also the stability of DKD diagnosis and prediction. In particular, in DKD-E vs. DKD-A, the accuracy increased about 13% by the RF model, although it has limited improvement in comparing T2DM and DKD-E. Take the prediction results generated by the RF model as an example, similarly, in the binary-outcome prediction (Fig. 4B), the combination of proteins and metabolites enhanced the prediction accuracy in HC vs. T2DM, HC vs. DKD-E, HC vs. DKD-A, DKD-E and DKD-A, but the accuracy for T2DM vs. DKD-E are not significantly improved than protein markers themselves. With regard to these results, at least, using non-invasive approaches to replace invasive methods for early identifying kidney damage in T2DM needs more evidence. From the perspective of the multi-outcome prediction, the contingency heatmap presented in Fig. 4C indicated the percentage of patients whose true label was shown in the row and whose prediction label was shown in the column. 
Compared with multi-outcome models using only the three ELISA markers or only the differentially expressed metabolites, the integration model, which combined both feature sets, achieved the best overall prediction accuracy. Specifically, it correctly identified 85.3% of healthy individuals and 73.8% of DKD-A patients, and it improved separation between the T2DM and DKD-E groups. These results suggest that integrating multi-omics features is a promising approach for predicting diabetic status.
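- The contingency heatmap described above reports, for each true label, the percentage of patients assigned to each predicted label. A minimal sketch of that row normalization follows; the class names mirror the study groups, but the label vectors are hypothetical and the function is illustrative, not the authors' code.

```python
# Sketch of a row-normalized contingency table: each row is a true label and
# each cell is the percentage of that row's patients assigned to the column's
# predicted label. The labels below are hypothetical.
from collections import Counter
from typing import Dict, List, Sequence

CLASSES: List[str] = ["HC", "T2DM", "DKD-E", "DKD-A"]


def row_normalized_table(
    true_labels: Sequence[str],
    pred_labels: Sequence[str],
    classes: Sequence[str] = CLASSES,
) -> Dict[str, Dict[str, float]]:
    """Return {true: {predicted: percent}}; non-empty rows sum to 100."""
    counts = Counter(zip(true_labels, pred_labels))
    table: Dict[str, Dict[str, float]] = {}
    for t in classes:
        row_total = sum(counts[(t, p)] for p in classes)
        table[t] = {
            p: 100.0 * counts[(t, p)] / row_total if row_total else 0.0
            for p in classes
        }
    return table


true_y = ["HC", "HC", "HC", "HC", "T2DM", "T2DM", "DKD-E", "DKD-A", "DKD-A"]
pred_y = ["HC", "HC", "HC", "T2DM", "T2DM", "DKD-E", "T2DM", "DKD-A", "DKD-E"]

table = row_normalized_table(true_y, pred_y)
for t in CLASSES:
    print(t.ljust(6), "  ".join(f"{table[t][p]:5.1f}" for p in CLASSES))
```

- The diagonal cells of such a table are the per-group recall values (for example, the share of healthy or DKD-A participants correctly identified). scikit-learn's `confusion_matrix` with `normalize="true"` produces the same row-normalized proportions as fractions rather than percentages.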
- Linear regression analysis of the predictive value of MMP-7, SAA1, and TNC in DKD
- In clinical practice, several features are used for DKD diagnosis, including blood glucose, FC, HbA1c, UACR, UmALB, and eGFR. In the discovery cohort from the primary center, these features can predict DKD progression status. To examine the association between the ELISA markers and these clinical indicators, we performed linear regression using MMP-7, SAA1, and TNC together to predict each clinical feature. As shown in Fig. 5A–D and Supplementary Fig. 10A–F, the values predicted by the three ELISA markers were highly associated with the observed indicators, especially UACR and eGFR, indicating that the ELISA measurements are consistent with clinical assessments. A well-calibrated model produces points lying close to the diagonal line, indicating strong agreement between predicted and observed values (Supplementary Fig. 10G). For most features, particularly HbA1c, eGFR, albumin, TG, and blood glucose, predictions aligned reasonably well with observations, supporting the reliability of the model. Larger deviations for variables such as LDL and UmALB suggest that prediction performance may be more variable for these indicators. Of note, HbA1c showed lower predictive performance than the other routinely used clinical parameters examined in this study. Finally, we used ROC curve analysis to evaluate the predictive performance of each clinical feature alone and in combination with the three protein biomarkers. As shown in Fig. 5E and F, combining each of the clinical features HbA1c, eGFR, BUN, and Scr with the three protein biomarkers significantly improved the AUC and accuracy in distinguishing T2DM from DKD-E, with performance comparable to that of UACR. These findings further support the potential of the three selected protein biomarkers to enhance early-stage DKD prediction beyond conventional clinical parameters.
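- The regression described above uses the three ELISA markers jointly as predictors of each clinical indicator. Assuming a plain ordinary least squares model with an intercept (the text does not state whether covariates or transformations were used), the sketch below fits the model via the normal equations and reports the correlation between predicted and observed values. All values are hypothetical, and the helper names are illustrative.

```python
# Sketch: ordinary least squares predicting one clinical indicator from the
# three ELISA markers (intercept + MMP-7, SAA1, TNC), solved via the normal
# equations. Marker values and the outcome below are hypothetical.
from math import sqrt
from typing import List, Sequence


def fit_ols(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return [b0, b1, ..., bk] minimizing the sum of squared residuals."""
    rows = [[1.0, *map(float, r)] for r in X]  # prepend an intercept column
    k = len(rows[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    A = []
    for i in range(k):
        lhs = [sum(r[i] * r[j] for r in rows) for j in range(k)]
        rhs = sum(r[i] * yi for r, yi in zip(rows, y))
        A.append(lhs + [rhs])
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        scale = A[col][col]
        A[col] = [v / scale for v in A[col]]
        for i in range(k):
            if i != col:
                factor = A[i][col]
                A[i] = [a - factor * b for a, b in zip(A[i], A[col])]
    return [A[i][k] for i in range(k)]


def predict(coefs: Sequence[float], x: Sequence[float]) -> float:
    """Linear predictor b0 + sum(b_j * x_j)."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], x))


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    return cov / sqrt(sum((u - ma) ** 2 for u in a) * sum((v - mb) ** 2 for v in b))


# Hypothetical (MMP-7, SAA1, TNC) rows and an outcome built as
# y = 2 + 0.5*MMP-7 - 1.0*SAA1 + 3.0*TNC, so the fit should recover it.
markers = [[1, 2, 0], [2, 0, 1], [3, 1, 1], [0, 3, 2], [4, 1, 0], [2, 2, 2], [1, 0, 3]]
outcome = [2 + 0.5 * m - 1.0 * s + 3.0 * t for m, s, t in markers]

coefs = fit_ols(markers, outcome)
predicted = [predict(coefs, x) for x in markers]
print("coefficients:", [round(c, 6) for c in coefs])
print("r(predicted, observed):", round(pearson(predicted, outcome), 6))
```

- Calibration plots (observed vs. predicted) are only informative on held-out data: regressing in-sample observed values on in-sample OLS predictions always returns a slope of 1 and an intercept of 0, so it cannot reveal miscalibration.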
DISCUSSION
- As the leading cause of ESRD, DKD advances more rapidly than kidney damage resulting from primary glomerulonephritis, posing significant challenges for early, differential, and precise diagnosis [42]. Traditional diagnostic methods, including clinical assessments, urinary protein analysis, renal function tests, imaging, and biopsies, remain the cornerstone of DKD diagnosis [4]. However, new technologies and analytical techniques have led to substantial progress in understanding DKD pathogenesis. Integrating molecular data across multiple dimensions closely related to DKD onset and progression, and applying advanced machine learning algorithms to these data, holds promise for enhancing diagnostic accuracy [18,43].
- In light of these advances, our study focused on three key proteins, MMP-7, SAA1, and TNC, to evaluate their combined utility in predicting early DKD. Given the limitations of serum samples in early DKD diagnosis, we first confirmed that these proteins were substantially upregulated in the kidneys of patients with early DKD (Supplementary Fig. 1). In db/db mice, these proteins were altered not only in the kidney but also in the heart and liver (Supplementary Fig. 2), suggesting their potential as circulating biomarkers. However, it is important to acknowledge that db/db mice do not fully capture the complex pathophysiology of human DKD, particularly in terms of systemic inflammation and immune cell involvement. The mechanistic role of biomarker expression changes in non-renal organs remains unclear and requires further investigation beyond the concept of 'multi-organ crosstalk.' Nevertheless, to validate their clinical relevance, we established a discovery cohort and four validation cohorts across five medical centers (Fig. 1). Using multiple machine learning algorithms, we found that combining these serum markers significantly improved early DKD diagnosis compared with using a single marker (Fig. 2). Remarkably, integrating these biomarkers with differential metabolites improved diagnostic accuracy by nearly 13% for detecting DKD progression from early to advanced stages (Fig. 4), offering new targets and diagnostic strategies to prevent rapid DKD progression. Additionally, these biomarkers were consistent with existing clinical assessments (Fig. 5), indicating promising potential for the clinical diagnosis of DKD.
- DKD often progresses silently, with subtle early symptoms, until irreversible damage has occurred. Its pathogenesis involves a complex interplay of metabolic, hemodynamic, inflammatory, and fibrotic processes, which collectively contribute to rapid progression to ESRD in diabetic patients [44]. MMP-7, a significant biomarker of DKD, is an enzyme crucial for ECM remodeling and tissue repair [15,16]. Its upregulation in response to hyperglycemia and other metabolic disturbances links MMP-7 directly to the underlying mechanisms of DKD [10]. Elevated serum or urinary MMP-7 levels can indicate early kidney injury before significant clinical symptoms or reductions in eGFR appear [31,45], making MMP-7 a valuable tool for early diagnosis and risk stratification in patients with T2DM. Our study corroborates this, showing a gradual increase in MMP-7 levels with DKD progression across all cohorts and underscoring its reliability as a stable circulating marker (Figs. 2 and 3). However, the specificity of MMP-7 as a single biomarker is limited, given its role as a surrogate marker for non-diabetic CKD [16]. Beyond MMP-7, other members of the MMP family, such as MMP-10, have also been reported as promising biomarkers for early DKD detection [46].
- The ECM undergoes significant remodeling in DKD, characterized by increased production and decreased degradation of components such as collagen and fibronectin, leading to ECM accumulation, glomerular basement membrane thickening, and interstitial fibrosis [47,48]. TNC, a glycoprotein involved in inflammation and fibrosis [13,14,49], was upregulated in response to chronic hyperglycemia in most cohorts, although in some cohorts it showed a downward trend in DKD-E patients compared with T2DM patients (Figs. 2 and 3). Given that TNC is often localized at sites of injury and fibrosis [14], its potential as a strong single circulating marker for early DKD prediction warrants further investigation. Additionally, serum TNC has been associated with increased cardiovascular events and mortality in individuals with T2DM [12].
- SAA1, another biomarker selected for this study, is an acute-phase protein produced primarily by the liver in response to pro-inflammatory cytokines [50], with potential local synthesis in the kidneys. Its expression pattern, downregulated in the sera of T2DM patients and upregulated upon kidney damage, underscores its relevance to DKD. Despite its utility in detecting early kidney dysfunction, SAA1 showed inconsistent expression patterns across the medical centers in our cohort studies. Previous studies have reported elevated SAA1 levels in diabetic patients with kidney complications [11], but some data suggest that higher circulating SAA concentrations may be associated with a reduced risk of ESRD in American Indians with T2DM [11]. This variability highlights the challenges of using SAA1 as a single biomarker for early DKD prediction. Of note, among the three ELISA-based markers, SAA1 did not differ significantly across disease groups, particularly between DKD-E and DKD-A (P=0.649). A larger sample size will be required in future studies to achieve adequate statistical power for detecting potential group differences.
- While individual biomarkers such as MMP-7, SAA1, and TNC have shown promise in DKD diagnosis [10–12,31,32,35], their limited accuracy and specificity as standalone markers are evident. This study emphasizes the synergistic effect of combining these serum biomarkers to enhance early DKD prediction. Our analysis demonstrates that combinations of two or three biomarkers consistently outperform any single marker, reducing prediction errors and improving diagnostic performance. Among the combinations, TNC with SAA1 yielded the least favorable results, owing to the variable serum expression of both markers in DKD patients. Inflammatory status in individuals with DKD, as reflected by SAA1 in particular, is difficult to evaluate even under strict glycemic control.
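- One way to examine the claim that two- or three-marker combinations outperform single markers is to score every non-empty subset of {MMP-7, SAA1, TNC} with the same metric. The sketch below computes a rank-based AUC, which equals the Mann-Whitney U statistic divided by n_pos × n_neg, for all seven subsets. As an assumption for illustration only, markers are combined by an unweighted sum of z-scores rather than by the LDA, SVM, RF, or logistic models used in the study, and the sample values are hypothetical.

```python
# Sketch: rank-based AUC (equal to the normalized Mann-Whitney U statistic)
# for every subset of the three markers. Markers in a subset are combined by
# an unweighted sum of z-scores, a HYPOTHETICAL rule standing in for the
# trained LDA/SVM/RF/logistic models. Sample values are hypothetical.
from itertools import combinations
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple

MARKERS: Tuple[str, ...] = ("MMP-7", "SAA1", "TNC")


def auc_mann_whitney(pos: Sequence[float], neg: Sequence[float]) -> float:
    """P(score_pos > score_neg) + 0.5 * P(tie), i.e. U / (n_pos * n_neg)."""
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def zscore_sum(samples: Sequence[Dict[str, float]], subset: Sequence[str]) -> List[float]:
    """Combine markers by summing per-marker z-scores (each marker must vary)."""
    params = {}
    for m in subset:
        vals = [s[m] for s in samples]
        params[m] = (mean(vals), pstdev(vals))
    return [sum((s[m] - mu) / sd for m, (mu, sd) in params.items()) for s in samples]


def auc_by_subset(samples, labels) -> Dict[Tuple[str, ...], float]:
    """AUC for each of the 2^3 - 1 = 7 non-empty marker subsets."""
    result = {}
    for k in range(1, len(MARKERS) + 1):
        for subset in combinations(MARKERS, k):
            scores = zscore_sum(samples, subset)
            pos = [sc for sc, y in zip(scores, labels) if y == 1]
            neg = [sc for sc, y in zip(scores, labels) if y == 0]
            result[subset] = auc_mann_whitney(pos, neg)
    return result


# Hypothetical marker levels (arbitrary units) for T2DM (0) vs. DKD-E (1).
samples = [
    {"MMP-7": 1.0, "SAA1": 5.0, "TNC": 2.0},
    {"MMP-7": 2.0, "SAA1": 3.0, "TNC": 1.0},
    {"MMP-7": 1.5, "SAA1": 4.0, "TNC": 3.0},
    {"MMP-7": 2.5, "SAA1": 6.0, "TNC": 2.5},
    {"MMP-7": 2.2, "SAA1": 2.0, "TNC": 3.5},
    {"MMP-7": 3.0, "SAA1": 4.5, "TNC": 3.2},
]
labels = [0, 0, 0, 1, 1, 1]
for subset, auc in auc_by_subset(samples, labels).items():
    print(" + ".join(subset), f"AUC={auc:.3f}")
```

- AUCs computed on the same samples used to build a combination are optimistic; when the combination is a trained classifier, cross-validation or an external cohort gives a fairer comparison between subsets.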
- To further reduce prediction errors, we integrated MMP-7, SAA1, and TNC with differentially expressed serum metabolites identified in our previous study [18]. Serum metabolomics, which analyzes metabolites in the blood, captures early metabolic changes, tracks metabolite alterations over time, and reflects kidney dysfunction in real time. Specific metabolites associated with glucose metabolism, lipid metabolism, amino acid metabolism, oxidative stress, inflammation, and kidney function can be detected in serum and serve as early indicators of renal injury. This allows more personalized and timely adjustment of management strategies. By analyzing changes in metabolite levels, clinicians can gain insight into the functional impact of DKD on various biological pathways and organs. Combining metabolomic data with clinical and biochemical information provides a more comprehensive view of the patient's condition and supports better clinical decision-making. In our analysis, this integration improved the accuracy of distinguishing early from advanced DKD by nearly 13% (Fig. 4). However, predicting early kidney damage remains challenging, and the functional associations between specific endogenous metabolites and DKD progression require further exploration. In addition, because we used all differentially expressed serum metabolites in the integrative analysis, the diagnostic value of individual metabolites may have been underestimated. Finally, TNC, MMP-7, and SAA1 are functionally related to biological processes such as tubular injury, ECM remodeling, and inflammation, which overlap with the pathways captured by established markers such as kidney injury molecule-1 (KIM-1), neutrophil gelatinase-associated lipocalin (NGAL), tumor necrosis factor receptors (TNF-Rs), and soluble urokinase plasminogen activator receptor (suPAR).
For example, MMP-7 has been implicated in tubular epithelial cell injury similar to KIM-1, while SAA1, like TNF-Rs and suPAR, reflects systemic and renal inflammation, suggesting potential complementarity in multi-marker diagnostic models for DKD.
- Despite these advances, our study has some limitations: (1) cross-platform and cross-species integration assumes conserved mechanisms, but differences in tissue context and disease dynamics may limit direct translatability; (2) urine samples were not included; (3) correlation analysis between biomarkers/metabolites and DKD pathology is lacking; (4) the cohorts are regional, necessitating validation in broader populations with varying diets and lifestyles; and (5) confounding factors such as medications, comorbidities, and ethnicity may influence biomarker levels and disease progression and could not be fully accounted for owing to the retrospective design. In addition, the sample size of the metabolomics subgroup is relatively small. In total, 133 healthy, 276 T2DM, 49 DKD-E, and 68 DKD-A serum samples had both the ELISA markers and metabolomics measured. For each ELISA marker, three comparisons were made: HC vs. T2DM, T2DM vs. DKD-E, and DKD-E vs. DKD-A. To control the family-wise error rate at 10%, the Bonferroni method was used to set the type I error rate per pairwise test to 0.1/3≈0.033. Taking MMP-7 HC vs. T2DM as an example, reaching a power of 0.8 at a type I error rate of 0.033 requires at least 23 samples per group with the nonparametric Mann-Whitney U (Wilcoxon rank-sum) test; both groups in this comparison (133 HC and 276 T2DM) exceed this requirement. Similar calculations were performed for all three markers and all three pairwise comparisons. The results showed that MMP-7 and TNC reach adequate power for HC vs. T2DM and T2DM vs. DKD-E, whereas SAA1 and TNC are underpowered for DKD-E vs. DKD-A. For instance, comparing SAA1 between DKD-E and DKD-A with a power of 0.8 and a type I error rate of 0.033 would require at least 15,635 individuals per group. Larger cohorts will be recruited in future studies.
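- The Bonferroni adjustment and sample-size requirement described above can be sketched with a common closed-form approximation. This is an assumption about method, not a reproduction of the authors' calculation: the per-group n for a two-sided two-sample t-test is computed from the normal approximation and then divided by the asymptotic relative efficiency of the Mann-Whitney U test under normality (3/π). The effect sizes (Cohen's d) are hypothetical, because the text does not report them, so this sketch need not reproduce the figures of 23 or 15,635 per group.

```python
# Sketch: Bonferroni-adjusted alpha and an approximate per-group sample size
# for a two-sided Mann-Whitney U test, using the normal-approximation formula
# for a two-sample t-test divided by the asymptotic relative efficiency of the
# Mann-Whitney test under normality (3/pi). The effect sizes are hypothetical.
import math
from statistics import NormalDist


def bonferroni_alpha(family_alpha: float, n_tests: int) -> float:
    """Per-test type I error so that the family-wise error is <= family_alpha."""
    return family_alpha / n_tests


def n_per_group(d: float, alpha: float, power: float, mann_whitney: bool = True) -> int:
    """Approximate per-group n to detect a standardized difference d (Cohen's d)."""
    z = NormalDist().inv_cdf
    n = 2.0 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2
    if mann_whitney:
        n /= 3.0 / math.pi  # ARE of Mann-Whitney vs. t-test for normal data
    return math.ceil(n)


alpha = bonferroni_alpha(0.10, 3)  # three pairwise comparisons per marker
for d in (0.2, 0.5, 1.0):          # hypothetical effect sizes
    print(f"d={d}: n per group = {n_per_group(d, alpha, 0.80)}")
```

- Because n scales with 1/d², a very small standardized difference between groups, such as the one implied for SAA1 between DKD-E and DKD-A, pushes the requirement into the thousands per group. The 3/π factor assumes normally distributed data; dedicated power software or simulation may give different numbers for skewed biomarker distributions.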
- In summary, our study demonstrates the potential of combining advanced technologies and algorithms to improve early DKD diagnosis. While precise pathology remains critical, the synergy between selected serum biomarkers and metabolites holds considerable promise for transforming DKD clinical management. By focusing on early detection and personalized intervention, we can potentially alter the disease course, identify subtle kidney damage before clinical symptoms emerge, reduce the burden of DKD, and improve the quality of life of millions of diabetic patients worldwide.
SUPPLEMENTARY MATERIALS
Supplementary materials related to this article can be found online at https://doi.org/10.4093/dmj.2025.0193.
Supplementary Fig. 1.
Multi-omics reveals the expression patterns of matrix metallopeptidase-7 (MMP-7), serum amyloid A1 (SAA1), and tenascin C (TNC) in diabetic kidney disease (DKD) patients. (A) Clustering of single nuclei from human early DKD kidneys. Single-nucleus RNA sequencing data were visualized using the uniform manifold approximation and projection (UMAP) algorithm, where UMAP_1 and UMAP_2 represent the two reduced dimensions and each dot indicates a single cell. (B) Expression of MMP-7 and TNC at the single-cell level. (C) Dot plot of MMP-7 and TNC gene expression patterns across kidney cell types. (D) Spatially resolved expression of the cell-specific marker genes MMP-7 and TNC in each spot. (E) Serum proteomic analysis showing SAA1 and TNC expression in DKD patients at different stages. (F) Representative immunohistochemical images showing the distributions of MMP-7, SAA1, and TNC in kidney biopsy specimens from DKD patients. Arrows in the enlarged boxes indicate positive staining. Scale bar, 25 μm. TAL, thick ascending limb of the loop of Henle; DCT, distal convoluted tubule; CNT, connecting tubule; PT, proximal tubule; PC, pericytes; ICA, type A intercalated cells; VCAM1, vascular cell adhesion molecule 1; FIB, fibroblast; PEC, parietal epithelial cells; MES, mesenchymal cells; ICB, type B intercalated cells; ENDO, endothelial cells; PODO, podocytes; LEUK, leukocytes; T2DM, type 2 diabetes mellitus; DKD-E, diabetic kidney disease at early stage; DKD-A, diabetic kidney disease at advanced stage.
Supplementary Fig. 2.
Global regulation of matrix metallopeptidase-7 (MMP-7), serum amyloid A1 (SAA1), and tenascin C (TNC) in db/db mice. (A–C) Western blot assays showing the expression of MMP-7, SAA1, and TNC in the heart (A), kidney (B), and liver (C) of 18- and 24-week-old db/db mice. (D) Densitometric analysis (n=4). aP<0.05.
Supplementary Fig. 3.
Pairwise prediction of different stages of diabetic kidney disease (DKD). Receiver operating characteristic (ROC) curves for each pairwise prediction using a single marker or multiple markers, including healthy control (HC) vs. diabetic kidney disease at early stage (DKD-E), HC vs. diabetic kidney disease at advanced stage (DKD-A), and type 2 diabetes mellitus (T2DM) vs. DKD-A, generated by four different machine learning methods. MMP-7, matrix metallopeptidase-7; SAA1, serum amyloid A1; TNC, tenascin C; AUC, area under the curve; Acc, accuracy; LDA, linear discriminant analysis; SVM, support vector machine; RF, random forest.
Supplementary Fig. 4.
Receiver operating characteristic (ROC) curves for each pairwise prediction using a single clinical marker, including type 2 diabetes mellitus (T2DM) vs. diabetic kidney disease at early stage (DKD-E), T2DM vs. diabetic kidney disease at advanced stage (DKD-A), and DKD-E vs. DKD-A. AUC, area under the curve; C.S., Central South University; T.J., Tongji Hospital of Tongji University; Z.D., Zhongda Hospital.
Supplementary Fig. 5.
Prediction of diabetic kidney disease (DKD) status via machine learning algorithms on matrix metallopeptidase-7 (MMP-7), serum amyloid A1 (SAA1), and tenascin C (TNC) expression. (A–C) Prediction probabilities based on the random forest algorithm. The best cutoff was trained on the discovery cohort and applied to the validation cohorts for predicting healthy control (HC) vs. diabetic kidney disease at early stage (DKD-E), HC vs. diabetic kidney disease at advanced stage (DKD-A), and type 2 diabetes mellitus (T2DM) vs. DKD-A. C.S., the 3rd Xiangya Hospital of the Central South University; N.F., Nanfang Hospital; T.J., Tongji Hospital of Tongji University; Z.D., Zhongda Hospital.
Supplementary Fig. 6.
Pairwise prediction of different stages of diabetic kidney disease (DKD) in the validation cohort (Central South University [C.S.]). Receiver operating characteristic (ROC) curves for healthy control (HC) vs. type 2 diabetes mellitus (T2DM), HC vs. diabetic kidney disease at early stage (DKD-E), HC vs. diabetic kidney disease at advanced stage (DKD-A), T2DM vs. DKD-E, T2DM vs. DKD-A, and DKD-E vs. DKD-A prediction by four different machine learning methods. Red line for linear discriminant analysis (LDA), blue line for support vector machine (SVM), orange line for random forest (RF), and green line for logistic regression (Logistic). MMP-7, matrix metallopeptidase-7; SAA1, serum amyloid A1; TNC, tenascin C.
Supplementary Fig. 7.
Pairwise prediction of different stages of diabetic kidney disease (DKD) in the validation cohort (Tongji Hospital of Tongji University [T.J.]). Receiver operating characteristic (ROC) curves for healthy control (HC) vs. type 2 diabetes mellitus (T2DM), HC vs. diabetic kidney disease at early stage (DKD-E), HC vs. diabetic kidney disease at advanced stage (DKD-A), T2DM vs. DKD-E, T2DM vs. DKD-A, and DKD-E vs. DKD-A prediction by four different machine learning methods. Red line for linear discriminant analysis (LDA), blue line for support vector machine (SVM), orange line for random forest (RF), and green line for logistic regression (Logistic). MMP-7, matrix metallopeptidase-7; SAA1, serum amyloid A1; TNC, tenascin C.
Supplementary Fig. 8.
Pairwise prediction of different stages of diabetic kidney disease (DKD) in the validation cohort (Nanfang Hospital [N.F.]). Receiver operating characteristic (ROC) curves for healthy control (HC) vs. type 2 diabetes mellitus (T2DM), HC vs. diabetic kidney disease at early stage (DKD-E), HC vs. diabetic kidney disease at advanced stage (DKD-A), T2DM vs. DKD-E, T2DM vs. DKD-A, and DKD-E vs. DKD-A prediction by four different machine learning methods. Red line for linear discriminant analysis (LDA), blue line for support vector machine (SVM), orange line for random forest (RF), and green line for logistic regression (Logistic). MMP-7, matrix metallopeptidase-7; SAA1, serum amyloid A1; TNC, tenascin C.
Supplementary Fig. 9.
Pairwise prediction of different stages of diabetic kidney disease (DKD) in the validation cohort (Zhongda Hospital [Z.D.]). Receiver operating characteristic (ROC) curves for type 2 diabetes mellitus (T2DM) vs. diabetic kidney disease at early stage (DKD-E), T2DM vs. diabetic kidney disease at advanced stage (DKD-A), and DKD-E vs. DKD-A prediction by four different machine learning methods. Red line for linear discriminant analysis (LDA), blue line for support vector machine (SVM), orange line for random forest (RF), and green line for logistic regression (Logistic). MMP-7, matrix metallopeptidase-7; SAA1, serum amyloid A1; TNC, tenascin C.
Supplementary Fig. 10.
Linear regression analysis comparing the values predicted by matrix metallopeptidase-7 (MMP-7), serum amyloid A1 (SAA1), and tenascin C (TNC) with the observed clinical indicators: blood glucose (A), total cholesterol (TC) (B), triglycerides (TG) (C), low-density lipoprotein (LDL) (D), high-density lipoprotein (HDL) (E), and albumin (F). (G) Calibration plots comparing the predicted values from the model (x-axis) with the observed values (y-axis) for various clinical indicators, including glycosylated hemoglobin (HbA1c), estimated glomerular filtration rate (eGFR), albumin, blood urea nitrogen (BUN), serum creatinine (Scr), blood glucose, TC, TG, HDL, LDL, urinary microalbumin (UmALB), and urinary albumin-to-creatinine ratio (UACR).
NOTES
-
CONFLICTS OF INTEREST
No potential conflict of interest relevant to this article was reported.
-
AUTHOR CONTRIBUTIONS
Conception or design: D.Z., H.F.
Acquisition, analysis, or interpretation of data: all authors.
Drafting the work or revising: S.L. (Silvia Liu), D.Z., H.F.
Final approval of the manuscript: all authors.
-
FUNDING
This is an investigator-initiated clinical study. Work in H.F.'s lab was supported by the National Key R&D Program of China grant (2022YFC2502504, 2022YFC2502500), the National Natural Science Foundation of China grants (92268112, 82570872), and Guangdong Basic and Applied Basic Research Foundation (2023A1515012389).
-
ACKNOWLEDGMENTS
We are grateful to the volunteer staff who assisted us in collecting serum samples at the five participating medical centers.
-
DATA AVAILABILITY
Raw mass spectrometry data were deposited in MassIVE with the data set identifier MSV000087487.
Fig. 1. Diabetic kidney disease (DKD) patient cohort dynamics and multidimensional assessment. A total of 1,052 participants were enrolled in five clinical cohorts from five independent medical centers, including 266 healthy controls (HC), 441 patients with type 2 diabetes mellitus (T2DM; urinary albumin-to-creatinine ratio [UACR] <30 mg/g), 183 with diabetic kidney disease at an early stage (DKD-E; 30≤UACR≤300 mg/g), and 162 with diabetic kidney disease at an advanced stage (DKD-A; UACR >300 mg/g), across both discovery and testing cohorts. Integrative analysis combining serum protein biomarkers and metabolites was performed. SAA1, serum amyloid A1; TNC, tenascin C; MMP-7, matrix metallopeptidase-7; TMT, tandem mass tags; ELISA, enzyme-linked immunosorbent assay; C.S., the 3rd Xiangya Hospital of the Central South University; T.J., Tongji Hospital of Tongji University; N.F., Nanfang Hospital; Z.D., Zhongda Hospital.
Fig. 2. Newly developed machine learning algorithms optimize the combination of serum matrix metallopeptidase-7 (MMP-7), serum amyloid A1 (SAA1), and tenascin C (TNC) in predicting early diabetic kidney disease (DKD). (A–C) Box plots showing the concentrations of MMP-7 (A), SAA1 (B), and TNC (C) in sera from healthy adults and DKD patients at different stages. (D) Partial least squares discriminant analysis (PLS-DA) based on the enzyme-linked immunosorbent assay (ELISA) markers. Four groups are included: healthy control (HC), type 2 diabetes mellitus (T2DM; urinary albumin-to-creatinine ratio [UACR] <30 mg/g), diabetic kidney disease at an early stage (DKD-E; 30≤UACR≤300 mg/g), and diabetic kidney disease at an advanced stage (DKD-A; UACR >300 mg/g). (E) Receiver operating characteristic (ROC) curves for each pairwise prediction using a single marker or multiple markers, including HC vs. T2DM, T2DM vs. DKD-E, and DKD-E vs. DKD-A. Four machine learning classifiers were employed: red line for linear discriminant analysis (LDA), blue line for support vector machine (SVM), orange line for random forest (RF), and green line for logistic regression. AUC, area under the curve; Acc, accuracy.
Fig. 3. Prediction of diabetic kidney disease (DKD) status via machine learning algorithms on matrix metallopeptidase-7 (MMP-7), serum amyloid A1 (SAA1), and tenascin C (TNC) expression. (A–D) Box plots showing the concentrations of MMP-7, SAA1, and TNC in sera from healthy adults and DKD patients at various stages across different clinical cohorts, including the 3rd Xiangya Hospital of the Central South University (C.S.) (A), Tongji Hospital of Tongji University (T.J.) (B), Nanfang Hospital (N.F.) (C), and Zhongda Hospital (Z.D.) (D). (E–G) Prediction probabilities based on the random forest algorithm. The best cutoff was trained on the discovery cohort and applied to the validation cohorts for predicting healthy control (HC) vs. type 2 diabetes mellitus (T2DM), T2DM vs. diabetic kidney disease at early stage (DKD-E), and DKD-E vs. diabetic kidney disease at advanced stage (DKD-A). (H) Receiver operating characteristic (ROC) curves for each pairwise prediction in the validation cohorts using four machine learning classifiers: red line for linear discriminant analysis (LDA), blue line for support vector machine (SVM), orange line for random forest (RF), and green line for logistic regression. AUC, area under the curve.
Fig. 4. Integrative analysis of serum protein biomarkers and metabolites. (A) Receiver operating characteristic (ROC) curves for each pairwise prediction in the discovery cohort, integrating matrix metallopeptidase-7 (MMP-7), serum amyloid A1 (SAA1), and tenascin C (TNC) with serum metabolites, using four different machine learning methods. (B) Pairwise prediction accuracy based on protein features only (blue), metabolite features only (green), and the integration of protein and metabolite features (orange). (C) Heatmap of multi-outcome prediction accuracy based on protein features, metabolite features, and both feature sets. The numbers in the heatmap cells represent the percentage of cases predicted as the labels on the y-axis among all the true labels on the x-axis. AUC, area under the curve; Acc, accuracy; LDA, linear discriminant analysis; SVM, support vector machine; RF, random forest; HC, healthy control; T2DM, type 2 diabetes mellitus; DKD-E, diabetic kidney disease at early stage; DKD-A, diabetic kidney disease at advanced stage; ELISA, enzyme-linked immunosorbent assay.
Fig. 5. Linear regression to analyze the predictive values of matrix metallopeptidase-7 (MMP-7), serum amyloid A1 (SAA1), and tenascin C (TNC) in diabetic kidney disease (DKD). Linear regression analysis for glycosylated hemoglobin (HbA1c) (A), estimated glomerular filtration rate (eGFR) (B), urinary albumin-to-creatinine ratio (UACR) (C), and urinary microalbumin (UmALB) (D). Each dot represents a patient, with the observed clinical indicator shown on the y-axis and the value predicted by the three enzyme-linked immunosorbent assay (ELISA) markers shown on the x-axis. (E, F) Receiver operating characteristic (ROC) curves for each pairwise prediction, including type 2 diabetes mellitus (T2DM) vs. diabetic kidney disease at early stage (DKD-E), T2DM vs. diabetic kidney disease at advanced stage (DKD-A), and DKD-E vs. DKD-A, using a single clinical marker (HbA1c, eGFR, blood urea nitrogen [BUN], serum creatinine [Scr], or UACR) (E) or each of these markers combined with MMP-7, SAA1, and TNC (F). Four machine learning classifiers were employed: red line for linear discriminant analysis (LDA), blue line for support vector machine (SVM), orange line for random forest (RF), and green line for logistic regression. AUC, area under the curve; Acc, accuracy.
Table 1. Demographic characteristics of the participants for metabolomics (discovery phase)

| Characteristic | HC | T2DM | DKD-E | DKD-A |
|---|---|---|---|---|
| Age, yr | 34.19±9.12 | 54.27±10.2 | 57.38±9.71 | 57.68±9.32 |
| Sex, male/female | 57/116 | 171/106 | 46/52 | 49/20 |
| Body mass index, kg/m² | | 25.17±3.13 | 26.05±3.7 | 26.14±3.71 |
| HbA1c, % | | 8.1±1.99 | 9.06±2.04 | 8.7±7.43 |
| eGFR, mL/min/1.73 m² | | 99.49±14.79 | 99.82±16.77 | 42.82±31.63 |
| Lp-PLA2, μg/L | | 103.22±34.92 | 90.06±38 | 116.07±56.22 |
| Albumin, g/L | | 39.07±4.57 | 38.72±3.04 | 31.57±4.87 |
| Blood urea nitrogen, mg/dL | | 6.68±1.8 | 6.73±1.87 | 14.7±7.83 |
| Serum creatinine, μmol/L | | 67.89±19.01 | 61.8±17.18 | 694.47±1,326.46 |
| Glucose, mmol/L | | 7.35±2.95 | 8.43±3.04 | 6.96±3.29 |
| Uric acid, μmol/L | | 298.88±96.96 | 304.4±97.49 | 437.47±135.36 |
| Cholesterol, mmol/L | | 4.33±1.12 | 4.69±1.25 | 5.06±2.13 |
| Triglycerides, mmol/L | | 2.04±2.26 | 2.59±2.82 | 2.18±1.64 |
| High-density lipoprotein, mmol/L | | 1.35±1.49 | 1.24±0.29 | 1.27±0.34 |
| Low-density lipoprotein, mmol/L | | 2.93±0.95 | 3.3±1.05 | 3.32±1.59 |
| Urine creatinine, μmol/L | | 9,670.69±3,800.48 | 7,299.73±3,874.18 | 5,059.97±2,851.22 |
| Albumin-to-creatinine ratio, mg/g | | 13.39±6.14 | 65.34±50.93 | 2,863.26±2,074.06 |
| Fasting C-peptide, ng/mL | | 1.52±0.91 | 1.54±1.09 | 2.4±3.19 |
| Fasting insulin, μU/mL | | 12.24±10.97 | 17.18±16.48 | 15.67±15.18 |
REFERENCES
- 1. Saeedi P, Petersohn I, Salpea P, Malanda B, Karuranga S, Unwin N, et al. Global and regional diabetes prevalence estimates for 2019 and projections for 2030 and 2045: results from the International Diabetes Federation Diabetes Atlas, 9th edition. Diabetes Res Clin Pract 2019;157:107843.
- 2. Alicic RZ, Rooney MT, Tuttle KR. Diabetic kidney disease: challenges, progress, and possibilities. Clin J Am Soc Nephrol 2017;12:2032-45.
- 3. Agarwal R, Pitt B, Rossing P, Anker SD, Filippatos G, Ruilope LM, et al. Modifiability of composite cardiovascular risk associated with chronic kidney disease in type 2 diabetes with finerenone. JAMA Cardiol 2023;8:732-41.
- 4. Thomas MC, Brownlee M, Susztak K, Sharma K, Jandeleit-Dahm KA, Zoungas S, et al. Diabetic kidney disease. Nat Rev Dis Primers 2015;1:15018.
- 5. Sugahara M, Pak WL, Tanaka T, Tang SC, Nangaku M. Update on diagnosis, pathophysiology, and management of diabetic kidney disease. Nephrology (Carlton) 2021;26:491-500.
- 6. Persson F, Rossing P. Diagnosis of diabetic kidney disease: state of the art and future perspective. Kidney Int Suppl (2011) 2018;8:2-7.
- 7. Rico-Fontalvo J, Aroca-Martinez G, Daza-Arnedo R, Cabrales J, Rodriguez-Yanez T, Cardona-Blanco M, et al. Novel biomarkers of diabetic kidney disease. Biomolecules 2023;13:633.
- 8. Barutta F, Bellini S, Canepa S, Durazzo M, Gruden G. Novel biomarkers of diabetic kidney disease: current status and potential clinical application. Acta Diabetol 2021;58:819-30.
- 9. Colhoun HM, Marcovecchio ML. Biomarkers of diabetic kidney disease. Diabetologia 2018;61:996-1011.
- 10. Hirohama D, Abedini A, Moon S, Surapaneni A, Dillon ST, Vassalotti A, et al. Unbiased human kidney tissue proteomics identifies matrix metalloproteinase 7 as a kidney disease biomarker. J Am Soc Nephrol 2023;34:1279-91.
- 11. Saulnier PJ, Dieter BP, Tanamas SK, McPherson SM, Wheelock KM, Knowler WC, et al. Association of serum amyloid A with kidney outcomes and all-cause mortality in American Indians with type 2 diabetes. Am J Nephrol 2017;46:276-84.
- 12. Gellen B, Thorin-Trescases N, Thorin E, Gand E, Sosner P, Brishoual S, et al. Serum tenascin-C is independently associated with increased major adverse cardiovascular events and death in individuals with type 2 diabetes: a French prospective cohort. Diabetologia 2020;63:915-23.
- 13. Chen S, Fu H, Wu S, Zhu W, Liao J, Hong X, et al. Tenascin-C protects against acute kidney injury by recruiting Wnt ligands. Kidney Int 2019;95:62-74.
- 14. Fu H, Tian Y, Zhou L, Zhou D, Tan RJ, Stolz DB, et al. Tenascin-C is a major component of the fibrogenic niche in kidney fibrosis. J Am Soc Nephrol 2017;28:785-801.
- 15. Fu H, Zhou D, Zhu H, Liao J, Lin L, Hong X, et al. Matrix metalloproteinase-7 protects against acute kidney injury by priming renal tubules for survival and regeneration. Kidney Int 2019;95:1167-80.
- 16. Zhou D, Tian Y, Sun L, Zhou L, Xiao L, Tan RJ, et al. Matrix metalloproteinase-7 is a urinary biomarker and pathogenic mediator of kidney fibrosis. J Am Soc Nephrol 2017;28:598-611.
- 17. Zhu H, Liao J, Zhou X, Hong X, Song D, Hou FF, et al. Tenascin-C promotes acute kidney injury to chronic kidney disease progression by impairing tubular integrity via αvβ6 integrin signaling. Kidney Int 2020;97:1017-31.
- 18. Liu S, Gui Y, Wang MS, Zhang L, Xu T, Pan Y, et al. Serum integrative omics reveals the landscape of human diabetic kidney disease. Mol Metab 2021;54:101367.
- 19. Zhang H, Zuo JJ, Dong SS, Lan Y, Wu CW, Mao GY, et al. Identification of potential serum metabolic biomarkers of diabetic kidney disease: a widely targeted metabolomics study. J Diabetes Res 2020;2020:3049098.
- 20. Sharma K, Karl B, Mathew AV, Gangoiti JA, Wassel CL, Saito R, et al. Metabolomics reveals signature of mitochondrial dysfunction in diabetic kidney disease. J Am Soc Nephrol 2013;24:1901-12.
- 21. Wilson PC, Wu H, Kirita Y, Uchimura K, Ledru N, Rennke HG, et al. The single-cell transcriptomic landscape of early human diabetic nephropathy. Proc Natl Acad Sci U S A 2019;116:19619-25.
- 22. Muto Y, Wilson PC, Ledru N, Wu H, Dimke H, Waikar SS, et al. Single cell transcriptional and chromatin accessibility profiling redefine cellular heterogeneity in the adult human kidney. Nat Commun 2021;12:2190.
- 23. Isnard P, Li D, Xuanyuan Q, Wu H, Humphreys BD. Histopathologic analysis of human kidney spatial transcriptomics data: toward precision pathology. Am J Pathol 2025;195:69-88.
- 24. Hao Y, Hao S, Andersen-Nissen E, Mauck WM, Zheng S, Butler A, et al. Integrated analysis of multimodal single-cell data. Cell 2021;184:3573-87.e29.
- 25. Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM, et al. Comprehensive integration of single-cell data. Cell 2019;177:1888-902.e21.
- 26. Gui Y, Palanza Z, Gupta P, Li H, Pan Y, Wang Y, et al. Calponin 2 regulates ketogenesis to mitigate acute kidney injury. JCI Insight 2023;8:e170521.
- 27. Gui Y, Fu H, Palanza Z, Tao J, Lin YH, Min W, et al. Fibroblast expression of transmembrane protein smoothened governs microenvironment characteristics after acute kidney injury. J Clin Invest 2024;134:e165836.
- 28. Xie HH, Xu JY, Xie T, Meng X, Lin LL, He LL, et al. Effects of Pinellia ternata (Thunb.) Berit. on the metabolomic profiles of placenta and amniotic fluid in pregnant rats. J Ethnopharmacol 2016;183:38-45.
- 29. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 2011;12:77.
- 30. Bishop CM. Pattern recognition and machine learning. New York: Springer; 2006.
- 31. Ban CR, Twigg SM, Franjic B, Brooks BA, Celermajer D, Yue DK, et al. Serum MMP-7 is increased in diabetic renal disease and diabetic diastolic dysfunction. Diabetes Res Clin Pract 2010;87:335-41.
- 32. Afkarian M, Zelnick LR, Ruzinski J, Kestenbaum B, Himmelfarb J, de Boer IH, et al. Urine matrix metalloproteinase-7 and risk of kidney disease progression and mortality in type 2 diabetes. J Diabetes Complications 2015;29:1024-31.
- 33. Zhou Y, Ma XY, Han JY, Yang M, Lv C, Shao Y, et al. Metformin regulates inflammation and fibrosis in diabetic kidney disease through TNC/TLR4/NF-κB/miR-155-5p inflammatory loop. World J Diabetes 2021;12:19-46.
- 34. Dieter BP, McPherson SM, Afkarian M, de Boer IH, Mehrotra R, Short R, et al. Serum amyloid A and risk of death and end-stage renal disease in diabetic kidney disease. J Diabetes Complications 2016;30:1467-72.
- 35. Anderberg RJ, Meek RL, Hudkins KL, Cooney SK, Alpers CE, Leboeuf RC, et al. Serum amyloid A and inflammation in diabetic kidney disease and podocytes. Lab Invest 2015;95:250-62.
- 36. Al-Rubeaan K, Siddiqui K, Al-Ghonaim MA, Youssef AM, Al-Sharqawi AH, AlNaqeb D. Assessment of the diagnostic value of different biomarkers in relation to various stages of diabetic nephropathy in type 2 diabetic patients. Sci Rep 2017;7:2684.
- 37. Vucic Lovrencic M, Bozicevic S, Smircic Duvnjak L. Diagnostic challenges of diabetic kidney disease. Biochem Med (Zagreb) 2023;33:030501.
- 38. Kidney Disease: Improving Global Outcomes (KDIGO) Diabetes Work Group. KDIGO 2020 clinical practice guideline for diabetes management in chronic kidney disease. Kidney Int 2020;98(4S):S1-115.
- 39. Mogensen CE, Christensen CK, Vittinghus E. The stages in diabetic renal disease: with emphasis on the stage of incipient diabetic nephropathy. Diabetes 1983;32(Suppl 2):64-78.
- 40. Sauriasari R, Safitri DD, Azmi NU. Current updates on protein as biomarkers for diabetic kidney disease: a systematic review. Ther Adv Endocrinol Metab 2021;12:20420188211049612.
- 41. Jung CY, Yoo TH. Novel biomarkers for diabetic kidney disease. Kidney Res Clin Pract 2022;41(Suppl 2):S46-62.
- 42. Fu H, Liu S, Bastacky SI, Wang X, Tian XJ, Zhou D. Diabetic kidney diseases revisited: a new perspective for a new era. Mol Metab 2019;30:250-63.
- 43. Jiang X, Liu X, Qu X, Zhu P, Wo F, Xu X, et al. Integration of metabolomics and peptidomics reveals distinct molecular landscape of human diabetic kidney disease. Theranostics 2023;13:3188-203.
- 44. Mohandes S, Doke T, Hu H, Mukhi D, Dhillon P, Susztak K. Molecular pathways that drive diabetic kidney disease. J Clin Invest 2023;133:e165654.
- 45. Sarangi R, Sahu D, Rout NK, Padarabinda Tripathy K, Patra S, Bahinipati J, et al. Role of urinary matrix metalloproteinase-7 (MMP-7) as an early marker of renal dysfunction in diabetic individuals: a cross-sectional study. Cureus 2024;16:e66392.
- 46. Mora-Gutierrez JM, Rodriguez JA, Fernandez-Seara MA, Orbe J, Escalada FJ, Soler MJ, et al. MMP-10 is increased in early stage diabetic kidney disease and can be reduced by renin-angiotensin system blockade. Sci Rep 2020;10:26.
- 47. Mason RM, Wahab NA. Extracellular matrix metabolism in diabetic nephropathy. J Am Soc Nephrol 2003;14:1358-73.
- 48. Kolset SO, Reinholt FP, Jenssen T. Diabetic nephropathy and extracellular matrix. J Histochem Cytochem 2012;60:976-86.
- 49. Ozanne J, Shek B, Stephen LA, Novak A, Milne E, Mclachlan G, et al. Tenascin-C is a driver of inflammation in the DSS model of colitis. Matrix Biol Plus 2022;14:100112.
- 50. Jiang B, Wang D, Hu Y, Li W, Liu F, Zhu X, et al. Serum amyloid A1 exacerbates hepatic steatosis via TLR4-mediated NF-κB signaling pathway. Mol Metab 2022;59:101462.