Assessing Nutritional Factors for Metabolic Dysfunction-Associated Steatotic Liver Disease via Diverse Statistical Tools
Abstract
Background
Lifestyle modifications are critical in addressing metabolic dysfunction-associated steatotic liver disease (MASLD); however, the specific macronutrients that most significantly influence the disease’s progression are uncertain. In this study, we aimed to explore the role of carbohydrate, fat, and protein intake in MASLD development using decision trees, random forest models, and cluster analysis.
Methods
Participants (n=3,951) from the Korean Genome and Epidemiology Study were included. We used classification and regression tree (CART) analysis to classify participants into subgroups based on variables associated with the incidence of new-onset MASLD. Random forest analysis was used to assess the relative importance of each variable. Participants were grouped into homogeneous clusters based on carbohydrate, protein, fat, and total caloric intake using hierarchical cluster analysis. Subsequently, we used Cox proportional hazards regression models to estimate hazard ratios (HRs) and 95% confidence intervals (CIs) for MASLD risk across the clusters.
Results
Carbohydrate intake was identified as the most significant predictor of new-onset MASLD, followed by fat, protein, and total caloric intake. Participants in cluster 3, who consumed a lower proportion of carbohydrate but had higher total caloric, protein, and fat intake, had a lower risk of new-onset MASLD than those in cluster 1 after adjusting for confounders (cluster 1 as a reference; cluster 3: HR, 0.90; 95% CI, 0.82 to 0.99).
Conclusion
The study’s results highlight the critical role of macronutrient composition, particularly carbohydrate intake, in MASLD development. The findings suggest that dietary strategies focusing on optimizing macronutrients, rather than simply reducing caloric intake, may be more effective in preventing MASLD.
Highlights
• Nutritional strategies are essential for managing MASLD.
• The optimal macronutrient composition for MASLD remains unclear.
• Carbohydrate intake plays a key role in the development of MASLD.
• Optimizing macronutrients may be more effective than reducing energy intake alone.
INTRODUCTION
Metabolic dysfunction-associated steatotic liver disease (MASLD) is a recently proposed term that replaces non-alcoholic fatty liver disease (NAFLD) and emphasizes the metabolic dysfunction underlying the condition [1]. MASLD is emerging as a major global health concern because of its association with atherosclerotic cardiovascular disease [2], its potential to progress to severe liver diseases such as cirrhosis and hepatocellular carcinoma [3,4], and its contribution to substantial morbidity and mortality [3,5]. Obesity and metabolic syndrome strongly influence the development and progression of MASLD, with global prevalence estimates ranging from 20% to 25% [6]. Recent data from the Global Burden of Disease study indicate that MASLD affects approximately 25% to 38% of the global adult population, making it one of the most prevalent chronic liver diseases worldwide [7]. In South Korea, the annual prevalence of MASLD rose from 15.69 per 1,000 population in 2010 to 34.23 per 1,000 in 2021, closely paralleling the increase in obesity and metabolic syndrome [8]. Despite this widespread prevalence and substantial public health impact, few targeted therapeutic strategies are available for managing the disease.
Dietary interventions are central to managing MASLD because they address both hepatic fat accumulation and metabolic dysfunction [9,10]. Excessive calorie intake, particularly from unhealthy sources, promotes obesity and insulin resistance and thereby contributes to MASLD development [11]. High-calorie diets further damage the liver by increasing fat accumulation, inflammation, and oxidative stress [11,12]. Conversely, calorie restriction and modest weight loss (5% to 10%) reduce hepatic steatosis, improve liver function, and alleviate metabolic conditions such as type 2 diabetes mellitus (T2DM), dyslipidemia, and hypertension [12,13].
Beyond calorie control, adjusting macronutrient intake is important for preventing and managing MASLD. A high-carbohydrate diet increases MASLD risk: in the National Health and Nutrition Examination Survey (NHANES), patients with MASLD and advanced fibrosis had higher carbohydrate intake [14]. Conversely, reducing free sugars decreased hepatic steatosis from 25% to 17% in adolescents with NAFLD [15]. Excessive fat intake worsens insulin resistance and inflammation, driving MASLD progression [16]. NHANES data also showed higher saturated fat intake in patients with MASLD [16], whereas a 3-month low-fat diet led to over 9% weight loss and improved hepatic steatosis in overweight Serbian men [17]. In addition, a Mediterranean diet rich in unsaturated fats reduced liver steatosis by 25% to 35% in patients with NAFLD [18]. The role of protein intake in MASLD remains unclear, although adequate protein intake is crucial for preventing sarcopenia [19].
However, it is not well understood which macronutrient has the greatest impact on MASLD development. Moreover, most research has focused on Western populations, leaving a gap in understanding how dietary factors influence MASLD in Asian populations. Given the differences in dietary patterns and metabolic responses across populations, identifying the macronutrients that most strongly affect MASLD risk in diverse populations is essential for developing more effective prevention and management strategies.
To address this gap, we used decision trees and random forest models to investigate the impact of carbohydrate, fat, and protein intake on new-onset MASLD in a large Korean cohort. In addition, we conducted a cluster analysis to explore the relationship between macronutrient proportions and MASLD.
METHODS
Study population
We used datasets from the Ansung and Ansan cohorts of the Korean Genome and Epidemiology Study (KoGES). These cohorts comprised community-dwelling men and women aged ≥40 years at enrollment. Baseline data were collected between 2001 and 2002, and the 1st to 9th follow-up data were obtained between 2003 and 2020. Data collection in KoGES was standardized, with regular follow-up surveys and health examinations conducted by trained medical personnel.
Of the 10,030 participants aged 40 to 69 years, we excluded those with missing data on the NAFLD liver fat score (n=299), metabolic syndrome criteria (n=620), or macronutrient intake (n=31). We further excluded participants who already had MASLD at baseline (n=2,089), those who were never followed up in any of the 1st to 9th follow-up surveys (n=1,102), and those who reported a total calorie intake <500 or >5,000 kcal/day (n=88). Overall, 5,801 participants were included in the baseline study. Participants were followed up every 2 years, and 3,951 participants remained in the study population at the 9th follow-up assessment (Fig. 1).
Definition of metabolic dysfunction-associated steatotic liver disease
We defined MASLD as the presence of hepatic steatosis accompanied by metabolic dysfunction [1]. Hepatic steatosis was defined as a NAFLD liver fat score >–0.64, calculated as liver fat score = –2.89 + 1.18 × metabolic syndrome (yes=1, no=0) + 0.9 × diabetes mellitus (yes=1, no=0) + 0.15 × serum insulin (μIU/mL) + 0.04 × serum aspartate aminotransferase (AST; U/L) – 0.94 × AST/alanine aminotransferase (ALT) ratio [20]. Metabolic dysfunction was defined as meeting at least one of the following cardiometabolic risk factors: (1) body mass index ≥23 kg/m² or waist circumference (WC) ≥94 cm in men and ≥80 cm in women; (2) fasting plasma glucose level ≥100 mg/dL, 2-hour postprandial glucose level ≥140 mg/dL, glycosylated hemoglobin ≥5.7%, or receiving treatment for T2DM; (3) systolic blood pressure ≥130 mm Hg, diastolic blood pressure (DBP) ≥85 mm Hg, or receiving specific antihypertensive drug treatment; (4) serum triglyceride (TG) level ≥150 mg/dL or receiving lipid-lowering treatment; and (5) serum high-density lipoprotein cholesterol level <40 mg/dL in men and <50 mg/dL in women [1].
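The MASLD definition combines a continuous steatosis score with a checklist of cardiometabolic criteria. The following is a minimal Python sketch of that logic, assuming one record per participant; the `Participant` fields and function names are hypothetical (not KoGES variable names), metabolic syndrome and diabetes status are taken as already determined, and the diabetes term uses the 0.9 × (yes=1) coding stated above.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    # Hypothetical field names; units follow the Methods text.
    sex: str                  # "M" or "F"
    metabolic_syndrome: bool
    diabetes: bool
    insulin_uIU_ml: float     # fasting serum insulin, uIU/mL
    ast: float                # U/L
    alt: float                # U/L
    bmi: float                # kg/m^2
    waist_cm: float
    fpg: float                # fasting plasma glucose, mg/dL
    pp2_glucose: float        # 2-hour postprandial glucose, mg/dL
    hba1c: float              # %
    sbp: float                # mm Hg
    dbp: float                # mm Hg
    tg: float                 # mg/dL
    hdl: float                # mg/dL
    on_diabetes_tx: bool = False
    on_bp_tx: bool = False
    on_lipid_tx: bool = False


def nafld_liver_fat_score(p: Participant) -> float:
    """NAFLD liver fat score with the coefficients given in the Methods."""
    return (-2.89
            + 1.18 * int(p.metabolic_syndrome)
            + 0.9 * int(p.diabetes)
            + 0.15 * p.insulin_uIU_ml
            + 0.04 * p.ast
            - 0.94 * (p.ast / p.alt))


def n_cardiometabolic_criteria(p: Participant) -> int:
    """Count how many of the five cardiometabolic risk factors are met."""
    male = p.sex == "M"
    criteria = [
        p.bmi >= 23 or p.waist_cm >= (94 if male else 80),
        p.fpg >= 100 or p.pp2_glucose >= 140 or p.hba1c >= 5.7 or p.on_diabetes_tx,
        p.sbp >= 130 or p.dbp >= 85 or p.on_bp_tx,
        p.tg >= 150 or p.on_lipid_tx,
        p.hdl < (40 if male else 50),
    ]
    return sum(criteria)


def has_masld(p: Participant) -> bool:
    """Steatosis (score > -0.64) plus at least one cardiometabolic criterion."""
    return nafld_liver_fat_score(p) > -0.64 and n_cardiometabolic_criteria(p) >= 1


example = Participant(sex="M", metabolic_syndrome=True, diabetes=False,
                      insulin_uIU_ml=10, ast=30, alt=40, bmi=25, waist_cm=88,
                      fpg=95, pp2_glucose=120, hba1c=5.4, sbp=120, dbp=78,
                      tg=120, hdl=45)
print(round(nafld_liver_fat_score(example), 3), has_masld(example))  # 0.285 True
```

In this made-up example the score (0.285) exceeds the –0.64 threshold and the body mass index criterion is met, so the record would be classified as MASLD.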
Assessment of nutrition intake
Dietary intake was assessed using a semi-quantitative food frequency questionnaire (FFQ) consisting of 103 food items [21]. The FFQ was designed to estimate usual dietary intake by collecting information on the frequency and portion size of food consumption. To calculate total energy and nutrient intake, the reported frequency of consumption for each food item was multiplied by the corresponding portion size and nutrient content, as specified in the Korean Food Composition Table (7th edition, The Korean Nutrition Society, 2000) [22]. The total daily nutrient intake for each participant was derived by summing the nutrient intakes from all reported food items. This process was conducted using the DS24 software, developed by the Human Nutrition Lab at Seoul National University in collaboration with the AI/DB Lab at Sookmyung Women’s University (1996) [21]. Detailed information regarding the covariates is described in the Supplementary Methods.
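The nutrient calculation described above is a weighted sum: for each food, frequency × portion size × nutrient content per serving, totaled over all items. A minimal sketch of that arithmetic follows, together with the 500 to 5,000 kcal/day plausibility range used for exclusion. The three foods and their nutrient values are invented placeholders, not entries from the Korean Food Composition Table, and this does not reproduce the DS24 software.

```python
# Hypothetical nutrient content per standard serving; placeholder numbers,
# NOT values from the Korean Food Composition Table.
FOOD_TABLE = {
    "cooked_rice": {"energy_kcal": 300.0, "carb_g": 65.0, "protein_g": 5.5, "fat_g": 0.5},
    "kimchi": {"energy_kcal": 15.0, "carb_g": 2.5, "protein_g": 1.0, "fat_g": 0.3},
    "pork_belly": {"energy_kcal": 330.0, "carb_g": 0.0, "protein_g": 13.0, "fat_g": 30.0},
}


def daily_nutrient_intake(responses):
    """Sum frequency x portion x nutrient content over all reported foods.

    responses maps food -> (servings per day, portion relative to a standard serving).
    """
    totals = {}
    for food, (freq_per_day, portion) in responses.items():
        for nutrient, per_serving in FOOD_TABLE[food].items():
            totals[nutrient] = totals.get(nutrient, 0.0) + freq_per_day * portion * per_serving
    return totals


def plausible_energy(totals, low=500, high=5000):
    """Keep participants reporting 500 to 5,000 kcal/day, as in the exclusion criteria."""
    return low <= totals["energy_kcal"] <= high


answers = {
    "cooked_rice": (3.0, 1.0),    # three standard bowls a day
    "kimchi": (3.0, 1.0),
    "pork_belly": (2 / 7, 1.5),   # twice a week, 1.5 servings each time
}
intake = daily_nutrient_intake(answers)
print(intake, plausible_energy(intake))
```

Weekly or monthly FFQ frequency categories would first be converted to servings per day, as done here for the twice-weekly item.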
Statistical analysis
We used the chi-square test for categorical variables and the independent two-sample t-test for continuous variables to compare the new-onset MASLD group with the not-developed MASLD group. Classification and regression tree (CART) analysis was used to divide participants into subgroups based on optimal cut-off points for each variable, defined as the points at which the log-rank test statistic was maximized. A random forest model was used to assess the importance of each variable for the incidence of new-onset MASLD. Hierarchical cluster analysis was used to classify participants based on carbohydrate, protein, fat, and total calorie intake, with Ward’s minimum variance method as the clustering criterion. All variables were standardized before clustering, and two statistics were used to determine the optimal number (n) of clusters: the smallest pseudo t² statistic and the largest decrease in semipartial R² at n clusters compared with n–1 clusters. One-way analysis of variance and the chi-square test were used to compare continuous and categorical variables, respectively, across clusters. Cumulative incidence curves were used to show the incidence of new-onset MASLD in each cluster, and Cox proportional hazards regression models were used to estimate hazard ratios (HRs) and 95% confidence intervals (CIs) of MASLD for each cluster. All statistical analyses were performed using R version 4.0.2 (R Foundation for Statistical Computing, Vienna, Austria) and SAS version 9.4 (SAS Institute, Cary, NC, USA). Statistical significance was set at P<0.05.
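The cut-off rule used in the CART analysis (choose the split that maximizes the log-rank statistic) can be illustrated with a single-variable scan. The sketch below is an assumption-laden simplification, not the authors' R or SAS implementation: it evaluates every observed value as a candidate cut-off, skips splits leaving fewer than `min_group` participants (a hypothetical parameter), and returns the cut-off with the largest two-group log-rank chi-square. A full survival tree would repeat this search within each resulting node, and a P value from a maximally selected statistic needs correction for the many cut-offs examined.

```python
import numpy as np


def logrank_chi2(time, event, group):
    """Two-group log-rank chi-square statistic (1 df); inputs are numpy arrays."""
    o_minus_e, var = 0.0, 0.0
    for t in np.unique(time[event]):
        at_risk = time >= t
        n = at_risk.sum()
        n1 = (at_risk & group).sum()
        died = event & (time == t)
        d = died.sum()
        d1 = (died & group).sum()
        o_minus_e += d1 - d * n1 / n                   # observed minus expected in group 1
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var if var > 0 else 0.0


def best_cutpoint(x, time, event, min_group=10):
    """Return (cut-off, statistic) maximizing the log-rank statistic for x > cut-off."""
    x, time, event = np.asarray(x, float), np.asarray(time, float), np.asarray(event, bool)
    best_cut, best_stat = None, -np.inf
    for cut in np.unique(x)[:-1]:
        high = x > cut
        if min(high.sum(), (~high).sum()) < min_group:
            continue  # skip splits that leave a tiny subgroup
        stat = logrank_chi2(time, event, high)
        if stat > best_stat:
            best_cut, best_stat = cut, stat
    return best_cut, best_stat


# Simulated example: the hazard triples when a hypothetical intake exceeds 6 units
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 300)
true_time = rng.exponential(1 / np.where(x > 6, 0.3, 0.1))
event = true_time < 8            # administrative censoring at t = 8
time = np.minimum(true_time, 8)
cut, stat = best_cutpoint(x, time, event)
print(f"cut-off = {cut:.2f}, log-rank chi2 = {stat:.1f}")
```

On this simulated data the selected cut-off should fall near the true change point of 6.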
Ethics and consent
Ethical standards were rigorously upheld, and informed consent was obtained from all participants. The study was conducted in accordance with the Declaration of Helsinki and was approved by the Institutional Review Board of Severance Hospital, Yonsei University College of Medicine (IRB No: 4-2024-0866).
RESULTS
Demographic characteristics of the study population
Table 1 shows the demographic characteristics of the participants according to the incidence of new-onset MASLD over a median follow-up of 11.6 years (interquartile range [IQR], 5.7 to 17.6). In total, 2,869 participants were included in the new-onset MASLD group and 2,932 in the not-developed MASLD group. The mean ages were 51.4±8.4 and 51.0±9.0 years in the new-onset and not-developed MASLD groups, respectively. Participants in the not-developed MASLD group had lower rates of obesity and smaller WC than those in the new-onset MASLD group. The prevalence of metabolic syndrome and other chronic diseases, such as diabetes, hypertension, and dyslipidemia, was also significantly lower in the not-developed MASLD group than in the new-onset MASLD group. In addition, the not-developed MASLD group had a lower carbohydrate intake than the new-onset MASLD group (345.90±97.44 g/day vs. 351.10±103.74 g/day, P=0.049).
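As a consistency check, the carbohydrate comparison can be recomputed from the reported group means, standard deviations, and sizes using SciPy's `ttest_ind_from_stats`. Whether the original analysis used a pooled-variance (Student) or Welch t-test is not stated, so `equal_var=True` below is an assumption; with groups this large both versions give a t statistic of about 1.97 and a two-sided P of about 0.049.

```python
from scipy import stats

# Carbohydrate intake (g/day): mean, SD, and n as reported in the Results
res = stats.ttest_ind_from_stats(
    mean1=351.10, std1=103.74, nobs1=2869,   # new-onset MASLD
    mean2=345.90, std2=97.44, nobs2=2932,    # not-developed MASLD
    equal_var=True,                          # pooled-variance t-test (assumption)
)
print(f"t = {res.statistic:.2f}, P = {res.pvalue:.3f}")
```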
Classification and regression tree analysis and random forest model
Fig. 2A illustrates the results of the CART algorithm for the incidence of new-onset MASLD. The root node was first split by carbohydrate intake (P=0.003), with 548 individuals consuming more than 477.81 g/day (node 1) and 5,253 consuming 477.81 g/day or less (node 1’). Node 1’ was further divided by fat intake at a cut-off of 22.85 g/day (P=0.025) into node 2 (n=4,198; fat intake >22.85 g/day) and node 2’ (n=1,055; fat intake ≤22.85 g/day). Node 2’ was then split by protein intake into node 3, with 337 individuals consuming more than 44.21 g/day of protein, and node 4, with 718 individuals consuming 44.21 g/day or less (P=0.031). No further significant splitting variables were found; hence, four terminal nodes (nodes 1, 2, 3, and 4) were used.

Fig. 2. Splitting variables and variable importance for the incidence of new-onset metabolic dysfunction-associated steatotic liver disease. (A) Classification and regression tree (decision tree) algorithm. (B) Random forest model.
Fig. 2B shows the variable importance of the selected splitters, as estimated by the random forest model. For predicting new-onset MASLD, carbohydrate, fat, protein, and total calorie intake were identified as the most relevant variables. Carbohydrate intake had the highest relative importance (set to 100), followed by fat intake (81.71), protein intake (32.52), and total calorie intake (19.54).
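Relative importance scores such as those in Fig. 2B are typically obtained by dividing each variable's raw importance by the largest one and multiplying by 100. The sketch below shows that rescaling with scikit-learn's `RandomForestClassifier` and its impurity-based `feature_importances_`. This is a swapped-in setup, not the authors' model: the importance measure they used is not stated, the outcome here is binary and ignores follow-up time (a random survival forest would be the time-to-event analogue), and all data are simulated with carbohydrate deliberately given the strongest effect.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n = 2000
names = ["carbohydrate", "fat", "protein", "total_kcal"]
# Simulated, independent intakes (illustrative only, not KoGES data)
X = np.column_stack([
    rng.normal(350, 100, n),   # carbohydrate, g/day
    rng.normal(35, 15, n),     # fat, g/day
    rng.normal(60, 20, n),     # protein, g/day
    rng.normal(1900, 400, n),  # total energy, kcal/day
])
# Simulated outcome driven mostly by carbohydrate and weakly by fat
logit = 0.02 * (X[:, 0] - 350) + 0.03 * (X[:, 1] - 35)
y = rng.random(n) < 1 / (1 + np.exp(-logit))

rf = RandomForestClassifier(n_estimators=200, min_samples_leaf=20, random_state=0)
rf.fit(X, y)

# Rescale impurity-based importances so that the top variable scores 100
relative = 100 * rf.feature_importances_ / rf.feature_importances_.max()
for name, score in sorted(zip(names, relative), key=lambda p: -p[1]):
    print(f"{name:>12s}: {score:6.2f}")
```

Impurity-based importances can favor continuous, high-cardinality predictors; permutation importance is a common alternative when that bias matters.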
Cluster analysis and Cox proportional hazard regression model
We classified participants by hierarchical cluster analysis based on four variables: the proportions of carbohydrate, protein, and fat intake, and total calorie intake. Supplementary Table 1 shows how the optimal number of clusters was determined. The pseudo t² statistic was smallest (545.8) and the decrease in semipartial R² was largest (0.050) at three clusters. The three-cluster solution was therefore selected as optimal, with an R² of 0.532 (Supplementary Fig. 1).
Table 2 compares demographic characteristics across the clusters. In total, 1,568 participants were included in cluster 1, 2,414 in cluster 2, and 1,819 in cluster 3. Cluster 3 participants had lower WC, higher high-density lipoprotein levels, and a lower prevalence of abdominal obesity, metabolic syndrome, hypertension, and dyslipidemia than those in the other clusters. Their carbohydrate consumption was relatively low despite their higher total caloric intake. Cluster 3 participants also consumed more dietary fiber (1.30±0.36 g/kcal) than those in cluster 1 (1.14±0.31 g/kcal) or cluster 2 (1.23±0.32 g/kcal; P<0.001). In addition, this group consumed more protein and fat, with a notably lower n-6 to n-3 polyunsaturated fatty acid ratio.
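The clustering workflow (standardize, apply Ward's minimum variance linkage, then compare R² and semipartial R² across candidate numbers of clusters) can be sketched with SciPy. Here R² is the share of total variance explained by a partition, and the semipartial R² for k clusters is taken as R²(k) − R²(k−1), the variance explained that is lost when the two closest clusters are merged. The data, cluster centres, and function name are hypothetical, and the pseudo t² statistic reported by SAS is not computed in this sketch.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def r_squared(z, labels):
    """Proportion of total variance explained by the cluster partition."""
    total = ((z - z.mean(axis=0)) ** 2).sum()
    within = sum(((z[labels == k] - z[labels == k].mean(axis=0)) ** 2).sum()
                 for k in np.unique(labels))
    return 1 - within / total


rng = np.random.default_rng(1)
# Hypothetical centres: % energy from carbohydrate, protein, fat; total kcal/day
centres = np.array([[72, 13, 15, 1600], [65, 15, 20, 1900], [58, 17, 25, 2300]])
x = np.vstack([c + rng.normal(0, [3, 1.5, 3, 150], size=(200, 4)) for c in centres])

z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)   # standardize before clustering
tree = linkage(z, method="ward")                  # Ward's minimum variance method

r2 = {k: r_squared(z, fcluster(tree, t=k, criterion="maxclust")) for k in range(1, 7)}
for k in range(2, 7):
    # semipartial R^2: variance explained that is lost by merging k clusters into k - 1
    print(f"k={k}: R2={r2[k]:.3f}, semipartial R2={r2[k] - r2[k - 1]:.3f}")
```

Because Ward's clusters are nested, R² can only increase as k grows; the choice of k rests on where the gains level off.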
Fig. 3 presents the Kaplan–Meier analysis of the cumulative incidence of new-onset MASLD. Over a median follow-up of 11.6 years (IQR, 5.7 to 17.6), the cumulative incidence of new-onset MASLD was highest in cluster 1 (0.68), followed by cluster 2 (0.65), and lowest in cluster 3 (0.64). However, this difference was not statistically significant (log-rank test, P=0.113).

Fig. 3. Kaplan–Meier analysis of the cumulative incidence of new-onset metabolic dysfunction-associated steatotic liver disease (MASLD) (top) and number-at-risk table (bottom) (log-rank test, P=0.113).
Table 3 presents the HRs and 95% CIs of the clusters for the incidence of new-onset MASLD. In the unadjusted model, the HR for new-onset MASLD in cluster 3 compared with cluster 1 was 0.89 (95% CI, 0.81 to 0.98). After adjusting for age, sex, current smoking and alcohol drinking, and physical activity, the association persisted (HR, 0.90; 95% CI, 0.82 to 0.99).
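The cumulative incidence curves in Fig. 3 correspond to 1 − S(t), where S(t) is the Kaplan–Meier survival estimate: at each event time, S is multiplied by (1 − cases / number still at risk). A minimal NumPy sketch with toy data follows; the function name is hypothetical. Note that 1 − KM treats deaths before MASLD onset as censoring; if death were modeled as a competing risk, an Aalen–Johansen estimator would be the usual choice instead.

```python
import numpy as np


def km_cumulative_incidence(time, event):
    """Return distinct event times and 1 - S(t) from the Kaplan-Meier estimator."""
    time = np.asarray(time, float)
    event = np.asarray(event, bool)
    event_times = np.unique(time[event])
    surv, cum_inc = 1.0, []
    for t in event_times:
        n_at_risk = np.sum(time >= t)          # still under follow-up just before t
        d = np.sum(event & (time == t))         # new cases at t
        surv *= 1 - d / n_at_risk
        cum_inc.append(1 - surv)
    return event_times, np.array(cum_inc)


# Toy data: follow-up years and whether MASLD developed (1) or the record was censored (0)
times, ci = km_cumulative_incidence([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
print(times, ci.round(2))   # [2. 3. 5.] [0.2 0.4 0.7]
```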
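The HRs in Table 3 come from Cox proportional hazards models, in which exp(β) is the hazard ratio for a covariate and the 95% CI is exp(β ± 1.96 × SE). The sketch below fits a Cox model by Newton–Raphson on the Breslow partial likelihood in plain NumPy, as an illustration rather than the SAS or R procedure used in the study; the function name and simulated data are hypothetical, and `X` may hold several columns (for example, a cluster indicator plus adjustment covariates). It then runs the Wald formula backwards to show the standard error of log HR implied by the reported adjusted estimate (0.90; 0.82 to 0.99).

```python
import numpy as np


def cox_newton(time, event, X, max_iter=25, tol=1e-9):
    """Cox proportional hazards fit by Newton-Raphson (Breslow handling of ties)."""
    time = np.asarray(time, float)
    event = np.asarray(event, bool)
    X = np.asarray(X, float).reshape(len(time), -1)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        risk = np.exp(X @ beta)
        grad = np.zeros_like(beta)
        hess = np.zeros((beta.size, beta.size))
        for t in np.unique(time[event]):
            at_risk = time >= t
            cases = event & (time == t)
            d = cases.sum()
            w, xr = risk[at_risk], X[at_risk]
            mean = (w @ xr) / w.sum()                      # risk-weighted covariate mean
            second = (xr * w[:, None]).T @ xr / w.sum()
            grad += X[cases].sum(axis=0) - d * mean         # score vector
            hess -= d * (second - np.outer(mean, mean))    # second derivative of log PL
        step = np.linalg.solve(hess, grad)
        beta = beta - step
        if np.max(np.abs(step)) < tol:
            break
    se = np.sqrt(np.diag(np.linalg.inv(-hess)))            # from observed information
    return beta, se


# Simulated cohort with one binary indicator and a true HR of 2
rng = np.random.default_rng(7)
group = rng.integers(0, 2, 2000)
true_time = rng.exponential(1 / (0.1 * np.exp(np.log(2) * group)))
event = true_time < 10          # administrative censoring at t = 10
time = np.minimum(true_time, 10)
beta, se = cox_newton(time, event, group)
hr, lo, hi = np.exp(beta[0]), np.exp(beta[0] - 1.96 * se[0]), np.exp(beta[0] + 1.96 * se[0])
print(f"HR {hr:.2f} (95% CI {lo:.2f} to {hi:.2f})")

# The Wald formula run backwards: SE of log HR implied by HR 0.90 (0.82 to 0.99)
se_reported = (np.log(0.99) - np.log(0.82)) / (2 * 1.96)
print(f"implied SE of log HR: {se_reported:.3f}")   # 0.048
```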
DISCUSSION
In this study, we investigated the relationship between macronutrient intake and the incidence of new-onset MASLD in a large Korean cohort using CART and random forest models. These analyses identified carbohydrate, fat, and protein intake as key contributors to MASLD risk, with carbohydrate intake having the greatest influence. Hierarchical cluster analysis further revealed distinct dietary patterns across three clusters. Participants in cluster 3, who had lower carbohydrate intake and higher intake of healthy fats and protein, had a 10% lower hazard of developing MASLD (HR, 0.90) than those in cluster 1, even after adjusting for confounders.
A high-carbohydrate diet promotes de novo lipogenesis, increasing triglyceride production and hepatic fat accumulation and thereby contributing to hepatic steatosis [23,24]. Consistent with our findings, several studies emphasize the critical role of carbohydrate intake in managing liver diseases, beyond simply reducing calories [25-27]. Afsharfar et al. [25] demonstrated that total carbohydrate intake is significantly associated with the risk of hepatic steatosis in patients with NAFLD. Additionally, a 2-week randomized clinical trial found that a low-carbohydrate diet reduced hepatic steatosis by 27%, surpassing the effect of a low-calorie diet in participants with NAFLD [26]. However, a recent meta-analysis of 34 observational studies found that total carbohydrate intake was not significantly associated with MASLD, suggesting that different types and sources of carbohydrates may have different effects on liver fat accumulation [28]. Specifically, fructose has been associated with a higher incidence of MASLD, whereas fiber and whole-grain starch may be protective [28,29]. Consistent with this, cluster 3 participants, who had a lower risk of MASLD, consumed fewer total carbohydrates and more dietary fiber.
Our analysis identified carbohydrate intake as the strongest influence on MASLD risk, although the reason remains unclear. One possible explanation is the particularly high carbohydrate intake in Korea, especially among middle-aged and older adults, reflecting traditional dietary patterns rich in carbohydrate-heavy foods such as rice and noodles [30,31]. Future longitudinal studies should examine the effects of different carbohydrate types and sources on MASLD and investigate the underlying mechanisms.
High-fat diets have traditionally been considered to promote MASLD because they increase hepatic fat accumulation through increased lipogenesis and impaired fatty acid oxidation [16,32]. However, the type of fat consumed is crucial. In our study, fat intake was the second most important factor influencing MASLD risk after carbohydrate intake, and the CART analysis showed that participants with lower fat intake had a distinct risk profile. Participants in cluster 3, characterized by lower carbohydrate and higher fat intake with a favorable n-6 to n-3 ratio, showed the lowest incidence of new-onset MASLD. Similarly, a 6-month randomized controlled trial in Denmark found that a low-carbohydrate, high-fat diet led to greater improvements in NAFLD activity scores than a high-carbohydrate, low-fat diet [27]. These results highlight the importance of fat quality over quantity in preventing MASLD and suggest that balancing fat intake with an emphasis on healthier fats may be key to MASLD management.
The relationship between protein intake and MASLD is complex; some studies suggest that higher protein intake may help protect against sarcopenia, a risk factor for MASLD [19]. However, outcomes vary with the type and quality of protein: plant-based and animal proteins may have different effects, and further research is needed to clarify this in the context of MASLD [19,33]. In our study, cluster 1, characterized by lower protein intake, was significantly associated with a higher risk of MASLD than cluster 3, supporting the idea that protein intake may influence MASLD development. Although we did not differentiate between protein types, a study of older adults in Korea reported that 69% of their protein intake came from plant sources [34], and our findings suggest that adequate protein intake, particularly in aging populations, may be beneficial for MASLD management. Similarly, a community-based case-control study of Chinese adults aged ≥65 years reported that a protein intake of 58.7 to 70.7 g/day was associated with a reduced risk of NAFLD compared with an intake below 45.8 g/day [35].
The relative impact of total caloric intake versus macronutrient composition on MASLD development remains debated. Some studies suggest that a caloric deficit alone can reduce liver fat [36], whereas others highlight that the types of fats and carbohydrates consumed significantly influence MASLD regardless of calorie intake [14]. Our study supports the latter view: carbohydrate intake had the greatest influence on MASLD risk, followed by fat, protein, and total calories. These findings suggest that a dietary approach focusing on the quality and balance of macronutrients, rather than calorie reduction alone, may be more effective in preventing and managing MASLD.
This study has several limitations. First, reliance on self-reported dietary data introduces potential recall bias. Second, dietary intake was not assessed at every follow-up in the KoGES study, so only baseline dietary data were used; changes in dietary intake over time may have influenced the outcomes, and we could not evaluate temporal variations in intake. Future studies should investigate the impact of dietary changes on the incidence of MASLD. Third, unmeasured confounding factors may have influenced the results. Fourth, the lack of detailed nutrient composition analysis, such as distinguishing between refined and unrefined carbohydrates or between plant- and animal-based proteins, limits the depth of our findings. Fifth, focusing on a predominantly Korean adult population may limit the generalizability of the results to other ethnicities or regions. Sixth, the NAFLD liver fat score used to define MASLD may be less accurate than imaging- or biopsy-based methods; although serum-based biomarkers are valuable for large-scale epidemiological studies, their diagnostic accuracy remains suboptimal for clinical application, limiting their reliability in precisely identifying MASLD cases [37]. Lastly, the CART algorithm has limitations, such as converting continuous variables into discrete categories and sensitivity to data variations, which may affect reproducibility [38,39].
Despite these limitations, our study has notable strengths. We used decision trees, random forest models, and cluster analysis to comprehensively assess the impact of macronutrient intake and dietary patterns on MASLD risk. This approach makes our study one of the first to explore these dietary influences in depth, providing insights for developing targeted dietary interventions for MASLD prevention and management.
In conclusion, our results reveal the critical role of macronutrient composition, particularly carbohydrate intake, in the development of new-onset MASLD. The findings suggest that optimizing macronutrient quality and balance, rather than simply reducing calorie intake, may be more effective for prevention and management. Specifically, strategies that limit refined carbohydrates and emphasize healthier fats could lead to better clinical outcomes. Further research is needed to investigate the underlying mechanisms and refine these dietary approaches to improve MASLD management.
SUPPLEMENTARY MATERIALS
Supplementary materials related to this article can be found online at https://doi.org/10.4093/dmj.2025.0026.
Supplementary Table 1. Determination of the optimal number of clusters
Supplementary Fig. 1. Dendrogram of the hierarchical cluster analysis performed with the agglomerative technique, which begins with each subject as its own cluster and successively merges clusters based on their similarity. At three clusters, the R² value, which reflects the proportion of variance explained by the cluster solution formed at a given step, was 0.532.
Notes
CONFLICTS OF INTEREST
No potential conflict of interest relevant to this article was reported.
AUTHOR CONTRIBUTIONS
Conception or design: Y.C.L., Y.J.K., J.W.L.
Acquisition, analysis, or interpretation of data: all authors.
Drafting the work or revising: Y.C.L., H.S.L., Y.J.K., J.W.L.
Final approval of the manuscript: all authors.
FUNDING
This research received funding from the Korea Institute of Planning and Evaluation for Technology in Food, Agriculture, and Forestry (IPET) through a high-value-added food technology development program supported by the Ministry of Agriculture, Food, and Rural Affairs (MAFRA) (321030051HD030). It was also funded by a National Research Foundation of Korea (NRF) grant from the Korean government (MSIT) (RS-2024-00354524).
ACKNOWLEDGMENTS
We thank the Biostatistics Collaboration Unit, Department of Research Affairs, Yonsei University College of Medicine, and MID (Medical Illustration & Design), a member of the Medical Research Support Services of Yonsei University College of Medicine, for their excellent support with medical illustration.