Inflammatory indices, machine learning and artificial intelligence in tubal ectopic pregnancy management

Uğurcan Zorlu; Senem Arda Düz; Gül Kurtaran; Mohammad İbrahim Halilzade; Burak Elmas

doi:10.4274/tjod.galenos.2026.37165

Abstract

Objective

To assess the predictive value of hematologic and biochemical inflammatory indices for methotrexate (MTX) treatment outcomes in tubal ectopic pregnancy (TEP) and to develop machine learning (ML) models for individualized risk stratification.

Materials and Methods

This retrospective cohort included 293 hemodynamically stable TEP patients who were treated with a single dose of MTX between January 2019 and December 2023. Demographic, clinical, ultrasonographic, and laboratory data were analyzed. Inflammatory indices—including neutrophil-to-lymphocyte ratio (NLR), platelet-to-lymphocyte ratio, systemic immune-inflammation index, systemic inflammation response index (SIRI), aggregate index of systemic inflammation (AISI), and fibrinogen-to-albumin ratio (FAR)—were calculated. Outcomes were categorized as single-dose MTX success, requirement for additional MTX, or surgery. Predictive accuracy of five supervised ML algorithms was evaluated using receiver operating characteristic analysis.

Results

Single-dose MTX was successful in 65.5% of patients; 18.4% required an additional dose, and 16.0% underwent surgery. AISI had the highest predictive accuracy for surgery [area under the curve (AUC)=0.929], followed by SIRI (AUC=0.899) and FAR (AUC=0.847). NLR best predicted the need for additional MTX (AUC=0.675). Naïve Bayes achieved the highest performance for surgical prediction (accuracy=98.3%, AUC=0.998), while random forest and gradient boosting were most effective in predicting the need for additional MTX (accuracy=83.1%, AUC=0.884-0.896). Feature importance analyses consistently ranked AISI, SIRI, and FAR as top predictors.

Conclusion

AISI, SIRI, and FAR are strong predictors of MTX failure and surgical intervention in TEP. Combining these biomarkers with ML models markedly improves predictive performance and supports a personalized approach to TEP management. Multicenter prospective validation is needed before clinical application.

Keywords:

Ectopic pregnancy, inflammation, inflammatory markers, machine learning, methotrexate

PRECIS: Using inflammatory indices and machine learning models, we evaluated predictors of methotrexate failure and surgical intervention in patients with tubal ectopic pregnancy.

Introduction

Tubal ectopic pregnancy (TEP) remains a significant contributor to maternal morbidity and mortality during the first trimester and constitutes more than 90% of all ectopic pregnancies⁽¹⁾. The use of systemic methotrexate (MTX) has substantially changed the management of hemodynamically stable patients, as single-dose regimens provide a less invasive alternative to surgical intervention⁽²⁾. However, the clinical response to MTX is heterogeneous, and a subset of patients experiences treatment failure, necessitating additional MTX administration or surgical management⁽³,⁴⁾. For this reason, the early identification of patients at increased risk for unsuccessful medical treatment is essential to enable individualized therapeutic strategies and improve overall prognosis. In recent years, hematologic and biochemical inflammatory indices obtained from routine laboratory testing, such as the systemic immune-inflammation index (SII), platelet-to-lymphocyte ratio (PLR), neutrophil-to-lymphocyte ratio (NLR), and fibrinogen-to-albumin ratio (FAR), have been increasingly investigated as potential predictors of disease course and treatment response in obstetric and gynecologic disorders⁽⁴,⁵⁾. These markers reflect systemic immune activation, coagulation status, and metabolic changes, all of which may influence MTX efficacy in TEP⁽⁶,⁷⁾. Although several studies have explored their predictive role, findings remain inconsistent, particularly regarding their ability to forecast the need for surgical intervention versus additional MTX administration⁽⁸,⁹⁾.

Alongside biomarker research, advances in artificial intelligence and machine learning (ML) offer novel opportunities to improve clinical decision-making in TEP. ML algorithms can process complex, multidimensional datasets to identify nonlinear relationships between clinical, ultrasonographic, and laboratory variables, potentially outperforming traditional statistical approaches⁽¹⁰,¹¹⁾. However, integration of ML-based prediction models into routine ectopic pregnancy management remains limited, with few studies systematically comparing their performance against established clinical predictors.

Given these gaps, the present retrospective cohort study aimed to evaluate the predictive accuracy of inflammatory indices for treatment success, the requirement for additional MTX, and surgical intervention in TEP patients, and to develop and validate ML-based prediction models for individualized risk stratification. By combining traditional statistical methods with advanced computational modeling, this study seeks to establish a more precise, data-driven framework for optimizing TEP management.

Materials and Methods

This retrospective cohort study was conducted in the Department of Obstetrics and Gynecology of a tertiary referral hospital between January 2019 and December 2023. All clinical, laboratory, and imaging data were retrieved from the institutional electronic medical record system and radiology archives. The study protocol was reviewed and approved by the Ankara Bilkent City Hospital Institutional Ethics Committee (approval no: TABED 2-25-1311, date: 11.06.2025), and all procedures were conducted in accordance with the Declaration of Helsinki.

All women diagnosed with TEP during the study period were evaluated for eligibility. A definitive diagnosis was established based on transvaginal ultrasonography findings and/or serial serum beta human chorionic gonadotropin (β-hCG) measurements. Only hemodynamically stable patients who were initially managed with a single-dose intramuscular MTX regimen and had complete clinical, laboratory, and imaging records were included. Patients presenting with hemodynamic instability, suspected or confirmed tubal rupture, immediate indications for surgical intervention, contraindications to MTX therapy (e.g., hepatic, renal, or hematologic disorders), non-tubal ectopic pregnancies (including cervical, interstitial, or ovarian locations), or those treated primarily with multi-dose MTX protocols were excluded from the analysis.

All eligible patients received a single intramuscular dose of MTX at 50 mg/m² on day 0. Serum β-hCG levels were measured on days 0, 4, and 7 following treatment. A decline of at least 15% in β-hCG levels between day 4 and day 7 was considered indicative of an adequate therapeutic response. Patients who failed to achieve this decline received an additional dose of MTX and were classified as requiring further medical treatment. Those who developed worsening clinical symptoms such as increasing abdominal pain, hemodynamic deterioration, signs of tubal rupture, or persistent elevation of β-hCG levels despite medical therapy were referred for surgical management.
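The response rule described above can be written as a small check. The following is a minimal sketch, not the authors' code: the function names and the example β-hCG values are hypothetical, and only the 15% day 4 to day 7 threshold is taken from the text.

```python
def hcg_decline_pct(day4: float, day7: float) -> float:
    """Percentage fall in serum beta-hCG from day 4 to day 7 (negative if it rose)."""
    if day4 <= 0:
        raise ValueError("day-4 beta-hCG must be positive")
    return (day4 - day7) / day4 * 100.0


def adequate_response(day4: float, day7: float, threshold_pct: float = 15.0) -> bool:
    """True when the day 4 to day 7 decline meets the threshold (15% in the text)."""
    return hcg_decline_pct(day4, day7) >= threshold_pct


# Illustrative values only (mIU/mL), not patient data.
print(adequate_response(2000.0, 1600.0))  # 20% decline
print(adequate_response(2000.0, 1800.0))  # 10% decline
```

Under this rule, a patient whose β-hCG falls from 2000 to 1600 mIU/mL would count as an adequate responder, whereas a fall to only 1800 mIU/mL would prompt an additional dose.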

Demographic characteristics including age, gravidity, and parity, as well as clinical features such as abdominal pain and vaginal bleeding at presentation, were recorded for each patient. Laboratory parameters included serum β-hCG levels at baseline, day 4, and day 7; complete blood count values; and concentrations of C-reactive protein, fibrinogen, and albumin. Ultrasonographic evaluation provided data on adnexal mass size and Doppler-derived ipsilateral ovarian artery systolic/diastolic ratio and pulsatility index. Based on complete blood count parameters, several hematologic and inflammatory indices were calculated, including NLR, monocyte-to-lymphocyte ratio (MLR), PLR, eosinophil-to-lymphocyte ratio (ELR), white blood cell-to-neutrophil ratio, SII, systemic inflammation response index (SIRI), aggregate index of systemic inflammation (AISI), and FAR.
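The indices listed above are simple ratios and products of routine blood count values. The sketch below uses their commonly cited definitions (for example, SII as platelets × neutrophils / lymphocytes and AISI as neutrophils × monocytes × platelets / lymphocytes). The manuscript does not state the measurement units, so the units in the comments, the `BloodCount` container, and the example values are assumptions, and the resulting magnitudes need not match the reported cut-offs.

```python
from dataclasses import dataclass


@dataclass
class BloodCount:
    # Cell counts in 10^9/L and proteins in g/L: assumed units, not stated in the paper.
    wbc: float
    neutrophils: float
    lymphocytes: float
    monocytes: float
    eosinophils: float
    platelets: float
    fibrinogen: float
    albumin: float


def inflammatory_indices(b: BloodCount) -> dict[str, float]:
    """Compute the hematologic and inflammatory indices named in the text."""
    if b.lymphocytes <= 0 or b.neutrophils <= 0 or b.albumin <= 0:
        raise ValueError("lymphocytes, neutrophils and albumin must be positive")
    return {
        "NLR": b.neutrophils / b.lymphocytes,
        "MLR": b.monocytes / b.lymphocytes,
        "PLR": b.platelets / b.lymphocytes,
        "ELR": b.eosinophils / b.lymphocytes,
        "WBC_to_neutrophil": b.wbc / b.neutrophils,
        "SII": b.platelets * b.neutrophils / b.lymphocytes,
        "SIRI": b.neutrophils * b.monocytes / b.lymphocytes,
        "AISI": b.neutrophils * b.monocytes * b.platelets / b.lymphocytes,
        "FAR": b.fibrinogen / b.albumin,
    }


# Illustrative values only, not patient data.
example = BloodCount(wbc=8.0, neutrophils=5.0, lymphocytes=2.0, monocytes=0.5,
                     eosinophils=0.2, platelets=250.0, fibrinogen=3.0, albumin=40.0)
indices = inflammatory_indices(example)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

Because SIRI and AISI multiply several counts, their absolute values depend strongly on the units chosen, which is one reason cut-offs are not directly transferable between laboratories.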

The primary outcomes of the study were defined as the successful resolution of ectopic pregnancy with a single dose of methotrexate, the requirement for an additional MTX dose, and the need for surgical intervention following medical treatment. Secondary outcomes included the identification of clinical and laboratory predictors associated with treatment failure, the assessment of the diagnostic performance of inflammatory indices in predicting treatment outcomes, and the evaluation of ML models for predicting surgical intervention or the need for an additional dose of methotrexate.

Statistical Analysis

All analyses were conducted using IBM SPSS Statistics version 25.0 (IBM Corp., Armonk, NY, USA) and Python 3.10 with standard scientific libraries. Data distribution was assessed with the Shapiro-Wilk test. Parametric data were reported as mean ± standard deviation and compared using the Student’s t-test or one-way ANOVA, whereas nonparametric variables were expressed as median (interquartile range) and analyzed with the Mann-Whitney U or Kruskal-Wallis tests. Categorical variables were compared using the chi-square test or Fisher’s exact test, as appropriate. A two-sided p-value <0.05 was considered statistically significant.

Receiver operating characteristic (ROC) analyses were performed to assess the ability of inflammatory indices to predict surgical intervention and the requirement for an additional dose of methotrexate. Area under the curve (AUC) values and 95% confidence intervals (CIs) were calculated using the DeLong method. Optimal cut-off points were identified with the Youden Index, and sensitivity, specificity, positive predictive value, and negative predictive value were subsequently derived.
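To make the ROC step concrete, the sketch below computes an AUC as the probability that a randomly chosen positive case has a higher marker value than a randomly chosen negative case, and then scans candidate thresholds for the maximum Youden index (J = sensitivity + specificity − 1). It is a plain-Python illustration on made-up data, not the SPSS procedure used in the study, and it does not implement the DeLong confidence interval. The function names are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as P(positive score > negative score); ties count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Return (cutoff, sensitivity, specificity) maximising J, calling score >= cutoff positive."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1.0
        if best is None or j > best[0]:  # the first maximum is kept on ties
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]


# Made-up marker values; 1 = surgery, 0 = no surgery.
scores = [120, 150, 180, 200, 230, 260, 300, 340]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
auc = roc_auc(scores, labels)
cutoff, sensitivity, specificity = youden_cutoff(scores, labels)
print(auc)                                # 0.9375
print(cutoff, sensitivity, specificity)   # 200 1.0 0.75
```

In this toy example two thresholds share the same Youden index; the scan keeps the lower one, which favours sensitivity over specificity.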

Machine Learning Analysis

A supervised ML framework was applied to construct predictive models addressing two binary classification objectives: identifying patients requiring surgical intervention and predicting the need for an additional dose of methotrexate. These models were designed to support individualized risk stratification in the management of tubal ectopic pregnancy. The framework used standardized variables, an 80:20 stratified train–test split, and five-fold cross-validation, with performance evaluated using accuracy, AUC, and F1 score.

Prior to model training, comprehensive data preprocessing was undertaken. All continuous clinical, hematologic, and inflammatory variables were standardized using Z-score transformation to eliminate scale-related bias and ensure equal contribution of features during model learning. The proportion of missing data was low, accounting for less than two percent of all entries, and was addressed through mean imputation for continuous variables and mode imputation for categorical variables. The class distribution was examined for imbalance; because surgical intervention outcomes demonstrated moderate skewness, class-weighting strategies were incorporated into algorithms sensitive to class imbalance, such as logistic regression and support vector machines, to enhance model stability and predictive accuracy.
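The three preprocessing steps described above (mean imputation, Z-score standardization, and class weighting) can be sketched in a few lines. The weighting scheme is not specified in the manuscript; the sketch assumes the common "balanced" rule, n_samples / (n_classes × class count). The function names and the small NLR list are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    m = mean(observed)
    return [m if v is None else v for v in values]


def zscore(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    m, sd = mean(values), pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - m) / sd for v in values]


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count) (assumed scheme)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


# Hypothetical NLR values with one missing entry.
nlr = impute_mean([2.0, None, 4.0])
nlr_z = zscore(nlr)

# Class sizes taken from the cohort: 47 surgical cases among 293 patients.
weights = balanced_class_weights([0] * 246 + [1] * 47)
print(nlr, nlr_z, weights)
```

In a train-test setting, the imputation mean and the standardization parameters would normally be estimated on the training portion only and then applied to the test portion, so that no information from the test set reaches the model.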

For feature selection, all available hematologic and inflammatory indices were initially included as candidate predictors. The relative importance of each variable was subsequently evaluated using two complementary approaches. First, the mean decrease in impurity derived from random forest models was calculated to assess each feature’s contribution to reducing classification error. Second, permutation-based importance analysis was performed to quantify the change in model performance following random shuffling of individual features, thereby directly measuring their impact on predictive accuracy. The results of these analyses are presented in Figures 1 and 2.
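The two importance measures described above map onto two scikit-learn facilities: the `feature_importances_` attribute of a fitted random forest (mean decrease in impurity) and `sklearn.inspection.permutation_importance`. The sketch below runs both on synthetic data with one informative and two noise features. The data, feature names, seed, and number of permutation repeats are assumptions for illustration and do not reproduce the study's analysis.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 300

# Synthetic stand-in data: one feature drives the outcome, two are pure noise.
informative = rng.normal(size=n)
X = np.column_stack([informative, rng.normal(size=n), rng.normal(size=n)])
y = (informative + 0.5 * rng.normal(size=n) > 0.8).astype(int)
names = ["informative", "noise_1", "noise_2"]

# 500 trees, as stated in the text; other settings are library defaults.
forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

# Mean decrease in impurity, normalized by scikit-learn to sum to 1.
mdi = dict(zip(names, forest.feature_importances_))

# Permutation importance: mean drop in accuracy after shuffling each feature.
perm = permutation_importance(forest, X, y, scoring="accuracy",
                              n_repeats=10, random_state=0)
pfi = dict(zip(names, perm.importances_mean))

for name in names:
    print(f"{name}: MDI={mdi[name]:.3f}, permutation={pfi[name]:.3f}")
```

Impurity-based importance is computed from the training process itself, whereas permutation importance depends on the data passed to it, so agreement between the two gives some reassurance that a ranking is not an artifact of either method.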

For predictive modeling, commonly used supervised classification algorithms were applied: logistic regression with L2 regularization, random forest, Gaussian naïve Bayes, support vector machine with a radial basis function kernel, and gradient boosting. Model hyperparameters were optimized using grid search strategies, with the number of trees in the random forest set to 500 and the optimal tree depth determined algorithmically. For the support vector machine model, the penalty parameter was tuned to achieve optimal classification performance, while gradient boosting models were trained using a learning rate of 0.1, with depth parameters refined through cross-validation.

To ensure robust evaluation, the dataset was randomly divided into training and testing subsets at an 80:20 ratio, stratified according to outcome categories, and a fixed random seed was applied to guarantee reproducibility. Hyperparameter tuning was conducted using five-fold cross-validation within the training dataset. Final model performance was assessed on the independent test dataset. For feature-importance analyses and graphical representations, models were subsequently retrained on the entire dataset to maximize statistical power and enhance generalizability.
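The evaluation scheme described above (stratified 80:20 split, five-fold cross-validated tuning inside the training set, and a single final assessment on the held-out test set) might look like the following in scikit-learn. This is a sketch under stated assumptions: the data are synthetic, logistic regression stands in for all five algorithms, and the seed and the grid of `C` values are hypothetical, since the manuscript does not report them.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with the cohort's size (293) and roughly 16% positives.
X, y = make_classification(n_samples=293, n_features=9, n_informative=4,
                           weights=[0.84, 0.16], random_state=42)

# Stratified 80:20 split with a fixed seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

# Scaling lives inside the pipeline, so it is refit on each training fold.
pipe = Pipeline([
    ("scale", StandardScaler()),
    # L2 is the default penalty of LogisticRegression.
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])

# Five-fold cross-validated grid search restricted to the training data.
search = GridSearchCV(pipe, {"clf__C": [0.01, 0.1, 1.0, 10.0]},
                      cv=5, scoring="roc_auc")
search.fit(X_train, y_train)

# One final evaluation on the untouched test set.
proba = search.predict_proba(X_test)[:, 1]
pred = search.predict(X_test)
results = {
    "accuracy": accuracy_score(y_test, pred),
    "roc_auc": roc_auc_score(y_test, proba),
    "f1": f1_score(y_test, pred),
}
print(search.best_params_, results)
```

Placing the scaler inside the pipeline is a deliberate design choice: standardization parameters are then learned only from the data each model is trained on, which keeps the cross-validation and test estimates free of leakage.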

Model performance was quantified using multiple complementary metrics. Overall classification accuracy was calculated to determine the proportion of correctly classified cases. Discriminative capacity, independent of decision thresholds, was assessed using the area under the ROC curve. The F1 score was employed to capture the balance between precision and recall, and confusion matrices were generated to examine the distribution of false-positive and false-negative predictions.
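As a worked illustration of how these threshold-dependent metrics relate to the confusion matrix, the sketch below derives accuracy, precision, recall, and F1 from the four cell counts. The labels are made up and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded 1 = positive, 0 = negative."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def summary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: 3 true positives among 10 cases.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
metrics = summary_metrics(*counts)
print(counts)   # (2, 1, 1, 6)
print(metrics)
```

Here accuracy is 0.8 while F1 is about 0.67, which shows why F1 is reported alongside accuracy when the positive class is the minority.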

To identify the most influential variables associated with surgical intervention, feature importance analyses were performed using the final random forest model trained on the complete dataset. Variable contributions were first assessed using mean decrease in impurity, which reflects the relative importance of each feature in reducing classification error across decision trees. In addition, permutation-based importance analysis was conducted by randomly shuffling individual features and quantifying the resulting decline in model accuracy, thereby directly measuring the dependence of predictive performance on each variable. Both methods were applied to ensure consistency and robustness of feature ranking. Across both approaches, AISI, SIRI, and FAR consistently emerged as the most influential predictors, followed by NLR and PLR. All machine-learning analyses and visualizations were performed using the scikit-learn library and Matplotlib.

Results

A total of 293 women diagnosed with TEP between January 2019 and December 2023 were included in this retrospective study. Among these patients, 192 (65.5%) achieved complete resolution with a single intramuscular dose of methotrexate, 54 (18.4%) required an additional dose of medical therapy, and 47 (16.0%) ultimately underwent surgical management.

Clinical Characteristics

A comparative analysis of clinical features by treatment outcome is presented in Table 1. The mean age did not differ significantly among the single-dose MTX group, the additional-dose group, and the surgical intervention group (p=0.195). Baseline serum β-hCG concentrations, adnexal mass dimensions, and gestational age at diagnosis were also comparable across the three groups, with no statistically significant differences observed (p=0.552, p=0.376, and p=0.203, respectively).

Vaginal bleeding at presentation was observed more frequently in patients who required an additional dose of MTX than in those who responded to a single dose or who proceeded to surgery; this difference was statistically significant (p=0.038). In contrast, the occurrence of abdominal pain was similar across all groups and was not significantly associated with treatment outcome (p=0.616).

Doppler ultrasonographic findings revealed marked differences between groups. The ipsilateral ovarian artery systolic/diastolic ratio was significantly higher in patients who underwent surgery than in those who were successfully treated with a single dose of MTX or who required an additional dose of MTX (p<0.001). Likewise, the pulsatility index was substantially elevated in the surgical group compared with both medical treatment groups, indicating increased vascular resistance in patients progressing to surgical management (p<0.001).

Hematologic and Inflammatory Parameters

Detailed comparisons of hematologic and inflammatory indices are provided in Table 2. The NLR demonstrated a stepwise increase across outcome groups, being lowest in patients successfully treated with a single dose of methotrexate, higher in those requiring an additional dose, and highest among patients undergoing surgical intervention (p<0.001).

A similar pattern was observed for the MLR, PLR, and eosinophil-to-lymphocyte ratio, all of which were significantly elevated in patients who required further medical treatment or surgery, compared with those who responded to initial therapy (p<0.01 for MLR and PLR; p=0.048 for ELR). Systemic inflammatory markers, including the systemic immune-inflammation index, systemic inflammation response index, aggregate index of systemic inflammation, and fibrinogen-to-albumin ratio, were also significantly higher in the additional-dose and surgical groups than in the single-dose group (all p<0.05). Consistent with these findings, C-reactive protein levels increased progressively across groups, with the highest values observed in patients who underwent surgery (p=0.002).

In contrast, the white blood cell-to-neutrophil ratio was significantly lower in the surgical intervention group than in patients successfully treated with a single dose of methotrexate, suggesting a shift toward neutrophil predominance in more severe disease (p=0.032).

ROC Analysis for Predicting Surgical Intervention

The ROC analysis results for inflammatory markers predicting surgical intervention are presented in Table 3. The AISI demonstrated the highest discriminatory ability, with an AUC of 0.929 (95% CI: 0.892–0.963) at a cut-off value of 221.678, yielding a sensitivity of 95.7% and a specificity of 77.2%. This was followed by the systemic inflammation response index (SIRI) (AUC=0.899, 95% CI: 0.846-0.944; cut-off 738.615; sensitivity 72.3%; specificity 95.5%) and the fibrinogen-to-albumin ratio (FAR) (AUC=0.847, 95% CI: 0.761-0.920; cut-off 0.129; sensitivity 76.6%; specificity 80.5%).

Other markers such as NLR, MLR, PLR, and SII also demonstrated good discriminatory performance (AUC range: 0.793–0.835), whereas the white blood cell-to-neutrophil ratio showed poor predictive value (AUC=0.241).

Model Performance for Surgical Intervention

The performance of the ML models using inflammatory markers as predictors is summarized in Table 4. The naïve Bayes model achieved the highest overall performance, with an accuracy of 98.3%, an ROC AUC of 0.998, and an F1 score of 0.982. Logistic regression (accuracy 96.6%, ROC AUC 0.996, F1=0.964) and the support vector machine (accuracy 94.9%, ROC AUC 0.996, F1=0.948) also demonstrated excellent predictive performance. Random forest and gradient boosting yielded slightly lower accuracies (93.2% each) but maintained high ROC AUC values (0.991 and 0.989, respectively).

Feature importance rankings from the Random Forest analysis are illustrated in Figure 1, highlighting AISI, SIRI, and FAR as the most influential predictors of surgical requirement. Figure 2 depicts permutation-based feature importance, confirming the predominance of these indices.

ROC Analysis for Predicting Additional-dose MTX Requirement

The ROC analysis for predicting the need for additional-dose MTX is shown in Table 5. Among the evaluated markers, NLR exhibited the highest predictive capacity (AUC=0.675, 95% CI: 0.603-0.747) with an optimal cut-off value of 3.842 (sensitivity 90.7%, specificity 43.5%). PLR (AUC=0.642) and ELR (AUC=0.626) provided moderate discrimination, whereas the white blood cell-to-neutrophil ratio (AUC=0.435) demonstrated poor predictive value.

Machine Learning Model Performance for Additional-dose MTX Prediction

As presented in Table 6, the best-performing models for predicting additional-dose MTX requirement were random forest and gradient boosting. Both models achieved an accuracy of 83.1%, with ROC AUC values of 0.884 for random forest and 0.896 for gradient boosting. The support vector machine achieved an accuracy of 81.4% with an ROC AUC of 0.809. In contrast, the logistic regression and naïve Bayes models each achieved a moderate accuracy of 72.9%, with similar ROC AUC values (0.805 and 0.807, respectively).
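When reading the accuracy values in Tables 4 and 6, it helps to keep the size of the held-out test set in mind. The manuscript does not state that size; the short calculation below assumes the 20% test portion of 293 patients was rounded up to 59 and shows which whole numbers of correct classifications would give the reported percentages under that assumption.

```python
import math

n_total = 293
n_test = math.ceil(0.20 * n_total)  # 59, assuming the test share is rounded up

# Accuracies (%) reported in Tables 4 and 6.
reported = [98.3, 96.6, 94.9, 93.2, 83.1, 81.4, 72.9]

implied = {}
for pct in reported:
    correct = round(pct / 100 * n_test)           # nearest whole number of cases
    implied[pct] = (correct, round(100 * correct / n_test, 1))
    print(f"{pct}% -> {correct}/{n_test} = {implied[pct][1]}%")
```

Under this assumption each reported accuracy corresponds to a whole number of correctly classified patients, and a single misclassification shifts accuracy by about 1.7 percentage points, so differences between adjacent models amount to one or two patients.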

Discussion

This study provides robust evidence that specific inflammatory indices, particularly AISI, SIRI, and FAR, strongly predict MTX treatment failure and the need for surgical intervention in TEP. Furthermore, our results demonstrate that ML algorithms, especially Naïve Bayes and logistic regression, can achieve excellent predictive accuracy, outperforming traditional cut-off-based approaches.

Our findings are in line with recent studies investigating the prognostic utility of hematologic markers in ectopic pregnancy. Bilir et al.⁽⁸⁾ reported that hemogram-based indices such as NLR, PLR, and SII were significantly higher in patients requiring surgical management after MTX therapy, with AISI emerging as the most discriminatory parameter. Dinc and Issın⁽⁵⁾ showed that elevated SII values at presentation correlated strongly with the risk of tubal rupture, underscoring the role of systemic inflammation in the progression of ectopic pregnancy. Seyfettinoglu and Adiguzel⁽⁹⁾ further highlighted that combining multiple indices improved predictive performance compared to single parameters.

The application of ML in ectopic pregnancy prognosis remains nascent but shows promise. Chen et al.⁽¹¹⁾ used gradient boosting and random forest models to predict MTX success in ectopic pregnancy and found ROC-AUC values up to 0.94, comparable to the performance observed in our Naïve Bayes model (AUC=0.998). Our inclusion of Doppler ultrasonography parameters alongside inflammatory indices may partly explain the superior performance, as ultrasonographic vascular indices reflect tubal perfusion and inflammatory status. ROC-derived cut-off values should be interpreted as supportive risk indicators rather than absolute clinical thresholds and must be integrated with clinical assessment and β-hCG dynamics.

Clinically, integrating these biomarkers with ML-based prediction tools could facilitate personalized management strategies. High-risk patients identified at diagnosis could receive closer monitoring, earlier consideration for surgery, and tailored counseling, potentially reducing the incidence of rupture and associated morbidity.

Strengths of this study include its relatively large sample size for a single tertiary center, simultaneous evaluation of multiple inflammatory indices, and the methodological rigor in comparing classical statistical methods with diverse ML algorithms. The use of two independent feature importance approaches (mean decrease in impurity and permutation) strengthens the validity of our predictive variable ranking.

Study Limitations

However, several limitations should be noted. The retrospective design carries an inherent risk of selection bias. All data were derived from a single center, which may limit generalizability. Potential confounders, such as subclinical infections or inflammatory comorbidities, could influence hematologic indices but were not systematically excluded. Although model performance was high, external validation in multicenter prospective cohorts is necessary before clinical adoption.

Future research should focus on validating these findings in larger and more diverse populations, integrating additional biomarkers (e.g., cytokines, cell-free DNA), and developing real-time decision support tools within electronic medical record systems. Randomized controlled trials assessing whether ML-guided treatment decisions improve clinical outcomes compared to current practice would be of particular value. Given the single-center design, the potential for residual confounding of inflammatory markers, and the lack of external validation, prospective multicenter studies are required before routine clinical implementation.

Conclusion

This study indicates that AISI, SIRI, and FAR are strong predictors of MTX outcomes in tubal ectopic pregnancy. Integrating these biomarkers into machine-learning models, particularly Naïve Bayes and logistic regression, significantly enhances predictive accuracy and supports individualized risk stratification. The routine use of these indices in clinical practice may facilitate earlier decision-making, optimize treatment selection, and reduce complication rates. However, multicenter prospective studies are required to validate these findings and to explore the added value of advanced imaging and novel molecular biomarkers.

Ethics

Ethics Committee Approval: The study protocol was reviewed and approved by the Ankara Bilkent City Hospital Institutional Ethics Committee (approval no: TABED 2-25-1311, date: 11.06.2025), and all procedures were conducted in accordance with the Declaration of Helsinki.

Informed Consent: Retrospective study.

Authorship Contributions

Surgical and Medical Practices: U.Z., G.K., M.İ.H., B.E., Concept: U.Z., Design: U.Z., Data Collection or Processing: G.K., M.İ.H., Analysis or Interpretation: U.Z., B.E., Literature Search: U.Z., S.A.D., G.K., M.İ.H., B.E., Writing: U.Z., S.A.D., G.K., M.İ.H., B.E.

Conflict of Interest: No conflict of interest was declared by the authors.

Financial Disclosure: The authors declared that this study received no financial support.

References

Varma R, Gupta J. Tubal ectopic pregnancy. BMJ Clin Evid. 2012;2012:1406.

Gervaise A, Capella-Allouc S, Audibert F, Rongières-Bertrand C, Vincent Y, Fernandez H. Methotrexate for the treatment of unruptured tubal pregnancy: a prospective nonrandomized study. JSLS. 2003;7:233-8.

Baskiran Y, Uçkan K, Karacor T, Çeleğen I, Acar Z. The impact of maternal electrolyte and albumin levels on the efficacy of single-dose methotrexate treatment for ectopic pregnancies. Turk J Obstet Gynecol. 2023;20:214-8.

Tong S, Skubisz MM, Horne AW. Molecular diagnostics and therapeutics for ectopic pregnancy. Mol Hum Reprod. 2015;21:126-35.

Dinc K, Issın G. Novel marker to predict rupture risk in tubal ectopic pregnancies: the systemic immune-inflammation index. Ginekol Pol. 2023;94:320-5.

Akkaya H, Uysal G. Can hematologic parameters predict treatment of ectopic pregnancy? Pak J Med Sci. 2017;33:937-42.

Baskiran Y, Uçkan K, Çeleğen I. Can failure be predicted in methotrexate treatment with the modified parameter? Arch Gynecol Obstet. 2024;310:477-83.

Bilir C, Soysal C, Biyik I, Ulas O, Erbakirci NM, Sari H, et al. The relationship between hemogram-based inflammatory indices and prognosis in ectopic pregnancy cases treated with methotrexate. Sci Rep. 2025;15:23114.

Seyfettinoglu S, Adiguzel FI. Prediction of tubal rupture in ectopic pregnancy using methotrexate treatment protocols and hematological markers. J Clin Med. 2023;12:6459.

Dereli ML, Savran Üçok B, Ozkan S, Sucu S, Topkara S, Firatligil FB, et al. The importance of blood-count-derived inflammatory markers in predicting methotrexate success in patients with tubal ectopic pregnancy. Int J Gynaecol Obstet. 2024;167:789-96.

Chen S, Chen XF, Qiu P, Huang YX, Deng GP, Gao J. Association between white blood cells at baseline and treatment failure of methotrexate for ectopic pregnancy. Front Med (Lausanne). 2021;8:722963.