Fig. 2

Evaluation of the HRD prediction model in the TCGA pan-cancer cohort. a Cross-validation analysis of the machine-learning model. The x-axis denotes the number of elastic net cross-validation iterations. The left y-axis shows the number of features (genes), and the right y-axis shows Pearson's correlation coefficient (PCC) between the score predicted by the trained model and the scarHRD score. Black filled circles connected by solid lines and white circles connected by dotted lines indicate the gene count and the PCC, respectively, at each cross-validation step. b Correlation across TCGA pan-cancer cohorts. Bar graph showing the PCC between the predicted HRD score and the scarHRD score in the TCGA pan-cancer test set across cancer types, including KIRP (kidney renal papillary cell carcinoma), UCEC (uterine corpus endometrial carcinoma), BRCA (breast invasive carcinoma), KICH (kidney chromophobe), BLCA (bladder urothelial carcinoma), CESC (cervical squamous cell carcinoma and endocervical adenocarcinoma), OV (ovarian serous cystadenocarcinoma), STAD (stomach adenocarcinoma), SARC (sarcoma), UCS (uterine carcinosarcoma), LIHC (liver hepatocellular carcinoma), PRAD (prostate adenocarcinoma), LGG (brain lower grade glioma), TNBC (triple-negative breast cancer), HNSC (head and neck squamous cell carcinoma), MESO (mesothelioma), READ (rectum adenocarcinoma), SKCM (skin cutaneous melanoma), LUAD (lung adenocarcinoma), PAAD (pancreatic adenocarcinoma), ESCA (esophageal carcinoma), COAD (colon adenocarcinoma), KIRC (kidney renal clear cell carcinoma), ACC (adrenocortical carcinoma), LUSC (lung squamous cell carcinoma), THYM (thymoma), CHOL (cholangiocarcinoma), PCPG (pheochromocytoma and paraganglioma), GBM (glioblastoma multiforme), THCA (thyroid carcinoma), UVM (uveal melanoma), DLBC (lymphoid neoplasm diffuse large B-cell lymphoma), and TGCT (testicular germ cell tumours).
*, **, and *** denote P-values < 0.05, < 0.001, and < 0.0001, respectively. The number in each bar gives the frequency of HRD (scarHRD score ≥ 42) in that tumour type, calculated as the number of HRD-positive samples divided by the number of samples with a scarHRD score. c Correlation between the scarHRD score and the predicted HRD score (pHRD) in the TCGA pan-cancer test set. Pearson correlation regression lines are shown: the dark dotted line represents the pan-cancer correlation and the red line the TCGA-OV correlation.
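The two per-cancer-type quantities summarised in panel b (the PCC between pHRD and the scarHRD score, and the HRD frequency using the scarHRD ≥ 42 cutoff) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the input format (tuples of cancer type, scarHRD score or None, pHRD score) and the function names `pearson` and `summarize_by_cancer_type` are hypothetical, and the sketch does not reproduce the significance tests.

```python
import math
from collections import defaultdict

# Cutoff taken from the caption: scarHRD score >= 42 is counted as HRD-positive.
HRD_THRESHOLD = 42


def pearson(x, y):
    """Pearson's correlation coefficient (PCC) of two equal-length sequences.

    Returns NaN when either sequence has zero variance.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def summarize_by_cancer_type(samples):
    """Per-type sample count, HRD frequency and PCC (hypothetical input format).

    samples: iterable of (cancer_type, scarhrd_score_or_None, phrd_score).
    """
    groups = defaultdict(list)
    for cancer_type, scar, phrd in samples:
        if scar is None:
            # The frequency denominator only counts samples with a scarHRD score.
            continue
        groups[cancer_type].append((scar, phrd))

    summary = {}
    for cancer_type, pairs in groups.items():
        scar = [s for s, _ in pairs]
        phrd = [p for _, p in pairs]
        n_pos = sum(1 for s in scar if s >= HRD_THRESHOLD)
        summary[cancer_type] = {
            "n": len(pairs),
            # Frequency = HRD-positive samples / samples with a scarHRD score.
            "hrd_frequency": n_pos / len(pairs),
            "pcc": pearson(scar, phrd) if len(pairs) >= 2 else float("nan"),
        }
    return summary
```

For instance, with toy OV input whose scarHRD scores are 50, 30 and 45, two of three samples meet the cutoff, so `hrd_frequency` for that type is 2/3. Computing the frequency only over samples that have a scarHRD score mirrors the denominator defined in the caption.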