- Software
- Open access
metacp: a versatile software package for combining dependent or independent p-values
BMC Bioinformatics volume 26, Article number: 109 (2025)
Abstract
Background
We present metacp, an open-source software package that implements a wide range of statistical methods for combining p-values: independent p-values, with methods such as Fisher’s, Stouffer’s and Edgington’s, and dependent p-values, with methods such as Brown’s method and the Cauchy Combination Test.
Results
The tool is available in Python and STATA, is very fast, and is easy to use, requiring only minimal input. It offers a useful resource for combining both independent and dependent p-values, responding to diverse analytical needs for practitioners performing meta-analyses and bioinformaticians developing tools for a variety of applications. Depending on the input data, it can be used for gene-based testing, for analysis of multiple traits in GWAS, or for combining diverse multi-omics data such as those of a TWAS, a colocalization or an RNA-seq study.
Conclusions
Compared to other similar packages (like poolr or metap), metacp implements the largest collection of statistical methods for this problem, offering users the flexibility to choose from a wide variety of approaches. Being available both as a standalone Python tool and as a STATA command, metacp is accessible to a broad and diverse audience, including practitioners conducting meta-analyses across various fields and bioinformaticians developing new tools where p-value combination is a crucial component.
Background
Combining p-values is an important technique in research synthesis and meta-analysis in various fields, ranging from psychology and social sciences to biomedical research and genetics. For example, in genome-wide association studies (GWAS), different Single Nucleotide Polymorphisms (SNPs) within a gene are combined in the so-called gene-based testing methods for integrating data from rare variants [1]. Other similar situations are encountered in linkage analysis [2], in multiple traits analysis [3] or in more general multi-omics methods like TWAS or Mendelian Randomization [4]. There are several relevant commands in standard statistical packages, as well as several dedicated software tools, but each offers the user only a subset of the statistical methods available in the literature. In this work, we present a software tool that implements a wide, if not the widest, range of methods for combining p-values, both independent and dependent ones.
Implementation
We incorporate standard methods which in general are applicable for combining independent p-values. More specifically, we implement Fisher’s method [5], the inverse chi-squared method [6], as well as the related Lancaster’s method [7]. Moreover, we apply Stouffer’s method (z-score) [8], the weighted Stouffer’s method [9, 10] and the meanp (Edgington’s) method [11], all of which use the cumulative distribution function of the Normal distribution on z-scores, in addition to the logit method, which combines p-values using the cumulative distribution function of Student’s t distribution, and the binomial test [12], in which, for k null hypotheses each rejected with probability α, the number r of tests leading to rejection follows a binomial distribution. Regarding dependent p-values, we implement the Cauchy Combination Test (CCT) [13], the MinP (Tippett’s) method [14], the combined MinP-CCT-MinP (MCM) and CCT-MinP-CCT (CMC) methods [15], Empirical Brown’s method (EBM) [16] along with its modifications [17], which extend Fisher’s method to the case of dependent p-values using a scaled chi-squared distribution, \(\Psi \sim c\chi_{2f}^{2}\), as well as the modifications of Stouffer’s methods (with or without weights) for combining correlated tests (Strube’s method) [18] and the Harmonic Mean p-value (HMP) test [19]. We also implement Gao’s [20], Galwey’s [21], Cheverud [22]-Nyholt’s [23] and Li-Ji’s methods [24], which adjust for the effective number of tests in the methods that assume independence. This adjusted effective number of tests is then used in Bonferroni’s basic formula for the combination of p-values. Similarly, another simple correction method using the intraclass correlation is applied to the standard Bonferroni method [25]. In Table 1 we list the main methods described in the text. Detailed information regarding the combining methods and their formulas is available in Supplementary Material.
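As an illustration of two of the methods for independent p-values listed above, the following minimal Python sketch computes Fisher’s and Stouffer’s combined p-values using only the standard library. It is not taken from the metacp source code: the function names are hypothetical, and the inputs are assumed to be independent, one-sided p-values strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p_i) ~ chi-squared with 2k df under H0."""
    k = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # this is X / 2
    # Closed-form chi-squared survival function for even df = 2k:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return math.exp(-half_x) * total


def stouffer_combined_p(p_values):
    """Stouffer's method: Z = sum(z_i) / sqrt(k) ~ N(0, 1) under H0."""
    nd = NormalDist()
    z_sum = sum(-nd.inv_cdf(p) for p in p_values)  # z_i = Phi^{-1}(1 - p_i)
    return nd.cdf(-z_sum / math.sqrt(len(p_values)))


p = [0.05, 0.05]  # hypothetical one-sided p-values
print(fisher_combined_p(p), stouffer_combined_p(p))
```

With a single p-value both functions return that p-value unchanged, which is a convenient sanity check. For the two hypothetical values above, the combined p-values are roughly 0.017 (Fisher) and 0.010 (Stouffer).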
Results
The tool is implemented in Python as a standalone program and in STATA as an ado command. metacp can be downloaded freely from www.github.com/pbagos/metacp along with the appropriate documentation. The STATA command is also available through STATA repositories, by running the command net search metacp within the STATA environment; afterwards, the command help metacp provides the necessary documentation. The program, in both versions, takes as input a text file of n rows and k columns, plus the headers, which contains the p-values or z-scores to be combined. Different rows can correspond to different SNPs or genes from a GWAS, or probes/genes from transcriptomics studies. The different columns can correspond to different statistical tests, different traits, different SNPs within a gene and so on, depending on the setting. If the same SNP/gene appears in multiple rows, a meta-analysis can be performed prior to combining the p-values from the different columns. Nevertheless, the applicability is not limited to these cases, since the program can be used with other types of data (from social sciences, education, economics and so on) provided that the user correctly specifies the variables to be combined. The main difference compared to standard meta-analysis commands is that the statistics to be combined must appear in the same row (and not in the same column). This is to facilitate the analysis of omics data. Thus, in a simple meta-analysis of, say, k studies, the different estimates should be given in a single row. Finally, for the methods that require p-values as input, it is the user’s responsibility to ensure that these are one-sided. In the case of two-sided tests, it is necessary to transform them appropriately, taking into account the direction of the effect size, or to ensure that all effects have the same direction. For methods that use z-values, like Stouffer’s and its variants, the situation is clearer, since the normal distribution can handle the direction through the sign of the z-statistic.
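The conversion from two-sided to one-sided p-values mentioned above can be sketched as follows. This is only an illustration under stated assumptions, not part of metacp: the function names are hypothetical, the two-sided p-values are assumed to come from a symmetric test statistic (such as a z-test), and the one-sided alternative is taken to be a positive effect.

```python
import math
from statistics import NormalDist

_ND = NormalDist()


def one_sided_p(p_two_sided, effect):
    """One-sided p-value for H1: effect > 0, from a symmetric two-sided test."""
    half = p_two_sided / 2.0
    return half if effect > 0 else 1.0 - half


def signed_z(p_two_sided, effect):
    """z-score that carries the direction of the effect in its sign."""
    return math.copysign(_ND.inv_cdf(1.0 - p_two_sided / 2.0), effect)


def stouffer_from_z(z_scores):
    """Unweighted Stouffer combination of signed z-scores (H1: effect > 0)."""
    return _ND.cdf(-sum(z_scores) / math.sqrt(len(z_scores)))


# Hypothetical row: two studies with the same two-sided p-value
# but effects in opposite directions.
row = [(0.04, 0.3), (0.04, -0.3)]
print([one_sided_p(p, effect) for p, effect in row])
print(stouffer_from_z([signed_z(p, effect) for p, effect in row]))
```

In this example the two one-sided p-values are about 0.02 and 0.98, and the two signed z-scores cancel, so the combined p-value is 0.5. Combining the two-sided values of 0.04 directly would instead suggest consistent evidence that is not present.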
To validate the results obtained for the combination methods, we compared them with the results of the MIN2 and MAX methods of the GWAR software package [26]. GWAR offers the most efficient implementation of several robust test statistics, including the MAX statistic and the MIN2 statistic, which are perhaps the most powerful such statistics; that is, they preserve high power when the underlying model of inheritance is not known, as is usually the case. The MAX test is based on the simple idea of testing all three possible models of inheritance (dominant, recessive and co-dominant) and choosing the one with the highest z-value (or equivalently, the minimum of the p-values). MIN2 is another robust approach, which uses the Pearson chi-square of the 3 × 2 contingency table of the genotypes (a genetic-model-free approach) along with the Cochran-Armitage Trend Test (CATT) for the co-dominant model and, subsequently, chooses the minimum of the two p-values. Both tests need numerical integration for calculating an accurate p-value from the asymptotic null distributions. For further information see [26, 27].
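The idea behind the MAX statistic can be sketched in a few lines of Python. This is a simplified illustration and not the GWAR implementation: the function names are hypothetical, genotype counts are assumed to be ordered as (AA, Aa, aa) with a the risk allele, the maximum is taken over absolute z-values, and the p-value of the MAX statistic is deliberately not computed because, as noted above, it requires numerical integration.

```python
import math

# Genotype scores for (AA, Aa, aa) under each model of inheritance.
MODELS = {
    "recessive": (0.0, 0.0, 1.0),
    "additive": (0.0, 0.5, 1.0),   # co-dominant model
    "dominant": (0.0, 1.0, 1.0),
}


def catt_z(cases, controls, scores):
    """Cochran-Armitage trend test z-statistic for a 2 x 3 genotype table."""
    r_total, s_total = sum(cases), sum(controls)
    n_total = r_total + s_total
    columns = [r + s for r, s in zip(cases, controls)]
    u = sum(x * (s_total * r - r_total * s)
            for x, r, s in zip(scores, cases, controls))
    variance = (r_total * s_total / n_total) * (
        n_total * sum(x * x * n for x, n in zip(scores, columns))
        - sum(x * n for x, n in zip(scores, columns)) ** 2
    )
    return u / math.sqrt(variance)


def max_statistic(cases, controls):
    """Return the model with the largest |z| and its z-value."""
    z = {name: catt_z(cases, controls, s) for name, s in MODELS.items()}
    best = max(z, key=lambda name: abs(z[name]))
    return best, z[best]


# Hypothetical genotype counts (AA, Aa, aa).
print(max_statistic([30, 30, 40], [40, 40, 20]))
```

For these hypothetical counts the recessive model gives the largest z-value (about 3.09). When cases and controls have identical genotype proportions, all three z-values are zero.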
We re-analyzed data from 67 SNPs from a GWAS that was performed to identify susceptibility loci for type-2 diabetes [28] and compared the results obtained by GWAR using the MAX and the MIN2 statistics to the corresponding results obtained by metacp with the CCT, CMC, MCM and minP methods, which are the most suitable ones for such an analysis. As expected (Fig. 1), the results obtained using metacp are in remarkable agreement with those obtained by GWAR. The advantage of metacp lies in its ability to compute combined p-values without relying on the complex and time-consuming numerical integrations required by GWAR.
Fig. 1 A comparison of the statistical methods implemented in the metacp package for combining dependent p-values (CCT, MinP, MCM, CMC) against the methods implemented in GWAR that use numerical integration. We used the MAX and MIN2 methods of GWAR and applied them to 67 SNPs from a GWAS that was performed to identify susceptibility loci for type-2 diabetes. We use the -log10 of the p-values obtained by (A) the MIN2 method of the GWAR package and (B) the MAX method of the GWAR package. For each comparison between the methods, we show the R² values, which range from 0.9818 to 1.000, demonstrating consistent agreement across different approaches.
We have also compared our results with calculations made with other similar packages, such as the poolr package [29], which implements some (but not all) of the methods for combining independent p-values, such as Fisher’s and Stouffer’s, as well as Bonferroni’s method with the adjustments proposed for the effective number of tests. In addition, our results have been compared to those of the Robust tests for combining p-values package [15], which implements the Cauchy (CCT), MCM, CMC and MinP methods for combining dependent p-values in R. In all cases the results obtained from metacp are identical to those of the other packages.
Conclusions
The tool presented in this work is very fast and easy to use, requiring only minimal input. It offers a useful resource for combining both independent and dependent p-values (or z-scores), responding to diverse analytical needs for researchers in a variety of fields. As shown in Table 2, compared to other similar general-purpose packages like poolr [29], metap [30], or metapro [34], which support only a single or a limited number of combining methods, metacp implements the largest collection of statistical methods for this problem, offering users the flexibility to choose from a wide variety of approaches. Being available both as a standalone Python tool and as a STATA command, metacp is accessible to a broad and diverse audience, including practitioners conducting meta-analyses across various fields and bioinformaticians developing new tools where p-value combination is a crucial component.
Combining p-values has many applications in bioinformatics, for example in sequence homology searches [37], in gene-based testing [38], in pathway and enrichment analysis [39, 40], in the analysis of multiple traits in GWAS [41], or for combining diverse multi-omics data such as those encountered in TWAS, colocalization or RNA-seq studies. Of course, the various methods implemented are not all suitable for the same analysis. In some cases, especially when dealing with the combination of independent tests, the statistical properties of the tests are well studied in the literature, and the user may choose the appropriate test considering the number of test statistics to be combined or the weight of the evidence against the null hypothesis [42]. Thus, methods like Stouffer’s that use the normal distribution perform well in problems where evidence against the combined null is spread among several of the individual tests, or when the total evidence is weak. Methods like Fisher’s do best when the evidence is stronger and concentrated in a relatively small fraction of the individual tests [42]. Moreover, it has been shown that appropriate weighting may provide additional gains in power. Whitlock [43] initially showed that when all the alternatives have the same effect size, the weighted Stouffer’s method is superior to the unweighted one, as well as to Fisher’s method. Chen later showed that, in the same setting, Lancaster’s method outperforms the weighted Stouffer’s method [44], but Zaykin provided evidence that the two methods have comparable power when the weights are set to the square roots of the sample sizes [45]. Thus, when the nature of the problem allows for appropriate weighting, such as in the meta-analysis of studies with different sample sizes, or in the analysis of multiple phenotypes from different datasets, the optimal choice would be the use of methods that allow weighting.
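The weighting scheme discussed above, with weights equal to the square roots of the sample sizes, can be illustrated with a short sketch of the weighted Stouffer’s method. The code is an assumption-laden example rather than the metacp implementation: the function name and the sample sizes are hypothetical, and the p-values are assumed to be independent and one-sided.

```python
import math
from statistics import NormalDist


def weighted_stouffer_p(p_values, weights):
    """Weighted Stouffer: Z = sum(w_i * z_i) / sqrt(sum(w_i^2)) ~ N(0, 1) under H0."""
    nd = NormalDist()
    z = [-nd.inv_cdf(p) for p in p_values]  # z_i = Phi^{-1}(1 - p_i)
    numerator = sum(w * zi for w, zi in zip(weights, z))
    denominator = math.sqrt(sum(w * w for w in weights))
    return nd.cdf(-numerator / denominator)


# Hypothetical studies: one-sided p-values and sample sizes.
p_values = [0.20, 0.03, 0.01]
sample_sizes = [100, 400, 2500]
weights = [math.sqrt(n) for n in sample_sizes]  # square-root-of-n weights

print(weighted_stouffer_p(p_values, weights))
print(weighted_stouffer_p(p_values, [1.0] * len(p_values)))  # unweighted
```

Because only the relative sizes of the weights matter, multiplying all weights by a constant leaves the result unchanged, and equal weights reproduce the unweighted Stouffer’s method.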
Regarding the combination of dependent tests, an area in which research has been very active in recent years, the situation is more complicated, and the optimal choice is also highly problem-dependent. For example, if the correlation among the tests can be reliably deduced by using the entire sample, as is the case for most tests in GWAS, which utilize information from Linkage Disequilibrium (LD) or from correlated traits, then the methods that use this information may be preferable. For instance, Marczyk and coworkers have shown that a simple ad-hoc LD correction coupled with Stouffer’s method could increase the performance of various gene-set analysis methods [46]. Similarly, in gene-based testing, extensive simulations have shown that methods that take into account the dependence among the tests due to LD provide proper control of the type I error rate. Among these methods, EBM was found to be the most robust, with increased power in many scenarios [38]. The same method has also been shown to be very effective in the analysis of multiple correlated phenotypes [41]. Finally, there are cases in which the correlation structure of the test statistics to be combined is hard to decipher. This is the case in various multi-omics studies which integrate different types of omics data (GWAS, proteomics, gene expression, xQTL, WGS and so on) measured on different or overlapping subjects. Such cases include the multivariable Mendelian Randomization approach [47], high-dimensional mediation analysis with omics data [48], transcriptome-wide association studies across multiple tissues or cell types [49], gene-based multiple trait analysis [50] and many more. In such cases, the robust methods that have appeared in the literature in recent years and do not explicitly model the correlation, like CCT, MCM, CMC, or HMP, may be preferable.
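Two of the robust methods mentioned above, which do not require an estimate of the correlation structure, have simple closed forms and can be sketched as follows. The sketch is illustrative only and is not the metacp code: the function names are hypothetical, equal weights are assumed unless others are supplied, and the harmonic mean shown is the raw statistic, without the asymptotically exact adjustment described in [19].

```python
import math


def _normalised(weights, k):
    """Equal weights by default; otherwise rescale the weights to sum to 1."""
    if weights is None:
        return [1.0 / k] * k
    total = sum(weights)
    return [w / total for w in weights]


def cauchy_combination_p(p_values, weights=None):
    """CCT: T = sum(w_i * tan((0.5 - p_i) * pi)); p = 0.5 - arctan(T) / pi."""
    w = _normalised(weights, len(p_values))
    t = sum(wi * math.tan((0.5 - p) * math.pi) for wi, p in zip(w, p_values))
    return 0.5 - math.atan(t) / math.pi


def harmonic_mean_p(p_values, weights=None):
    """Raw weighted harmonic mean p-value: sum(w_i) / sum(w_i / p_i)."""
    w = _normalised(weights, len(p_values))
    return sum(w) / sum(wi / p for wi, p in zip(w, p_values))


p = [0.01, 0.04]  # hypothetical, possibly correlated, p-values
print(cauchy_combination_p(p), harmonic_mean_p(p))
```

For small p-values the two statistics are numerically very close (both are about 0.016 in this example), since each is dominated by the smallest p-values in the set.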
Availability and requirements
Project name: metacp.
Project home page: www.github.com/pbagos/metacp
Operating system(s): Platform independent.
Programming language: Python and STATA.
Other requirements: Python 2.7 or higher, STATA 10 or higher.
License: GNU GPL-3.0
Any restrictions to use by non-academics: None.
Availability of data and materials
No datasets were generated or analysed during the current study.
References
1. Svishcheva GR, Belonogova NM, Zorkoltseva IV, Kirichenko AV, Axenovich TI. Gene-based association tests using GWAS summary statistics. Bioinformatics. 2019;35(19):3701–8.
2. Guerra R, Etzel CJ, Goldstein DR, Sain SR. Meta-analysis by combining p-values: simulated linkage studies. Genet Epidemiol. 1999;17(S1):S605–9.
3. Watanabe K, Stringer S, Frei O, Umićević Mirkov M, de Leeuw C, Polderman TJ, et al. A global overview of pleiotropy and genetic architecture in complex traits. Nat Genet. 2019;51(9):1339–48.
4. Kontou PI, Bagos PG. The goldmine of GWAS summary statistics: a systematic review of methods and tools. BioData Min. 2024;17(1):31.
5. Fisher RA. Statistical methods for research workers. Breakthroughs in statistics: Methodology and distribution. Springer; 1970. p. 66–70.
6. Bernardo JM, Smith AF. Bayesian theory. Wiley; 2009.
7. Lancaster H. The combination of probabilities: an application of orthonormal functions. Aust J Stat. 1961;3(1):20–33.
8. Stouffer SA, Suchman EA, Devinney LC, Star SA, Williams RM. The American soldier: adjustment during Army life. Princeton: Princeton University Press; 1949.
9. Lipták T. On the combination of independent tests. Magyar Tud Akad Mat Kutato Int Kozl. 1958;3:171–97.
10. Mosteller F, Bush RR. Selected quantitative techniques. Handbook of social psychology. Cambridge, Mass: Addison-Wesley; 1954.
11. Edgington ES. An additive method for combining probability values from independent experiments. J Psychol. 1972;80(2):351–63.
12. Wilkinson B. A statistical consideration in psychological research. Psychol Bull. 1951;48(2):156.
13. Liu Y, Xie J. Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures. J Am Stat Assoc. 2020;115(529):393–402.
14. Tippett LHC. The methods of statistics. 1931.
15. Chen Z. Robust tests for combining p-values under arbitrary dependency structures. Sci Rep. 2022;12(1):3158.
16. Brown MB. A method for combining non-independent, one-sided tests of significance. Biometrics. 1975:987–92.
17. Poole W, Gibbs DL, Shmulevich I, Bernard B, Knijnenburg TA. Combining dependent P-values with an empirical adaptation of Brown’s method. Bioinformatics. 2016;32(17):i430–6.
18. Strube MJ. Combining and comparing significance levels from nonindependent hypothesis tests. Psychol Bull. 1985;97(2):334.
19. Wilson DJ. The harmonic mean p-value for combining dependent tests. Proc Natl Acad Sci. 2019;116(4):1195–200.
20. Gao X, Starmer J, Martin ER. A multiple testing correction method for genetic association studies using correlated single nucleotide polymorphisms. Genet Epidemiol Off Publ Int Genet Epidemiol Soc. 2008;32(4):361–9.
21. Galwey NW. A new measure of the effective number of tests, a practical tool for comparing families of non-independent significance tests. Genet Epidemiol Off Publ Int Genet Epidemiol Soc. 2009;33(7):559–68.
22. Cheverud JM. A simple correction for multiple comparisons in interval mapping genome scans. Heredity. 2001;87(1):52–8.
23. Nyholt DR. A simple correction for multiple testing for single-nucleotide polymorphisms in linkage disequilibrium with each other. Am J Hum Genet. 2004;74(4):765–9.
24. Li J, Ji L. Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity. 2005;95(3):221–7.
25. Shi Q, Pavey ES, Carter RE. Bonferroni-based correction factor for multiple, correlated endpoints. Pharm Stat. 2012;11(4):300–9.
26. Dimou NL, Tsirigos KD, Elofsson A, Bagos PG. GWAR: robust analysis and meta-analysis of genome-wide association studies. Bioinformatics. 2017;33(10):1521–7.
27. Bagos PG. Genetic model selection in genome-wide association studies: robust methods and the use of meta-analysis. Stat Appl Genet Mol Biol. 2013;12(3):285–308.
28. Sladek R, Rocheleau G, Rung J, Dina C, Shen L, Serre D, et al. A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature. 2007;445(7130):881–5.
29. Cinar O, Viechtbauer W. The poolr package for combining independent and dependent p values. J Stat Softw. 2022;101:1–42.
30. Dewey M. metap: Meta-analysis of significance values. R package version 0.7. 2016.
31. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. 2020;17(3):261–72.
32. Besançon M, Papamarkou T, Anthoff D, Arslan A, Byrne S, Lin D, et al. Definition and modeling of probability distributions in the JuliaStats ecosystem. J Stat Softw. 2021;98:1–30.
33. Seabold S, Perktold J. Statsmodels: econometric and statistical modeling with python. SciPy. 2010;7(1):92–6.
34. Yoon S, Baik B, Park T, Nam D. Powerful p-value combination methods to detect incomplete association. Sci Rep. 2021;11(1):6980.
35. Schröder MS, Culhane AC, Quackenbush J, Haibe-Kains B. survcomp: an R/Bioconductor package for performance assessment and comparison of survival models. Bioinformatics. 2011;27(22):3206–8.
36. Zhang H, Tong T, Landers J, Wu Z. TFisher: A powerful truncation and weighting procedure for combining p-values. 2020.
37. Bailey TL, Gribskov M. Combining evidence using p-values: application to sequence homology searches. Bioinformatics (Oxford, England). 1998;14(1):48–54.
38. Cinar O, Viechtbauer W. A comparison of methods for gene-based testing that account for linkage disequilibrium. Front Genet. 2022;13:867724.
39. Weng L, Macciardi F, Subramanian A, Guffanti G, Potkin SG, Yu Z, et al. SNP-based pathway enrichment analysis for genome-wide association studies. BMC Bioinform. 2011;12:1–9.
40. Yu K, Li Q, Bergen AW, Pfeiffer RM, Rosenberg PS, Caporaso N, et al. Pathway analysis by adaptive combination of P-values. Genet Epidemiol Off Publ Int Genet Epidemiol Soc. 2009;33(8):700–9.
41. Yang JJ, Li J, Williams LK, Buu A. An efficient genome-wide association test for multivariate phenotypes based on the Fisher combination function. BMC Bioinform. 2016;17:1–11.
42. Loughin TM. A systematic comparison of methods for combining p-values from independent tests. Comput Stat Data Anal. 2004;47(3):467–85.
43. Whitlock MC. Combining probability from independent tests: the weighted Z-method is superior to Fisher’s approach. J Evol Biol. 2005;18(5):1368–73.
44. Chen Z. Is the weighted z-test the best method for combining probabilities from independent tests? J Evol Biol. 2011;24(4):926–30.
45. Zaykin DV. Optimally weighted Z-test is a powerful method for combining probabilities in meta-analysis. J Evol Biol. 2011;24(8):1836–41.
46. Marczyk M, Macioszek A, Tobiasz J, Polanska J, Zyla J. Importance of SNP dependency correction and association integration for gene set analysis in genome-wide association studies. Front Genet. 2021;12:767358.
47. Jin C, Lee B, Shen L, Long Q. Integrating multi-omics summary data using a Mendelian randomization framework. Brief Bioinform. 2022;23(6).
48. Zeng P, Shao Z, Zhou X. Statistical methods for mediation analysis in the era of high-throughput genomics: Current successes and future challenges. Comput Struct Biotechnol J. 2021;19:3209–24.
49. Gao G, Fiorica PN, McClellan J, Barbeira AN, Li JL, Olopade OI, et al. A joint transcriptome-wide association study across multiple tissues identifies candidate breast cancer susceptibility genes. Am J Hum Genet. 2023;110(6):950–62.
50. Deng Y, He T, Fang R, Li S, Cao H, Cui Y. Genome-wide gene-based multi-trait analysis. Front Genet. 2020;11:437.
Acknowledgements
Part of the work and findings presented in this manuscript were previously included in an abstract published in the proceedings of the 45th Annual Conference of the International Society for Clinical Biostatistics (ISCB), Thessaloniki, Greece, 21-25 July 2024. The abstract is available in the conference program and abstract book, accessible at: https://iscb.international/wp-content/uploads/2024/09/ISCB2024Program_AbstractBook.pdf. The authors would like to thank the editor and the two reviewers whose comments and constructive criticism helped in improving the quality of the manuscript.
Funding
This work is funded by the project “Bridging big omic, genetic and medical data for Precision Medicine implementation in Greece” (TAEDR-0539180), which is carried out within the framework of the National Recovery and Resilience Plan Greece 2.0, funded by the European Union – NextGenerationEU.
Author information
Contributions
Conceptualization, P.G.B.; methodology, P.G.B., E.K.N., P.I.K.; software, E.K.N., P.I.K.; validation E.K.N., P.I.K.; writing—review and editing, P.G.B., E.K.N., P.I.K. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Nikolitsa, E.K., Kontou, P.I. & Bagos, P.G. metacp: a versatile software package for combining dependent or independent p-values. BMC Bioinformatics 26, 109 (2025). https://doi.org/10.1186/s12859-025-06126-z
DOI: https://doi.org/10.1186/s12859-025-06126-z