Statistics and Data Science Seminar
Hongyuan Cao
Florida State University
Testing composite null hypotheses with high-dimensional dependent data.
Abstract: Testing composite null hypotheses is fundamental to many scientific applications, including mediation and replicability analyses. It becomes particularly challenging in high-throughput settings involving tens of thousands of features. Existing methods for testing high-dimensional composite null hypotheses often ignore the dependence structure among features, which leads to overly conservative or overly liberal results. To address this limitation, we develop a four-state hidden Markov model (HMM) for the bivariate $p$-value sequences that arise in two-study replicability analysis. This model captures local dependence among features and accommodates study-specific heterogeneity. Based on the HMM, we propose a multiple testing procedure that asymptotically controls the false discovery rate (FDR). Extending this framework to more than two studies is computationally intensive, because the complexity grows exponentially in the number of studies $n$. To address this scalability issue, we introduce a novel e-value framework that reduces the computational complexity to quadratic in $n$ while preserving asymptotic FDR control. Extensive simulations demonstrate that our method achieves higher power than existing approaches at comparable FDR levels. When applied to genome-wide association studies (GWAS), the proposed approach yields novel biological findings that current methods miss.
Wednesday, September 24, 2025, at 4:15 PM in 636 SEO