If you want to download or save this thesis, you can use the PDF link or the EPUB format. The thesis contains many links to make external resources easier to reach; when printed, they appear as blue text.

A Online resources

These are some of the links that we found helpful during the course of the thesis and that may be useful for those interested in multi-omics.

Table A.1: Integration methods available and their references.
Method                          Publication
SCCA                            [241]
PCCA                            [242]
PMA                             [243]
sPLS                            [244]
gesca                           [245]
Regularized dual CCA            [246]
RGCCA                           [115]
SNMNMF                          [247]
scca                            [248]
STATIS                          [249]
joint NMF                       [250]
sMBPLS                          [251]
Bayesian group factor analysis  [133]
FactoMineR                      [252]
JIVE                            [135]
pandaR                          [253]
omicade4                        [173]
STATegRa                        [174]
Joint factor model              [254]
GFAsparse                       [255]
Sparse CCA                      [256]
CCAGFA                          [257]
CMF                             [258]
MOGSA                           [259]
iNMF                            [136]
BASS                            [260]
imputeMFA                       [261]
PLSCA                           [262]
mixOmics                        [263]
mixedCCA                        [264]
SLIDE                           [265]
fCCAC                           [266]
TSKCCA                          [267]
SMSMA                           [268]
AJIVE                           [269]
MOFA                            [270]
PCA+CCA                         [271]
JACA                            [272]
iPCA                            [273]
pCIA                            [274]
sSCCA                           [275]
SWCCA                           [276]
OmicsPLS                        [277]
SCCA-BC                         [278]
maui                            [281]
SmCCNet                         [282]
msPLS                           [283]
MOTA                            [284]
D-CCA                           [285]
COMBI                           [286]
DPCCA                           [287]
MultiPower                      [108]
  • Bookdown: A guide on how to write this type of book.

  • Bioconductor: A project for bioinformatics in R, primarily aimed at data from sequencing technologies.

  • CRAN: The principal archive of R packages.

  • GitHub: A company that lets users host remote Git repositories for free, including those of some projects used or developed during the course of this thesis.


108. Tarazona S, Balzano-Nogueira L, Gómez-Cabrero D, Schmidt A, Imhof A, Hankemeier T, et al. Harmonization of quality metrics and power calculation in multi-omic studies. Nature Communications. 2020;11:3092.
115. Tenenhaus A, Tenenhaus M. Regularized Generalized Canonical Correlation Analysis. Psychometrika. 2011;76:257–84.
128. Zhu J, Sova P, Xu Q, Dombek KM, Xu EY, Vu H, et al. Stitching together Multiple Data Dimensions Reveals Interacting Metabolomic and Transcriptomic Networks That Modulate Cell Regulation. PLOS Biology. 2012;10:e1001301.
133. Virtanen S, Klami A, Khan S, Kaski S. Bayesian Group Factor Analysis. In: Artificial Intelligence and Statistics. PMLR; 2012. p. 1269–77.
135. Lock EF, Hoadley KA, Marron JS, Nobel AB. Joint and Individual Variation Explained (JIVE) for integrated analysis of multiple data types. The Annals of Applied Statistics. 2013;7:523–42.
136. Yang Z, Michailidis G. A non-negative matrix factorization method for detecting modules in heterogeneous omics multi-modal data. Bioinformatics. 2016;32:1–8.
173. Meng C, Kuster B, Culhane AC, Gholami AM. A multivariate approach to the integration of multi-omics datasets. BMC Bioinformatics. 2014;15:162.
174. Planell N, Lagani V, Sebastian-Leon P, van der Kloet F, Ewing E, Karathanasis N, et al. STATegra: Multi-omics data integration a conceptual scheme with a bioinformatics pipeline. Frontiers in Genetics. 2021;12.
241. Parkhomenko E, Tritchler D, Beyene J. Sparse Canonical Correlation Analysis with Application to Genomic Data Integration. Statistical Applications in Genetics and Molecular Biology. 2009;8.
242. Waaijenborg S, Verselewel de Witt Hamer PC, Zwinderman AH. Quantifying the Association between Gene Expressions and DNA-Markers by Penalized Canonical Correlation Analysis. Statistical Applications in Genetics and Molecular Biology. 2008;7.
243. Witten DM, Tibshirani RJ. Extensions of sparse canonical correlation analysis with applications to genomic data. Statistical Applications in Genetics and Molecular Biology. 2009;8:127.
244. Lê Cao K-A, Martin PG, Robert-Granié C, Besse P. Sparse canonical methods for biological data integration: Application to a cross-platform study. BMC Bioinformatics. 2009;10:34.
245. Hwang H. Regularized Generalized Structured Component Analysis. Psychometrika. 2009;74:517–30.
246. Soneson C, Lilljebjörn H, Fioretos T, Fontes M. Integrative analysis of gene expression and copy number alterations using canonical correlation analysis. BMC Bioinformatics. 2010;11:191.
247. Zhang S, Li Q, Liu J, Zhou XJ. A novel computational framework for simultaneous integration of multiple types of genomic data to identify microRNA-gene regulatory modules. Bioinformatics. 2011;27:i401–9.
248. Lee W, Lee D, Lee Y, Pawitan Y. Sparse Canonical Covariance Analysis for High-throughput Data. Statistical Applications in Genetics and Molecular Biology. 2011;10.
249. Abdi H, Williams LJ, Valentin D, Bennani-Dosse M. STATIS and DISTATIS: optimum multitable principal component analysis and three way metric multidimensional scaling. WIREs Computational Statistics. 2012;4:124–67.
250. Zhang S, Liu C-C, Li W, Shen H, Laird PW, Zhou XJ. Discovery of multi-dimensional modules by integrative analysis of cancer genomic data. Nucleic Acids Research. 2012;40:9379–91.
251. Li W, Zhang S, Liu C-C, Zhou XJ. Identifying multi-layer gene regulatory modules from multi-dimensional genomic data. Bioinformatics. 2012;28:2458–66.
252. Abdi H, Williams LJ, Valentin D. Multiple factor analysis: principal component analysis for multitable and multiblock data sets. WIREs Computational Statistics. 2013;5:149–79.
253. Schlauch D, Paulson JN, Young A, Glass K, Quackenbush J. Estimating gene regulatory networks with pandaR. Bioinformatics. 2017;33:2232–4.
254. Ray P, Zheng L, Lucas J, Carin L. Bayesian joint analysis of heterogeneous genomics data. Bioinformatics. 2014;30:1370–6.
255. Bunte K, Leppäaho E, Saarinen I, Kaski S. Sparse group factor analysis for biclustering of multiple data sources. Bioinformatics. 2016;32:2457–63.
256. Chen M, Gao C, Ren Z, Zhou HH. Sparse CCA via precision adjusted iterative thresholding. arXiv:1311.6186 [math, stat]. 2013.
257. Leppäaho E, Ammad-ud-din M, Kaski S. GFA: Exploratory analysis of multiple data sources with group factor analysis. Journal of Machine Learning Research. 2017;18:1–5.
258. Klami A, Bouchard G, Tripathi A. Group-sparse embeddings in collective matrix factorization. arXiv:1312.5921 [cs, stat]. 2014.
259. Meng C, Basunia A, Peters B, Gholami AM, Kuster B, Culhane AC. MOGSA: integrative single sample gene-set analysis of multiple omics data. 2018.
260. Zhao S, Gao C, Mukherjee S, Engelhardt BE. Bayesian group latent factor analysis with structured sparsity. arXiv:1411.2698 [q-bio, stat]. 2015.
261. Voillet V, Besse P, Liaubet L, San Cristobal M, González I. Handling missing rows in multi-omics data integration: Multiple imputation in multiple factor analysis framework. BMC Bioinformatics. 2016;17:402.
262. Beaton D, Dunlop J, Abdi H, Alzheimer’s Disease Neuroimaging Initiative. Partial least squares correspondence analysis: A framework to simultaneously analyze behavioral and genetic data. Psychological Methods. 2016;21:621–51.
263. Singh A, Shannon CP, Gautier B, Rohart F, Vacher M, Tebbutt SJ, et al. DIABLO: An integrative approach for identifying key molecular drivers from multi-omics assays. Bioinformatics. 2019;35:3055–62.
264. Yoon G, Carroll RJ, Gaynanova I. Sparse semiparametric canonical correlation analysis for data of mixed types. arXiv:1807.05274 [stat]. 2019.
265. Gaynanova I, Li G. Structural learning and integrative decomposition of multi-view data. arXiv:1707.06573 [stat]. 2017.
266. Madrigal P. fCCAC: Functional canonical correlation analysis to evaluate covariance between nucleic acid sequencing datasets. Bioinformatics. 2017;33:746–8.
267. Yoshida K, Yoshimoto J, Doya K. Sparse kernel canonical correlation analysis for discovery of nonlinear interactions in high-dimensional data. BMC Bioinformatics. 2017;18:108.
268. Kawaguchi A, Yamashita F. Supervised multiblock sparse multivariable analysis with application to multimodal brain imaging genetics. Biostatistics. 2017;18:651–65.
269. Feng Q, Jiang M, Hannig J, Marron JS. Angle-based joint and individual variation explained. arXiv:1704.02060 [stat]. 2018.
270. Argelaguet R, Arnol D, Bredikhin D, Deloro Y, Velten B, Marioni JC, et al. MOFA+: A statistical framework for comprehensive integration of multi-modal single-cell data. Genome Biology. 2020;21:111.
271. Brown BC, Bray NL, Pachter L. Expression reflects population structure. 2018. https://doi.org/10.1101/364448.
272. Zhang Y, Gaynanova I. Joint association and classification analysis of multi-view data. arXiv:1811.08511 [cs, stat]. 2020.
273. Tang TM, Allen GI. Integrated principal components analysis. arXiv:1810.00832 [stat]. 2021.
274. Min EJ, Safo SE, Long Q. Penalized co-inertia analysis with applications to -omics data. Bioinformatics. 2019;35:1018–25.
275. Safo SE, Li S, Long Q. Integrative analysis of transcriptomic and metabolomic data via sparse canonical correlation analysis with incorporation of biological information. Biometrics. 2018;74:300–12.
276. Min W, Liu J, Zhang S. Sparse weighted canonical correlation analysis. arXiv:1710.04792 [cs, stat]. 2017.
277. Bouhaddani S el, Uh H-W, Jongbloed G, Hayward C, Klarić L, Kiełbasa SM, et al. Integrating omics datasets with the OmicsPLS package. BMC Bioinformatics. 2018;19:371.
278. Pimentel H, Zhiyue H, Huang H. Biclustering by sparse canonical correlation analysis. 2018;6:11.
279. Kim Y, Bismeijer T, Zwart W, Wessels LFA, Vis DJ. Genomic data integration by WON-PARAFAC identifies interpretable factors for predicting drug-sensitivity in vivo. Nature Communications. 2019;10:5034.
280. Lock EF, Park JY, Hoadley KA. Bidimensional linked matrix factorization for pan-omics pan-cancer analysis. arXiv:2002.02601 [cs, q-bio, stat]. 2020.
281. Ronen J, Hayat S, Akalin A. Evaluation of colorectal cancer subtypes and cell lines using deep learning. Life Science Alliance. 2019;2.
282. Shi WJ, Zhuang Y, Russell PH, Hobbs BD, Parker MM, Castaldi PJ, et al. Unsupervised discovery of phenotype-specific multi-omics networks. Bioinformatics. 2019;35:4336–43.
283. Csala A, Zwinderman AH, Hof MH. Multiset sparse partial least squares path modeling for high dimensional omics data analysis. BMC Bioinformatics. 2020;21:9.
284. Fan Z, Zhou Y, Ressom HW. MOTA: Network-Based Multi-Omic Data Integration for Biomarker Discovery. Metabolites. 2020;10:144.
285. Shu H, Wang X, Zhu H. D-CCA: A decomposition-based canonical correlation analysis for high-dimensional datasets. Journal of the American Statistical Association. 2020;115:292–306.
286. Hawinkel S, Bijnens L, Lê Cao K-A, Thas O. Model-based joint visualization of multiple compositional omics datasets. NAR Genomics and Bioinformatics. 2020;2.
287. Gundersen G, Dumitrascu B, Ash JT, Engelhardt BE. End-to-end training of deep probabilistic CCA on paired biomedical observations. In: Uncertainty in Artificial Intelligence. PMLR; 2020. p. 945–55.
288. Velten B, Braunger JM, Arnol D, Argelaguet R, Stegle O. Identifying temporal and spatial patterns of variation from multi-modal data using MEFISTO. bioRxiv. 2020;2020.11.03.366674.