Transcriptome-Wide Co-Expression of Small Non-Coding RNAs and Genes In Cancer
Taylor B. Cavazos1, Aiden M. Sababi1, Jeffrey Wang1, Alexander J. Lazar2, Patrick A. Arensdorf1, Hani Goodarzi3, Fereydoun Hormozdiari1, Babak Alipanahi1
1Exai Bio Inc., Palo Alto, CA; 2UT MD Anderson Cancer Center, Houston, TX; 3University of California San Francisco, San Francisco, CA
Background
Small non-coding RNAs (smRNAs) are a diverse class of molecules shown to have post-transcriptional roles in gene expression regulation across many human diseases.
While some smRNAs are well characterized, the majority remain unannotated and have unknown biological functions.
Understanding the co-expression of smRNAs with genes may provide insights into the regulatory functions of currently unannotated smRNAs.
Goals
Systematically identify smRNAs co-expressed with proximal and distal genes in tumors.
Characterize co-expressed smRNA-gene pairs and explore their regulatory links.
Methods
The study cohort included 4,262 individuals with smRNA-seq and RNA-seq data measured in tumors from The Cancer Genome Atlas (TCGA) representing six major cancer types (breast, colorectal, kidney, lung, prostate, and thyroid).
For RNA-seq data, following GTEx guidelines, we removed genes with <20 reads that were expressed in <20% of samples. Expression data then underwent TMM normalization and rank-based inverse normal transformation. For small RNA-seq data, we previously developed a methodology for identifying novel small RNAs within the TCGA cohort, which yielded 1.2% known and 98.8% de novo smRNA species [1].
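As an illustration of the rank-based inverse normal transformation step, a minimal sketch is given below. It is not the pipeline used in this study: the function name, the rank offset of 0.5, and the average-rank handling of ties are assumptions chosen for clarity.

```python
from statistics import NormalDist


def rank_inverse_normal(values, offset=0.5):
    """Map values to standard-normal quantiles by rank.

    Tied values receive the average of their ranks. The offset of 0.5
    is an assumed convention; other offsets are also in common use.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the current group of tied values.
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - offset) / n) for r in ranks]


# Hypothetical expression values for one gene across three samples.
transformed = rank_inverse_normal([10.0, 2.0, 5.0])
```

The transformed values keep the original sample ordering but follow an approximately normal distribution, which limits the influence of outlier samples on downstream association tests.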
Following quality control, we tested for associations between expression of ~32k genes (lncRNAs and mRNAs) and ~530k smRNAs that were detected in at least 1% of samples.
For each tumor tissue, a quantitative trait loci (QTL) analysis was performed with QTLtools [2] for all smRNAs, both in cis (within 1 Mb of each gene's transcription start site) and in trans (not within 1 Mb), to identify the top smRNA-gene associations. Associations were corrected for multiple testing through a permutation scheme within each gene for cis analyses and across genes for trans analyses.
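The cis/trans distinction above can be sketched as a simple distance rule. The function and constant names are hypothetical, and two details are assumptions not stated in the text: that a single coordinate represents the smRNA, and that the 1 Mb boundary is inclusive.

```python
CIS_WINDOW_BP = 1_000_000  # 1 Mb window around the gene's TSS


def classify_pair(smrna_chrom, smrna_pos, gene_chrom, gene_tss,
                  window=CIS_WINDOW_BP):
    """Label an smRNA-gene pair as 'cis' or 'trans'.

    A pair is cis when the smRNA lies on the same chromosome as the
    gene and within `window` base pairs of its transcription start
    site; every other pair is trans.
    """
    same_chrom = smrna_chrom == gene_chrom
    if same_chrom and abs(smrna_pos - gene_tss) <= window:
        return "cis"
    return "trans"


# Hypothetical coordinates for illustration only.
label_near = classify_pair("chr1", 1_500_000, "chr1", 1_000_000)
label_far = classify_pair("chr1", 2_500_001, "chr1", 1_000_000)
```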
We additionally corrected for multiple testing across genes in our cis analysis, reporting cis smRNA-gene pairs with a significant FDR-adjusted p-value (q<0.05). For trans-QTLs, gene expression was permuted to generate a null distribution; FDR-adjusted p-values were averaged across 200 permutations, and significant (q<0.05) results were reported.
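To make the two correction layers concrete, the sketch below pairs a direct empirical permutation p-value with a Benjamini-Hochberg adjustment. Both are stand-ins chosen for illustration: the text does not state which FDR procedure was applied, QTLtools implements its own permutation machinery, and all names and numbers here are hypothetical.

```python
def empirical_pvalue(observed_stat, null_stats):
    """Permutation p-value: share of null statistics at least as
    extreme as the observed one, with the usual +1 correction."""
    exceed = sum(1 for s in null_stats if s >= observed_stat)
    return (1 + exceed) / (1 + len(null_stats))


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Made-up per-gene p-values, not results from this study.
gene_pvalues = [0.01, 0.04, 0.03, 0.5]
gene_qvalues = benjamini_hochberg(gene_pvalues)
significant = [i for i, q in enumerate(gene_qvalues) if q < 0.05]
```

In this toy input only the first gene passes the q<0.05 threshold, which shows how a nominally small p-value (0.03 or 0.04) can fail once the number of tests is taken into account.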
Result 1. Summary of cis- and trans- smRNA-QTL Findings per Tumor Tissue
Result 2. Characterization of cis- smRNA-QTLs and Annotations Across Tumor Tissues
Figure 1. (A) eGenes per cis- smRNA-QTLs, (B) known smRNA annotations, and (C) genomic annotations
Overall, there were 12,515 cis- smRNA-QTLs for 13,485 genes with an average of 1.4 (median=1) co-expressed genes per cis- smRNA-QTL. The average number of eGenes per cis- smRNA-QTL for each tumor tissue is shown on the right y-axis as a red dot (A).
While some cis- smRNA-QTLs represented known miRNAs (3.10%) or other smRNA biotypes (1.50%), the majority (95.4%) were previously unannotated smRNAs (B).
The cis- smRNA-QTLs were annotated based on distance from their eGene (upstream or downstream) or whether they were located inside the eGene on the same or opposite strand (*) within the exon or intron (C).
23.0% (5.9% exonic, 17.1% intronic) of annotated and 40.1% (27.1% exonic, 13.0% intronic) of unannotated cis- smRNA-QTLs were within the boundaries of their eGene.
Result 3. Regulatory Effects and Functional Enrichment of smRNA-QTL Associations
Figure 2. Select trans- smRNA-QTLs and gene targets
We identified 15,118 trans- smRNA-QTLs for 6,571 genes. There was an average of 5.2 (median=1) trans- smRNA-QTLs per gene.
Overall, 7.85% of cis- smRNA-QTLs also had a significant association in trans.
The trans- smRNA-QTLs were primarily unannotated (95.9%), with only 4.05% mapping to known smRNA biotypes.
The circos plot shows select trans- smRNA-QTLs with the largest number of associated genes. We replicate known interactions between hsa-mir-210 and genes in trans, including VEGFA and PDK1 [3].
Figure 3. Enrichment of smRNA-QTLs among cancer-related genes
2,764 (of 6,571) genes with a co-expressed smRNA in trans did not have a significant association in cis.
Compared to all genes, genes with a co-expressed smRNA, in cis or trans, were found to be overrepresented among copy-number amplifications (CNAs) in cancer [4] and common cancer genes [5] using gene-set enrichment analysis (q<1e-4).
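One common way to test for this kind of overrepresentation is a one-sided hypergeometric test, sketched below. This is an assumed stand-in: the text does not specify which enrichment statistic was used, and the function name and counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_set, n_hits, n_overlap):
    """One-sided hypergeometric p-value for overrepresentation.

    n_universe: all genes tested
    n_set:      genes in the reference set (e.g. a cancer gene list)
    n_hits:     genes with a co-expressed smRNA
    n_overlap:  genes in both groups
    Returns P(overlap >= n_overlap) under random sampling.
    """
    upper = min(n_set, n_hits)
    total = comb(n_universe, n_hits)
    tail = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy counts for illustration only, not values from this study.
p_toy = hypergeom_enrichment_p(n_universe=10, n_set=3, n_hits=4, n_overlap=3)
```

The resulting p-values, one per gene set, would then be adjusted for multiple testing before applying a threshold such as q<1e-4.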
Conclusions
We have demonstrated the ability to detect co-expression of smRNAs and genes transcriptome-wide with high-throughput sequencing data.
Given the demonstrated prevalence of smRNAs in body fluids, our results highlight the potential of these molecules as harbingers of cancer-specific molecular signatures during tumor progression.
Disclosures:
TC, JW, and FH are full-time employees of Exai Bio. AS is a part-time consultant at Exai Bio. BA and PA are co-founders, stockholders, and full-time employees of Exai Bio. HG is a co-founder, stockholder, and advisor of Exai Bio.
References:
[1] Fish L, et al. Nature Med. 2018;24:1743-51.
[2] Delaneau O, et al. Nature Commun. 2017; e15452.
[3] Huang X, et al. Trends Mol Med. 2012;16(5):230-37.