Table of Contents
- Column Label Descriptions
- Chromosome Location Search
- Recent updates
- Contact us
1. Annotating PolymiRTS from CLASH experiment data
Cross-linking, ligation, and sequencing of hybrids (CLASH) is a recently developed technique for high-throughput mapping of RNA-RNA interactions [1] that has been used for direct observation of miRNA-mRNA target pairs associated with human AGO1 [2]. Previous methods for high-throughput identification of miRNA target sites (e.g., PAR-CLIP) identified only target site sequences and then relied on computational scans for complementary miRNA seeds to predict the targeting miRNAs. CLASH instead provides chimeric reads of miRNA and target site sequences, so it directly identifies the targeting miRNA and allows improved determination of non-canonical targeting. The CLASH dataset included in this update to the PolymiRTS database contains 18,514 high-confidence canonical and non-canonical target sites of 399 different miRNAs [2]. We mapped the miRNA target sites from transcript to genomic location using the genomic start and end locations of exons from Ensembl BioMart [3]. SNPs and INDELs in the genomic locations of the target sites were then collected from the dbSNP build 137 table in the UCSC Table Browser by pasting the locations into the "define region" option. We found ~24,000 SNPs and INDELs in the miRNA target sites from the CLASH experiment data.
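The transcript-to-genome mapping step can be illustrated with a minimal sketch. It is not the code used to build the database. The function name, the 0-based transcript offset, and the 1-based inclusive exon coordinates are assumptions made for illustration.

```python
def transcript_to_genomic(tpos, exons, strand="+"):
    """Map a 0-based transcript offset to a 1-based genomic coordinate.

    exons: list of (start, end) genomic intervals, 1-based and inclusive
           (assumed convention), in any order.
    strand: "+" or "-"; on the minus strand the transcript runs from the
            highest genomic coordinate downward.
    """
    # Walk exons in transcript order: ascending for "+", descending for "-".
    ordered = sorted(exons, reverse=(strand == "-"))
    remaining = tpos
    for start, end in ordered:
        length = end - start + 1
        if remaining < length:
            return start + remaining if strand == "+" else end - remaining
        remaining -= length
    raise ValueError("transcript position lies beyond the last exon")


# Hypothetical two-exon transcript used only to exercise the function.
EXAMPLE_EXONS = [(100, 109), (200, 209)]
```

Mapping both ends of a target site this way gives the genomic interval that can be pasted into a region query. A site that spans an exon junction maps to two separate genomic intervals and needs extra handling that this sketch omits.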
2. Collecting SNPs and INDELs in the 3'-UTR
SNPs and INDELs in dbSNP build 137 within 3'UTRs of all RefSeq genes were collected using the ALL SNPs 137 track in the UCSC table browser. Specifically, we selected the following filter options: 3'-UTRs, SNP, insertion, deletion and INDEL for the mouse (mm10) and human (hg19) genomes.
3. Identifying and annotating PolymiRTS
For each SNP/INDEL, we assessed whether its two alleles lead to different miRNA target sites. We consider only those 3'-UTR SNPs that affect the match to the seed region of the miRNA. Mature miRNA sequences were downloaded from miRBase (mirbase.org). We used the TargetScan criteria [4] to predict miRNA sites: in addition to a perfect Watson-Crick match to seed nucleotides 2-7 of the miRNA, we require either a perfect match to the 8th nucleotide of the miRNA or an anchor adenosine immediately downstream of the 2-7 seed match in the target.
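The seed-match rule can be sketched as below. This is a simplified illustration, not TargetScan itself. The function names, the RNA alphabet, and the toy 3'-UTR fragments are assumptions; the site labels follow the usual TargetScan nomenclature (7mer-m8, 7mer-A1, 8mer).

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def revcomp(rna):
    """Reverse complement of an RNA string containing only A, C, G, U."""
    return rna.translate(COMPLEMENT)[::-1]


def find_seed_sites(mirna, utr):
    """Return (start, site_type) for each seed site in a 3'-UTR (5'->3').

    A site needs a perfect match to miRNA nucleotides 2-7 plus either a
    match to nucleotide 8 (the target base just upstream of the 6-mer) or
    an adenosine just downstream of the 6-mer.
    """
    six_mer = revcomp(mirna[1:7])  # pairs with miRNA nucleotides 2-7
    m8_base = revcomp(mirna[7])    # pairs with miRNA nucleotide 8
    sites = []
    start = utr.find(six_mer)
    while start != -1:
        has_m8 = start > 0 and utr[start - 1] == m8_base
        has_a1 = start + 6 < len(utr) and utr[start + 6] == "A"
        if has_m8 and has_a1:
            sites.append((start, "8mer"))
        elif has_m8:
            sites.append((start, "7mer-m8"))
        elif has_a1:
            sites.append((start, "7mer-A1"))
        start = utr.find(six_mer, start + 1)
    return sites


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"  # hsa-let-7a-5p mature sequence
REF_UTR = "AAACUACCUCAGGG"        # toy fragment, reference allele
ALT_UTR = "AAACUAGCUCAGGG"        # same fragment with a C>G SNP in the seed match
```

Running `find_seed_sites` on both alleles of a variant and comparing the results shows whether the variant removes or introduces a site. In the toy fragments the reference allele carries a site and the alternate allele does not.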
We assigned each PolymiRTS to one of four classes: 'D' (the derived allele disrupts a conserved miRNA site), 'N' (the derived allele disrupts a nonconserved miRNA site), 'C' (the derived allele creates a new miRNA site) and 'O' (other cases, in which the ancestral allele cannot be determined unambiguously). PolymiRTS of class 'C' may cause abnormal gene repression, and PolymiRTS of class 'D' may cause loss of normal repression control. These two classes of PolymiRTS are the most likely to have functional impacts. We used the pre-calculated Multiz alignments of vertebrate genomes to derive the annotations. For a miRNA site to be considered conserved, we require that it is present in at least two other vertebrate genomes in addition to the query genome. For mouse SNPs, ancestral alleles were determined from the mouse vs. rat (rn4) genome alignment. For human SNPs, ancestral alleles were determined from the human vs. chimpanzee (panTro2) genome alignment. We also flagged PolymiRTS with A/G alleles, because these are expected to be less deleterious owing to their ability to form G:U wobble base pairs with miRNAs.
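The four-class assignment can be summarized in a short sketch. The function name and argument layout are hypothetical, and the threshold of two supporting genomes is taken from the conservation rule stated above.

```python
def classify_polymirts(ancestral, derived, site_alleles, support):
    """Assign a PolymiRTS functional class.

    ancestral:    ancestral allele, or None if it cannot be determined
    derived:      derived allele
    site_alleles: set of alleles for which the miRNA site is present
    support:      number of other vertebrate genomes that contain the site
    Returns 'D', 'N', 'C', 'O', or None when the two alleles do not differ
    in site presence (not a PolymiRTS).
    """
    if ancestral is None:
        return "O"  # ancestral allele unknown
    in_ancestral = ancestral in site_alleles
    in_derived = derived in site_alleles
    if in_ancestral and not in_derived:
        # Derived allele disrupts the site: conserved vs. nonconserved.
        return "D" if support >= 2 else "N"
    if in_derived and not in_ancestral:
        return "C"  # derived allele creates a new site
    return None
```

For example, a site present only with ancestral allele `C` and found in three other genomes is class 'D', while the same site with support in one other genome is class 'N'.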
In a recent update, TargetScan introduced a context+ score for selection of favorable sites [5]. Context+ scores predict the binding of a miRNA to the entire 3'-UTR by summing over contributions made by individual sites within the 3'-UTR that have perfect sequence complementarity to the miRNA seed region. We have included the differences in context+ scores between the reference and derived alleles for each SNP or INDEL in putative miRNA target sites. A more negative value of the context+ score difference indicates an increased likelihood that the miRNA targeting is disrupted or newly created by the mutation in the target sites.
4. Experimentally supported targets
Experimentally supported miRNA-target interactions were collected from several sources. Three databases, miRecords (http://mirecords.biolead.org/), TarBase (http://diana.cslab.ece.ntua.gr/tarbase/), and miRTarBase (http://mirtarbase.mbc.nctu.edu.tw/), contain collections of miRNA targets from both low- and high-throughput experiments. In addition, experimental techniques such as HITS-CLIP [6] and PAR-CLIP [7] have recently been developed and used to identify the specific mRNA sequences that bind miRNAs in the RNA-induced silencing complex (RISC). To include data from these experiments, we obtained the mRNA sequences bound in RISCs from ago.rockefeller.edu, which is associated with the HITS-CLIP experiment, and from the supplementary material of Ref. 7. We also included the 11 miRNA-mRNA pairs with the highest allelic imbalance ratios from Kim and Bartel [8]. For each interaction, we recorded the type of experiment (high- or low-throughput) that supports it and whether the experiment identified only the mRNAs that bind to the miRNA or the specific locations targeted by the miRNA. See the "Column Label Descriptions" section for further discussion of the classification scheme.
5. SNPs and INDELs in miRNA seed regions
For each miRNA, we collected all SNPs and INDELs in the seed regions from dbSNP build 137. We found 144 SNPs and 36 INDELs within miRNA seeds in mouse and 271 SNPs and 23 INDELs within miRNA seeds in human. We extracted the entire 3'-UTR of all RefSeq genes in the UCSC genome database for the mouse and human genomes and used TargetScan to identify all predicted target sites that would be either disrupted or created by the SNPs in the miRNA seed regions. Disrupted sites are targets of the miRNA whose seed carries the reference allele at the SNP location, while created sites are targets of the miRNA whose seed carries the derived allele. For INDELs in miRNA seeds, we considered all targets of the miRNA with the reference allele at the INDEL location to be disrupted by the derived allele.
6. Assessing PolymiRTS in cis-acting eQTLs
Genes with both cis-acting eQTL and PolymiRTS are featured in the database.
For mouse, gene expression levels, which are publicly available in GeneNetwork (www.genenetwork.org), were examined in nine tissues (whole brain, cerebellum, eye, hippocampus, kidney, liver, nucleus accumbens, prefrontal cortex, and retina) in the BXD recombinant inbred panel. Gene expression levels were treated as quantitative traits and were mapped onto genomic regions (eQTL) using standard marker regression. A gene is said to have a significant cis-acting eQTL if the QTL peak location is within 5 Mb of the gene's physical location and the genome-wide significance level is < 0.05.
Two methods were used to identify genes with cis-eQTLs in humans. First, gene expression levels in lymphoblastoid cells of 194 human individuals from 14 CEPH families were downloaded from the GEO database, and the raw data were processed using the RMA protocol. Genotypes for 1628 autosomal SNP markers were downloaded from The SNP Consortium database. We used Merlin to remove genotype errors and perform family-based linkage analysis. A gene is said to have a cis-acting eQTL if the LOD peak location is within 10 Mb of the gene's physical location and the p-value is < 0.05. Second, the cis-eQTLs identified in a variety of literature sources were included in the database. These eQTLs include all of the records contained in the GTEx eQTL browser (http://www.ncbi.nlm.nih.gov/gtex/test/GTEX2/gtex.cgi) as of September 2011 as well as those from 5 additional studies in skin [9], cortex [10], monocytes [11], and lymphoblastoid cells [12,13].
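The cis-eQTL criteria for both species reduce to a distance check plus a significance check, sketched below. The function and constant names are hypothetical. Representing a gene by a single base-pair position on its chromosome is a simplifying assumption.

```python
MOUSE_WINDOW_BP = 5_000_000    # peak within 5 Mb of the gene (mouse)
HUMAN_WINDOW_BP = 10_000_000   # peak within 10 Mb of the gene (human linkage)


def has_cis_eqtl(gene_chrom, gene_pos, peak_chrom, peak_pos, p_value,
                 window_bp, alpha=0.05):
    """Return True if the QTL peak qualifies as a cis-acting eQTL.

    The peak must lie on the gene's chromosome, within window_bp of the
    gene's position, and have significance below alpha.
    """
    same_chrom = gene_chrom == peak_chrom
    close_enough = abs(peak_pos - gene_pos) <= window_bp
    return same_chrom and close_enough and p_value < alpha
```

Under these assumptions, a peak 7 Mb from a gene passes the human window but fails the mouse window.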
7. Assessing PolymiRTS in pQTLs
For mouse, we first mapped QTLs (genome-wide significance < 0.1) for more than 2000 published BXD phenotypes (physiological/behavioral traits). For each QTL, we linked it with genes that are physically located in the QTL interval and have at least one PolymiRTS. These genes, together with genes with nonsynonymous SNPs, are candidate causal genes underlying the pQTL.
For human, we collected all genes corresponding to SNPs associated with human diseases and traits in the NHGRI GWAS catalog (www.genome.gov/gwastudies) and dbGaP (http://www.ncbi.nlm.nih.gov/gap) and compared them with the list of genes with SNPs in miRNA target sites in our PolymiRTS database.
The database can be browsed by six different criteria:
1. Polymorphic miRNA target sites from CLASH experiment data
This table displays all miRNA target sites from the CLASH experiment data that contain SNPs or INDELs.
2. Polymorphic miRNA target sites with other types of experimental support
This table displays miRNA-gene target pairs that have been experimentally supported with experiments other than CLASH and contain SNPs in target sites.
3. Genes with SNPs and INDELs in their miRNA target sites
This table displays all genes that contain SNPs in predicted miRNA target sites. It can be filtered to select only genes that contain target sites with specific functional classes or gene symbols that start with certain characters. Clicking on the RefSeq ID provides a table with specific details of the SNPs in the miRNA target sites, as well as the regulation of the gene by cis-eQTLs and its associations with complex traits. This table can also be filtered based on conservation, functional class, and experimental support. See the "Column Label Descriptions" section for further description of these categories.
4. SNPs and INDELs in miRNA seeds
This table displays miRNAs with SNPs and INDELs in seed regions. For each miRNA with a seed SNP, two tables are available: one containing genes with putative target sites that are disrupted by the derived allele in the miRNA seed and one containing genes with putative target sites that are created by the derived allele in the seed. For each miRNA with a seed INDEL one table is available with a list of genes with putative target sites that are disrupted by the derived allele in the miRNA seed.
5. Diseases and traits
This table displays all genes in the PolymiRTS database that also have been associated with human diseases or traits in GWAS studies.
6. Gene pathways for genes with SNPs and INDELs in their miRNA target sites
This table displays gene pathways from the KEGG database. The drop-down menu on the browse page gives options to select either "Experimental" or "ALL" for both mouse and human. The "Experimental" option gives pathways with genes that have SNPs or INDELs in experimentally supported miRNA target sites, including CLASH target sites for human. The "ALL" option lists the pathways with all genes in the database that have SNPs or INDELs in any putative miRNA target site.
| Column Label Descriptions
||SNP location in the mRNA transcript. It is a zero-based number.
||Link to dbSNP.
||Whether the SNP can form a G:U wobble base pair with the miRNA. Y: Yes; N: No.
||If applicable, the ancestral allele is denoted.
||Two alleles of the SNP in the mRNA transcript.
||Genotypes of two mouse inbred strains to be compared. The default compares C57BL/6J with DBA/2J.
||Link to miRBase.
||Occurrence of the miRNA site in other vertebrate genomes in addition to the query genome. By clicking the hyperlink, the users can examine the genomes in which this miRNA target site occurs.
D: The derived allele disrupts a conserved miRNA site (ancestral allele with support >= 2).
N: The derived allele disrupts a nonconserved miRNA site (ancestral allele with support < 2).
C: The derived allele creates a new miRNA site.
O: The ancestral allele cannot be determined.
||Sequence context of the miRNA site. Bases complementary to the seed region are in capital letters and SNPs are highlighted in red.
LT: The miRNA-mRNA interaction is supported by a low-throughput experiment (e.g., luciferase reporter assay or Western blot).
HT: The miRNA-mRNA interaction is supported by a high-throughput experiment (e.g., microarray or pSILAC).
LTL: The miRNA targeting the specific location is supported by a low-throughput experiment (e.g., allelic imbalance sequencing).
HTL: The miRNA targeting the specific location is supported by a high-throughput experiment (e.g., HITS-CLIP).
N: Predicted target site with no experimental support.
|The user can query the database by SNP ID (e.g. rs13176), miRNA ID (e.g. hsa-miR-9), RefSeq ID (e.g. NM_000051), HUGO gene identifier (e.g. CDK4), any word in the gene or trait description, and GO accession number or name. The search results can also be filtered based on functional class, conservation, and experimental support. See the "Column Label Descriptions" section for further description of these categories.
| Chromosome Location Search
|This feature is designed for researchers who have identified QTLs (genomic loci) controlling traits of interest and want to look through a set of functional polymorphisms (for example, all PolymiRTS) within the genomic region to identify the causal variant. For mouse, we provide an inbred strain comparison option so that the query searches only the SNPs that differ between the two selected strains.
|We recently completed a major update to PolymiRTS that expands the database content and adds new features. Data from CLASH experiments are now included to provide more complete and accurate miRNA-mRNA interactions. Other significant new features include: (i) small insertions and deletions (INDELs) in miRNA sequences and miRNA target sites, (ii) TargetScan context+ score differences for assessing the impact of polymorphic miRNA-mRNA interactions and (iii) biological pathways. The following table summarizes the changes in the PolymiRTS database as of 9/10/2013.
||dbSNP build 132
||dbSNP build 137
|Number of miRNA (mouse)
|Number of miRNA (human)
|Number of SNPs/INDELs in miRNA target sites (human) from CLASH experiment data
|Number of SNPs in miRNA target sites (mouse)
|Number of SNPs in miRNA target sites (human)
|Number of INDELs in miRNA target sites (mouse)
|Number of INDELs in miRNA target sites (human)
|Experimentally validated miRNA targets (mouse)
|Experimentally validated miRNA targets (human)
|Number of target sites with a change in context+ score caused by polymorphisms (mouse)
|Number of target sites with a change in context+ score caused by polymorphisms (human)
|Number of SNPs in miRNA seed regions (mouse)
|Number of SNPs in miRNA seed regions (human)
|Number of INDELs in miRNA seed regions (mouse)
|Number of INDELs in miRNA seed regions (human)
|Number of genes associated with human traits in GWAS
|Number of genes associated with gene pathways (mouse)
|Number of genes associated with gene pathways (human)
|We offer flat file downloads for the database, including the main records of human and mouse PolymiRTS as well as a list of genes with cis-acting eQTLs. A description of the files available for download is provided in File_description.txt.
| 1. Kudla, G., et al. 2011. Cross-linking, ligation, and sequencing of hybrids reveals RNA-RNA interactions in yeast. Proc Natl Acad Sci, 108(24), 10010-5.
2. Helwak, A., et al. 2013. Mapping the Human miRNA Interactome by CLASH Reveals Frequent Noncanonical Binding. Cell, 153(3), 654-65.
3. Haider, S., et al. 2009. BioMart Central Portal--unified access to biological data. Nucleic Acids Res, 37, W23-7.
4. Lewis B. P., et al. 2005. Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell, 120, 15-20.
5. Garcia D. M., et al. 2011. Weak seed-pairing stability and high target-site abundance decrease the proficiency of lsy-6 and other microRNAs. Nat Struct Mol Biol. 18(10),1139-46.
6. Chi, S. W., et al. 2009. Argonaute HITS-CLIP decodes microRNA-mRNA interaction maps. Nature, 460, 479-486.
7. Hafner, M., et al. 2010. Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell, 141, 129-141.
8. Kim, J. and Bartel, D. P., 2009. Allelic imbalance sequencing reveals that single-nucleotide polymorphisms frequently alter microRNA-directed repression. Nature Biotechnol, 27, 472-477.
9. Ding, J., et al. 2010. Gene expression in skin and lymphoblastoid cells: refined statistical method reveals extensive overlap in cis-eQTL signals. Am J Hum Genet, 87, 779-789.
10. Myers, A. J., et al. 2007. A survey of genetic human cortical gene expression. Nat Genet, 39, 1594-9.
11. Zeller, T., et al. 2010. Genetics and beyond: the transcriptome of human monocytes and disease susceptibility. PLoS One, 5, e10693.
12. Morley, M., et al. 2004. Genetic analysis of genome-wide variation in human gene expression. Nature, 430, 743-747.
13. Pickrell, J. K., et al. 2010. Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature, 464, 768-772.
Please send questions and comments to Dr. Yan Cui at University of Tennessee Health