RNA-binding protein database

The RNA-binding Proteins Database (RBPDB) is a biological database of RNA-binding protein specificities that includes experimental observations of RNA-binding sites. The experimental results, both in vitro and in vivo, are drawn from the primary literature.[1] It covers four metazoan species: Homo sapiens, Mus musculus, Drosophila melanogaster, and Caenorhabditis elegans. The RNA-binding domains included in the database are the RNA recognition motif, K homology, CCCH zinc finger, and others. As of 2021, the latest RBPDB release (v1.3, September 2012) includes 1,171 RNA-binding proteins.[2]

Background on RNA-binding proteins

Transcription and translation proceed differently in prokaryotes and eukaryotes. Unlike in prokaryotes, the two processes are spatially separated in eukaryotes: transcription occurs in the nucleus and translation in the cytoplasm. Eukaryotes therefore process pre-mRNA through post-transcriptional modification, which includes splicing, editing, and polyadenylation. RNA-binding proteins (RBPs) play a critical role in this process. RBPs bind RNA with differing specificities and affinities.[3][4][5] Each RBP contains at least one RNA-binding domain, and most contain several. Well-characterized domains include the RNA-binding domain (RBD, also known as the RNP domain or RNA recognition motif, RRM), the K-homology (KH) domain (type I and type II), the RGG (Arg-Gly-Gly) box, the Sm domain, the DEAD/DEAH box, the zinc finger (ZnF, mostly C-x8-C-x5-C-x3-H), the double-stranded RNA-binding domain (dsRBD), the cold-shock domain, the Pumilio/FBF (PUF or Pum-HD) domain, and the Piwi/Argonaute/Zwille (PAZ) domain.[6][7]

RBPs are built from multiple binding domains, each composed of a few basic modular units. With multiple motifs, an RBP can recognize a much longer stretch of nucleic acid than a single motif can. RBPs bind RNA through weak interactions, and combining motifs greatly enlarges the interaction surface. As a result, a multi-domain RBP can bind RNA with higher specificity and affinity than a single domain.[8] The domains in RBPDB fall into three main categories: the RNA recognition motif (RRM), the K-homology (KH) domain, and zinc fingers.

Crystallographic structures of RNA-binding domains from RNA-binding proteins.

RNA-binding protein domains

Lunde and colleagues reviewed the different types of RNA-binding protein motifs and their specific functions.[7]

RNA recognition motif (RRM)

The RNA recognition motif (RRM) contains about 80–90 amino acids that form a four-stranded antiparallel β-sheet with two α-helices (βαββαβ topology). The β-sheet plays a critical role in RNA recognition. Usually, three conserved residues on the β-sheet are central to recognition: an Arg or Lys residue forms a salt bridge to the phosphodiester backbone, and two aromatic residues make stacking interactions with the nucleobases. Each of the four β-strands recognizes one nucleotide. However, with exposed loops and additional secondary structure elements, an RRM can recognize up to 8 nucleotides.[7][9]

K-homology domain (KH domain)

The K-homology (KH) domain was first identified in human heterogeneous nuclear ribonucleoprotein (hnRNP) K, from which the family takes its name. It binds both ssDNA and ssRNA and is found in eukaryotes, eubacteria, and archaea. The domain contains about 70 amino acids, and its signature sequence is (I/L/V)IGXXGXX(I/L/V). All KH domains contain a three-stranded β-sheet and three α-helices. There are two subfamilies: type I (βααββα topology) and type II (αββααβ topology). In both, the GXXG loop, the flanking helices, the β-strand, and the variable loop between β2 and β3 (type I) or between α2 and β2 (type II) play an important role in recognizing RNA.[7][10]
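The signature sequence (I/L/V)IGXXGXX(I/L/V) can be read as a simple pattern over a protein sequence, with X standing for any residue. The following is a minimal sketch of such a scan in Python; the function name and the toy sequence are made up for illustration, and a pattern match alone does not establish that a protein contains a folded KH domain.

```python
import re

# Signature from the text: (I/L/V)IGXXGXX(I/L/V), where X is any residue.
# The lookahead lets overlapping matches be reported as well.
KH_SIGNATURE = re.compile(r"(?=([ILV]IG[A-Z]{2}G[A-Z]{2}[ILV]))")


def find_kh_signatures(protein):
    """Return (0-based start, matched 9-residue string) for each signature hit."""
    return [(m.start(), m.group(1)) for m in KH_SIGNATURE.finditer(protein.upper())]


# Toy sequence invented for this example; it is not a real protein.
toy_sequence = "MKTLIGKKGETIKAAA"
print(find_kh_signatures(toy_sequence))  # [(3, 'LIGKKGETI')]
```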

Zinc fingers

Zinc fingers are domains that contain zinc-coordinating residues. There are three main types: Cys2His2 (CCHH), CCCH, and CCHC. Generally, several repeats of the domain work together in a protein. When a CCHH zinc finger binds DNA, residues in its recognition α-helix form hydrogen bonds to Watson–Crick base pairs in the major groove. When it binds RNA, the same residues may still be used for recognition. Zinc fingers may distinguish the two types of nucleic acid through distinct structural arrangements of the domain. CCCH and CCHC zinc fingers bind AU-rich RNA elements. Unlike in CCHH zinc fingers, the shape of the protein is the primary determinant of specificity.[7][11]

Sequence preference of RNA-binding protein

Ray, Kazan, and colleagues addressed the sequence preferences of RBPs. In their approach, a single RBP is incubated with a vast molar excess of a complex pool of RNAs. The protein is recovered by affinity selection, and the associated RNAs are interrogated by microarray and computational analyses. Their results show that RNA-binding proteins have sequence preferences and that identical or closely related RBPs bind similar RNA sequences.[12]
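The idea of a sequence preference can be illustrated by comparing how often each short subsequence (k-mer) appears in RNAs recovered with a protein versus the starting pool. The sketch below is a toy illustration only: it is not the analysis used in the cited study, and the function names, the pseudocount, and the sequences are all assumptions made for this example.

```python
from collections import Counter
from math import log2


def kmer_counts(seqs, k):
    """Count every overlapping k-mer across a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def kmer_enrichment(bound, pool, k=5, pseudocount=1.0):
    """Log2 ratio of k-mer frequency in bound RNAs versus the pool.

    A pseudocount keeps k-mers seen in only one set from producing
    a division by zero or an infinite score.
    """
    b, p = kmer_counts(bound, k), kmer_counts(pool, k)
    kmers = set(b) | set(p)
    b_total = sum(b.values()) + pseudocount * len(kmers)
    p_total = sum(p.values()) + pseudocount * len(kmers)
    return {
        km: log2(((b[km] + pseudocount) / b_total) / ((p[km] + pseudocount) / p_total))
        for km in kmers
    }


# Invented toy data: the "bound" set is deliberately rich in GCAUG.
pool = ["GCAUG", "AAAAA", "CCCCC"]
bound = ["GCAUG", "GCAUG", "GCAUG"]
scores = kmer_enrichment(bound, pool, k=5)
for kmer, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(kmer, round(score, 2))
```

A positive score means the k-mer is over-represented among the bound RNAs relative to the pool; a negative score means it is depleted.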

Use

RBPDB contains 1,171 RNA-binding proteins from Homo sapiens, Mus musculus, Drosophila melanogaster, and Caenorhabditis elegans. Proteins can be searched by domain or by species. Either search returns a detailed list of proteins that includes gene symbol, annotation ID, synonyms, gene description, species, RNA-binding domains, number of experiments, and homologs. The link on the number of experiments leads to the research articles related to the protein. Users can also search for experiments related to a specific RNA-binding sequence, and the site can predict binding sites in a user-supplied sequence.
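One common way to predict binding sites in a sequence is to slide a position weight matrix (PWM) along it and report windows whose log-odds score exceeds a threshold. The sketch below assumes that representation; the matrix, the uniform background, and the threshold are invented for illustration and are not taken from RBPDB, and the function names are hypothetical rather than part of any RBPDB interface.

```python
import math

# Hypothetical 4-position PWM (base probabilities per position),
# invented for this example; it favors the motif UGCA.
PWM = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "U": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "U": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "U": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "U": 0.1},
]
BACKGROUND = 0.25  # assumed uniform background frequency for each base


def log_odds(window, pwm=PWM, background=BACKGROUND):
    """Sum of per-position log2(probability / background) for one window."""
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, window))


def scan(rna, pwm=PWM, threshold=4.0):
    """Return (0-based start, window, score) for windows scoring >= threshold."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    width = len(pwm)
    hits = []
    for i in range(len(rna) - width + 1):
        window = rna[i:i + width]
        if any(base not in "ACGU" for base in window):
            continue  # skip windows with ambiguous or unknown bases
        score = log_odds(window, pwm)
        if score >= threshold:
            hits.append((i, window, round(score, 2)))
    return hits


print(scan("AAUGCAGG"))  # [(2, 'UGCA', 5.94)]
```

With this toy matrix, a window with a single mismatch scores about 3.13, so a threshold of 4.0 reports only exact matches to UGCA; lowering the threshold trades specificity for sensitivity.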

References

  1. ^ Cook, Kate B.; Kazan, Hilal (2010). "RBPDB: a database of RNA-binding specificities". Nucleic Acids Research. 39 (Database issue). Oxford University Press: D301–D308. doi:10.1093/nar/gkq1069. PMC 3013675. PMID 21036867.
  2. ^ "RBPDB: The database of RNA-binding specificities". rbpdb.ccbr.utoronto.ca. Retrieved 29 April 2021.
  3. ^ Matera, A. Gregory; Terns, Rebecca M.; Terns, Michael P. (March 2007). "Non-coding RNAs: lessons from the small nuclear and small nucleolar RNAs". Nature Reviews Molecular Cell Biology. 8 (3). Nature Publishing Group: 209–220. doi:10.1038/nrm2124. PMID 17318225. S2CID 30268055.
  4. ^ Glisovic, Tina; Bachorik, Jennifer L. (2008). "RNA-binding proteins and post-transcriptional gene regulation". FEBS Letters. 582 (14). Elsevier B.V.: 1977–1986. Bibcode:2008FEBSL.582.1977G. doi:10.1016/j.febslet.2008.03.004. PMC 2858862. PMID 18342629.
  5. ^ Kishore, Shivendra; Luber, Sandra; Zavolan, Mihaela (2010). "Deciphering the role of RNA-binding proteins in the post-transcriptional control of gene expression". Briefings in Functional Genomics. 9 (5–6): 391–404. doi:10.1093/bfgp/elq028. PMC 3080770. PMID 21127008.
  6. ^ Chen, Y.; Varani, G. (2005). "Protein families and RNA recognition". FEBS J. 272 (9): 2088–2097. doi:10.1111/j.1742-4658.2005.04650.x. PMID 15853794. S2CID 12432954.
  7. ^ Lunde, B.M.; Moore, C.; Varani, G. (2007). "RNA-binding proteins: modular design for efficient function". Nat. Rev. Mol. Cell Biol. 8 (6): 479–490. doi:10.1038/nrm2178. PMC 5507177. PMID 17473849.
  8. ^ Hogan, DJ; Riordan, DP (2008). "Diverse RNA-binding proteins interact with functionally related sets of RNAs, suggesting an extensive regulatory system". PLOS Biology. 6 (10): 2297–2313. doi:10.1371/journal.pbio.0060255. PMC 2573929. PMID 18959479.
  9. ^ Swanson MS, Dreyfuss G, Pinol-Roma S (1988). "Heterogeneous nuclear ribonucleoprotein particles and the pathway of mRNA formation". Trends Biochem. Sci. 13 (3): 86–91. doi:10.1016/0968-0004(88)90046-1. PMID 3072706.
  10. ^ García-Mayoral MF, Hollingworth D, Masino L, et al. (April 2007). "The structure of the C-terminal KH domains of KSRP reveals a noncanonical motif important for mRNA degradation" (PDF). Structure. 15 (4): 485–98. doi:10.1016/j.str.2007.03.006. PMID 17437720.
  11. ^ Klug A, Rhodes D (1987). "Zinc fingers: a novel protein fold for nucleic acid recognition". Cold Spring Harb. Symp. Quant. Biol. 52: 473–82. doi:10.1101/sqb.1987.052.01.054. PMID 3135979.
  12. ^ Ray, D.; Kazan, H. (2013). "A compendium of RNA-binding motifs for decoding gene regulation". Nature. 499 (7457): 172–177. Bibcode:2013Natur.499..172R. doi:10.1038/nature12311. PMC 3929597. PMID 23846655.