List of biological databases

Contents (in order of appearance below)

  • 1 Meta Databases
  • 2 Nucleic Acid Databases
  • 2.1 DNA Databases
  • 2.2 Gene Expression Databases (mostly Microarray data)
  • 2.3 Genome Databases
  • 2.4 Phenotype Databases
  • 2.5 RNA Databases
  • 3 Amino Acid / Protein Databases
  • 3.1 Protein Sequence Databases
  • 3.2 Protein Structure Databases
  • 3.3 Protein Model Databases
  • 3.4 Protein-Protein and Other Molecular Interactions
  • 3.5 Proteomics Databases
  • 4 Additional Databases
  • 4.1 Carbohydrate Structure Databases
  • 4.2 Signal Transduction Pathway Databases
  • 4.3 Metabolic Pathway and Protein Function Databases
  • 4.4 Metabolomic Databases
  • 4.5 Exosomal Databases
  • 4.6 Mathematical Model Databases
  • 4.7 PCR and Quantitative PCR Primer Databases
  • 4.8 Taxonomic Databases
  • 4.9 Radiologic Databases
  • 5 Specialized Databases (Alphabetically Ordered)
  • 6 Wiki-Style Databases
  • 7 Unsorted
  • 8 References

Meta Databases

Meta databases are databases of databases: they collect data about data to generate new data. They can merge information from different sources and make it available in a new, more convenient form, or with an emphasis on a particular disease or organism.

Nucleic Acid Databases

DNA Databases

Primary Databases

The International Nucleotide Sequence Database (INSD) consists of the following databases.

DNA Data Bank of Japan (National Institute of Genetics)

The three databases, DDBJ (Japan), GenBank (USA), and the European Nucleotide Archive (Europe), are repositories for nucleotide sequence data from all organisms. All three accept nucleotide sequence submissions and exchange new and updated data daily to stay synchronised. They are primary databases because they house original sequence data. They collaborate with the Sequence Read Archive (SRA), which archives raw reads from high-throughput sequencing instruments.

Secondary Databases

Gene Expression Databases (mostly Microarray data)

Main article: Microarray databases

  • Gene Expression Omnibus (GEO, National Center for Biotechnology Information)
  • GPX (Scottish Centre for Genomic Technology and Informatics)
  • maxd (Univ. of Manchester)
  • Stanford Microarray Database (SMD) (Stanford University)
  • Genevestigator – Expression Search Engine (Nebion AG)
  • Bgee: a database for retrieving and comparing gene expression patterns between species. It contains only wild-type, manually curated microarray, RNA-Seq, and in situ experiments.
  • BioGPS (The Scripps Research Institute): a gene portal system with a gene expression visualizer
  • The European Genome-phenome Archive (EGA)
  • The Genotype-Tissue Expression (GTEx) Project aims to provide the scientific community with a resource for studying human gene expression and regulation and their relationship to genetic variation. The project will collect and analyze multiple human tissues from donors who are also densely genotyped, to assess genetic variation within their genomes.
  • Expression Atlas: Differential and Baseline Expression. Expression Atlas provides information on gene expression patterns under different biological conditions. Gene expression data are re-analysed in-house to detect genes showing interesting baseline and differential expression patterns.
  • The Human Protein Atlas contains information on the expression and localization of the proteins encoded by a large majority of human protein-coding genes, based on both RNA and protein data. The atlas consists of three subparts: cell, normal tissue, and cancer, with each subpart containing images and data based on antibody-based proteomics and transcriptomics.

Genome Databases

These databases collect genome sequences, annotate and analyze them, and provide public access. Some add curation of experimental literature to improve computed annotations. A database may hold the genomes of many species or the genome of a single model organism.

Phenotype Databases

  • PhenCode linking human mutations with phenotype
  • PhenomicDB multi-organism database linking genotype to phenotype
  • PHI-base Pathogen-host interaction database. It links gene information to phenotypic information from microbial pathogens on their hosts. Information is manually curated from peer-reviewed literature.
  • RGD Rat Genome Database: Genomic and phenotype data for Rattus norvegicus
  • Planform: planarian formalized-experiments database, linking surgical, genetic, and pharmacological perturbations to morphological phenotypic outcomes from published planarian regeneration experiments.
  • Limbform: limb formalized-experiments database, linking surgical, genetic, and pharmacological perturbations to morphological phenotypic outcomes from published multi-organism limb regeneration experiments.
  • Ontology of microbial phenotypes

RNA Databases

Amino Acid / Protein Databases

Protein Sequence Databases

Protein Structure Databases

Primary databases

Secondary databases

For more protein structure databases, see also Protein structure database

Protein Model Databases

  • SWISS-MODEL[9] Server and Repository for Protein Structure Models
  • ModBase[10] Database of Comparative Protein Structure Models (Sali Lab, UCSF)
  • Protein Model Portal[11] (PMP) Meta database that combines several databases of protein structure models (Biozentrum, Basel, Switzerland)
  • Similarity Matrix of Proteins (SIMAP) is a database of protein similarities computed using FASTA.

Protein-Protein and Other Molecular Interactions

Proteomics Databases

  • Proteomics Identifications Database (PRIDE) A public repository for proteomics data, containing protein and peptide identifications and their associated supporting evidence as well as details of post-translational modifications. (European Bioinformatics Institute)
  • ProteomeScout – A public repository of processed proteomics datasets concerning post-translational modifications, including quantification across conditions (if applicable). Also provides graphics exports of protein annotations.
  • MitoMiner – A mitochondrial proteomics database integrating large-scale experimental datasets from mass spectrometry and GFP studies for 12 species. (MRC Mitochondrial Biology Unit)
  • GelMap – A public database of proteins identified on 2D gels (University of Hanover Proteomics Department)
  • OWL – A public non-redundant database for protein search, derived from SWISS-PROT, PIR, GenBank (translation), and NRL-3D
  • ProteomeXchange provides coordinated submission of mass spectrometry proteomics data to the main existing proteomics repositories. It includes repositories such as PRIDE, Tranche, and PeptideAtlas.

Additional Databases

Carbohydrate Structure Databases

  • EuroCarbDB[13]: a repository for both carbohydrate sequences/structures and experimental data.

Signal Transduction Pathway Databases

Metabolic Pathway and Protein Function Databases

Metabolomic Databases

Exosomal Databases

Mathematical Model Databases

PCR and Quantitative PCR Primer Databases

Taxonomic Databases

Main article: List of biodiversity databases

Radiologic Databases

Specialized Databases (Alphabetically Ordered)

  • Antibody Central Antibody information database and search resource.
  • assigns unique identifiers used to track antibody reagents in published literature.
  • BETYdb is a database of plant traits, yields, and ecosystem services.
  • Bgee: a database for retrieving and comparing gene expression patterns between species.
  • BIOMOVIE (ETH Zurich) movies related to biology and biotechnology
  • BioNumbers a database of useful biological numbers
  • Barcode of Life Data Systems, a database of DNA barcodes
  • Cellosaurus, a knowledge resource on cell lines
  • CGAP Cancer Genes (National Cancer Institute)
  • Clone Registry Clone Collections (National Center for Biotechnology Information)
  • Colorectal Cancer Atlas catalogs multiple genomic and proteomic data types from 13,711 tissue samples to identify sequence variants in more than 165 colorectal cancer cell lines.
  • Connectivity map Transcriptional expression data and correlation tools for drugs
  • CTD The Comparative Toxicogenomics Database describes chemical-gene-disease interactions
  • DBGET H. sapiens (Univ. of Kyoto)
  • DisGeNET: a database that integrates information on gene-disease associations
  • DiProDB A database to collect and analyse thermodynamic, structural and other dinucleotide properties.
  • Drug2Gene Provides integrated information for identified and reported relations between genes/proteins and drugs/compounds
  • Dryad a repository of data underlying scientific publications in the basic and applied biosciences.
  • Edinburgh Mouse Atlas
  • EPD Eukaryotic Promoter Database
  • Eukaryotic Linear Motif Database (ELM) Database of short linear motifs.
  • EpimiRBase A comprehensive database of microRNA-epilepsy associations.
  • FunSecKB The fungal secretome knowledgebase.
  • FunSecKB2 The fungal secretome and subcellular proteome knowledgebase (version 2)
  • GreenPhylDB (A phylogenomic database for plant comparative genomics)
  • GDB Hum. Genome Db (Human Genome Organisation)
  • HGMD disease-causing mutations (HGMD Human Gene Mutation Database)
  • HUGO (Official Human Genome Database: HUGO Gene Nomenclature Committee)
  • HvrBase++ Human and primate mitochondrial DNA
  • IEDB Immune Epitope Database
  • IMGT The international ImMunoGeneTics information system
  • INTERFEROME The Database of Interferon Regulated Genes
  • List of SNP databases
  • MetazSecKB The metazoa [human/animal] secretome and subcellular proteome knowledgebase
  • MethBase Database of DNA methylation data visualized on the UCSC Genome Browser.
  • Minimotif Miner: database of short contiguous functional peptide motifs
  • NCBI-UniGene (National Center for Biotechnology Information)
  • Oncogenomic databases A compilation of databases that serve for cancer research.
  • OMIM Inherited Diseases (Online Mendelian Inheritance in Man)
  • OrthoMaM (A database of Orthologous Mammalian Markers)
  • OrthoMCL Ortholog Groups of Protein Sequences from Multiple Genomes including Archaea, Bacteria and Eukaryotes.
  • p53 The p53 Knowledgebase
  • PASD The plant alternative splicing database
  • PlantSecKB The plant secretome and subcellular proteome knowledgebase
  • Plasma Proteome Database Human plasma proteins along with their isoforms
  • SABIO-RK: a curated database that contains information about biochemical reactions, their kinetic rate equations with parameters, and experimental conditions.
  • SciClyc An open-access database for sharing antibodies, cell cultures, and documents for biomedical research.
  • Selectome: a database of positive selection based on a rigorous branch-site specific likelihood test. Positive selection is detected using CODEML on all branches of animal gene trees.
  • SHMPD The Singapore Human Mutation and Polymorphism Database
  • SNPSTR database A database of SNPSTRs – compound genetic markers consisting of a microsatellite (STR) and one tightly linked SNP – in human, mouse, rat, dog and chicken.
  • The Cancer Genome Atlas (TCGA) provides data from hundreds of cancer samples obtained using high-throughput techniques such as gene expression profiling, copy number variation profiling, SNP genotyping, genome-wide DNA methylation profiling, microRNA profiling, and exon sequencing of at least 1,200 genes.
  • TDR Targets A chemogenomics database focused on drug discovery in tropical diseases.
  • TRANSFAC A database about eukaryotic transcription factors, their genomic binding sites and DNA-binding profiles.
  • TreeBASE An open-access database of phylogenetic trees and the data behind them
  • TreeFam (Tree families database): a database of phylogenetic trees of animal genes
  • XTractor Discovering Newer Scientific Relations Across PubMed Abstracts. A tool to obtain manually annotated relationships for Proteins, Diseases, Drugs and Biological Processes as they get published in PubMed.

Wiki-Style Databases