Statistics of the database :

Full Dataset

Location                Bacteria  Eukaryote  Fungi  Plant  Animal  Virus  Archaea    All
Total                      19664      38275   5608   5468   25192   3472     2640  64051
Nucleus or Chromosome       2613       5444    971    412    3995    170      168   8395
Cytoplasm                   3254       2463    423    113     920      0      417   6134
Membrane                    6604      12877   1599   1329    9479    972      654  21107
Extracellular                122       3085     42     80    3173     15        0   3222
Mitochondrion                  0       1726    393    211    1048      0        0   1726
Chloroplast                    0       1789      0   1293       0      0        0   1789
Ribosome                    2625       1000    271    257     365      0      788   4413
Others                      4446       9891   1909   1773    6212   2315      613  17265

Non-Redundant Dataset (threshold=60%)

Location                Bacteria  Eukaryote  Fungi  Plant  Animal  Virus  Archaea    All
Total                       9801      18143   4201   1367   12575   1182     1231  30357
Nucleus or Chromosome       1659       3049    802    181    2066     73       83   4864
Cytoplasm                   1446       1203    327     25     629      0      186   2835
Membrane                    4230       6223   1247    304    4517    363      412  11228
Extracellular                 63       1178     14     20    1096     10        0   1251
Mitochondrion                  0        963    291     91     581      0        0    963
Chloroplast                    0        331      0    331       0      0        0    331
Ribosome                     642        376    116     78     182      0      287   1305
Others                      1761       4820   1404    337    3504    736      263   7580

Data file format :

All downloadable data files are in plain text format. The format is SWISS-PROT-like: each line of an entry is a record composed of a key name and a key value. Entries are separated by a separator record "//". The keys have the following meanings:

  • ID: the identifier of an entry. This is a unique integer used internally in the relational database and may change between releases.
  • DE: the scientific name and description of the protein and gene.
  • OS: the source organism of the protein, given as a text description of its taxonomy.
  • LC: the name of the subcellular location, as text.
  • CX: cross-references. Each record is in "DB:ID" format.
  • SQ: the amino acid sequence of the protein, which may be split across multiple records.
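
The record layout described above can be read with a few lines of Python. The following is only a minimal sketch, not the database's own tooling. It assumes that the key name is the first whitespace-delimited token on a line and that the rest of the line is the key value; the exact column layout is not stated here, so adjust the split if the real files differ. The function names parse_entries and split_xref, the sample records, and the database name EXAMPLEDB are all made up for illustration.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple


def parse_entries(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per entry from an iterable of text lines."""
    entry: Dict[str, Any] = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line == "//":
            # Separator record: the current entry is complete.
            if entry:
                yield entry
            entry = {}
            continue
        # Assumption: key name first, then whitespace, then the key value.
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key == "SQ":
            # The sequence may span several SQ records: concatenate them
            # and drop any whitespace inside the sequence.
            entry["SQ"] = entry.get("SQ", "") + "".join(value.split())
        elif key == "CX":
            # An entry may carry several cross-references: keep a list.
            entry.setdefault("CX", []).append(value)
        elif key in entry:
            # Any other repeated key is joined with a single space.
            entry[key] = entry[key] + " " + value
        else:
            entry[key] = value
    if entry:
        yield entry  # tolerate a missing final "//"


def split_xref(ref: str) -> Tuple[str, str]:
    """Split a "DB:ID" cross-reference into (database, identifier)."""
    db, _, ident = ref.partition(":")
    return db, ident


# Made-up sample records, for illustration only.
SAMPLE = """\
ID   1
DE   Hypothetical example protein (made-up record)
OS   Saccharomyces cerevisiae
LC   Nucleus
CX   EXAMPLEDB:X00001
SQ   MKTAYIAKQR
SQ   QISFVKSHFS
//
ID   2
DE   Second made-up record
LC   Membrane
SQ   MSTNPKPQRK
//
"""

entries = list(parse_entries(SAMPLE.splitlines()))
for e in entries:
    print(e["ID"], e["LC"], len(e["SQ"]))
```

For a downloaded file, the same function can be fed an open file handle, for example parse_entries(open(path)), since it only iterates over lines. Because the ID value may change between releases, the CX cross-references are probably the safer key for linking entries to other resources.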


Copyright (C) 2003-2005 Institute of Bioinformatics, Tsinghua University. All Rights Reserved.
Last Modified: 2005-8-17