Statistics of the database:
Full Dataset
| Subcellular location | Bacteria | Eukaryote | Fungi | Plant | Animal | Virus | Archaea | All |
| Total | 19664 | 38275 | 5608 | 5468 | 25192 | 3472 | 2640 | 64051 |
| Nucleus or Chromosome | 2613 | 5444 | 971 | 412 | 3995 | 170 | 168 | 8395 |
| Cytoplasm | 3254 | 2463 | 423 | 113 | 920 | 0 | 417 | 6134 |
| Membrane | 6604 | 12877 | 1599 | 1329 | 9479 | 972 | 654 | 21107 |
| Extracellular | 122 | 3085 | 42 | 80 | 3173 | 15 | 0 | 3222 |
| Mitochondrion | 0 | 1726 | 393 | 211 | 1048 | 0 | 0 | 1726 |
| Chloroplast | 0 | 1789 | 0 | 1293 | 0 | 0 | 0 | 1789 |
| Ribosome | 2625 | 1000 | 271 | 257 | 365 | 0 | 788 | 4413 |
| Others | 4446 | 9891 | 1909 | 1773 | 6212 | 2315 | 613 | 17265 |
Non-Redundant Dataset (threshold=60%)
| Subcellular location | Bacteria | Eukaryote | Fungi | Plant | Animal | Virus | Archaea | All |
| Total | 9801 | 18143 | 4201 | 1367 | 12575 | 1182 | 1231 | 30357 |
| Nucleus or Chromosome | 1659 | 3049 | 802 | 181 | 2066 | 73 | 83 | 4864 |
| Cytoplasm | 1446 | 1203 | 327 | 25 | 629 | 0 | 186 | 2835 |
| Membrane | 4230 | 6223 | 1247 | 304 | 4517 | 363 | 412 | 11228 |
| Extracellular | 63 | 1178 | 14 | 20 | 1096 | 10 | 0 | 1251 |
| Mitochondrion | 0 | 963 | 291 | 91 | 581 | 0 | 0 | 963 |
| Chloroplast | 0 | 331 | 0 | 331 | 0 | 0 | 0 | 331 |
| Ribosome | 642 | 376 | 116 | 78 | 182 | 0 | 287 | 1305 |
| Others | 1761 | 4820 | 1404 | 337 | 3504 | 736 | 263 | 7580 |
Data file format:
All downloadable data files are in plain text format. The format is SWISS-PROT-like: each line of an entry is a record composed of a key name and a key value. Entries are separated by a separator record "//". The meanings of the keys are:
- ID: the identity of an entry. This is a unique integer used internally in the relational database and may vary between release versions.
- DE: the scientific name and description of the protein and gene.
- OS: the organism source of the protein. This is the text description of organism taxonomy.
- LC: text name of the subcellular location.
- CX: cross references. The record is in "DB:ID" format.
- SQ: the amino acid sequence of the protein, which may be split across multiple records.
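
The record layout described above can be read with a few lines of Python. This is only a minimal sketch, not an official parser: it assumes each record starts with the key name followed by whitespace and then the value (the exact column layout is not specified here), that CX may appear more than once per entry, and that repeated SQ records are joined in order. The function names and the sample entries, including the database name EXAMPLEDB and its identifiers, are made up for illustration.

```python
def parse_entries(lines):
    """Yield one dict per entry from an iterable of text lines.

    Assumed layout (not confirmed by the format description):
    key name, whitespace, key value; entries end with a "//" record.
    """
    records = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line == "//":
            if records:
                yield _build_entry(records)
            records = {}
            continue
        parts = line.split(None, 1)  # split key from value on first whitespace run
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        records.setdefault(key, []).append(value)
    if records:  # tolerate a final entry without a trailing "//"
        yield _build_entry(records)


def _build_entry(records):
    """Collapse the raw records of one entry into a single dict."""
    cross_refs = []
    for ref in records.get("CX", []):
        db, _, ident = ref.partition(":")  # "DB:ID" -> ("DB", "ID")
        cross_refs.append((db, ident))
    return {
        # ID is kept as text because it may vary between releases.
        "ID": " ".join(records.get("ID", [])),
        "DE": " ".join(records.get("DE", [])),
        "OS": " ".join(records.get("OS", [])),
        "LC": " ".join(records.get("LC", [])),
        "CX": cross_refs,
        # SQ may be split into multiple records: join them, drop spaces.
        "SQ": "".join(records.get("SQ", [])).replace(" ", ""),
    }


# Hypothetical sample data, invented for this example only.
SAMPLE = """\
ID   1
DE   Hypothetical example protein
OS   Escherichia coli
LC   Cytoplasm
CX   EXAMPLEDB:X0001
SQ   MKTAYIAKQR
SQ   QISFVKSHFS
//
ID   2
DE   Another hypothetical protein
OS   Saccharomyces cerevisiae
LC   Membrane
CX   EXAMPLEDB:X0002
CX   OTHERDB:Y0009
SQ   MSLLTEVETP
//
"""

entries = list(parse_entries(SAMPLE.splitlines()))
for entry in entries:
    print(entry["ID"], entry["LC"], len(entry["SQ"]))
```

For a real data file, the same generator can be fed an open file object instead of `SAMPLE.splitlines()`, which avoids loading the whole file into memory.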