
Release 12.5

Published November 13, 2007

Headlines

Acanthamoeba polyphaga mimivirus, a "giant" virus in UniProtKB/Swiss-Prot

Mimivirus (for mimicking microbe) is a new viral genus containing a single identified species, Acanthamoeba polyphaga mimivirus (APMV). It was discovered by Didier Raoult's lab in 1992 within the amoeba Acanthamoeba polyphaga during work on legionellosis. The virion has a non-enveloped, icosahedral capsid with a diameter of 400 nm and protein filaments projecting from its surface. The capsid contains the internal core surrounded by an internal lipid layer. Its linear, double-stranded DNA genome is roughly 1.2 million bp in length, the largest viral genome known so far. Its replication cycle, genome and capsid structure place it among the nucleocytoplasmic large DNA viruses (NCLDVs), which include, among others, the poxviruses and iridoviruses.

This virus is amazing in many ways. It is the largest virus ever isolated, with a genome size and complexity comparable to that of a small bacterium. A thorough bioinformatics analysis carried out by the group of Jean-Michel Claverie uncovered 909 potential protein-coding genes. Some of these proteins belong to families that are shared with all or some NCLDVs, many have eukaryotic counterparts, and quite a number are ORFans (proteins with no sequence similarity to proteins from other genomes). It was a surprise to find an appreciable number of genes coding for proteins involved in metabolism and DNA repair pathways and, most surprisingly, genes encoding a partially functional protein translation apparatus. Mimivirus encodes four aminoacyl-tRNA synthetases (ArgRS, CysRS, MetRS, TyrRS), as well as various translation initiation, elongation and termination factors. It is very intriguing to find, in a virus, genes corresponding to central components of the protein translation machinery, a biochemical process widely thought to be an exclusive signature of cellular organisms.

The discovery of this amazing virus has led to the concept of a "giant" virus and implies that there is an overlap in terms of particle dimension, genome size, and genetic complexity between the viral and cellular organism worlds.

A special effort has been made in the UniProtKB/Swiss-Prot database to provide the complete, fully annotated mimivirus proteome. We have also integrated all proteomics and structural information that has been made available by the groups of Jean-Michel Claverie and Chantal Abergel.

To get all UniProtKB mimivirus entries, click here.

UniProtKB News

Format change in the ptmlist.txt document file

The ptmlist.txt document, which is available by FTP and on the Web site, describes the post-translational modifications (PTMs) annotated in the sequence annotation section (Features; FT lines in the flat file) of UniProtKB/Swiss-Prot entries. It covers the subsections "Cross-link" (CROSSLNK key in the flat file), "Lipidation" (LIPID key in the flat file) and "Modified residue" (MOD_RES key in the flat file). The previous format was suitable for computer applications (e.g. ExPASy's proteomics tools), but it was not very human-readable. The new file format should improve this.

Previous format:

     N,N-dimethylproline  MOD_RES P  BB Nter C2H4  28.031300  28.06  in  e:6446,7586,33682  Methylation  FT=MOD_RES%20dimethylproline&wild=1  AA0066  MOD:00075
    

New format:

     ID   N,N-dimethylproline
     AC   PTM-0179
     FT   MOD_RES
     TG   Proline.
     PA   Amino acid backbone.
     PP   N-terminal.
     CF   C2 H4
     MM   28.031300
     MA   28.06
     LC   Intracellular localisation.
     TR   Eukaryota; taxId:6446 (Sipunculus nudus), taxId:7586 (Echinodermata), taxId:33682 (Euglenozoa).
     KW   Methylation.
     DR   RESID:AA0066.
     DR   MOD:00075.
     //
    

With the following definitions of the line types:

     ---------  ---------------------------     ----------------------
     Line code  Content                         Occurrence in an entry
     ---------  ---------------------------     ----------------------
     ID         Identifier (FT description)     Once; starts a PTM entry.
     AC         Accession (PTM-xxxx)            Once.
     FT         Feature key                     Once.
     TG         Target                          Once; two targets separated
                                                by a dash in case of intrachain
                                                crosslinks.
     PA         Position of the modified        Optional, once.
                amino acid
     PP         Position of the modification    Optional, once.
                in the polypeptide
     CF         Correction formula              Optional, once.
     MM         Monoisotopic mass difference    Optional, once.
     MA         Average mass difference         Optional, once.
     LC         Cellular location               Optional, once; alternatives
                                                can be proposed.
     TR         Taxonomic range                 Optional, once or more.
     KW         Keyword                         Optional, once or more.
     DR         Cross-reference to PTM          Optional, once or more.
                databases
     //         Terminator                      Once; ends an entry.
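
As an illustration of how the new layout can be consumed by a program, the following is a minimal Python sketch that reads entries in the format described above. The function name parse_ptm_entries is hypothetical and not part of any UniProt tool. The sketch assumes one value per line (no continuation lines) and keeps values exactly as written, including trailing periods.

```python
from collections import defaultdict


def parse_ptm_entries(text):
    """Yield one dict per PTM entry, mapping each line code to a list of values.

    Assumption: every data line starts with a line code followed by spaces,
    and every entry ends with a '//' terminator line. Lines after the last
    terminator are ignored.
    """
    entry = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "//":
            # Terminator: emit the entry collected so far and start a new one.
            if entry:
                yield dict(entry)
            entry = defaultdict(list)
            continue
        # The line code is the first token; the rest of the line is the value.
        code, _, value = line.partition(" ")
        entry[code].append(value.strip())


sample = """\
     ID   N,N-dimethylproline
     AC   PTM-0179
     FT   MOD_RES
     TG   Proline.
     PA   Amino acid backbone.
     PP   N-terminal.
     CF   C2 H4
     MM   28.031300
     MA   28.06
     LC   Intracellular localisation.
     TR   Eukaryota; taxId:6446 (Sipunculus nudus), taxId:7586 (Echinodermata), taxId:33682 (Euglenozoa).
     KW   Methylation.
     DR   RESID:AA0066.
     DR   MOD:00075.
     //
"""

entries = list(parse_ptm_entries(sample))
ptm = entries[0]
print(ptm["ID"][0], ptm["AC"][0], float(ptm["MM"][0]))
print(ptm["DR"])
```

Every line code maps to a list, so codes that may occur more than once (TR, KW, DR) and codes that occur once are handled the same way.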
    

Changes concerning cross-references to PDB

We added a field to the cross-reference to the PDB database (DR line in the flat file) to show the resolution of structures that were determined by X-ray crystallography or electron microscopy.

For the chain names we now use the remediated data from wwPDB; as a result, the chain names have changed for some entries.

Previous format:

     DR   PDB; ENTRY_NAME; METHOD; CHAIN.
    

New format:

     DR   PDB; ENTRY_NAME; METHOD; RESOLUTION; CHAIN.
    

Examples:

Q20728:
     DR   PDB; 1LPL; X-ray; 1.77 A; A=135-229.
    
Q5HEB7:
     DR   PDB; 2I8C; X-ray; 2.46 A; A/B=1-356.   
    

A dash indicates that we found no information about the resolution or that the field is not applicable (for NMR structures and theoretical models).

Examples:

P02768:
     DR   PDB; 2ESG; X-ray; -; C=25-609.
    
P12872:
     DR   PDB; 1LBJ; NMR; -; A=26-47.   
    
P0AC41:
     DR   PDB; 2AD0; Model; -; A=1-588.  
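
Programs that read these lines need to handle both a numeric resolution and the dash. The following Python sketch shows one possible way to do so. The function name parse_pdb_xref and the returned dictionary keys are hypothetical. The sketch assumes the chain field has the single "CHAINS=START-END" form seen in the examples above, and it does not attempt to handle other forms.

```python
def parse_pdb_xref(line):
    """Split a new-format PDB DR line into its fields.

    The resolution is returned as a float (the numeric part of e.g. '1.77 A'),
    or None when the field holds a dash.
    """
    body = line.strip()
    if not body.startswith("DR"):
        raise ValueError("not a DR line: %r" % line)
    # Drop the line code and the final period, then split on semicolons.
    fields = [f.strip() for f in body[2:].strip().rstrip(".").split(";")]
    if len(fields) != 5 or fields[0] != "PDB":
        raise ValueError("not a new-format PDB cross-reference: %r" % line)
    _, entry_name, method, resolution, chain = fields

    if resolution == "-":
        value = None
    else:
        value = float(resolution.split()[0])

    # Assumed chain form: 'A/B=1-356' (chain names, then a residue range).
    chains, _, residues = chain.partition("=")
    start, _, end = residues.partition("-")
    return {
        "entry": entry_name,
        "method": method,
        "resolution": value,
        "chains": chains.split("/"),
        "range": (int(start), int(end)),
    }


xray = parse_pdb_xref("DR   PDB; 2I8C; X-ray; 2.46 A; A/B=1-356.")
nmr = parse_pdb_xref("DR   PDB; 1LBJ; NMR; -; A=26-47.")
print(xray)
print(nmr)
```

Returning None for the dash keeps "no resolution" distinct from any numeric value, so callers cannot mistake it for a real measurement.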
    

Cross-references to CleanEx

Cross-references have been added to the CleanEx database of gene expression profiles. CleanEx is a database which provides access to public gene expression data via unique approved gene symbols and which represents heterogeneous expression data produced by different technologies in a way that facilitates joint analysis and cross-dataset comparison.

The CleanEx database is available at http://www.cleanex.isb-sib.ch/.

The format of the explicit link is:

     Data bank identifier    CleanEx
     Primary identifier      GENE_NAME (species code followed by the gene
                             identifier)
     Secondary identifier    None; a dash '-' is stored in that field.

Examples:

O08788:
        DR   CleanEx; MM_DCTN1; -.    
       
P78358:
        DR   CleanEx; HS_CTAG1A; -.
        DR   CleanEx; HS_CTAG1B; -.
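
The primary identifier can be split back into its two parts, as in the short Python sketch below. The function name parse_cleanex_xref is hypothetical. The underscore separator between species code and gene identifier is an assumption inferred from the examples above, not a documented rule.

```python
def parse_cleanex_xref(line):
    """Return (species_code, gene_identifier) from a CleanEx DR line."""
    body = line.strip()
    if not body.startswith("DR"):
        raise ValueError("not a DR line: %r" % line)
    # Drop the line code and the final period, then split on semicolons.
    fields = [f.strip() for f in body[2:].strip().rstrip(".").split(";")]
    if len(fields) != 3 or fields[0] != "CleanEx":
        raise ValueError("not a CleanEx cross-reference: %r" % line)
    # Assumption: the first underscore separates species code and gene identifier.
    species, _, gene = fields[1].partition("_")
    return species, gene


print(parse_cleanex_xref("DR   CleanEx; MM_DCTN1; -."))
print(parse_cleanex_xref("DR   CleanEx; HS_CTAG1A; -."))
```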
        
       

Changes concerning keywords (KW line)

Modified keyword: