UniProt

Forthcoming changes
Release 13.3 of 29-Apr-2008

Also read about recent changes, and recent and forthcoming changes for the XML version of the UniProt Knowledgebase.

Table of contents

Change of the protein description (DE line)
Changes in the FASTA header line

Change of the protein description (DE line)

Not before: 01-Jul-2008

The UniProtKB description (DE) lines list protein names in a computer-parsable format, but currently with only minimal structure. In UniProtKB/Swiss-Prot, the description starts with the recommended name of the protein, and alternative names follow in parentheses. In UniProtKB/TrEMBL, the description is derived directly from the underlying nucleotide entry; unless it has been improved by automatic annotation procedures, its accuracy depends on the information provided by the submitter of that entry.

Consistent nomenclature is indispensable for communication, literature searching and entry retrieval. The protein names provided in the description lines of UniProtKB/Swiss-Prot are widely used by life scientists and are often propagated during the annotation of new genomic sequences. For these reasons, we intend to structure the UniProtKB DE lines more explicitly by introducing three categories of protein names, each with several subcategories:

Category field  Subcategory field  Cardinality                Description
RecName:                           1 in UniProtKB/Swiss-Prot  The name recommended by the UniProt consortium.
                                   0-1 in UniProtKB/TrEMBL
                Full=              1                          The full name.
                Short=             0-n                        An abbreviation of the full name or an acronym.
                EC=                0-n                        An Enzyme Commission number.
AltName:                           0-n                        A synonym of the recommended name.
                Full=              0-1                        The full name.
                Short=             0-n                        An abbreviation of the full name or an acronym.
                EC=                0-n                        An Enzyme Commission number.
                Allergen=          0-1                        See allergen.txt.
                Biotech=           0-1                        A name used in a biotechnological context.
                CD_antigen=        0-n                        See cdlist.txt.
                INN=               0-1                        The international nonproprietary name: A generic name for a pharmaceutical substance or active pharmaceutical ingredient that is globally recognized and is a public property.
SubName:                           0 in UniProtKB/Swiss-Prot  A name provided by the submitter of the underlying nucleotide sequence.
                                   0-n in UniProtKB/TrEMBL
                Full=              1                          The full name.
                EC=                0-n                        An Enzyme Commission number.

Each name is shown on a separate line; lines may therefore exceed 75 characters.

A block of DE lines may further contain multiple Includes: and/or Contains: sections and a separate field Flags: to indicate whether the protein sequence is a precursor or a fragment:

Field      Cardinality  Value
Includes:  0-n          A block of protein names as described in the table above.
Contains:  0-n          A block of protein names as described in the table above.
Flags:     0-1          Precursor and/or Fragment or Fragments

Examples:

P60568:

Current format:

DE   Interleukin-2 precursor (IL-2) (T-cell growth factor) (TCGF)
DE   (Aldesleukin).

New format:

DE   RecName: Full=Interleukin-2;
DE            Short=IL-2;
DE   AltName: Full=T-cell growth factor;
DE            Short=TCGF; 
DE   AltName: INN=Aldesleukin;
DE   Flags: Precursor;

Q10743:

Current format:

DE   ADAM 10 precursor (EC 3.4.24.81) (A disintegrin and metalloproteinase
DE   domain 10) (Mammalian disintegrin-metalloprotease) (Kuzbanian protein
DE   homolog) (CD156c antigen) (Fragment).

New format:

DE   RecName: Full=A disintegrin and metalloproteinase domain 10;
DE            Short=ADAM 10;
DE            EC=3.4.24.81;
DE   AltName: Full=Mammalian disintegrin-metalloprotease;
DE   AltName: Full=Kuzbanian protein homolog;
DE   AltName: CD_antigen=CD156c;
DE   Flags: Precursor; Fragment;

Q07908:

Current format:

DE   Arginine biosynthesis bifunctional protein argJ [Includes: Glutamate
DE   N-acetyltransferase (EC 2.3.1.35) (Ornithine acetyltransferase)
DE   (Ornithine transacetylase) (OATase); Amino-acid acetyltransferase
DE   (EC 2.3.1.1) (N-acetylglutamate synthase) (AGS)] [Contains: Arginine
DE   biosynthesis bifunctional protein argJ alpha chain; Arginine
DE   biosynthesis bifunctional protein argJ beta chain].

New format:

DE   RecName: Full=Arginine biosynthesis bifunctional protein argJ;
DE   Includes:
DE     RecName: Full=Glutamate N-acetyltransferase;
DE              EC=2.3.1.35;
DE     AltName: Full=Ornithine acetyltransferase;
DE              Short=OATase;
DE     AltName: Full=Ornithine transacetylase;
DE   Includes:
DE     RecName: Full=Amino-acid acetyltransferase;
DE              EC=2.3.1.1;
DE     AltName: Full=N-acetylglutamate synthase;
DE              Short=AGS;
DE   Contains:
DE     RecName: Full=Arginine biosynthesis bifunctional protein argJ alpha chain;
DE   Contains:
DE     RecName: Full=Arginine biosynthesis bifunctional protein argJ beta chain;
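
Because the new layout puts one field on each line, it can be read with a small state machine. The Python sketch below is an illustration only and is not part of any UniProt tooling: the function name `parse_de_block` and the layout of the returned dictionary are assumptions made for this example. It also assumes that every DE line holds exactly one `Key=Value;` field, optionally preceded by a category field, as in the examples above.

```python
import re

# Optional category ("RecName:", "AltName:", "SubName:"), then one "Key=Value;" field.
FIELD_RE = re.compile(r"^(?:(RecName|AltName|SubName):\s*)?(\w+)=(.*?);\s*$")


def parse_de_block(lines):
    """Parse new-format DE lines into a dict (the layout is an assumption of this sketch)."""
    result = {"names": [], "includes": [], "contains": [], "flags": []}
    section = result   # block that receives the next RecName/AltName/SubName
    current = None     # name that receives the next subcategory field
    for raw in lines:
        text = raw[2:].strip() if raw.startswith("DE") else ""
        if not text:
            continue
        if text in ("Includes:", "Contains:"):
            # Start a new nested block of protein names.
            section = {"names": []}
            result[text[:-1].lower()].append(section)
            current = None
        elif text.startswith("Flags:"):
            result["flags"] = [f.strip() for f in text[6:].split(";") if f.strip()]
        else:
            match = FIELD_RE.match(text)
            if match is None:
                raise ValueError(f"unrecognised DE line: {raw!r}")
            category, key, value = match.groups()
            if category is not None:
                current = {"category": category, "fields": []}
                section["names"].append(current)
            if current is None:
                raise ValueError(f"field without a category: {raw!r}")
            current["fields"].append((key, value))
    return result


example = [
    "DE   RecName: Full=Interleukin-2;",
    "DE            Short=IL-2;",
    "DE   AltName: Full=T-cell growth factor;",
    "DE            Short=TCGF;",
    "DE   AltName: INN=Aldesleukin;",
    "DE   Flags: Precursor;",
]
parsed = parse_de_block(example)
print(parsed["names"][0])  # {'category': 'RecName', 'fields': [('Full', 'Interleukin-2'), ('Short', 'IL-2')]}
print(parsed["flags"])     # ['Precursor']
```

Keeping the fields as ordered `(key, value)` pairs rather than a dictionary preserves repeated subcategories such as several `Short=` or `EC=` entries under one name.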

Changes in the FASTA header line

Not before: 01-Jul-2008

The current UniProtKB FASTA headers are unfortunately incompatible with the -o option of NCBI's formatdb program. We have been working with the NCBI to remedy this, and changes are required on both sides. While future versions of formatdb will accept a database code for UniProtKB/TrEMBL, we will also have to modify our UniProtKB FASTA headers. For consistency, we will also change the FASTA headers of the other UniProt databases.

UniProtKB

>db|UniqueIdentifier|EntryName ProteinName OS=OrganismName[ GN=GeneName] PE=ProteinExistence SV=SequenceVersion

Examples:

>sp|Q8I6R7|ACN2_ACAGO Acanthoscurrin-2 (Fragment) OS=Acanthoscurria gomesiana GN=acantho2 PE=1 SV=1
>sp|P27748|ACOX_RALEH Acetoin catabolism protein X OS=Ralstonia eutropha (strain ATCC 17699 / H16 / DSM 428 / Stanier 337) GN=acoX PE=4 SV=2
>sp|P04224|HA22_MOUSE H-2 class II histocompatibility antigen, E-K alpha chain OS=Mus musculus PE=1 SV=1

>tr|A3SA23|A3SA23_9RHOB TonB dependent, hydroxamate-type ferrisiderophore, outer membrane receptor OS=Sulfitobacter sp. EE-36 GN=EE36_08023 PE=3 SV=1
>tr|Q8N2H2|Q8N2H2_HUMAN CDNA FLJ90785 fis, clone THYRO1001457, moderately similar to H.sapiens protein kinase C mu OS=Homo sapiens PE=2 SV=1

Alternative isoforms (this only applies to UniProtKB/Swiss-Prot):

>sp|IsoID|EntryName Isoform IsoformName of ProteinName OS=OrganismName[ GN=GeneName]

ProteinExistence and SequenceVersion do not apply to alternative isoforms (ProteinExistence depends on the number of cDNA sequences, which is not known for individual isoforms).

Example:

>sp|Q4R572-2|1433B_MACFA Isoform Short of 14-3-3 protein beta/alpha OS=Macaca fascicularis GN=YWHAB
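
A regular expression is one way to split the UniProtKB headers shown above into fields. The following Python sketch is an illustration, not an official parser. The function name `parse_uniprotkb_header` and the keys of the returned dictionary are assumptions, the `sp` and `tr` database codes are taken from the examples, and gene names are assumed to contain no whitespace. The `PE=`/`SV=` pair is treated as optional so that isoform headers are accepted as well.

```python
import re

HEADER_RE = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<identifier>[^|\s]+)\|(?P<entry_name>\S+)\s+"
    r"(?P<protein_name>.+?)\s+OS=(?P<organism>.+?)"
    r"(?:\s+GN=(?P<gene>\S+))?"                                   # optional gene name
    r"(?:\s+PE=(?P<existence>\d+)\s+SV=(?P<version>\d+))?\s*$"    # absent for isoforms
)


def parse_uniprotkb_header(header):
    """Split a UniProtKB FASTA header into a dict; missing optional fields are None."""
    match = HEADER_RE.match(header)
    if match is None:
        raise ValueError(f"not a UniProtKB FASTA header: {header!r}")
    fields = match.groupdict()
    for key in ("existence", "version"):
        if fields[key] is not None:
            fields[key] = int(fields[key])
    return fields


isoform = parse_uniprotkb_header(
    ">sp|Q4R572-2|1433B_MACFA Isoform Short of 14-3-3 protein beta/alpha "
    "OS=Macaca fascicularis GN=YWHAB"
)
print(isoform["identifier"], isoform["gene"], isoform["existence"])  # Q4R572-2 YWHAB None
```

The organism name is matched lazily because it may contain spaces and parentheses, as in the strain designation of the P27748 example; the match stops at the first ` GN=` or ` PE=` token, or at the end of the line.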

UniRef

>UniqueIdentifier ClusterName n=Members Tax=Taxon RepID=RepresentativeMember

Example:

>UniRef100_A5DI11 Elongation factor 2 n=1 Tax=Pichia guilliermondii RepID=EF2_PICGU
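
The UniRef header can be split in the same way. This is a minimal Python sketch under the assumption that the `n=`, `Tax=` and `RepID=` tokens always appear in the order given in the template; the `UniRefHeader` type and the function `parse_uniref_header` are hypothetical names introduced for this example.

```python
import re
from typing import NamedTuple


class UniRefHeader(NamedTuple):
    identifier: str
    cluster_name: str
    members: int
    taxon: str
    representative: str


# Cluster name and taxon may contain spaces, so both are matched lazily.
UNIREF_RE = re.compile(r"^>(\S+)\s+(.+?)\s+n=(\d+)\s+Tax=(.+?)\s+RepID=(\S+)\s*$")


def parse_uniref_header(header):
    """Split a UniRef FASTA header into its five fields."""
    match = UNIREF_RE.match(header)
    if match is None:
        raise ValueError(f"not a UniRef FASTA header: {header!r}")
    identifier, cluster, members, taxon, representative = match.groups()
    return UniRefHeader(identifier, cluster, int(members), taxon, representative)


record = parse_uniref_header(
    ">UniRef100_A5DI11 Elongation factor 2 n=1 Tax=Pichia guilliermondii RepID=EF2_PICGU"
)
print(record.cluster_name, record.members)  # Elongation factor 2 1
```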

UniParc

>UniqueIdentifier status=Status

Example:

>UPI0000000005 status=active

UniMES

>UniqueIdentifier ProteinName OS=OrganismName[ Pep=SourcePeptideIdentifier] SV=SequenceVersion

Example:

>MES00000000005 Putative uncharacterized protein GOS_3018412 (Fragment) OS=marine metagenome Pep=JCVI_PEP_1096688850003 SV=1
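
The square brackets in the template mark an optional segment. One way to read this is that the ` Pep=` token, together with its leading space, is written only when a source peptide identifier is available. The Python sketch below builds a header on that reading and follows the spacing of the example above; the function `format_unimes_header` and its parameter names are hypothetical.

```python
def format_unimes_header(identifier, protein_name, organism,
                         sequence_version, source_peptide=None):
    """Build a UniMES-style FASTA header; Pep= is emitted only when provided."""
    parts = [f">{identifier} {protein_name} OS={organism}"]
    if source_peptide is not None:
        parts.append(f"Pep={source_peptide}")
    parts.append(f"SV={sequence_version}")
    return " ".join(parts)


header = format_unimes_header(
    "MES00000000005",
    "Putative uncharacterized protein GOS_3018412 (Fragment)",
    "marine metagenome",
    1,
    source_peptide="JCVI_PEP_1096688850003",
)
print(header)
```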

Archived UniProtKB sequence versions

>db|UniqueIdentifier archived from Release ReleaseNumber ReleaseDate SV=SequenceVersion

Examples:

"pre-UniProt":
>sp|P05067 archived from Release 18.0 01-MAY-1991 SV=3
>tr|Q55167 archived from Release 17.0 01-JUN-2001 SV=1
"post-UniProt":
>sp|P05067 archived from Release 9.2/51.2 28-NOV-2006 SV=3
>tr|A0RTJ8 archived from Release 11.0/36.0 29-MAY-2007 SV=1
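
Headers of archived sequence versions can be parsed along the same lines. The Python sketch below is illustrative: `parse_archived_header` and the keys of its result are assumptions, and the month table is spelled out so that parsing does not depend on the system locale. For the "post-UniProt" form, the two release numbers separated by a slash are kept as a tuple of strings without further interpretation.

```python
import re
from datetime import date

# Upper-case English month abbreviations as used in the release dates.
MONTHS = {name: number for number, name in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}

ARCHIVED_RE = re.compile(
    r"^>(?P<db>sp|tr)\|(?P<identifier>\S+) archived from Release "
    r"(?P<release>\S+) (?P<day>\d{2})-(?P<month>[A-Z]{3})-(?P<year>\d{4}) "
    r"SV=(?P<version>\d+)\s*$"
)


def parse_archived_header(header):
    """Split the header of an archived UniProtKB sequence version into a dict."""
    m = ARCHIVED_RE.match(header)
    if m is None:
        raise ValueError(f"not an archived-version header: {header!r}")
    return {
        "db": m["db"],
        "identifier": m["identifier"],
        "releases": tuple(m["release"].split("/")),  # one or two release numbers
        "release_date": date(int(m["year"]), MONTHS[m["month"]], int(m["day"])),
        "version": int(m["version"]),
    }


post = parse_archived_header(">sp|P05067 archived from Release 9.2/51.2 28-NOV-2006 SV=3")
print(post["releases"], post["release_date"].isoformat())  # ('9.2', '51.2') 2006-11-28
```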