References
Last modified August 8, 2008
This section contains the literature citations and indicates the sources from which the data used to annotate an entry have been extracted. It is composed of numbered reference blocks.
Up to 7 distinct subsections can compose a reference block: the ‘Number’, the ‘Title’, the ‘Author’s names’, the ‘Reference information’, the ‘Cross-references’, the ‘Citation content’ and the ‘Sequence origin’.
Example: Q9XTY6
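As a rough illustration of this layout, a reference block can be pictured as a record with one field per subsection. The sketch below is hypothetical: the class name `ReferenceBlock`, its field names and the choice to treat everything except the number as optional are assumptions made for the example, not part of any UniProt tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ReferenceBlock:
    """Hypothetical container for one numbered reference block."""

    number: int                                   # 1. Number
    title: Optional[str] = None                   # 2. Title
    authors: List[str] = field(default_factory=list)            # 3. Author's names
    reference_information: Optional[str] = None   # 4. Reference information
    cross_references: List[str] = field(default_factory=list)   # 5. Cross-references
    citation_content: Optional[str] = None        # 6. Citation content
    sequence_origin: Optional[str] = None         # 7. Sequence origin

    def present_subsections(self) -> List[str]:
        """Return the names of the subsections that are filled in."""
        names = ["Number"]  # assumed to be always present
        optional = [
            ("Title", self.title),
            ("Author's names", self.authors),
            ("Reference information", self.reference_information),
            ("Cross-references", self.cross_references),
            ("Citation content", self.citation_content),
            ("Sequence origin", self.sequence_origin),
        ]
        names.extend(name for name, value in optional if value)
        return names
```

A block built with only a number and a title would report two of the seven possible subsections.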
1. Number
The reference number gives a sequential number to each reference citation in an entry. This number is used to link to the appropriate reference in the ‘General annotation (Comments)’ and ‘Sequence annotation (Features)’ sections.
2. Title
The title of the paper (or other work) is cited as exactly as possible given the limitations of the computer character set.
Example: Q21507
- Major title words are not capitalized;
- The text of a title ends with either a period ’.’, a question mark ’?’ or an exclamation mark ’!’;
- Double quotation marks ’ ” ’ in the text of the title are replaced by single quotation marks;
- Titles of articles published in a language other than English have been translated into English;
- Greek letters are written in full (alpha, beta, etc.).
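Three of the title rules above are mechanical enough to sketch in code. The function below is only an illustration: `normalize_title` and the `GREEK_NAMES` table are hypothetical names, the Greek mapping is deliberately partial, and translation or capitalization are not handled.

```python
import re

# Partial, illustrative mapping; a complete table would cover the whole alphabet.
GREEK_NAMES = {"α": "alpha", "β": "beta", "γ": "gamma", "δ": "delta", "κ": "kappa"}


def normalize_title(raw: str) -> str:
    """Apply the quotation-mark, Greek-letter and final-punctuation rules."""
    title = raw.strip()
    # Double quotation marks (straight or curly) become single quotation marks.
    title = re.sub(r'["“”]', "'", title)
    # Greek letters are written in full.
    for letter, name in GREEK_NAMES.items():
        title = title.replace(letter, name)
    # A title ends with a period, a question mark or an exclamation mark.
    if not title.endswith((".", "?", "!")):
        title += "."
    return title
```

For instance, `normalize_title('The "β-catenin" pathway')` yields `The 'beta-catenin' pathway.`, while a title that already ends in a question mark is left unchanged.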
3. Author’s names
We list the authors’ names in the order given in the original paper (or work).
Example: P11071
An author’s initials can be followed by an abbreviation such as ‘Jr’ (Junior), ‘Sr’ (Senior), ‘II’, ‘III’ or ‘IV’ (2nd, 3rd and 4th).
Example: P00350
We try to be as complete as possible with authors’ names: we keep all initials and hyphens between initials. A German umlaut is replaced by the base vowel followed by an ‘e’ (for example, ‘ü’ becomes ‘ue’). Some authors do not have any initials: in such cases, we add an ‘X.’ after the name.
We also try to be as consistent as possible with authors’ names: when an author’s name is misspelled, we correct it and use a single, consistent spelling.
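The umlaut and missing-initial conventions can be sketched as follows. This is a minimal illustration, assuming names are written as the surname followed by the initials; the function name `normalize_author` and the sample names are hypothetical.

```python
# Each umlaut becomes the base vowel followed by an 'e'.
UMLAUTS = str.maketrans(
    {"ä": "ae", "ö": "oe", "ü": "ue", "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"}
)


def normalize_author(surname: str, initials: str = "") -> str:
    """Return 'Surname Initials' (assumed layout) with the conventions applied."""
    surname = surname.strip().translate(UMLAUTS)
    # Initials are kept as given, including hyphens between them.
    initials = initials.strip()
    # Authors without any initial receive an 'X.' after the name.
    if not initials:
        initials = "X."
    return f"{surname} {initials}"
```

With these assumptions, `normalize_author("Müller", "H.-J.")` gives `Mueller H.-J.` and `normalize_author("Sukarno")` gives `Sukarno X.`.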
In some cases, the authors’ names consist of the name of a consortium. This is mainly used for direct submissions to databases, but can also be used in full references when the consortium is cited as an author. Consortium names and authors’ names may coexist.
Examples: Q7TQA9, O60260
4. Reference information
The reference information contains the conventional citation information for the reference.
a) Journal citations
The reference information for a journal citation includes the journal abbreviation, the volume number, the page range and the year.
Journal names are abbreviated according to the conventions used by the National Library of Medicine (NLM) and are based on the existing ISO and ANSI standards. A list of the abbreviations currently in use is given in the document ‘Controlled vocabulary of journals’.
Example: P03024
When a reference is made to a paper which is ‘in press’ at the time the database is released, the page range, and possibly the volume number, are indicated as ‘0’ (zero).
Example: P84575
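The four parts of a journal citation, and the ‘0’ convention for papers in press, can be illustrated with a small parser. The textual layout `journal volume:pages(year).` used here is an assumption made for the example, as are the names `JournalCitation` and `parse_journal_citation`; the sample citations in the usage note are invented.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed layout: "<journal abbreviation> <volume>:<pages>(<year>)."
_JOURNAL_RE = re.compile(
    r"^(?P<journal>.+?)\s+(?P<volume>[^\s:]+):(?P<pages>[^()]+)\((?P<year>\d{4})\)\.?$"
)


@dataclass
class JournalCitation:
    journal: str
    volume: str
    pages: str
    year: int

    @property
    def in_press(self) -> bool:
        """True when the page range consists only of zeros ('0' or '0-0')."""
        return set(self.pages.split("-")) == {"0"}


def parse_journal_citation(text: str) -> Optional[JournalCitation]:
    """Split a citation string into its parts, or return None if it does not fit."""
    match = _JOURNAL_RE.match(text.strip())
    if match is None:
        return None
    return JournalCitation(
        journal=match["journal"],
        volume=match["volume"],
        pages=match["pages"].strip(),
        year=int(match["year"]),
    )
```

Under these assumptions, an invented citation such as `J. Mol. Biol. 168:321-331(1983).` parses into its four parts, and `Plant Cell 0:0-0(2002).` is flagged as in press.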
b) Electronic publications
The reference information for an electronic publication includes an ‘(er)’ prefix.
Examples: O64948, Q09517
c) Book citations
The reference information for papers found in books or other types of publication includes the book name, the volume number, the page range, the publisher, the city and the year.
Examples: P00065, P04560, P02675
d) Unpublished observations
The reference information for unpublished observations includes the month and the year.
We use ‘unpublished observations’ to cite communications by scientists to UniProtKB/Swiss-Prot of unpublished information concerning various aspects of a sequence entry.
Example: P08195
e) Thesis
The reference information for Ph.D. theses includes a ‘Thesis’ prefix, the year, the institution name, the city and the country.
Example: P01428
Thesis (1977), University of Geneva, Switzerland.
f) Patent applications
The reference information for patent applications includes the international publication number of the patent and the date.
Example: P29853
g) Submissions
The reference information for submissions includes the date and the database to which the data were submitted.
We report the data submitted to the following databases:
- the EMBL/GenBank/DDBJ databases
- UniProtKB
- the PDB data bank
- the PIR data bank
5. Cross-references
The cross-references subsection is optional and is used to indicate the identifier assigned to a specific reference in a bibliographic database.
When present, it gives the cross-references to:
- the PubMed Unique Identifier (PMID)
- the AGRICOLA identifier
- the abstract as supplied by the publishers
- the article on the publisher’s site, identified by its Digital Object Identifier (DOI)
Examples: P02675, Q10670, Q9LFB2, Q3EDJ0 (AGRICOLA)
6. Citation content (Cited for)
The citation content describes the information that has been used to annotate the entry (sequence, protein-protein interactions, variants or mutants, PTMs, 3D structure, etc.).
Sequence information retrieved from a reference is described in detail by its range and its origin (nucleic acid sequencing or direct protein sequencing).
Example:
NUCLEOTIDE SEQUENCE [GENOMIC DNA], AND PROTEIN SEQUENCE OF 21-35.
The comment ‘NUCLEOTIDE SEQUENCE’ may be tagged with a qualifier indicating the origin of the sequence data. Valid qualifiers are:
- GENOMIC DNA: the individual gene has been sequenced
- GENOMIC RNA: the individual gene has been sequenced
- MRNA: the individual cDNA has been sequenced
- LARGE SCALE GENOMIC DNA: the gene has been sequenced as part of a genome project
- LARGE SCALE MRNA: the cDNA has been sequenced as part of a large-scale cDNA project
Example: Q9QY42
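The qualifier list above lends itself to a simple validity check. The sketch below assumes, as in the example line, that the qualifier appears in square brackets directly after ‘NUCLEOTIDE SEQUENCE’ and that there is only one of them; the function name `nucleotide_qualifier` is hypothetical.

```python
import re
from typing import Optional

# The five qualifiers listed above.
VALID_QUALIFIERS = {
    "GENOMIC DNA",
    "GENOMIC RNA",
    "MRNA",
    "LARGE SCALE GENOMIC DNA",
    "LARGE SCALE MRNA",
}


def nucleotide_qualifier(cited_for: str) -> Optional[str]:
    """Return the qualifier of 'NUCLEOTIDE SEQUENCE', or None if there is none.

    Raises ValueError when a qualifier is present but not in the list.
    """
    match = re.search(r"NUCLEOTIDE SEQUENCE \[([^\]]+)\]", cited_for)
    if match is None:
        return None
    qualifier = match.group(1).strip()
    if qualifier not in VALID_QUALIFIERS:
        raise ValueError(f"unknown qualifier: {qualifier!r}")
    return qualifier
```

Applied to the example line `NUCLEOTIDE SEQUENCE [GENOMIC DNA], AND PROTEIN SEQUENCE OF 21-35.`, the function returns `GENOMIC DNA`.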
When the sequence describes specific isoform(s), this is indicated in brackets, following the sequence information.
Example: Q8TCU6
‘LARGE SCALE ANALYSIS’ is the tag added to references that report large-scale results, to indicate that these results have not been extensively analysed.
Example: Q9JIX8
Protein-protein interactions (using the official gene name when it exists), mutagenesis experiments, natural variants and post-translational modifications (PTMs) are also listed in detail.
Examples: Q96BI3, Q96EP1, P62739
The 3D structure information describes the method used (together with the highest resolution for X-ray crystallography), the range of the domain, and the structure that has been determined.
Examples: P17427, P00831
7. Sequence origin
The sequence origin is optional and indicates the strain(s), tissue(s), plasmid(s) and transposon(s) from which the sequence is derived.
The strains listed in the ‘Strains’ token are sorted alphabetically. All frequently occurring strains in UniProtKB are listed in the document ‘Controlled vocabulary of strains’.
The tissues listed in the ‘Tissue’ token are sorted alphabetically. All tissues indicated in this token in UniProtKB/Swiss-Prot are listed in the document ‘Controlled vocabulary of tissues’. Wherever possible, UniProtKB/TrEMBL also makes use of this controlled tissue list, and efforts are made to automatically match tissues in UniProtKB/TrEMBL entries to tissues on the list. However, due to the nature of the data in UniProtKB/TrEMBL, this is not always possible until the entry is manually annotated.
The ‘Plasmid’ token is only used if an entry describes a sequence identical in more than one plasmid. The document ‘Controlled vocabulary of plasmids’ lists all the plasmids that are used in UniProtKB/Swiss-Prot in the context of the ‘plasmid’ token.
Examples: P18445, Q28125, P30867, P12121, P00810, Q9EVG8.
Many bacterial or fungal strains have names that are composed of an acronym (ATCC, DSM, NRRL…) followed by a number. These strains are maintained in specific culture collections, of which the most frequently cited are listed below:
| Acronym | Culture collection |
|---|---|
| ATCC | American Type Culture Collection; Rockville, USA |
| CBS | Centraalbureau voor Schimmelcultures; Baarn and Delft, Netherlands |
| CECT | Colección Española de Cultivos Tipo; Valencia, Spain |
| CCAP | Culture Collection of Algae and Protozoa; U.K. |
| CCMP | Culture Collection of Marine Phytoplankton |
| DSM | Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH; Germany |
| IAM | Institute of Applied Microbiology; University of Tokyo, Japan |
| IFO | Institute for Fermentation; Osaka, Japan |
| KCC | Culture collection of Actinomycetes, Kaken Chemical Co; Tokyo, Japan |
| NCDO | National Collection of Dairy Organisms; Reading, U.K. |
| NCIB | National Collection of Industrial Bacteria; Aberdeen, U.K. |
| NCPPB | National Collection of Plant Pathogenic Bacteria; U.K. |
| NCTC | National Collection of Type Cultures; London, U.K. |
| NRCC | National Research Council of Canada |
| NRRL | Agricultural Research Service Culture Collection, National Center for Agricultural Utilization Research |
| USDA | U.S. Department of Agriculture; USA |
| UTEX | Culture collection of Algae at the University of Texas at Austin; USA |
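
Because such strain names follow the pattern of an acronym followed by a number, the culture collection can be looked up from the name itself. The sketch below is illustrative only: it carries a subset of the table, and the function name `culture_collection` and the sample strain names are hypothetical.

```python
import re
from typing import Optional

# Subset of the table above, for illustration.
CULTURE_COLLECTIONS = {
    "ATCC": "American Type Culture Collection",
    "CBS": "Centraalbureau voor Schimmelcultures",
    "DSM": "Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH",
    "NCTC": "National Collection of Type Cultures",
    "NRRL": "Agricultural Research Service Culture Collection",
}


def culture_collection(strain: str) -> Optional[str]:
    """Return the collection for names such as 'ATCC 12345', else None."""
    # An acronym in capitals, optional whitespace, then at least one digit.
    match = re.match(r"([A-Z]+)\s*\d", strain.strip())
    if match is None:
        return None
    return CULTURE_COLLECTIONS.get(match.group(1))
```

A name such as `ATCC 12345` resolves to the American Type Culture Collection, whereas a name whose leading letters are not a known acronym returns `None`.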



