References

Last modified August 8, 2008

This section contains the literature citations and indicates the sources from which the data used to annotate an entry have been extracted. It is composed of numbered reference blocks.

A reference block can be composed of up to seven distinct subsections: the ‘Number’, the ‘Title’, the ‘Author’s names’, the ‘Reference information’, the ‘Cross-references’, the ‘Citation content’ and the ‘Sequence origin’.
Example: Q9XTY6

1. Number

The reference number gives a sequential number to each reference citation in an entry. This number is used to link to the appropriate reference in the ‘General annotation (Comments)’ and ‘Sequence annotation (Features)’ sections.

2. Title

The title of the paper (or other work) is cited as exactly as possible given the limitations of the computer character set.
Example: Q21507

The format of the title is not always identical to that displayed at the top of the published work:

3. Author’s names

We list the authors’ names in the order given in the original paper (or work).
Example: P11071

An author’s initials can be followed by an abbreviation such as ‘Jr’ (Junior), ‘Sr’ (Senior), ‘II’, ‘III’ or ‘IV’ (2nd, 3rd and 4th).
Example: P00350

We try to be as complete as possible with authors’ names: we keep all initials and hyphens between initials. The German umlaut is replaced by an ‘e’, which follows the modified vowel. Some authors do not have any initial: in such cases, we add an ‘X.’ after the name.
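The umlaut and missing-initial conventions can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `transliterate_umlauts` and `format_author` are hypothetical, the handling of upper-case umlauts is assumed to mirror the lower-case rule, and the ‘surname followed by initials’ layout is assumed rather than taken from this section.

```python
# Hypothetical helpers illustrating the author-name conventions; not UniProt code.
UMLAUTS = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    # Assumption: upper-case umlauts follow the same rule.
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})


def transliterate_umlauts(name: str) -> str:
    """Replace each umlauted vowel by the plain vowel followed by 'e'."""
    return name.translate(UMLAUTS)


def format_author(surname: str, initials: str = "") -> str:
    """Join surname and initials, adding 'X.' when the author has no initial.

    Initials (including hyphens between them) are kept exactly as given.
    """
    surname = transliterate_umlauts(surname)
    initials = initials.strip() or "X."
    return f"{surname} {initials}"
```

For instance, `format_author("Müller", "H.-J.")` keeps the hyphenated initials and transliterates the surname, while a call without initials appends the ‘X.’ placeholder.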

We also try to be as consistent as possible with authors’ names: when the name of an author is misspelled, we correct and homogenize it.

In some cases, the authors’ names consist of the name of a consortium. This is mainly used for direct submissions to databases, but can also be used in full references when the consortium is cited as an author. Consortium names and authors’ names may coexist.
Examples: Q7TQA9, O60260

4. Reference information

The reference information contains the conventional citation information for the reference.

a) Journal citations

The reference information for a journal citation includes the journal abbreviation, the volume number, the page range and the year.

Journal names are abbreviated according to the conventions used by the National Library of Medicine (NLM) and are based on the existing ISO and ANSI standards. A list of the abbreviations currently in use is given in the document ‘Controlled vocabulary of journals’.
Example: P03024

When a reference is made to a paper which is ‘in press’ at the time the database is released, the page range, and possibly the volume number, are indicated as ‘0’ (zero).
Example: P84575
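As a rough sketch of the ‘in press’ convention, a journal citation can be modelled as a small record whose page range is zero until the paper is published. The class name `JournalCitation` and its field names are hypothetical, and the values are kept as strings on the assumption that page and volume identifiers are not always purely numeric.

```python
from dataclasses import dataclass


@dataclass
class JournalCitation:
    """Hypothetical container for the four parts of a journal citation."""

    journal: str     # journal abbreviation (NLM conventions)
    volume: str      # volume number; may be '0' for a paper in press
    first_page: str  # '0' while the paper is in press
    last_page: str   # '0' while the paper is in press
    year: int

    def is_in_press(self) -> bool:
        """Return True when the page range is 0-0, the 'in press' marker."""
        return self.first_page == "0" and self.last_page == "0"
```

The volume is deliberately not tested in `is_in_press`, since the text says it is only ‘possibly’ set to zero.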

b) Electronic publications

The reference information for an electronic publication includes an ‘(er)’ prefix.
Examples: O64948, Q09517

c) Book citations

The reference information for papers found in books or other types of publication includes the book name, the volume number, the page range, the publisher, the city and the year.
Examples: P00065, P04560, P02675

d) Unpublished observations

The reference information for unpublished observations includes the month and the year.

We use ‘unpublished observations’ to cite communications by scientists to UniProtKB/Swiss-Prot of unpublished information concerning various aspects of a sequence entry.
Example: P08195

e) Thesis

The reference information for Ph.D. theses includes a ‘Thesis’ prefix, the year, the institution name, the city and the country.
Example: P01428

 Thesis (1977), University of Geneva, Switzerland.
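A thesis line such as the one above could be split into its parts roughly as follows. This is a minimal sketch: the names `ThesisInfo` and `parse_thesis` are hypothetical, and the parser assumes that the parts after the year are comma-separated, with the institution first, the country last and any city in between (the example above carries no separate city field).

```python
import re
from typing import NamedTuple, Optional


class ThesisInfo(NamedTuple):
    year: int
    institution: str
    city: Optional[str]  # None when the line has no separate city field
    country: str


# 'Thesis' prefix, year in parentheses, then the remaining fields.
THESIS_RE = re.compile(r"^Thesis \((\d{4})\), (.+?)\.?$")


def parse_thesis(line: str) -> Optional[ThesisInfo]:
    """Parse a thesis reference line; return None if it does not match."""
    match = THESIS_RE.match(line.strip())
    if match is None:
        return None
    parts = [part.strip() for part in match.group(2).split(",")]
    if len(parts) < 2:
        return None
    # Assumption: first field is the institution, last is the country.
    city = ", ".join(parts[1:-1]) or None
    return ThesisInfo(int(match.group(1)), parts[0], city, parts[-1])
```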

f) Patent applications

The reference information for patent applications includes the international publication number of the patent and the date.
Example: P29853

g) Submissions

The reference information for submissions includes the date and the database to which the data were submitted.

We report the data submitted to the following databases:

Examples: P50388, P83886

5. Cross-references

The cross-references subsection is optional and is used to indicate the identifier assigned to a specific reference in a bibliographic database.

When present, it gives the cross references to:

Examples: P02675, Q10670, Q9LFB2, Q3EDJ0 (AGRICOLA)

6. Citation content (Cited for)

The citation content describes the information that has been used to annotate the entry (sequence, protein-protein interactions, variants or mutants, PTMs, 3D structure, etc.).

Sequence information retrieved from a reference is described in detail by its range and its origin (nucleic acid sequencing or direct protein sequencing).
Example:

 NUCLEOTIDE SEQUENCE [GENOMIC DNA], AND PROTEIN SEQUENCE OF 21-35.

The comment ‘NUCLEOTIDE SEQUENCE’ might be tagged with a qualifier, indicating the origin of the sequence data. Valid names for this qualifier are:

Example: Q9QY42
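To illustrate how the qualifier and the sequence range could be read from a citation content line such as ‘NUCLEOTIDE SEQUENCE [GENOMIC DNA], AND PROTEIN SEQUENCE OF 21-35.’, here is a small sketch. The function names are hypothetical, and the patterns are inferred from that single example only, so they should not be taken as a full description of the format.

```python
import re
from typing import Optional, Tuple

# Optional qualifier in square brackets right after 'NUCLEOTIDE SEQUENCE'.
NUCLEOTIDE_RE = re.compile(r"NUCLEOTIDE SEQUENCE(?: \[([^\]]+)\])?")
# Residue range of a directly sequenced protein fragment, e.g. 'OF 21-35'.
PROTEIN_RANGE_RE = re.compile(r"PROTEIN SEQUENCE OF (\d+)-(\d+)")


def nucleotide_qualifier(text: str) -> Optional[str]:
    """Return the bracketed qualifier, or None if absent or not applicable."""
    match = NUCLEOTIDE_RE.search(text)
    return match.group(1) if match else None


def protein_range(text: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the protein-sequenced range, or None."""
    match = PROTEIN_RANGE_RE.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Applied to the example line, `nucleotide_qualifier` yields the text between the brackets and `protein_range` yields the two positions as integers.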

When the sequence describes specific isoform(s), this is indicated in brackets, following the sequence information.
Example: Q8TCU6

‘LARGE SCALE ANALYSIS’ is the tag added to references that report large-scale results, to indicate that the results have not been extensively analysed.
Example: Q9JIX8

Information on protein-protein interactions (using the official gene name when it exists), mutagenesis experiments, natural variants and post-translational modifications (PTMs) is also listed in detail.
Examples: Q96BI3, Q96EP1, P62739

The 3D structure information is given and describes the method used (together with the highest resolution for X-ray crystallography), the range of the domain, and the structure that has been determined.
Examples: P17427, P00831

7. Sequence origin

The sequence origin is optional and indicates the strain(s), tissue(s), plasmid(s) and transposon(s) from which the sequence is derived.

The strains listed in the ‘Strains’ token are sorted alphabetically. All frequently occurring strains in UniProtKB are listed in the document ‘Controlled vocabulary of strains’.

The tissues listed in the ‘Tissue’ token are sorted alphabetically. All tissues indicated in this token in UniProtKB/Swiss-Prot are listed in the document ‘Controlled vocabulary of tissues’. Wherever possible, UniProtKB/TrEMBL also makes use of this controlled tissue list, and efforts are made to automatically match tissues in UniProtKB/TrEMBL entries to tissues on the list. However, due to the nature of the data in UniProtKB/TrEMBL, this is not always possible until the entry is manually annotated.

The ‘Plasmid’ token is only used if an entry describes a sequence that is identical in more than one plasmid. The document ‘Controlled vocabulary of plasmids’ lists all the plasmids that are used in UniProtKB/Swiss-Prot in the context of the ‘Plasmid’ token.
Examples: P18445, Q28125, P30867, P12121, P00810, Q9EVG8.

Many bacterial or fungal strains have names that are composed of an acronym (ATCC, DSM, NRRL…) followed by a number. These strains are maintained in specific culture collections, of which the most frequently cited are listed below:

Acronym  Culture collection
ATCC     American Type Culture Collection; Rockville, USA
CBS      Centraalbureau voor Schimmelcultures; Baarn and Delft, Netherlands
CECT     Colección Española de Cultivos Tipo; Valencia, Spain
CCAP     Culture Collection of Algae and Protozoa; U.K.
CCMP     Culture Collection of Marine Phytoplankton
DSM      Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH; Germany
IAM      Institute of Applied Microbiology; University of Tokyo, Japan
IFO      Institute for Fermentation; Osaka, Japan
KCC      Culture collection of Actinomycetes, Kaken Chemical Co; Tokyo, Japan
NCDO     National Collection of Dairy Organisms; Reading, U.K.
NCIB     National Collection of Industrial Bacteria; Aberdeen, U.K.
NCPPB    National Collection of Plant Pathogenic Bacteria; U.K.
NCTC     National Collection of Type Cultures; London, U.K.
NRCC     National Research Council of Canada
NRRL     Agricultural Research Service Culture Collection, National Center for Agricultural Utilization Research
USDA     U.S. Department of Agriculture; USA
UTEX     Culture collection of Algae at the University of Texas at Austin; USA
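A strain name of the ‘acronym followed by a number’ kind can be recognised with a short lookup against the acronyms listed above. The sketch below is illustrative only: `split_collection_strain` is a hypothetical name, the strain ‘ATCC 0000’ used in the usage note is a made-up placeholder, and real strain names may use separators or suffixes that this simple pattern does not cover.

```python
import re
from typing import Optional, Tuple

# Acronyms copied from the culture collection list above.
COLLECTION_ACRONYMS = {
    "ATCC", "CBS", "CECT", "CCAP", "CCMP", "DSM", "IAM", "IFO", "KCC",
    "NCDO", "NCIB", "NCPPB", "NCTC", "NRCC", "NRRL", "USDA", "UTEX",
}

# Assumption: an upper-case acronym, whitespace, then a purely numeric part.
STRAIN_RE = re.compile(r"^([A-Z]+)\s+(\d+)$")


def split_collection_strain(strain: str) -> Optional[Tuple[str, str]]:
    """Return (acronym, number) for a culture collection strain, else None."""
    match = STRAIN_RE.match(strain.strip())
    if match and match.group(1) in COLLECTION_ACRONYMS:
        return match.group(1), match.group(2)
    return None
```

With the placeholder input `"ATCC 0000"`, the function returns the acronym and the number as separate strings; a name whose prefix is not in the list is rejected.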