Tuesday, May 24, 2016

Proteomics peptide data in the Feature Viewer

The Protein Feature Viewer in UniProt is an interactive representation of all protein sequence features in one compact view. It now provides a new track displaying proteomics peptide identification data for UniProt Knowledgebase entries.

UniProt runs an analysis pipeline to map data from publicly available proteomics resources to UniProtKB sequences. Until now, these data were available as a download from the UniProt FTP site. They are now also displayed in the Protein Feature Viewer, which can be accessed through the ‘Feature viewer’ link on the left-hand side of the entry view page.

The proteomics track currently displays mass spectrometry peptide data mapped from PeptideAtlas, EPD and MaxQB to UniProtKB protein sequences. More mass spectrometry proteomics resources will be added in the future. The track can be expanded to see unique and non-unique mapped peptides, as shown in the example screenshot below.

Uniqueness of peptides is evaluated according to the gene groups underlying the UniProtKB reference proteomes, where we group protein sequences based on the gene(s) encoding them. Each gene group consists of one or more UniProtKB protein isoform sequences. A peptide is considered unique if it belongs to only one gene group. Two types of peptides are therefore identified: unique and non-unique.
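
The uniqueness rule can be sketched in a few lines of Python. This is only an illustration, not the UniProt mapping pipeline: the gene group names and sequences are made up, and exact substring matching is an assumption standing in for the real peptide mapping step.

```python
# Hypothetical gene groups: gene group name -> isoform sequences (toy data,
# not real UniProtKB sequences).
GENE_GROUPS = {
    "GENE_A": ["MKTAYIAKQR", "MKTAYIAKQRQISFVK"],
    "GENE_B": ["MSSHEGGKQR", "MSSHEGGK"],
}


def classify_peptides(peptides, gene_groups):
    """Label each peptide as 'unique', 'non-unique' or 'unmapped'.

    A peptide is unique if it matches sequences from exactly one gene group,
    no matter how many isoforms within that group it matches.
    Matching by exact substring search is an assumption of this sketch.
    """
    labels = {}
    for peptide in peptides:
        matched_groups = {
            group
            for group, sequences in gene_groups.items()
            if any(peptide in sequence for sequence in sequences)
        }
        if not matched_groups:
            labels[peptide] = "unmapped"
        elif len(matched_groups) == 1:
            labels[peptide] = "unique"
        else:
            labels[peptide] = "non-unique"
    return labels


labels = classify_peptides(["TAYIAK", "KQR", "WWWW"], GENE_GROUPS)
print(labels)
```

In this toy data, "TAYIAK" matches two isoforms but only one gene group, so it still counts as unique, while "KQR" occurs in both gene groups and is non-unique.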

In the future, we also plan to add post-translational modification-specific proteomics data sets (initially phosphorylation sites) to the mappings to UniProtKB sequences.

Wednesday, March 2, 2016

Pan proteomes in UniProt

A proteome is the set of proteins thought to be expressed by an organism and is typically obtained from the translation of a fully sequenced, annotated genome. The last few years have seen a vast increase in the submission of multiple genomes for the same or closely related organisms. To help users find the most relevant and best-annotated set of sequences for each taxon, we now have the twin concepts of Reference proteomes and the newly introduced Pan proteomes.

Reference proteomes are chosen to provide broad coverage of the tree of life and constitute a representative cross-section of the taxonomic diversity found within UniProtKB. These proteomes - both community selected and computationally determined - include model organisms and other proteomes of interest to biomedical and biotechnological research.

A pan proteome is the full set of proteins thought to be expressed by a group of highly related organisms (e.g. multiple strains of the same bacterial species). Pan proteomes provide a representative set of all the sequences within a taxonomic group and capture unique sequences not found in the group’s reference proteome. UniProtKB pan proteomes encompass all non-redundant proteomes and are aimed at users interested in phylogenetic comparisons and the study of genome evolution and gene diversity.

When a proteome has proteins that are part of a larger pan proteome, you will see it indicated on the proteome page in the 'Pan proteome' row. You will also see a link to download the full FASTA sequence set.

You can also download pan proteome sets from the UniProt FTP site http://www.uniprot.org/downloads through the 'Pan proteomes' sub directory.

For each reference proteome cluster, also known as representative proteome group (RPG) (Chen et al., 2011), a pan proteome is a set of sequences consisting of all the sequences in the reference proteome, plus the addition of unique protein sequences that are found in other species or strains of the cluster but not in the reference proteome. Click here to find more about how we compute pan proteome sets.
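
As a rough illustration of this definition, the sketch below assembles a pan proteome from a reference proteome and the other proteomes in its cluster. It is not the UniProt procedure: the accessions and sequences are invented, and treating "unique" as exact sequence identity is an assumption made to keep the example short.

```python
def build_pan_proteome(reference, other_proteomes):
    """Return the reference sequences plus sequences not yet seen.

    reference: dict mapping accession -> sequence for the reference proteome.
    other_proteomes: list of dicts (accession -> sequence) for the other
        species or strains in the same cluster.
    Uniqueness is decided by exact sequence identity (an assumption).
    """
    pan = dict(reference)
    seen = set(reference.values())
    for proteome in other_proteomes:
        for accession, sequence in proteome.items():
            if sequence not in seen:
                pan[accession] = sequence
                seen.add(sequence)
    return pan


# Toy data with hypothetical accessions.
reference = {"R1": "MKT", "R2": "MSS"}
strain_a = {"A1": "MKT", "A2": "MLL"}  # A1 duplicates R1
strain_b = {"B1": "MLL", "B2": "MQQ"}  # B1 duplicates A2

pan = build_pan_proteome(reference, [strain_a, strain_b])
print(sorted(pan))
```

Every reference sequence is kept, and each additional sequence is added once, from the first proteome in which it appears.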

Thursday, February 18, 2016

Introducing the UniProt Feature Viewer

UniProt provides sequence annotations, a.k.a. protein features, to describe regions or sites of biological interest: secondary structure regions, domains, post-translational modifications and binding sites among others, which play a critical role in the understanding of what the protein does. With the growth in biological data, integration and visualization become increasingly important for exposing different data aspects.
Hence we are introducing the UniProt feature viewer, a BioJS component bringing together protein sequence features in one compact view. If you would like to include the feature viewer in your own website, you can find instructions in our technical documentation.

UniProtKB entry display options

The UniProt feature viewer is available for every UniProtKB protein entry through the ‘Feature viewer’ link under the ‘Display’ heading on the left hand side.

You can click on the links under 'Display' to switch your view between the default entry, the feature viewer or the feature table. The feature viewer shows all sequence features from the entry in an interactive view. It also includes additional features such as variants mapped from Large Scale Studies. The feature table shows all sequence features from only the entry in a tabular format.  

Feature viewer

Similar to genome browsers, the viewer uses tracks to display different protein features providing an intuitive picture of co-localized elements. Each track can be expanded to reveal a more in-depth view of the underlying data. 

Clicking on a feature will trigger a pop-up with more information about the feature such as the feature position, description and any available evidence.

You can zoom into an area by dragging the edges of the ruler. You can then grab the ruler and move it along the sequence to scroll.

You can zoom in straight to the amino acid level by using the zoom icon to the left of the ruler.

You can customize your view to hide or show feature tracks using the settings icon to the left of the ruler.

The Variants track offers a novel visualization and presents UniProt curated natural variants along with imported variants from large-scale studies (such as 1000 Genomes, COSMIC, ExAC and ESP). The track shows all amino acids listed vertically. It plots variants at the position on the x-axis and the amino acid substitution on the y-axis. You can use filters on the left to refine your view.
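
The layout of the Variants track can be pictured as a grid lookup. The sketch below is an illustration only, not the feature viewer's code: the variants are invented, and the alphabetical row order is an assumption (the viewer may order amino acids differently).

```python
# Rows of the track: one per amino acid. Alphabetical one-letter order is an
# assumption of this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def variant_to_cell(position, alt_residue):
    """Map a variant to grid coordinates.

    x is the sequence position, y is the row index of the substituted
    (alternative) amino acid.
    """
    return position, AMINO_ACIDS.index(alt_residue)


# Hypothetical variants: (position, wild-type residue, substituted residue).
variants = [(25, "A", "V"), (25, "A", "T"), (102, "R", "W")]
cells = [variant_to_cell(position, alt) for position, _wild_type, alt in variants]
print(cells)
```

Two substitutions at the same position share an x coordinate and land on different rows, which is what lets the track show several variants per residue without overlap.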

We hope that the feature viewer will make it easier for you to view, compare and analyze sequence features in one view.

Would you like to see more features added to the view or have a request that will make the feature viewer more useful for you? 

Send us your feedback at help@uniprot.org!

Friday, December 11, 2015

View protein sequence annotations as genome browser tracks

With the latest UniProt release 2015_12, we are introducing new genome annotation track files in both BED and bigBed formats that will allow you to view human UniProtKB sequence feature annotations such as domains, sites and post-translational modifications as genome browser tracks! This initial beta release of the UniProt genome annotation tracks resource contains sequence annotations for human only but it will be followed by additional species in the future. 

As well as the standard tracks provided by the UCSC and Ensembl genome browsers, both browsers allow users to upload additional tracks that annotate the genome further to help understand its architecture. Genome browser tracks also allow users to analyze their own sequencing data against the reference genome data and genome annotations. You can now upload files from UniProt to genome browsers to easily compare UniProtKB protein features with other genomic information, and with your own sequencing data if available, bridging the protein and gene visually.

Each species represented (currently only human) within the genome annotation tracks resource will have its sequence annotations defined with the BED and bigBed formats. 

For example, the human active site BED file is called UP000005640_9606_act_site.bed. BigBed formatted files have a .bb extension. You will see two directories on the FTP site for each species (currently only human): one directory for the BED files and a track hub directory that can be used to add all UniProtKB sequence annotations for a species to a genome browser.
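
For anyone scripting against these files, the sketch below shows how a file name like the one above could be assembled and how a BED line could be read. The naming pattern is inferred from the single example given here, so treat it as an assumption, and the BED line is invented. The code relies only on the standard BED layout: tab-separated chromosome, 0-based start, exclusive end, and optional further columns such as a name.

```python
def track_file_name(proteome_id, taxon_id, feature_type, extension="bed"):
    """Build a file name following the pattern of the example above.

    The pattern <proteome>_<taxon>_<feature>.<extension> is inferred from one
    example and is an assumption, not a documented convention.
    """
    return f"{proteome_id}_{taxon_id}_{feature_type}.{extension}"


def parse_bed_line(line):
    """Parse the first columns of a tab-separated BED line.

    BED starts are 0-based and ends are exclusive, so the feature length is
    simply end - start.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    name = fields[3] if len(fields) > 3 else None
    return {"chrom": chrom, "start": start, "end": end,
            "name": name, "length": end - start}


file_name = track_file_name("UP000005640", "9606", "act_site")
# Hypothetical BED line, not taken from a UniProt file.
record = parse_bed_line("chr10\t100\t103\texample_site\n")
print(file_name, record)
```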

All UniProt annotation tracks can be added in one single step by adding the UniProt species track hub. Simply copy the URL for the species hub.txt file and follow the genome browser instructions on how to add a track hub.

Adding a UniProt species track hub to the Ensembl genome browser.

UniProt FGFR2 features uploaded using a track hub visualized in the Ensembl genome browser.


Adding a track hub in the UCSC genome browser.

UniProt FGFR2 features uploaded using a track hub visualized in the UCSC genome browser.

To add specific feature annotation tracks to a genome browser like Ensembl, simply copy the URL to the file and follow the instructions on how to add custom tracks in the Ensembl genome browser or UCSC genome browser. Individual bigBed files can be added as tracks to a genome browser by using the track definitions provided in the species tracks.txt (UP000005640_9606_tracks.txt) file.

Adding custom tracks to the Ensembl genome browser

Adding custom tracks or track definitions to UCSC genome browser.

UniProt active site annotation track in the Ensembl genome browser.

We welcome your feedback on this new resource! 

Tuesday, November 24, 2015

UniRule automatic annotation system in UniProt

UniProt has developed two prediction systems, UniRule (Unified Rule system) and the Statistical Automatic Annotation System (SAAS), to automatically annotate unreviewed UniProtKB/TrEMBL entries in an efficient and scalable manner.

UniRule is a rule-based automatic annotation system that consists of rules devised and tested by experienced curators using experimental data from expertly annotated entries. It automatically annotates entries with a high degree of accuracy. This helps leverage curators' knowledge and expertise to add annotation to a much larger set of protein entries than is possible to annotate solely through expert curation.

UniRule has been developed by merging existing curated rule-based systems (HAMAP, PIR name and site rules, and RuleBase rules) into one system which stores, applies, and evaluates all rules. 

What is a rule and how does it work to annotate proteins?

Let us look at a fictitious rule to see how this concept works for a basic rule.

Could you make this rule even more granular and specific by adding more conditions?

In this example, the main conditions delineate the space that can be annotated as a 'purple quadrilateral' and the further conditions help add more specific annotation of being a 'square' to a subset. 

This is essentially how rules are created with main conditions and additional conditions to identify sequence matches for which certain annotation can be applied with confidence. The quality of the rules is maintained thanks to the expert curators creating and checking rules before application. 
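
The same idea can be written as a small Python sketch using the fictitious shapes example. It is an illustration only and not how UniRule is implemented: the class, the shape attributes and the specific conditions chosen for 'purple quadrilateral' and 'square' are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """A toy rule: main conditions gate a general annotation, and additional
    conditions add a more specific annotation to a subset of matches."""
    main_conditions: list
    main_annotation: str
    extra_conditions: list = field(default_factory=list)
    extra_annotation: str = ""

    def annotate(self, item):
        annotations = []
        if all(condition(item) for condition in self.main_conditions):
            annotations.append(self.main_annotation)
            # Additional conditions are only checked once the main ones hold.
            if self.extra_conditions and all(
                condition(item) for condition in self.extra_conditions
            ):
                annotations.append(self.extra_annotation)
        return annotations


# Hypothetical conditions for the fictitious shapes rule.
rule = Rule(
    main_conditions=[lambda s: s["colour"] == "purple", lambda s: s["sides"] == 4],
    main_annotation="purple quadrilateral",
    extra_conditions=[lambda s: s["equal_sides"], lambda s: s["right_angles"]],
    extra_annotation="square",
)

square = {"colour": "purple", "sides": 4, "equal_sides": True, "right_angles": True}
rectangle = {"colour": "purple", "sides": 4, "equal_sides": False, "right_angles": True}
triangle = {"colour": "purple", "sides": 3, "equal_sides": True, "right_angles": False}

for shape in (square, rectangle, triangle):
    print(rule.annotate(shape))
```

Only the first shape receives both annotations, the second receives the general annotation alone, and the third fails the main conditions and receives nothing.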

UniRule annotation in protein entries

If a protein entry contains annotations from the UniRule system, this is indicated in the entry, as seen below.

Clicking on the evidence will take you to the rule that is the source of that annotation. Here you can click on the annotations you're interested in and see how they are applied through the rule or click on the conditions you're interested in and see which annotations they would apply.

If you are interested in exploring rules for proteins, taxonomic groups or other topics, you can also search the UniRule set directly. Just click on the dropdown to the left of the search box to change the focus from 'UniProtKB' to 'UniRule' and search for your query of interest.

So now you can explore rules that UniProt has built to annotate the sequence space of your interest! We always love to hear feedback, so please let us know how you plan to use this functionality and whether there is any additional functionality you would find useful. You can also email us at help@uniprot.org with queries and feedback.

Thursday, September 10, 2015

Linking proteins via pathways

Proteins in UniProt are now linked and connected by pathways! When looking at your protein of interest, you will now be able to see if it is involved in any known pathway and then be able to follow links to other proteins involved at different stages of the pathway hierarchy. This allows you to traverse the world of proteins through the pathways that connect them!

Let's follow the example of protein 3-hydroxyanthranilate 3,4-dioxygenase in Baker's yeast. This protein catalyses the oxidative ring opening of 3-hydroxyanthranilate. When looking at the protein in UniProt http://www.uniprot.org/uniprot/P47096, I see the 'Pathway' comment in the 'Function' section. Let's look at this comment more closely.

I can see the main pathway title that my protein is involved in, in this case NAD(+) biosynthesis. I see exactly which step of which subpathway my protein is involved in. I then see all steps of the subpathway listed out. My protein is involved in Step 3 but I can also see links to the proteins that are involved in the first two steps.

The subpathway, its parent pathway and superpathway are all linked to UniPathway for more information. The final line in the 'Pathway' comment provides links to all proteins involved in the same subpathway (from the same organism) as my protein, its parent pathway and even the superpathway another level up from the parent pathway. In this example, if I follow the link to Cofactor biosynthesis, I see all 63 proteins involved in this pathway listed out.

Try this out and let us know what you think! Your feedback and suggestions are always welcome. 

Wednesday, September 2, 2015

Have you tried our new Beta UniProtJAPI?

A new Beta version of the UniProtJAPI is now available! It aims to address several issues encountered with the current version, such as frequent library updates, retrieval speeds and server availability. We invite you to try the Beta JAPI and share your feedback via a short survey so we can provide the best service for your needs.

Please try the Beta UniProtJAPI for your tasks at http://wwwdev.ebi.ac.uk/uniprot/remotingAPI/index.html and then fill out this short 5-minute survey: https://www.surveymonkey.com/r/2DYCMQM.

Here are some examples of tasks you could try on the JAPI (or simply use your own tasks):

  1. Create a UniProt, UniRef and UniParc service, and use each of these to find out the number of entries in the release.
  2. Retrieve entries containing keywords “Kinase” or “Amyloidosis”.
  3. Retrieve entries that have been updated in the last six months, and then filter only those that are reviewed (Swiss-Prot).
  4. Given the PFAM signature “PM00228”, find the associated reviewed (Swiss-Prot) and non-reviewed (TrEMBL) entries.

We hope you will find our new JAPI useful!