www.bioinformatics.org

An Automated System for Deep Proteome Annotation
Gary Van Domselaar, Savita Shrivastava,
Paul Stothard and David S. Wishart
Department of Computing Science and Biological Sciences
University of Alberta
Edmonton AB T6E 2E9

[email protected]
[email protected]

Abstract
Most biological databases in existence today are
focused on a narrow biological domain. As such, they
are unable to address biological questions outside of
that domain. Researchers wishing to address broad
biological questions must manually compile data from
several biological data sources.
This poster describes our progress on the development
of an automated system for deeply annotating the
proteomes of model organisms (and others), and of an
intuitive data mining and data visualization system that
provides detailed information for broad biological queries.
The Deep Annotation system is part of the PENCE
Proteome Analyst project.

[Figure: workflow diagram. Panel "Internal Processing"; database "Deeply Annotated Model Organisms" (Human, Mouse, E. coli, D. melanogaster, S. cerevisiae, C. elegans, etc.); input "Unannotated Genome Sequence Data"; step markers 1 and 3.]

Introduction
Prior to the advent of high-throughput sequencing, most biologists would
annotate or characterize genes and proteins manually, one at a time.
However, for genome-scale annotation it is too time-consuming to predict the
properties of each protein sequence or to organize the results of many
prediction tools by hand. Furthermore, the enormous volume of
biological information, the sheer number of different data sources, and their
growing heterogeneity have created an 'information labyrinth' in which
one can easily lose one's way on a quest for information. Clearly, a
high degree of automation is required to cope with the analysis of the huge
number of sequences generated by genome sequencing projects, and to
ensure consistent and reproducible results. This automation could free the
expert to verify and refine these analyses and to follow up new discoveries.
A number of systems have been developed over the past few years that
permit automated genome-wide or proteome-wide annotation, such as the
ENSEMBL system, PEDANT, Magpie, GeneQuiz, and Proteome
Analyst.

[Figure labels: "Local Databases: SwissProt"; input "Unannotated Protein Sequence Data"; step markers 2, 5 and 4.]

Description
1. The system accepts proteomic or genomic data. If the user
submits genomic data, gene predictions can be performed with
Glimmer or Genscan.
2. The unannotated sequences enter the Proteome Annotation
System.
3. Sequences are compared against existing deeply annotated
databases. Sequences with sufficient homology inherit the
appropriate annotations. Other annotations are computed locally.
4. Annotations unavailable locally are obtained by querying
servers and databases across the Internet.
5. The annotated sequence data are added to the database of
annotated organisms and made available for viewing and
querying.

6. Annotations are viewable over the Web using CGView for
circular chromosomes and LGView for linear chromosomes.
Broad queries can be made across organisms for an arbitrary
subset of the available annotations.

[Figure labels: output "Annotated Protein Sequence Data"; step marker 6.]
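Steps 3 to 5 of the description can be sketched as a simple fallback chain: inherit annotations from a sufficiently similar, already annotated protein, compute what is still missing locally, and only then query remote services. This is a minimal illustration, not the project's implementation. The class and field names, the `min_identity` cutoff of 0.9, and the callable-based module interface are all assumptions made for the example. Gene prediction (step 1) and visualization (step 6) are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

# All names below are hypothetical; they are not the project's actual API.
Annotation = Dict[str, str]
Module = Callable[[str], Optional[str]]


@dataclass
class Hit:
    """A match against the local database of deeply annotated proteins."""
    identity: float          # fraction of identical residues, 0.0 to 1.0
    annotations: Annotation  # annotations carried by the matched protein


@dataclass
class Annotator:
    find_homologue: Callable[[str], Optional[Hit]]  # step 3: database lookup
    local_modules: Dict[str, Module]                # step 3: computed locally
    remote_modules: Dict[str, Module]               # step 4: Internet queries
    min_identity: float = 0.9                       # assumed homology cutoff
    database: Dict[str, Annotation] = field(default_factory=dict)  # step 5

    def annotate(self, seq_id: str, sequence: str) -> Annotation:
        result: Annotation = {}

        # Step 3a: inherit annotations from a sufficiently similar protein.
        hit = self.find_homologue(sequence)
        if hit is not None and hit.identity >= self.min_identity:
            result.update(hit.annotations)

        # Step 3b, then step 4: fill in only the fields that are still
        # missing, trying local modules before remote ones.
        for modules in (self.local_modules, self.remote_modules):
            for name, module in modules.items():
                if name not in result:
                    value = module(sequence)
                    if value is not None:
                        result[name] = value

        # Step 5: store the record so it can be viewed and queried later.
        self.database[seq_id] = result
        return result
```

Ordering the sources this way keeps remote queries to a minimum, since a field already inherited or computed locally is never requested over the network.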

Progress
[Figure label: Visualization and Mining Software]

The annotation systems named in the Introduction (ENSEMBL, PEDANT, Magpie,
GeneQuiz, and Proteome Analyst) are web-based tools designed to identify
genes, parse data, translate sequences, search against public databases,
identify domains or motifs, and perform predictive analyses. Many of these
packages provide user-customizable searches and graphical, hyperlinked
output. The level of interpretation or inference offered by these annotation
systems varies widely, with some offering only raw data in a consolidated
format and others inferring function or ontology through detailed analysis.

The workflow engine, database comparison, data input/output,
and HTML rendering systems are in place. A number of
annotation computing modules have been implemented (Pfam,
PROSITE, Protein Name Finder, Orthologues, Paralogues,
Molecular Weight, pI, Subcellular Location Prediction, and
Function Prediction). Many more are being written. We are
currently working on improving the data storage and querying
systems. An initial release is planned for mid-summer
2004.
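As an illustration of what one of the simpler annotation computing modules might look like, the sketch below computes a protein's average molecular weight from its sequence. The poster does not say how its Molecular Weight module works, so this uses the common approach of summing average residue masses and adding one water molecule for the free termini. The function name and error handling are assumptions.

```python
# Average residue masses in daltons (amino acid minus one water molecule).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the N-terminal H and C-terminal OH


def molecular_weight(sequence: str) -> float:
    """Return the average molecular weight (Da) of an unmodified protein."""
    seq = sequence.strip().upper()
    if not seq:
        raise ValueError("empty sequence")
    try:
        residues = sum(RESIDUE_MASS[aa] for aa in seq)
    except KeyError as exc:
        # Ambiguity codes such as X or B have no single mass.
        raise ValueError(f"unsupported residue: {exc.args[0]}") from None
    return residues + WATER_MASS
```

The sketch ignores post-translational modifications and disulfide bonds, which a deep annotation system would need to account for separately.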
An early test version of the output (on H. influenzae) is
available at http://redpoll.pharmacy.ualberta.ca/~savita/ha_series/

A common problem with many existing automated annotation systems is that
the depth of annotation for any given gene or protein is quite limited or
shallow, typically consisting of 10-15 pieces of information. We are
working on an automated system (the Proteome Analyst system) for
deeply annotating the proteomes of model organisms, and developing an
intuitive data mining and data visualization system. Deep annotation
means that the proteome/genome is annotated to a level that includes such
items as predicted protein location, 2D or 3D structure, detailed or specific
functions, post-translational modifications, expression levels, interacting
partners, domains, active sites, substrates, ligands, pathways, cofactors,
copy numbers, etc. An example of this kind of "deep" annotation can be
seen in the Cybercell database. This deep annotation project contains a
software engineering component that integrates existing data and methods
to perform a scientific analysis of the integrated data. The results are
therefore of interest from both the scientific and the
software engineering points of view. The deep annotation system may be
used to support a wide range of biologists and could be a platform for
further developments. Since the similarity of functions between related
proteins varies substantially depending on the species context and
evolutionary distance, the relevant analyses and annotations also differ
between the kingdoms (viruses, archaebacteria, protista, fungi, animalia,
eubacteria, plantae). The major challenge of this project is to develop
custom analysis pipelines for each kingdom.
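One way to organize kingdom-specific pipelines is a registry that maps each kingdom to an ordered list of analysis steps, with shared steps reused across kingdoms. The sketch below is only an illustration of that idea. The step names (`operon_context`, `tissue_expression`) are invented placeholders that record that they ran, and the registry layout is an assumption rather than the project's design.

```python
from typing import Callable, Dict, List

# A step takes a protein sequence and returns one or more annotations.
Step = Callable[[str], Dict[str, str]]


def sequence_length(sequence: str) -> Dict[str, str]:
    """A step shared by every kingdom."""
    return {"length": str(len(sequence))}


def make_placeholder(name: str) -> Step:
    """Return a stand-in step that only records that it was scheduled."""
    def step(sequence: str) -> Dict[str, str]:
        return {name: "pending"}
    return step


COMMON_STEPS: List[Step] = [sequence_length]

# Hypothetical registry: kingdom name -> ordered list of steps.
PIPELINES: Dict[str, List[Step]] = {
    "eubacteria": COMMON_STEPS + [make_placeholder("operon_context")],
    "animalia": COMMON_STEPS + [make_placeholder("tissue_expression")],
}


def run_pipeline(kingdom: str, sequence: str) -> Dict[str, str]:
    """Run every step registered for the given kingdom, in order."""
    try:
        steps = PIPELINES[kingdom.lower()]
    except KeyError:
        raise ValueError(f"no pipeline registered for {kingdom!r}") from None
    result: Dict[str, str] = {}
    for step in steps:
        result.update(step(sequence))
    return result
```

Adding support for another kingdom then amounts to registering a new list of steps, without touching the code that runs them.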

http://redpoll.pharmacy.ualberta.ca/~savita/ha_series/

[Figure label: External Processing]

References

Proteome Analyst: http://www.cs.ualberta.ca/~bioinfo/PA/
Cybercell: http://redpoll.pharmacy.ualberta.ca/CCDB/
Magpie: http://magpie.ucalgary.ca/
GeneQuiz: http://jura.ebi.ac.uk:8765/ext-genequiz/
PEDANT: http://pedant.gsf.de/
ENSEMBL: http://ensembl.org/
