The World of Microbes on the Internet - Genomics Help

Bioinformatics
Genomic Biology as a Quantitative Science

Stuart M. Brown, Ph.D.
Director, Research Computing, NYU School of Medicine

A Genome Revolution is underway in Biology and Medicine

We are in the midst of a "Golden Era" of biology.
The Human Genome Project has produced a huge storehouse of data that will be used to change every aspect of biological research and medicine.
The revolution is about treating biology as an information science, not about specific technologies.

The Human Genome Project

The job of the biologist is changing

As more biological information becomes available and laboratory equipment becomes more automated ...
  The biologist will spend more time using computers and on experimental design and data analysis (and less time doing tedious lab biochemistry).
  Biology will become a more quantitative science (think how the periodic table affected chemistry).

Biological Information

Protein 2-D gel
mRNA Expression
Protein 3-D Structure
Mass Spec.
Genome sequence
The Cell

A review of some basic genetics

DNA
  4 bases (G, C, T, A)
  base pairs: G--C, T--A
  genes
  non-coding regions

Decoding Genes
Classic Molecular Biology

A gene is a DNA sequence at a particular locus on a chromosome that encodes a protein.
The Central Dogma of Molecular Biology: DNA > RNA > Protein
A mutation changes the DNA sequence, which leads to a change in protein sequence or to no protein.
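As a rough illustration of base pairing, the central dogma, and the effect of a mutation, here is a minimal Python sketch. The sequences, the function names, and the deliberately tiny codon table are all invented for this example; a real tool would use the full 64-codon genetic code.

```python
# Watson-Crick base pairs: G pairs with C, T pairs with A.
PAIRS = str.maketrans("GCTA", "CGAT")

# Toy codon table (mRNA codon -> one-letter amino acid, "*" = stop).
# Only the codons needed below are listed; this is NOT the full code.
CODON_TABLE = {
    "AUG": "M", "GAG": "E", "GUG": "V", "AAA": "K",
    "UAA": "*", "UAG": "*", "UGA": "*",
}


def complement(dna):
    """Return the base-paired partner of each base (not reversed)."""
    return dna.upper().translate(PAIRS)


def transcribe(coding_dna):
    """DNA > RNA: the mRNA matches the coding strand with U in place of T."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """RNA > Protein: read codons from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so no protein is made
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE.get(mrna[i:i + 3], "X")  # X = not in toy table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def express(dna):
    """DNA > RNA > Protein in one step."""
    return translate(transcribe(dna))


normal = "ATGGAGAAATAA"      # hypothetical gene
missense = "ATGGTGAAATAA"    # GAG > GTG changes one amino acid
nonsense = "ATGTAGAAATAA"    # GAG > TAG creates an early stop
start_lost = "ATAGAGAAATAA"  # ATG > ATA removes the start codon

for label, dna in [("normal", normal), ("missense", missense),
                   ("nonsense", nonsense), ("start lost", start_lost)]:
    print(label, repr(express(dna)))
```

The four toy alleles show the outcomes named above: an unchanged protein, a changed protein, a truncated protein, and no protein at all.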

Alleles are slightly different DNA sequences of the same gene.
The human genome is the complete DNA content of the 23 pairs of human chromosomes (44 autosomes plus two sex chromosomes), approximately 3.2 billion base pairs.

Bold Words from Francis Collins:

"The history of biology was forever altered a decade ago by the bold decision to launch a research program that would characterize in ultimate detail the complete set of genetic instructions of the human being."
Francis S. Collins, Director of the National Human Genome Research Institute
N Engl J Med 1999 882:42-65

Genome Projects

Complete genomic sequences:
  Dozens of microorganisms
  Yeast, C. elegans, Drosophila
  Mouse
  Human
Comparative genomics
All this data is enabling new kinds of research for those with the computational skills to take advantage of it.

How does genome sequencing technology work?

Molecular biology of the Sanger method
Sub-cloning of fragments: BAC, PAC, cosmid, plasmid, phage
Automated sequencers
The need for computers to assemble the "reads" and manage the workflow

Automated sequencing machines, particularly those made by PE Applied Biosystems, use 4 colors, so they can read all 4 bases at once.

Raw Genome Data: Lots of Sequence Data

How to extract useful knowledge from all of this data?

Need sophisticated computer tools
Find the genes
Figure out what they do (function)
  Diagnostic tests
  Medical treatments

Finding genes in genome sequence is not easy

About 1% of human DNA encodes functional genes.
Genes are interspersed among long stretches of non-coding DNA.
Repeats, pseudo-genes, and introns confound matters.
Gene prediction tools look for start and stop codons, intron splice sites, similarity to known genes and cDNAs, etc.

Data Mining Tools

Scientists need to work with many layers of information about the genome:
  coding sequence of known genes and cDNAs
  genetic maps (known mutations and markers)
  gene expression
  protein sequence (from mass spectrometry)
  cross-species homology
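To make the idea of layered genome information concrete, here is a small Python sketch that joins two such layers by coordinate: a gene layer and a marker layer. Every gene name, marker name, and position is invented for illustration, and the coordinate convention (1-based, inclusive) is an assumption of this example.

```python
from collections import namedtuple

# Two hypothetical annotation layers on the same coordinate system.
Gene = namedtuple("Gene", "name chrom start end")
Marker = namedtuple("Marker", "name chrom pos")

genes = [
    Gene("geneA", "chr1", 1000, 5000),
    Gene("geneB", "chr1", 8000, 9500),
    Gene("geneC", "chr2", 200, 900),
]
markers = [
    Marker("m1", "chr1", 1200),
    Marker("m2", "chr1", 6000),  # falls between genes
    Marker("m3", "chr2", 900),   # on the last base of geneC
    Marker("m4", "chr1", 9000),
]


def markers_in_genes(genes, markers):
    """Map each gene name to the markers located inside that gene."""
    hits = {gene.name: [] for gene in genes}
    for marker in markers:
        for gene in genes:
            same_chrom = marker.chrom == gene.chrom
            if same_chrom and gene.start <= marker.pos <= gene.end:
                hits[gene.name].append(marker.name)
    return hits


print(markers_in_genes(genes, markers))
```

The nested loop is fine for a handful of features; genome browsers rely on indexed interval structures to answer the same question at genome scale.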

Most of the best tools are free on the Web
  UCSC
  Ensembl at EBI/EMBL

What comes after Genome Sequencing?

We are now in the "Post-Genomic" era.
It is possible to use the genome sequence plus a variety of automated laboratory equipment to do entirely new kinds of biology.
Not just scaled-up, but comprehensive

Relate genes to
  Organisms
  Diseases
    OMIM: Human Genetic Disease
  Metabolic and regulatory pathways
    KEGG
  Cancer Genome Project

Human Alleles

The OMIM (Online Mendelian Inheritance in Man) database at the NCBI tracks all human mutations with known phenotypes.
It contains a total of about 2,000 genetic diseases [and another ~11,000 genetic loci with known phenotypes, but not necessarily known gene sequences].
It is designed for use by physicians:
  can search by disease name
  contains summaries from clinical studies

KEGG: Kyoto Encyclopedia of Genes and Genomes

Enzymatic and regulatory pathways
Mapped out by EC number and cross-referenced to genes in all known organisms (wherever sequence information exists)
Parallel maps of regulatory pathways

Genomics

What is Genomics?

An operational definition: The application of high-throughput automated technologies to molecular biology.
A philosophical definition: A holistic or systems approach to the study of information flow within a cell.

Genomics Technologies

Automated DNA sequencing
Automated annotation of sequences
DNA microarrays
  gene expression (measure RNA levels)
  SNP genotyping
Genome diagnostics (genetic testing)
Proteomics
  Protein identification
  Protein-protein interactions

DNA chip microarrays

Put a large number (~100K) of cDNA sequences or synthetic DNA oligomers onto a glass slide (or other substrate) in known locations on a grid.
Label an RNA sample and hybridize.
Measure amounts of RNA bound to each square in the grid.
Make comparisons:
  Cancerous vs. normal tissue
  Treated vs. untreated
  Time course
Many applications in both basic and clinical research

Spot your own Chip
(plans available for free from Pat Brown's website)

Robot spotter
Ordinary glass microscope slide

cDNA spotted microarrays

Goal of Microarray experiments

Microarrays are a very good way of identifying a set of genes involved in a disease process
  Differences between cancer and normal tissue
  Tuberculosis-infected vs. resistant lung cells
Mapping out a pathway
  Co-regulated genes
Finding function for unknown genes
  Involved in these processes

Direct Medical Applications

Diagnosis
  Type of cancer
  Aggressive or benign?
Monitor treatment outcome
  Is a treatment having the desired effect on the target tissue?
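A comparison such as treated vs. untreated tissue is commonly summarized as a log2 ratio of the two signals for each spot. The sketch below is one simple way to do that in Python; the gene names and intensity values are invented, and the cutoff of a two-fold change is an arbitrary assumption rather than a recommended setting.

```python
import math

# Hypothetical spot intensities for the same genes in two samples.
treated = {"gene1": 800.0, "gene2": 100.0, "gene3": 50.0}
untreated = {"gene1": 100.0, "gene2": 100.0, "gene3": 200.0}


def log2_ratios(sample, reference):
    """log2(sample / reference) for genes measured in both samples."""
    return {
        gene: math.log2(sample[gene] / reference[gene])
        for gene in sample
        if gene in reference and sample[gene] > 0 and reference[gene] > 0
    }


def changed_genes(ratios, threshold=1.0):
    """Genes whose signal changed at least 2**threshold-fold, up or down."""
    return sorted(g for g, r in ratios.items() if abs(r) >= threshold)


ratios = log2_ratios(treated, untreated)
print(ratios)
print(changed_genes(ratios))
```

With tens of thousands of spots on a chip, some ratios will pass any fixed cutoff by chance alone, so real analyses add replicates and statistical testing on top of this.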

When you go looking you will certainly find something!

Human Genetic Variation

Every human has essentially the same set of genes.
But there are different forms of each gene, known as alleles:
  blue vs. brown eyes
  genetic diseases such as cystic fibrosis or Huntington's disease are caused by dysfunctional alleles
Alleles are created by mutations in the DNA sequence of one person, which are passed on to their descendants.

Clinical Manifestations of Genetic Variation
(All disease has a genetic component)

Susceptibility vs. resistance
Variations in disease severity or symptoms
Reaction to drugs (pharmacogenetics)
All of these traits can be traced back to particular genes (or sets of genes).
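One common way to trace a trait back to a gene is to ask whether carriers of a particular allele show the trait more often than non-carriers. The Python sketch below computes an odds ratio and a Pearson chi-square statistic for a 2x2 table. The counts are invented for illustration, the function names are hypothetical, and the code assumes no cell or margin of the table is zero.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a = carriers with the trait      b = carriers without the trait
    c = non-carriers with the trait  d = non-carriers without the trait
    """
    return (a * d) / (b * c)


def chi_square(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom, no correction)."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Hypothetical study: 100 allele carriers and 100 non-carriers.
a, b, c, d = 30, 70, 10, 90

print("odds ratio:", round(odds_ratio(a, b, c, d), 2))
print("chi-square:", chi_square(a, b, c, d))
```

A chi-square value above about 3.84 corresponds to p < 0.05 for a single test with one degree of freedom; scanning many markers at once requires a stricter threshold.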

Pharmacogenomics

People react differently to drugs
  Side effects
  Variable effectiveness
There are genes that control these reactions.
SNP markers can be used to identify these genes (profiles).

Use the Profiles

Genetic profiles of new patients can then be used to prescribe drugs more effectively and avoid adverse reactions.
  Sell a drug with a gene test
Can also speed clinical trials by testing on those who are likely to respond well.

Toxicogenomics

There are a number of common pathways for drug toxicity (or environmental toxicity).
It is possible to compile genomic signatures (gene expression data) for these pathways.
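One simple way to use such a signature is to correlate it with the expression changes measured for a new compound. The Python sketch below uses a Pearson correlation as the similarity score. This is only one possible scoring choice, and the signature, gene names, and compound profiles are all invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def signature_score(signature, profile):
    """Correlate a pathway signature with a profile over shared genes."""
    shared = sorted(g for g in signature if g in profile)
    return pearson([signature[g] for g in shared],
                   [profile[g] for g in shared])


# Hypothetical signature: log2 expression changes when the pathway is induced.
toxicity_signature = {"geneA": 2.0, "geneB": 1.5, "geneC": -1.0, "geneD": 0.0}

# Hypothetical profiles: one mirrors the signature, one opposes it.
compound_x = {g: 2 * v for g, v in toxicity_signature.items()}
compound_y = {g: -v for g, v in toxicity_signature.items()}

print(round(signature_score(toxicity_signature, compound_x), 3))
print(round(signature_score(toxicity_signature, compound_y), 3))
```

A score near 1 suggests the profile resembles the signature, while a score near 0 or below suggests it does not.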

Candidate drug molecules can be screened in cell culture or in animals for induction of these toxicity pathways.

Planning for a Genomics Revolution

Bioinformatics support must be integral in the planning process for the development of new genomics research facilities.
Genome Project sequencing centers have more staff and more $$$ spent on data analysis than on the sequencing itself.
Microarray facilities will be even more skewed toward data analysis.
It is an information-intensive business!

Implications for Biomedicine

Physicians will use genetic information to diagnose and treat disease.
  Virtually all medical conditions have a genetic component.
Faster drug development research
  Individualized drugs
  Gene therapy
All Biologists will use gene sequence information in their daily work

Training "computer savvy" scientists

Know the right tool for the job
Get the job done with tools available
Network connection is the lifeline of the scientist
Jobs change, computers change, projects change; scientists need to be adaptable

Long Term Implications

A "periodic table for biology" will lead to an explosion of research and discoveries: we will finally have the tools to start making systematic analyses of biological processes (quantitative biology).
Understanding the genome will lead to the ability to change it, to modify the characteristics of organisms and people in a wide variety of ways.

Genomics Education

Genomics scientists need basic training in both Molecular Biology and Computing.
Specific training in the use of automated laboratory equipment, the analysis of large datasets, and bioinformatics algorithms
Particularly important for the training of medical doctors: at least a familiarity with the technology

Genomics in Medical Education

"The explosion of information about the new genetics will create a huge problem in health education. Most physicians in practice have had not a single hour of education in genetics and are going to be severely challenged to pick up this new technology and run with it."
Francis Collins

Bioinformatics: A Biologist's Guide to Biocomputing and the Internet

Stuart M. Brown, Ph.D. [email protected]
