Preparing data for GWAS analysis

Tommy Carstensen
September 2013, Entebbe, Uganda

[Figure: genotype calls (AA, AB, BB, ?), Quality Control (QC), GWAS, Manhattan plot]

Bad QC > Bad Data > Bad Results

Genotype data

          SNP A  SNP B  SNP C  SNP D  SNP E
Female 1  00     AG     GG     GA     00
Male 1    00     GG     GG     AA     CC
Female 2  AC     00     GG     AA     CC
Female 3  AA     AG     GC     AA     CC
Male 2    AC     AA     00     AA     CA

00 = missing data

How did we get the genotype data?

Genotype Calling

[Figure: genotype calling. Good data (SNP1): AA, AB, BB. Bad data (SNP2): AA or AB? 00!]

Sample QC and SNP QC

In the genotype table above, sample QC assesses each row (sample) and SNP QC assesses each column (SNP).

Quality Control Steps

Sample QC
  • Sample Call Rate/Proportion
  • Autosomal Heterozygosity
  • Sex / Gender: X Chromosome Heterozygosity
  • Too Much Relatedness: Identity By Descent (IBD)
  • Too Little Relatedness / Confounding: Principal Component Analysis (PCA)

SNP QC
  • SNP Call Rate/Proportion
  • Hardy Weinberg Equilibrium (HWE)

Sample Call Rate/Proportion

         SNP1  SNP2  SNP3  SNP4  SNP5  Sample Call Rate
Sample1  00    AG    GG    GA    00    60%
Sample2  00    GG    GG    AA    CC    80%
Sample3  AC    00    GG    AA    CC    80%
Sample4  AA    AG    GC    AA    CC    100%
Sample5  AC    AA    00    AA    CA    80%
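The sample call rates in the table above can be reproduced with a few lines of Python. This is only a sketch: the function name and the in-memory layout (a list of two-letter genotype strings per sample, with "00" for a missing call) are assumptions made for the example, not part of any QC tool.

```python
MISSING = "00"  # missing-call code used in the slides

def sample_call_rate(genotypes):
    """Fraction of SNPs with a non-missing call for one sample."""
    called = sum(1 for g in genotypes if g != MISSING)
    return called / len(genotypes)

# One row per sample, one genotype per SNP (SNP1..SNP5).
samples = {
    "Sample1": ["00", "AG", "GG", "GA", "00"],
    "Sample2": ["00", "GG", "GG", "AA", "CC"],
    "Sample3": ["AC", "00", "GG", "AA", "CC"],
    "Sample4": ["AA", "AG", "GC", "AA", "CC"],
    "Sample5": ["AC", "AA", "00", "AA", "CA"],
}

rates = {name: sample_call_rate(g) for name, g in samples.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.0%}")
```

The printed percentages should match the Sample Call Rate column: 60%, 80%, 80%, 100% and 80%.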

Definition of Heterozygosity Rate

Chromosome 1

      Copy 1  Copy 2
SNP1  A       A       homozygous
SNP2  C       C       homozygous
SNP3  A       T       heterozygous
SNP4  C       C       homozygous
SNP5  C       G       heterozygous
SNP6  A       A       homozygous
SNP7  A       A       homozygous
SNP8  T       T       homozygous

heterozygosity = 2/8 = 0.25

Heterozygosity

Remove samples deviating from the average. Deviations could arise for several reasons:
  • Contamination of samples (high heterozygosity)
  • Inbreeding (low heterozygosity)
  • Ancestral differences (populations)
  • Data quality / poor genotype calling: heterozygotes are more likely to be missing

[Figure: heterozygosity of Sample 1 and Sample 2, with homozygous and heterozygous sites marked]

Correlation between quality metrics

[Figure]
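The heterozygosity rate defined above (heterozygous SNPs divided by all SNPs) can be sketched in Python as follows. The function name is made up for the example, and leaving missing calls ("00") out of the denominator is an assumption; the slides do not say how missing genotypes are treated.

```python
MISSING = "00"  # missing-call code used in the slides

def heterozygosity_rate(genotypes):
    """Fraction of called genotypes whose two alleles differ."""
    called = [g for g in genotypes if g != MISSING]
    if not called:
        return None  # no usable genotypes for this sample
    heterozygous = sum(1 for g in called if g[0] != g[1])
    return heterozygous / len(called)

# SNP1..SNP8 on chromosome 1, as Copy 1 + Copy 2.
chromosome_1 = ["AA", "CC", "AT", "CC", "CG", "AA", "AA", "TT"]
rate = heterozygosity_rate(chromosome_1)
print(rate)  # 2 heterozygous SNPs out of 8 gives 0.25
```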

Sex check

Looking for mislabelled samples.

[Figure: females and males, each with good and bad samples marked. Male: 1 allele. Female: 2 alleles.]

Crossover between the X and Y chromosomes happens in the pseudoautosomal regions (PARs). SNPs in PARs are thus excluded from analysis.

Relatedness

Relatedness is a problem because it over-represents selected alleles, which will bias any multivariate analysis (correlated data!), e.g. PCA or multivariate regression.

Related samples need to be excluded or taken into account during subsequent analyses.

One metric of relatedness is Identity By Descent (IBD), which involves calculating the proportion of alleles shared between two individuals.

Prior to the calculation of IBD, SNPs with a low call rate are permanently excluded, while rare SNPs (MAF < 5%) and SNPs in Linkage Disequilibrium (LD) are temporarily excluded.
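As a rough illustration of the rare-SNP filter mentioned above, the following Python sketch computes the minor allele frequency (MAF) of a SNP from two-letter genotype strings and flags SNPs below 5%. The function name, the SNP identifiers and the genotypes are invented for the example, and each SNP is assumed to be biallelic; in practice a QC tool such as PLINK would normally do this step.

```python
from collections import Counter

MISSING = "00"  # missing-call code used in the slides

def minor_allele_frequency(genotypes):
    """MAF of one biallelic SNP; 0.0 if only one allele is observed."""
    counts = Counter()
    for g in genotypes:
        if g != MISSING:
            counts.update(g)  # "AC" contributes one A and one C
    total = sum(counts.values())
    if total == 0:
        return None  # every call is missing
    if len(counts) < 2:
        return 0.0   # monomorphic in this sample set
    return min(counts.values()) / total

# Hypothetical SNPs: one common, one rare (a single T among 40 alleles).
snps = {
    "snp_common": ["AC", "AA", "AC", "CC"],
    "snp_rare": ["GG"] * 19 + ["GT"],
}

mafs = {name: minor_allele_frequency(g) for name, g in snps.items()}
rare = [name for name, maf in mafs.items() if maf is not None and maf < 0.05]
print(mafs)
print("temporarily excluded before IBD:", rare)
```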

Relatedness / IBD

Relationship category     Relatedness
Monozygotic twins         1
Parent-Offspring          1/2
Full siblings             1/2
Grandparent-grandchild    1/4
Uncle/Aunt-Nephew/Niece   1/4
First cousins             1/8
Unrelated                 0

[Figure: chromosome segments that are completely identical, half-identical or not identical, for "me and my mom" and "me and my sister"]

Relatedness / IBD: a Ugandan cohort study as an example

[Figure: count of sample pairs against the maximum IBD for each sample. Annotations: threshold for temporary exclusion prior to HWE check; first cousins (12.5%) etc.; uncle-niece, aunt-nephew etc.; siblings, parent-child; duplicates, identical twins.]
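To make "proportion of alleles shared between two individuals" concrete, here is a deliberately naive Python sketch. It counts alleles that are identical by state (IBS), which is not the same as an IBD estimate: IBD estimators also use population allele frequencies, so the numbers below are not comparable with the relatedness values in the table. The function name is hypothetical; the genotypes are the Female 1 and Male 1 rows from the genotype table earlier in the deck.

```python
from collections import Counter

MISSING = "00"  # missing-call code used in the slides

def shared_allele_proportion(sample_a, sample_b):
    """Proportion of alleles shared by state over SNPs called in both samples."""
    shared = 0
    compared = 0
    for ga, gb in zip(sample_a, sample_b):
        if MISSING in (ga, gb):
            continue  # skip SNPs where either call is missing
        # Multiset intersection: "AG" vs "GA" shares 2 alleles, "AA" vs "AG" shares 1.
        overlap = Counter(ga) & Counter(gb)
        shared += sum(overlap.values())
        compared += 2
    return shared / compared if compared else None

female_1 = ["00", "AG", "GG", "GA", "00"]
male_1 = ["00", "GG", "GG", "AA", "CC"]

print(shared_allele_proportion(female_1, male_1))    # 4 of 6 alleles shared
print(shared_allele_proportion(female_1, female_1))  # a duplicate shares everything
```

Even unrelated people share many alleles by state, which is one reason the slides filter SNPs and rely on IBD rather than raw allele sharing.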

SNP Call Rate/Proportion

               SNP1  SNP2  SNP3  SNP4  SNP5
Sample1        00    AG    GG    GA    00
Sample2        00    GG    GG    AA    CC
Sample3        AC    00    GG    AA    CC
Sample4        AA    AG    GC    AA    CC
Sample5        AC    AA    00    AA    CA
SNP Call Rate  60%   80%   80%   100%  80%
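The SNP call rate is the same calculation as the sample call rate, applied to columns instead of rows. A minimal Python sketch, assuming the genotype table is held as a list of rows (one per sample) and using a hypothetical function name:

```python
MISSING = "00"  # missing-call code used in the slides

def snp_call_rates(rows, snp_names):
    """Fraction of samples with a non-missing call, for each SNP (column)."""
    rates = {}
    # zip(*rows) transposes the table so that each item is one SNP column.
    for name, column in zip(snp_names, zip(*rows)):
        called = sum(1 for g in column if g != MISSING)
        rates[name] = called / len(column)
    return rates

snp_names = ["SNP1", "SNP2", "SNP3", "SNP4", "SNP5"]
rows = [
    ["00", "AG", "GG", "GA", "00"],  # Sample1
    ["00", "GG", "GG", "AA", "CC"],  # Sample2
    ["AC", "00", "GG", "AA", "CC"],  # Sample3
    ["AA", "AG", "GC", "AA", "CC"],  # Sample4
    ["AC", "AA", "00", "AA", "CA"],  # Sample5
]

snp_rates = snp_call_rates(rows, snp_names)
for name, rate in snp_rates.items():
    print(f"{name}: {rate:.0%}")
```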

Hardy Weinberg Equilibrium

Random mating

                   Males
                   A (p)     C (q)
Females   A (p)    AA (p^2)  AC (pq)
          C (q)    AC (pq)   CC (q^2)

Genotypes and allele frequencies

      Sample1  Sample2  Sample3  Sample4  f(A)=p  f(C)=q=1-p
SNP1  AC       AA       AC       CC       4/8     4/8
SNP2  AA       AA       CC       CC       4/8     4/8
SNP3  AC       AC       AC       AC       4/8     4/8

Expected (fe) and observed (fo) genotype frequencies

      fe(AA)=p^2  fe(AC)=2pq  fe(CC)=q^2  fo(AA)  fo(AC)  fo(CC)
SNP1  1/4         2/4         1/4         1/4     2/4     1/4
SNP2  1/4         2/4         1/4         2/4     0/4     2/4
SNP3  1/4         2/4         1/4         0/4     4/4     0/4

When HWE does not apply
  • Non-random mating
  • Selection forces
  • Alleles in disease causing loci: apply HWE only to controls in a case-control study
  • Migration
  • Data quality
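The allele, expected and observed frequencies from the HWE example can be recomputed with the Python sketch below. The function name is made up, and the goodness-of-fit step is an assumption added for illustration: it uses a plain chi-square test with one degree of freedom, whereas QC software often uses an exact test instead, and four samples are far too few for the chi-square approximation to mean much.

```python
import math

def hwe_summary(genotypes, a="A", c="C"):
    """Allele frequencies, expected/observed genotype counts and a chi-square test."""
    n = len(genotypes)
    alleles = "".join(genotypes)
    p = alleles.count(a) / (2 * n)  # f(A)
    q = 1 - p                       # f(C)
    expected = {"AA": p * p * n, "AC": 2 * p * q * n, "CC": q * q * n}
    observed = {
        "AA": sum(1 for g in genotypes if g == a + a),
        "AC": sum(1 for g in genotypes if set(g) == {a, c}),
        "CC": sum(1 for g in genotypes if g == c + c),
    }
    chi2 = sum(
        (observed[k] - expected[k]) ** 2 / expected[k]
        for k in expected
        if expected[k] > 0
    )
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p, q, expected, observed, chi2, p_value

snps = {
    "SNP1": ["AC", "AA", "AC", "CC"],
    "SNP2": ["AA", "AA", "CC", "CC"],
    "SNP3": ["AC", "AC", "AC", "AC"],
}

results = {name: hwe_summary(g) for name, g in snps.items()}
for name, (p, q, expected, observed, chi2, p_value) in results.items():
    print(name, p, q, expected, observed, round(chi2, 3), round(p_value, 4))
```

SNP1 matches its expected counts exactly, while SNP2 (no heterozygotes) and SNP3 (only heterozygotes) both deviate from them.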

Principal Component Analysis

A statistical technique for summarizing many variables with minimal loss of information.

PCA requires clean, non-correlated data.

PCA can reveal population outliers.

PCA and Population Outliers

[Figure: PCA plot with groups labelled Africa, Europe, Asia and Study samples]

PCA can reveal both population outliers and population structure / confounding.
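To show how a first principal component can separate groups of samples, here is a small pure-Python sketch using power iteration. Everything in it is an assumption made for illustration: the function name, the coding of genotypes as allele counts (0, 1 or 2 copies of an arbitrary reference allele), and the four invented samples from two hypothetical populations. It is not how EIGENSTRAT works internally, which applies its own normalisation and handles far larger data.

```python
import math

def first_principal_component(genotype_counts, iterations=100):
    """Coordinate of each sample on PC1 (up to scale), via power iteration."""
    n_samples = len(genotype_counts)
    n_snps = len(genotype_counts[0])

    # Centre every SNP column on its mean.
    means = [sum(row[j] for row in genotype_counts) / n_samples for j in range(n_snps)]
    centred = [[row[j] - means[j] for j in range(n_snps)] for row in genotype_counts]

    # Sample-by-sample Gram matrix; its leading eigenvector orders samples along PC1.
    gram = [[sum(x * y for x, y in zip(ri, rj)) for rj in centred] for ri in centred]

    # Power iteration from a fixed start, so the result is deterministic.
    vector = [1.0] + [0.0] * (n_samples - 1)
    for _ in range(iterations):
        product = [sum(g * v for g, v in zip(row, vector)) for row in gram]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break  # degenerate input, e.g. the first sample sits at the column means
        vector = [x / norm for x in product]
    return vector

# Rows: two samples from hypothetical population 1, then two from population 2.
genotype_counts = [
    [2, 2, 2, 0, 0, 0],
    [2, 2, 1, 0, 0, 0],
    [0, 0, 0, 2, 2, 2],
    [0, 0, 0, 2, 2, 1],
]

pc1 = first_principal_component(genotype_counts)
print([round(x, 3) for x in pc1])
```

The first two samples land on one side of PC1 and the last two on the other, which is the kind of separation a PCA plot makes visible.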

Why population structure / confounding is a problem in a case-control study

[Figure: causal allele and non-causal allele. Source: Balding, Nature Reviews Genetics (2006)]

Confounding due to chip effects

[Figure: samples labelled EAST QUAD and EAST OCTO]

Confounding due to plate effects

[Figure: PC 1 against plate number]

Population structure - Inflated QQ-plot

[Figure: QQ-plot of the observed test statistic against the expected test statistic]

The genomic inflation factor (λ) is a measure of the deviation from the diagonal.

Simple But Effective QC: Common Thresholds

Sample QC
  Sample Call Rate                     >97%
  Autosomal Heterozygosity             mean ± 3 SD
  Sex / Gender                         PLINK default thresholds
  Identity By Descent (IBD)            <0.05
  Principal Component Analysis (PCA)   EIGENSTRAT, 6 SD, PCs 1-10

SNP QC
  SNP Call Rate/Proportion             >97%
  Hardy Weinberg Equilibrium (HWE)     p > 10^-4
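Two of the sample-level thresholds above (call rate > 97% and heterozygosity within mean ± 3 SD) can be applied as in the Python sketch below. The function name, the sample IDs and the metric values are all invented for the example, and computing the heterozygosity mean and SD over every input sample (rather than only those that pass the call-rate filter) is an assumption.

```python
import statistics

def passing_samples(metrics, min_call_rate=0.97, n_sd=3.0):
    """IDs of samples passing the call-rate and heterozygosity filters."""
    hets = [m["het"] for m in metrics.values()]
    mean = statistics.mean(hets)
    sd = statistics.stdev(hets)  # sample standard deviation; needs >= 2 samples
    keep = []
    for sample_id, m in metrics.items():
        call_ok = m["call_rate"] > min_call_rate
        het_ok = abs(m["het"] - mean) <= n_sd * sd
        if call_ok and het_ok:
            keep.append(sample_id)
    return keep

# Hypothetical cohort of 20 samples with typical metrics.
metrics = {f"S{i}": {"call_rate": 0.99, "het": 0.30} for i in range(20)}
metrics["S0"]["call_rate"] = 0.95  # fails the call-rate threshold
metrics["S19"]["het"] = 0.45       # heterozygosity outlier (possible contamination)

kept = passing_samples(metrics)
print(len(kept), "of", len(metrics), "samples kept")
```

With only a handful of samples no value can lie more than 3 SD from the mean, so a filter like this only becomes meaningful at realistic cohort sizes.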

Useful software
  • PLINK (QC): http://pngu.mgh.harvard.edu/~purcell/plink
  • EIGENSTRAT (PCA): http://genetics.med.harvard.edu/reich/Reich_Lab/Software.html
  • GEMMA (Association): http://home.uchicago.edu/xz7/software
  • SNPTEST (Association): https://mathgen.stats.ox.ac.uk/genetics_software/snptest/snptest.html
  • shellfish (parallel PCA and data processing for genome-wide SNP data): http://www.stats.ox.ac.uk/~davidson/software/shellfish/shellfish.php

Thank you for your attention!

Deepti Gurdasani
Liz Young
Katherine Ripullone
