  • Exome Sequencing reads alignment outside capture regions

    For Illumina exome sequencing (paired-end, single lane) with the Agilent SureSelect protocol, does anyone know what fraction of reads you should expect from the non-exonic part of the genome? I found that a large portion, 30% of reads, aligned outside the capture region. If you have worked with exome sequencing, is this a reasonable number?

    Thanks

  • #2
    That is reasonable; we see ~40% off target. Perhaps this number changes with their latest 50 Mb capture kit!
    --
    bioinfosm

    • #3
      Same here. Using SOLiD, 65-70% of bases (of mappable reads) are on target.

      • #4
        Are the reads aligned outside the capture region really from non-captured regions, or are they just misaligned to those places?

        • #5
          Out of curiosity, how are you defining 'off target' vs. 'on target' reads? Do you mean reads that fall entirely within the capture sequence (i.e. the boundaries of each probe), or that overlap the target region by at least one base, or that are within distance X of a target, etc.?
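
To make the three candidate definitions in the question concrete, here is a minimal Python sketch. It is not from anyone in the thread: the function name `classify_read`, the 0-based half-open (BED-style) coordinates, and the default distance of 100 bp are all assumptions chosen for illustration.

```python
def classify_read(read_start, read_end, targets, max_distance=100):
    """Report which on-target definitions a single read satisfies.

    Coordinates are assumed 0-based, half-open (BED style). `targets` is a
    list of (start, end) tuples on the same chromosome as the read.
    """
    # Definition 1: the read lies entirely inside one target interval.
    contained = any(t_start <= read_start and read_end <= t_end
                    for t_start, t_end in targets)
    # Definition 2: the read shares at least one base with a target.
    overlaps = any(read_start < t_end and t_start < read_end
                   for t_start, t_end in targets)
    # Definition 3: the gap to the nearest target is at most max_distance
    # (the gap is 0 when the read overlaps or abuts the target).
    near = any(max(t_start - read_end, read_start - t_end, 0) <= max_distance
               for t_start, t_end in targets)
    return {"contained": contained, "overlaps": overlaps, "near": near}


# Hypothetical single 120 bp probe and four 50 bp reads.
example_targets = [(1000, 1120)]
inside = classify_read(1010, 1060, example_targets)
straddling = classify_read(1100, 1150, example_targets)
nearby = classify_read(1200, 1250, example_targets)
distant = classify_read(2000, 2050, example_targets)
```

The same read can be on target under one definition and off target under another, which is why reported on-target rates are hard to compare unless the definition is stated.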

          • #6
            I counted individual aligned bases inside and outside the capture region. I suppose I could also just count the start position of each read; it wouldn't make much difference.

            • #7
              I pad the target regions by ~80-100 bp and then intersect them with my BAM files using bedtools:

              Code:
              intersectBed -wa -abam alignments.bam -b target_regions.100bp_padded.bed > alignments.ontarget.bam
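
The padded BED file used in the command above has to be produced somehow. As a rough sketch of that step, assuming BED-style 0-based half-open intervals, a fixed pad of 100 bp on each side, and an optional, hypothetical `chrom_sizes` dictionary for clipping at chromosome ends, padding and merging could look like this in Python (the function name `pad_and_merge` is made up for this example):

```python
def pad_and_merge(intervals, pad=100, chrom_sizes=None):
    """Pad (chrom, start, end) intervals on both sides and merge overlaps."""
    padded = []
    for chrom, start, end in intervals:
        new_start = max(0, start - pad)  # never go below position 0
        new_end = end + pad
        if chrom_sizes is not None and chrom in chrom_sizes:
            # Clip at the chromosome end when its length is known.
            new_end = min(new_end, chrom_sizes[chrom])
        padded.append((chrom, new_start, new_end))

    # Sort by chromosome, then start, so overlapping intervals are adjacent.
    padded.sort()
    merged = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Hypothetical 120 bp probes; the first two come to overlap once padded.
example_probes = [("chr7", 50, 170), ("chr7", 300, 420), ("chr7", 1000, 1120)]
padded_targets = pad_and_merge(example_probes, pad=100)
```

Merging after padding matters because neighbouring probes often overlap once padded, and unmerged overlaps would count the same bases twice. bedtools also ships `slopBed` and `mergeBed` for the same job on real files.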

              • #8
                Thanks zee. That's what I was wondering. Without the padding it seems like you could substantially overestimate the rate of off-target hits. And since the hybridization is going to grab DNA fragments that only partially overlap the target regions, you expect a fair amount of off-target sequence. When evaluating the success of the enrichment, aligned bases within ~80-100 bases of the target region are quite different from bases that are very distant from it.
                Last edited by malachig; 09-16-2010, 10:24 PM.

                • #9
                  I picked up this hint from the Bainbridge et al., 2010 paper on whole exome capture sequencing. There is a section in the Materials and Methods which contains:

                  "Target exons were padded to a minimum length of 80 bp, and consolidated to remove redundant overlaps."
                  It makes sense to allow for some padding around the target region. Even with their protocol they only recovered at most 51% for SOLiD and 78% with Illumina PE.

                  • #10
                    Originally posted by malachig
                    Out of curiosity, how are you defining 'off target' vs. 'on target' reads? Do you mean reads that fall entirely within the capture sequence (i.e. the boundaries of each probe), or that overlap the target region by at least one base, or that are within distance X of a target, etc.?

                    We do not use "on-target reads" but "on-target bases": from all mapped reads, we determine the percentage of bases that are on target. So if 10% of a read overlaps the target region, it contributes 10% of its bases, and if it overlaps almost completely, at 95%, it contributes 95%. That seems a more honest definition of "on target" to me than some cutoff (overlap or distance)...
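
The per-base definition described here can be sketched in a few lines of Python. This is an illustration under assumptions rather than the poster's actual pipeline: reads and targets are taken to be (start, end) tuples on a single chromosome in 0-based half-open coordinates, the targets are assumed to be already merged so no base is counted twice, and the name `on_target_base_fraction` is invented for the example.

```python
def on_target_base_fraction(reads, targets):
    """Fraction of aligned bases that fall inside the target intervals.

    `targets` must be non-overlapping (merge them first), otherwise bases
    covered by two targets would be counted twice.
    """
    total_bases = 0
    on_target_bases = 0
    for r_start, r_end in reads:
        total_bases += r_end - r_start
        for t_start, t_end in targets:
            # Length of the shared segment; zero or negative means no overlap.
            overlap = min(r_end, t_end) - max(r_start, t_start)
            if overlap > 0:
                on_target_bases += overlap
    return on_target_bases / total_bases if total_bases else 0.0


# Hypothetical 100 bp target and two 100 bp reads overlapping it by 10 and 95 bases.
example_targets = [(100, 200)]
ten_percent = on_target_base_fraction([(190, 290)], example_targets)
ninety_five_percent = on_target_base_fraction([(105, 205)], example_targets)
combined = on_target_base_fraction([(190, 290), (105, 205)], example_targets)
```

With the two example reads, 105 of 200 aligned bases are on target, so each read contributes in proportion to its overlap rather than being counted as fully on or fully off. The nested loop is quadratic and only meant to show the arithmetic; real data would call for sorted intervals or a tool such as bedtools.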

                    • #11
                      Percentage of coverage for each chromosome - Agilent design

                      Hi there,
                      Does anyone have some numbers on % of coverage for each individual chromosome? I got some weird results for chr7. Looking at the probe design file that Agilent provided (based on hg18), I saw that the last coordinates for chr7 were:
                      chr7 158828612 158828732
                      chr7 158829397 158829517
                      chr7 158829517 158829637
                      chr7 158835676 158835796
                      chr7 158835748 158835868
                      chr7 158851160 158851280
                      chr7 158896436 158896556
                      chr7 158902496 158902616
                      chr7 158935127 158935247
                      chr7 158937377 158937497

                      whereas the length of chr7 in hg18 is 158821424.
                      Was the whole-exome capture kit (the 38 Mb one) designed on the hg18 or the hg19 human reference? We were told it was hg18.
                      I'm very confused...

                      Regards,

                      S.

                      • #12
                        Originally posted by Sheila
                        Hi there,
                        chr7 158937377 158937497

                        whereas the length of chr7 in hg18 was 158821424
                        Was the whole-exome capture kit (the 38 Mb one) designed on the hg18 or the hg19 human reference? We were told it was hg18.

                        It was designed against hg19. From my hg19 annotation file:
                        chr7 158937377 158937497 A_36_B106723 1000 +

                        You can get the original file from the Agilent eArray site:
                        earray.chem.agilent.com/earray/
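
A quick sanity check for this kind of build mix-up is to test whether any probe ends beyond the end of the chromosome in the assumed reference. The Python sketch below uses the ten chr7 probe coordinates and the hg18 chr7 length quoted earlier in the thread. The hg19 chr7 length (159138663) is not from the thread; it is the value from the UCSC hg19 chromosome sizes and should be checked against your own reference index. The function name `intervals_past_end` is made up for this example.

```python
HG18_CHR7_LENGTH = 158821424  # quoted in the thread
HG19_CHR7_LENGTH = 159138663  # assumption: UCSC hg19 chr7 size, not from the thread

# Last ten chr7 probes from the design file, as posted above.
chr7_probes = [
    (158828612, 158828732), (158829397, 158829517), (158829517, 158829637),
    (158835676, 158835796), (158835748, 158835868), (158851160, 158851280),
    (158896436, 158896556), (158902496, 158902616), (158935127, 158935247),
    (158937377, 158937497),
]


def intervals_past_end(intervals, chrom_length):
    """Return the intervals whose end coordinate exceeds the chromosome length."""
    return [(start, end) for start, end in intervals if end > chrom_length]


impossible_in_hg18 = intervals_past_end(chr7_probes, HG18_CHR7_LENGTH)
impossible_in_hg19 = intervals_past_end(chr7_probes, HG19_CHR7_LENGTH)
```

All ten probes run past the end of chr7 in hg18 and none do in hg19, which agrees with the answer that the design is on hg19. Passing this check does not prove a file matches a build; it only rules out builds in which the coordinates cannot exist.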

                        • #13
                          Originally posted by Sheila
                          Hi there,
                          Does anyone have some numbers on % of coverage for each individual chromosome? I got some weird results for chr7. Looking at the file Agilent provided us with the probe design (based on hg18) I saw that the last coordinates for chr7 were
                          chr7 158828612 158828732
                          chr7 158829397 158829517
                          ...

                          S.
                          Hi Sheila,

                          I contacted Agilent, but they didn't send me the reference file for the exome. From your post, it seems they did send it to you, so could you please send me your copy? As I understand it, you used the whole-exome capture kit (the 38 Mb one) in your analysis.

                          Thanks
