  • tonio100680
    Member
    • Apr 2010
    • 25

    % On-Off target

    Hello,
    I'm looking for a tool (or command line) to determine the % on/off target, counting anything within ±50 bp of an exon as on target, from my capture file, which is not annotated (!!!). The capture is a custom (in-house) Agilent SureSelect design.

    Current pipeline:
    GAIIx Illumina
    CASAVA1.8
    IGV
    CNV-seq
    SAMtools
    BEDtools
    GALAXY
    NextGENe

    Please HELP
  • Dameon
    Member
    • Dec 2011
    • 14

    #2
    Create a BED file of your Agilent SureSelect targets, use BEDtools to merge adjacent targets (mergeBed), and then use slopBed to add 50 bp to either side of the merged targets. Then use BEDtools bamToBed to convert your BAM file to a BED file, and use intersectBed to intersect your bam.bed with the target.bed. This produces a BED file of the target regions covered by your BAM file, which you can then parse for percent on and off target. I think there may be examples of this workflow in the BEDtools manual available online.
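
To make the logic of that workflow concrete, here is a minimal pure-Python sketch of the same idea: pad each target, merge the padded targets, and count the reads that overlap them. It is only an illustration, not what BEDtools does internally. The function names and the toy coordinates are made up for the example, intervals are assumed to be BED-style (0-based start, end-exclusive), and unlike slopBed (which takes a genome file of chromosome sizes) it only clamps at position 0.

```python
from collections import defaultdict


def pad_and_merge(targets, pad=50):
    """Pad each (chrom, start, end) interval by `pad` bp, then merge overlaps."""
    by_chrom = defaultdict(list)
    for chrom, start, end in targets:
        # Clamp at 0 so padding never runs off the start of the chromosome.
        # (Chromosome ends are NOT clipped here; that would need chromosome sizes.)
        by_chrom[chrom].append((max(0, start - pad), end + pad))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:  # overlaps or touches the previous interval
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [tuple(iv) for iv in out]
    return merged


def percent_on_target(reads, merged):
    """Percentage of reads that overlap any padded target by at least 1 bp."""
    if not reads:
        return 0.0
    on = 0
    for chrom, start, end in reads:
        if any(start < t_end and end > t_start
               for t_start, t_end in merged.get(chrom, [])):
            on += 1
    return 100.0 * on / len(reads)


# Toy data, invented for illustration only.
targets = [("chr1", 1000, 1200), ("chr1", 1250, 1400), ("chr2", 500, 600)]
reads = [
    ("chr1", 960, 1010),   # hangs off the padded edge: on target
    ("chr1", 1210, 1240),  # sits in the gap closed by padding: on target
    ("chr1", 5000, 5050),  # far from any target: off target
    ("chr2", 700, 750),    # beyond the 50 bp pad: off target
]

merged = pad_and_merge(targets, pad=50)
print(merged["chr1"])                    # [(950, 1450)]
print(percent_on_target(reads, merged))  # 50.0
```

The sketch pads before merging, so targets whose padded edges run into each other end up as one region and no read is counted twice. For real data the linear scan over targets would be far too slow; that is the part BEDtools handles for you.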

    Comment

    • laura
      Senior Member
      • Sep 2008
      • 151

      #3
      You may find Picard's CalculateHsMetrics useful

      Comment

      • swbarnes2
        Senior Member
        • May 2008
        • 910

        #4
        BEDTools intersectBed will work with BAM files and output BAM files. I use it that way all the time. Your command line would look something like:

        intersectBed -abam yourbam.bam -b paddedExometarget.bed > intersect.bam
        I just used Excel to change the coordinates in the target .bed file, to pad them. I'm not sure that's necessary, since intersectBed will get reads that are hanging off of your target regions.

        So then use samtools flagstat to count the number of mapped reads of the original .bam, and then of the intersect.bam
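
As a rough sketch of that last step, the two flagstat reports can be turned into a percentage with a few lines of Python. The helper names and the numbers below are invented for illustration, and the parser assumes the usual flagstat layout where the mapped count appears on a line like `N + M mapped (...)`. Keep in mind that flagstat counts alignment records, so the result is only as clean as the BAM files going in.

```python
import re


def mapped_reads(flagstat_text):
    """Return the QC-passed mapped count from `samtools flagstat` output."""
    for line in flagstat_text.splitlines():
        # Matches e.g. "950000 + 0 mapped (95.00%:-nan%)" but not lines such as
        # "... primary mapped (...)" or "... with itself and mate mapped".
        match = re.match(r"\s*(\d+) \+ (\d+) mapped \(", line)
        if match:
            return int(match.group(1))
    raise ValueError("no 'mapped' line found in flagstat output")


def percent_on_target(original_flagstat, intersect_flagstat):
    """Mapped reads in the intersect BAM as a percentage of the original BAM."""
    total = mapped_reads(original_flagstat)
    if total == 0:
        return 0.0
    return 100.0 * mapped_reads(intersect_flagstat) / total


# Made-up flagstat excerpts, for illustration only.
original = """1000000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
950000 + 0 mapped (95.00%:-nan%)
"""
intersect = """665000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
665000 + 0 mapped (100.00%:-nan%)
"""

pct_on = percent_on_target(original, intersect)
print(f"on target: {pct_on:.1f}%, off target: {100 - pct_on:.1f}%")
# on target: 70.0%, off target: 30.0%
```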

        Comment

        • gwilymh
          Member
          • Dec 2011
          • 72

          #5
          Originally posted by laura View Post
          You may find Picard's CalculateHsMetrics useful

          http://picard.sourceforge.net/comman...ulateHsMetrics
          What is the difference between the BAIT_INTERVALS and TARGET_INTERVALS files?

          Comment

          • laura
            Senior Member
            • Sep 2008
            • 151

            #6
            Originally posted by gwilymh View Post
            What is the difference between the BAIT_INTERVALS and TARGET_INTERVALS files?
            To be honest we use the same file for both

            I suspect that when you have files from your pull-down supplier there may be some subtle differences, but I don't think it matters too hugely

            Comment

            • Jon_Keats
              Senior Member
              • Mar 2010
              • 279

              #7
              Picard HsMetrics is designed by the Broad, which helped develop the Agilent in solution capture method. In the case of Agilent they provide positions for both the baits and the target regions.

              Think:
              Code:
                     1---------Target------------1
              ------IIIIIIIIIIII  exon  IIIIIIIIIIIIII--------
                   xbaitx         ybaity          zbaitz

              Unfortunately, Illumina only provides the target regions, not the actual bait locations, so it is harder to decide whether non-exonic reads are uncaptured flow-through or captured regions that are not in the "official" targets. In my opinion, Illumina has clearly captured entire 5 kb regions that include 3 exons totaling 1 kb, resulting in 4 kb of high-coverage intronic sequence. I'm not sure whether the designer was just lazy, has ulterior motives, or has some internal data showing that this gives the best target recovery.
              Last edited by Jon_Keats; 01-20-2012, 08:18 PM.
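
To put numbers on the bait versus target picture, here is a small Python sketch that compares the two interval sets for a single exon. The coordinates are hypothetical (three 120 bp baits tiled over a 300 bp exon), the function names are made up, and intervals are assumed to be 0-based and end-exclusive on one chromosome.

```python
def merge(intervals):
    """Merge overlapping or touching (start, end) intervals on one chromosome."""
    out = []
    for start, end in sorted(intervals):
        if out and start <= out[-1][1]:
            out[-1] = (out[-1][0], max(out[-1][1], end))
        else:
            out.append((start, end))
    return out


def covered_bases(targets, baits):
    """Number of target bases that lie under at least one bait."""
    merged_baits = merge(baits)
    total = 0
    for t_start, t_end in merge(targets):
        for b_start, b_end in merged_baits:
            overlap = min(t_end, b_end) - max(t_start, b_start)
            if overlap > 0:
                total += overlap
    return total


# Hypothetical exon and baits, invented for illustration only.
exon = [(1000, 1300)]
baits = [(960, 1080), (1090, 1210), (1220, 1340)]

target_bp = sum(end - start for start, end in merge(exon))   # 300
bait_bp = sum(end - start for start, end in merge(baits))    # 360
on_both = covered_bases(exon, baits)                          # 280

print(f"target bases under a bait: {100 * on_both / target_bp:.1f}%")  # 93.3%
print(f"bait bases outside the target: {bait_bp - on_both}")           # 80
```

In this toy layout the baits hang 40 bp off each end of the exon and leave two 10 bp gaps inside it, which is the kind of mismatch between bait and target files being described here.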

              Comment

              • gwilymh
                Member
                • Dec 2011
                • 72

                #8
                Originally posted by Jon_Keats View Post
                Picard HsMetrics is designed by the Broad, which helped develop the Agilent in solution capture method. In the case of Agilent they provide positions for both the baits and the target regions.

                Think:
                Code:
                       1---------Target------------1
                ------IIIIIIIIIIII  exon  IIIIIIIIIIIIII--------
                     xbaitx         ybaity          zbaitz

                Unfortunately, Illumina only provides the target regions, not the actual bait locations, so it is harder to decide whether non-exonic reads are uncaptured flow-through or captured regions that are not in the "official" targets. In my opinion, Illumina has clearly captured entire 5 kb regions that include 3 exons totaling 1 kb, resulting in 4 kb of high-coverage intronic sequence. I'm not sure whether the designer was just lazy, has ulterior motives, or has some internal data showing that this gives the best target recovery.
                A colleague of mine directed me to the UCSC Genome Browser Tables page (http://genome.ucsc.edu/cgi-bin/hgTables?org=human), which can be used to look up a gene by name and find where it is located on a given genome assembly.

                Comment

                • gwilymh
                  Member
                  • Dec 2011
                  • 72

                  #9
                  Originally posted by laura View Post
                  You may find Picard's CalculateHsMetrics useful

                  http://picard.sourceforge.net/comman...ulateHsMetrics
                  Can Picard be used on a Windows system?

                  Comment

                  • gwilymh
                    Member
                    • Dec 2011
                    • 72

                    #10
                    Originally posted by swbarnes2 View Post
                    ...So then use samtools flagstat to count the number of mapped reads of the original .bam, and then of the intersect.bam
                    Does anyone know the specific commands to compile and execute flagstat in samtools? The samtools literature is frustratingly vague!

                    (I am using samtools in Cygwin in a Windows 7 system)

                    Comment

                    • swbarnes2
                      Senior Member
                      • May 2008
                      • 910

                      #11
                      samtools flagstat my_data.bam
                      You have to get samtools installed first. I think you just run make. On my install, I think I had a slight problem with the curses file; an admin at my work suggested I tweak that line in the makefile slightly, and it works fine now.

                      Comment

                      • ECO
                        --Site Admin--
                        • Oct 2007
                        • 1360

                        #12
                        Another vote for Picard's CalculateHsMetrics. It's in the public Galaxy (http://main.g2.bx.psu.edu/ under "NGS: Picard (beta)").

                        Comment

                        • Heisman
                          Senior Member
                          • Dec 2010
                          • 534

                          #13
                          Picard by default does +/- 250 bp. I recommend using the same file for baits and targets: you can have baits that extend past targets and hence get more coverage for baits than for target if you had all of your target sequence covered by baits. If you had say only 80% of your targets covered by baits, then you already know this, and it just complicates things to try to consider it again. So, again, I recommend using the actual bait intervals if possible.
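
One way to picture what a +/- 250 bp window means is the sketch below, which labels a read as on, near, or off bait by its distance to the closest bait. This is an illustration only, not Picard's code: it works on whole reads (to my knowledge Picard counts bases), the function name and coordinates are made up, and `near_distance` is just a parameter of this sketch. I believe recent Picard releases expose a similar setting as NEAR_DISTANCE, but check the documentation for your version.

```python
def classify(read, baits, near_distance=250):
    """Label a (start, end) read as 'on', 'near' or 'off' relative to baits.

    `baits` is a list of (start, end) intervals on the same chromosome,
    0-based and end-exclusive, like the read itself.
    """
    r_start, r_end = read
    closest = None
    for b_start, b_end in baits:
        if r_start < b_end and r_end > b_start:
            return "on"  # overlaps a bait by at least 1 bp
        # Size of the gap between the read and this bait (0 if they touch).
        gap = b_start - r_end if r_end <= b_start else r_start - b_end
        closest = gap if closest is None else min(closest, gap)
    if closest is not None and closest <= near_distance:
        return "near"
    return "off"


# Hypothetical bait and reads, invented for illustration only.
baits = [(1000, 1120)]
reads = [(1050, 1150), (1200, 1300), (1360, 1460), (1400, 1500)]

print([classify(r, baits) for r in reads])
# ['on', 'near', 'near', 'off']

# Widening the window to 300 bp pulls the last read into the 'near' class.
print(classify((1400, 1500), baits, near_distance=300))  # near
```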

                          Comment

                          • gwilymh
                            Member
                            • Dec 2011
                            • 72

                            #14
                            Originally posted by Heisman View Post
                            Picard by default does +/- 250 bp. I recommend using the same file for baits and targets: you can have baits that extend past targets and hence get more coverage for baits than for target if you had all of your target sequence covered by baits. If you had say only 80% of your targets covered by baits, then you already know this, and it just complicates things to try to consider it again. So, again, I recommend using the actual bait intervals if possible.
                            I want to verify that picard does indeed use a +/- 250 bp around each target/bait, but have so far not been able to find this written down anywhere. Where did you come by this information?

                            Also, can the interval be modified (to, say, +/- 300bp)?

                            Comment

                            • Heisman
                              Senior Member
                              • Dec 2010
                              • 534

                              #15
                              Somewhere there is a web page that has the code for each of the programs... I stumbled on it before and am not really a computer guy so don't know where to find it. But it had 250 set for that metric.

                              Comment
