Announcement

Collapse
No announcement yet.

% On-Off target

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • % On-Off target

    hello,
    I'm looking for a tool (or command line) to determine the % On-Off target + or - 50 bp of exon from my capture file but not annotated (!!!). Capture SureSelect agilent home.

    Current pipeline:
    GAIIx Illumina
    CASAVA1.8
    IGV
    CNV-seq
    SAMtools
    BEDtools
    GALAXY
    NextGENe

    Please HELP

  • #2
    Create a bed file of your Agilent SureSelect targets and use BEDtools to merge adjacent targets and then slopBed to add 50 bps to either side of your merged targets. Then use Bedtools BedtoBam to convert your bam file to a bed file and then use intersectBed to create an intersection of your bam.bed and the target.bed. This will create a bed file illustrating the target regions covered by your bam file which you can then parse for percent on and off target. I think there may be examples of this workflow in the BEDtools manual available online.

    Comment


    • #3
      You may find picards CalculateHsMetrics useful

      http://picard.sourceforge.net/comman...ulateHsMetrics

      Comment


      • #4
        BEDTools intersectBed will work with bam files, and output .bam files. I use it that way all the time. Your command line look something like:

        intersectBed -abam yourbam.bam -b paddedExometarget.bed > intersect.bam
        I just used Excel to change the coordinates in the target .bed file, to pad them. I'm not sure that's necessry, since intersectBed will get reads that are hanging off of your target regions.

        So then use samtools flagstat to count the number of mapped reads of the original .bam, and then of the intersect.bam

        Comment


        • #5
          Originally posted by laura View Post
          You may find picards CalculateHsMetrics useful

          http://picard.sourceforge.net/comman...ulateHsMetrics
          What is the difference between the BAIT_INTERVALS and TARGET_INTERVALS files?

          Comment


          • #6
            Originally posted by gwilymh View Post
            What is the difference between the BAIT_INTERVALS and TARGET_INTERVALS files?
            To be honest we use the same file for both

            I suspect that when you have files from your pull down supplier there may be some subtle differences but I don't think it matters to hugely

            Comment


            • #7
              Picard HsMetrics is designed by the Broad, which helped develop the Agilent in solution capture method. In the case of Agilent they provide positions for both the baits and the target regions.

              Think:
              Code:
                     1---------Target------------1
              ------IIIIIIIIIIII  exon  IIIIIIIIIIIIII--------
                   xbaitx         ybaity          zbaitz

              Unfortunately, Illumina only provides the target regions not the actual bait locations. So it is harder to decide if some of the non-exonic reads are uncaptured flow-through or captured regions that are not in the "official" targets. Clearly, in my opinion Illumina has captured entire 5kb regions that include 3 exons totaling 1kb resulting in 4kb of high coverage intronic region. Not sure if the designer was just lazy, has ulterior motives, some internal data that this makes target recovery the best?
              Last edited by Jon_Keats; 01-20-2012, 08:18 PM.

              Comment


              • #8
                Originally posted by Jon_Keats View Post
                Picard HsMetrics is designed by the Broad, which helped develop the Agilent in solution capture method. In the case of Agilent they provide positions for both the baits and the target regions.

                Think:
                Code:
                       1---------Target------------1
                ------IIIIIIIIIIII  exon  IIIIIIIIIIIIII--------
                     xbaitx         ybaity          zbaitz

                Unfortunately, Illumina only provides the target regions not the actual bait locations. So it is harder to decide if some of the non-exonic reads are uncaptured flow-through or captured regions that are not in the "official" targets. Clearly, in my opinion Illumina has captured entire 5kb regions that include 3 exons totaling 1kb resulting in 4kb of high coverage intronic region. Not sure if the designer was just lazy, has ulterior motives, some internal data that this makes target recovery the best?
                A colleague of mine directed me to the UCSC Genome Browser Tables page (http://genome.ucsc.edu/cgi-bin/hgTables?org=human), which can be used to search gene names against genome and chromosomes to identify where on a given genome assembly a specific gene is located.

                Comment


                • #9
                  Originally posted by laura View Post
                  You may find picards CalculateHsMetrics useful

                  http://picard.sourceforge.net/comman...ulateHsMetrics
                  Can Picard be used on a Windows system?

                  Comment


                  • #10
                    Originally posted by swbarnes2 View Post
                    ...So then use samtools flagstat to count the number of mapped reads of the original .bam, and then of the intersect.bam
                    Does anyone know the specific commands to compile and execute flagstat in samtools? The samtools literature is frustratingly vague!

                    (I am using samtools in Cygwin in a Windows 7 system)

                    Comment


                    • #11
                      samtools flagstat my_data.bam
                      You have to get samtools installed. I think you just do a make command. On my install, I think I had a slight problem with the curses file, someone admin at my work suggested I tweak that line in the make file slightly, and it works fine now.l

                      Comment


                      • #12
                        Another vote for Picard's CalculateHsMetrics. It's in the public Galaxy (http://main.g2.bx.psu.edu/ under "NGS: Picard (beta)").

                        Comment


                        • #13
                          Picard by default does +/- 250 bp. I recommend using the same file for baits and targets: you can have baits that extend past targets and hence get more coverage for baits than for target if you had all of your target sequence covered by baits. If you had say only 80% of your targets covered by baits, then you already know this, and it just complicates things to try to consider it again. So, again, I recommend using the actual bait intervals if possible.

                          Comment


                          • #14
                            Originally posted by Heisman View Post
                            Picard by default does +/- 250 bp. I recommend using the same file for baits and targets: you can have baits that extend past targets and hence get more coverage for baits than for target if you had all of your target sequence covered by baits. If you had say only 80% of your targets covered by baits, then you already know this, and it just complicates things to try to consider it again. So, again, I recommend using the actual bait intervals if possible.
                            I want to verify that picard does indeed use a +/- 250 bp around each target/bait, but have so far not been able to find this written down anywhere. Where did you come by this information?

                            Also, can the interval be modified (to, say, +/- 300bp)?

                            Comment


                            • #15
                              Somewhere there is a web page that has the code for each of the programs... I stumbled on it before and am not really a computer guy so don't know where to find it. But it had 250 set for that metric.

                              Comment

                              Working...
                              X