Header Leaderboard Ad

Collapse

% On-Off target

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • % On-Off target

    hello,
    I'm looking for a tool (or command line) to determine the % On-Off target + or - 50 bp of exon from my capture file but not annotated (!!!). Capture SureSelect agilent home.

    Current pipeline:
    GAIIx Illumina
    CASAVA1.8
    IGV
    CNV-seq
    SAMtools
    BEDtools
    GALAXY
    NextGENe

    Please HELP

  • #2
    Create a bed file of your Agilent SureSelect targets and use BEDtools to merge adjacent targets and then slopBed to add 50 bps to either side of your merged targets. Then use Bedtools BedtoBam to convert your bam file to a bed file and then use intersectBed to create an intersection of your bam.bed and the target.bed. This will create a bed file illustrating the target regions covered by your bam file which you can then parse for percent on and off target. I think there may be examples of this workflow in the BEDtools manual available online.

    Comment


    • #3
      You may find picards CalculateHsMetrics useful

      http://picard.sourceforge.net/comman...ulateHsMetrics

      Comment


      • #4
        BEDTools intersectBed will work with bam files, and output .bam files. I use it that way all the time. Your command line look something like:

        intersectBed -abam yourbam.bam -b paddedExometarget.bed > intersect.bam
        I just used Excel to change the coordinates in the target .bed file, to pad them. I'm not sure that's necessry, since intersectBed will get reads that are hanging off of your target regions.

        So then use samtools flagstat to count the number of mapped reads of the original .bam, and then of the intersect.bam

        Comment


        • #5
          Originally posted by laura View Post
          You may find picards CalculateHsMetrics useful

          http://picard.sourceforge.net/comman...ulateHsMetrics
          What is the difference between the BAIT_INTERVALS and TARGET_INTERVALS files?

          Comment


          • #6
            Originally posted by gwilymh View Post
            What is the difference between the BAIT_INTERVALS and TARGET_INTERVALS files?
            To be honest we use the same file for both

            I suspect that when you have files from your pull down supplier there may be some subtle differences but I don't think it matters to hugely

            Comment


            • #7
              Picard HsMetrics is designed by the Broad, which helped develop the Agilent in solution capture method. In the case of Agilent they provide positions for both the baits and the target regions.

              Think:
              Code:
                     1---------Target------------1
              ------IIIIIIIIIIII  exon  IIIIIIIIIIIIII--------
                   xbaitx         ybaity          zbaitz

              Unfortunately, Illumina only provides the target regions not the actual bait locations. So it is harder to decide if some of the non-exonic reads are uncaptured flow-through or captured regions that are not in the "official" targets. Clearly, in my opinion Illumina has captured entire 5kb regions that include 3 exons totaling 1kb resulting in 4kb of high coverage intronic region. Not sure if the designer was just lazy, has ulterior motives, some internal data that this makes target recovery the best?
              Last edited by Jon_Keats; 01-20-2012, 08:18 PM.

              Comment


              • #8
                Originally posted by Jon_Keats View Post
                Picard HsMetrics is designed by the Broad, which helped develop the Agilent in solution capture method. In the case of Agilent they provide positions for both the baits and the target regions.

                Think:
                Code:
                       1---------Target------------1
                ------IIIIIIIIIIII  exon  IIIIIIIIIIIIII--------
                     xbaitx         ybaity          zbaitz

                Unfortunately, Illumina only provides the target regions not the actual bait locations. So it is harder to decide if some of the non-exonic reads are uncaptured flow-through or captured regions that are not in the "official" targets. Clearly, in my opinion Illumina has captured entire 5kb regions that include 3 exons totaling 1kb resulting in 4kb of high coverage intronic region. Not sure if the designer was just lazy, has ulterior motives, some internal data that this makes target recovery the best?
                A colleague of mine directed me to the UCSC Genome Browser Tables page (http://genome.ucsc.edu/cgi-bin/hgTables?org=human), which can be used to search gene names against genome and chromosomes to identify where on a given genome assembly a specific gene is located.

                Comment


                • #9
                  Originally posted by laura View Post
                  You may find picards CalculateHsMetrics useful

                  http://picard.sourceforge.net/comman...ulateHsMetrics
                  Can Picard be used on a Windows system?

                  Comment


                  • #10
                    Originally posted by swbarnes2 View Post
                    ...So then use samtools flagstat to count the number of mapped reads of the original .bam, and then of the intersect.bam
                    Does anyone know the specific commands to compile and execute flagstat in samtools? The samtools literature is frustratingly vague!

                    (I am using samtools in Cygwin in a Windows 7 system)

                    Comment


                    • #11
                      samtools flagstat my_data.bam
                      You have to get samtools installed. I think you just do a make command. On my install, I think I had a slight problem with the curses file, someone admin at my work suggested I tweak that line in the make file slightly, and it works fine now.l

                      Comment


                      • #12
                        Another vote for Picard's CalculateHsMetrics. It's in the public Galaxy (http://main.g2.bx.psu.edu/ under "NGS: Picard (beta)").

                        Comment


                        • #13
                          Picard by default does +/- 250 bp. I recommend using the same file for baits and targets: you can have baits that extend past targets and hence get more coverage for baits than for target if you had all of your target sequence covered by baits. If you had say only 80% of your targets covered by baits, then you already know this, and it just complicates things to try to consider it again. So, again, I recommend using the actual bait intervals if possible.

                          Comment


                          • #14
                            Originally posted by Heisman View Post
                            Picard by default does +/- 250 bp. I recommend using the same file for baits and targets: you can have baits that extend past targets and hence get more coverage for baits than for target if you had all of your target sequence covered by baits. If you had say only 80% of your targets covered by baits, then you already know this, and it just complicates things to try to consider it again. So, again, I recommend using the actual bait intervals if possible.
                            I want to verify that picard does indeed use a +/- 250 bp around each target/bait, but have so far not been able to find this written down anywhere. Where did you come by this information?

                            Also, can the interval be modified (to, say, +/- 300bp)?

                            Comment


                            • #15
                              Somewhere there is a web page that has the code for each of the programs... I stumbled on it before and am not really a computer guy so don't know where to find it. But it had 250 set for that metric.

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                A Brief Overview and Common Challenges in Single-cell Sequencing Analysis
                                by seqadmin


                                ​​​​​​The introduction of single-cell sequencing has advanced the ability to study cell-to-cell heterogeneity. Its use has improved our understanding of somatic mutations1, cell lineages2, cellular diversity and regulation3, and development in multicellular organisms4. Single-cell sequencing encompasses hundreds of techniques with different approaches to studying the genomes, transcriptomes, epigenomes, and other omics of individual cells. The analysis of single-cell sequencing data i...

                                01-24-2023, 01:19 PM
                              • seqadmin
                                Introduction to Single-Cell Sequencing
                                by seqadmin
                                Single-cell sequencing is a technique used to investigate the genome, transcriptome, epigenome, and other omics of individual cells using high-throughput sequencing. This technology has provided many scientific breakthroughs and continues to be applied across many fields, including microbiology, oncology, immunology, neurobiology, precision medicine, and stem cell research.

                                The advancement of single-cell sequencing began in 2009 when Tang et al. investigated the single-cell transcriptomes
                                ...
                                01-09-2023, 03:10 PM
                              • seqadmin
                                AVITI from Element Biosciences: Latest Sequencing Technologies—Part 6
                                by seqadmin
                                Element Biosciences made its sequencing market debut this year when it released AVITI, its first sequencer. The AVITI System uses avidity sequencing, a novel sequencing chemistry that delivers higher quality data, decreases cycle times, and requires lower reagent concentrations. This new instrument reportedly features lower operating and start-up costs while maintaining quality sequencing.

                                Read type and length
                                AVITI is a short-read benchtop sequencer that also offers an innovative...
                                12-29-2022, 10:44 AM

                              ad_right_rmr

                              Collapse
                              Working...
                              X