Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #16
    Originally posted by Heisman View Post
    Somewhere there is a web page that has the code for each of the programs... I stumbled on it before and am not really a computer guy so don't know where to find it. But it had 250 set for that metric.
    Thank you for your explanation

    Comment


    • #17
      Does anyone know what format the INTERVALS files need to be? Im using simple bed files but I keep getting this error

      Exception in thread "main" java.lang.IllegalStateException: Interval list file must contain header.

      I tried adding a Chr Start End header but it doesnt like this either. The simplicity of this is confusing me I guess.

      Thanks.

      Comment


      • #18
        That intervals file is annoying to make... here's how I do it (basically you need to add a SAM header and rearrange some columns):

        Code:
        #Input files to CalculateHsMetrics need SAM header on an interval file ("picard interval file")
        #example here: ftp://ftp.broadinstitute.org/pub/gsa/exampleFiles/thousand_genomes_alpha_redesign.targets.interval_list
        
        #put header from bam file at the top of the BI file above "baits.txt"
        samtools view -H aligned_reads.bam > header.txt
        
        #interval file needs to look like this:
        #1    1104841    1104940    +    target_1
        #1    1105283    1105599    +    target_2
        #1    1105712    1105860    +    target_3
        
        #rearrange columns of baits bed file, and add SAM header 
        awk '{print $1,$2,$3,$6,$4;}' SureSelect_baits.bed > bi.txt
        cat header.txt bi.txt > baits.txt

        Comment


        • #19
          Oh wonderful. Thank you so much.

          Comment


          • #20
            Originally posted by ECO View Post
            That intervals file is annoying to make... here's how I do it (basically you need to add a SAM header and rearrange some columns):

            Code:
            #Input files to CalculateHsMetrics need SAM header on an interval file ("picard interval file")
            #example here: ftp://ftp.broadinstitute.org/pub/gsa/exampleFiles/thousand_genomes_alpha_redesign.targets.interval_list
            
            #put header from bam file at the top of the BI file above "baits.txt"
            samtools view -H aligned_reads.bam > header.txt
            
            #interval file needs to look like this:
            #1    1104841    1104940    +    target_1
            #1    1105283    1105599    +    target_2
            #1    1105712    1105860    +    target_3
            
            #rearrange columns of baits bed file, and add SAM header 
            awk '{print $1,$2,$3,$6,$4;}' SureSelect_baits.bed > bi.txt
            cat header.txt bi.txt > baits.txt
            So this method more or less worked. Picard seems to demand the columns be tab separated so my awk was more like:

            awk -F $'\t' 'BEGIN { OFS = FS } {print $1,$2,$3,$6,$4;}' SureSelect_baits.bed > bi.txt

            Figured Id share.

            I had a question about the header still though, and I expect this is something I just dont understand about the conversion to bam process or something with picard.

            The CalculateHSMetrics still yells at me that interval file needs a header.
            It seems my aligned_reads.bam files are lacking "@HD VN:1.0 SO:coordinate" at the very top. Is this abnormal?

            If I use a bam file that has gone through Picard Read group assignment it does have the @HD etc., but it also will have a @RG line as well.

            So do these interval files need to be made for each sample after RG assignment?

            Comment


            • #21
              Ok this is really frustrating. Still is complaining I do not have header in my interval file. Here is a sample of my file...

              Code:
              @HD	VN:1.0	SO:coordinate
              @SQ	SN:1	LN:249250621
              @SQ	SN:2	LN:243199373
              @SQ	SN:3	LN:198022430
              @SQ	SN:4	LN:191154276
              @SQ	SN:5	LN:180915260
              @SQ	SN:6	LN:171115067
              @SQ	SN:7	LN:159138663
              @SQ	SN:8	LN:146364022
              ...
              
              @SQ	SN:GL000192.1	LN:547496
              @RG	ID:4346_TTAGGC_ID	PL:illumina	PU:TTAGGC	LB:4346_TTAGGC_LB	SM:4346_TTAGGC_SM
              @PG	ID:bwa	PN:bwa	VN:0.5.9-r16
              1	45787138	45787258	+	BI426105800_19859
              1	45787178	45787298	+	BI426105800_19860
              1	45790352	45790472	+	BI426105800_19867
              1	45790392	45790512	+	BI426105800_19868
              1	45791243	45791363	+	BI426105800_19875
              1	45791283	45791403	+	BI426105800_19876
              ...
              Am I doing it wrong? Does the Baits file also need to be in interval format? Thank you.

              Comment


              • #22
                Originally posted by bwubb View Post
                So this method more or less worked. Picard seems to demand the columns be tab separated so my awk was more like:

                awk -F $'\t' 'BEGIN { OFS = FS } {print $1,$2,$3,$6,$4;}' SureSelect_baits.bed > bi.txt

                Figured Id share.

                I had a question about the header still though, and I expect this is something I just dont understand about the conversion to bam process or something with picard.

                The CalculateHSMetrics still yells at me that interval file needs a header.
                It seems my aligned_reads.bam files are lacking "@HD VN:1.0 SO:coordinate" at the very top. Is this abnormal?

                If I use a bam file that has gone through Picard Read group assignment it does have the @HD etc., but it also will have a @RG line as well.

                So do these interval files need to be made for each sample after RG assignment?
                I think the “@HD VN:1.0 SO:coordinate” information can be added manually to the top of the header.
                But why not "samtools sort" added itself?

                Comment


                • #23
                  Has anyone used hsmetrics for Illumina truseq exome targets? I'm having issues running this (even though I'm using the same modified bed file - with header from samtools - for both target and bait intervals).

                  Here is what I've done

                  samtools view -h sample.bam > header.txt
                  awk -F $'\t' 'BEGIN { OFS = FS } {print $1,$2,$3,$6,$4;}' truseq.bed > intervals
                  cat header.txt intervals > truseq_intervals
                  Then I ran the hsmetrics
                  java -Xmx4g -Djava.io.tmpdir=/tmp/ \
                  -jar CalculateHsMetrics.jar \
                  INPUT=sample.bam \
                  OUTPUT=sample.hsmetrics \
                  TARGET_INTERVALS=truseq_intervals\
                  BAIT_INTERVALS=truseq_intervals
                  I run into the following error:
                  Exception in thread "main" net.sf.picard.PicardException: Invalid interval record contains 22 fields: MISEQ:23:000000000-A34AH:1:1111:11133:10035 99 chrM 339 60 151M = 469 283 ACACATCTCTGCCAAACCCCAAAAACAAAGAACCCTAACACCAGCCTAACCAGATTTCAAATTTTATCTTTTGGCGGTATGCACTTTTAACAGTCACCCCCCAACTAACACATTATTTTCCCCTCCCACTCCCATACTACTAATCTCATCA AAAAABBBDDDDDDEEGGGGGGIIIHIIIIIIIIHIIIIIIHIIIIIIIIIIIIIIIIIIHHIIIIIIIIIIIHHHHEEHIIHHHHHHHHHHHHHHHHFHHGGGGHGGGGGGGGGGGGGGEGGGGGGGGGGGGGEGGGGGGEGGGGGGGGG X0:i:1 X1:i:0 MD:Z:71A79 RG:Z:Exome_2011-002 XG:i:0 AM:i:37 NM:i:1 SM:i:37 XM:i:1XO:i:0 XT:A:U
                  at net.sf.picard.util.IntervalList.fromReader(IntervalList.java:209)
                  at net.sf.picard.util.IntervalList.fromFile(IntervalList.java:169)
                  at net.sf.picard.analysis.directed.CollectTargetedMetrics.doWork(CollectTargetedMetrics.java:99)
                  at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
                  at net.sf.picard.analysis.directed.CalculateHsMetrics.main(CalculateHsMetrics.java:73)
                  Can anyone help with this? I'm new to picard.

                  Comment


                  • #24
                    Can you show a bit of your intervals file?

                    Some of the header and some of the actual intervals.

                    Not sure if that is the issue, but its where I usually start.

                    Comment


                    • #25
                      Big H not little

                      samtools view -H

                      Not

                      samtools view -h

                      Comment


                      • #26
                        Thanks!

                        Thank you Jon_Keats! The -H seemed to have done the trick!

                        chongm

                        Comment


                        • #27
                          % reads on target does not equal % bp on target

                          Originally posted by ECO View Post
                          Another vote for Picard's CalculateHsMetrics. It's in the public Galaxy (http://main.g2.bx.psu.edu/ under "NGS: Picard (beta)").
                          I just wanted to note that Picard's hsmetrics will calculate the percentage of basepairs that map to target regions which isn't the same as the percentage of reads that map to target regions. e.g. if a read sits right on a target region, (correct me if i'm wrong), then picard's hsmetrics (PCT_USABLE_BASES_ON_BAIT) will count 50 bp out of 100 as mapping to the target region, whereas if you are taking % reads mapping to target region, you will count that 1 read as mapping to the target region.

                          Comment


                          • #28
                            Hello all,

                            one follow-up question regarding the baits file - I downloaded the file from Agilent earray website (SureSelect Human All Exon 50Mb), however it does not contain the strand info which seem to be needed by Picard's CalculateHsMetrics tool. Did the bait files that other people used had the strand info? If yes, where did they get them? If not, how did they circumvent Picard's requirement? Is the strand info actually used by the tool?

                            Thanks,
                            Sanja

                            Comment

                            Latest Articles

                            Collapse

                            • seqadmin
                              Techniques and Challenges in Conservation Genomics
                              by seqadmin



                              The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

                              Avian Conservation
                              Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
                              03-08-2024, 10:41 AM
                            • seqadmin
                              The Impact of AI in Genomic Medicine
                              by seqadmin



                              Artificial intelligence (AI) has evolved from a futuristic vision to a mainstream technology, highlighted by the introduction of tools like OpenAI's ChatGPT and Google's Gemini. In recent years, AI has become increasingly integrated into the field of genomics. This integration has enabled new scientific discoveries while simultaneously raising important ethical questions1. Interviews with two researchers at the center of this intersection provide insightful perspectives into...
                              02-26-2024, 02:07 PM

                            ad_right_rmr

                            Collapse

                            News

                            Collapse

                            Topics Statistics Last Post
                            Started by seqadmin, 03-14-2024, 06:13 AM
                            0 responses
                            33 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 03-08-2024, 08:03 AM
                            0 responses
                            72 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 03-07-2024, 08:13 AM
                            0 responses
                            80 views
                            0 likes
                            Last Post seqadmin  
                            Started by seqadmin, 03-06-2024, 09:51 AM
                            0 responses
                            68 views
                            0 likes
                            Last Post seqadmin  
                            Working...
                            X