Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • rogic
    replied
    Hello all,

    one follow-up question regarding the baits file - I downloaded the file from Agilent earray website (SureSelect Human All Exon 50Mb), however it does not contain the strand info which seem to be needed by Picard's CalculateHsMetrics tool. Did the bait files that other people used had the strand info? If yes, where did they get them? If not, how did they circumvent Picard's requirement? Is the strand info actually used by the tool?

    Thanks,
    Sanja

    Leave a comment:


  • chongm
    replied
    % reads on target does not equal % bp on target

    Originally posted by ECO View Post
    Another vote for Picard's CalculateHsMetrics. It's in the public Galaxy (http://main.g2.bx.psu.edu/ under "NGS: Picard (beta)").
    I just wanted to note that Picard's hsmetrics will calculate the percentage of basepairs that map to target regions which isn't the same as the percentage of reads that map to target regions. e.g. if a read sits right on a target region, (correct me if i'm wrong), then picard's hsmetrics (PCT_USABLE_BASES_ON_BAIT) will count 50 bp out of 100 as mapping to the target region, whereas if you are taking % reads mapping to target region, you will count that 1 read as mapping to the target region.

    Leave a comment:


  • chongm
    replied
    Thanks!

    Thank you Jon_Keats! The -H seemed to have done the trick!

    chongm

    Leave a comment:


  • Jon_Keats
    replied
    Big H not little

    samtools view -H

    Not

    samtools view -h

    Leave a comment:


  • bwubb
    replied
    Can you show a bit of your intervals file?

    Some of the header and some of the actual intervals.

    Not sure if that is the issue, but its where I usually start.

    Leave a comment:


  • chongm
    replied
    Has anyone used hsmetrics for Illumina truseq exome targets? I'm having issues running this (even though I'm using the same modified bed file - with header from samtools - for both target and bait intervals).

    Here is what I've done

    samtools view -h sample.bam > header.txt
    awk -F $'\t' 'BEGIN { OFS = FS } {print $1,$2,$3,$6,$4;}' truseq.bed > intervals
    cat header.txt intervals > truseq_intervals
    Then I ran the hsmetrics
    java -Xmx4g -Djava.io.tmpdir=/tmp/ \
    -jar CalculateHsMetrics.jar \
    INPUT=sample.bam \
    OUTPUT=sample.hsmetrics \
    TARGET_INTERVALS=truseq_intervals\
    BAIT_INTERVALS=truseq_intervals
    I run into the following error:
    Exception in thread "main" net.sf.picard.PicardException: Invalid interval record contains 22 fields: MISEQ:23:000000000-A34AH:1:1111:11133:10035 99 chrM 339 60 151M = 469 283 ACACATCTCTGCCAAACCCCAAAAACAAAGAACCCTAACACCAGCCTAACCAGATTTCAAATTTTATCTTTTGGCGGTATGCACTTTTAACAGTCACCCCCCAACTAACACATTATTTTCCCCTCCCACTCCCATACTACTAATCTCATCA AAAAABBBDDDDDDEEGGGGGGIIIHIIIIIIIIHIIIIIIHIIIIIIIIIIIIIIIIIIHHIIIIIIIIIIIHHHHEEHIIHHHHHHHHHHHHHHHHFHHGGGGHGGGGGGGGGGGGGGEGGGGGGGGGGGGGEGGGGGGEGGGGGGGGG X0:i:1 X1:i:0 MD:Z:71A79 RG:Z:Exome_2011-002 XG:i:0 AM:i:37 NM:i:1 SM:i:37 XM:i:1XO:i:0 XT:A:U
    at net.sf.picard.util.IntervalList.fromReader(IntervalList.java:209)
    at net.sf.picard.util.IntervalList.fromFile(IntervalList.java:169)
    at net.sf.picard.analysis.directed.CollectTargetedMetrics.doWork(CollectTargetedMetrics.java:99)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
    at net.sf.picard.analysis.directed.CalculateHsMetrics.main(CalculateHsMetrics.java:73)
    Can anyone help with this? I'm new to picard.

    Leave a comment:


  • pengchy
    replied
    Originally posted by bwubb View Post
    So this method more or less worked. Picard seems to demand the columns be tab separated so my awk was more like:

    awk -F $'\t' 'BEGIN { OFS = FS } {print $1,$2,$3,$6,$4;}' SureSelect_baits.bed > bi.txt

    Figured Id share.

    I had a question about the header still though, and I expect this is something I just dont understand about the conversion to bam process or something with picard.

    The CalculateHSMetrics still yells at me that interval file needs a header.
    It seems my aligned_reads.bam files are lacking "@HD VN:1.0 SO:coordinate" at the very top. Is this abnormal?

    If I use a bam file that has gone through Picard Read group assignment it does have the @HD etc., but it also will have a @RG line as well.

    So do these interval files need to be made for each sample after RG assignment?
    I think the “@HD VN:1.0 SO:coordinate” information can be added manually to the top of the header.
    But why not "samtools sort" added itself?

    Leave a comment:


  • bwubb
    replied
    Ok this is really frustrating. Still is complaining I do not have header in my interval file. Here is a sample of my file...

    Code:
    @HD	VN:1.0	SO:coordinate
    @SQ	SN:1	LN:249250621
    @SQ	SN:2	LN:243199373
    @SQ	SN:3	LN:198022430
    @SQ	SN:4	LN:191154276
    @SQ	SN:5	LN:180915260
    @SQ	SN:6	LN:171115067
    @SQ	SN:7	LN:159138663
    @SQ	SN:8	LN:146364022
    ...
    
    @SQ	SN:GL000192.1	LN:547496
    @RG	ID:4346_TTAGGC_ID	PL:illumina	PU:TTAGGC	LB:4346_TTAGGC_LB	SM:4346_TTAGGC_SM
    @PG	ID:bwa	PN:bwa	VN:0.5.9-r16
    1	45787138	45787258	+	BI426105800_19859
    1	45787178	45787298	+	BI426105800_19860
    1	45790352	45790472	+	BI426105800_19867
    1	45790392	45790512	+	BI426105800_19868
    1	45791243	45791363	+	BI426105800_19875
    1	45791283	45791403	+	BI426105800_19876
    ...
    Am I doing it wrong? Does the Baits file also need to be in interval format? Thank you.

    Leave a comment:


  • bwubb
    replied
    Originally posted by ECO View Post
    That intervals file is annoying to make... here's how I do it (basically you need to add a SAM header and rearrange some columns):

    Code:
    #Input files to CalculateHsMetrics need SAM header on an interval file ("picard interval file")
    #example here: ftp://ftp.broadinstitute.org/pub/gsa/exampleFiles/thousand_genomes_alpha_redesign.targets.interval_list
    
    #put header from bam file at the top of the BI file above "baits.txt"
    samtools view -H aligned_reads.bam > header.txt
    
    #interval file needs to look like this:
    #1    1104841    1104940    +    target_1
    #1    1105283    1105599    +    target_2
    #1    1105712    1105860    +    target_3
    
    #rearrange columns of baits bed file, and add SAM header 
    awk '{print $1,$2,$3,$6,$4;}' SureSelect_baits.bed > bi.txt
    cat header.txt bi.txt > baits.txt
    So this method more or less worked. Picard seems to demand the columns be tab separated so my awk was more like:

    awk -F $'\t' 'BEGIN { OFS = FS } {print $1,$2,$3,$6,$4;}' SureSelect_baits.bed > bi.txt

    Figured Id share.

    I had a question about the header still though, and I expect this is something I just dont understand about the conversion to bam process or something with picard.

    The CalculateHSMetrics still yells at me that interval file needs a header.
    It seems my aligned_reads.bam files are lacking "@HD VN:1.0 SO:coordinate" at the very top. Is this abnormal?

    If I use a bam file that has gone through Picard Read group assignment it does have the @HD etc., but it also will have a @RG line as well.

    So do these interval files need to be made for each sample after RG assignment?

    Leave a comment:


  • bwubb
    replied
    Oh wonderful. Thank you so much.

    Leave a comment:


  • ECO
    replied
    That intervals file is annoying to make... here's how I do it (basically you need to add a SAM header and rearrange some columns):

    Code:
    #Input files to CalculateHsMetrics need SAM header on an interval file ("picard interval file")
    #example here: ftp://ftp.broadinstitute.org/pub/gsa/exampleFiles/thousand_genomes_alpha_redesign.targets.interval_list
    
    #put header from bam file at the top of the BI file above "baits.txt"
    samtools view -H aligned_reads.bam > header.txt
    
    #interval file needs to look like this:
    #1    1104841    1104940    +    target_1
    #1    1105283    1105599    +    target_2
    #1    1105712    1105860    +    target_3
    
    #rearrange columns of baits bed file, and add SAM header 
    awk '{print $1,$2,$3,$6,$4;}' SureSelect_baits.bed > bi.txt
    cat header.txt bi.txt > baits.txt

    Leave a comment:


  • bwubb
    replied
    Does anyone know what format the INTERVALS files need to be? Im using simple bed files but I keep getting this error

    Exception in thread "main" java.lang.IllegalStateException: Interval list file must contain header.

    I tried adding a Chr Start End header but it doesnt like this either. The simplicity of this is confusing me I guess.

    Thanks.

    Leave a comment:


  • gwilymh
    replied
    Originally posted by Heisman View Post
    Somewhere there is a web page that has the code for each of the programs... I stumbled on it before and am not really a computer guy so don't know where to find it. But it had 250 set for that metric.
    Thank you for your explanation

    Leave a comment:


  • Heisman
    replied
    Somewhere there is a web page that has the code for each of the programs... I stumbled on it before and am not really a computer guy so don't know where to find it. But it had 250 set for that metric.

    Leave a comment:


  • gwilymh
    replied
    Originally posted by Heisman View Post
    Picard by default does +/- 250 bp. I recommend using the same file for baits and targets: you can have baits that extend past targets and hence get more coverage for baits than for target if you had all of your target sequence covered by baits. If you had say only 80% of your targets covered by baits, then you already know this, and it just complicates things to try to consider it again. So, again, I recommend using the actual bait intervals if possible.
    I want to verify that picard does indeed use a +/- 250 bp around each target/bait, but have so far not been able to find this written down anywhere. Where did you come by this information?

    Also, can the interval be modified (to, say, +/- 300bp)?

    Leave a comment:

Latest Articles

Collapse

  • seqadmin
    Strategies for Sequencing Challenging Samples
    by seqadmin


    Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
    03-22-2024, 06:39 AM
  • seqadmin
    Techniques and Challenges in Conservation Genomics
    by seqadmin



    The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.

    Avian Conservation
    Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...
    03-08-2024, 10:41 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, 03-27-2024, 06:37 PM
0 responses
12 views
0 likes
Last Post seqadmin  
Started by seqadmin, 03-27-2024, 06:07 PM
0 responses
11 views
0 likes
Last Post seqadmin  
Started by seqadmin, 03-22-2024, 10:03 AM
0 responses
53 views
0 likes
Last Post seqadmin  
Started by seqadmin, 03-21-2024, 07:32 AM
0 responses
69 views
0 likes
Last Post seqadmin  
Working...
X