Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • rahilsethi
    replied
    GATK -dcov option???

    I have additional question to raonyguimaraes's post
    Does anyone know in details about GATK -dcov option in UnifiedGenotyper. I tried to look in GATK Manual but could not find much about it other than the following information:
    -dcov [50 for 4x, 200 for >30x WGS or Whole exome]
    in the link:


    Also if not specified what default value this option takes?

    If you anyone knows about it could you please send me the link to the information resource?

    Thanks in advance

    Leave a comment:


  • Jane M
    replied
    Originally posted by fongchun View Post
    I think I might have figured out at least part of your question with regard to the file recal_data.grp. If you look at the GATK methods and workflow page under the "Base Quality Score Recalibrator" section, it shows the recal_data.grp being used as part of the -BQSR parameter:

    Code:
    java -jar GenomeAnalysisTK.jar \
       -T PrintReads \
       -R reference.fasta \
       -I input.bam \
       -BQSR recalibration_report.grp \
       -o output.bam
    \

    Interesting thing is the documentation for the PrintReads program doesn't include the -BQSR parameter...
    Ah, interesting.. I only noticed this information about PrintReads (http://www.broadinstitute.org/gatk/g...ntReads.html):
    java -Xmx2g -jar GenomeAnalysisTK.jar \
    -R ref.fasta \
    -T PrintReads \
    -o output.bam \
    -I input1.bam \
    -I input2.bam \
    --read_filter MappingQualityZero
    I didn't check where you suggested me. And here it's much clearer:
    java -jar GenomeAnalysisTK.jar \
    -T PrintReads \
    -R reference.fasta \
    -I input.bam \
    -BQSR recalibration_report.grp \
    -o output.bam
    The grp file is used and there is an output bam file
    Thanks fongchun!

    Leave a comment:


  • Jane M
    replied
    Originally posted by AJERYC View Post
    I'm not sure if we are running the same version of GATK. For Quality score recalibration I use the following instructions

    java -Xmx16G -jar gatk/GenomeAnalysisTK.jar -I input.marked.realigned.fixed.bam -R hg19/hg19.fa -T CountCovariates -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -recalFile input.recal_data.csv -knownSites:dbsnp,VCF dbsnp135.hg19.vcf

    java -Xmx16G -jar gatk/GenomeAnalysisTK.jar \-l INFO \-R hg19.fa \-I input.marked.realigned.fixed.bam \-T TableRecalibration \--out input.marked.realigned.fixed.recal.bam \-recalFile input.recal_data.csv


    You can see I get 2 files, one is the bam file and the other one is the recal_data (that you get in the first instruction. Maybe you are missing the second instruction and that is why you dont get the bam file.
    The point is that we are not using the same version. You probably have a version before v2.0 and have a version after 2.0. From this 2.0 version, CountCovariates and TableRecalibration do not exist anymore. That's a pity because the process was rather clear. The csv file generated at the CountCovariates step is then used at the TableRecalibration step...
    Last edited by Jane M; 09-07-2012, 02:05 AM.

    Leave a comment:


  • AJERYC
    replied
    Originally posted by Jane M View Post
    Thank you AJERYC!

    Because of some troubles with my version of dbSNP, I haven't managed to run:


    but I am still wondering if I should run the PrintReads step since I only have one bam file and if my recalibrated bam file will be the recal_data.grp file. Any idea?
    I'm not sure if we are running the same version of GATK. For Quality score recalibration I use the following instructions

    java -Xmx16G -jar gatk/GenomeAnalysisTK.jar -I input.marked.realigned.fixed.bam -R hg19/hg19.fa -T CountCovariates -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -recalFile input.recal_data.csv -knownSites:dbsnp,VCF dbsnp135.hg19.vcf

    java -Xmx16G -jar gatk/GenomeAnalysisTK.jar \-l INFO \-R hg19.fa \-I input.marked.realigned.fixed.bam \-T TableRecalibration \--out input.marked.realigned.fixed.recal.bam \-recalFile input.recal_data.csv


    You can see I get 2 files, one is the bam file and the other one is the recal_data (that you get in the first instruction. Maybe you are missing the second instruction and that is why you dont get the bam file.

    Leave a comment:


  • fongchun
    replied
    Originally posted by Jane M View Post
    Not yet...
    I think I might have figured out at least part of your question with regard to the file recal_data.grp. If you look at the GATK methods and workflow page under the "Base Quality Score Recalibrator" section, it shows the recal_data.grp being used as part of the -BQSR parameter:

    Code:
    java -jar GenomeAnalysisTK.jar \
       -T PrintReads \
       -R reference.fasta \
       -I input.bam \
       -BQSR recalibration_report.grp \
       -o output.bam
    \

    Interesting thing is the documentation for the PrintReads program doesn't include the -BQSR parameter...

    Leave a comment:


  • Jane M
    replied
    Originally posted by fongchun View Post
    Did you ever get an answer to these questions? I am running into the same issues with the newer version of GATK.

    Thanks,

    Fong
    Not yet...

    Leave a comment:


  • fongchun
    replied
    What is the option -l INFO? Is it still in use in the new version?
    I guess that -o recal_data.grp is equivalent to -recalFile input.recal_data.csv. Am I right? What is the interest of this file, I don't see when it is used...
    Finally, where is the recalibrated bam file ? There is only .grp output file at the BaseRecalibrator step.

    I am a bit confused by these changes between versions...
    Did you ever get an answer to these questions? I am running into the same issues with the newer version of GATK.

    Thanks,

    Fong

    Leave a comment:


  • Jane M
    replied
    Thank you AJERYC!

    Because of some troubles with my version of dbSNP, I haven't managed to run:
    java -Xmx4g -jar GenomeAnalysisTK.jar \
    -T BaseRecalibrator \
    -I my_reads.bam \
    -R resources/Homo_sapiens_assembly18.fasta \
    -knownSites bundle/hg18/dbsnp_132.hg18.vcf \
    -knownSites another/optional/setOfSitesToMask.vcf \
    -o recal_data.grp
    but I am still wondering if I should run the PrintReads step since I only have one bam file and if my recalibrated bam file will be the recal_data.grp file. Any idea?

    Leave a comment:


  • AJERYC
    replied
    Originally posted by Jane M View Post
    Thank both of you.
    I managed to run by using a different temporary folder :


    My output file is generating finally Thanks !
    But I have this output:



    Did you get also this message?
    Code:
    Ignoring SAM validation error: ERROR: Record 104416914, Read name HWI-ST584_0081:4:1204:9783:76003#AGTCAA, MAPQ should be 0 for unmapped read.
    Jane
    This error messages means these sequences can not be aligned against the reference genome. That is why you use the VALIDATION_STRINGENCY=LENIENT option so that the programs points you the sequences but dont stop running.

    Leave a comment:


  • vivek_
    replied
    Originally posted by frewise View Post
    Hi, raonyguimaraes, why did you remove genes with multiple variants in your last 2 steps?
    I think its the opposite, he's taking genes with multiple variants into his regions of interest and treating the others with less importance.

    Leave a comment:


  • Jane M
    replied
    One additional question regarding the quality score recalibration: from the v2.0 of GATK, this seems to be performed in one step only.

    Previously, it was:
    java -Xmx4g -jar GenomeAnalysisTK.jar -l INFO -R hg19.fa --DBSNP dbsnp132.txt -I input.marked.realigned.fixed.bam -T CountCovariates -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -recalFile input.recal_data.csv

    java -Xmx4g -jar GenomeAnalysisTK.jar -l INFO -R hg19.fa -I input.marked.realigned.fixed.bam -T TableRecalibration --out input.marked.realigned.fixed.recal.bam -recalFile input.recal_data.csv
    but now, it's rather
    java -Xmx4g -jar GenomeAnalysisTK.jar -l INFO -R hg19.fa -knowSites dbsnp132.txt -I input.marked.realigned.fixed.bam -T BaseRecalibrator -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate --out input.recal_data.csv

    From GATK doc:
    java -Xmx4g -jar GenomeAnalysisTK.jar \
    -T BaseRecalibrator \
    -I my_reads.bam \
    -R resources/Homo_sapiens_assembly18.fasta \
    -knownSites bundle/hg18/dbsnp_132.hg18.vcf \
    -knownSites another/optional/setOfSitesToMask.vcf \
    -o recal_data.grp
    with this additional step only if using several bam files (if I understand well the documentation: "PrintReads can dynamically merge the contents of multiple input BAM files, resulting in merged output sorted in coordinate order" )
    java -Xmx2g -jar GenomeAnalysisTK.jar \
    -R ref.fasta \
    -T PrintReads \
    -o output.bam \
    -I input1.bam \
    -I input2.bam \
    --read_filter MappingQualityZero
    What is the option -l INFO? Is it still in use in the new version?
    I guess that -o recal_data.grp is equivalent to -recalFile input.recal_data.csv. Am I right? What is the interest of this file, I don't see when it is used...
    Finally, where is the recalibrated bam file ? There is only .grp output file at the BaseRecalibrator step.

    I am a bit confused by these changes between versions...

    Leave a comment:


  • Jane M
    replied
    Thank both of you.
    I managed to run by using a different temporary folder :
    java -Djava.io.tmpdir=/mnt/seq3/seq3/LMMC/GAR/GAR_sain -jar /share/apps/picard-tools-1.76/FixMateInformation.jar INPUT=/mnt/seq3/seq3/LMMC/GAR/GAR_sain/GAR_sain.realigned.bam OUTPUT=/mnt/seq3/seq3/LMMC/GAR/GAR_sain/GAR_sain.realigned.fixed.bam SO=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true
    My output file is generating finally Thanks !
    But I have this output:

    [Wed Sep 05 14:53:57 CEST 2012] net.sf.picard.sam.FixMateInformation INPUT=[/mnt/seq3/seq3/LMMC/GAR/GAR_sain/GAR_sain.realigned.bam] OUTPUT=/mnt/seq3/seq3/LMMC/GAR/GAR_sain/GAR_sain.realigned.fixed.bam SORT_ORDER=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false
    [Wed Sep 05 14:53:57 CEST 2012] Executing as [...]; Java HotSpot(TM) Server VM 1.6.0_26-b03; Picard version: 1.76(1261)
    INFO 2012-09-05 14:53:57 FixMateInformation Sorting input into queryname order.
    Ignoring SAM validation error: ERROR: Record 104416902, Read name HWI-ST584_0081:4:2206:2021:2237#AGTCAA, MAPQ should be 0 for unmapped read.
    Ignoring SAM validation error: ERROR: Record 104416903, Read name HWI-ST584_0081:4:1102:17248:85426#AGTCAA, MAPQ should be 0 for unmapped read.
    Ignoring SAM validation error: ERROR: Record 104416904, Read name HWI-ST584_0081:4:1102:2000:84379#AGTCAA, MAPQ should be 0 for unmapped read.
    Ignoring SAM validation error: ERROR: Record 104416905, Read name HWI-ST584_0081:4:1103:2169:69666#AGTCAA, MAPQ should be 0 for unmapped read.
    Ignoring SAM validation error: ERROR: Record 104416906, Read name HWI-ST584_0081:4:1103:8346:98868#AGTCAA, MAPQ should be 0 for unmapped read.
    Ignoring SAM validation error: ERROR: Record 104416907, Read name HWI-ST584_0081:4:1104:12598:60315#AGTCAA, MAPQ should be 0 for unmapped read.
    Ignoring SAM validation error: ERROR: Record 104416908, Read name HWI-ST584_0081:4:1105:1489:49925#AGTCAA, MAPQ should be 0 for unmapped read.
    Ignoring SAM validation error: ERROR: Record 104416909, Read name HWI-ST584_0081:4:1105:16587:151639#AGTCAA, MAPQ should be 0 for unmapped read.
    Ignoring SAM validation error: ERROR: Record 104416910, Read name HWI-ST584_0081:4:1105:8354:181686#AGTCAA, MAPQ should be 0 for unmapped read.
    Ignoring SAM validation error: ERROR: Record 104416911, Read name HWI-ST584_0081:4:1107:17717:4065#AGTCAA, MAPQ should be 0 for unmapped read.
    Ignoring SAM validation error: ERROR: Record 104416912, Read name HWI-ST584_0081:4:1108:4813:156146#AGTCAA, MAPQ should be 0 for unmapped read.
    Ignoring SAM validation error: ERROR: Record 104416913, Read name HWI-ST584_0081:4:1108:9333:173330#AGTCAA, MAPQ should be 0 for unmapped read.
    Ignoring SAM validation error: ERROR: Record 104416914, Read name HWI-ST584_0081:4:1204:9783:76003#AGTCAA, MAPQ should be 0 for unmapped read.
    Ignoring SAM validation error: ERROR: Record 104416915, Read name HWI-ST584_0081:4:1206:4951:59169#AGTCAA, MAPQ should be 0 for unmapped read.
    Ignoring SAM validation error: ERROR: Record 104416916, Read name HWI-ST584_0081:4:2104:7277:38433#AGTCAA, MAPQ should be 0 for unmapped read.
    Ignoring SAM validation error: ERROR: Record 104416917, Read name HWI-ST584_0081:4:2106:5867:60527#AGTCAA, MAPQ should be 0 for unmapped read.
    Ignoring SAM validation error: ERROR: Record 104416918, Read name HWI-ST584_0081:4:2202:1737:149838#AGTCAA, MAPQ should be 0 for unmapped read.
    Ignoring SAM validation error: ERROR: Record 104416919, Read name HWI-ST584_0081:4:2205:16036:86626#AGTCAA, MAPQ should be 0 for unmapped read.
    Ignoring SAM validation error: ERROR: Record 104416920, Read name HWI-ST584_0081:4:2207:1759:176116#AGTCAA, MAPQ should be 0 for unmapped read.
    INFO 2012-09-05 15:13:24 FixMateInformation Sorting by queryname complete.
    INFO 2012-09-05 15:13:24 FixMateInformation Output will be sorted by coordinate
    INFO 2012-09-05 15:13:24 FixMateInformation Traversing query name sorted records and fixing up mate pair information.
    INFO 2012-09-05 15:13:34 FixMateInformation Processed 1,000,000 records. Elapsed time: 00:00:10s. Time for last 1,000,000: 10s. Last read position: chr14:74,489,555
    INFO 2012-09-05 15:13:52 FixMateInformation Processed 2,000,000 records. Elapsed time: 00:00:27s. Time for last 1,000,000: 17s. Last read position: chr4:104,072,163
    INFO 2012-09-05 15:14:05 FixMateInformation Processed 3,000,000 records. Elapsed time: 00:00:41s. Time for last 1,000,000: 13s. Last read position: chr21:31,654,746
    Did you get also this message?
    Code:
    Ignoring SAM validation error: ERROR: Record 104416914, Read name HWI-ST584_0081:4:1204:9783:76003#AGTCAA, MAPQ should be 0 for unmapped read.
    Jane

    Leave a comment:


  • AJERYC
    replied
    IOException: No space left on device

    Originally posted by Jane M View Post
    Thank you for your help oyvindbusk,
    I changed -T CountCovariates by BaseRecalibrator and -T TableRecalibration by PrintReads.

    Nevertheless, I am stuck before this step, with Picard, when running:
    I got:
    I googled it, it seems to be a problem of space... I don't know if it's because the folder /tmp is too small or if I should add -Xmx4g... On the cluster that I'm using, this -Xmx4g is not working.
    Any idea?
    I have got the same errors and it seems I ran out of space in the hard drive. If you are running a very large exome file (i.e. 100 x exome) you may need a large space for your temporary files. Try to increase your hard drive space and see what happens.

    Leave a comment:


  • oyvindbusk
    replied
    I think I had a similar problem resolved by increasing the size of temp-folder. I would try this first. If it were the memory it would probably say something like "not sufficient memory".

    Ø

    Leave a comment:


  • Jane M
    replied
    Thank you for your help oyvindbusk,
    I changed -T CountCovariates by BaseRecalibrator and -T TableRecalibration by PrintReads.

    Nevertheless, I am stuck before this step, with Picard, when running:

    java -Djava.io.tmpdir=/tmp -jar /share/apps/picard-tools-1.76/FixMateInformation.jar INPUT=$path/$i/$i.realigned.bam OUTPUT=$path/$i/$i.realigned.fixed.bam SO=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true
    I got:
    [Wed Sep 05 09:59:27 CEST 2012] net.sf.picard.sam.FixMateInformation INPUT=[/mnt/seq3/seq3/LMMC/GAR/GAR_sain/GAR_sain.realigned.bam] OUTPUT=/mnt/seq3/seq3/LMMC/GAR/GAR_sain/GAR_sain.realigned.fixed.bam SORT_ORDER=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false
    [Wed Sep 05 09:59:27 CEST 2012] Executing as [..]; Java HotSpot(TM) Server VM 1.6.0_26-b03; Picard version: 1.76(1261)
    INFO 2012-09-05 09:59:27 FixMateInformation Sorting input into queryname order.
    [Wed Sep 05 10:15:30 CEST 2012] net.sf.picard.sam.FixMateInformation done. Elapsed time: 16.06 minutes.
    Runtime.totalMemory()=954466304
    FAQ: http://sourceforge.net/apps/mediawik...itle=Main_Page
    Exception in thread "main" net.sf.samtools.util.RuntimeIOException: java.io.IOException: No space left on device
    at net.sf.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:228)
    at net.sf.samtools.util.SortingCollection.add(SortingCollection.java:150)
    at net.sf.picard.sam.FixMateInformation.doWork(FixMateInformation.java:147)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
    at net.sf.picard.sam.FixMateInformation.main(FixMateInformation.java:75)
    Caused by: java.io.IOException: No space left on device
    at java.io.FileOutputStream.write(Native Method)
    at org.xerial.snappy.SnappyOutputStream.writeInt(SnappyOutputStream.java:105)
    at org.xerial.snappy.SnappyOutputStream.dump(SnappyOutputStream.java:126)
    at org.xerial.snappy.SnappyOutputStream.flush(SnappyOutputStream.java:100)
    at org.xerial.snappy.SnappyOutputStream.close(SnappyOutputStream.java:137)
    at net.sf.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:219)
    ... 5 more
    I googled it, it seems to be a problem of space... I don't know if it's because the folder /tmp is too small or if I should add -Xmx4g... On the cluster that I'm using, this -Xmx4g is not working.
    Any idea?
    Last edited by Jane M; 09-05-2012, 12:57 AM.

    Leave a comment:

Latest Articles

Collapse

  • seqadmin
    Recent Advances in Sequencing Analysis Tools
    by seqadmin


    The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
    05-06-2024, 07:48 AM
  • seqadmin
    Essential Discoveries and Tools in Epitranscriptomics
    by seqadmin




    The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
    04-22-2024, 07:01 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, Yesterday, 02:46 PM
0 responses
11 views
0 likes
Last Post seqadmin  
Started by seqadmin, 05-07-2024, 06:57 AM
0 responses
13 views
0 likes
Last Post seqadmin  
Started by seqadmin, 05-06-2024, 07:17 AM
0 responses
17 views
0 likes
Last Post seqadmin  
Started by seqadmin, 05-02-2024, 08:06 AM
0 responses
23 views
0 likes
Last Post seqadmin  
Working...
X