GATK -dcov option???
I have additional question to raonyguimaraes's post
Does anyone know in details about GATK -dcov option in UnifiedGenotyper. I tried to look in GATK Manual but could not find much about it other than the following information:
-dcov [50 for 4x, 200 for >30x WGS or Whole exome]
in the link:
Also if not specified what default value this option takes?
If you anyone knows about it could you please send me the link to the information resource?
Thanks in advance
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
Originally posted by fongchun View PostI think I might have figured out at least part of your question with regard to the file recal_data.grp. If you look at the GATK methods and workflow page under the "Base Quality Score Recalibrator" section, it shows the recal_data.grp being used as part of the -BQSR parameter:
Code:java -jar GenomeAnalysisTK.jar \ -T PrintReads \ -R reference.fasta \ -I input.bam \ -BQSR recalibration_report.grp \ -o output.bam
Interesting thing is the documentation for the PrintReads program doesn't include the -BQSR parameter...
java -Xmx2g -jar GenomeAnalysisTK.jar \
-R ref.fasta \
-T PrintReads \
-o output.bam \
-I input1.bam \
-I input2.bam \
--read_filter MappingQualityZero
java -jar GenomeAnalysisTK.jar \
-T PrintReads \
-R reference.fasta \
-I input.bam \
-BQSR recalibration_report.grp \
-o output.bam
Thanks fongchun!
Leave a comment:
-
Originally posted by AJERYC View PostI'm not sure if we are running the same version of GATK. For Quality score recalibration I use the following instructions
java -Xmx16G -jar gatk/GenomeAnalysisTK.jar -I input.marked.realigned.fixed.bam -R hg19/hg19.fa -T CountCovariates -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -recalFile input.recal_data.csv -knownSites:dbsnp,VCF dbsnp135.hg19.vcf
java -Xmx16G -jar gatk/GenomeAnalysisTK.jar \-l INFO \-R hg19.fa \-I input.marked.realigned.fixed.bam \-T TableRecalibration \--out input.marked.realigned.fixed.recal.bam \-recalFile input.recal_data.csv
You can see I get 2 files, one is the bam file and the other one is the recal_data (that you get in the first instruction. Maybe you are missing the second instruction and that is why you dont get the bam file.Last edited by Jane M; 09-07-2012, 02:05 AM.
Leave a comment:
-
Originally posted by Jane M View PostThank you AJERYC!
Because of some troubles with my version of dbSNP, I haven't managed to run:
but I am still wondering if I should run the PrintReads step since I only have one bam file and if my recalibrated bam file will be the recal_data.grp file. Any idea?
java -Xmx16G -jar gatk/GenomeAnalysisTK.jar -I input.marked.realigned.fixed.bam -R hg19/hg19.fa -T CountCovariates -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -recalFile input.recal_data.csv -knownSites:dbsnp,VCF dbsnp135.hg19.vcf
java -Xmx16G -jar gatk/GenomeAnalysisTK.jar \-l INFO \-R hg19.fa \-I input.marked.realigned.fixed.bam \-T TableRecalibration \--out input.marked.realigned.fixed.recal.bam \-recalFile input.recal_data.csv
You can see I get 2 files, one is the bam file and the other one is the recal_data (that you get in the first instruction. Maybe you are missing the second instruction and that is why you dont get the bam file.
Leave a comment:
-
Originally posted by Jane M View PostNot yet...
Code:java -jar GenomeAnalysisTK.jar \ -T PrintReads \ -R reference.fasta \ -I input.bam \ -BQSR recalibration_report.grp \ -o output.bam
Interesting thing is the documentation for the PrintReads program doesn't include the -BQSR parameter...
Leave a comment:
-
What is the option -l INFO? Is it still in use in the new version?
I guess that -o recal_data.grp is equivalent to -recalFile input.recal_data.csv. Am I right? What is the interest of this file, I don't see when it is used...
Finally, where is the recalibrated bam file ? There is only .grp output file at the BaseRecalibrator step.
I am a bit confused by these changes between versions...
Thanks,
Fong
Leave a comment:
-
Thank you AJERYC!
Because of some troubles with my version of dbSNP, I haven't managed to run:
java -Xmx4g -jar GenomeAnalysisTK.jar \
-T BaseRecalibrator \
-I my_reads.bam \
-R resources/Homo_sapiens_assembly18.fasta \
-knownSites bundle/hg18/dbsnp_132.hg18.vcf \
-knownSites another/optional/setOfSitesToMask.vcf \
-o recal_data.grp
Leave a comment:
-
Originally posted by Jane M View PostThank both of you.
I managed to run by using a different temporary folder :
My output file is generating finally Thanks !
But I have this output:
Did you get also this message?
Code:Ignoring SAM validation error: ERROR: Record 104416914, Read name HWI-ST584_0081:4:1204:9783:76003#AGTCAA, MAPQ should be 0 for unmapped read.
Leave a comment:
-
Originally posted by frewise View PostHi, raonyguimaraes, why did you remove genes with multiple variants in your last 2 steps?
Leave a comment:
-
One additional question regarding the quality score recalibration: from the v2.0 of GATK, this seems to be performed in one step only.
Previously, it was:
java -Xmx4g -jar GenomeAnalysisTK.jar -l INFO -R hg19.fa --DBSNP dbsnp132.txt -I input.marked.realigned.fixed.bam -T CountCovariates -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -recalFile input.recal_data.csv
java -Xmx4g -jar GenomeAnalysisTK.jar -l INFO -R hg19.fa -I input.marked.realigned.fixed.bam -T TableRecalibration --out input.marked.realigned.fixed.recal.bam -recalFile input.recal_data.csv
java -Xmx4g -jar GenomeAnalysisTK.jar -l INFO -R hg19.fa -knowSites dbsnp132.txt -I input.marked.realigned.fixed.bam -T BaseRecalibrator -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate --out input.recal_data.csv
From GATK doc:
java -Xmx4g -jar GenomeAnalysisTK.jar \
-T BaseRecalibrator \
-I my_reads.bam \
-R resources/Homo_sapiens_assembly18.fasta \
-knownSites bundle/hg18/dbsnp_132.hg18.vcf \
-knownSites another/optional/setOfSitesToMask.vcf \
-o recal_data.grp
java -Xmx2g -jar GenomeAnalysisTK.jar \
-R ref.fasta \
-T PrintReads \
-o output.bam \
-I input1.bam \
-I input2.bam \
--read_filter MappingQualityZero
I guess that -o recal_data.grp is equivalent to -recalFile input.recal_data.csv. Am I right? What is the interest of this file, I don't see when it is used...
Finally, where is the recalibrated bam file ? There is only .grp output file at the BaseRecalibrator step.
I am a bit confused by these changes between versions...
Leave a comment:
-
Thank both of you.
I managed to run by using a different temporary folder :
java -Djava.io.tmpdir=/mnt/seq3/seq3/LMMC/GAR/GAR_sain -jar /share/apps/picard-tools-1.76/FixMateInformation.jar INPUT=/mnt/seq3/seq3/LMMC/GAR/GAR_sain/GAR_sain.realigned.bam OUTPUT=/mnt/seq3/seq3/LMMC/GAR/GAR_sain/GAR_sain.realigned.fixed.bam SO=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true
But I have this output:
[Wed Sep 05 14:53:57 CEST 2012] net.sf.picard.sam.FixMateInformation INPUT=[/mnt/seq3/seq3/LMMC/GAR/GAR_sain/GAR_sain.realigned.bam] OUTPUT=/mnt/seq3/seq3/LMMC/GAR/GAR_sain/GAR_sain.realigned.fixed.bam SORT_ORDER=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false
[Wed Sep 05 14:53:57 CEST 2012] Executing as [...]; Java HotSpot(TM) Server VM 1.6.0_26-b03; Picard version: 1.76(1261)
INFO 2012-09-05 14:53:57 FixMateInformation Sorting input into queryname order.
Ignoring SAM validation error: ERROR: Record 104416902, Read name HWI-ST584_0081:4:2206:2021:2237#AGTCAA, MAPQ should be 0 for unmapped read.
Ignoring SAM validation error: ERROR: Record 104416903, Read name HWI-ST584_0081:4:1102:17248:85426#AGTCAA, MAPQ should be 0 for unmapped read.
Ignoring SAM validation error: ERROR: Record 104416904, Read name HWI-ST584_0081:4:1102:2000:84379#AGTCAA, MAPQ should be 0 for unmapped read.
Ignoring SAM validation error: ERROR: Record 104416905, Read name HWI-ST584_0081:4:1103:2169:69666#AGTCAA, MAPQ should be 0 for unmapped read.
Ignoring SAM validation error: ERROR: Record 104416906, Read name HWI-ST584_0081:4:1103:8346:98868#AGTCAA, MAPQ should be 0 for unmapped read.
Ignoring SAM validation error: ERROR: Record 104416907, Read name HWI-ST584_0081:4:1104:12598:60315#AGTCAA, MAPQ should be 0 for unmapped read.
Ignoring SAM validation error: ERROR: Record 104416908, Read name HWI-ST584_0081:4:1105:1489:49925#AGTCAA, MAPQ should be 0 for unmapped read.
Ignoring SAM validation error: ERROR: Record 104416909, Read name HWI-ST584_0081:4:1105:16587:151639#AGTCAA, MAPQ should be 0 for unmapped read.
Ignoring SAM validation error: ERROR: Record 104416910, Read name HWI-ST584_0081:4:1105:8354:181686#AGTCAA, MAPQ should be 0 for unmapped read.
Ignoring SAM validation error: ERROR: Record 104416911, Read name HWI-ST584_0081:4:1107:17717:4065#AGTCAA, MAPQ should be 0 for unmapped read.
Ignoring SAM validation error: ERROR: Record 104416912, Read name HWI-ST584_0081:4:1108:4813:156146#AGTCAA, MAPQ should be 0 for unmapped read.
Ignoring SAM validation error: ERROR: Record 104416913, Read name HWI-ST584_0081:4:1108:9333:173330#AGTCAA, MAPQ should be 0 for unmapped read.
Ignoring SAM validation error: ERROR: Record 104416914, Read name HWI-ST584_0081:4:1204:9783:76003#AGTCAA, MAPQ should be 0 for unmapped read.
Ignoring SAM validation error: ERROR: Record 104416915, Read name HWI-ST584_0081:4:1206:4951:59169#AGTCAA, MAPQ should be 0 for unmapped read.
Ignoring SAM validation error: ERROR: Record 104416916, Read name HWI-ST584_0081:4:2104:7277:38433#AGTCAA, MAPQ should be 0 for unmapped read.
Ignoring SAM validation error: ERROR: Record 104416917, Read name HWI-ST584_0081:4:2106:5867:60527#AGTCAA, MAPQ should be 0 for unmapped read.
Ignoring SAM validation error: ERROR: Record 104416918, Read name HWI-ST584_0081:4:2202:1737:149838#AGTCAA, MAPQ should be 0 for unmapped read.
Ignoring SAM validation error: ERROR: Record 104416919, Read name HWI-ST584_0081:4:2205:16036:86626#AGTCAA, MAPQ should be 0 for unmapped read.
Ignoring SAM validation error: ERROR: Record 104416920, Read name HWI-ST584_0081:4:2207:1759:176116#AGTCAA, MAPQ should be 0 for unmapped read.
INFO 2012-09-05 15:13:24 FixMateInformation Sorting by queryname complete.
INFO 2012-09-05 15:13:24 FixMateInformation Output will be sorted by coordinate
INFO 2012-09-05 15:13:24 FixMateInformation Traversing query name sorted records and fixing up mate pair information.
INFO 2012-09-05 15:13:34 FixMateInformation Processed 1,000,000 records. Elapsed time: 00:00:10s. Time for last 1,000,000: 10s. Last read position: chr14:74,489,555
INFO 2012-09-05 15:13:52 FixMateInformation Processed 2,000,000 records. Elapsed time: 00:00:27s. Time for last 1,000,000: 17s. Last read position: chr4:104,072,163
INFO 2012-09-05 15:14:05 FixMateInformation Processed 3,000,000 records. Elapsed time: 00:00:41s. Time for last 1,000,000: 13s. Last read position: chr21:31,654,746
Code:Ignoring SAM validation error: ERROR: Record 104416914, Read name HWI-ST584_0081:4:1204:9783:76003#AGTCAA, MAPQ should be 0 for unmapped read.
Leave a comment:
-
IOException: No space left on device
Originally posted by Jane M View PostThank you for your help oyvindbusk,
I changed -T CountCovariates by BaseRecalibrator and -T TableRecalibration by PrintReads.
Nevertheless, I am stuck before this step, with Picard, when running:
I got:
I googled it, it seems to be a problem of space... I don't know if it's because the folder /tmp is too small or if I should add -Xmx4g... On the cluster that I'm using, this -Xmx4g is not working.
Any idea?
Leave a comment:
-
I think I had a similar problem resolved by increasing the size of temp-folder. I would try this first. If it were the memory it would probably say something like "not sufficient memory".
Ø
Leave a comment:
-
Thank you for your help oyvindbusk,
I changed -T CountCovariates by BaseRecalibrator and -T TableRecalibration by PrintReads.
Nevertheless, I am stuck before this step, with Picard, when running:
java -Djava.io.tmpdir=/tmp -jar /share/apps/picard-tools-1.76/FixMateInformation.jar INPUT=$path/$i/$i.realigned.bam OUTPUT=$path/$i/$i.realigned.fixed.bam SO=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true
[Wed Sep 05 09:59:27 CEST 2012] net.sf.picard.sam.FixMateInformation INPUT=[/mnt/seq3/seq3/LMMC/GAR/GAR_sain/GAR_sain.realigned.bam] OUTPUT=/mnt/seq3/seq3/LMMC/GAR/GAR_sain/GAR_sain.realigned.fixed.bam SORT_ORDER=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false
[Wed Sep 05 09:59:27 CEST 2012] Executing as [..]; Java HotSpot(TM) Server VM 1.6.0_26-b03; Picard version: 1.76(1261)
INFO 2012-09-05 09:59:27 FixMateInformation Sorting input into queryname order.
[Wed Sep 05 10:15:30 CEST 2012] net.sf.picard.sam.FixMateInformation done. Elapsed time: 16.06 minutes.
Runtime.totalMemory()=954466304
FAQ: http://sourceforge.net/apps/mediawik...itle=Main_Page
Exception in thread "main" net.sf.samtools.util.RuntimeIOException: java.io.IOException: No space left on device
at net.sf.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:228)
at net.sf.samtools.util.SortingCollection.add(SortingCollection.java:150)
at net.sf.picard.sam.FixMateInformation.doWork(FixMateInformation.java:147)
at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(CommandLineProgram.java:119)
at net.sf.picard.sam.FixMateInformation.main(FixMateInformation.java:75)
Caused by: java.io.IOException: No space left on device
at java.io.FileOutputStream.write(Native Method)
at org.xerial.snappy.SnappyOutputStream.writeInt(SnappyOutputStream.java:105)
at org.xerial.snappy.SnappyOutputStream.dump(SnappyOutputStream.java:126)
at org.xerial.snappy.SnappyOutputStream.flush(SnappyOutputStream.java:100)
at org.xerial.snappy.SnappyOutputStream.close(SnappyOutputStream.java:137)
at net.sf.samtools.util.SortingCollection.spillToDisk(SortingCollection.java:219)
... 5 more
Any idea?Last edited by Jane M; 09-05-2012, 12:57 AM.
Leave a comment:
Latest Articles
Collapse
-
by seqadmin
The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...-
Channel: Articles
05-06-2024, 07:48 AM -
-
by seqadmin
The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...-
Channel: Articles
04-22-2024, 07:01 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, Yesterday, 02:46 PM
|
0 responses
11 views
0 likes
|
Last Post
by seqadmin
Yesterday, 02:46 PM
|
||
Started by seqadmin, 05-07-2024, 06:57 AM
|
0 responses
13 views
0 likes
|
Last Post
by seqadmin
05-07-2024, 06:57 AM
|
||
Started by seqadmin, 05-06-2024, 07:17 AM
|
0 responses
17 views
0 likes
|
Last Post
by seqadmin
05-06-2024, 07:17 AM
|
||
Started by seqadmin, 05-02-2024, 08:06 AM
|
0 responses
23 views
0 likes
|
Last Post
by seqadmin
05-02-2024, 08:06 AM
|
Leave a comment: