  • #91
    BAM file not indexed

    When I realigned reads around indels, an error occurred:
    ERROR MESSAGE: Invalid command line: Cannot process the provided BAM file(s) because they were not indexed.
    The previous 2 steps and the realignment command lines are:
    java -Xmx4g \
      -jar picard-tools-1.57/SortSam.jar \
      SORT_ORDER=coordinate \
      INPUT=MK-5.sam \
      OUTPUT=MK-5.bam
    java -Xmx4g \
      -jar picard-tools-1.57/MarkDuplicates.jar \
      INPUT=MK-5.bam \
      OUTPUT=MK-5.marked.bam \
      METRICS_FILE=metrics
    java -Xmx4g -jar GenomeAnalysisTK-1.6-9-g47df7bb/GenomeAnalysisTK.jar \
      -T RealignerTargetCreator \
      -R hg19.fa \
      -o MK-5.bam.list \
      -I MK-5.marked.bam
    java -Xmx4g \
      -jar GenomeAnalysisTK-1.6-9-g47df7bb/GenomeAnalysisTK.jar \
      -I MK-5.marked.bam \
      -R hg19.fa \
      -T IndelRealigner \
      -targetIntervals MK-5.bam.list \
      -o MK-5.marked.realigned.bam
    Could anyone tell me what's wrong with it? Thanks in advance!
    • #92
      Try adding CREATE_INDEX=true to your MarkDuplicates command as well; I think it will work then.



      • #93
        Originally posted by oyvindbusk
        Try adding CREATE_INDEX=true to your MarkDuplicates command as well; I think it will work then.

        Thank you very much, oyvindbusk. It works.
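The fix works because the GATK walkers expect an index file next to every input BAM. A minimal sketch of a pre-flight check, assuming the index is named either `<name>.bai` (what Picard writes with CREATE_INDEX=true) or `<name>.bam.bai` (what `samtools index` writes). The helper name `find_bam_index` is made up for illustration and is not part of any of these tools:

```shell
# Print the path of the index that sits next to a BAM file,
# or return 1 if no index is found.
# find_bam_index is a hypothetical helper, not part of Picard, samtools, or the GATK.
find_bam_index() {
  bam=$1
  for candidate in "${bam%.bam}.bai" "$bam.bai"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Usage: stop before calling the GATK when the index is missing.
# find_bam_index MK-5.marked.bam >/dev/null || echo "MK-5.marked.bam is not indexed" >&2
```

If the check fails, re-running MarkDuplicates with CREATE_INDEX=true, or running `samtools index` on the existing BAM, should create the missing file.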


        • #94
          Hello everybody,

          I am using ulz_peter's pipeline (thanks!) to perform realignment around indels and recalibration.

          1) I am stuck at the third step of realignment:

          java -jar picard/FixMateInformation.jar INPUT=input.marked.realigned.bam OUTPUT=input_bam.marked.realigned.fixed.bam SO=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true
          java -jar /share/apps/picard-tools-1.76/FixMateInformation.jar INPUT=$path/$i/$i.realigned.bam OUTPUT=$path/$i/$i.realigned.fixed.bam SO=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true
          I got this error message:
          ##### ERROR ------------------------------------------------------------------------------------------
          ##### ERROR A USER ERROR has occurred (version 2.0-39-gd091f72): 
          ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
          ##### ERROR Please do not post this error to the GATK forum
          ##### ERROR
          ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
          ##### ERROR Visit our website and forum for extensive documentation and answers to 
          ##### ERROR commonly asked questions
          ##### ERROR
          ##### ERROR MESSAGE: Walker CountCovariates is no longer available in the GATK; it has been deprecated since version 2.0
          Before that, I removed the PCR duplicates with samtools and performed the first 2 steps of local realignment near indels:

          /share/apps/samtools-0.1.18/samtools rmdup ${bam}.bak $bam
          java -jar $GA/GenomeAnalysisTK.jar -T RealignerTargetCreator -R $db -o ${bam}.list -I $bam
          java -jar $GA/GenomeAnalysisTK.jar -I $bam -R $db -T IndelRealigner -targetIntervals ${bam}.list -o $path/$i/$i.realigned.bam
          I'm using Picard version 1.76. I really don't know why I got such an error... Does anyone see what I am doing wrong?

          2) I haven't run the recalibration yet, but I saw this in the Picard output:
          ##### ERROR MESSAGE: Walker CountCovariates is no longer available in the GATK; it has been deprecated since version 2.0
          ##### ERROR MESSAGE: Walker TableRecalibration is no longer available in the GATK; it has been deprecated since version 2.0
          Since I'm using v2.0-39-gd091f72, I am wondering how to perform the recalibration under these conditions. Do you know why these analysis types have been removed? Should I change my GATK version?

          3) Finally, did I understand correctly that variant quality score recalibration, after variant calling, does not work well with a single BAM? If so, I won't perform this recalibration.

          Thank you in advance,
          • #95
            Dear Jane.

            The error message appears because GATK's CountCovariates tool is no longer supported. You have to use the BaseRecalibrator and PrintReads tools instead, as described on the GATK page.



            • #96
              Thank you for your help oyvindbusk,
              I replaced -T CountCovariates with -T BaseRecalibrator and -T TableRecalibration with -T PrintReads.

              Nevertheless, I am stuck before this step, with Picard, when running:

              java -jar /share/apps/picard-tools-1.76/FixMateInformation.jar INPUT=$path/$i/$i.realigned.bam OUTPUT=$path/$i/$i.realigned.fixed.bam SO=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true
              I got:
              [Wed Sep 05 09:59:27 CEST 2012] net.sf.picard.sam.FixMateInformation INPUT=[/mnt/seq3/seq3/LMMC/GAR/GAR_sain/GAR_sain.realigned.bam] OUTPUT=/mnt/seq3/seq3/LMMC/GAR/GAR_sain/GAR_sain.realigned.fixed.bam SORT_ORDER=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false
              [Wed Sep 05 09:59:27 CEST 2012] Executing as [..]; Java HotSpot(TM) Server VM 1.6.0_26-b03; Picard version: 1.76(1261)
              INFO 2012-09-05 09:59:27 FixMateInformation Sorting input into queryname order.
              [Wed Sep 05 10:15:30 CEST 2012] net.sf.picard.sam.FixMateInformation done. Elapsed time: 16.06 minutes.
              Exception in thread "main" net.sf.samtools.util.RuntimeIOException: No space left on device
              at net.sf.samtools.util.SortingCollection.spillToDisk(
              at net.sf.samtools.util.SortingCollection.add(
              at net.sf.picard.sam.FixMateInformation.doWork(
              at net.sf.picard.cmdline.CommandLineProgram.instanceMain(
              at net.sf.picard.cmdline.CommandLineProgram.instanceMainWithExit(
              at net.sf.picard.sam.FixMateInformation.main(
              Caused by: No space left on device
              at Method)
              at org.xerial.snappy.SnappyOutputStream.writeInt(
              at org.xerial.snappy.SnappyOutputStream.dump(
              at org.xerial.snappy.SnappyOutputStream.flush(
              at org.xerial.snappy.SnappyOutputStream.close(
              at net.sf.samtools.util.SortingCollection.spillToDisk(
              ... 5 more
              I googled it, and it seems to be a disk space problem... I don't know whether the /tmp folder is too small or whether I should add -Xmx4g... On the cluster that I'm using, -Xmx4g does not work.
              Any idea?
              • #97
                I think I had a similar problem, which I resolved by increasing the size of the temp folder. I would try this first. If memory were the problem, the error would probably say something like "not sufficient memory".



                • #98
                  IOException: No space left on device

                  Originally posted by Jane M
                  Thank you for your help oyvindbusk,
                  I replaced -T CountCovariates with -T BaseRecalibrator and -T TableRecalibration with -T PrintReads.

                  Nevertheless, I am stuck before this step, with Picard, when running:
                  I got:
                  I googled it, and it seems to be a disk space problem... I don't know whether the /tmp folder is too small or whether I should add -Xmx4g... On the cluster that I'm using, -Xmx4g does not work.
                  Any idea?
                  I got the same error, and it seems I had run out of hard drive space. If you are processing a very large exome file (e.g. a 100x exome), you may need a lot of space for your temporary files. Try increasing your available disk space and see what happens.
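Picard sorts by spilling records to temporary files, which is where the "No space left on device" in the stack trace comes from (SortingCollection.spillToDisk). A minimal sketch of pointing those files at a larger volume, assuming Picard's standard TMP_DIR option and the JVM's java.io.tmpdir property. The scratch directory, the file names, and the helper `fixmate_cmd` are placeholders, and the function only prints the command so it can be inspected before running:

```shell
# Build (but do not run) a FixMateInformation command that writes its
# temporary files to a chosen directory.
# fixmate_cmd is a hypothetical helper; paths containing spaces are not handled.
fixmate_cmd() {
  picard_dir=$1   # directory holding FixMateInformation.jar
  in_bam=$2
  out_bam=$3
  tmp_dir=$4      # scratch directory on a volume with enough free space
  printf '%s\n' "java -Djava.io.tmpdir=$tmp_dir -jar $picard_dir/FixMateInformation.jar INPUT=$in_bam OUTPUT=$out_bam SO=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true TMP_DIR=$tmp_dir"
}

# Check free space on the scratch volume first, then print the command.
scratch=${SCRATCH_DIR:-.}   # placeholder: replace with a large scratch directory
df -P "$scratch"
fixmate_cmd /share/apps/picard-tools-1.76 in.realigned.bam out.realigned.fixed.bam "$scratch"
```

Raising -Xmx changes the Java heap size only, so it would not be expected to help with a full temporary directory.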


                  • #99
                    Thank you both.
                    I managed to run it by using a different temporary folder:
                    java -jar /share/apps/picard-tools-1.76/FixMateInformation.jar INPUT=/mnt/seq3/seq3/LMMC/GAR/GAR_sain/GAR_sain.realigned.bam OUTPUT=/mnt/seq3/seq3/LMMC/GAR/GAR_sain/GAR_sain.realigned.fixed.bam SO=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true
                    My output file is finally being generated. Thanks!
                    But I get this output:

                    [Wed Sep 05 14:53:57 CEST 2012] net.sf.picard.sam.FixMateInformation INPUT=[/mnt/seq3/seq3/LMMC/GAR/GAR_sain/GAR_sain.realigned.bam] OUTPUT=/mnt/seq3/seq3/LMMC/GAR/GAR_sain/GAR_sain.realigned.fixed.bam SORT_ORDER=coordinate VALIDATION_STRINGENCY=LENIENT CREATE_INDEX=true VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_MD5_FILE=false
                    [Wed Sep 05 14:53:57 CEST 2012] Executing as [...]; Java HotSpot(TM) Server VM 1.6.0_26-b03; Picard version: 1.76(1261)
                    INFO 2012-09-05 14:53:57 FixMateInformation Sorting input into queryname order.
                    Ignoring SAM validation error: ERROR: Record 104416902, Read name HWI-ST584_0081:4:2206:2021:2237#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416903, Read name HWI-ST584_0081:4:1102:17248:85426#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416904, Read name HWI-ST584_0081:4:1102:2000:84379#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416905, Read name HWI-ST584_0081:4:1103:2169:69666#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416906, Read name HWI-ST584_0081:4:1103:8346:98868#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416907, Read name HWI-ST584_0081:4:1104:12598:60315#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416908, Read name HWI-ST584_0081:4:1105:1489:49925#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416909, Read name HWI-ST584_0081:4:1105:16587:151639#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416910, Read name HWI-ST584_0081:4:1105:8354:181686#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416911, Read name HWI-ST584_0081:4:1107:17717:4065#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416912, Read name HWI-ST584_0081:4:1108:4813:156146#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416913, Read name HWI-ST584_0081:4:1108:9333:173330#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416914, Read name HWI-ST584_0081:4:1204:9783:76003#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416915, Read name HWI-ST584_0081:4:1206:4951:59169#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416916, Read name HWI-ST584_0081:4:2104:7277:38433#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416917, Read name HWI-ST584_0081:4:2106:5867:60527#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416918, Read name HWI-ST584_0081:4:2202:1737:149838#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416919, Read name HWI-ST584_0081:4:2205:16036:86626#AGTCAA, MAPQ should be 0 for unmapped read.
                    Ignoring SAM validation error: ERROR: Record 104416920, Read name HWI-ST584_0081:4:2207:1759:176116#AGTCAA, MAPQ should be 0 for unmapped read.
                    INFO 2012-09-05 15:13:24 FixMateInformation Sorting by queryname complete.
                    INFO 2012-09-05 15:13:24 FixMateInformation Output will be sorted by coordinate
                    INFO 2012-09-05 15:13:24 FixMateInformation Traversing query name sorted records and fixing up mate pair information.
                    INFO 2012-09-05 15:13:34 FixMateInformation Processed 1,000,000 records. Elapsed time: 00:00:10s. Time for last 1,000,000: 10s. Last read position: chr14:74,489,555
                    INFO 2012-09-05 15:13:52 FixMateInformation Processed 2,000,000 records. Elapsed time: 00:00:27s. Time for last 1,000,000: 17s. Last read position: chr4:104,072,163
                    INFO 2012-09-05 15:14:05 FixMateInformation Processed 3,000,000 records. Elapsed time: 00:00:41s. Time for last 1,000,000: 13s. Last read position: chr21:31,654,746
                    Did you also get this message?
                    Ignoring SAM validation error: ERROR: Record 104416914, Read name HWI-ST584_0081:4:1204:9783:76003#AGTCAA, MAPQ should be 0 for unmapped read.


                    • One additional question regarding quality score recalibration: as of GATK v2.0, it seems to be performed in a single step.

                      Previously, it was:
                      java -Xmx4g -jar GenomeAnalysisTK.jar -l INFO -R hg19.fa --DBSNP dbsnp132.txt -I input.marked.realigned.fixed.bam -T CountCovariates -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -recalFile input.recal_data.csv

                      java -Xmx4g -jar GenomeAnalysisTK.jar -l INFO -R hg19.fa -I input.marked.realigned.fixed.bam -T TableRecalibration --out input.marked.realigned.fixed.recal.bam -recalFile input.recal_data.csv
                      but now it is rather:
                      java -Xmx4g -jar GenomeAnalysisTK.jar -l INFO -R hg19.fa -knownSites dbsnp132.txt -I input.marked.realigned.fixed.bam -T BaseRecalibrator -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate --out input.recal_data.csv

                      From the GATK documentation:
                      java -Xmx4g -jar GenomeAnalysisTK.jar \
                      -T BaseRecalibrator \
                      -I my_reads.bam \
                      -R resources/Homo_sapiens_assembly18.fasta \
                      -knownSites bundle/hg18/dbsnp_132.hg18.vcf \
                      -knownSites another/optional/setOfSitesToMask.vcf \
                      -o recal_data.grp
                      with this additional step only when using several BAM files (if I understand the documentation correctly: "PrintReads can dynamically merge the contents of multiple input BAM files, resulting in merged output sorted in coordinate order"):
                      java -Xmx2g -jar GenomeAnalysisTK.jar \
                      -R ref.fasta \
                      -T PrintReads \
                      -o output.bam \
                      -I input1.bam \
                      -I input2.bam \
                      --read_filter MappingQualityZero
                      What does the -l INFO option do? Is it still used in the new version?
                      I guess that -o recal_data.grp is equivalent to -recalFile input.recal_data.csv. Am I right? What is this file for? I don't see where it is used...
                      Finally, where is the recalibrated BAM file? The BaseRecalibrator step only outputs a .grp file.

                      I am a bit confused by these changes between versions...


                      • Originally posted by frewise
                        Hi, raonyguimaraes, why did you remove genes with multiple variants in your last 2 steps?
                        I think it's the opposite: he's taking genes with multiple variants into his regions of interest and treating the others as less important.


                        • Originally posted by Jane M
                          Thank you both.
                          I managed to run it by using a different temporary folder:

                          My output file is finally being generated. Thanks!
                          But I get this output:

                          Did you also get this message?
                          Ignoring SAM validation error: ERROR: Record 104416914, Read name HWI-ST584_0081:4:1204:9783:76003#AGTCAA, MAPQ should be 0 for unmapped read.
                          This message means these reads are flagged as unmapped (they could not be aligned against the reference genome) but still carry a non-zero MAPQ. That is why you use the VALIDATION_STRINGENCY=LENIENT option: the program reports the offending reads but doesn't stop running.
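To see how many records trigger this warning, one option is to count reads whose FLAG has the unmapped bit (0x4) set while MAPQ (column 5) is non-zero. A minimal sketch that reads SAM text on standard input; the function name is made up for illustration, and the `samtools view` call in the usage comment assumes samtools is installed:

```shell
# Count SAM records flagged as unmapped (FLAG bit 0x4) whose MAPQ is not 0.
# Header lines (starting with @) are skipped.
# count_unmapped_nonzero_mapq is a hypothetical helper name.
count_unmapped_nonzero_mapq() {
  awk -F '\t' '
    /^@/ { next }                               # skip header lines
    int($2 / 4) % 2 == 1 && $5 != 0 { n++ }     # bit 0x4 set and MAPQ non-zero
    END { print n + 0 }                         # print 0 when nothing matched
  '
}

# Usage (assumes samtools is on the PATH):
# samtools view GAR_sain.realigned.bam | count_unmapped_nonzero_mapq
```

The count should match the number of "MAPQ should be 0 for unmapped read" lines that Picard prints under LENIENT validation, provided that is the only check these records fail.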


                          • Thank you AJERYC!

                            Because of some trouble with my version of dbSNP, I haven't managed to run:
                            java -Xmx4g -jar GenomeAnalysisTK.jar \
                            -T BaseRecalibrator \
                            -I my_reads.bam \
                            -R resources/Homo_sapiens_assembly18.fasta \
                            -knownSites bundle/hg18/dbsnp_132.hg18.vcf \
                            -knownSites another/optional/setOfSitesToMask.vcf \
                            -o recal_data.grp
                            but I am still wondering whether I should run the PrintReads step, since I only have one BAM file, and whether my recalibrated BAM file will be the recal_data.grp file. Any ideas?


                            • What does the -l INFO option do? Is it still used in the new version?
                              I guess that -o recal_data.grp is equivalent to -recalFile input.recal_data.csv. Am I right? What is this file for? I don't see where it is used...
                              Finally, where is the recalibrated BAM file? The BaseRecalibrator step only outputs a .grp file.

                              I am a bit confused by these changes between versions...
                              Did you ever get an answer to these questions? I am running into the same issues with the newer version of GATK.




                              • Originally posted by fongchun
                                Did you ever get an answer to these questions? I am running into the same issues with the newer version of GATK.


                                Not yet...
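A possible answer, going by the GATK 2.x documentation: recalibration is still two steps. BaseRecalibrator only writes the recalibration table (recal_data.grp, the counterpart of the old recal_data.csv), and PrintReads then applies that table through its -BQSR argument to write the recalibrated BAM, so PrintReads is needed even with a single input BAM. The -l INFO option only sets the logging level. A minimal sketch that prints both commands, assuming the GATK 2.x argument names; the file names and the helper names are placeholders, and this is not a tested pipeline:

```shell
# Print (do not run) the two GATK 2.x recalibration commands for one BAM.
# bqsr_table_cmd and bqsr_apply_cmd are hypothetical helper names.

# Step 1: model the base quality errors and write the recalibration table.
bqsr_table_cmd() {
  ref=$1; in_bam=$2; known_sites=$3; table=$4
  printf '%s\n' "java -Xmx4g -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R $ref -I $in_bam -knownSites $known_sites -o $table"
}

# Step 2: apply the table and write the recalibrated BAM.
bqsr_apply_cmd() {
  ref=$1; in_bam=$2; table=$3; out_bam=$4
  printf '%s\n' "java -Xmx4g -jar GenomeAnalysisTK.jar -T PrintReads -R $ref -I $in_bam -BQSR $table -o $out_bam"
}

bqsr_table_cmd hg19.fa input.marked.realigned.fixed.bam dbsnp.vcf recal_data.grp
bqsr_apply_cmd hg19.fa input.marked.realigned.fixed.bam recal_data.grp input.marked.realigned.fixed.recal.bam
```

The known-sites file (dbsnp.vcf here) has to match the reference build used for alignment; that file name is an assumption, not something taken from the thread.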


