error during GATK indel realigner

  • error during GATK indel realigner

    Hello,

    I've performed an exome alignment (paired-end reads) using bfast match + localalign + postprocess, and then removed duplicates with Picard. When running the local realignment, I get the following error during the GATK IndelRealigner step:

    Code:
    ##### ERROR MESSAGE: Error caching SAM record HWUSI-EAS1692_0001:3:55:5381:15775#0, which is usually caused by malformed SAM/BAM files in which multiple identical copies of a read are present.
    This is how the bam file looks:

    Code:
    HWUSI-EAS1692_0001:3:55:5381:15775#0	179	chr1	148354187	0	95M	=	148354236	49TAGCATCTTTCACAAAGCTCTCTGTGTTTGAGTACGCACCTTGATCCATAGGCTCACATTTGATCCCAACTGGCGGCTGCTTCTTGGCATTAACT	DGFBGGGGGGGGGGGGGGGGGGGBGFFGGGFGDGEGGAGFEGDGGGGFEGEEGBGGGGGEGDBDEDEEDBA??EEA?##################	XA:i:3	MD:Z:95	PG:Z:bfast	RG:Z:012_t_l1	IH:i:1	NH:i:11	HI:i:1	NM:i:0	MQ:i:0	AS:i:4750
    HWUSI-EAS1692_0001:3:55:5381:15775#0	81	chr1	148354236	0	95M	=	148568822	214586	AGGCTCACATTTGATCCCAACTGGCGGCTGCTTCTTGGCATTAACTTTGGATTCCCAACCAGTAAATCTTACCAAGATCTGAGTTTCTCCAGGTA	@AABAC<[email protected]>>>=4>=>=3DCDCFEDEGEECDF?DEFGFFEDCEDDDEEEDDGEGFGGGGGEGGGFGFGGDGGGGGGFGGFFGGGGGGGEGGGB	XA:i:3	MD:Z:95	PG:Z:bfastRG:Z:012_t_l2	IH:i:1	NH:i:11	HI:i:1	NM:i:0	MQ:i:0	AS:i:4750
    HWUSI-EAS1692_0001:3:55:5381:15775#0	115	chr1	148354236	0	95M	=	148354187	-49AGGCTCACATTTGATCCCAACTGGCGGCTGCTTCTTGGCATTAACTTTGGATTCCCAACCAGTAAATCTTACCAAGATCTGAGTTTCTCCAGGTA	@AABAC<[email protected]>>>=4>=>=3DCDCFEDEGEECDF?DEFGFFEDCEDDDEEEDDGEGFGGGGGEGGGFGFGGDGGGGGGFGGFFGGGGGGGEGGGB	XA:i:3	MD:Z:95	PG:Z:bfast	RG:Z:012_t_l1	IH:i:1	NH:i:11	HI:i:1	NM:i:0	MQ:i:0	AS:i:4750
    HWUSI-EAS1692_0001:3:55:5381:15775#0	161	chr1	148568822	0	95M	=	148354236	-214586	AGTTAATGCCAAGAAGCAGCCGCCAGTTGGGATCAAATGTGAGCCTATGGATCAAGGTGCGTACTCAAACACAGAGAGCTTTGTGAAAGATGCTA	##################?AEE??ABDEEDEDBDGEGGGGGBGEEGEFGGGGDGEFGAGGEGDGFGGGFFGBGGGGGGGGGGGGGGGGGGGBFGD	XA:i:3	MD:Z:95	PG:Z:bfastRG:Z:012_t_l2	IH:i:1	NH:i:11	HI:i:1	NM:i:0	MQ:i:0	AS:i:4750
    So I guess GATK is right. My questions are:

    - I ran bfast postprocess with the '-a 3 -z' arguments, so isn't it supposed to keep only a single alignment for each read?

    - Anyway, can I somehow tell GATK to ignore these conflicting reads? I've tried '--validation_strictness SILENT', but it still complains.
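
To see how many reads are affected before deciding how to handle them, one option is to count how often each (read name, mate number) pair occurs among the primary alignments. The following is only a minimal sketch in plain Python working on SAM text lines (for example the output of samtools view); the function names read_key and find_duplicated_reads are made up for illustration and are not part of GATK, Picard or samtools.

```python
from collections import Counter

READ = "HWUSI-EAS1692_0001:3:55:5381:15775#0"

def read_key(sam_line):
    """Return (read name, mate number, flag) for one SAM alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    name, flag = fields[0], int(fields[1])
    if flag & 0x40:      # first segment of the pair
        mate = 1
    elif flag & 0x80:    # last segment of the pair
        mate = 2
    else:                # unpaired read
        mate = 0
    return name, mate, flag

def find_duplicated_reads(sam_lines):
    """Map (name, mate) to its count for keys seen more than once."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue     # skip blank lines and SAM header lines
        name, mate, flag = read_key(line)
        if flag & 0x900:
            continue     # secondary/supplementary records may repeat legitimately
        counts[(name, mate)] += 1
    return {key: n for key, n in counts.items() if n > 1}

# The four records from the post, truncated to name, flag, chromosome, position.
example = [
    READ + "\t179\tchr1\t148354187",
    READ + "\t81\tchr1\t148354236",
    READ + "\t115\tchr1\t148354236",
    READ + "\t161\tchr1\t148568822",
]
dups = find_duplicated_reads(example)
```

For the four records shown above, both the first and the second mate of the read are reported twice, which matches what the GATK error message describes.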

    Well, I'm pretty stuck on this, so any help would be much appreciated. And merry Christmas, by the way!

    thanks,
    david

  • #2
    Can you assign more specific read groups (flowcell/lane) to avoid the collisions? It looks like your read group is really a sample ID. It also looks like some of your PG and RG tags are run together rather than separated (e.g. PG:Z:bfastRG:Z:012_t_l2).


    • #3
      Thank you very much for your answer, Jon.

      I did not point out that the reads are paired-end. According to the BAM flags, the first and third entries should correspond to the 2nd and 1st end of lane_2, whereas the second and fourth entries should correspond to the 1st and 2nd end of lane_1, respectively.
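
The flag values can be checked bit by bit. This is a small sketch that decodes a SAM flag into the bit names defined in the SAM specification; the helper name decode_flag and the short labels are chosen here for illustration only.

```python
# Bit values are those of the SAM specification; the labels are informal.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]

def decode_flag(flag):
    """Return the names of all bits set in a SAM flag."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]

# Flags of the four records in the first post, in order.
decoded = {flag: decode_flag(flag) for flag in (179, 81, 115, 161)}
for flag, names in decoded.items():
    print(flag, ", ".join(names))
```

Decoded this way, 179 and 161 are second-in-pair records and 81 and 115 are first-in-pair records. In the excerpt, the records with flags 179 and 115 carry RG:Z:012_t_l1, while those with flags 81 and 161 carry 012_t_l2.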

      I'm a newbie and may be wrong, but the read group values should not be the problem. Even though I assigned them in a somewhat dummy fashion, there are no problems in the remaining samples (which I processed the same way, with no errors raised).

      I am wondering whether the problem is that the sequencer has given the same ID to reads from two different lanes. Is that possible? I hope the above makes sense and that I am not missing the point of what you said.

      Many thanks.


      • #4
        I found that some of the GATK errors went away if I "clean" the BAM files using:

        samtools view -F 0x04 -b in.bam > out.bam

        After this I sort, index, and mark duplicates using Picard before proceeding with GATK.

        -Kasthuri

        Added later: Did you merge the BAM files for the same sample run on different lanes?
        Last edited by kasthuri; 12-29-2011, 07:52 PM.


        • #5
          I also ran into a problem with the realigner.
          I ran bwa + realigner + indel genotyper, and I got the message below during the indel genotyper step:
          ##### ERROR MESSAGE: Invalid command line: Argument window_size has a bad value: Read HWUSI-EAS1600R_0008:4:9:17021:5336#0: out of coverage window bounds. Probably window is too small, so increase the value of the window_size argument.
          ##### ERROR Read length=115; cigar=1M84D114M; start=128243463; end=128243661; window start (after trying to accomodate the read)=128243458; window end=128243657
          So I looked up the read in the realigner's output BAM file:
          HWUSI-EAS1600R_0008:4:9:17021:5336#0 99 chr7 128243463 70 1M84D114M
          When I checked the same read in the bwa output, I found:
          HWUSI-EAS1600R_0008:4:9:17021:5336#0 99 chr7 128243547 60 115M
          It seems the realigner introduced a spurious deletion.
          Has anyone else run into this error?
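
The numbers in the error message can be reproduced from the CIGAR strings. The sketch below computes how many reference and read bases a CIGAR consumes, following the SAM specification (M, D, N, = and X consume the reference; M, I, S, = and X consume the read). The function names are hypothetical helpers, not part of GATK or bwa.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")     # operations that advance along the reference
QUERY_CONSUMING = set("MIS=X")   # operations that advance along the read

def cigar_lengths(cigar):
    """Return (reference bases spanned, read bases consumed) for a CIGAR."""
    ref_len = query_len = 0
    for count, op in CIGAR_OP.findall(cigar):
        n = int(count)
        if op in REF_CONSUMING:
            ref_len += n
        if op in QUERY_CONSUMING:
            query_len += n
    return ref_len, query_len

def reference_end(start, cigar):
    """Last reference position (1-based, inclusive) covered by the alignment."""
    ref_len, _ = cigar_lengths(cigar)
    return start + ref_len - 1

realigned_end = reference_end(128243463, "1M84D114M")  # realigner output
original_end = reference_end(128243547, "115M")        # bwa output
window_end = 128243657                                 # from the error message
print(realigned_end, original_end, realigned_end > window_end)
```

Both alignments end at 128243661, the end reported in the error. The realigned record spans 199 reference bases for a 115-base read, because its single leading matched base sits 84 bp upstream of the rest, and that end position lies past the reported window end of 128243657.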


          • #6
            kasthuri wrote: I found some of the errors in GATK were gone if I "clean" the bam files using:

            samtools view -F 0x04 -b in.bam > out.bam

            after this I sort, index and mark the duplicates using Picard before proceeding with GATK.

            In this case, filtering on the 0x04 flag does not help, since that flag marks unmapped reads. In my case, the same read is mapped to several (two) positions.
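
To illustrate why that filter leaves these records untouched: samtools view -F 0x04 drops every record whose flag has the 0x4 (unmapped) bit set, and none of the four flags in the first post has it. A minimal sketch of the same test in Python (keep_mapped is a hypothetical name, and 77 is just an example flag for an unmapped first mate):

```python
UNMAPPED = 0x4  # SAM flag bit: the read itself is unmapped

def keep_mapped(flags):
    """Mimic the -F 0x04 selection: keep flags without the unmapped bit."""
    return [flag for flag in flags if not flag & UNMAPPED]

problem_flags = [179, 81, 115, 161]       # the four records from the first post
survivors = keep_mapped(problem_flags)    # nothing is removed
mixed = keep_mapped([77, 179])            # 77 = paired + unmapped + mate unmapped + first in pair
print(survivors, mixed)
```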

            kasthuri wrote: Added later: Did you merge the bam files for a same sample run on different lanes?

            Yes, I merged two BAM files, each corresponding to a different lane. Each of them was obtained by bfast alignment and then tagged accordingly with picard_add_groups.

            What confuses me is why the same read ID appears in both lane_1.bam and lane_2.bam, since it appears in the original raw lane_1.fastq but not in lane_2.fastq.
            Something must be wrong in my pipeline, but I've checked it a thousand times and everything seems fine (moreover, it only occurs in one of the many samples I have processed in the same way).
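
One way to double-check the raw input is to collect the read IDs from each FASTQ file and intersect the two sets. This is a rough sketch assuming plain four-line FASTQ records and an optional /1 or /2 mate suffix on the read name; the helper names and the second read ID in the toy data are invented for illustration.

```python
def normalise_id(header):
    """Strip the leading '@', any description, and a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name

def fastq_read_ids(lines):
    """Collect read IDs from FASTQ lines (assumes 4 lines per record)."""
    ids = set()
    for i, line in enumerate(lines):
        header = line.strip()
        if i % 4 == 0 and header:   # every fourth line is a header
            ids.add(normalise_id(header))
    return ids

# Toy data; the lane 2 read ID is made up.
lane_1 = ["@HWUSI-EAS1692_0001:3:55:5381:15775#0/1", "ACGT", "+", "GGGG"]
lane_2 = ["@HWUSI-EAS1692_0001:4:1:1000:2000#0/1", "TTGA", "+", "GGGG"]
shared = fastq_read_ids(lane_1) & fastq_read_ids(lane_2)
print(sorted(shared))
```

With real files, the same function can be fed open file handles, e.g. fastq_read_ids(open("lane_1.fastq")). An empty intersection would suggest the duplication is introduced after the raw reads, somewhere in the alignment or merging steps.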


            • #7
              Hi David,

              Did you manage to clear up this error eventually? I'm sitting here with exactly the same thing, and in only one sample out of several. I was hoping there would be some easy fix through Picard or samtools, but I simply haven't found one.

              Cheers,
              K
