Hi David,
Did you eventually manage to clear this error? I'm stuck with exactly the same thing, and in only one sample out of several. I was hoping there would be an easy fix through Picard or samtools, but I haven't found one.
Cheers,
K
-
kasthuri wrote: I found some of the errors in GATK were gone if I "clean" the bam files using:
samtools view -F 0x04 -b in.bam > out.bam
After this, I sort, index and mark the duplicates using Picard before proceeding with GATK.
Added later: Did you merge the bam files for the same sample run on different lanes?
What confuses me is why the same read_id appears in both lane_1.bam and lane_2.bam, since this read_id is present in the original lane_1.fastq raw read file but not in lane_2.fastq.
Something must be wrong in my pipeline, but I've checked it a thousand times and everything seems fine (and moreover, it only occurs in one of the many samples I have processed in the same way).
-
I also ran into a problem with the realigner.
I ran bwa + realigner + indelgenotyper, and I got the message below during the indel genotyper step.
##### ERROR MESSAGE: Invalid command line: Argument window_size has a bad value: Read HWUSI-EAS1600R_0008:4:9:17021:5336#0: out of coverage window bounds. Probably window is too small, so increase the value of the window_size argument.
##### ERROR Read length=115; cigar=1M84D114M; start=128243463; end=128243661; window start (after trying to accomodate the read)=128243458; window end=128243657
HWUSI-EAS1600R_0008:4:9:17021:5336#0 99 chr7 128243463 70 1M84D114M
HWUSI-EAS1600R_0008:4:9:17021:5336#0 99 chr7 128243547 60 115M
Has anyone else run into this error?
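The numbers in the error message can be checked by hand. What follows is a minimal Python sketch, assuming standard SAM CIGAR semantics (M, D, N, = and X consume reference bases); the function name `reference_end` is illustrative, and this is not how GATK itself computes its window. It shows that a 115-base read with an 84-base deletion spans 199 reference positions, which puts its end past the reported window end.

```python
import re

# CIGAR operations that consume reference bases (per the SAM specification).
REF_CONSUMING = set("MDN=X")


def reference_end(start, cigar):
    """Return the 1-based inclusive end position of an alignment."""
    span = sum(
        int(length)
        for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in REF_CONSUMING
    )
    return start + span - 1


# Values copied from the error message above.
end = reference_end(128243463, "1M84D114M")
window_start, window_end = 128243458, 128243657

print(end)                            # 128243661
print(window_end - window_start + 1)  # 200
print(end > window_end)               # True: the read runs past the window
```

The read starts 5 bases into a 200-position window and covers 199 positions, so it overshoots by 4, which is consistent with the message asking for a larger window_size.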
-
I found some of the errors in GATK were gone if I "clean" the bam files using:
samtools view -F 0x04 -b in.bam > out.bam
After this, I sort, index and mark the duplicates using Picard before proceeding with GATK.
-Kasthuri
Added later: Did you merge the bam files for the same sample run on different lanes?
Last edited by kasthuri; 12-29-2011, 07:52 PM.
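For readers unsure what the `-F 0x04` option in the samtools command above does: it excludes records whose flag has the 0x4 (read unmapped) bit set. The same selection rule can be sketched in Python on SAM text lines. The function name `drop_unmapped` and the three example lines are made up for illustration; this is a sketch of the filter logic only, not a replacement for samtools.

```python
UNMAPPED = 0x4  # SAM flag bit: the read itself is unmapped


def drop_unmapped(sam_lines):
    """Yield header lines and mapped records, skipping unmapped records."""
    for line in sam_lines:
        if line.startswith("@"):
            # Header lines are passed through unchanged.
            yield line
            continue
        flag = int(line.split("\t")[1])
        if not flag & UNMAPPED:
            yield line


# Hypothetical input: one header line, one mapped read, one unmapped read.
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "read1\t99\tchr1\t100\t60\t10M\t=\t200\t110\tACGTACGTAC\tIIIIIIIIII",
    "read2\t77\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
kept = list(drop_unmapped(example))
print(len(kept))  # 2: the header and read1
```

Note that the test is on each record's own flag, so the mapped mate of an unmapped read is kept.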
-
Thank you very much for your answer, Jon.
I did not mention that the reads are paired-end. According to the BAM flags, the first and third entries should correspond to the 2nd and 1st ends of lane_2, whereas the second and fourth entries should correspond to the 1st and 2nd ends of lane_1, respectively.
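The flag values of the four records in the original post (179, 81, 115 and 161) can be decoded bit by bit. Here is a small Python sketch using the bit meanings from the SAM specification; the short labels and the helper name `decode_flag` are my own shorthand, not an official API.

```python
# SAM flag bits and short labels (labels are informal shorthand).
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM flag."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


# Flags of the four records, in the order they appear in the original post.
for flag in (179, 81, 115, 161):
    print(flag, decode_flag(flag))
```

Decoded this way, 179 and 161 carry the second-in-pair bit, while 81 and 115 carry the first-in-pair bit.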
I'm a newbie and may be wrong, but I don't think the read group values are the problem. Even though I assigned them in a somewhat dummy-like way, there are no problems in the remaining samples (which I processed the same way, and which raised no errors).
I am wondering whether the problem is that the sequencer has given the same ID to reads from two different lanes. Is that possible? I hope the above makes sense and that I am not missing the point of what you said.
Many thanks.
-
Can you assign more specific read groups (flowcell/lane) to avoid the collisions? It looks like your read group is really a sample ID. It also looks like some of your PG and RG tags are run together rather than separated (e.g. PG:Z:bfastRG:Z:012_t_l2).
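One way to get a lane-specific read group is to derive it from the read name. This is a rough Python sketch, assuming the older Illumina name layout `instrument_run:lane:tile:x:y#index` that the read names in this thread appear to follow. The helper `read_group_from_name` is hypothetical, the `012_t` sample name is a guess based on the RG values in the original post, and the run field only stands in for the flowcell, which these names do not carry.

```python
def read_group_from_name(read_name, sample):
    """Build an @RG header line from an older-style Illumina read name.

    Assumes the layout instrument_run:lane:tile:x:y#index (an assumption,
    not something guaranteed for every data set).
    """
    run, lane = read_name.split(":")[:2]
    rg_id = f"{run}.{lane}"  # unique per run and lane
    fields = [
        "@RG",
        f"ID:{rg_id}",
        f"SM:{sample}",      # the sample name stays in SM, not in ID
        "PL:ILLUMINA",
        f"PU:{rg_id}",
    ]
    return "\t".join(fields)


# Read name copied from the original post; the sample name is hypothetical.
line = read_group_from_name("HWUSI-EAS1692_0001:3:55:5381:15775#0", "012_t")
print(line)
```

Keeping the sample in SM and the run/lane in ID and PU means reads from different lanes of the same sample can be merged without sharing a read group.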
-
error during GATK indel realigner
Hello,
I've performed an exome alignment (paired-end reads) using bfast match + localalign + postprocess, and then removed duplicates with Picard. When running the local realignment, I get the following error during the GATK Indel Realigner step:
Code:
##### ERROR MESSAGE: Error caching SAM record HWUSI-EAS1692_0001:3:55:5381:15775#0, which is usually caused by malformed SAM/BAM files in which multiple identical copies of a read are present.
Code:
HWUSI-EAS1692_0001:3:55:5381:15775#0 179 chr1 148354187 0 95M = 148354236 49TAGCATCTTTCACAAAGCTCTCTGTGTTTGAGTACGCACCTTGATCCATAGGCTCACATTTGATCCCAACTGGCGGCTGCTTCTTGGCATTAACT DGFBGGGGGGGGGGGGGGGGGGGBGFFGGGFGDGEGGAGFEGDGGGGFEGEEGBGGGGGEGDBDEDEEDBA??EEA?################## XA:i:3 MD:Z:95 PG:Z:bfast RG:Z:012_t_l1 IH:i:1 NH:i:11 HI:i:1 NM:i:0 MQ:i:0 AS:i:4750
HWUSI-EAS1692_0001:3:55:5381:15775#0 81 chr1 148354236 0 95M = 148568822 214586 AGGCTCACATTTGATCCCAACTGGCGGCTGCTTCTTGGCATTAACTTTGGATTCCCAACCAGTAAATCTTACCAAGATCTGAGTTTCTCCAGGTA @AABAC<CA@>>>=4>=>=3DCDCFEDEGEECDF?DEFGFFEDCEDDDEEEDDGEGFGGGGGEGGGFGFGGDGGGGGGFGGFFGGGGGGGEGGGB XA:i:3 MD:Z:95 PG:Z:bfastRG:Z:012_t_l2 IH:i:1 NH:i:11 HI:i:1 NM:i:0 MQ:i:0 AS:i:4750
HWUSI-EAS1692_0001:3:55:5381:15775#0 115 chr1 148354236 0 95M = 148354187 -49AGGCTCACATTTGATCCCAACTGGCGGCTGCTTCTTGGCATTAACTTTGGATTCCCAACCAGTAAATCTTACCAAGATCTGAGTTTCTCCAGGTA @AABAC<CA@>>>=4>=>=3DCDCFEDEGEECDF?DEFGFFEDCEDDDEEEDDGEGFGGGGGEGGGFGFGGDGGGGGGFGGFFGGGGGGGEGGGB XA:i:3 MD:Z:95 PG:Z:bfast RG:Z:012_t_l1 IH:i:1 NH:i:11 HI:i:1 NM:i:0 MQ:i:0 AS:i:4750
HWUSI-EAS1692_0001:3:55:5381:15775#0 161 chr1 148568822 0 95M = 148354236 -214586 AGTTAATGCCAAGAAGCAGCCGCCAGTTGGGATCAAATGTGAGCCTATGGATCAAGGTGCGTACTCAAACACAGAGAGCTTTGTGAAAGATGCTA ##################?AEE??ABDEEDEDBDGEGGGGGBGEEGEFGGGGDGEFGAGGEGDGFGGGFFGBGGGGGGGGGGGGGGGGGGGBFGD XA:i:3 MD:Z:95 PG:Z:bfastRG:Z:012_t_l2 IH:i:1 NH:i:11 HI:i:1 NM:i:0 MQ:i:0 AS:i:4750
- I ran bfast postprocess with the '-a 3 -z' arguments, so isn't it supposed to keep only a single alignment for each read?
- Anyway, can I somehow tell GATK to ignore these conflicting reads? I've tried '--validation_strictness SILENT', but it still complains.
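To find out how many reads are affected before deciding how to handle them, one option is to count primary records per read name and pair end. This is a rough Python sketch over SAM text lines; `duplicated_reads` is a hypothetical helper, the four records are abbreviated to their first four fields from the output above, and this is not the check GATK performs internally.

```python
from collections import Counter


def duplicated_reads(sam_lines):
    """Return {(read_name, end): count} for primary records seen more than once."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & 0x900:
            continue  # ignore secondary (0x100) and supplementary (0x800) records
        if flag & 0x40:
            end = 1   # first in pair
        elif flag & 0x80:
            end = 2   # second in pair
        else:
            end = 0   # unpaired, or pair end not set
        counts[(name, end)] += 1
    return {key: n for key, n in counts.items() if n > 1}


# The four records from the post, abbreviated to name, flag, chromosome, position.
name = "HWUSI-EAS1692_0001:3:55:5381:15775#0"
records = [
    f"{name}\t179\tchr1\t148354187",
    f"{name}\t81\tchr1\t148354236",
    f"{name}\t115\tchr1\t148354236",
    f"{name}\t161\tchr1\t148568822",
]
print(duplicated_reads(records))
```

On these four records, both the first and the second end of the read are reported twice, which matches the "multiple identical copies of a read" wording in the error message.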
Well, I'm pretty stuck on this, so any help would be much appreciated. And Merry Christmas, by the way!
thanks,
david