  • iSNÖ
    replied
    Hi David,

    Did you eventually manage to resolve this error? I'm stuck with exactly the same thing, and in only one sample out of several. I would hope there is an easy fix through Picard or samtools, but I simply haven't found one.

    Cheers,
    K

  • david.tamborero
    replied
    kasthuri wrote:
    I found some of the errors in GATK were gone if I "clean" the bam files using:

    samtools view -F 0x04 -b in.bam > out.bam

    after this I sort, index and mark the duplicates using Picard before proceeding with GATK.
    In my case, filtering on the 0x04 flag does not help, since that flag marks unmapped reads, whereas my problem is that the same read is mapped to two positions.
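
To illustrate why the `-F 0x04` filter leaves such records in place, here is a minimal Python sketch of the test that `samtools view -F` applies: a record is dropped only if its FLAG shares a bit with the mask. The helper name `passes_exclude_filter` is hypothetical; the four FLAG values are those of the records shown in the first post of this thread.

```python
UNMAPPED = 0x4  # SAM FLAG bit: the read itself is unmapped


def passes_exclude_filter(flag, mask=UNMAPPED):
    """Mimic `samtools view -F mask`: keep a record only if no mask bit is set."""
    return (flag & mask) == 0


# FLAG values of the four conflicting records from the first post
flags = [179, 81, 115, 161]
kept = [f for f in flags if passes_exclude_filter(f)]
print(kept)  # [179, 81, 115, 161]
```

None of the four flags has the 0x4 bit set, so all four records survive the filter.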

    Added later: Did you merge the bam files for the same sample run on different lanes?
    I have merged two bam files, each corresponding to a different lane. Each of these bam files was obtained by bfast alignment and then tagged accordingly with picard_add_groups.

    What confuses me is why the same read_id appears in both lane_1.bam and lane_2.bam, since this read_id appears in the original raw read file lane_1.fastq but not in lane_2.fastq.
    Something must be wrong in my pipeline, but I've checked it a thousand times and everything seems fine (moreover, it only occurs in one of the many samples I have processed in the same way).

  • YunjieLiu
    replied
    I also ran into a problem with the realigner.
    I ran bwa+realigner+indelgenotyper and got the message below during the indel genotyper step.
    ##### ERROR MESSAGE: Invalid command line: Argument window_size has a bad value: Read HWUSI-EAS1600R_0008:4:9:17021:5336#0: out of coverage window bounds. Probably window is too small, so increase the value of the window_size argument.
    ##### ERROR Read length=115; cigar=1M84D114M; start=128243463; end=128243661; window start (after trying to accomodate the read)=128243458; window end=128243657
    So I looked up the read in the realigner's output bam file:
    HWUSI-EAS1600R_0008:4:9:17021:5336#0 99 chr7 128243463 70 1M84D114M
    When I check the same read in the bwa output, I find:
    HWUSI-EAS1600R_0008:4:9:17021:5336#0 99 chr7 128243547 60 115M
    It seems the realigner introduced a wrong deletion.
    Has anyone else met this error?
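
The coordinates in the error message can be checked against the two CIGAR strings. The following is a minimal Python sketch, assuming the standard SAM convention that M, D, N, = and X consume reference bases while M, I, S, = and X consume read bases; the function names are hypothetical.

```python
import re
from typing import Tuple

# CIGAR operations that consume reference bases / read bases (SAM convention)
REF_OPS = set("MDN=X")
QUERY_OPS = set("MIS=X")


def cigar_lengths(cigar: str) -> Tuple[int, int]:
    """Return (reference_span, read_length) for a CIGAR string."""
    ref_len = read_len = 0
    for count, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        if op in REF_OPS:
            ref_len += int(count)
        if op in QUERY_OPS:
            read_len += int(count)
    return ref_len, read_len


def end_position(start: int, cigar: str) -> int:
    """1-based inclusive end coordinate of the alignment on the reference."""
    return start + cigar_lengths(cigar)[0] - 1


print(cigar_lengths("1M84D114M"))            # (199, 115)
print(end_position(128243463, "1M84D114M"))  # 128243661
print(end_position(128243547, "115M"))       # 128243661
```

Both alignments end at 128243661, but the realigned record spans 199 reference bases instead of 115, and its end lies past the window end of 128243657 reported in the error.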

  • kasthuri
    replied
    I found some of the errors in GATK were gone if I "clean" the bam files using:

    samtools view -F 0x04 -b in.bam > out.bam

    after this I sort, index and mark the duplicates using Picard before proceeding with GATK.

    -Kasthuri

    Added later: Did you merge the bam files for the same sample run on different lanes?
    Last edited by kasthuri; 12-29-2011, 07:52 PM.

  • david.tamborero
    replied
    Thank you very much for your answer, Jon.

    I did not mention that the reads are paired-end. According to the bam flags, the first and third entries should correspond to the 2nd and 1st end of lane_2, whereas the second and fourth entries should correspond to the 1st and 2nd end of lane_1, respectively.
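
For reference, the FLAG values of the four records in the first post can be decoded bit by bit. This is a minimal Python sketch using the bit meanings from the SAM specification; the short label strings and the `decode_flag` helper are illustrative names, not part of any tool.

```python
# Bit meanings of the SAM FLAG field (labels are illustrative)
FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS if flag & bit]


# FLAG values of the four records, in the order they appear in the first post
for flag in (179, 81, 115, 161):
    print(flag, decode_flag(flag))
```

Under this decoding, 179 and 161 are second-in-pair records and 81 and 115 are first-in-pair records, so each end of the pair is present twice.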

    I'm a newbie and may be wrong, but the read group values should not be a problem. Even though I assigned them in a somewhat dummy-like way, there are no problems in the remaining samples (where I did the same and no errors were raised).

    I am wondering whether the problem is that the sequencer has given the same id to reads from two different lanes. Is that possible? I hope the above makes sense and that I am not missing the point of what you said.

    Many thanks.

  • Jon_Keats
    replied
    Can you assign more specific read groups (flowcell/lane) to avoid the collisions? It looks like your read group is really a sample ID. It also looks like some of your PG and RG tags are not separated (e.g. PG:Z:bfastRG:Z:012_t_l2).
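
One way to make read groups lane-specific is to derive the ID from the read name itself. The sketch below is a hypothetical Python helper, not part of Picard or GATK. It assumes the read names follow the layout `<instrument_run>:<lane>:<tile>:<x>:<y>#<index>`, uses the instrument/run prefix as a stand-in for the flowcell, and uses `012_t` as a guessed sample name.

```python
def read_group_from_name(read_name, sample):
    """Build an @RG header line from an Illumina-style read name (hypothetical helper)."""
    # Assumed layout: <instrument_run>:<lane>:<tile>:<x>:<y>#<index>
    run, lane = read_name.split(":")[:2]
    rg_id = f"{run}.{lane}"
    fields = ["@RG", f"ID:{rg_id}", f"SM:{sample}", "PL:ILLUMINA", f"PU:{rg_id}"]
    return "\t".join(fields)


rg_line = read_group_from_name("HWUSI-EAS1692_0001:3:55:5381:15775#0", "012_t")
print(rg_line)
```

Under the assumed layout, the read from the first post carries lane 3 in its name in all four records, whether they are tagged 012_t_l1 or 012_t_l2, so both copies would receive the same ID here.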

  • david.tamborero
    started a topic error during GATK indel realigner

    Hello,

    I've performed an exome alignment (paired-end reads) using bfast match + localalign + postprocess, then removed duplicates with Picard. When running the local realignment, I get the following error during the GATK Indel Realigner step:

    Code:
    ##### ERROR MESSAGE: Error caching SAM record HWUSI-EAS1692_0001:3:55:5381:15775#0, which is usually caused by malformed SAM/BAM files in which multiple identical copies of a read are present.
    This is how the bam file looks:

    Code:
    HWUSI-EAS1692_0001:3:55:5381:15775#0	179	chr1	148354187	0	95M	=	148354236	49TAGCATCTTTCACAAAGCTCTCTGTGTTTGAGTACGCACCTTGATCCATAGGCTCACATTTGATCCCAACTGGCGGCTGCTTCTTGGCATTAACT	DGFBGGGGGGGGGGGGGGGGGGGBGFFGGGFGDGEGGAGFEGDGGGGFEGEEGBGGGGGEGDBDEDEEDBA??EEA?##################	XA:i:3	MD:Z:95	PG:Z:bfast	RG:Z:012_t_l1	IH:i:1	NH:i:11	HI:i:1	NM:i:0	MQ:i:0	AS:i:4750
    HWUSI-EAS1692_0001:3:55:5381:15775#0	81	chr1	148354236	0	95M	=	148568822	214586	AGGCTCACATTTGATCCCAACTGGCGGCTGCTTCTTGGCATTAACTTTGGATTCCCAACCAGTAAATCTTACCAAGATCTGAGTTTCTCCAGGTA	@AABAC<CA@>>>=4>=>=3DCDCFEDEGEECDF?DEFGFFEDCEDDDEEEDDGEGFGGGGGEGGGFGFGGDGGGGGGFGGFFGGGGGGGEGGGB	XA:i:3	MD:Z:95	PG:Z:bfastRG:Z:012_t_l2	IH:i:1	NH:i:11	HI:i:1	NM:i:0	MQ:i:0	AS:i:4750
    HWUSI-EAS1692_0001:3:55:5381:15775#0	115	chr1	148354236	0	95M	=	148354187	-49AGGCTCACATTTGATCCCAACTGGCGGCTGCTTCTTGGCATTAACTTTGGATTCCCAACCAGTAAATCTTACCAAGATCTGAGTTTCTCCAGGTA	@AABAC<CA@>>>=4>=>=3DCDCFEDEGEECDF?DEFGFFEDCEDDDEEEDDGEGFGGGGGEGGGFGFGGDGGGGGGFGGFFGGGGGGGEGGGB	XA:i:3	MD:Z:95	PG:Z:bfast	RG:Z:012_t_l1	IH:i:1	NH:i:11	HI:i:1	NM:i:0	MQ:i:0	AS:i:4750
    HWUSI-EAS1692_0001:3:55:5381:15775#0	161	chr1	148568822	0	95M	=	148354236	-214586	AGTTAATGCCAAGAAGCAGCCGCCAGTTGGGATCAAATGTGAGCCTATGGATCAAGGTGCGTACTCAAACACAGAGAGCTTTGTGAAAGATGCTA	##################?AEE??ABDEEDEDBDGEGGGGGBGEEGEFGGGGDGEFGAGGEGDGFGGGFFGBGGGGGGGGGGGGGGGGGGGBFGD	XA:i:3	MD:Z:95	PG:Z:bfastRG:Z:012_t_l2	IH:i:1	NH:i:11	HI:i:1	NM:i:0	MQ:i:0	AS:i:4750
    So I guess GATK is right. My questions are:

    - I ran bfast postprocess with the '-a 3 -z' arguments, so isn't it supposed to keep only a single alignment for each read?

    - Anyway, can I somehow tell GATK to ignore these conflicting reads? I tried '--validation_strictness SILENT', but it still complains.
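
As a pre-check before running GATK, the conflicting read names can be listed directly from SAM text (for example the output of `samtools view`). This is a minimal Python sketch, not a GATK option: it counts how often each end of each read appears as a primary record and reports names where an end appears more than once. The function name and the shortened example records (`readA`, `readB`) are made up for illustration.

```python
from collections import Counter


def conflicting_reads(sam_lines):
    """Return read names whose first or second end occurs more than once as a primary record."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        if flag & 0x900:  # skip secondary (0x100) and supplementary (0x800) records
            continue
        mate = "first" if flag & 0x40 else "second" if flag & 0x80 else "single"
        counts[(name, mate)] += 1
    return sorted({name for (name, _), n in counts.items() if n > 1})


# Shortened records: readA mimics the four entries above, readB is a normal pair
records = [
    "readA\t179\tchr1\t148354187\t0\t95M",
    "readA\t81\tchr1\t148354236\t0\t95M",
    "readA\t115\tchr1\t148354236\t0\t95M",
    "readA\t161\tchr1\t148568822\t0\t95M",
    "readB\t99\tchr1\t100\t60\t95M",
    "readB\t147\tchr1\t200\t60\t95M",
]
print(conflicting_reads(records))  # ['readA']
```

The resulting list of names could then be used to exclude or inspect those records before realignment.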

    Well, I'm pretty stuck on this, so any help will be much appreciated. And Merry Christmas, by the way!

    thanks,
    david
