is there a package available to directly analyze BAM files in R?
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
BWA/SAMTOOLS SAM-> BAM - invalid CIGAR operation.
I am encountering exactly the same problem as the post from around #72 in this thread,
Originally posted by gcrdb View Post... APIs need to start from BAM files (-bam) , not SAM files(no "-sam" at all). I only have SAM files which from bwa, all I need is to convert SAM to BAM.
got error:
[sam_header_read2] 22 sequences loaded.
[sam_read1] reference '-143963499' is recognized as '*'.
Parse error at line 1: invalid CIGAR operation
Aborted
thanks,
Code:>SRR003966.14 1 1 chr14 -37395245 0 >SRR003966.49 1 1 chr12 -25942795 0 >SRR003966.85 5 14 chr14 +102382726 1 chr3 -24756032 2 chr2 -47795311 2 chr5 -176639213 2 chr3 -149933203 2
Here is the subsequent command line I used to try to convert it to a BAM file, and its accompanying error
Code:-sh-3.1$ ~/bin/samtools import hg18.fa.gz.fai SRR003966-05.sam SRR003966-05.bam [sam_header_read2] 49 sequences loaded. [sam_read1] reference '-37395245' is recognized as '*'. Parse error at line 1: invalid CIGAR operation Aborted
I apologize in advance if this is a newbie question, but I am generally following Heng Li's process as outlined in Short-read Alignment with MAQ and BWA
Any feedback or pointer would be most welcome.
-- bg
Comment
-
Am I doing something wrong ??
I used MAQGene to align sequence fragments of a mutant against the reference sequence. I want to use SAMTools to see the alignment of all the fragments against the ref. But I have a problem with the sam file ...
Code:[root@localhost MaqToSam]# /home/.../SAMtools_svn/samtools/misc/maq2sam-long Version: r439 Usage: maq2sam <in.map> [<readGroup>] [root@localhost MaqToSam]# /home/.../SAMtools_svn/samtools/misc/maq2sam-long fr6_34.map > fr6_34.sam [root@localhost MaqToSam]# ../samtools faidx c_elegans.WS200.dna.fa [root@localhost MaqToSam]# ../samtools view -bS -T c_elegans.WS200.dna.fa -o fr6_34.bam fr6_34.sam [COLOR="Red"][main_samview] fail to open file for reading.[/COLOR] [root@localhost MaqToSam]# ../samtools view fr6_34.sam [COLOR="Red"][bam_header_read] EOF marker is absent. [main_samview] fail to read the header.[/COLOR] [root@localhost MaqToSam]# ../samtools import c_elegans.WS200.dna.fa fr6_34.sam fr6_34.bam [COLOR="Red"][main_samview] fail to open file for reading.[/COLOR]
Last edited by Fred13; 12-02-2009, 06:33 AM.
Comment
-
Parse error at line 1: invalid CIGAR operation
RE: Posts 72 & 74 .
Originally posted by gcrdb View Posthi, I have trouble conveting sam to bam.. I tried both:
samtools import ref .fai in.sam out.bam
got error:
[sam_header_read2] 22 sequences loaded.
[sam_read1] reference '-143963499' is recognized as '*'.
Parse error at line 1: invalid CIGAR operation
Aborted
samtools view -bt ref .fai -o in.sam out.bam
and got similar error:
[sam_header_read2] 22 sequences loaded.
[sam_read1] reference '' is recognized as '*'.
[main_samview] truncated file.
thanks,
This happens when you try to use the
Code:bwa samse
Code:samtools import
I ran into this myself, and was only able to figure it out when I ran bwa in a debugger and traced the behavior.
--bg
Comment
-
eference '163285701' is recognized as '*', sequence and quality are inconsistent
>Originally posted by lh3 View Post@luisczul
>Samtools indicates that the error happens to line 164507. What does >that line look like?As for the second problem, it seems like a bug. You >are using very short reference. Could you send me an example? >Thanks.
>@henry
>Are you generating results with "samse -n"? With -n, the output is NOT >sam. You can tell this from the bwa manual page.
>These three questions are less relevant to samtools. They are mostly related to bwa.
I am having similar problem with converting sam to bam:
[root@master maq_bwa_run]# samtools view -bt hg19.fa.fai -o aln.bam aln.sam
[samopen] SAM header is present: 93 sequences.
[sam_read1] reference '163285701' is recognized as '*'.
Parse warning at line 29354: mapped sequence without CIGAR
Parse error at line 29354: sequence and quality are inconsistent
Aborted
What could happen?
the line 29354 in aln.sam looks like this
0 chr5 163285701 0 0M * 0 0 * XT:A:R NM:i:0 X0:i:100307042 XM:i:0 XO:i:0 XG:i:0 MD:Z:0
~
the lines 29352, 29353, 29354 and 29355 look like this:
ran_melanocytes:853_1092_1366 4 * 0 0 * * 0 0 TTCGCGGGGGATAGACTGTACAATCTAGGCCGGTGGTGTATGGCCCTGT AA>>AA@=>A>A;><>>>;>?<:<:7?<='6<(,))<6/&38&&(531)
ran_melanocytes:853_1093_11 4 * 0 0 * * 0 0 NANTNGNAGGCGNNANNNTNGNCNGACNNNNNNGNGGANANTNTNNNGA -"1-"=-"/-"):8,:-"-"5-"-"-"<-"1-"1-";,,-"-"-"-"-"
0 chr5 163285701 0 0M * 0 0 * XT:A:R NM:i:0 X0:i:100307042 XM:i:0 XO:i:0 XG:i:0 MD:Z:0
ran_melanocytes:853_1093_36 4 * 0 0 * * 0 0 TATAAATTTCGAGGAAATTAGACATATCTCCGTCGTAGGGATCCCCTGG =3:=;=<<<=@/=?7:7;8?><?15A57:?=;;6:=95?:8,,,,,/
~
Thanks
Alexey
Comment
-
I am working with the Blast aligner and therefore I have to convert my blast alignment files to sam format. I think I have found a bug in the third party perl script "blast2sam.pl" which is available with samtools on Sourceforge.
The problem comes from the "aln2cm" subroutine used to build the CIGAR for a query. Apparently, the script switches 'D' with 'I'. For instance, if the CIGAR must be "103M1I2D10M", the script outputs the following "103M1D2I10M".
If you look at the blast2sam.pl script, line 88:
"push(@$cigar, $cmaux->[1] . substr("MDI", $cmaux->[0], 1));"
and replace this line with:
"push(@$cigar, $cmaux->[1] . substr("MID", $cmaux->[0], 1));" (to swith 'I' and 'D' in the "MDI" string)
the problem seems to be fixed.
Note that this problem is only visible when you are working with gapped queries, which leads to a "CIGAR length and query length are inconsistent" error message, and that you have to use the -s option with blast2sam.pl so that the query string can be included in the resulting sam file.
I think a user named corthay made a post some time ago about a similar problem. If other people have faced the same problem, please let me know.
I hope this helps
Best regards
David
Comment
-
Originally posted by torrey View PostI was curious if anyone has found a workaround to this error? It is coming up for me when viewing novoalign alignments. I am using samtools 0.1.6 (r453), and get the following error when loading a samtools converted bam file from novoalign's sam file
samtools: bam_lpileup.c:116: tview_func: Assertion `l == tv->n_pre' failed.
thanks
Comment
-
question about scores of INDEL in pileup output file
I have a question about output file of pileup command.
According to samtools FAQ pagehttp://sourceforge.net/apps/mediawik...pileup_output., column 5 is phred-scaled consensus score and column 6 is phred-scaled SNP score.
For SNP line, it seems to be true that it is phred-scaled.
But for INDEL line, values in column 5,6 seem too large to be phred-scaled. In my pileup output file, these values (column 5, 6 of INDEL line) are larger than 1000 very often. (sometimes larger than 4000.)
Please let me know how these values (column 5,6 of INDEL line) are calculated.
thanks in advance
Comment
-
coordinates off in samtools tview
First, I just want to thank the author of samtools - it is a great package for the community. I especially appreciate the streaming abilities.
I have a odd problem when I use samtools tview to visualize my alignments.
I have found that the coordinates are completely off ! For example if I use this command:
samtools tview my.bowtie.sorted.bam hg18.fa
and I go to chr22, I find that the position of the visualized read alignment is more than 100kb off than the reported read position in the sam output ! However, when I look at chrM, I find the coordinates to work just fine. All chromosomes except chrM have this problem. Is this an fasta indexing issue? I have also separated out the individual chromosome reference sequence and indexed that by itself, but I still get the same problem! If indexing were a problem, then that should have solved it, no?
I know I am using the same reference fa that I used to build the search index in bowtie. I converted my sam to bam, then sorted, and indexed. I don't know what is going on here. Has anyone experienced this? Is there something I am overlooking?
Thank you for reading this and thank you for any help/suggestions.Last edited by NGSfan; 01-11-2010, 01:53 AM.
Comment
-
Another issue I found is after doing a sort:
samtools sort aln.bam aln.sorted
samtools view -H aln.sorted.bam
The header SO: field remains unsorted
@HD VN:1.0 SO:unsorted
I have to use picard's sorter to get a header with SO:sorted.
programs such as GATK complain when this header is not changed.
Comment
-
Hi,
I am trying to find a script able to convert from a output file from the solexa pipeline called 'realign' ( page 9 ) into SAM format. I will keep my search, but in the meantime I thought someone may know about it in this thread.
Or maybe there is a different (better) way of doing this before I write a parser.
The format I goes like this :
TACCATATTTATATTACATTAAATTCCTATATTTAC 14859 1 hs_ref_chr14:21265482 F TACCATATTTATAGTACATTAAATTCCTAGATATAC 14859
where each column means :
sequence
best score
numbers of hits at that score
target: pos
strand
target sequence
next best score
Thank you !Last edited by polivares; 01-14-2010, 08:33 AM.
Comment
-
Originally posted by NGSfan View PostAnother issue I found is after doing a sort:
samtools sort aln.bam aln.sorted
samtools view -H aln.sorted.bam
The header SO: field remains unsorted
@HD VN:1.0 SO:unsorted
I have to use picard's sorter to get a header with SO:sorted.
programs such as GATK complain when this header is not changed.
Ok, there is a solution to this problem:
"BAM files created by samtools which are sorted often don't have the sort order set to 'coordinate' in the BAM header (instead it's marked as 'unsorted'. Because BAMs are binary files, there's no way to easily change the tag. This walker rewrites a BAM file so that the output is identical to the input except that the sort order tag is set to 'coordinate'. "
Comment
Latest Articles
Collapse
-
by seqadmin
Technological advances have led to drastic improvements in the field of precision medicine, enabling more personalized approaches to treatment. This article explores four leading groups that are overcoming many of the challenges of genomic profiling and precision medicine through their innovative platforms and technologies.
Somatic Genomics
“We have such a tremendous amount of genetic diversity that exists within each of us, and not just between us as individuals,”...-
Channel: Articles
05-24-2024, 01:16 PM -
-
by seqadmin
The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...-
Channel: Articles
05-06-2024, 07:48 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, Yesterday, 06:55 AM
|
0 responses
12 views
0 likes
|
Last Post
by seqadmin
Yesterday, 06:55 AM
|
||
Started by seqadmin, 05-30-2024, 03:16 PM
|
0 responses
24 views
0 likes
|
Last Post
by seqadmin
05-30-2024, 03:16 PM
|
||
Comprehensive Sequencing of Great Ape Sex Chromosomes Yields Insights into Evolution and Genetic Variability
by seqadmin
Started by seqadmin, 05-29-2024, 01:32 PM
|
0 responses
29 views
0 likes
|
Last Post
by seqadmin
05-29-2024, 01:32 PM
|
||
Started by seqadmin, 05-24-2024, 07:15 AM
|
0 responses
215 views
0 likes
|
Last Post
by seqadmin
05-24-2024, 07:15 AM
|
Comment