Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • maubp
    replied
    Originally posted by jkbonfield View Post
    This topic has come up recently on the samtools-dev mailing list too, related to using SAM/BAM for denovo assemblies (and so it's very relevant to the 454 output case).
    Yes, we should probably continue this discussion there. Start of thread is here:


    Peter

    Leave a comment:


  • jkbonfield
    replied
    That also means it will disallow use of P for padding within an insertion. Eg

    Code:
    Ref: ATGC---GAG
    Seq: ATGC---GAG     CIGAR 7M
    Seq: ATGCACTGAG     CIGAR 4M 3I 3M
    Seq: ATGCA-TGAG     CIGAR 4M 1I 1P 1I 3M
    Here the correct multiple alignment is achieved using P. So far though use of this has been limited.

    Even without P though there are alignments which are best mixing I and D. Eg:

    Code:
    Ref: ACGTTT-AAA-CCCACGT
    Seq: ACGTTTTAAACCCCACGT   CIGAR 6M 1I 3M 1I 6M
    Seq: ACGTTTT---CCCCACGT   CIGAR 6M 1I 3D 1I 6M
    This topic has come up recently on the samtools-dev mailing list too, related to using SAM/BAM for denovo assemblies (and so it's very relevant to the 454 output case).

    Leave a comment:


  • maubp
    replied
    Originally posted by jkbonfield View Post
    So the Picard warning is incorrect - there aren't two I records next to each other, rather it's I and D being adjacent.
    No, it I think it is correct (*) but not clearly worded.

    Looking at 1H 140M 2I 1M 2I 1D 1I 1D 2M 1I 23M 1D 21M 1I 8M, the bit in bold is between two M operators, and includes two I operators (and two D operators). Thus both "No M or N operator between pair of I operators in CIGAR" applies, and also the other message I've seen from Picard in this context, "No M or N operator between pair of D operators in CIGAR".

    (*) As said earlier, this rule is one of Picard's inventions not in the SAM spec.

    Leave a comment:


  • jkbonfield
    replied
    Originally posted by maubp View Post
    Breaking up the CIGAR string, 1H 140M 2I 1M 2I 1D 1I 1D 2M 1I 23M 1D 21M 1I 8M, it seems Picard unhappy with the 2I 1D 1I 1D. Keeping in mind SAM/BAM is essentially a collection of pairwise alignments, I would have expected something like 3I 2D or perhaps 1I 2M (or 1I 2X with the new style).

    (But as you say, this is still legal in the current wording of the specification - its another example of Picard being extra strict)
    So the Picard warning is incorrect - there aren't two I records next to each other, rather it's I and D being adjacent.

    Also while most aligners that output to SAM/BAM are indeed pairwise, the actual file itself represents multiple alignments at each point so it is natural to assume we may one day be outputting SAM from a genuine multiple alignment tool, or in this case from a tidied up assembly. In such cases we expect such oddities in CIGAR. Indeed they often represent a better alignment.

    Programs complaining about such things (or worse processing them incorrectly and completely losing or changing bases) has been a bug-bear of mine for some time. (Ultimately it's why I ditched samtools and wrote my own code for importing SAM/BAM into Gap5.)

    Leave a comment:


  • litali
    replied
    Thank you all for your help!
    jkbonfield, I got this error "No M or N operator between pair of I operators in CIGAR" in picard for all the reads..

    Leave a comment:


  • maubp
    replied
    Originally posted by jkbonfield View Post
    I'm more intrigued by the error from picard about "Read name F01BJ5E01DP1XH, No M or N operator between pair of I operators in CIGAR". Could you quote this reading? It seems a strange alignment and I cannot see why it would occur that way. Then again there's nothing that explicitly states it as invalid although it defies point 2 of the recommended practice in the SAM spec.

    Certainly the example you gave had a portion with neighbouring I and D operators: 15M1D1I118M. I know this shows up bugs in some programs (samtools, maybe more), although personally I think it's fine and sometimes the right alignment just is that strange looking. Perhaps not with pairwise alignments, but certain these things crop up from multiple alignment.
    I've seen this kind of thing from other Newbler BAM files - it stems from the way they currently use deletion/inserts rather than substitutions.

    As a specific example, here is one "failing" read from my data (in Picard's lenient mode it is just a warning):

    SAM validation error: ERROR: Read name GD0J0NL04IXFMD, No M or N operator between pair of I operators in CIGAR

    Code:
    GD0J0NL04IXFMD  0       contig00018     1       100     1H140M2I1M2I1D1I1D2M1I23M1D21M1I8M      *       0       0       ACGGACTTTCCCAGCAGTCAGCATGGATCAGTCGCAGGCACTCAAGGAGACCGACGAACACCGTCAAATGCGACGAATTGCTTTCGTTGCGGTCGTTGTTTCAACGGTCGCTGTGATTGCATCGGTTGTCACCCTGCCGANGTTTTTCCTACAATTATGTTCAATCCTTCCATCGCATTTGATGGTCGAGACTAGATTACTG      FFFFFFFFFFFFFFFIIIIIIIIIIIIIIIIIIIIIIIIHHHHIIHHHF??==FF88?CCCCCCCFFFFFFFFFFFFFFFFFFFFFF?:::DDBBAAAAADD888FFFFFFFFFFDDDFFFFFFFFFFFFFFFFFFFFFFFFF@@ACDDDD???A6668DDDDBBAACDDCFCAAABBBAAAA6654<=99=:!3,----35
    Breaking up the CIGAR string, 1H 140M 2I 1M 2I 1D 1I 1D 2M 1I 23M 1D 21M 1I 8M, it seems Picard unhappy with the 2I 1D 1I 1D. Keeping in mind SAM/BAM is essentially a collection of pairwise alignments, I would have expected something like 3I 2D or perhaps 1I 2M (or 1I 2X with the new style).

    (But as you say, this is still legal in the current wording of the specification - its another example of Picard being extra strict)
    Last edited by maubp; 09-22-2011, 01:08 AM. Reason: (adding note about spec)

    Leave a comment:


  • jkbonfield
    replied
    Originally posted by litali View Post
    I also attach a few more reads from the first lines of the sam format:
    Code:
    F01BJ5E01DBVMG  0       chrX    138807718       100  ...
    I see the flag in coloumn2 is always 0 or 16, though I have the positions of the mapping,,
    So, do you think it is a problem with the BAM file?
    Thank you alot!!!
    The flags are fine. I suspect one issue is simply the starting coordinate. If your first reads align at base 138807718 and it's been sorted then maybe the programs you are using are simply starting up by showing a blank region of the reference.

    I've found in some cases that it can actually be quite hard to find places with alignments by scrolling around if all you have is some pulldown library. You either need a way to skip ahead to places with alignments or know (by reports or reading the SAM file) the appropriate locations.

    I'm more intrigued by the error from picard about "Read name F01BJ5E01DP1XH, No M or N operator between pair of I operators in CIGAR". Could you quote this reading? It seems a strange alignment and I cannot see why it would occur that way. Then again there's nothing that explicitly states it as invalid although it defies point 2 of the recommended practice in the SAM spec.

    Certainly the example you gave had a portion with neighbouring I and D operators: 15M1D1I118M. I know this shows up bugs in some programs (samtools, maybe more), although personally I think it's fine and sometimes the right alignment just is that strange looking. Perhaps not with pairwise alignments, but certain these things crop up from multiple alignment.

    Leave a comment:


  • maubp
    replied
    Originally posted by litali View Post
    Thank you alot!
    I now can see the akignments in IGV. However, I think I only see the coverage. how can I see the SNPs or the reads themselves? I now see where the reads are mapped, but it is just grey lines with black dots. If I zoom more I only see the bases in the reference but not in the reads themselves
    OK, I guess it was the out of date index from the old version of samtools that was the problem.

    By default IGV shows the reads in grey where they match the reference. I think you can right click to change the colour scheme.

    Alternatively, you might find another viewer more intuitive - I like Tablet http://bioinf.hutton.ac.uk/tablet/ although currently IGV handles inserts in SAM/BAM better.

    Originally posted by litali View Post
    I found hoe to see the bases, thanx!! another question: what does it mean the "cigar" line with a number following it?
    and also is it possible to use noe this sorted bam file to upload it to ucsc and see it in the viewer there?
    The CIGAR string describes how the read maps onto the reference. The viewer will use this information to construct the display. The details (and the complicated FLAG field I was looking at earlier) are in the SAM/BAM specification:

    Leave a comment:


  • litali
    replied
    I found hoe to see the bases, thanx!! another question: what does it mean the "cigar" line with a number following it?
    and also is it possible to use noe this sorted bam file to upload it to ucsc and see it in the viewer there?

    Leave a comment:


  • litali
    replied
    Bam

    Thank you alot!
    I now can see the akignments in IGV. However, I think I only see the coverage. how can I see the SNPs or the reads themselves? I now see where the reads are mapped, but it is just grey lines with black dots. If I zoom more I only see the bases in the reference but not in the reads themselves

    Leave a comment:


  • maubp
    replied
    Apologies, I miss read the FLAG. The mapping (or not) is set in the FLAG as 0x4, but I had the meaning backwards. FLAGs of 0 and 16 are mapped (forward and reverse strand).

    So you should see something on chrX if you zoom in - try the area where those reads are mapped, coordinate 138807718.

    Leave a comment:


  • litali
    replied
    Bam

    ok, i installed the newer version of samtools, so idxstats gives the following:
    chrX 154913754 47196 0
    * 0 0 0

    I also attach a few more reads from the first lines of the sam format:
    Code:
    F01BJ5E01DBVMG  0       chrX    138807718       100     1M1D34M1I2M1I10M1I6M1I1M1I1M1D1M1I81M1I10M1I1M1I26M1I17M        *       0       0       CGCACTCCAGCCTGGGCAACAGAGCAAAACCCTGTAGTACAAAAAAAAAGAAAAAAAGACGGCAGCAGCCAGGGATATGAATTAGGAGTGGGGTGGGTAGAGAGTGAGTGGGGCTGCTGGAGACAATGTTCCCATGGCACTGAACCCTGGTTAAACCAGTCTTTGAGCAAGTACTATCACTTGTCTGTAATTCCTTCTTCC       =922229;=9::<<<====;:8000308686633223000388658003..55:::82020602.........2'''''''0'866600009966000069996600000034.....---3.,.3...783...334...444473......97.......474.......44.......+444....434433----38
    F01BJ5E01C26ZG  0       chrX    138807757       100     3M10D96M1D1I118M1I2M   AAAAAGCAGCAGCAGCCAGGGATATGAATTAGGAGTGGGGTGGGTAGAGAGTGAGTGGGGCTGCTGGAGACAATGTTCCCATGGCACTGACCCTGGTTAGCAGTCTTTGAGCAAGTACTATCACTTGCTGTAATTCCTTCTTCCTCATCCTTTGCTCCTTTTGAATATGATGATTTCTAGGAATGAACCTTCTTTATGACACATGCTGTATATTATTTTGG    D=======ABDDAAA@ACCCCCCC??FA?CCCDEEEDDCCCCCCCCCCCBAAAACCCCCCCCCCC555@CCCCCCCCCCBCCCCAAABCCCCCCCCCCCCCCCCAAACCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC@@@??CCCCCBBBBCB;:444?BBBB:::B;;;;CCB;;;@>
    F01BJ5E01B5WV8  0       chrX    138807757       100     3M10D3M1I77M1D15M1D1I118M1I2M   *       0       0       AAAAAGACAGCAGCAGCCAGGGATATGAATTAGGAGTGGGGTGGGTAGAGAGTGAGTGGGGCTGCTGGAGACAATGTTCCCATGCACTGACCCTGGTTAGCAGTCTTTGAGCAAGTACTATCACTTGCTGTAATTCCTTCTTCCTCATCCTTTGCTCCTTTTGAATATGATGATTTCTAGGAATGAACCTTCTTTATGACACATGCTGTATATTATTTTGG   =922229;<9::::85888888651111.3)86<<=====<99::???<<<>>??=>>==<<<<<<+++<><??????@????<888<>><<<=:;::;=7722277777====<<<??????<<<><888:;;====66566:666:::==>??>>6677====<<<<9;:<<633468.......444234363-00::49;;;;;;A>>>>>?>>>;:
    F01BJ5E01CD9VD  16      chrX    138807757       100     5M8D52M1I49M1I88M1I2M1D26M      *       0       0       AAAAAAAGCAGCAGCAGCCAGGGATATGAATTAGGAGTGGGGTGGGTAGAGAGTGAGGTGGGGCTGCTGGAGACAATGTTCCCATGGCACTGACCCTGGTTAACAGTTCTTTGAGCAAGTACTATCACTTGCTGTAATTCCTTCTTCCTCATCCTTTGCTCCTTTTGAATATGATGATTTCTAGGAATGAACCTTCCTTATGACACATGCTGTATATTATTTGGT       888<>ACCCCAA/95==59999AA>>99<9//--7////<<<770009A>>>>>AA>>>999999AAAA9::::AAAAAB@90033@@?@@@?ACCCCCA;;;;@AAA?@9;;99;;999>>??A;;043>>AA?1;;;;<<@?BBBCCCA@@<<<@A?@@@@@@@A??99<<<<AAACCCCCCCACCCC?@77=<CCCCCCCCAAAACCC@@<<;AAC<<<<<C
    F01BJ5E01DMLRK  0       chrX    138807792       100     3M1I3M1I119M1I2M1D2M1I1D19M1D1M1I6M1I25M        *       0       0       GAACTTACGGAGTGGGGTGGGTAGAGAGTGAGTGGGGCTGCTGGAGACAATGTTCCCATGGCACTGACCCTGGTTAACAGTCTTTGAGCAAGTACTATCACTTGCTGTAATTCCTTCTTCCTCATCCGTTGCCCCTTTTGAATATGATGATTCCTAGGAAATGAACCTTCTTTATGACACATGCTG     ;2222299<777:<:=<:::82003220220022658556.0***080000620202....222822225:<00280000299;======???<<<<====;86000006974...83433:666664444799.............79............2......34444.............
    F01BJ5E01DZ0XI  0       chrX    138807831       100     114M1I40M       *      TGGAGACAATGTTCCCATGGCACTGACCCTGGTTAACAGTCTTTGAGCAAGTACTATCACTTGCTGTAATTCCTTCTTCCTCATCCTTTGCTCCTTTTGAATATGATGATTTCTTAGGAATGAACCTTCTTTATGACACATGCTGTATATTATTT     <8222288:8::8997<<9999?=??????A?<==<?@?:::::::<??=>><>>>22229589802299===>>>=?????@?>>>>=;76666990000000====;;98899;.........9799;;99;;11..06777779944.....
    I see the flag in coloumn2 is always 0 or 16, though I have the positions of the mapping,,
    So, do you think it is a problem with the BAM file?
    Thank you alot!!!

    Leave a comment:


  • maubp
    replied
    Originally posted by litali View Post
    when i try samtools idxstats, I receive command not found. indeed in the help menu of samtools which I installed I have only:
    Program: samtools (Tools for alignments in the SAM format)
    Version: 0.1.6 (r453)
    ...
    That is very out of date, the current release is samtools 0.1.18.

    P.S. Once you have updated samtools, rebuild the index. The newer versions include more information in the index which is used by samtools idxstats and some viewers too.

    P.P.S. You need to leave the spaces out of the [ code ] and the closing tag [ /code ] - I'm using them here otherwise you wouldn't see them, just their formatting effect.

    UPDATE - see end

    However, there does seem to be a problem with this 1st read - despite columns 3 to 5 saying it maps to chrX at position 138807104 with quality 100, the FLAG in column 2 is 16 (0x010 in hex) and that says reverse complemented but not mapped.

    Code:
    F01BJ5E01C2F11 16 chrX 138807104 100 16M1I6M1D3M1I54M1I1D104 ATCTATTATGTATCAAATAAAAATAAATAAAATATTGTTCACTATTTTTTCTAGTGATAACTCAAACTAGAATCCAAAACTGCTAACAATATACTAAGGTCAACTTCAGTGAAAAGTGAATTGGGCCGAGCGCAGAGGCTCACACCTGTAATCCCAGCACTTTGGGAAGCCGAGGCGGGCGGACA ***622....26644...(4......47======>??>6222==9:::;::1111;====5566:===8;111167=:777;;;==;77788=@?????????AABAA????AA??????A@????????><9979<<997779979==<46:1106886:<562222285888::8:9822222;
    The second read is similar, here the flag is just 0 meaning not mapped (and not reverse complemented):
    Code:
    F01BJ5E01DMYWU 0 chrX 138807511 100 12M1I2M1I159M1D50M AAAAAAGAAAAAAGTAGAATTGTTTACTTTCTGTGCACCTTTTCTACTGGGACAGTCTTTCTGAGAAGTGCTTTAAGTGCATGTTAGAAGCCAGGGACATTTAGCCAGGCGTGGTGGCACGTGACTGTAGTCCCAGCTACTCAGGAGGCTGAGGTGGGAGGATCTCTTGAACGGGAAGTCAAGGCTGCAGTGAGCTATGATCATGCCACTGCACTCCAGCCTGGG C@@<<<@@CCCC@@@A@>>>@A43//////6*222--AA-@@@==<000<>---4/64<<<>>////3-<<<<<<<<<<AAA<3304;????:9:ABB444156@CCCCAAACAAAAAAAAAAA@@?AAACCCCCCCCCCCCCCCCCCAAACCA?@@CCCCCCCCCCCCCCCCCCCCCCCC?:77CCB>9999>>92222259AAAAAACCCCCCCCCCCCCCC@
    This looks like a problem with the BAM output from Newbler gsMapper v2.6, but having only looked at two reads it is premature to say that for sure.

    UPDATE - Apologies, I miss read the FLAG. The mapping (or not) is set in the FLAG as bit 0x4, but I had the meaning backwards. FLAGs of 0 and 16 are mapped (forward and reverse strand). So you should see something on chrX if you zoom in.
    Last edited by maubp; 09-21-2011, 11:10 PM. Reason: Added P.S. and examining 2 reads; correction

    Leave a comment:


  • litali
    replied
    samtools, igv and more

    when i try samtools idxstats, I receive command not found. indeed in the help menu of samtools which I installed I have only:
    Program: samtools (Tools for alignments in the SAM format)
    Version: 0.1.6 (r453)

    Usage: samtools <command> [options]

    Command: view SAM<->BAM conversion
    sort sort alignment file
    pileup generate pileup output
    faidx index/extract FASTA
    tview text alignment viewer
    index index alignment
    fixmate fix mate information
    glfview print GLFv3 file
    flagstat simple stats
    calmd recalculate MD/NM tags and '=' bases
    merge merge sorted alignments (Picard recommended)
    rmdup remove PCR duplicates (Picard recommended)
    maybe it is because this is an old version?
    However, when I look at the first lines of the file as you suggested, I receive:
    [ code ]
    F01BJ5E01C2F11 16 chrX 138807104 100 16M1I6M1D3M1I54M1I1D104ATCTATTATGTATCAAATAAAAATAAATAAAATATTGTTCACTATTTTTTCTAGTGATAACTCAAACTAGAATCCAAAACTGCTAACAATATACTAAAGGTCAACTTCAGTGAAAAGTGAATTGGGCCGAGCGCAGAGGCTCACACCTGTAATCCCAGCACTTTGGGAAGCCGAGGCGGGCGGACA ***622....26644...(4......47======>??>6222==9:::;::1111;====5566:===8;111167=:777;;;==;77788=@?????????AABAA????AA??????A@????????><9979<<997779979==<46:1106886:<562222285888::8:9822222;
    F01BJ5E01DMYWU 0 chrX 138807511 100 12M1I2M1I159M1D50M A AAAAAGAAAAAAGTAGAATTGTTTACTTTCTGTGCACCTTTTCTACTGGGACAGTCTTTCTGAGAAGTGCTTTAAGTGCATGTTAGAAGCCAGGGACATTTAGCCAGGCGTGGTGGCACGTGACTGTAGTCCCAGCTACTCAGGAGGCTGAGGTGGGAGGATCTCTTGAACGGGAAGTCAAGGCTGCAGTGAGCTATGATCATGCCACTGCACTCCAGCCTGGG C@@<<<@@CCCC@@@A@>>>@A43//////6*222--AA-@@@==<000<>---4/64<<<>>////3-<<<<<<<<<<AAA<3304;????:9:ABB444156@CCCCAAACAAAAAAAAAAA@@?AAACCCCCCCCCCCCCCCCCCAAACCA?@@CCCCCCCCCCCCCCCCCCCCCCCC?:77CCB>9999>>92222259AAAAAACCCCCCCCCCCCCCC@
    [ /code]
    etc...
    In the IGV if I upload the bam file only it says @zoom in to see the alignments@ but it is empty...

    Leave a comment:


  • maubp
    replied
    Originally posted by litali View Post
    it says: " loading SAM/BAM index files is not supported ....sorted.bam.bai..Load the SAM/ BAM files directly"
    When I click "ok" it says "xoom in to see the alignments" but even if I soom in, it is empty
    i.e. If you open the BAM file with IGV (or another viewer like Tablet), it should automatically find and use the index file (named *.bam.bai or *.bai).

    It does sound like there is a problem with the BAM file from Newbler - which is why I was suggesting looking at the file with samtools idxstats etc.

    Leave a comment:

Latest Articles

Collapse

  • seqadmin
    Advanced Tools Transforming the Field of Cytogenomics
    by seqadmin


    At the intersection of cytogenetics and genomics lies the exciting field of cytogenomics. It focuses on studying chromosomes at a molecular scale, involving techniques that analyze either the whole genome or particular DNA sequences to examine variations in structure and behavior at the chromosomal or subchromosomal level. By integrating cytogenetic techniques with genomic analysis, researchers can effectively investigate chromosomal abnormalities related to diseases, particularly...
    09-26-2023, 06:26 AM
  • seqadmin
    How RNA-Seq is Transforming Cancer Studies
    by seqadmin



    Cancer research has been transformed through numerous molecular techniques, with RNA sequencing (RNA-seq) playing a crucial role in understanding the complexity of the disease. Maša Ivin, Ph.D., Scientific Writer at Lexogen, and Yvonne Goepel Ph.D., Product Manager at Lexogen, remarked that “The high-throughput nature of RNA-seq allows for rapid profiling and deep exploration of the transcriptome.” They emphasized its indispensable role in cancer research, aiding in biomarker...
    09-07-2023, 11:15 PM
  • seqadmin
    Methods for Investigating the Transcriptome
    by seqadmin




    Ribonucleic acid (RNA) represents a range of diverse molecules that play a crucial role in many cellular processes. From serving as a protein template to regulating genes, the complex processes involving RNA make it a focal point of study for many scientists. This article will spotlight various methods scientists have developed to investigate different RNA subtypes and the broader transcriptome.

    Whole Transcriptome RNA-seq
    Whole transcriptome sequencing...
    08-31-2023, 11:07 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, Yesterday, 06:57 AM
0 responses
10 views
0 likes
Last Post seqadmin  
Started by seqadmin, 09-26-2023, 07:53 AM
0 responses
10 views
0 likes
Last Post seqadmin  
Started by seqadmin, 09-25-2023, 07:42 AM
0 responses
15 views
0 likes
Last Post seqadmin  
Started by seqadmin, 09-22-2023, 09:05 AM
0 responses
45 views
0 likes
Last Post seqadmin  
Working...
X