
  • nilshomer
    replied
    Originally posted by tgolubch View Post
    Thanks, Nils, that clears up the first question. I'm still not sure why the other numbers are different.

    Re BFAST - looks interesting. Can BFAST produce SAM output files?
    BFAST can output SAM, as well as UCSC MAF, GFF and other formats. Note that it also includes unmapped reads in the SAM file, and for SOLiD reads it retains the color-space information, which is critically important for variant discovery.

    Originally posted by ech View Post
    Does DNAA have any documentation?
    "Release early, Release often" ~ Eric S. Raymond

    So this is the "early" step. Unfortunately, there is very little documentation beyond the code itself, and I hope to get some up soon. PM me if you want examples or want to know how to use the package to solve your problems.



  • ech
    replied
    DNAA and SAM

    Does DNAA have any documentation?



  • tgolubch
    replied
    Thanks, Nils, that clears up the first question. I'm still not sure why the other numbers are different.

    Re BFAST - looks interesting. Can BFAST produce SAM output files?



  • nilshomer
    replied
    Originally posted by tgolubch View Post
    Does anyone have a sense of why the output of 'samtools flagstat' doesn't match the read numbers in the original fastq file? The total it reports is actually the number of reads mapped according to maq.

    The other numbers (mapped, paired) also seem off. This happens both when converting from maq with maq2sam and with SSAHA2's SAM output. I've compared the output of 'maq map' with samtools flagstat, and the numbers are different.
    I agree with your assessment. It depends on the aligner: most aligners do not include unmapped reads in their SAM output, so the totals reported by flagstat cover only the reads that were written. This does not happen with my own software, BFAST. For a more accurate assessment, see the "dbamstats" utility in the DNAA package.

    Nils
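    The effect Nils describes can be illustrated with a minimal Python sketch (hypothetical helper names, not samtools' own code) that tallies a few flagstat-style categories from the SAM FLAG field and compares the total with the FASTQ read count. It assumes single-end reads, so one primary SAM record per FASTQ record, and uses the FLAG bits 0x1 (paired), 0x2 (proper pair), 0x4 (unmapped) and 0x100 (secondary) from the SAM specification; it approximates flagstat's categories rather than reproducing its exact rules. When an aligner leaves unmapped reads out of its SAM file, "total" ends up equal to "mapped" and smaller than the FASTQ count.

```python
import io
from typing import Dict, Iterable, TextIO

# FLAG bits defined by the SAM specification
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
SECONDARY = 0x100


def count_sam_flags(lines: Iterable[str]) -> Dict[str, int]:
    """Tally a few flagstat-style categories from SAM text lines."""
    counts = {"total": 0, "mapped": 0, "paired": 0, "proper_pair": 0}
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        flag = int(line.split("\t")[1])
        if flag & SECONDARY:
            continue  # count each read once (primary records only)
        counts["total"] += 1
        if not flag & UNMAPPED:
            counts["mapped"] += 1
        if flag & PAIRED:
            counts["paired"] += 1
        if flag & PROPER_PAIR:
            counts["proper_pair"] += 1
    return counts


def count_fastq_reads(handle: TextIO) -> int:
    """Count FASTQ records, assuming plain four-line records."""
    return sum(1 for line in handle if line.strip()) // 4


if __name__ == "__main__":
    # Toy data: three reads were sequenced, but the aligner only wrote
    # records for the two that mapped (no record at all for the third).
    fastq = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nACGT\n+\nIIII\n")
    sam = io.StringIO(
        "@SQ\tSN:chr1\tLN:1000\n"
        "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
        "r2\t16\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
    )
    print(count_sam_flags(sam))      # total is 2, not 3
    print(count_fastq_reads(fastq))  # 3
```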



  • tgolubch
    replied
    Does anyone have a sense of why the output of 'samtools flagstat' doesn't match the read numbers in the original fastq file? The total it reports is actually the number of reads mapped according to maq.

    The other numbers (mapped, paired) also seem off. This happens both when converting from maq with maq2sam and with SSAHA2's SAM output. I've compared the output of 'maq map' with samtools flagstat, and the numbers are different.



  • henry
    replied
    Originally posted by lh3 View Post
    For NGS data analysis, an aligner tends to be successful when it comes with utilities for comprehensive downstream analyses such as reference-based assembly, SNP/indel calling and an alignment viewer. Eland/GAPipeline, Soap and Maq are such examples. Unfortunately, it is non-trivial to implement all these downstream analyses, and reimplementing them for each aligner would waste time and human resources. Ideally, we want to separate alignment from the analyses that follow it. To achieve this, we need a generic alignment format that makes all aligners happy. NovoAlign and Bowtie can output Maq alignment format to take advantage of Maq's downstream data processing. However, the Maq format does not really suit this goal: it supports neither longer reads nor alignments with more than one indel, and it is too specific to Maq. To solve this problem, the 1000 Genomes Project committee decided to develop a generic alignment format, and the first version of the specification and implementation has now come out.

    The new alignment format, SAM (Sequence Alignment/Map), is the collaborative result of several major genome centres. It eliminates the major defects of the Maq format while retaining its advantages. We also migrated and improved various downstream data processing tools implemented in Maq/Maqview, such as indexing, pileup, the viewer and the consensus caller. For more information, please check the website:



    I hope samtools will help aligner developers promote their own software: once a program can generate alignments in SAM format, Maq-like downstream analysis becomes immediately available.
    With SAMtools, can I extract the number of uniquely aligned reads, the number of aligned reads with one mismatch, and the number with two mismatches?
    With SAMtools, can I extract the number of aligned reads on each chromosome?

    Thanks a lot.

    Jing
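    Both of Jing's questions can be approached by reading the SAM text directly. This minimal Python sketch (the function names are hypothetical) counts primary, mapped records per reference name (RNAME) and per value of the NM:i tag. NM is the edit distance to the reference, so it equals the mismatch count only for alignments without indels. Counting X0:i:1 as "unique" is an assumption that applies to BWA's aln/samse output, where X0 is the number of best hits; other aligners mark uniqueness differently, if at all. In later samtools releases, `samtools idxstats` on a sorted, indexed BAM file also reports mapped reads per reference.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

UNMAPPED = 0x4
SECONDARY = 0x100


def parse_tags(fields: List[str]) -> Dict[str, str]:
    """Map optional SAM fields (TAG:TYPE:VALUE) to their raw string values."""
    tags = {}
    for field in fields:
        parts = field.split(":", 2)
        if len(parts) == 3:
            tags[parts[0]] = parts[2]
    return tags


def summarize_alignments(lines: Iterable[str]) -> Tuple[Counter, Counter, int]:
    """Return (reads per reference, reads per NM value, reads with X0:i:1)."""
    per_ref: Counter = Counter()
    per_nm: Counter = Counter()
    unique = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (UNMAPPED | SECONDARY):
            continue  # primary, mapped records only
        per_ref[fields[2]] += 1
        tags = parse_tags(fields[11:])
        if "NM" in tags:
            per_nm[int(tags["NM"])] += 1
        if tags.get("X0") == "1":  # BWA-specific: exactly one best hit
            unique += 1
    return per_ref, per_nm, unique
```

    With the result, `per_nm[1]` and `per_nm[2]` give the number of mapped reads at edit distance 1 and 2, and `per_ref` gives the per-chromosome counts.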



  • henry
    replied
    Originally posted by lh3 View Post
    @luisczul

    Samtools indicates that the error happens at line 164507. What does that line look like? As for the second problem, it seems like a bug. You are using a very short reference. Could you send me an example? Thanks.

    @henry
    Are you generating results with "samse -n"? With -n, the output is NOT SAM, as noted in the bwa manual page.

    These three questions are less relevant to samtools. They are mostly related to bwa.
    Hi lh3,
    I ran ./bwa samse -n 255 ref.fa tissue.sai tissue.fastq > tissue.sam
    I will try bwa without setting -n.
    Thank you for your help.

    Best

    Jing



  • lh3
    replied
    @luisczul

    Samtools indicates that the error happens at line 164507. What does that line look like? As for the second problem, it seems like a bug. You are using a very short reference. Could you send me an example? Thanks.

    @henry
    Are you generating results with "samse -n"? With -n, the output is NOT SAM, as noted in the bwa manual page.

    These three questions are less relevant to samtools. They are mostly related to bwa.
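    To answer lh3's question about what the failing line looks like, a small checker can report suspicious alignment lines. This is a hypothetical Python helper, not part of samtools. It assumes the 11 mandatory tab-separated SAM columns and checks the two conditions behind errors seen in this thread: a CIGAR that is neither `*` nor a run of count/operation pairs ("invalid CIGAR character"), and SEQ and QUAL strings of different lengths ("sequence and quality are inconsistent"). Its line numbers count every line of the file, which may differ from the numbering samtools uses in its messages.

```python
import re
from typing import Iterable, Iterator, Tuple

# CIGAR pattern from the SAM specification: '*' or one or more <count><op> pairs
CIGAR_RE = re.compile(r"\*|([0-9]+[MIDNSHP=X])+")


def find_bad_sam_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str, str]]:
    """Yield (line_number, reason, line) for alignment lines that look malformed."""
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("@"):
            continue  # header and blank lines are not checked here
        fields = line.split("\t")
        if len(fields) < 11:
            yield number, f"only {len(fields)} tab-separated fields", line
            continue
        cigar, seq, qual = fields[5], fields[9], fields[10]
        if not CIGAR_RE.fullmatch(cigar):
            yield number, f"invalid CIGAR {cigar!r}", line
        if seq != "*" and qual != "*" and len(seq) != len(qual):
            yield number, f"SEQ length {len(seq)} != QUAL length {len(qual)}", line
```

    For example, `next(find_bad_sam_lines(open("tissue.sam")), None)` returns the first problem found, or `None` if every alignment line passes both checks.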



  • henry
    replied
    Originally posted by luisczul View Post
    I am having a big problem and would appreciate anybody's help.
    [...]
    Parse error at line 164507: invalid CIGAR character
    [...]
    What could be happening? Thanks in advance; I really need to get this fixed.
    I also ran into almost the same problem and am still stuck on it. I got the SAM file from bwa samse and created the ref_list.fai file using samtools faidx on ref.fa. However, the following errors popped up:

    [sam_header_read2] 24 sequences loaded.
    Parse error at line 1: invalid CIGAR character

    Could anyone help me fix this problem? Thanks a lot.

    The content of my ref_list.fai is as follows:
    gi|224384768|gb|CM000663.1| 249250621 87 70 71
    gi|224384767|gb|CM000664.1| 243199373 252811519 70 71
    gi|224384766|gb|CM000665.1| 198022430 499485256 70 71
    gi|224384765|gb|CM000666.1| 191154276 700336665 70 71
    gi|224384764|gb|CM000667.1| 180915260 894221804 70 71
    gi|224384763|gb|CM000668.1| 171115067 1077721655 70 71
    gi|224384762|gb|CM000669.1| 159138663 1251281310 70 71
    gi|224384761|gb|CM000670.1| 146364022 1412693470 70 71
    gi|224384760|gb|CM000671.1| 141213431 1561148494 70 71
    gi|224384759|gb|CM000672.1| 135534747 1704379348 70 71
    gi|224384758|gb|CM000673.1| 135006516 1841850394 70 71
    gi|224384757|gb|CM000674.1| 133851895 1978785663 70 71
    gi|224384756|gb|CM000675.1| 115169878 2114549816 70 71
    gi|224384755|gb|CM000676.1| 107349540 2231365066 70 71
    gi|224384754|gb|CM000677.1| 102531392 2340248259 70 71
    gi|224384753|gb|CM000678.1| 90354753 2444244474 70 71
    gi|224384752|gb|CM000679.1| 81195210 2535890098 70 71
    gi|224384751|gb|CM000680.1| 78077248 2618245328 70 71
    gi|224384750|gb|CM000681.1| 59128983 2697438054 70 71
    gi|224384749|gb|CM000682.1| 63025520 2757411825 70 71
    gi|224384748|gb|CM000683.1| 48129895 2821337798 70 71
    gi|224384747|gb|CM000684.1| 51304566 2870155351 70 71
    gi|224384746|gb|CM000685.1| 155270560 2922192927 70 71
    gi|224384745|gb|CM000686.1| 59373566 3079681725 70 71



  • luisczul
    replied
    I also got the following error while producing the SAM file:

    GAGACTNAGCACNCAACGGA -",883&,,:#/1-"6&2)-":#6=67*#9,9&/0&3)-"3+=4<-"&5
    /mnt/scratch/awadalla/czuldieg/392Tmod/sources/I392T:35_18_1596 4 * 0 0 * * 0 0 NCCGGCATAGCTNTACCNTTAGACATAGCTCCGTCNCAGACNAACCCCA -"6;:<9>7=5<$-"4?>5-"67:&5=;8&6/0=2.7=-"37*3=-"1:
    [bns_coor_pac2real] bug! Coordinate is longer than sequence (4294967295>=4929). Abort!
    Aborted
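    The coordinate in the error above, 4294967295, is exactly 2^32 - 1: the value a 32-bit unsigned integer holds when -1 is stored in it. That is consistent with, though it does not prove, a coordinate that dropped below zero somewhere inside bwa. A quick Python check of the arithmetic:

```python
import struct

# Store -1 as a signed 32-bit integer, then read the same four bytes back
# as an unsigned 32-bit integer.
(as_uint32,) = struct.unpack("<I", struct.pack("<i", -1))
print(as_uint32)               # 4294967295
print(as_uint32 == 2**32 - 1)  # True
```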



  • luisczul
    replied
    I am having a big problem and would appreciate anybody's help.

    Every time I try to go from the SAM file to the BAM file, I get an error similar to this:

    [bwa_aln_core] convert to sequence coordinate... 0.09 sec
    [bwa_aln_core] refine gapped alignments... 0.04 sec
    [bwa_aln_core] print alignments... [sam_header_read2] 7719 sequences loaded.
    [sam_read1] reference '2921' is recognized as '*'.
    Parse error at line 164507: invalid CIGAR character
    Aborted

    This is what my reference looks like:

    >gi|148727254|ref|NM_015183.1
    GTGCTGAAGTAGAGGTAGTACAGCATGGCTAGACTGTTGTGAGAGGCTCAGAGAAAGCAGAGGGTGAGATGGATGAGTCC
    AGCATTCTAAGACGAAGAGGGCTCCAGAAGGAGCTGAGTCTCCCCAGAAGAGGAAGTTTGATAGATTCCCAGAAGTGGAA
    TTGCTTGGTCAAACGTTGCCGAACAAGCAACCGGAAAAGCTTAATAGGCAATGGGCAGTCACCAGCATTGCCTCGACCAC
    .
    .
    .
    >gi|34303925|ref|NM_152268.2
    GCGCGCTGGCCCGGCACGGCGGTGGTCTTGCGGGAGGCGTGGGCTGGGATTGCGGTGCCTGTGCTTCCCGGTGCCAGGGT
    GTCATGGAAGGGCTGCTG...

    and it keeps going like that...

    My REF_LIST looks like this:

    gi|148727254|ref|NM_015183.1 7586 30 80 81
    gi|34303925|ref|NM_152268.2 2364 7740 80 81
    gi|187829199|ref|NM_031284.4 2565 10164 80 81
    gi|57634540|ref|NM_004156.2 1807 12791 80 81


    I created it using samtools faidx.

    What could be happening? Thanks in advance; I really need to get this fixed.



  • nilshomer
    replied
    Originally posted by samt View Post
    Hi,

    I created a SAM file with BWA samse and am trying to import it into samtools so I can view and sort the alignments. I get the following core dump:

    [sam_header_read2] 1 sequences loaded.
    Parse error at line 262110: sequence and quality are inconsistent
    /local/scratch/1250714001.6494533: line 8: 3753 Aborted (core dumped) ../samtools-0.1.5c_x86_64-linux/./samtools import ref_list.txt 15251.align.sam 15251

    Also, there is no clear specification for this: what should ref_list.txt contain?
    For the ref_list.txt, you can use the .fai file that is created when you index your reference FASTA file (see: samtools faidx).

    As for the parse error in BWA's SAM output, the cause could be an incorrect ref_list.txt or incorrect output from BWA (ask lh3, who is its author).
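    For reference, each line of a .fai index has five tab-separated columns: sequence name, sequence length, byte offset of the first base, bases per line, and bytes per line including the newline. According to the samtools manual, the reference list needs only the name and length in its first two columns and ignores the rest, which is why a .fai file works as ref_list.txt. The minimal Python sketch below shows how those columns can be derived; it is not samtools' code, the function names are hypothetical, and it assumes Unix newlines and uniform line lengths within each record.

```python
from typing import List, Optional, Tuple

FaiEntry = Tuple[str, int, int, int, int]


def fai_entries(fasta: bytes) -> List[FaiEntry]:
    """Return (name, length, offset, line_bases, line_bytes) for each record.

    Assumes Unix newlines and that every sequence line except the last one
    of a record has the same length, as faidx expects.
    """
    entries: List[FaiEntry] = []
    current: Optional[list] = None  # [name, length, offset, line_bases, line_bytes]
    pos = 0  # byte offset of the start of the current line
    for raw in fasta.splitlines(keepends=True):
        if raw.startswith(b">"):
            if current is not None:
                entries.append(tuple(current))
            # the name is the first whitespace-delimited word of the header
            name = raw[1:].split()[0].decode()
            current = [name, 0, pos + len(raw), 0, 0]
        elif current is not None:
            bases = len(raw.rstrip(b"\n"))
            if current[3] == 0:  # first sequence line sets the line widths
                current[3], current[4] = bases, len(raw)
            current[1] += bases
        pos += len(raw)
    if current is not None:
        entries.append(tuple(current))
    return entries


def format_fai(entries: List[FaiEntry]) -> str:
    """Render entries as tab-separated .fai text."""
    return "".join("\t".join(map(str, e)) + "\n" for e in entries)
```

    With the same two header lines, sequence lengths of 7586 and 2364 bases and 80-column wrapping, this sketch reproduces the offsets 30 and 7740 shown in luisczul's REF_LIST.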



  • samt
    replied
    SAM import core dump

    Hi,

    I created a SAM file with BWA samse and am trying to import it into samtools so I can view and sort the alignments. I get the following core dump:

    [sam_header_read2] 1 sequences loaded.
    Parse error at line 262110: sequence and quality are inconsistent
    /local/scratch/1250714001.6494533: line 8: 3753 Aborted (core dumped) ../samtools-0.1.5c_x86_64-linux/./samtools import ref_list.txt 15251.align.sam 15251

    Also, there is no clear specification for this: what should ref_list.txt contain?



  • ech
    replied
    blat psl files and sam

    Is there a converter from BLAT PSL files to SAM format?
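    As an illustration of what such a conversion involves, here is a minimal Python sketch (the function `psl_to_sam_line` is hypothetical) that turns one nucleotide BLAT PSL record into a SAM line. It assumes the standard 21-column PSL layout with a single-character strand, and that the caller skips any PSL header lines. The CIGAR is built from blockSizes, qStarts and tStarts: query gaps become I, target gaps become D, and unaligned query ends become soft clips (S). FLAG is 16 for '-' strand hits, MAPQ is 255 (unavailable), and SEQ and QUAL are `*` because PSL does not store them. Long target gaps such as introns might be better written as N; that and optional tags are left out.

```python
from typing import List


def psl_to_sam_line(psl_line: str) -> str:
    """Convert one nucleotide BLAT PSL record into a minimal SAM line."""
    f = psl_line.rstrip("\n").split("\t")
    strand, q_name, q_size, t_name = f[8], f[9], int(f[10]), f[13]
    # list columns end with a trailing comma, so drop the empty strings
    sizes = [int(x) for x in f[18].split(",") if x]
    q_starts = [int(x) for x in f[19].split(",") if x]
    t_starts = [int(x) for x in f[20].split(",") if x]

    ops: List[str] = []
    lead = q_starts[0]  # unaligned query bases before the first block
    if lead:
        ops.append(f"{lead}S")
    for i, size in enumerate(sizes):
        if i > 0:
            q_gap = q_starts[i] - (q_starts[i - 1] + sizes[i - 1])
            t_gap = t_starts[i] - (t_starts[i - 1] + sizes[i - 1])
            if q_gap > 0:
                ops.append(f"{q_gap}I")  # query bases absent from the target
            if t_gap > 0:
                ops.append(f"{t_gap}D")  # target bases absent from the query
        ops.append(f"{size}M")
    tail = q_size - (q_starts[-1] + sizes[-1])  # unaligned bases after the last block
    if tail:
        ops.append(f"{tail}S")

    flag = 16 if strand == "-" else 0
    pos = t_starts[0] + 1  # PSL is 0-based, SAM POS is 1-based
    return "\t".join([q_name, str(flag), t_name, str(pos), "255",
                      "".join(ops), "*", "0", "0", "*", "*"])
```

    For '-' strand hits, PSL query coordinates already refer to the reverse-complemented query, which is the orientation SAM uses for records with FLAG 16, so the same block arithmetic applies to both strands.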



  • lh3
    replied
    @ech

    You may try Picard. Its implementations of merge and rmdup are better.

