  • scami
    Member
    • Sep 2010
    • 55

    Bowtie + samtools problem

    Hi guys

    I generated an alignment with bowtie, asking for SAM output. Everything went smoothly and I got the file paired_mapped.sam. I then wanted to use the samtools package, so after installing it the first thing I did was run the samtools import command. After running for a couple of hours it gave the following error message:

    <code>
    sam_header_line_parse] expected '@XY', got [@ILLUMINA-C3C24B_0047:1:1:1052:1111#0 77 * 0 0 * *0 0 TTTATCTAATAAATGCATCCNTTCCAGAAGTCGGGGTTTGTTGCACGTATTAGCTCTAGAATTACTACGGTTANC bbbbbbbbbbbcbbb`````Ebbbb`````bbbbbbabbbbbbbbbbbbbbbbbbba`bbcbbb`bb`b`BBBBB XM:i:0]
    Hint: The header tags must be tab-separated.
    [sam_header_read2] 33 sequences loaded.
    <\code>



    The first lines of my SAM file are:

    [code]
    @HD VN:1.0 SO:unsorted
    @SQ SN:chr1 LN:23037639
    @SQ SN:chr1_random LN:568933
    @SQ SN:chr2 LN:18779844
    @SQ SN:chr3 LN:19341862
    @SQ SN:chr3_random LN:1220746
    @SQ SN:chr4 LN:23867706
    @SQ SN:chr4_random LN:76237
    @SQ SN:chr5 LN:25021643
    @SQ SN:chr5_random LN:421237
    @SQ SN:chr6 LN:21508407
    @SQ SN:chr7 LN:21026613
    @SQ SN:chr7_random LN:1447032
    @SQ SN:chr8 LN:22385789
    @SQ SN:chr9 LN:23006712
    @SQ SN:chr9_random LN:487831
    @SQ SN:chr10 LN:18140952
    @SQ SN:chr10_random LN:789605
    @SQ SN:chr11 LN:19818926
    @SQ SN:chr11_random LN:282498
    @SQ SN:chr12 LN:22702307
    @SQ SN:chr12_random LN:1566225
    @SQ SN:chr13 LN:24396255
    @SQ SN:chr13_random LN:3268264
    @SQ SN:chr14 LN:30274277
    @SQ SN:chr15 LN:20304914
    @SQ SN:chr16 LN:22053297
    @SQ SN:chr16_random LN:740079
    @SQ SN:chr17 LN:17126926
    @SQ SN:chr17_random LN:829735
    @SQ SN:chr18 LN:29360087
    @SQ SN:chr18_random LN:5170003
    @SQ SN:chr19 LN:24021853
    @SQ SN:chrUn LN:43154196
    @PG ID:Bowtie VN:0.12.5 CL:"dir"
    @ILLUMINA-C3C24B_0047:1:1:1052:1111#0 77 * 0 0 * * 0 0TTTATCTAATAAATGCATCCNTTCCAGAAGTCGGGGTTTGTTGCACGTATTAGCTCTAGAATTACTACGGTTANC bbbbbbbbbbbcbbb`````Ebbbb`````bbbbbbabbbbbbbbbbbbbbbbbbba`bbcbbb`bb`b`BBBBB XM:i:0
    [\code]



    I do not understand what is going wrong; the fields are separated by tabs.
    Any ideas, please?

    thanks a lot for your help

    S
  • maubp
    Peter (Biopython etc)
    • Jul 2009
    • 1544

    #2
    I'd guess there is a missing new line in this bit:

    @ILLUMINA-C3C24B_0047:1:1:1052:1111#0

    Either that or your readnames start with @ which is not allowed in SAM format?
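
    To check which of the two cases applies, one option is to scan the file for lines that start with @ but do not look like header records. The following is only a minimal sketch: the function name find_at_read_names is made up for illustration, and it assumes a header line is an @ followed by a two-letter uppercase record type and a tab, which is why a read name such as "@AB" followed by a tab would slip through.

```python
import re

# Assumption: a SAM header line is '@' + two uppercase letters + a tab
# (for example @HD, @SQ, @PG, @CO).
HEADER_RE = re.compile(r"^@[A-Z]{2}\t")


def find_at_read_names(lines):
    """Return 1-based numbers of lines that start with '@' but are not headers."""
    bad = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("@") and not HEADER_RE.match(line):
            bad.append(number)
    return bad


# Shortened stand-in for the file in the first post (fields joined by tabs).
sample = [
    "@HD\tVN:1.0\tSO:unsorted",
    "@SQ\tSN:chr1\tLN:23037639",
    "@ILLUMINA-C3C24B_0047:1:1:1052:1111#0\t77\t*\t0\t0\t*\t*\t0\t0\tTTTATC\tbbbbbb\tXM:i:0",
]
print(find_at_read_names(sample))
```

    If every alignment line is reported, the read names themselves carry the @, rather than a single line being broken.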


    • scami
      Member
      • Sep 2010
      • 55

      #3
      Hi there

      Thanks for your reply. I took a closer look at the SAM format, and the number of newlines should be correct. You are right that the @ symbol cannot be used at the start of the read lines. At the moment I am removing them, but I don't understand why bowtie does not do this on its own when it writes SAM output. I mean, it takes FASTQ files as input, which of course have the @ symbol, and then it does not remove that symbol when it outputs the SAM file... sounds a bit weird.


      • maubp
        Peter (Biopython etc)
        • Jul 2009
        • 1544

        #4
        What version of bowtie are you using? It sounds like a bug if all the read names in the SAM output have a @ at the start.


        • scami
          Member
          • Sep 2010
          • 55

          #5
          I am using version 0.12.5. Maybe I did something wrong with the command (I am just beginning to play around with this software, and with this topic actually). The command I used is the following:
          ./bowtie-0.12.5/bowtie -v 2 -k 5 --best --fr -p 8 -I 100 -S --solexa1.3-quals ./indexes/riferimento_pinot --12 exp_47_s_1.fastq_bowtie_pe,exp_47_s_2.fastq_bowtie_pe,exp_47_s_3.fastq_bowtie_pe paired_end.map

          is there anything wrong?

          thanks a lot for your help


          • maubp
            Peter (Biopython etc)
            • Jul 2009
            • 1544

            #6
            Well, bowtie 0.12.5 is several months out of date (the current release is 0.12.7), but the release notes don't mention any SAM output bug fixes.


            Could you post the first few reads of your FASTQ files?

            P.S. Use the [ code ] and [ /code ] tags to display it nicely on the forum (but without the spaces I have put round the square brackets). If you use the advanced editor then this can be accessed via the # toolbar button. Your original post tried <code> ... <\code> and [ code ] ... [ \code ] which are both wrong - use the other slash.


            • scami
              Member
              • Sep 2010
              • 55

              #7
              Hi,

              I used a script to convert my FASTQ files so they could be used with bowtie. I had paired mates in one file, so I used the --12 option in bowtie. According to the manual, the input file should be in a tab-separated text format:

              Code:
              <r>   Comma-separated list of files containing a mix of unpaired and paired-end reads in Tab-delimited format. Tab-delimited format is a 1-read-per-line format where unpaired reads consist of a read name, sequence and quality string each separated by tabs. A paired-end read consists of a read name, sequence of the #1 mate, quality values of the #1 mate, sequence of the #2 mate, and quality values of the #2 mate separated by tabs. Quality values can be expressed using any of the scales supported in FASTQ files. Reads may be a mix of different lengths and paired-end and unpaired reads may be intermingled in the same file. If - is specified, bowtie will read the Tab-delimited reads from the "standard in" filehandle.

              Therefore I used a script to convert the FASTQ file, which generated the following output:

              Code:
              @ILLUMINA-C3C24B_0047:1:1:1052:12086#0/1	TTCCGCGTCCTGACCTCCCCNGTTCAAGTAAGGCAACAACTACATATCCATCCTCTGCGTTAATCCATGTtaant	bbbbbbbbbbbbbbbbbbbaDaaaa```a`bb`bbbbbbbbbbbbbbbbb`bbbbbb_bbbb]bb_cbaaBBBBB	aaatttnggggtnagcaagtaacatacctaaagttgaaacataggnnancnancgagccacannnnannngnnnn	_____]E]]]NNEOO[[ZYV_____________\_________BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
              All fields are tab-separated. Given your observations, I think I should modify the script so that it does not write the '@' symbol to the bowtie input file. What do you think?

              thanks a lot


              • maubp
                Peter (Biopython etc)
                • Jul 2009
                • 1544

                #8
                Originally posted by scami
                Therefore I used a script to convert the FASTQ file, which generated the following output:

                ...

                All fields are tab-separated. Given your observations, I think I should modify the script so that it does not write the '@' symbol to the bowtie input file. What do you think?

                Yes, remove the @ from your tabular output - you are telling Bowtie the read names all start with an @ character, so it is putting this in the SAM output (which is invalid).

                Ideally I think you should also report this as a bug in Bowtie - arguably it should check the readnames don't start with @ when writing SAM format.

                In FASTQ files, the @ is just a record marker - not part of the read name.
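
                As a rough illustration of that fix, here is a minimal sketch of a converter that drops the @ record marker when building the tab-delimited line for --12. The function names and the short reads are made up for the example, and it assumes plain 4-line FASTQ records with the two mates already paired up; the actual conversion script was not posted, so this is not a reconstruction of it.

```python
def read_name(header_line):
    """Return the read name from a FASTQ header line, without the '@' marker."""
    name = header_line.rstrip("\r\n")
    if name.startswith("@"):
        name = name[1:]  # '@' marks the record start; it is not part of the name
    return name


def pair_to_tab(mate1, mate2):
    """Build one tab-delimited paired-end line from two 4-line FASTQ records.

    Each record is a list of four lines: header, sequence, '+' line, qualities.
    Field order follows the manual text quoted earlier in the thread:
    name, mate 1 sequence, mate 1 qualities, mate 2 sequence, mate 2 qualities.
    """
    fields = [
        read_name(mate1[0]),
        mate1[1].rstrip("\r\n"),
        mate1[3].rstrip("\r\n"),
        mate2[1].rstrip("\r\n"),
        mate2[3].rstrip("\r\n"),
    ]
    return "\t".join(fields)


# Made-up records, much shorter than real reads.
first = ["@read1/1\n", "ACGT\n", "+\n", "bbbb\n"]
second = ["@read1/2\n", "TTGA\n", "+\n", "aaaa\n"]
line = pair_to_tab(first, second)
print(line)
```

                The name is taken from the first mate only, since the tab-delimited format has a single name field per pair.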

