  • Bowtie + samtools problem

    Hi guys

    I generated an alignment with bowtie, asking for SAM output. Everything went smoothly and I got the file paired_mapped.sam. I then installed the samtools package, and the first thing I did was run the samtools import command. After it had run for a couple of hours I received the following error message:

    <code>
    sam_header_line_parse] expected '@XY', got [@ILLUMINA-C3C24B_0047:1:1:1052:1111#0 77 * 0 0 * *0 0 TTTATCTAATAAATGCATCCNTTCCAGAAGTCGGGGTTTGTTGCACGTATTAGCTCTAGAATTACTACGGTTANC bbbbbbbbbbbcbbb`````Ebbbb`````bbbbbbabbbbbbbbbbbbbbbbbbba`bbcbbb`bb`b`BBBBB XM:i:0]
    Hint: The header tags must be tab-separated.
    [sam_header_read2] 33 sequences loaded.
    <\code>



    The first lines of my SAM file are:

    [code]
    @HD VN:1.0 SO:unsorted
    @SQ SN:chr1 LN:23037639
    @SQ SN:chr1_random LN:568933
    @SQ SN:chr2 LN:18779844
    @SQ SN:chr3 LN:19341862
    @SQ SN:chr3_random LN:1220746
    @SQ SN:chr4 LN:23867706
    @SQ SN:chr4_random LN:76237
    @SQ SN:chr5 LN:25021643
    @SQ SN:chr5_random LN:421237
    @SQ SN:chr6 LN:21508407
    @SQ SN:chr7 LN:21026613
    @SQ SN:chr7_random LN:1447032
    @SQ SN:chr8 LN:22385789
    @SQ SN:chr9 LN:23006712
    @SQ SN:chr9_random LN:487831
    @SQ SN:chr10 LN:18140952
    @SQ SN:chr10_random LN:789605
    @SQ SN:chr11 LN:19818926
    @SQ SN:chr11_random LN:282498
    @SQ SN:chr12 LN:22702307
    @SQ SN:chr12_random LN:1566225
    @SQ SN:chr13 LN:24396255
    @SQ SN:chr13_random LN:3268264
    @SQ SN:chr14 LN:30274277
    @SQ SN:chr15 LN:20304914
    @SQ SN:chr16 LN:22053297
    @SQ SN:chr16_random LN:740079
    @SQ SN:chr17 LN:17126926
    @SQ SN:chr17_random LN:829735
    @SQ SN:chr18 LN:29360087
    @SQ SN:chr18_random LN:5170003
    @SQ SN:chr19 LN:24021853
    @SQ SN:chrUn LN:43154196
    @PG ID:Bowtie VN:0.12.5 CL:"dir"
    @ILLUMINA-C3C24B_0047:1:1:1052:1111#0 77 * 0 0 * * 0 0TTTATCTAATAAATGCATCCNTTCCAGAAGTCGGGGTTTGTTGCACGTATTAGCTCTAGAATTACTACGGTTANC bbbbbbbbbbbcbbb`````Ebbbb`````bbbbbbabbbbbbbbbbbbbbbbbbba`bbcbbb`bb`b`BBBBB XM:i:0
    [\code]



    I do not understand what is going wrong; the fields are separated by tabs.
    Any ideas, please?

    thanks a lot for your help

    S

  • #2
    I'd guess there is a missing new line in this bit:

    @ILLUMINA-C3C24B_0047:1:1:1052:1111#0

    Either that or your readnames start with @ which is not allowed in SAM format?
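One way to test the second possibility is to scan the SAM file for lines that begin with @ but are not header records. The sketch below is only an illustration: `find_at_read_names` is a hypothetical helper, and it assumes that a header line is an @ followed by two uppercase letters and a tab, as in the SAM specification.

```python
import re

# Assumption: a SAM header record is '@' + two uppercase letters + a tab
# (for example @HD, @SQ, @PG, @CO). Any other line that starts with '@'
# is treated as an alignment line with a bad read name.
HEADER_RE = re.compile(r"^@[A-Z]{2}\t")


def find_at_read_names(lines):
    """Return (line_number, read_name) for alignment lines starting with '@'."""
    offenders = []
    for number, line in enumerate(lines, start=1):
        if line.startswith("@") and not HEADER_RE.match(line):
            read_name = line.rstrip("\r\n").split("\t", 1)[0]
            offenders.append((number, read_name))
    return offenders
```

Note that a header whose tabs have been turned into spaces (as can happen when pasting into a forum) would also be reported by this check, so it is best run on the file itself rather than on pasted text.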


    • #3
      Hi there

      thanks for your reply. I had a closer look at the SAM file, and the number of newlines seems to be correct. You are right that the @ symbol cannot be used at the start of the read lines. At the moment I am removing them, but I don't understand why bowtie does not do this on its own when it writes SAM output. I mean, it takes FASTQ files as input, which of course have the @ symbol, and then when it writes the SAM file it does not remove the symbol... that sounds a bit weird.
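For an existing SAM file, removing the symbols could look roughly like the sketch below. It is not taken from the thread: `strip_at_from_read_names` is a hypothetical name, and the code assumes that real header records always look like an @ plus two uppercase letters and a tab, so every other line starting with @ is an alignment line.

```python
import re

# Assumption: header records are '@' + two uppercase letters + a tab.
HEADER_RE = re.compile(r"^@[A-Z]{2}\t")


def strip_at_from_read_names(lines):
    """Yield SAM lines, dropping a leading '@' from alignment lines only."""
    for line in lines:
        if line.startswith("@") and not HEADER_RE.match(line):
            yield line[1:]  # alignment line: remove the stray '@'
        else:
            yield line      # header record or normal alignment line
```

Because it is a generator, it can be applied to an open file without loading it into memory, for example `dst.writelines(strip_at_from_read_names(src))` with `src` and `dst` being file objects opened for reading and writing.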


      • #4
        What version of bowtie are you using? It sounds like a bug if all the read names in the SAM output have a @ at the start.


        • #5
          I am using version 0.12.5. Maybe I did something wrong with the command (I am just beginning to play around with this software... and with this topic, actually). The command I used is the following:
          ./bowtie-0.12.5/bowtie -v 2 -k 5 --best --fr -p 8 -I 100 -S --solexa1.3-quals ./indexes/riferimento_pinot --12 exp_47_s_1.fastq_bowtie_pe,exp_47_s_2.fastq_bowtie_pe,exp_47_s_3.fastq_bowtie_pe paired_end.map

          is there anything wrong?

          thanks a lot for your help


          • #6
            Well, bowtie 0.12.5 is several months out of date (the current release is 0.12.7), but the release notes don't mention any SAM output bug fixes.


            Could you post the first few reads of your FASTQ files?

            P.S. Use the [ code ] and [ /code ] tags to display it nicely on the forum (but without the spaces I have put round the square brackets). If you use the advanced editor then this can be accessed via the # toolbar button. Your original post tried <code> ... <\code> and [ code ] ... [ \code ] which are both wrong - use the other slash.


            • #7
              Hi,

              I used a script to convert my FASTQ files so that they could be used with bowtie. I had the paired mates in one file, so I used the --12 option of bowtie. According to the manual, the input file should be in a tab-separated text format:

              Code:
              <r>   Comma-separated list of files containing a mix of unpaired and paired-end reads in Tab-delimited format. Tab-delimited format is a 1-read-per-line format where unpaired reads consist of a read name, sequence and quality string each separated by tabs. A paired-end read consists of a read name, sequence of the #1 mate, quality values of the #1 mate, sequence of the #2 mate, and quality values of the #2 mate separated by tabs. Quality values can be expressed using any of the scales supported in FASTQ files. Reads may be a mix of different lengths and paired-end and unpaired reads may be intermingled in the same file. If - is specified, bowtie will read the Tab-delimited reads from the "standard in" filehandle.
              Therefore I used a script to convert the FASTQ file, which generated the following output:

              Code:
              @ILLUMINA-C3C24B_0047:1:1:1052:12086#0/1	TTCCGCGTCCTGACCTCCCCNGTTCAAGTAAGGCAACAACTACATATCCATCCTCTGCGTTAATCCATGTtaant	bbbbbbbbbbbbbbbbbbbaDaaaa```a`bb`bbbbbbbbbbbbbbbbb`bbbbbb_bbbb]bb_cbaaBBBBB	aaatttnggggtnagcaagtaacatacctaaagttgaaacataggnnancnancgagccacannnnannngnnnn	_____]E]]]NNEOO[[ZYV_____________\_________BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
              .... all fields are TAB separated. Considering your observations, I think I should modify the script so that it does not write the '@' symbol to the bowtie input file. What do you think?
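To make the layout from the manual excerpt concrete, here is a small sketch that splits one line of the tab-delimited format into its parts. `parse_tab_record` is a made-up helper, not part of bowtie; it only relies on the description quoted above (3 fields for an unpaired read, 5 fields for a pair).

```python
def parse_tab_record(line):
    """Split one tab-delimited read line into (name, [(sequence, qualities), ...]).

    Unpaired reads have 3 fields: name, sequence, qualities.
    Paired reads have 5 fields: name, seq1, qual1, seq2, qual2.
    """
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) == 3:
        name, seq, qual = fields
        return name, [(seq, qual)]
    if len(fields) == 5:
        name, seq1, qual1, seq2, qual2 = fields
        return name, [(seq1, qual1), (seq2, qual2)]
    raise ValueError(f"expected 3 or 5 tab-separated fields, got {len(fields)}")
```

Whatever ends up in the first field is taken as the read name, which is why a leading @ in that field is carried through as part of the name.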

              thanks a lot


              • #8
                Originally posted by scami
                Therefore I used a script to convert the FASTQ file, which generated the following output:

                ...

                .... all fields are TAB separated. Considering your observations, I think I should modify the script so that it does not write the '@' symbol to the bowtie input file. What do you think?

                Yes, remove the @ from your tabular output. You are telling Bowtie that every read name starts with an @ character, so it writes that character into the SAM output (which is invalid).

                Ideally I think you should also report this as a bug in Bowtie - arguably it should check the readnames don't start with @ when writing SAM format.

                In FASTQ files, the @ is just a record marker - not part of the read name.
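A conversion script along these lines might look like the sketch below. It is not scami's script: the function names are invented for illustration, and it assumes plain four-line FASTQ records with the two mates in two separate inputs listed in the same order. The key step is `header[1:]`, which drops the @ record marker before the name is written.

```python
from itertools import zip_longest


def read_fastq(lines):
    """Yield (name, sequence, qualities) from four-line FASTQ records."""
    stripped = (line.rstrip("\r\n") for line in lines)
    # Group the stream into chunks of four lines: header, sequence, '+', qualities.
    for header, seq, plus, qual in zip_longest(*[stripped] * 4):
        if qual is None:
            raise ValueError("truncated FASTQ record")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        # The '@' is only the record marker, so it is not part of the name.
        yield header[1:], seq, qual


def to_tab_delimited(mate1_lines, mate2_lines):
    """Yield one tab-delimited paired-end line per pair of FASTQ records."""
    pairs = zip(read_fastq(mate1_lines), read_fastq(mate2_lines))
    for (name, seq1, qual1), (_, seq2, qual2) in pairs:
        # The name of mate 1 is used for the pair; mate 2's name is ignored.
        yield "\t".join((name, seq1, qual1, seq2, qual2))
```

The sketch stops silently at the end of the shorter input, so a real script would probably also want to check that both inputs contain the same number of reads.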
