  • Difficulty with bowtie's output sam format

    Hello. I was wondering if I could get some help.

    I'm trying to use bowtie to align a fasta file using an index and output it in sam format. Later on, I'll need to process the output sam file using another script. Anyway, I called bowtie with these options

    bowtie -f -t -p 8 -n 3 -l 32 -k 1 -m 100 -S -y --chunkmbs 1024 --max FASTA_FILE.mm.fasta --best

    using input of the form

    >38-1
    TGGAACGGAACGGAATGGAAGGGAATGGAATGGAAT


    and got output of the form

    38-1 0 chrY:28807964-28808132 275 255 36M * 0 0 TGGAACGGAACGGAATGGAAGGGAATGGAATGGAAT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII XA:i:2 MD:Z:5T14A15 NM:i:2

    Maybe I'm misunderstanding something, but this doesn't exactly appear to be in sam format. What I would like is for the coordinate of the leftmost position of the sequence to be in the fourth field (instead of 275--I'm not sure what that number represents, but given the label in field three, I don't think it's the coordinate) and for only the chromosome to be in the third field. I could manually modify the output, but I'm afraid to throw away that 275 because I have no idea what it is.

    Does anyone know what I'm doing wrong? Any help is appreciated. If you need more information, I'll do my best to provide it.

    Thanks,

    David

  • #2

    Your parameters seem OK; I assume you also included the name of your index.

    It does look like sam format, although as you say, the 275 seems difficult to explain.

    Have you compared your read sequence to the sequence of whatever is called chrY:28807964-28808132 in your reference to see if the alignment makes sense?



    • #3
      Check this: http://bowtie-bio.sourceforge.net/ma...-bowtie-output



      • #4
        I think that's all proper SAM format.

        38-1 is the name of the read, 0 is the flag (the read aligned to the forward strand), the reference is named "chrY:28807964-28808132", 275 is the position, 255 is the mapping quality (255 means the mapping quality is not available), 36M is the CIGAR, etc.
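        The column-by-column reading above can be made explicit with a small parser. This is a minimal sketch, assuming real tab-delimited SAM input (the forum rendering shows spaces) and only integer ("i") and string-like optional tags; the names SamRecord and parse_sam_line are hypothetical, not part of bowtie.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The 11 mandatory SAM columns, in order, plus optional tags."""
    qname: str   # read name, e.g. "38-1"
    flag: int    # bitwise flag; 0 = mapped, forward strand
    rname: str   # reference sequence name (the FASTA header of the hit)
    pos: int     # 1-based leftmost position *within rname*
    mapq: int    # 255 = mapping quality not available
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: dict   # optional TAG:TYPE:VALUE fields such as XA, MD, NM


def parse_sam_line(line):
    """Split one tab-delimited SAM alignment line into a SamRecord."""
    cols = line.rstrip("\n").split("\t")
    tags = {}
    for field in cols[11:]:
        tag, typ, value = field.split(":", 2)
        # Integer tags become ints; everything else is kept as a string.
        tags[tag] = int(value) if typ == "i" else value
    return SamRecord(cols[0], int(cols[1]), cols[2], int(cols[3]),
                     int(cols[4]), cols[5], cols[6], int(cols[7]),
                     int(cols[8]), cols[9], cols[10], tags)
```

        Parsing the line from the first post this way gives rname "chrY:28807964-28808132" and pos 275, which makes it clear that 275 is an offset into the reference record named in column 3, not a chromosome coordinate.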



        • #5
          Thanks for the quick responses, everyone.

          Quote:
          Your parameters seem OK; I assume you also included the name of your index.

          It does look like sam format, although as you say, the 275 seems difficult to explain.

          Have you compared your read sequence to the sequence of whatever is called chrY:28807964-28808132 in your reference to see if the alignment makes sense?

          I did include the name of the index when I called bowtie. I found the reference sequence in the FASTA file from which the index was built. The sequence corresponding to chrY:28807964-28808132 is

          >chrY:28807964-28808132
          NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNtgaagtggagtggagtgtaacgaaatggggtggaatgtaattgaatggagtggagtgtttggagtctactggagtggaatggaacggaatggaaaggaatggaatggaatggagtgaagtgcagtgcagtgaaatggagtggaaaggaatggaatggaatcaaatggaNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN


          This doesn't match exactly (NM:i:2 reports two mismatches), but from what I understand that is within what -n 3 allows, so I believe this alignment makes sense. That being said...
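          The two mismatches can be located from the MD tag and compared against the seed settings. This is a rough sketch, assuming an ungapped, forward-strand hit (FLAG 0), so that the first -l bases of SEQ are the seed; it mirrors only the seed-mismatch part of bowtie's -n/-l policy and ignores the -e quality ceiling. The function names are hypothetical.

```python
import re

# MD tokens: a run of matching bases, a deletion (^ plus bases), or one
# mismatched reference base.
MD_TOKEN = re.compile(r"(\d+)|(\^[A-Z]+)|([A-Z])")


def md_mismatches(md):
    """Return (0-based read offset, reference base) for each mismatch."""
    mismatches = []
    offset = 0
    for num, deletion, base in MD_TOKEN.findall(md):
        if num:
            offset += int(num)  # skip over matching bases
        elif deletion:
            raise ValueError("deletions in MD are not handled in this sketch")
        else:
            mismatches.append((offset, base))
            offset += 1
    return mismatches


def within_seed_limit(md, seed_len=32, max_seed_mm=3):
    """True if mismatches in the first seed_len read bases are <= max_seed_mm."""
    seed_mm = sum(1 for off, _ in md_mismatches(md) if off < seed_len)
    return seed_mm <= max_seed_mm
```

          For MD:Z:5T14A15 this reports mismatches at read offsets 5 and 20, both inside a 32-base seed, so two seed mismatches against a limit of three.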

          Quote:
          I think that's all proper SAM format.

          38-1 is the name of the read, 0 is the flag (the read aligned to the forward strand), the reference is named "chrY:28807964-28808132", 275 is the position, 255 is the mapping quality (255 means the mapping quality is not available), 36M is the CIGAR, etc.

          At first I was going to say that 275 can't be the position, since I would expect the position to fall between 28807964 and 28808132. But then I realized that this number is the position within that reference sequence, and sure enough, when I checked the bases starting there, they matched the ones given in the bowtie output!
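          The manual check described above can be automated: rebuild the reference bases the read covers from SEQ plus the MD tag, then compare them with the reference record starting at POS. A minimal sketch, assuming an ungapped alignment (a CIGAR such as 36M) and treating lowercase (soft-masked) reference bases as ordinary bases; the function names are hypothetical.

```python
import re

MD_TOKEN = re.compile(r"(\d+)|(\^[A-Z]+)|([A-Z])")


def reference_from_md(seq, md):
    """Rebuild the reference bases under an ungapped alignment from SEQ + MD."""
    ref = list(seq)
    offset = 0
    for num, deletion, base in MD_TOKEN.findall(md):
        if num:
            offset += int(num)
        elif deletion:
            raise ValueError("gapped alignments are not handled in this sketch")
        else:
            ref[offset] = base  # MD gives the reference base at a mismatch
            offset += 1
    return "".join(ref)


def check_alignment(reference, pos, seq, md):
    """Compare the reference window at 1-based POS with what MD implies."""
    start = pos - 1  # SAM POS is 1-based; Python slicing is 0-based
    window = reference[start:start + len(seq)].upper()  # ignore soft-masking
    return window == reference_from_md(seq, md)
```

          Here reference would be the full sequence of the chrY:28807964-28808132 record (Ns included), so POS 275 is counted from the first character of that record.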

          Anyway, I feel comfortable making some manual adjustments now.
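          One way to make those manual adjustments is to split the region-style reference name back into a chromosome and an offset. This is a hypothetical sketch: it assumes the start in "chrom:start-end" is the 1-based coordinate of the first real base and that the record may carry some leading padding (such as the run of Ns above) before that base. Neither convention is confirmed here, so check it against how the FASTA was generated; a 0-based start (as in BED files) would shift the result by one.

```python
def leading_pad_of(sequence):
    """Count leading N/n padding characters in a reference record."""
    return len(sequence) - len(sequence.lstrip("Nn"))


def to_genomic(rname, pos, leading_pad=0):
    """Map a SAM POS on a "chrom:start-end" reference to (chrom, coordinate).

    Assumes (hypothetically) that 'start' is the 1-based coordinate of the
    first base after 'leading_pad' padding characters in the record.
    """
    chrom, _, span = rname.rpartition(":")
    start_s, _, end_s = span.partition("-")
    start, end = int(start_s), int(end_s)
    genomic = start + (pos - 1 - leading_pad)
    if not start <= genomic <= end:
        raise ValueError(f"position {pos} falls outside {rname}")
    return chrom, genomic
```

          With pad = leading_pad_of(record_sequence), to_genomic(rname, pos, pad) gives the chromosome for column 3 and a coordinate for column 4, while the original 275 can still be kept alongside it (for example in a custom tag) instead of being thrown away.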

          Thanks again for your help.
