  • Interpreting the CIGAR and Read Sequence in sam/bam

    Hi,

    I ran bowtie2 to align FASTQ reads against a de novo assembled transcriptome, using the following commands.

    Code:
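    # Align single-end reads (-U) to the transcriptome index (-x), writing SAM (-S)
    bowtie2 -p 30 -x /path/to/bowti2_indexes -U Sample1.fastq -S Sample1.sam
    # Keep only mapped reads (-F 4) and write BAM (-b)
    samtools view -F 4 -bS Sample1.sam -o Sample1.bam
    # Coordinate-sort; current samtools takes the output file name via -o
    samtools sort -o Sorted_Sample1.bam Sample1.bam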
    This produced a SAM file, which I then converted to BAM and sorted. When I opened the BAM file, I found that some alignments to transcripts have a CIGAR value shorter than 25M, with a read sequence whose length matches the CIGAR value. For example, one alignment has the CIGAR 12M and the read sequence ATGGCAGTGTTG. How should I interpret this?

    Kindly guide me.

    Regards
    Deena

  • #2
    You must have seen the SAM spec https://samtools.github.io/hts-specs/SAMv1.pdf (pages 1 and 5).

    • #3
      Originally posted by GenoMax
      You must have seen the SAM spec https://samtools.github.io/hts-specs/SAMv1.pdf (pages 1 and 5).

      Thanks for your reply. I read the spec. I ran Bowtie2 with default parameters (seed length 25 and no mismatches). So when the CIGAR is 12M and the read sequence is ATGGCAGTGTTG, what does that indicate? Are these 12 nucleotides matches or mismatches against my reference transcript? If they are matches, how is that possible when the minimum seed length is 25? If they are mismatches, how is that possible when the default settings allow no mismatches?

      As I am new to RNAseq analysis, kindly guide me.

      • #4
        12M just means 12 bases aligned. It does not indicate whether they're matches or mismatches, you have to look at the MD auxiliary tag for that information.

        Regarding the seeding, I'd have to check the source code, but I suspect it just uses the whole sequence as the seed if it's shorter than the -L parameter.
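To illustrate the reply above, here is a minimal bash/awk sketch. The function names `cigar_aligned_len` and `md_mismatches` are hypothetical helpers, not part of samtools or bowtie2. The first sums the M, = and X operations of a CIGAR string, i.e. the bases aligned to the reference whether or not they match. The second counts mismatched reference bases in an MD string after dropping deletion runs such as `^AC`, following the MD grammar in the SAM spec.

```shell
# Hypothetical helpers for reading CIGAR and MD values (illustration only).

# Sum the lengths of M, = and X operations: bases aligned to the reference,
# regardless of whether each aligned base matches or mismatches.
cigar_aligned_len() {
  awk -v cigar="$1" 'BEGIN {
    total = 0
    while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
      op  = substr(cigar, RLENGTH, 1)          # operation letter
      len = substr(cigar, 1, RLENGTH - 1) + 0  # operation length
      if (op == "M" || op == "=" || op == "X") total += len
      cigar = substr(cigar, RLENGTH + 1)       # move on to the next operation
    }
    print total
  }'
}

# Count mismatched reference bases in an MD string.
# Runs like ^AC are deleted reference bases, not mismatches, so drop them first.
md_mismatches() {
  awk -v md="$1" 'BEGIN {
    gsub(/\^[A-Za-z]+/, "", md)     # remove deletions
    print gsub(/[A-Za-z]/, "", md)  # each remaining letter is one mismatch
  }'
}
```

For example, `cigar_aligned_len 12M` prints 12, while `md_mismatches 12` prints 0 and `md_mismatches 10A1` prints 1, meaning one of the 12 aligned bases differs from the reference.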

        • #5
          Originally posted by dpryan
          12M just means 12 bases aligned. It does not indicate whether they're matches or mismatches, you have to look at the MD auxiliary tag for that information.

          Regarding the seeding, I'd have to check the source code, but I suspect it just uses the whole sequence as the seed if it's shorter than the -L parameter.
          Hi Ryan,

          My question is: if the seed length is 25 and I haven't allowed any mismatches, how can there be read sequences shorter than 25 bases in my BAM/SAM file? Am I missing something?

          Which source code do you want to check? I am using bowtie2-align version 2.1.0.

          • #6
            Originally posted by dena.dinesh
            My question is: if the seed length is 25 and I haven't allowed any mismatches, how can there be read sequences shorter than 25 bases in my BAM/SAM file?

            I addressed that in my reply.

            • #7
              Can you post the full SAM record for the sequence you are asking about?

              Why are you using an old version of bowtie2 BTW?

              • #8
                Originally posted by GenoMax
                Can you post the full SAM record for the sequence you are asking about?

                Why are you using an old version of bowtie2 BTW?

                Here is the full SAM record for one such alignment:

                H134:235:C701AACXX:1:1101:11985:26940 0 ref_comp3_c0_seq1 888 1 16M * 0 0 TAGATCAAAATCAACC BCCFFFFFHHHHHJJJ AS:i:0 XS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:16 YT:Z:UU
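SAM is tab-delimited (the record above shows the tabs as spaces), so the relevant fields can be pulled out with awk. The sketch below is a hypothetical helper, `sam_summary`, not a samtools or bowtie2 command; it assumes tab-separated input such as the output of `samtools view`. For the record above, CIGAR 16M, a 16-base SEQ, NM:i:0 and MD:Z:16 together mean that all 16 read bases aligned to ref_comp3_c0_seq1 starting at position 888 with no mismatches.

```shell
# Hypothetical helper: summarise SAM records read from stdin (tab-separated).
# Prints QNAME, CIGAR, the length of SEQ, and the NM and MD tags if present.
sam_summary() {
  awk -F'\t' '
    /^@/ { next }                       # skip header lines
    {
      nm = "NA"; md = "NA"
      for (i = 12; i <= NF; i++) {      # optional TAG:TYPE:VALUE fields start at column 12
        if (substr($i, 1, 5) == "NM:i:") nm = substr($i, 6)
        if (substr($i, 1, 5) == "MD:Z:") md = substr($i, 6)
      }
      printf "%s\tCIGAR=%s\tSEQ_len=%d\tNM=%s\tMD=%s\n", $1, $6, length($10), nm, md
    }'
}
```

A typical invocation would be `samtools view Sorted_Sample1.bam | sam_summary | head`.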

                Also, I downloaded the source package for bowtie2 2.2.5 and compiled it with the "make" command. After compiling, running bowtie2 --version still reports version 2.1.0, even though I downloaded version 2.2.5.
                Last edited by dena.dinesh; 05-29-2015, 04:14 AM.

                • #9
                  On Linux/Unix, running "bowtie" would give you the installed version found via the $PATH environment variable, even if there is a different "bowtie" in the current working directory.

                  Running "/path/to/where/you/compiled/it/bowtie" would explicitly use that version.

                  Did you install the new version (by editing your $PATH or copying the new binary to a folder on the path)? If not, try "make install" to do this.
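The PATH lookup described above can be checked directly. Bash's built-in `type -a bowtie2` lists every `bowtie2` the shell can find. The hypothetical helper below, `path_lookup`, does the same for files only by walking `$PATH` in order, so the first line printed is normally the copy that runs when you type the bare name (aliases, shell functions and the shell's command hash can override this).

```shell
# Hypothetical helper: list every executable file called NAME on $PATH,
# in search order. The first line is normally what the bare NAME runs.
path_lookup() {
  local name=$1 dir found=1
  local IFS=:                     # split $PATH on colons
  for dir in $PATH; do
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
      found=0
    fi
  done
  return "$found"                 # non-zero if nothing was found
}
```

For example, `path_lookup bowtie2` shows which copies are visible, and `/path/to/where/you/compiled/it/bowtie2 --version` queries the freshly built copy directly.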

                  • #10
                    Originally posted by maubp
                    On Linux/Unix, running "bowtie" would give you the installed version found via the $PATH environment variable, even if there is a different "bowtie" in the current working directory.

                    Running "/path/to/where/you/compiled/it/bowtie" would explicitly use that version.

                    Did you install the new version (by editing your $PATH or copying the new binary to a folder on the path)? If not, try "make install" to do this.
                    Hi,

                    I uninstalled the older version and reinstalled the newer one, so bowtie2 now reports version 2.2.5. However, I am still finding it difficult to interpret the results in the SAM/BAM alignment file. If I don't specify a seed length in bowtie2, how can I get read sequences shorter than 25 nucleotides? Kindly guide me.
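One way to approach this question is to look at the input reads themselves. In bowtie2's default end-to-end mode the whole read must align (no soft clipping), so a 16M CIGAR paired with a 16-base SEQ is consistent with a 16-base input read. The bowtie2 manual describes `-L` as the length of the seed substrings used during multiseed alignment, not as a minimum read length. The hypothetical helper below, `fastq_length_summary` (not part of bowtie2), reports how many reads in a plain 4-line FASTQ file are shorter than a threshold; 25 is used as the default only because it is the value discussed in this thread.

```shell
# Hypothetical helper: summarise read lengths in a FASTQ file.
# Assumes plain 4-line FASTQ records (header, sequence, '+', quality).
fastq_length_summary() {
  local fastq=$1 minlen=${2:-25}
  awk -v min="$minlen" '
    BEGIN { min += 0 }             # force numeric comparison
    NR % 4 == 2 {                  # line 2 of each record is the sequence
      n++
      len = length($0)
      if (len < min) short++
      if (n == 1 || len < lo) lo = len
      if (len > hi) hi = len
    }
    END {
      printf "reads=%d shortest=%d longest=%d shorter_than_%d=%d\n", n, lo, hi, min, short
    }' "$fastq"
}
```

Running `fastq_length_summary Sample1.fastq 25` would show whether short reads were already present in the input before alignment.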
