  • Interpreting the CIGAR and Read Sequence in sam/bam

    Hi,

    I used bowtie2 to align FASTQ reads against a de novo assembled transcriptome, using the following commands.

    Code:
    bowtie2 -p 30 /path/to/bowti2_indexes -U Sample1.fastq -S Sample1.sam
    samtools view -F 4 -bS Sample1.sam -o Sample1.bam
    samtools sort Sample1.bam Sorted_Sample1
    This gave me a SAM file, which I converted to BAM and then sorted. When I open the BAM file, I find that some records have a CIGAR value shorter than 25M, with a read sequence of matching length. For example, one record aligned to a particular transcript has the CIGAR 12M and the read sequence ATGGCAGTGTTG. How should I interpret this?

    Kindly guide me

    Regards
    Deena

  • #2
    You must have seen the SAM spec https://samtools.github.io/hts-specs/SAMv1.pdf (Page 1 and 5).



    • #3
      Originally posted by GenoMax
      You must have seen the SAM spec https://samtools.github.io/hts-specs/SAMv1.pdf (Page 1 and 5).
      Thanks for your reply. I read the spec. In my case, I ran Bowtie2 with default parameters (seed length 25 and no mismatches). When the CIGAR is 12M and the read sequence is ATGGCAGTGTTG, what does that indicate? Are these 12 nucleotides matches or mismatches against my reference transcript? If they are matches, how is that possible when the minimum seed length is 25? And if they are mismatches, the default settings do not allow any.

      As I am new to RNAseq analysis, kindly guide me.



      • #4
        12M just means 12 bases aligned. It does not indicate whether they're matches or mismatches; you have to look at the MD auxiliary tag for that information.

        Regarding the seeding, I'd have to check the source code, but I suspect it just uses the whole sequence as the seed if it's shorter than the -L parameter.
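
To make the CIGAR vs. MD distinction concrete, here is a minimal Python sketch using only the standard library. The function names `aligned_bases` and `md_matches_mismatches` are made up for illustration and are not part of samtools or bowtie2. It assumes a well-formed CIGAR string and MD tag as described in the SAM spec, and the MD value `5A6` is a hypothetical example of a 12M alignment with one mismatch.

```python
import re

# One CIGAR operation: a length followed by an operation letter.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# One MD token: a run of matches, a deletion (^ACGT), or a single mismatched reference base.
MD_RE = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")


def aligned_bases(cigar):
    """Sum the lengths of M, = and X operations (read bases aligned to the reference)."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in "M=X")


def md_matches_mismatches(md):
    """Return (matches, mismatches) among the aligned bases, read from an MD string."""
    matches = mismatches = 0
    for number, deletion, mismatch in MD_RE.findall(md):
        if number:
            matches += int(number)
        elif mismatch:
            mismatches += 1
        # Deletions (^ACGT) are reference bases absent from the read, so they are skipped.
    return matches, mismatches


print(aligned_bases("12M"))          # 12
print(md_matches_mismatches("12"))   # (12, 0): all 12 aligned bases match
print(md_matches_mismatches("5A6"))  # (11, 1): same 12M CIGAR, one mismatch
```

Both MD strings above go with the CIGAR 12M, which is why the CIGAR alone cannot tell you about mismatches.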



        • #5
          Originally posted by dpryan
          12M just means 12 bases aligned. It does not indicate whether they're matches or mismatches; you have to look at the MD auxiliary tag for that information.

          Regarding the seeding, I'd have to check the source code, but I suspect it just uses the whole sequence as the seed if it's shorter than the -L parameter.
          Hi Ryan,

          My question is: when the seed length is 25 and I haven't allowed any mismatches, how can there be read sequences in my BAM/SAM file that are shorter than 25? Am I missing something?

          Which source code do you want to check? I am using bowtie2-align version 2.1.0.



          • #6
            Originally posted by dena.dinesh
            My question is: when the seed length is 25 and I haven't allowed any mismatches, how can there be read sequences in my BAM/SAM file that are shorter than 25?
            I addressed that in my reply.



            • #7
              Can you post the full SAM record for the sequence you are asking about?

              Why are you using an old version of bowtie2 BTW?



              • #8
                Originally posted by GenoMax
                Can you post the full SAM record for the sequence you are asking about?

                Why are you using an old version of bowtie2 BTW?
                Here is a full SAM record for particular transcript

                H134:235:C701AACXX:1:1101:11985:26940 0 ref_comp3_c0_seq1 888 1 16M * 0 0 TAGATCAAAATCAACC BCCFFFFFHHHHHJJJ AS:i:0 XS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:16 YT:Z:UU
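
For anyone reading along, the record above can be broken into its fields with a short Python sketch (standard library only). The helper name `parse_sam_record` is hypothetical. Real SAM files are tab-separated, but the record as pasted here is space-separated, so the sketch splits on any whitespace as an assumption.

```python
# The 11 mandatory SAM columns, in order, as listed in the SAM spec.
SAM_COLUMNS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
               "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

RECORD = ("H134:235:C701AACXX:1:1101:11985:26940 0 ref_comp3_c0_seq1 888 1 16M "
          "* 0 0 TAGATCAAAATCAACC BCCFFFFFHHHHHJJJ AS:i:0 XS:i:0 XN:i:0 XM:i:0 "
          "XO:i:0 XG:i:0 NM:i:0 MD:Z:16 YT:Z:UU")


def parse_sam_record(line):
    """Split one SAM line into mandatory fields plus a dict of optional tags."""
    parts = line.split()  # assumption: whitespace-separated, as pasted in the post
    record = dict(zip(SAM_COLUMNS, parts[:11]))
    tags = {}
    for item in parts[11:]:
        name, type_code, value = item.split(":", 2)  # e.g. "NM", "i", "0"
        tags[name] = int(value) if type_code == "i" else value
    record["TAGS"] = tags
    return record


record = parse_sam_record(RECORD)
print(len(record["SEQ"]), record["CIGAR"])          # 16 16M
print(record["TAGS"]["NM"], record["TAGS"]["MD"])   # 0 16
```

Read this way, SEQ is 16 bases long and the CIGAR 16M covers all of it, while NM:i:0 and MD:Z:16 say that all 16 aligned bases match the reference. Since the CIGAR contains no clipping operations, this suggests the read was already 16 bases long in the FASTQ file rather than being shortened by the aligner.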

                Also, I downloaded the source package for bowtie2 2.2.5 and compiled it with the "make" command. After compiling, running bowtie2 --version reports Bowtie2 version 2.1.0, even though I downloaded version 2.2.5.
                Last edited by dena.dinesh; 05-29-2015, 04:14 AM.



                • #9
                  On Linux/Unix, running "bowtie" would give you the installed version found via the $PATH environment variable, even if there is a different "bowtie" in the current working directory.

                  Running "/path/to/where/you/compiled/it/bowtie" would explicitly use that version.

                  Did you install the new version (by editing your $PATH or copying the new binary to a folder on the path)? If not, try "make install" to do this.
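
The $PATH lookup described above can be checked with a small Python sketch. It uses `shutil.which` from the standard library, while `resolve_on_path` and `all_copies_on_path` are illustrative helper names, not existing tools. It assumes the program is called `bowtie2`; substitute whatever name you actually run.

```python
import os
import shutil


def resolve_on_path(program, path=None):
    """Return the full path that would be run for `program`, or None if not found."""
    return shutil.which(program, path=path)


def all_copies_on_path(program, path=None):
    """List every executable copy of `program`, in the order the directories are searched."""
    search = path if path is not None else os.environ.get("PATH", "")
    found = []
    for directory in search.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries in this sketch
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            found.append(candidate)
    return found


print(resolve_on_path("bowtie2"))     # the copy that a bare "bowtie2" would run
print(all_copies_on_path("bowtie2"))  # every copy on $PATH; the first one wins
```

If the second list has more than one entry, the older install is probably shadowing the freshly compiled binary.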



                  • #10
                    Originally posted by maubp
                    On Linux/Unix, running "bowtie" would give you the installed version found via the $PATH environment variable, even if there is a different "bowtie" in the current working directory.

                    Running "/path/to/where/you/compiled/it/bowtie" would explicitly use that version.

                    Did you install the new version (by editing your $PATH or copying the new binary to a folder on the path)? If not, try "make install" to do this.
                    Hi,

                    I uninstalled the older version and reinstalled the newer one. Now bowtie2 reports version 2.2.5. I am still finding it difficult to interpret the results in the SAM/BAM alignment file. When I don't specify a seed length in bowtie2, how can I get read sequences shorter than 25 nucleotides? Kindly guide me.
