Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • analysis after mapping

    Hi,I'm a beginner in RNA-Seq analysis,now I've mapped the short reads with Bowtie,TopHat and other softwares,I'm trying to analyze the mapping efficiency by counting the uniquely mapped reads,spanning junction reads and unmappable reads in different mapping results.But how can I get the information about which one is uniquely mapped,which one is mutiple reads from the SAM format file I get with the softwares?
    Thank you for your attention.

  • #2
    Originally posted by Huijuan View Post
    Hi,I'm a beginner in RNA-Seq analysis,now I've mapped the short reads with Bowtie,TopHat and other softwares,I'm trying to analyze the mapping efficiency by counting the uniquely mapped reads,spanning junction reads and unmappable reads in different mapping results.But how can I get the information about which one is uniquely mapped,which one is mutiple reads from the SAM format file I get with the softwares?
    Thank you for your attention.
    First you have to separate the Aligned and unaligned reads in the SAM file. For this you can use Picard:
    For Aligned reads :
    java -jar /usr/local/bin/ViewSam.jar INPUT=your_mapped_reads.sam ALIGNMENT_STATUS=Aligned PF_STATUS=All > your_Aligned_reads.sam
    For Unaligned reads:
    java -jar /usr/local/bin/ViewSam.jar INPUT=your_mapped_reads.sam ALIGNMENT_STATUS=Unaligned PF_STATUS=All > your_Unaligned_reads.sam

    You can then count the rows in each file that don't start with @ with:
    grep -cv "^@" your_Unaligned_reads.sam
    Should give you the number of unaligned reads.

    For the uniquely mapped reads: there should have this tag in the sam file: XT:A:U, so to count them:
    grep -c "XT:A:U" your_Aligned_reads.sam
    The reads that map to more than one position should be have "XT:A:R", so:
    grep -c "XT:A:R" your_Aligned_reads.sam

    Hope this helps.....

    Comment


    • #3
      Originally posted by upendra_35 View Post
      First you have to separate the Aligned and unaligned reads in the SAM file. For this you can use Picard:
      For Aligned reads :
      java -jar /usr/local/bin/ViewSam.jar INPUT=your_mapped_reads.sam ALIGNMENT_STATUS=Aligned PF_STATUS=All > your_Aligned_reads.sam
      For Unaligned reads:
      java -jar /usr/local/bin/ViewSam.jar INPUT=your_mapped_reads.sam ALIGNMENT_STATUS=Unaligned PF_STATUS=All > your_Unaligned_reads.sam

      You can then count the rows in each file that don't start with @ with:
      grep -cv "^@" your_Unaligned_reads.sam
      Should give you the number of unaligned reads.

      For the uniquely mapped reads: there should have this tag in the sam file: XT:A:U, so to count them:
      grep -c "XT:A:U" your_Aligned_reads.sam
      The reads that map to more than one position should be have "XT:A:R", so:
      grep -c "XT:A:R" your_Aligned_reads.sam

      Hope this helps.....
      Many thanks for helping.I did as what you said and got what I want.Thank you again
      Huijuan

      Comment


      • #4
        Here's the problem
        I use the way you told me with the mapping results with BWA. And the total number of unaligned reads, uniquely mapped reads and multireads is not equal to the total number of short reads and far less than it. What's the problem ?
        Thank you again
        Huijuan

        Comment


        • #5
          Originally posted by Huijuan View Post
          Here's the problem
          I use the way you told me with the mapping results with BWA. And the total number of unaligned reads, uniquely mapped reads and multireads is not equal to the total number of short reads and far less than it. What's the problem ?
          Thank you again
          Huijuan
          Do you get any error when you run the way i mentioned before? If yes then try running the modified way which is below. It should fix the problem. Good luck

          #To separate Aligned and unaligned reads I am using Picard now:
          #Aligned:
          java -jar /usr/local/bin/ViewSam.jar VALIDATION_STRINGENCY=LENIENT INPUT=your_mapped_reads.sam ALIGNMENT_STATUS=Aligned PF_STATUS=All > your_Aligned_reads.sam
          #Unaligned:
          java -jar /usr/local/bin/ViewSam.jar VALIDATION_STRINGENCY=LENIENT INPUT=your_mapped_reads.sam ALIGNMENT_STATUS=Unaligned PF_STATUS=All > your_Unaligned_reads.sam

          #You can then count the rows in each file that don't start with @ should give you the number of unaligned reads.:
          grep -cv "^@" your_Unaligned_reads.sam

          #For the uniquely mapped reads: they should have this tag in the sam file: XT:A:U, so to count them:
          grep -c "XT:A:U" your_Aligned_reads.sam
          #The reads that map to more than one position should be have "XT:A:R", so:
          grep -c "XT:A:R" your_Aligned_reads.sam

          Comment


          • #6
            Originally posted by upendra_35 View Post
            Do you get any error when you run the way i mentioned before? If yes then try running the modified way which is below. It should fix the problem. Good luck

            #To separate Aligned and unaligned reads I am using Picard now:
            #Aligned:
            java -jar /usr/local/bin/ViewSam.jar VALIDATION_STRINGENCY=LENIENT INPUT=your_mapped_reads.sam ALIGNMENT_STATUS=Aligned PF_STATUS=All > your_Aligned_reads.sam
            #Unaligned:
            java -jar /usr/local/bin/ViewSam.jar VALIDATION_STRINGENCY=LENIENT INPUT=your_mapped_reads.sam ALIGNMENT_STATUS=Unaligned PF_STATUS=All > your_Unaligned_reads.sam

            #You can then count the rows in each file that don't start with @ should give you the number of unaligned reads.:
            grep -cv "^@" your_Unaligned_reads.sam

            #For the uniquely mapped reads: they should have this tag in the sam file: XT:A:U, so to count them:
            grep -c "XT:A:U" your_Aligned_reads.sam
            #The reads that map to more than one position should be have "XT:A:R", so:
            grep -c "XT:A:R" your_Aligned_reads.sam
            I‘ve figured out the reason.Thanks so much for helping.
            Last edited by Huijuan; 04-26-2011, 05:00 PM.

            Comment


            • #7
              Originally posted by Huijuan View Post
              I‘ve figure out the reason.Thanks so much for helping.
              Can you share with me how you fixed the problem? Thanks

              Comment


              • #8
                This time I didn't use viewSam.jar and just used awk and grep to get them apart and then did counting

                Comment


                • #9
                  #For the uniquely mapped reads: they should have this tag in the sam file: XT:A:U, so to count them:
                  grep -c "XT:A:U" your_Aligned_reads.sam
                  #The reads that map to more than one position should be have "XT:A:R", so:
                  grep -c "XT:A:R" your_Aligned_reads.sam
                  Both of these grep's return 0 for me.


                  Are you doing any other steps between tophat and changing the file from bam to sam?

                  Comment


                  • #10
                    I used awk and grep for this task - just need to base on the 'read name' and CIGAR in sam file.

                    Comment


                    • #11
                      Originally posted by brpet View Post
                      Both of these grep's return 0 for me.


                      Are you doing any other steps between tophat and changing the file from bam to sam?
                      Yes i do the following to convert the bam to sam

                      samtools view accepted_hits.bam > accepted_hits.sam
                      samtools view -H accepted_hits.bam > header.txt
                      cat header.txt accepted_hits.sam > accepted_hits2.sam

                      But you can't use this sam file for counting the number of reads that are mapped to the reference or not as this sam is made from bam and bam file is supposed to contain only mapped reads.

                      I am still to figure out how to use tophat generated bam file for counting the number of reads that are mapped to the reference or not. I will let you know once i know it.

                      Comment


                      • #12
                        Originally posted by upendra_35 View Post
                        Yes i do the following to convert the bam to sam

                        samtools view accepted_hits.bam > accepted_hits.sam
                        samtools view -H accepted_hits.bam > header.txt
                        cat header.txt accepted_hits.sam > accepted_hits2.sam

                        But you can't use this sam file for counting the number of reads that are mapped to the reference or not as this sam is made from bam and bam file is supposed to contain only mapped reads.

                        I am still to figure out how to use tophat generated bam file for counting the number of reads that are mapped to the reference or not. I will let you know once i know it.
                        there‘s a method saying that you can keep the tmp files when you run Tophat by the parameter --keep-tmp then you will get both sam and bam files

                        Comment


                        • #13
                          Originally posted by upendra_35 View Post
                          Yes i do the following to convert the bam to sam

                          samtools view accepted_hits.bam > accepted_hits.sam
                          samtools view -H accepted_hits.bam > header.txt
                          cat header.txt accepted_hits.sam > accepted_hits2.sam
                          You can output the header with the alignments to your SAM file in one step, avoiding long cat job. The '-h' option for samtools view will output the header in addition to the alignments.

                          Code:
                          % samtools view -h accepted_hits.bam -o accepted_hits.sam
                          The -o option is used to give the output file name (or redirect stdout as shown in upendra's example).

                          Comment

                          Latest Articles

                          Collapse

                          • seqadmin
                            Exploring the Dynamics of the Tumor Microenvironment
                            by seqadmin




                            The complexity of cancer is clearly demonstrated in the diverse ecosystem of the tumor microenvironment (TME). The TME is made up of numerous cell types and its development begins with the changes that happen during oncogenesis. “Genomic mutations, copy number changes, epigenetic alterations, and alternative gene expression occur to varying degrees within the affected tumor cells,” explained Andrea O’Hara, Ph.D., Strategic Technical Specialist at Azenta. “As...
                            07-08-2024, 03:19 PM
                          • seqadmin
                            Exploring Human Diversity Through Large-Scale Omics
                            by seqadmin


                            In 2003, researchers from the Human Genome Project (HGP) announced the most comprehensive genome to date1. Although the genome wasn’t fully completed until nearly 20 years later2, numerous large-scale projects, such as the International HapMap Project and 1000 Genomes Project, continued the HGP's work, capturing extensive variation and genomic diversity within humans. Recently, newer initiatives have significantly increased in scale and expanded beyond genomics, offering a more detailed...
                            06-25-2024, 06:43 AM

                          ad_right_rmr

                          Collapse

                          News

                          Collapse

                          Topics Statistics Last Post
                          Started by seqadmin, 07-16-2024, 05:49 AM
                          0 responses
                          27 views
                          0 likes
                          Last Post seqadmin  
                          Started by seqadmin, 07-15-2024, 06:53 AM
                          0 responses
                          33 views
                          0 likes
                          Last Post seqadmin  
                          Started by seqadmin, 07-10-2024, 07:30 AM
                          0 responses
                          40 views
                          0 likes
                          Last Post seqadmin  
                          Started by seqadmin, 07-03-2024, 09:45 AM
                          0 responses
                          205 views
                          0 likes
                          Last Post seqadmin  
                          Working...
                          X