  • Paired end reads in Tophat

    We recently ran an RNA-seq experiment and have paired-end RNA-seq data. We ran TopHat and generated a BAM file. When we visualize this BAM file, only 5-10% of the reads appear as paired-end; the rest do not. How does TopHat handle paired-end reads? I know from reading other posts that Bowtie and TopHat treat paired-end reads differently. Why is only a small fraction of the reads displayed as paired in TopHat's BAM output?

  • #2
    How does this compare to what you get when you use Bowtie, or whatever aligner alone?

    It could certainly be that your library is of lower quality, as opposed to something wrong with tophat.



    • #3
      paired end reads

      Thanks peromhc,
      If I understand correctly, Bowtie and TopHat treat reads independently.

      I also read a post by Cole somewhere in this forum, along similar lines, saying that TopHat and Bowtie use different algorithms.
      Any advice, please?



      • #4
        I'm just playing with TopHat, but here are the mapping results for one of our human RNA-seq samples (standard Illumina protocol), using the most recent version of TopHat with a GTF annotation file. (I noticed the percent mapped is always better if I use the GTF option.)

        30795354 in total
        0 QC failure
        19282362 duplicates
        30795354 mapped (100.00%)
        30795354 paired in sequencing
        15965925 read1
        14829429 read2
        18961808 properly paired (61.57%)
        26151540 with itself and mate mapped
        4643814 singletons (15.08%)
        0 with mate mapped to a different chr
        0 with mate mapped to a different chr (mapQ>=5)
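
The percentages in this summary can be re-derived from the raw counts. A minimal Python sketch, using only the numbers listed above (the variable and function names are made up for illustration):

```python
# Counts copied from the summary above.
total = 30_795_354
properly_paired = 18_961_808
both_mapped = 26_151_540   # "with itself and mate mapped"
singletons = 4_643_814     # read mapped, mate did not


def pct(n, total):
    """Return n as a percentage of total, rounded to two decimals."""
    return round(100.0 * n / total, 2)


print("properly paired:", pct(properly_paired, total))  # 61.57
print("both mates mapped:", pct(both_mapped, total))    # 84.92
print("singletons:", pct(singletons, total))            # 15.08
```

Here "with itself and mate mapped" plus "singletons" adds up exactly to the total, so roughly 85% of the reads have a mapped mate and roughly 15% do not.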



        • #5
          Hi permohc, I looked at the TopHat run stats, and they say 85% of the reads are paired and 15% are singletons, which does not match what we see visually in the BAM file.



          • #6
            Hi,
            We have Illumina paired-end sequence data that we are analyzing for an RNA-seq experiment. The reads are 101 bases long. The library size selected was 225-500 bp, which INCLUDES the 2 adapters (60 bp each, one on each end of the cDNA fragment).

            Subtracting a total of 120 bp, we are left with an insert size of 105 - 380 bases.

            Since our read length is 101 bases, I am wondering whether there is literally going to be an overlap or rather a redundancy in reading the lower end of the range of fragment sizes (i.e. the 105 base long fragments) in the two opposite directions (paired-end reads)?

            If this is the case, what should I set the mean inner distance (-r option) and the standard deviation option to when I run the tophat command? I know -r takes an integer, but in my case the mean inner distance is negative since there is an overlap.

            We also know that the median insert size is 170. I hope someone has the answers and can help me out as soon as possible. I would really appreciate it. Thanks.
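
The arithmetic in this post can be written out as a short Python sketch. It assumes the inner distance is the fragment length minus both read lengths, and that the quoted median of 170 is the fragment length without adapters; the function and variable names are hypothetical.

```python
def inner_distance(fragment_len, read_len):
    """Inner distance between mates: fragment length minus both reads.

    A negative result means the two reads overlap.
    """
    return fragment_len - 2 * read_len


read_len = 101
adapter_total = 2 * 60          # two adapters of 60 bp each
library_range = (225, 500)      # size selection, adapters included

# Fragment (insert) range after removing the adapters: (105, 380)
fragment_range = tuple(size - adapter_total for size in library_range)

shortest = inner_distance(fragment_range[0], read_len)  # 105 - 202
longest = inner_distance(fragment_range[1], read_len)   # 380 - 202
at_median = inner_distance(170, read_len)               # 170 - 202

print(fragment_range, shortest, longest, at_median)
```

Under these assumptions the inner distance runs from -97 to 178, and a 170-base fragment gives -32, so the reads overlap for any fragment shorter than 202 bases.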



            • #7
              TopHat will accept a negative value.

              We calculate it from the sequencing data by aligning around 1 million reads to a transcriptome reference, then using Picard to get the actual library distribution metrics, and we feed those values to TopHat. Check my intro thread or the code on our website (www.keatslab.org).
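
A rough illustration of the idea, not the Keats lab code and not Picard itself: the sketch below assumes the subset of reads has already been aligned to a transcriptome and is available as SAM text. It reads the TLEN column (field 9) of properly paired records and derives a mean inner distance and a standard deviation. The function name and the three example pairs are made up.

```python
import statistics


def estimate_inner_distance(sam_lines, read_len):
    """Estimate mean inner distance and fragment-length std dev from SAM text.

    Uses only properly paired records (FLAG bit 0x2) with a positive TLEN,
    so each pair is counted once.
    """
    tlens = []
    for line in sam_lines:
        if line.startswith("@"):        # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        if flag & 0x2 and tlen > 0:
            tlens.append(tlen)
    mean_fragment = statistics.mean(tlens)
    return mean_fragment - 2 * read_len, statistics.stdev(tlens)


# Made-up records for three pairs aligned to a hypothetical transcript "tx1".
example = [
    "@SQ\tSN:tx1\tLN:2000",
    "r1\t99\ttx1\t100\t50\t101M\t=\t159\t160\t*\t*",
    "r1\t147\ttx1\t159\t50\t101M\t=\t100\t-160\t*\t*",
    "r2\t99\ttx1\t300\t50\t101M\t=\t369\t170\t*\t*",
    "r2\t147\ttx1\t369\t50\t101M\t=\t300\t-170\t*\t*",
    "r3\t99\ttx1\t500\t50\t101M\t=\t579\t180\t*\t*",
    "r3\t147\ttx1\t579\t50\t101M\t=\t500\t-180\t*\t*",
]

mean_inner, std_dev = estimate_inner_distance(example, read_len=101)
print(mean_inner, std_dev)
```

Aligning to a transcriptome rather than a genome matters here, because introns would otherwise inflate TLEN. The two resulting numbers correspond to what TopHat takes as -r (mate inner distance) and its mate standard deviation option.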



              • #8
                Since version 1.3.2:

                includes the following fixes and improvements:
                Deprecated -r as a required parameter (defaults to 50)



                • #9
                  Thanks for the link to your website, it looks great, but where can I find this intro thread/script? It makes sense to use a portion of the reads to estimate the inner distance, but what is Picard?

                  Would I be able to estimate both the mean transcript size (from which I subtract my paired-end read lengths) and the variation (standard deviation)?

                  Thanks a lot for your help
