  • 50 bp paired end reads vs. 100 bp single end reads

    In an RNA seq experiment in which the goal is simply to count transcripts (vs., e.g., finding evidence for translocations or novel splice variants), is it better to have 100 base pair single end reads or 50 base pair paired end reads? (100 base pair single end reads seem better to me.)

    Thanks.

    Eric

  • #2
    50 base-pair paired-end reads span a longer region of the transcript. Each read represents one end of a ~200-300 base-pair RNA fragment, compared to a 100 base-pair read which only gives you information about 100 bases. A larger fragment means you are more likely to span a splice junction, insertion, or deletion. Therefore 50 bp is preferable. Remember that with next gen sequencing technology, for the most part your read is only a "tag" that tells you where in the genome the fragment originated. As long as the "tag" is long enough to be unique (and 50 bp is for the most part) you are set.
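To make the span argument concrete, here is a minimal Python sketch. It assumes a hypothetical 2,000-base transcript with a single splice junction in the middle and a 250-base fragment (inside the ~200-300 range mentioned above), then counts how many start positions make a 100-base read or a whole fragment cross that junction. The function name and all numbers are illustrative, not from the thread. A fragment that crosses a junction only brackets it; the bases between the two 50 bp reads are not sequenced.

```python
def crossing_starts(transcript_len, span_len, junction):
    """Count start positions where [start, start + span_len) crosses a junction.

    `junction` is the boundary between base junction-1 and base junction
    (0-based), so a span crosses it when start < junction < start + span_len.
    """
    total_starts = transcript_len - span_len + 1
    crossing = sum(
        1
        for start in range(total_starts)
        if start < junction < start + span_len
    )
    return crossing, total_starts


# Hypothetical transcript: 2,000 bases, one junction at position 1,000.
TRANSCRIPT_LEN = 2000
JUNCTION = 1000

# A 100 bp single-end read versus the 250 bp fragment bracketed by a 2 x 50 bp pair.
se_cross, se_total = crossing_starts(TRANSCRIPT_LEN, 100, JUNCTION)
pe_cross, pe_total = crossing_starts(TRANSCRIPT_LEN, 250, JUNCTION)

print(f"100 bp single read: {se_cross}/{se_total} start positions cross the junction")
print(f"250 bp fragment:    {pe_cross}/{pe_total} start positions cross the junction")
```

Under these assumptions the fragment crosses the junction from roughly two and a half times as many start positions as the single read, which is the sense in which a pair "spans a longer region".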

    • #3
      An important property of paired-end reads is that both reads of a pair are supposed to come from the same gene. Paired-end reads can therefore be used to mark approximate gene boundaries.

      • #4
        Hi Eric,

        Is your genome well-annotated? If not, and you plan to build gene models, then PE-50bp would be preferable.

        • #5
          Originally posted by HESmith
          Hi Eric,

          Is your genome well-annotated? If not, and you plan to build gene models, then PE-50bp would be preferable.

          Yes, the genomes are well annotated. I'm working with human and yeast.

          Eric

          • #6
            In that case, single end 50bp should be fine for obtaining gene counts.

            • #7
              Also, depending on the software you use, the RNA-seq module may not be able to use PE reads, at least not without some extra work. Yes, CASAVA, I'm looking at you!

              I'll agree that for well annotated genomes 50 bp SE should be satisfactory.

              • #8
                If you're sequencing an equal number of base pairs, I vote for paired end reads. I agree with dcfactor that there won't be much difference between estimates from 50 bp and 100 bp read data if you're counting the alignment hits per gene. You get more data for your sequencing buck because each aligned pair gives you information on not only the sequences covered by the reads but the region between them as well.

                50 bp of sequence in a mate pair can be more useful for read mapping than an extra 50 bp in the read itself. If you don't find what you're looking for in the gene-level expression patterns, having paired-end data leaves more avenues open for other analyses.

                • #9
                  Thanks very much everyone. I'll be going with 50 bp paired reads.

                  Eric

                  • #10
                    If you want to look for differential expression, sequencing depth is usually the most limiting factor. Hence, I would go for 50bp single-end and invest the money you saved into sequencing another lane (ideally, with a biological replicate).

                    Longer reads are useful to see where splice sites are located, but for humans, we already know that quite well, and for yeast, there is hardly any splicing, anyway. Long reads don't help much for mapping, because the transcribed part of the genome is usually not that repetitive, and 50 bp is usually long enough to even distinguish most orthologs.

                    Paired-end reads may or may not help to distinguish isoforms of the same gene, but, at least for yeast, this is unimportant, of course.

                    • #11
                      I'll second Simon's advice. You'll get very little additional information by paired-end sequencing (since, in essence, you're just counting each gene twice). Biological replicates provide much more information. In fact, I'd recommend triplicates at a minimum to provide statistical power to your analysis. If you index the samples, they can be sequenced in the same lane; the only additional cost would be preparing separate libraries (~$50 each for Illumina). Three biological replicates of 20 million reads each are much better than a single sample of 60 million.
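The arithmetic behind that suggestion can be sketched in a few lines of Python. This assumes reads divide evenly across the indexed samples in a lane (real lanes only approximate this) and uses the ~$50 per library figure quoted above; the function name `split_lane` is made up for illustration.

```python
def split_lane(total_reads, n_samples, library_cost=50):
    """Reads per sample and extra library cost when multiplexing one lane.

    Assumes an even split of reads across indexed samples. `library_cost`
    is the approximate per-library price quoted in the post, in dollars.
    """
    reads_per_sample = total_reads // n_samples
    # The first library is needed anyway; only the additional ones cost extra.
    extra_cost = (n_samples - 1) * library_cost
    return reads_per_sample, extra_cost


reads, cost = split_lane(60_000_000, 3)
print(f"{reads:,} reads per replicate, about ${cost} extra for libraries")
```

So one lane of 60 million reads becomes three replicates of 20 million each, with only the two additional library preps as extra cost.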

                      • #12
                        @Simon and HESmith

                        Does your recommendation apply to poorly annotated genomes as well?

                        • #13
                          I am also very confused as to why 50 bp SE gives a higher alignment rate than 100 bp PE. I ran the experiment myself using 100 bp PE reads and compared the result to trimmed 50 bp SE reads.
                          Thank you
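The post does not say how the reads were trimmed. For anyone wanting to repeat the comparison, one possible way to produce a trimmed read set is sketched below in Python. It assumes plain four-line FASTQ records with no line wrapping and simply keeps the first 50 bases of each read; the function name `trim_fastq` and the example record are hypothetical.

```python
def trim_fastq(lines, keep=50):
    """Yield FASTQ lines with sequence and quality cut to the first `keep` bases.

    Assumes four-line records (header, sequence, '+', quality) with no
    line wrapping. Header and separator lines pass through unchanged.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i % 4 in (1, 3):  # sequence line and quality line
            line = line[:keep]
        yield line


# Hypothetical 100-base read with a uniform quality string.
record = ["@read1", "ACGT" * 25, "+", "I" * 100]
trimmed = list(trim_fastq(record))
print("\n".join(trimmed))
```

Trimming the sequence and the quality string to the same length keeps the record valid for downstream aligners.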
