
  • 50 bp paired end reads vs. 100 bp single end reads

    In an RNA seq experiment in which the goal is simply to count transcripts (vs., e.g., finding evidence for translocations or novel splice variants), is it better to have 100 base pair single end reads or 50 base pair paired end reads? (100 base pair single end reads seem better to me.)

    Thanks.

    Eric

  • #2
    50 base-pair paired-end reads span a longer region of the transcript. Each read pair represents the two ends of a ~200-300 base-pair RNA fragment, whereas a 100 base-pair single-end read only gives you information about 100 bases. A longer span means you are more likely to cross a splice junction, insertion, or deletion. Therefore 50 bp paired-end is preferable. Remember that with next-gen sequencing technology, your read is for the most part only a "tag" that tells you where in the genome the fragment originated. As long as the "tag" is long enough to be unique (and 50 bp mostly is), you are set.
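To make the span argument concrete, here is a minimal sketch. It assumes a hypothetical 2,000-base transcript with one exon-exon junction at position 1,000, a 250 bp fragment (inside the ~200-300 range above), and fragment start positions spread uniformly along the transcript. It counts how many start positions give an observation that crosses the junction. For the pair, the junction may fall in the unsequenced gap between the mates, so it is bridged rather than read base by base.

```python
def sequenced_bases(read_len: int, paired: bool) -> int:
    """Bases actually read from one fragment."""
    return 2 * read_len if paired else read_len


def observed_span(read_len: int, paired: bool, fragment_len: int) -> int:
    """Transcript bases between the outermost sequenced ends.

    A single-end read only covers its own bases; a pair also brackets
    the unsequenced gap between the two mates.
    """
    return fragment_len if paired else read_len


def junction_spanning_starts(span: int, transcript_len: int, junction: int) -> int:
    """Count 0-based start positions s whose interval [s, s + span) crosses
    the boundary between transcript positions junction - 1 and junction."""
    lo = max(0, junction - span + 1)                # interval must reach `junction`
    hi = min(junction - 1, transcript_len - span)   # must start before it and fit
    return max(0, hi - lo + 1)


if __name__ == "__main__":
    transcript_len, junction, fragment_len = 2000, 1000, 250  # hypothetical values
    for label, read_len, paired in [("SE 100", 100, False), ("PE 50", 50, True)]:
        span = observed_span(read_len, paired, fragment_len)
        print(label, sequenced_bases(read_len, paired), span,
              junction_spanning_starts(span, transcript_len, junction))
```

Both layouts read 100 bases per fragment, but under these assumptions the pair crosses the junction from 249 start positions versus 99 for the single-end read.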



    • #3
      An important application of paired reads is that both reads of a pair are supposed to come from the same gene, so paired-end reads make it easy to mark approximate gene boundaries.
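A minimal sketch of that idea, assuming you already have the genomic span of each properly paired fragment (outer end of one mate to outer end of the other) on a single chromosome and strand. Overlapping spans are merged into clusters whose edges approximate gene boundaries; `max_gap` is a hypothetical tuning parameter, and real gene-model building needs far more than this.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) on one chromosome/strand, 0-based, end-exclusive


def merge_fragments(fragments: List[Span], max_gap: int = 0) -> List[Span]:
    """Merge paired-end fragment spans into clusters.

    Each span runs from the outer end of one mate to the outer end of the
    other, so a pair whose mates land in two exons also covers the intron
    between them. Spans that overlap, or are separated by at most
    `max_gap` bases, end up in the same cluster.
    """
    clusters: List[Span] = []
    for start, end in sorted(fragments):
        if clusters and start <= clusters[-1][1] + max_gap:
            first_start, last_end = clusters[-1]
            clusters[-1] = (first_start, max(last_end, end))
        else:
            clusters.append((start, end))
    return clusters


if __name__ == "__main__":
    # Hypothetical fragment spans from properly paired reads.
    pairs = [(100, 350), (300, 560), (900, 1150), (1140, 1400), (5000, 5250)]
    print(merge_fragments(pairs))               # [(100, 560), (900, 1400), (5000, 5250)]
    print(merge_fragments(pairs, max_gap=400))  # [(100, 1400), (5000, 5250)]
```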



      • #4
        Hi Eric,

        Is your genome well-annotated? If not, and you plan to build gene models, then PE-50bp would be preferable.



        • #5
          Originally posted by HESmith:
              Hi Eric,

              Is your genome well-annotated? If not, and you plan to build gene models, then PE-50bp would be preferable.

          Yes, the genomes are well annotated. I'm working with human and yeast.

          Eric



          • #6
            In that case, single end 50bp should be fine for obtaining gene counts.



            • #7
              Also, depending on the software you use, the RNA-seq module may not be able to use PE reads, at least not without some extra work. Yes, CASAVA, I'm looking at you!

              I'll agree that for well annotated genomes 50 bp SE should be satisfactory.



              • #8
                If you're sequencing an equal number of base pairs, I vote for paired end reads. I agree with dcfactor that there won't be much difference between estimates from 50 bp and 100 bp read data if you're counting the alignment hits per gene. You get more data for your sequencing buck because each aligned pair gives you information on not only the sequences covered by the reads but the region between them as well.

                50 bp of sequence in a mate pair can be more useful for read mapping than an extra 50 bp in the read itself. If you don't find what you're looking for in the gene-level expression patterns, having paired-end data leaves more avenues open for other analyses.
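One way the mate helps mapping can be sketched as follows, assuming a hypothetical 50 bp read that aligns equally well at three positions on the same chromosome as its uniquely placed mate, and an expected fragment length of 250 ± 50 bp. Strand and orientation checks, which real aligners also apply, are left out, and the function names are illustrative rather than any aligner's API.

```python
from typing import List, Optional


def implied_fragment_len(pos_a: int, pos_b: int, read_len: int) -> int:
    """Outer distance between two mates, given their leftmost positions and
    an equal read length (strand/orientation ignored)."""
    return max(pos_a, pos_b) + read_len - min(pos_a, pos_b)


def rescue_with_mate(candidates: List[int], mate_pos: int, read_len: int,
                     expected_len: int, tolerance: int) -> Optional[int]:
    """Choose among equally good hits for one read using its uniquely placed mate.

    Returns the single candidate whose implied fragment length lies within
    expected_len +/- tolerance, or None if no candidate or more than one fits.
    """
    consistent = [
        pos for pos in candidates
        if abs(implied_fragment_len(pos, mate_pos, read_len) - expected_len) <= tolerance
    ]
    return consistent[0] if len(consistent) == 1 else None


if __name__ == "__main__":
    # Hypothetical 50 bp read with three equally good hits; its mate maps uniquely.
    hits = [12_000, 480_300, 1_250_700]
    print(rescue_with_mate(hits, mate_pos=480_500, read_len=50,
                           expected_len=250, tolerance=50))  # 480300
```

A longer single read might or might not resolve the repeat on its own; the mate resolves it whenever only one candidate is compatible with the fragment length.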




                • #9
                  Thanks very much everyone. I'll be going with 50 bp paired reads.

                  Eric



                  • #10
                    If you want to look for differential expression, sequencing depth is usually the most limiting factor. Hence, I would go for 50bp single-end and invest the money you saved into sequencing another lane (ideally, with a biological replicate).
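Why depth matters for counting: under the common simplifying assumption that a gene's count in one library is Poisson-distributed, its coefficient of variation from sampling alone is 1/sqrt(mean count). The sketch assumes a hypothetical low-expression gene receiving 1 of every million reads and illustrative depths of 20 M and 40 M reads; it ignores biological variability between samples entirely.

```python
import math


def poisson_cv(fraction_of_reads: float, total_reads: float) -> float:
    """CV of a gene's read count from sampling noise alone.

    Assumes the count is Poisson with mean fraction_of_reads * total_reads,
    so its standard deviation is sqrt(mean) and CV = 1 / sqrt(mean).
    """
    mean = fraction_of_reads * total_reads
    return 1.0 / math.sqrt(mean)


if __name__ == "__main__":
    fraction = 1e-6  # hypothetical gene: 1 read per million
    for total in (20e6, 40e6):  # illustrative depths: one lane vs. two
        print(f"{total / 1e6:.0f} M reads: CV = {poisson_cv(fraction, total):.3f}")
```

Doubling the depth shrinks the sampling CV by a factor of sqrt(2) (here from about 0.22 to about 0.16), which matters most for lowly expressed genes.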

                    Longer reads are useful to see where splice sites are located, but for humans, we already know that quite well, and for yeast, there is hardly any splicing anyway. Long reads don't help much for mapping, because the transcribed part of the genome is usually not that repetitive, and 50 bp is usually long enough to distinguish even most paralogs.
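The "long enough to be unique" point can be illustrated with a toy uniqueness count: the fraction of length-k windows whose exact sequence occurs only once. This is a simplified stand-in for real mappability, assuming a single hypothetical sequence in place of the genome, forward strand only, and exact matches only.

```python
from collections import Counter


def unique_fraction(sequence: str, k: int) -> float:
    """Fraction of length-k windows whose exact sequence occurs only once.

    Simplifications: forward strand only, exact matches only, and a
    single input sequence standing in for the whole genome.
    Requires 1 <= k <= len(sequence).
    """
    windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(windows)
    return sum(1 for w in windows if counts[w] == 1) / len(windows)


if __name__ == "__main__":
    toy = "ACGTACGTTT"  # hypothetical sequence containing a short repeat
    for k in (2, 4, 5):
        print(k, round(unique_fraction(toy, k), 3))
```

On this toy sequence the unique fraction rises from about 0.11 at k = 2 to 1.0 at k = 5: once a window is longer than the repeat, every position becomes unique, which is the same reason extra read length stops paying off once reads are already unique.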

                    Paired-end reads may or may not help to distinguish isoforms of the same gene, but, at least for yeast, this is unimportant, of course.



                    • #11
                      I'll second Simon's advice. You'll get very little additional information by paired-end sequencing (since, in essence, you're just counting each gene twice). Biological replicates provide much more information. In fact, I'd recommend triplicates at a minimum to provide statistical power to your analysis. If you index the samples, they can be sequenced in the same lane; the only additional cost would be preparing separate libraries (~$50 each for Illumina). Three biological replicates of 20 million reads each are much better than a single sample of 60 million.
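The replicates-versus-depth trade-off can be sketched with the negative binomial model commonly used for RNA-seq counts, where variance = mean + dispersion × mean². The dispersion of 0.1 and the mean of 100 reads per 20 M-read library are hypothetical values chosen only for illustration; replicates are assumed independent.

```python
import math


def cv_of_mean(mean_count: float, dispersion: float, n_replicates: int) -> float:
    """Coefficient of variation of a gene's average count over n replicates.

    Assumes each replicate's count is negative binomial with mean
    `mean_count` and variance mean + dispersion * mean**2, and that
    replicates are independent.
    """
    per_sample_cv2 = 1.0 / mean_count + dispersion  # Poisson term + biological term
    return math.sqrt(per_sample_cv2 / n_replicates)


if __name__ == "__main__":
    dispersion = 0.1  # hypothetical biological dispersion
    # Hypothetical gene averaging 100 reads per 20 M-read library.
    three_reps = cv_of_mean(100, dispersion, 3)  # 3 x 20 M reads
    one_deep = cv_of_mean(300, dispersion, 1)    # 1 x 60 M reads
    print(f"3 x 20M: CV = {three_reps:.2f}")
    print(f"1 x 60M: CV = {one_deep:.2f}")
```

Extra depth only shrinks the 1/mean term, while the biological term stays put unless you add replicates (roughly 0.19 versus 0.32 here). A single sample per condition also gives no way to estimate the dispersion at all.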



                      • #12
                        @Simon and HESmith

                        Does your recommendation apply to poorly annotated genomes as well?



                        • #13
                          I am also very confused about why 50 bp SE gives a higher alignment rate than 100 bp PE. I ran the experiment myself using 100 bp PE reads and compared them to the same reads trimmed to 50 bp SE.
                          Thank you
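The trimming step described above can be sketched as follows, assuming plain 4-line FASTQ records (no line wrapping) and that only the read-1 file is kept to form the single-end set. The alignment rates of the two inputs are only comparable if the aligner settings (for example, the allowed number of mismatches) are the same.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, discarded
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def trim_fastq(in_handle: TextIO, out_handle: TextIO, length: int = 50) -> None:
    """Keep the first `length` bases (and matching qualities) of every read."""
    for header, seq, qual in read_fastq(in_handle):
        out_handle.write(f"{header}\n{seq[:length]}\n+\n{qual[:length]}\n")


if __name__ == "__main__":
    # Hypothetical single 100 bp read standing in for the read-1 file of a PE run.
    src = io.StringIO("@read1/1\n" + "ACGT" * 25 + "\n+\n" + "I" * 100 + "\n")
    out = io.StringIO()
    trim_fastq(src, out, length=50)
    print(out.getvalue(), end="")
```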

