  • Cufflinks' estimated fragment length mean and standard deviation

    I have a question about the Fragment Length Distribution that Cufflinks (v 0.9.0+) estimates from paired-end read alignments:

    Is this the length of the whole fragment, including the two reads, or is it the inner distance between the mates, excluding the reads?

    The value I'm talking about is printed to the screen when Cufflinks runs, e.g.,

    > Read Type: 60bp paired-end
    > Fragment Length Distribution: Empirical (learned)
    > Estimated Mean: 86.94
    > Estimated Std Dev: 13.90

    In the above example, if the value is the fragment length including the reads, then I must have a lot of adapter read-through.

  • #2
    It is the length of the full fragment, including reads.
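
To make the distinction concrete, here is a minimal sketch of the arithmetic, using the numbers from the first post (60 bp reads, estimated mean 86.94). The function names are hypothetical and are not part of Cufflinks; the sketch only assumes that the fragment length covers both reads plus whatever lies between them.

```python
def inner_distance(fragment_len, read_len):
    """Distance between the two mates; a negative value means the reads overlap."""
    return fragment_len - 2 * read_len


def reads_into_adapter(fragment_len, read_len):
    """True when a read is longer than the fragment, so its 3' end runs into adapter."""
    return read_len > fragment_len


# Values reported by Cufflinks in the first post.
mean_fragment = 86.94
read_len = 60

gap = inner_distance(mean_fragment, read_len)
print(f"inner distance at the mean: {gap:.2f} bp")
print("read-through at the mean:", reads_into_adapter(mean_fragment, read_len))
```

At the estimated mean the two 60 bp reads overlap by about 33 bp, but a read only runs into adapter when the fragment is shorter than the read itself (here, under 60 bp).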


    • #3
      Thanks for the info!

      I guess my fragments are a lot shorter than what the Agilent High Sensitivity chip predicts/measures. Sigh.


      • #4
        I am having a similar issue-- both the gel I ran to look at the fragment length and the Bioanalyzer result had a peak at 240 bp, with the Bioanalyzer result showing a tight distribution centered around 240. Cufflinks estimates the insert length of this sample as 120 bp with a standard deviation of 70 bp. That is not even close! There is literally no signal at all at 120 bp on the Bioanalyzer trace.

        If I supply a GTF file, the estimated distributions are similar in mean (120 vs. 107) but not in std. dev. (70 vs. 19). Either way, I do not understand why the reported insert length is more than 100 bp shorter than the fragment I cut out of the gel and saw on both the gel and the Bioanalyzer trace. Any thoughts?
        Last edited by roryk; 10-27-2010, 06:57 AM.


        • #5
          roryk,

          A few questions:

          1. What read mapper did you use?
          2. Does Cufflinks correctly report the read length?
          3. Have you tried using Cufflinks 0.9.2?

          Thanks.


          • #6
            Hi adarob,

            I used bowtie (tophat) to map the reads, an example:

            tophat -p 4 -G ../misc/rat_knowngene.gtf -o /mnt/sc_exp/E_L4 -r 178 rn4 E_L4_1.fq E_L4_2.fq

            178 comes from 250 - 36 * 2, i.e., a 250 bp fragment minus two 36 bp reads. rat_knowngene.gtf is from the UCSC knownGene annotation.

            Cufflinks does correctly report the read length. I get the same result using Cufflinks 0.9.1 and 0.9.2.


            • #7
              roryk,

              Are you compiling Cufflinks from source? There is a small change you can make to the source code to have it output the empirical distribution. Otherwise, would you be willing to make your BAM file available so I can help resolve this?
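
As a rough independent check that does not require touching the Cufflinks source, the observed template lengths can be summarized straight from the alignments. The sketch below is not the source change mentioned above and will not reproduce the Cufflinks estimate exactly: it reads the TLEN column of SAM text, which is measured in genomic coordinates, so pairs that span an intron look far too long. The `max_len` cutoff used to drop those pairs is an arbitrary assumption, and the function names are hypothetical.

```python
import statistics


def template_lengths(sam_lines, max_len=1000):
    """Collect |TLEN| from properly paired, first-in-pair SAM records."""
    lengths = []
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        tlen = abs(int(fields[8]))        # column 9 of the SAM format is TLEN
        is_proper_pair = bool(flag & 0x2)
        is_first_mate = bool(flag & 0x40)  # count each pair once
        if is_proper_pair and is_first_mate and 0 < tlen <= max_len:
            lengths.append(tlen)
    return lengths


def summarize(lengths):
    """Return (mean, population standard deviation) of the collected lengths."""
    return statistics.mean(lengths), statistics.pstdev(lengths)
```

One way to use it is to pipe `samtools view` output into a small script that calls `template_lengths(sys.stdin)` and then `summarize(...)` on the result.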

              -Adam


              • #8
                I have just been using the precompiled binaries for Linux but am not averse to compiling it from source. I emailed you a link to the BAM file.


                • #9
                  Just to be absolutely clear, was the sample which you measured as 240 bp before or after ligating the Illumina adapters? The combined length of the Illumina RNA-Seq (or PE) adapters is 119 bp. If your 240 bp includes these, then the estimate from Cufflinks is spot on, as it is estimating the size of the insert only.
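
The arithmetic behind this can be sketched as follows. The 119 bp adapter total and the 240 bp peak are the figures quoted in this thread, not values checked against any kit documentation, and the function name is hypothetical.

```python
ADAPTER_TOTAL_BP = 119  # combined adapter length as quoted in this thread


def insert_size(library_size_bp, adapter_bp=ADAPTER_TOTAL_BP):
    """Insert length once the ligated adapters are subtracted from the library size."""
    return library_size_bp - adapter_bp


library_peak = 240   # gel / Bioanalyzer peak reported above
read_len = 36        # read length reported above

insert = insert_size(library_peak)
inner = insert - 2 * read_len   # inner distance between the two mates

print(f"expected insert: {insert} bp")
print(f"expected mate inner distance: {inner} bp")
```

Under these assumptions the expected insert is 121 bp, in line with the roughly 120 bp Cufflinks reported, and the matching inner distance would be 49 bp rather than the 178 passed to TopHat with -r earlier in the thread.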


                  • #10
                    Yup, this is exactly right; I thought the combined length of the PE adaptors was half what it is. Thanks!


                    • #11
                      Glad to see the problem was resolved.


                      • #12
                        Thanks, kmcarr!

                        I had a similar problem: the fragment length estimated by Cufflinks completely disagreed with that measured by the Agilent High Sensitivity chip. After subtracting 119 bp from the Agilent measurement, the two now agree. Another discrepancy resolved, thank goodness.


                        • #13
                          It is helpful to read this old thread.

                          I am new to Illumina RNA-Seq data. Our lab used SOLiD/Torrent/Proton in the past, but we will use the Illumina platform for future RNA-Seq projects with large sample sizes. To get familiar with Illumina data, I am looking at an unstranded RNA-Seq dataset generated by another lab using the TruSeq RNA sample prep kit v2 protocol. My question related to this thread: is the combined length of the adapters always 119 bp for all Illumina protocols, including the stranded protocol?

                          Thanks!
