
  • Cufflinks' estimated fragment length mean and standard deviation

    I have a question about the Fragment Length Distribution that Cufflinks (v 0.9.0+) estimates, based on paired end read alignments:

    Is this the length of the fragment, including the two reads; or is it the inner distance between mate pairs, without the reads?

    The value I'm talking about is printed to the screen when Cufflinks runs, e.g.,

    > Read Type: 60bp paired-end
    > Fragment Length Distribution: Empirical (learned)
    > Estimated Mean: 86.94
    > Estimated Std Dev: 13.90

    In the above example, if the value is the fragment length including the reads, then I must be getting a lot of adapter read-through.
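To make the two interpretations concrete, here is a small sketch using the numbers from the Cufflinks output above. It assumes the fragment lengths are roughly normally distributed, which is only an approximation for illustration (Cufflinks reports an empirical distribution), and the function names are hypothetical.

```python
import math

def inner_distance(fragment_len, read_len):
    """Gap between the two mates; a negative value means the mates overlap."""
    return fragment_len - 2 * read_len

def fraction_shorter_than(length, mean, std_dev):
    """Normal CDF at `length`: share of fragments shorter than `length`.

    Assumes a normal distribution, which is an approximation only.
    """
    z = (length - mean) / (std_dev * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))

read_len = 60                  # "Read Type: 60bp paired-end"
mean, std_dev = 86.94, 13.90   # values printed by Cufflinks

gap = inner_distance(mean, read_len)
readthrough = fraction_shorter_than(read_len, mean, std_dev)

print(f"mean inner distance: {gap:.2f} bp")
print(f"fragments shorter than one read: {readthrough:.1%}")
```

If the reported mean is the full fragment length, the mean inner distance is about -33 bp, so the two mates overlap. A read only runs into adapter sequence when the fragment is shorter than the 60 bp read itself, which under the normal approximation is a small share of fragments.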

  • #2
    It is the length of the full fragment, including reads.


    • #3
      Thanks for the info!

      I guess my fragments are a lot shorter than what the Agilent High Sensitivity chip predicts/measures. Sigh.


      • #4
        I am having a similar issue-- both the gel I ran to look at the fragment length and the bioanalyzer result had a peak at 240 bp, with the bioanalyzer result showing a tight distribution centered around 240. Cufflinks estimates the insert length of this sample as 120 bp with a standard deviation of 70 bp. That is not even close! There is literally no signal at all at 120 bp on the bioanalyzer trace.

        If I supply a GTF file, the estimated means are similar (120 vs. 107 bp) but the standard deviations are not (70 vs. 19 bp). Either way, I don't understand why the reported insert length is more than 100 bp shorter than the band I cut out of the gel and then saw on both a gel and a bioanalyzer trace. Any thoughts?
        Last edited by roryk; 10-27-2010, 06:57 AM.


        • #5
          roryk,

          A few questions:

          1. What read mapper did you use?
          2. Does Cufflinks correctly report the read length?
          3. Have you tried using Cufflinks 0.9.2?

          Thanks.


          • #6
            Hi adarob,

            I used bowtie (tophat) to map the reads, an example:

            tophat -p 4 -G ../misc/rat_knowngene.gtf -o /mnt/sc_exp/E_L4 -r 178 rn4 E_L4_1.fq E_L4_2.fq

            178 comes from 250 - 36 * 2: the 250 bp fragment minus two 36 bp reads. rat_knowngene.gtf is from the UCSC Known Genes track.

            Cufflinks does correctly report the read length. I get the same result using Cufflinks 0.9.1 and 0.9.2.
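The `-r` value passed to tophat above is the expected inner distance between mates, i.e. the fragment length minus both reads. A minimal sketch of that arithmetic (the helper name is hypothetical):

```python
def mate_inner_distance(fragment_len, read_len):
    """Expected distance between the inner ends of the two mates."""
    return fragment_len - 2 * read_len

# 250 bp fragment, two 36 bp reads, as in the tophat command above
r = mate_inner_distance(250, 36)
print(r)  # 178
```

The result is only as good as the fragment length fed in: the value must be the length of the sequenced insert alone, without any adapter sequence.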


            • #7
              roryk,

              Are you compiling cufflinks from source? There is a small change you can make to the source code to have it output the empirical distribution. Otherwise, would you be willing to make your bam file available for me to help resolve this?

              -Adam
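For an independent cross-check without touching the Cufflinks source, the observed template lengths can be read from the alignments themselves. The sketch below is not how Cufflinks computes its estimate; it simply averages the TLEN column (field 9) of properly paired SAM records, for example from `samtools view` output. The function name, the `max_len` cutoff, and the example records are made up for illustration. For spliced RNA-Seq alignments TLEN spans introns, so the cutoff is only a crude filter.

```python
import statistics

PROPER_PAIR = 0x2  # SAM flag bit: each segment properly aligned

def fragment_lengths(sam_lines, max_len=1000):
    """Collect TLEN from properly paired records, once per pair.

    Only positive TLEN values are kept, so each pair is counted
    once (via its leftmost mate). `max_len` is an arbitrary cap
    to drop pairs that span introns.
    """
    lengths = []
    for line in sam_lines:
        if line.startswith("@"):      # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:          # not a complete SAM record
            continue
        flag = int(fields[1])
        tlen = int(fields[8])
        if flag & PROPER_PAIR and 0 < tlen <= max_len:
            lengths.append(tlen)
    return lengths

# Made-up records for illustration only
example = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t99\tchr1\t100\t50\t36M\t=\t184\t120\t*\t*",
    "r1\t147\tchr1\t184\t50\t36M\t=\t100\t-120\t*\t*",
    "r2\t99\tchr1\t500\t50\t36M\t=\t594\t130\t*\t*",
    "r2\t147\tchr1\t594\t50\t36M\t=\t500\t-130\t*\t*",
]

lengths = fragment_lengths(example)
print("mean:", statistics.mean(lengths))
print("std dev:", statistics.stdev(lengths))
```

On a real file the lines would come from something like `samtools view accepted_hits.bam`, where the file name is a placeholder.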


              • #8
                I have just been using the precompiled binaries for Linux but am not averse to compiling it from source. I emailed you a link to the BAM file.


                • #9
                  Originally posted by roryk
                  I am having a similar issue-- both the gel I ran to look at the fragment length and the bioanalyzer result had a peak at 240 bp, with the bioanalyzer result showing a tight distribution centered around 240. Cufflinks estimates the insert length of this sample as 120 bp with a standard deviation of 70 bp. That is not even close! There is literally no signal at all at 120 bp on the bioanalyzer trace.

                  If I supply a GTF file, the estimated means are similar (120 vs. 107 bp) but the standard deviations are not (70 vs. 19 bp). Either way, I don't understand why the reported insert length is more than 100 bp shorter than the band I cut out of the gel and then saw on both a gel and a bioanalyzer trace. Any thoughts?
                  Just to be absolutely clear, was the sample which you measured as 240 bp before or after ligating the Illumina adapters? The combined length of the Illumina RNA-Seq (or PE) adapters is 119 bp. If your 240 bp includes these, then the estimate from Cufflinks is spot on, as it is estimating the size of the insert only.


                  • #10
                    Originally posted by kmcarr
                    Just to be absolutely clear, was the sample which you measured as 240 bp before or after ligating the Illumina adapters? The combined length of the Illumina RNA-Seq (or PE) adapters is 119 bp. If your 240 bp includes these, then the estimate from Cufflinks is spot on, as it is estimating the size of the insert only.
                    Yup, this is exactly right; I thought the combined length of the PE adaptors was half what it is. Thanks!
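The reconciliation can be written out as a short calculation. This is only a worked version of the arithmetic in this thread: the 119 bp adapter total is the figure quoted above, and the helper names are hypothetical.

```python
ADAPTER_TOTAL = 119  # combined PE adapter length quoted in this thread

def insert_length(library_len, adapter_total=ADAPTER_TOTAL):
    """Insert size once the ligated adapters are subtracted."""
    return library_len - adapter_total

def mate_inner_distance(insert_len, read_len):
    """Expected gap between the mates, as used for tophat -r."""
    return insert_len - 2 * read_len

insert = insert_length(240)              # bioanalyzer peak at 240 bp
inner = mate_inner_distance(insert, 36)  # two 36 bp reads

print(insert)  # 121
print(inner)   # 49
```

The 121 bp insert is in line with the roughly 120 bp that Cufflinks estimated, and under the same assumptions the inner distance would be 49 rather than 178.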


                    • #11
                      Glad to see the problem was resolved.


                      • #12
                        Thanks, kmcarr!

                        I had a similar problem: the fragment length estimated by Cufflinks completely disagreed with that measured by the Agilent High Sensitivity Chip. But after subtracting 119 bp from the Agilent measurement, the two agree. Another discrepancy resolved, thank goodness.


                        • #13
                          It is helpful to read this old thread.

                          I am new to Illumina RNA-Seq data. Our lab used SOLiD/Torrent/Proton in the past, but we will use the Illumina platform for future RNA-Seq projects with large sample sizes. To get familiar with Illumina data, I am looking at an unstranded RNA-Seq dataset generated by another lab using the TruSeq RNA Sample Prep Kit v2 protocol. My question related to this thread: is the combined length of the adapters always 119 bp for all Illumina protocols, including the stranded protocol?

                          Thanks!
