  • Cufflinks did not assemble a marker gene! Any solution?

    Hi,

    I have HiSeq directional paired-end data. I used the standard pipeline from the Cufflinks authors' "Nature Protocols" paper to assemble transcripts and compare them with the annotated genes. Strangely, Cufflinks did not assemble a highly expressed marker gene (one that is well known in my cell type). Has anyone else run into a similar problem with Cufflinks? Is this a bug in Cufflinks, or a problem with how I ran Cufflinks / its parameters?

    Many thanks!

  • #2
    Cufflinks may have a problem if your data have a 5' or 3' bias.
    Did you supply the inner mate-pair distance as a parameter to Cufflinks? You can derive it from the library QC plots (Bioanalyzer, TapeStation, etc.).
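Deriving the inner distance from a QC trace comes down to a weighted mean and standard deviation followed by two subtractions. The sketch below assumes the trace has been exported as (size_bp, abundance) pairs, that the measured sizes include the sequencing adapters, and that the adapter length is known from your library prep kit; the function names and toy numbers are hypothetical. For reference, TopHat's -r/--mate-inner-dist and --mate-std-dev options take the inner distance.

```python
"""Estimate fragment length and inner mate distance from a library size trace.

A minimal sketch: assumes the QC trace (Bioanalyzer, TapeStation, ...) was
exported as (size_bp, abundance) pairs and that the measured sizes include
the sequencing adapters. All names here are hypothetical.
"""
import math


def weighted_mean_sd(sizes, weights):
    """Return the abundance-weighted mean and standard deviation of sizes."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    mean = sum(s * w for s, w in zip(sizes, weights)) / total
    var = sum(w * (s - mean) ** 2 for s, w in zip(sizes, weights)) / total
    return mean, math.sqrt(var)


def library_to_insert_stats(sizes, weights, adapter_len, read_len):
    """Convert library sizes to fragment length and inner mate distance stats.

    fragment (insert) length = library size - total adapter length
    inner mate distance      = fragment length - 2 * read length
    Subtracting constants shifts the mean but leaves the SD unchanged.
    """
    lib_mean, lib_sd = weighted_mean_sd(sizes, weights)
    frag_mean = lib_mean - adapter_len
    inner_mean = frag_mean - 2 * read_len
    return {"frag_mean": frag_mean, "inner_mean": inner_mean, "sd": lib_sd}


if __name__ == "__main__":
    # Toy trace, not real data: a symmetric peak around 400 bp.
    sizes = [300, 350, 400, 450, 500]
    weights = [1, 4, 10, 4, 1]
    print(library_to_insert_stats(sizes, weights, adapter_len=120, read_len=100))
```

With the toy trace the library mean is 400 bp, giving a 280 bp fragment and an 80 bp inner distance for 2 x 100 bp reads.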

    The easiest sanity check is to view your data in a genome browser (e.g. IGV) and have a look at your marker gene.

    Cheers,
    Michael


    • #3
      Thank you, Michael. Yes, IGV shows that the marker gene is highly expressed, with over 4000 reads. For now, my guess is that the problem lies in Cufflinks' default configuration. Cufflinks, Cuffdiff, etc. have a maximum number of fragments that can fall within a locus; if a locus has more than this maximum, it is skipped. The threshold is configurable via the --max-bundle-frags option.
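The cap described here behaves like a simple partition of loci by fragment count. The sketch below is a simplified model, not Cufflinks' code: it assumes each locus (bundle) is summarised by a single fragment count and that loci strictly above the cap are set aside; the locus names and counts are hypothetical. The Cufflinks manual notes that skipped loci are listed in skipped.gtf, which is the first place to look for the missing gene.

```python
"""Illustrate a per-locus fragment cap like Cufflinks' --max-bundle-frags.

Simplified model, not Cufflinks' implementation: each locus (bundle) is
summarised by its fragment count, and any locus whose count exceeds the cap
is set aside instead of being assembled and quantified.
"""


def split_by_bundle_cap(bundle_counts, max_bundle_frags):
    """Partition loci into (processed, skipped) dicts by a fragment-count cap."""
    processed, skipped = {}, {}
    for locus, n_frags in bundle_counts.items():
        target = skipped if n_frags > max_bundle_frags else processed
        target[locus] = n_frags
    return processed, skipped


if __name__ == "__main__":
    # Hypothetical loci and counts, for illustration only.
    counts = {"chr1:1000-9000": 12_000, "chr2:5000-60000": 1_500_000}
    for cap in (1_000_000, 10_000_000):
        kept, dropped = split_by_bundle_cap(counts, cap)
        print(cap, sorted(dropped))
```

Raising the cap from 1,000,000 to 10,000,000 moves the hypothetical 1.5-million-fragment locus from the skipped set back into the processed set.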

      I will check whether the gene is picked up after increasing --max-bundle-frags.

      Chan


      • #4
        Hi Chan,

        I fear that this is not the crucial point. By default, the max-bundle-frags parameter of Cufflinks and Cuffdiff is set to 1,000,000 fragments per locus.

        Here are a few checks you can make to pinpoint the problem:
        Compare Cufflinks' estimated inner mate-pair distance from the log files with the library size distribution. Note that you need to add the length of both reads and the adapter length to the inner mate-pair distance.
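The comparison in this step is a single addition followed by a tolerance check. This sketch assumes you have read the estimate off the log by hand and know your Bioanalyzer peak; if the value in your log is a full fragment length rather than an inner distance, drop the two read lengths. The 10% tolerance, function names, and example numbers are all assumptions for illustration.

```python
"""Cross-check an estimated inner mate distance against library QC.

Sketch only: the expected library size is the inner distance plus both read
lengths plus the adapter length. The 10% tolerance and all example numbers
are assumptions for illustration.
"""


def expected_library_size(inner_dist, read1_len, read2_len, adapter_len):
    """Library size implied by an inner mate distance."""
    return inner_dist + read1_len + read2_len + adapter_len


def agrees_with_qc(inner_dist, read1_len, read2_len, adapter_len,
                   qc_peak, rel_tol=0.10):
    """True if the implied library size is within rel_tol of the QC peak."""
    implied = expected_library_size(inner_dist, read1_len, read2_len, adapter_len)
    return abs(implied - qc_peak) <= rel_tol * qc_peak


if __name__ == "__main__":
    # Hypothetical values: inner distance 180 bp, 2 x 100 bp reads,
    # 120 bp of adapter, Bioanalyzer peak at 500 bp.
    print(expected_library_size(180, 100, 100, 120))  # 500
    print(agrees_with_qc(180, 100, 100, 120, qc_peak=500))  # True
```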

        Compare a few highly abundant genes from Cufflinks' output with the IGV browser view or the actual read counts at these loci.
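To pick the loci to inspect, you can rank the genes in Cufflinks' tracking output by FPKM and, at the same time, list any gene whose quantification status is not OK. This sketch assumes a tab-separated genes.fpkm_tracking file whose header includes gene_id, locus, FPKM and FPKM_status; the Cufflinks manual describes the HIDATA status as "too many fragments in locus", which is exactly the symptom discussed in this thread. The toy table is hand-made, not real output.

```python
"""List the most abundant genes in a Cufflinks genes.fpkm_tracking table.

Sketch assuming the tab-separated tracking file has a header containing
gene_id, locus, FPKM and FPKM_status columns. The loci returned here can then
be inspected in IGV or compared with raw read counts.
"""
import csv
import io


def top_genes(handle, n=10):
    """Return the n rows with the highest FPKM, plus all rows not marked OK."""
    rows = list(csv.DictReader(handle, delimiter="\t"))
    ranked = sorted(rows, key=lambda r: float(r["FPKM"]), reverse=True)
    flagged = [r for r in rows if r.get("FPKM_status", "OK") != "OK"]
    return ranked[:n], flagged


if __name__ == "__main__":
    # Tiny hand-made table, not real Cufflinks output.
    text = (
        "gene_id\tlocus\tFPKM\tFPKM_status\n"
        "g1\tchr1:100-900\t12.5\tOK\n"
        "g2\tchr2:50-5000\t0\tHIDATA\n"
        "g3\tchr3:10-800\t250.0\tOK\n"
    )
    top, flagged = top_genes(io.StringIO(text), n=2)
    print([r["gene_id"] for r in top])      # ['g3', 'g1']
    print([r["gene_id"] for r in flagged])  # ['g2']
```

On a real run you would pass an open file instead, e.g. `open("genes.fpkm_tracking", newline="")`.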

        Run the Tuxedo pipeline on a small subset of your data using only the read 1 set, and compare the marker gene's abundance.
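A reproducible random subset of the read 1 file can be drawn in a few lines. This sketch assumes plain 4-line FASTQ records with no line wrapping and keeps each record independently with a given probability; the function name and the file names in the usage note are hypothetical.

```python
"""Randomly subsample a read-1 FASTQ file for a quick test run.

Minimal sketch: assumes plain 4-line FASTQ records (no line wrapping) and
keeps each record independently with probability `fraction`. The seed makes
the subset reproducible.
"""
import io
import random


def subsample_fastq(src, dst, fraction, seed=42):
    """Copy roughly `fraction` of the records from src to dst; return count kept."""
    rng = random.Random(seed)
    kept = 0
    while True:
        record = [src.readline() for _ in range(4)]
        if not record[0]:
            break  # end of file
        if not record[0].startswith("@"):
            raise ValueError("malformed FASTQ record: " + record[0].strip())
        if rng.random() < fraction:
            dst.writelines(record)
            kept += 1
    return kept


if __name__ == "__main__":
    # Synthetic records, for illustration only.
    fastq = "".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(1000))
    out = io.StringIO()
    n = subsample_fastq(io.StringIO(fastq), out, fraction=0.1)
    print(n, "records kept")
```

For a real file you might call it as `with open("sample_R1.fastq") as src, open("sample_R1.sub.fastq", "w") as dst: subsample_fastq(src, dst, 0.05)` and then run the pipeline in single-end mode on the output.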

        Use RSeQC to check your alignments for "read coverage over gene body". It will give you a hint about coverage biases, which might confuse Cufflinks.
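The idea behind a gene body coverage check can be shown on a single transcript: split its per-base depth into percentile bins from 5' to 3' and compare the two ends. This is a simplified illustration of the concept, not RSeQC's implementation; it assumes you already have per-base depths ordered 5'->3' (reverse them for minus-strand genes), and the function names and the 20% edge window are hypothetical choices.

```python
"""Summarise coverage along a transcript as a 5'->3' gene-body profile.

Simplified illustration of the idea behind a gene body coverage check, not
RSeQC's implementation: per-base depths for one transcript (ordered 5'->3')
are split into equal-width percentile bins and averaged.
"""


def gene_body_profile(depths, n_bins=100):
    """Mean depth in each of n_bins equal-width percentile bins."""
    if len(depths) < n_bins:
        raise ValueError("transcript shorter than the number of bins")
    profile = []
    for b in range(n_bins):
        start = b * len(depths) // n_bins
        end = (b + 1) * len(depths) // n_bins
        window = depths[start:end]  # never empty because len(depths) >= n_bins
        profile.append(sum(window) / len(window))
    return profile


def three_prime_bias(profile, edge=0.2):
    """Ratio of mean depth in the 3'-most vs 5'-most `edge` fraction of bins."""
    k = max(1, int(len(profile) * edge))
    five = sum(profile[:k]) / k
    three = sum(profile[-k:]) / k
    return three / five if five else float("inf")


if __name__ == "__main__":
    uniform = gene_body_profile([50] * 2000)
    ramped = gene_body_profile(list(range(1, 2001)))  # depth rising toward 3'
    print(three_prime_bias(uniform), three_prime_bias(ramped))
```

A ratio near 1 suggests even coverage; a ratio well above 1 points to 3' bias, and well below 1 to 5' bias.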


        • #5
          Thanks!

          After setting the --max-bundle-frags parameter to 10,000,000, the marker gene was assembled by Cufflinks. I checked its expression in IGV with bigWig files: the coverage ranges from 3000 to 4000+. The marker gene has a 6000-nt CDS. That means > 1,000,000 reads map to it, so with the default value of --max-bundle-frags, the gene is skipped by Cufflinks.
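The step from coverage depth to read count can be made explicit: total aligned bases divided by read length. This sketch assumes roughly uniform per-base depth over the region; the read length is not stated in the thread, so the values in the loop are placeholders, and the function names are hypothetical. Two caveats apply: the estimate scales inversely with read length, and a Cufflinks bundle may extend beyond the CDS (UTRs, overlapping or neighbouring transcripts), so the fragment count Cufflinks sees can differ from this figure.

```python
"""Back-of-envelope read and fragment counts for a locus from its coverage depth.

Assumes roughly uniform per-base depth over the region. The read lengths
used below are placeholders, since the thread does not state them.
"""


def estimate_reads(mean_depth, region_len, read_len):
    """Aligned bases / read length: approximate number of reads over the region."""
    return mean_depth * region_len / read_len


def estimate_fragments(mean_depth, region_len, read_len, reads_per_fragment=2):
    """For paired-end data each fragment contributes two reads."""
    return estimate_reads(mean_depth, region_len, read_len) / reads_per_fragment


if __name__ == "__main__":
    for read_len in (36, 50, 100):
        reads = estimate_reads(4000, 6000, read_len)
        frags = estimate_fragments(4000, 6000, read_len)
        print(f"{read_len} bp reads: ~{reads:,.0f} reads, ~{frags:,.0f} fragments")
```

Plugging in your own read length and the full extent of the locus gives a quick sense of how close it is to the --max-bundle-frags cap.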
