  • cuffmerge and the max-bundle-length

    Hi,

    I was finally able to run cuffmerge, but a colleague and I both got some annoying
    "Skipping large bundle" warnings with human samples:

    chr21:38435145-45760353 Warning: Skipping large bundle.

    chr6:126102278-130463972 Warning: Skipping large bundle.
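    For scale, the two skipped regions span roughly 7.3 Mb (chr21) and 4.4 Mb (chr6). Below is a minimal Python sketch for pulling these spans out of the log output, assuming the warning lines look exactly like the two above. The 3,500,000 bp limit is an assumption (taken as the default --max-bundle-length; check the manual for your version), and the name skipped_bundles is hypothetical.

```python
import re

# Matches warning lines such as
# "chr21:38435145-45760353 Warning: Skipping large bundle."
WARNING_RE = re.compile(r"^([^:\s]+):(\d+)-(\d+)\s+Warning: Skipping large bundle\.")

# ASSUMPTION: default --max-bundle-length of 3,500,000 bp; verify for your version.
MAX_BUNDLE_LENGTH = 3_500_000


def skipped_bundles(log_lines):
    """Return (chrom, start, end, span_bp) for each 'Skipping large bundle' line."""
    bundles = []
    for line in log_lines:
        match = WARNING_RE.match(line.strip())
        if match:
            chrom = match.group(1)
            start, end = int(match.group(2)), int(match.group(3))
            bundles.append((chrom, start, end, end - start))
    return bundles


if __name__ == "__main__":
    log = [
        "chr21:38435145-45760353 Warning: Skipping large bundle.",
        "chr6:126102278-130463972 Warning: Skipping large bundle.",
    ]
    for chrom, start, end, span in skipped_bundles(log):
        ratio = span / MAX_BUNDLE_LENGTH
        print(f"{chrom}:{start}-{end} spans ~{span:,} bp ({ratio:.1f}x the assumed limit)")
```

    The span is end minus start, so it may be off by one base depending on the coordinate convention of the log; that does not matter at megabase scale.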

    Looking at the genome browser, I see a lot of genes in these regions. Do I now have to assume that cuffmerge has produced a single gene ID spanning the whole region, when in fact my reads are distributed across many smaller gene entries?

    I naively tried to fix this by increasing --max-bundle-length, adding the option to the default extra_opts of the cufflinks() function in the cuffmerge Python script:

    Code:
    def cufflinks(out_dir,
                  sam_file,
                  min_isoform_frac,
                  gtf_file=None,
                  extra_opts=["-q", "--overhang-tolerance", "200",
                              "--library-type=transfrags", "-A", "0.0",
                              "--min-frags-per-transfrag", "0", "--no-5-extend",
                              # added: raise the bundle-length limit
                              "--max-bundle-length", "9925208"],
                  lsf=False,
                  curr_queue=None):
        ...  # rest of the function unchanged (not shown)
    But this led to the following error at the cuffcompare step:

    Error: duplicate GFF ID 'ENST00000506472' encountered!
    [FAILED]
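    The thread does not say exactly what cuffcompare treats as a duplicate ID. One plausible cause, and the only one this sketch checks, is a transcript_id whose records disagree on chromosome, strand, or gene_id. It is a hypothetical diagnostic (conflicting_transcripts is not part of the Tuxedo tools), assuming a standard tab-separated GTF with quoted attribute values:

```python
import re
import sys
from collections import defaultdict

# key "value" pairs in the 9th (attribute) column of a GTF line
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def conflicting_transcripts(gtf_lines):
    """Map each transcript_id whose records disagree on chromosome, strand or
    gene_id to the sorted list of distinct (chrom, strand, gene_id) combinations."""
    seen = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue
        attrs = dict(ATTR_RE.findall(cols[8]))
        tid = attrs.get("transcript_id")
        if tid is None:
            continue
        seen[tid].add((cols[0], cols[6], attrs.get("gene_id", "")))
    return {tid: sorted(keys) for tid, keys in seen.items() if len(keys) > 1}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_gtf.py merged.gtf
    with open(sys.argv[1]) as handle:
        for tid, keys in conflicting_transcripts(handle).items():
            print(tid, keys)
```

    Running it over the GTF handed to cuffcompare and looking for ENST00000506472 in the result would show whether this heuristic explains the error; if the ID does not show up, the cause lies elsewhere.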

    So I have two questions.

    First, is it a good idea to use merged.gtf at all, given that you either skip whole chromosome regions or potentially get huge merged gene entries?

    Second, how can I run the script with the --max-bundle-length option?
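    On the second question: instead of editing the default list in the function signature, the option could be added where the list is built or passed. A minimal sketch, assuming the list is handed straight through to the cufflinks command line; with_option is a hypothetical helper, and DEFAULT_OPTS is the list from the post minus the two entries added by hand. It only shows the list handling and does not by itself address the duplicate-ID error.

```python
def with_option(opts, flag, value):
    """Return a copy of a command-line option list with `flag` set to `value`.

    If `flag` is already present (space-separated form), its value is replaced
    rather than a second copy being appended; the input list is not modified.
    """
    new_opts = list(opts)
    if flag in new_opts:
        idx = new_opts.index(flag)
        if idx + 1 < len(new_opts):
            new_opts[idx + 1] = str(value)
        else:
            new_opts.append(str(value))
    else:
        new_opts += [flag, str(value)]
    return new_opts


# The defaults from the post, without the two entries that were added by hand.
DEFAULT_OPTS = ["-q", "--overhang-tolerance", "200", "--library-type=transfrags",
                "-A", "0.0", "--min-frags-per-transfrag", "0", "--no-5-extend"]

if __name__ == "__main__":
    opts = with_option(DEFAULT_OPTS, "--max-bundle-length", 9925208)
    print(opts[-2:])  # ['--max-bundle-length', '9925208']
```

    Returning a copy avoids mutating the single default list object that Python creates once for a mutable default argument, so later calls are not affected.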

    Thanks,
    Marc

  • #2
    I've seen skipping like that with mouse as well, in all of their programs: cufflinks, cuffdiff, and cuffmerge. I can't say what it means for the results, though. I haven't looked into it because I don't use these programs for primary analysis.
    /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
    Salk Institute for Biological Studies, La Jolla, CA, USA */



    • #3
      Strange. I think I have no choice but to run cuffdiff with the reference GTF for now if I want to stick with the Tuxedo pipeline. I need to run it again to test whether all the known reference IDs in the skipped region are dropped as well. In that case I would need to run cuffdiff twice, once with the known reference and once with the merged one, and that doesn't seem like the most elegant way.



      • #4
        Confirmed:

        merged.gtf did not contain the gene IDs from the skipped region. So it would be nice to know: does cuffmerge make sense at all for human assemblies?
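        A minimal sketch for repeating that check on other skipped regions, assuming both files are standard tab-separated GTF. Because merged.gtf may use its own gene identifiers rather than the reference ones, it compares by coordinates: a reference gene counts as missing if none of its records in the region overlaps any merged.gtf record. Strand is ignored, and all function and file names are hypothetical.

```python
import re
import sys

GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def overlaps(a_start, a_end, b_start, b_end):
    """True if two 1-based inclusive intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end


def records_in_region(gtf_lines, chrom, start, end):
    """Return (start, end, gene_id) for GTF records overlapping chrom:start-end."""
    records = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[0] != chrom:
            continue
        rec_start, rec_end = int(cols[3]), int(cols[4])
        if overlaps(rec_start, rec_end, start, end):
            match = GENE_ID_RE.search(cols[8])
            records.append((rec_start, rec_end, match.group(1) if match else None))
    return records


def uncovered_reference_genes(reference_lines, merged_lines, chrom, start, end):
    """Reference gene_ids in the region with no overlapping record in merged.gtf."""
    ref = records_in_region(reference_lines, chrom, start, end)
    merged = records_in_region(merged_lines, chrom, start, end)
    all_genes = {gid for _, _, gid in ref if gid is not None}
    covered = {
        gid
        for r_start, r_end, gid in ref
        if gid is not None
        for m_start, m_end, _ in merged
        if overlaps(r_start, r_end, m_start, m_end)
    }
    return all_genes - covered


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_region.py reference.gtf merged.gtf
    with open(sys.argv[1]) as ref, open(sys.argv[2]) as merged:
        missing = uncovered_reference_genes(list(ref), list(merged),
                                            "chr21", 38435145, 45760353)
    print(sorted(missing))
```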



        • #5
          that's pretty odd. the whole point of cufflinks is to be able to do this kind of analysis. i know it sometimes fails on "overly complex loci" but this is a little crazy. i looked around the region you posted myself and there are plenty of genes in there with a good bit of space on either side of them, enough to delineate them from other genes. i'd think cufflinks would be able to work in that region.
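          One way a multi-megabase bundle can form even when the genes themselves are well separated: if bundling groups overlapping intervals, a single long alignment or transfrag that spans the gaps (for example a spliced read with a very long intron) chains all of its neighbours together. The sketch below is a simplified, hypothetical illustration of overlap-based grouping, not Cufflinks' actual code, and the thread does not establish that this is what happened on chr21.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive coordinates


def bundle(intervals: List[Interval], max_gap: int = 0) -> List[Interval]:
    """Group intervals into bundles: an interval joins the current bundle when it
    starts no more than `max_gap` bp after the bundle's current end."""
    bundles: List[Interval] = []
    for start, end in sorted(intervals):
        if bundles and start <= bundles[-1][1] + max_gap:
            last_start, last_end = bundles[-1]
            bundles[-1] = (last_start, max(last_end, end))
        else:
            bundles.append((start, end))
    return bundles


if __name__ == "__main__":
    # Hypothetical read clusters on three well-separated genes.
    genes = [(1_000, 5_000), (50_000, 60_000), (200_000, 210_000)]
    print(bundle(genes))  # [(1000, 5000), (50000, 60000), (200000, 210000)]
    # One long spurious alignment spanning the gaps chains them into one bundle.
    print(bundle(genes + [(4_000, 205_000)]))  # [(1000, 210000)]
```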

          maybe this issue will go away, or at least improve, in their upcoming v1.4 release. i know it's in the works.

          my experience with cufflinks has been like this:

          1. it was released, i tried it, we didn't like the results.
          2. we did more sequencing 4 months later and i tried it again, we didn't like the results
          3. we did more sequencing a year later, i tried it again, we didn't like the results
          4. since then we have done more sequencing every 6 months and i've tried the current versions every time and have been disappointed with the output

          "not liking" the results hasn't just been because it didn't tell us what we wanted, but for reasons similar to what you've posted here. it seems to fail in illogical ways, and in ways that make it unusable when you're worried about missing or misleading information. sequencing is expensive, so one really wants to be sure to get the right information out of a run.



          • #6
            Yes,
            we are disappointed by the results from cuffmerge as well. Why nearly the whole of chr21 gets pasted together is a mystery, as nothing in the documentation hints at this behaviour.
            Using the reference GTF works fine all the way down to cummeRbund, though, and it's easy to pipe together.
            For unknown isoforms and genes however we need to look for another tool. Which software are you using for this purpose right now?
