  • ercfrtz
    Member
    • Aug 2010
    • 23

    Cuffmerge Error

    I am getting the following error when trying to merge several files that Cufflinks produced from TopHat output. I'm using the same reference in both cases and don't know why it gives me this error.


    Error (GFaSeqGet): end coordinate (121191482) cannot be larger than sequence length 121191424
    Error (GFaSeqGet): end coordinate (121191482) cannot be larger than sequence length 121191424
    Error (GFaSeqGet): subsequence cannot be larger than 16338
    Error getting subseq for CUFF.24532.1 (1..16348)!
    [FAILED]
    Error: could not execute cuffcompare
    Traceback (most recent call last):
      File "/shared/local/cufflinks/cuffmerge", line 573, in ?
        sys.exit(main())
      File "/shared/local/cufflinks/cuffmerge", line 556, in main
        compare_meta_asm_against_ref(params.ref_gtf, params.fasta, output_dir+"/transcripts.gtf")
      File "/shared/local/cufflinks/cuffmerge", line 406, in compare_meta_asm_against_ref
        tmap = compare_to_reference(gtf_input_file, ref_gtf, fasta_file)
      File "/shared/local/cufflinks/cuffmerge", line 342, in compare_to_reference
        exit(1)
    TypeError: 'str' object is not callable

    If anyone knows why this is happening or how to circumvent it, that would be great.
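The first two errors say that a record in the assembly ends past the last base of its reference sequence. A minimal Python sketch for locating such records, assuming the sequence lengths are already available as a name-to-length dict (for example, columns 1 and 2 of a samtools .fai index) and that the GTF is tab-separated. The function name `find_out_of_bounds`, the sequence name `chr_example`, the start coordinate and the gene_id in the demo record are all hypothetical; only the end coordinate and sequence length come from the error message.

```python
from typing import Dict, Iterable, List, Tuple


def find_out_of_bounds(gtf_lines: Iterable[str],
                       seq_lengths: Dict[str, int]) -> List[Tuple[str, str, int, int]]:
    """Return (seqname, feature, end, seq_length) for each GTF record whose
    end coordinate lies past the end of its reference sequence."""
    problems = []
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete 9-column GTF record
        seqname, feature, end = fields[0], fields[2], int(fields[4])
        length = seq_lengths.get(seqname)
        if length is not None and end > length:
            problems.append((seqname, feature, end, length))
    return problems


# Hypothetical record built from the numbers in the error message above.
record = "\t".join(["chr_example", "Cufflinks", "transcript", "121190001",
                    "121191482", "1000", "+", ".", 'gene_id "CUFF.1";'])
print(find_out_of_bounds([record], {"chr_example": 121191424}))
# [('chr_example', 'transcript', 121191482, 121191424)]
```

Records reported this way are the candidates to inspect before rerunning cuffmerge.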
  • arodrigu1
    Junior Member
    • Nov 2009
    • 1

    #2
    Hi ercfrtz,
    Were you able to figure out what was causing your cuffcompare error message? I am getting the same message. I am running this on about 15 samples, and I get this error for only one of them.

    Please let me know if you were able to figure out what the problem was.

    Thanks!


    • lukas1848
      Member
      • Jun 2011
      • 54

      #3
      Sorry for bumping the thread, but I get the same error as well. Has anyone found the cause of this error yet?


      • peromhc
        Senior Member
        • Sep 2009
        • 108

        #4
        Strange: I'm getting a similar error and have never seen it before.

        I'm running Bowtie2 > samtools view | sort > samtools merge > cufflinks



        Code:
        matthew@macmanes:/media/hd/working/tuco/social.cuff$ cufflinks -p8 -m320 -u -o /media/hd/working/tuco/social.cuff -L social \
        > -b /media/hd/working/tuco/tuco29dec11.fa --upper-quartile-norm --max-mle-iterations 20000 \
        > /media/hd/working/tuco/b2.bams/all/social.bam
        You are using Cufflinks v1.3.0, which is the most recent release.
        [07:43:18] Inspecting reads and determining fragment length distribution.
        > Processed 154768 loci.                       [*************************] 100%
        > Map Properties:
        >	Upper Quartile: 241.00
        >	Number of Multi-Reads: 0 (with 0 total hits)
        >	Fragment Length Distribution: Truncated Gaussian (user-specified)
        >	              Default Mean: 320
        >	           Default Std Dev: 80
        [08:10:53] Assembling transcripts and initializing abundances for multi-read correction.
        > Processed 154768 loci.                       [*************************] 100%
        [08:48:16] Loading reference annotation and sequence.
        Error (GFaSeqGet): subsequence cannot be larger than 384
        Error getting subseq for social.2.1 (1..385)!


        • peromhc
          Senior Member
          • Sep 2009
          • 108

          #5
          For me, at least, removing the -b <in>.fasta option 'solves' the problem. I'd really like to use the -b option, however.

          This is the same FASTA file that was used for mapping, i.e. for building the Bowtie index.


          • rpauly
            Member
            • Apr 2011
            • 32

            #6
            I am getting the same error with or without the -b option in Cufflinks.
            I mapped the reads to UCSC hg19.fa using samtools.
            Then I removed duplicates using Picard, and sorted and indexed the data again using samtools.
            Finally I ran Cufflinks; that first part works only without the -b option. Then I tried cuffmerge, and it fails with:
            Error (GFaSeqGet): subsequence cannot be larger than 16571
            Error getting subseq for CUFF.42374.1 (2..16614)!

            Any help is appreciated.


            • johnwu
              Junior Member
              • Jun 2011
              • 5

              #7
              I got the same cuffmerge error too.

              I mapped reads to the genome with TopHat 2.0.6, then assembled transcripts with Cufflinks 2.0.2. All of these steps were successful.

              However, when I tried to merge the transcripts.gtf files from all my samples with cuffmerge 2.0.2, it failed with these error messages:

              Error (GFaSeqGet): subsequence cannot be larger than 100
              Error getting subseq for CUFF.63509.1 (1..103)!
              [FAILED]
              Error: could not execute cuffcompare

              Strangely, the CUFF.63509.1 transcript is located on chromosome 8, which is far longer than 100 bp (148491826 bp):


              8 Cufflinks transcript 58753100 58756101 1000 - . gene_id "CUFF.63509"; transcript_id "CUFF.63509.1"; FPKM "0.3200324464"; frac "0.180108"; conf_lo "0.246484"; conf_hi "0.393581"; cov "5.392457";
              8 Cufflinks exon 58753100 58756101 1000 - . gene_id "CUFF.63509"; transcript_id "CUFF.63509.1"; exon_number "1"; FPKM "0.3200324464"; frac "0.180108"; conf_lo "0.246484"; conf_hi "0.393581"; cov "5.392457";


              Chromosome 8 FASTA header:

              >8 dna:chromosome chromosome:Sscrofa10.2:8:1:148491826:1 REF

              Does anyone have a solution to this problem? Any help is appreciated. Thanks.
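One quick check is to recompute the sequence lengths directly from the FASTA passed to cuffmerge. A minimal Python sketch, assuming the convention used by samtools faidx (and most other tools) that a sequence's name is the header text up to the first whitespace, so the header above names a sequence `8`; whether Cufflinks parses headers the same way is an assumption here. The function name `fasta_lengths` is hypothetical, and the sequences in the demo are toy data, not the real chromosome.

```python
import io
from typing import Dict, Optional, TextIO


def fasta_lengths(handle: TextIO) -> Dict[str, int]:
    """Map each sequence name to its length in bases."""
    lengths: Dict[str, int] = {}
    name: Optional[str] = None
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Name = header text up to the first whitespace (samtools faidx convention).
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)  # sequence lines may be wrapped over many lines
    return lengths


toy = io.StringIO(
    ">8 dna:chromosome chromosome:Sscrofa10.2:8:1:148491826:1 REF\n"
    "ACGTACGTAC\n"
    "ACGT\n"
    ">GL893313.2 toy scaffold\n"
    "NNNNN\n"
)
print(fasta_lengths(toy))  # {'8': 14, 'GL893313.2': 5}
```

Run against the real genome file, the value for `8` can be compared with the 148491826 stated in the header.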
              Last edited by johnwu; 02-28-2013, 03:49 PM.


              • DJParker
                Junior Member
                • Jan 2012
                • 7

                #8
                Hello

                Just to add weight to this: I got the same cuffmerge error too. I mapped my reads back to my reference as usual, but now I get this error.

                Has anyone found a solution yet?

                Darren


                • fongchun
                  Member
                  • May 2011
                  • 55

                  #9
                  I am guessing no one has found a solution? I also have the same problem...


                  • fongchun
                    Member
                    • May 2011
                    • 55

                    #10
                    I found this post on Biostar, if it helps anyone. I think the problem, for me at least, is that I aligned my RNA-seq libraries to a different FASTA file than the one I am passing to Cufflinks.
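If the alignments were made against a different FASTA, the sequence lengths recorded in the BAM header (the @SQ lines printed by `samtools view -H`) will not agree with the FASTA given to Cufflinks. A minimal Python sketch of that comparison, assuming the header text and a name-to-length dict for the FASTA are already in hand; the function names and the lengths in the example (including the `scaffold_1` entry) are hypothetical.

```python
from typing import Dict, List


def sq_lengths(sam_header: str) -> Dict[str, int]:
    """Collect SN (name) -> LN (length) from the @SQ lines of a SAM header."""
    lengths: Dict[str, int] = {}
    for line in sam_header.splitlines():
        if not line.startswith("@SQ"):
            continue
        tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
        lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def length_mismatches(bam: Dict[str, int], fasta: Dict[str, int]) -> List[str]:
    """Describe every BAM sequence that is missing from, or has another length in, the FASTA."""
    report = []
    for name, bam_len in sorted(bam.items()):
        fasta_len = fasta.get(name)
        if fasta_len is None:
            report.append(f"{name}: not in FASTA")
        elif fasta_len != bam_len:
            report.append(f"{name}: BAM {bam_len}, FASTA {fasta_len}")
    return report


header = "@HD\tVN:1.0\n@SQ\tSN:8\tLN:148491826\n@SQ\tSN:scaffold_1\tLN:5000\n"
fasta = {"8": 148491826, "scaffold_1": 4900}  # hypothetical lengths
print(length_mismatches(sq_lengths(header), fasta))
# ['scaffold_1: BAM 5000, FASTA 4900']
```

An empty report suggests the BAM and the FASTA at least agree on names and lengths.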


                    • johnwu
                      Junior Member
                      • Jun 2011
                      • 5

                      #11
                      I found that, for some reason, Cufflinks assembles some frags/transcripts/contigs that extend past the end of the chromosome.

                      After removing or modifying those records in the transcripts.gtf generated by Cufflinks, cuffmerge ran without any problem.

                      Here's an example from my project:

                      chromosome/scaffold/contig name : GL893313.2
                      chromosome/scaffold/contig length : 161573
                      exon end coordinate: 161578 (> chromosome length)

                      GL893313.2 Cufflinks exon 161457 161578 1000 + . gene_id "CUFF.77262"; transcript_id "CUFF.77262.1"; exon_number "3"; FPKM "1.1077759277"; frac "1.000000"; conf_lo "1.008891"; conf_hi "1.206661"; cov "18.665712";


                      CORRECTED:

                      GL893313.2 Cufflinks exon 161457 161573 1000 + . gene_id "CUFF.77262"; transcript_id "CUFF.77262.1"; exon_number "3"; FPKM "1.1077759277"; frac "1.000000"; conf_lo "1.008891"; conf_hi "1.206661"; cov "18.665712";

                      In my case, Cufflinks seems to generate these over-long frags/contigs only when assembling on genome sequence contigs (not chromosomes).
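The correction shown in this post (clipping the end coordinate to the sequence length) can be applied to every line of a GTF file. A minimal Python sketch, assuming a tab-separated GTF and a name-to-length dict for the reference; the function name `clip_gtf_line` is hypothetical, and dropping features that start past the end is an added assumption, since the post only says such records were removed or modified. Attributes such as FPKM and cov are left untouched, as in the corrected example.

```python
from typing import Dict, Optional


def clip_gtf_line(line: str, seq_lengths: Dict[str, int]) -> Optional[str]:
    """Clip a GTF record's end to its sequence length; None means drop the record."""
    line = line.rstrip("\n")
    fields = line.split("\t")
    if line.startswith("#") or len(fields) < 9:
        return line  # comments and non-record lines pass through unchanged
    length = seq_lengths.get(fields[0])
    if length is None:
        return line  # sequence not in the length table: leave as-is
    start, end = int(fields[3]), int(fields[4])
    if start > length:
        return None  # feature lies entirely past the end of the sequence
    if end > length:
        fields[4] = str(length)  # clip the end to the last base
    return "\t".join(fields)


exon = "\t".join(["GL893313.2", "Cufflinks", "exon", "161457", "161578", "1000",
                  "+", ".", 'gene_id "CUFF.77262"; transcript_id "CUFF.77262.1";'])
print(clip_gtf_line(exon, {"GL893313.2": 161573}).split("\t")[3:5])
# ['161457', '161573']
```

Because it works line by line, the same call clips the parent transcript record as well as its exons.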


                      • DJParker
                        Junior Member
                        • Jan 2012
                        • 7

                        #12
                        Hello,

                        So I have been troubleshooting my problem with Geo Pertea, and we found that it arises because CLC (which I mapped my reads with) merely soft-clips reads when they map past the end of the reference contig.

                        Take for example this (partial) SAM record:

                        502_1735_1931_F3 16 scaffold_10212 558 0 36S39M [etc.]

                        CLC aligned only 39 bases of this read to the end of this short contig (596 bases); the remaining 36 nt of the read hang beyond the contig boundary and are therefore reported as soft-clipped (which makes sense). Unfortunately, it looks like Cufflinks did not exclude the soft-clipped part when determining the boundaries of the transfrag. The Tuxedo pipeline (specifically TopHat) does not normally deal with soft-clipped alignments, so I guess that's why we never got to test Cufflinks with such alignments and make it handle them properly.
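The arithmetic behind this explanation can be checked from the CIGAR string. Per the SAM specification, only M, D, N, = and X consume reference bases; soft clips (S) consume read bases but not reference. A minimal Python sketch with hypothetical function names: `reference_end` applies the spec, while `naive_end` illustrates the kind of miscount described in the post, laying every read base (clipped or not) on the reference from POS. It is an illustration only, not Cufflinks' actual code.

```python
import re
from typing import List, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CONSUMES_REF = set("MDN=X")    # operations that advance along the reference
CONSUMES_QUERY = set("MIS=X")  # operations that use bases of the read (SEQ)


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def reference_end(pos: int, cigar: str) -> int:
    """1-based inclusive end of the aligned part; clipped bases are ignored."""
    span = sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_REF)
    return pos + span - 1


def naive_end(pos: int, cigar: str) -> int:
    """Hypothetical miscount: treat every read base, soft-clipped or not, as reference."""
    span = sum(n for n, op in parse_cigar(cigar) if op in CONSUMES_QUERY)
    return pos + span - 1


# The record above: POS 558, CIGAR 36S39M, on a 596-base contig.
print(reference_end(558, "36S39M"))  # 596, exactly the contig end
print(naive_end(558, "36S39M"))      # 632, 36 bases past the contig end
```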


                        • seqing.help
                          Junior Member
                          • Aug 2012
                          • 1

                          #13
                          Courtesy of Alex Dobin, this might be useful to those dealing with this problem.
