
  • Cuffmerge Error

    I am getting the following error when trying to merge several files produced by cufflinks using output from tophat. I'm using the same reference in both cases and do not know why it would be giving me this error.

    Error (GFaSeqGet): end coordinate (121191482) cannot be larger than sequence length 121191424
    Error (GFaSeqGet): end coordinate (121191482) cannot be larger than sequence length 121191424
    Error (GFaSeqGet): subsequence cannot be larger than 16338
    Error getting subseq for CUFF.24532.1 (1..16348)!
    Error: could not execute cuffcompare
    Traceback (most recent call last):
    File "/shared/local/cufflinks/cuffmerge", line 573, in ?
    File "/shared/local/cufflinks/cuffmerge", line 556, in main
    compare_meta_asm_against_ref(params.ref_gtf, params.fasta, output_dir+"/transcripts.gtf")
    File "/shared/local/cufflinks/cuffmerge", line 406, in compare_meta_asm_against_ref
    tmap = compare_to_reference(gtf_input_file, ref_gtf, fasta_file)
    File "/shared/local/cufflinks/cuffmerge", line 342, in compare_to_reference
    TypeError: 'str' object is not callable

    If anyone knows why this is happening or how to circumvent it, that would be great.

  • #2
    Hi ercfrtz,
    Were you able to figure out what was causing your cuffcompare error message? I am getting the same message. I am running this for about 15 samples, but I get this error for one of them.

    Please let me know if you were able to figure out what was the problem.



    • #3
      Sorry for bumping the thread, but I get the same error as well. Has anyone found the cause of this error yet?


      • #4
        Strange: I'm getting a similar error, one I've never seen before.

        I'm running Bowtie2 > samtools view | sort > samtools merge > cufflinks

        matthew@macmanes:/media/hd/working/tuco/social.cuff$ cufflinks -p8 -m320 -u -o /media/hd/working/tuco/social.cuff -L social \
        > -b /media/hd/working/tuco/tuco29dec11.fa --upper-quartile-norm --max-mle-iterations 20000 \
        > /media/hd/working/tuco/b2.bams/all/social.bam
        You are using Cufflinks v1.3.0, which is the most recent release.
        [07:43:18] Inspecting reads and determining fragment length distribution.
        > Processed 154768 loci.                       [*************************] 100%
        > Map Properties:
        >	Upper Quartile: 241.00
        >	Number of Multi-Reads: 0 (with 0 total hits)
        >	Fragment Length Distribution: Truncated Gaussian (user-specified)
        >	              Default Mean: 320
        >	           Default Std Dev: 80
        [08:10:53] Assembling transcripts and initializing abundances for multi-read correction.
        > Processed 154768 loci.                       [*************************] 100%
        [08:48:16] Loading reference annotation and sequence.
        Error (GFaSeqGet): subsequence cannot be larger than 384
        Error getting subseq for social.2.1 (1..385)!


        • #5
          For me, at least, removing the -b <in>.fasta option 'solves' the problem. I'd really like to use the -b option, however.

          This is the same fasta file that was used in mapping, i.e. for building the bowtie index.


          • #6
            I am getting the same error with or without the -b option in cufflinks.
            I mapped the reads against the UCSC hg19.fa using samtools.
            Then I removed duplicates using Picard, and sorted and indexed the data again using samtools.
            Finally I ran cufflinks; the first part works only without the -b option. Then I tried cuffmerge, and it fails with:
            Error (GFaSeqGet): subsequence cannot be larger than 16571
            Error getting subseq for CUFF.42374.1 (2..16614)!

            Any help is appreciated....


            • #7
              I got the same cuffmerge error too.

              I mapped reads to the genome with tophat 2.0.6, then assembled transcripts with cufflinks 2.0.2. All of the above steps were successful.

              However, when I tried to merge the transcript.gtf files from all my samples with cuffmerge 2.0.2, it failed with these error messages:

              Error (GFaSeqGet): subsequence cannot be larger than 100
              Error getting subseq for CUFF.63509.1 (1..103)!
              Error: could not execute cuffcompare

              Strangely, the CUFF.63509.1 transcript is located on chromosome 8, which is far longer than 100 bp (148491826 bp).

              8 Cufflinks transcript 58753100 58756101 1000 - . gene_id "CUFF.63509"; transcript_id "CUFF.63509.1"; FPKM "0.3200324464"; frac "0.180108"; conf_lo "0.246484"; conf_hi "0.393581"; cov "5.392457";
              8 Cufflinks exon 58753100 58756101 1000 - . gene_id "CUFF.63509"; transcript_id "CUFF.63509.1"; exon_number "1"; FPKM "0.3200324464"; frac "0.180108"; conf_lo "0.246484"; conf_hi "0.393581"; cov "5.392457";

              chromosome 8 info:

              >8 dna:chromosome chromosome:Sscrofa10.2:8:1:148491826:1 REF

              Does anyone have a solution to this problem? Any help is appreciated. Thanks.
              Last edited by johnwu; 02-28-2013, 03:49 PM.


              • #8

                Just to add weight to this: I got the same cuffmerge error too. I mapped my reads back to my reference as usual, but now I get this error.

                Has anyone found a solution yet?



                • #9
                  I am guessing no one has found a solution? I also have the same problem...


                  • #10
                    I found this post on Biostar, if it helps anyone. I think the problem, for me at least, is that I aligned my RNA-seq libraries to a different fasta file than the one I am passing to cufflinks.
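
One way to test the "different fasta file" hypothesis is to compare the sequence names and lengths recorded in the alignment header with those in the FASTA passed to cufflinks. The following is a minimal sketch, not part of any Tuxedo tool: the function names are hypothetical, and the demo data at the bottom is made up.

```python
import io


def fasta_lengths(handle):
    """Return {sequence_name: length} for a FASTA file-like object."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The sequence name is the first whitespace-delimited token.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def sam_header_lengths(handle):
    """Return {sequence_name: length} from the @SQ lines of a SAM header."""
    lengths = {}
    for line in handle:
        if not line.startswith("@"):
            break  # header lines come first; stop at the first alignment
        if line.startswith("@SQ"):
            tags = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:])
            lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def compare_references(sam_lengths, fa_lengths):
    """List sequences that are missing from the FASTA or differ in length."""
    problems = []
    for name, sam_len in sam_lengths.items():
        if name not in fa_lengths:
            problems.append(f"{name}: in alignment header but not in FASTA")
        elif fa_lengths[name] != sam_len:
            problems.append(
                f"{name}: header length {sam_len} != FASTA length {fa_lengths[name]}"
            )
    return problems


# Tiny in-memory demo with made-up sequences (hypothetical data).
fasta = io.StringIO(">chr1 test\nACGT\nAC\n>chr2\nGG\n")
header = io.StringIO("@HD\tVN:1.0\n@SQ\tSN:chr1\tLN:6\n@SQ\tSN:chr2\tLN:5\n")
for problem in compare_references(sam_header_lengths(header), fasta_lengths(fasta)):
    print(problem)
```

With real data, the header could be dumped with `samtools view -H` into a text file and opened in place of the `io.StringIO` objects. Any reported mismatch suggests the alignments and the FASTA come from different references.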


                    • #11
                      I found that, for some reason, cufflinks would assemble some transfrags whose coordinates extend past the end of the chromosome.

                      After removing/modifying those records from transcript.gtf generated by cufflinks, cuffmerge could proceed without any problem.

                      Here's an example from my project:

                      chromosome/scaffold/contig name : GL893313.2
                      chromosome/scaffold/contig length : 161573
                      exon coordinate: 161578 ( > chromosome length )

                      Original record:

                      GL893313.2 Cufflinks exon 161457 161578 1000 + . gene_id "CUFF.77262"; transcript_id "CUFF.77262.1"; exon_number "3"; FPKM "1.1077759277"; frac "1.000000"; conf_lo "1.008891"; conf_hi "1.206661"; cov "18.665712";

                      Modified record (end coordinate clipped to the contig length):

                      GL893313.2 Cufflinks exon 161457 161573 1000 + . gene_id "CUFF.77262"; transcript_id "CUFF.77262.1"; exon_number "3"; FPKM "1.1077759277"; frac "1.000000"; conf_lo "1.008891"; conf_hi "1.206661"; cov "18.665712";

                      In my case, it seems that cufflinks only generated these over-long transfrags when assembling on genome sequence contigs (not on chromosomes).
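
The manual fix described above (clipping end coordinates that run past the contig) can be scripted. This is a minimal sketch under a few assumptions: sequence lengths come from a `samtools faidx` index (`.fai`, name in column 1, length in column 2), the GTF is tab-delimited, and the helper names `load_fai_lengths` and `clip_gtf_line` are hypothetical.

```python
def load_fai_lengths(lines):
    """Parse .fai index lines into {sequence_name: length}."""
    lengths = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            lengths[fields[0]] = int(fields[1])
    return lengths


def clip_gtf_line(line, lengths):
    """Clip one GTF record to its sequence length.

    Takes and returns a line without the trailing newline.
    Returns None when the record starts beyond the sequence end.
    """
    if not line.strip() or line.startswith("#"):
        return line  # pass comments and blank lines through
    fields = line.split("\t")
    seqname, start, end = fields[0], int(fields[3]), int(fields[4])
    seq_len = lengths.get(seqname)
    if seq_len is None:
        return line  # sequence not in the index: leave untouched
    if start > seq_len:
        return None  # nothing of this feature lies on the sequence
    if end > seq_len:
        fields[4] = str(seq_len)  # clip the end to the last base
    return "\t".join(fields)


# Demo using the contig and exon from the post (attributes shortened).
lengths = load_fai_lengths(["GL893313.2\t161573\t12\t60\t61\n"])
record = 'GL893313.2\tCufflinks\texon\t161457\t161578\t1000\t+\t.\tgene_id "CUFF.77262";'
print(clip_gtf_line(record, lengths))
```

Dropping a feature that starts beyond the sequence end can leave a transcript without one of its exons, so it is worth reviewing any dropped records by hand before running cuffmerge.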


                      • #12

                        I have been troubleshooting my problem with Geo Pertea, and we found that it arises because CLC (which I mapped my reads with) soft-clips reads when they map past the end of the reference contig.

                        Take for example this (partial) SAM record:

                        502_1735_1931_F3 16 scaffold_10212 558 0 36S39M [etc.]

                        CLC aligned only 39 bases of this read to the end of this short contig (596 bases); the remaining 36 nt of the read hang beyond the contig boundary and are therefore reported as soft clipped (which makes sense). Unfortunately, it looks like Cufflinks didn't exclude the soft-clipped part from further consideration when determining the boundaries of the transfrag. The Tuxedo pipeline (specifically TopHat) does not normally deal with soft-clipped alignments, so I guess that's why we didn't get to test and make Cufflinks work properly with such alignments.
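
The arithmetic behind this can be sketched from the CIGAR string. In the SAM specification, soft-clipped bases (`S`) do not consume reference positions, so they should not extend the aligned span. The snippet below is only an illustration and is not Cufflinks' actual code. The `count_soft_clips` switch is a hypothetical way to mimic a tool that wrongly counts clipped bases.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference positions according to the SAM spec.
REF_CONSUMING = set("MDN=X")


def reference_end(pos, cigar, count_soft_clips=False):
    """Return the 1-based inclusive reference end of an alignment.

    With count_soft_clips=True, soft-clipped bases are (incorrectly)
    treated as if they covered the reference too.
    """
    consumed = 0
    for length, op in CIGAR_OP.findall(cigar):
        if op in REF_CONSUMING or (count_soft_clips and op == "S"):
            consumed += int(length)
    return pos + consumed - 1


contig_length = 596  # scaffold_10212, from the post
correct_end = reference_end(558, "36S39M")
inflated_end = reference_end(558, "36S39M", count_soft_clips=True)
print(correct_end, correct_end <= contig_length)
print(inflated_end, inflated_end - contig_length)
```

For the record above, the 39 matched bases end exactly at position 596, the last base of the contig. Counting the 36 clipped bases as well pushes the end to 632, which is 36 bases past the contig boundary and would trigger the kind of "end coordinate cannot be larger than sequence length" error reported in this thread.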


                        • #13
                          Courtesy of Alex Dobin, this might be useful to those dealing with this problem.

