Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • cufflinks GTF files incompatible with cuffmerge?

    Dear user community,

    I'm getting the error message pasted below when running cuffcompare on a set of 15 GTF files created by cufflinks. The latter part of the message seems to indicate that the cuffmerge error is due to incorrect formatting in the SAM files that cuffmerge is making from the Cufflinks GTF files. The cufflinks user manual, does not state that any modifications/sorting need to be made to the Cufflinks output (transcripts.gtf) file prior to use with cuffmerge. Can anyone provide any insight into what might be triggering the error?

    Thanks for your help,

    Shurjo

    The cuffmerge options I used:

    Code:
     
    #!/bin/bash
    ~/bin/cuffmerge -p 4 -s /home/sensh/bin/bowtie-0.12.7/indexes/hg18_inclusive.fa riboF.gtf.manifest.txt
    The error message:
    Code:
    sensh@kirk cuffmerge]$ less sh.cuffmerge.e767326
    [Mon May 30 20:21:16 2011] Beginning transcriptome assembly merge
    -------------------------------------------
    [Mon May 30 20:21:16 2011] Preparing output location ./merged_asm/
    Warning: no reference GTF provided!
    [Mon May 30 20:21:52 2011] Converting GTF files to SAM
    gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
    [20:21:52] Loading reference annotation.
    gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
    [20:21:56] Loading reference annotation.
    gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
    [20:22:00] Loading reference annotation.
    gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
    [20:22:04] Loading reference annotation.
    gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
    [20:22:08] Loading reference annotation.
    gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
    [20:22:12] Loading reference annotation.
    gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
    [20:22:16] Loading reference annotation.
    gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
    [20:22:20] Loading reference annotation.
    gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
    [20:22:24] Loading reference annotation.
    gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
    [20:22:28] Loading reference annotation.
    gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
    [20:22:31] Loading reference annotation.
    gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
    [20:22:35] Loading reference annotation.
    gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
    [20:22:39] Loading reference annotation.
    gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
    [20:22:43] Loading reference annotation.
    gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
    [20:22:47] Loading reference annotation.
    [Mon May 30 20:22:55 2011] Assembling transcripts
    cufflinks: /usr/lib64/libz.so.1: no version information available (required by cufflinks)
    Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
    [bam_header_read] EOF marker is absent.
    File ./merged_asm/tmp/mergeSam_fileiIWPXL doesn't appear to be a valid BAM file, trying SAM...
    [20:23:16] Inspecting reads and determining fragment length distribution.
    Error: this SAM file doesn't appear to be correctly sorted!
            current hit is at chr11:116987, last one was at chr10:135333670
    Cufflinks requires that if your file has SQ records in
    the SAM header that they appear in the same order as the chromosomes names
    in the alignments.
    If there are no SQ records in the header, or if the header is missing,
    the alignments must be sorted lexicographically by chromsome
    name and by position.
            [FAILED]
    Error: could not execute cufflinks

  • #2
    improperly sorted SAM !

    you have to revise the merge script manually. Please give the sam file a head.
    for version cufflinks_1.0.2, the line 544, you can revise like that:
    header = '''@SQ SN:chr1....
    '''


    I hope this will help you!

    Comment


    • #3
      Hi fgh1124,

      Many thanks for your advice. I am not much of an expert in Python, so it would be great if you could provide a bit more guidance on the edit. If I understand you correctly, this is the part of the cuffmerge script that needs editing (around line 544):
      Code:
       # Merge the primary assembly SAMs into a single input SAM file
                  merged_sam_filename = merge_sam_inputs(sam_input_files, header)
                  # Run cufflinks on the primary assembly transfrags to generate a meta-assembly
                  cufflinks(output_dir, merged_sam_filename, params.min_isoform_frac, params.ref_gtf)
                  compare_meta_asm_against_ref(params.ref_gtf, params.fasta, output_dir+"/transcripts.gtf")
      What you are suggesting is to define "header" with the correct string for the @SQ part of the SAM header, right? If so, can you reproduce the relevant part from your "fixed" cuffmerge script here?

      Once again, thanks for your assistance,

      Shurjo

      Comment


      • #4
        Hi there,

        I get the same error as Shurjo, and am not able to fix it. Please can someone give more elaborate instructions about what to change the source to? Do I also have to give the gtf_to_sam files that are now in the tmp directory headers manually? Sorry, a bit clueless at the moment.

        Thanks,

        Anelda

        Comment


        • #5
          The cuffmerge script appends a header after it merges all the gtf-to-sam converted files. The header is not sorted so it gives you that error.

          Here is what I did to fix it:

          In the def header_for_chrom_info(chrom_info) function in the cuffmerge script, take out:
          Code:
              for chrom, limits in chrom_info.iteritems():
                  line = "@SQ\tSN:%s\tLN:\t%d" % (chrom, limits[1])
                  header_strs.append(line)
          replace with:
          Code:
              chromSort = chrom_info.keys()
              chromSort.sort()
          
              for key in chromSort:
                  chrom = key
                  limits = chrom_info[key]
          	line = "@SQ\tSN:%s\tLN:\t%d" % (chrom, limits[1])
                  header_strs.append(line)
          Make sure the indentation spacing is correct. Python delimits code with spacing.

          Comment


          • #6
            Hi damiankao,

            Thanks for this. Unfortunately, I still get the same error. I'm wondering if it's got something to do with my GTF reference file? My reference GTF file is not sorted according to chromosome.

            The header in the merged file from cuffmerge/tmp looks like this:

            @HD VN:1.0 SO:coordinate
            @SQ SN:Mt LN: 565665
            @PG ID:cuffmerge VN:1.0.0

            @SQ SN:Pt LN: 140361
            @PG ID:cuffmerge VN:1.0.0

            @SQ SN:UNKNOWN LN: 7035778
            @PG ID:cuffmerge VN:1.0.0

            @SQ SN:chr1 LN: 301330722
            @PG ID:cuffmerge VN:1.0.0

            @SQ SN:chr10 LN: 150112509
            @PG ID:cuffmerge VN:1.0.0

            @SQ SN:chr2 LN: 237039364
            @PG ID:cuffmerge VN:1.0.0

            @SQ SN:chr3 LN: 232096453
            @PG ID:cuffmerge VN:1.0.0

            @SQ SN:chr4 LN: 241455206
            @PG ID:cuffmerge VN:1.0.0

            @SQ SN:chr5 LN: 217850392
            @PG ID:cuffmerge VN:1.0.0

            @SQ SN:chr6 LN: 169128727
            @PG ID:cuffmerge VN:1.0.0

            @SQ SN:chr7 LN: 176764628
            @PG ID:cuffmerge VN:1.0.0

            @SQ SN:chr8 LN: 175787611
            @PG ID:cuffmerge VN:1.0.0

            @SQ SN:chr9 LN: 156743307
            @PG ID:cuffmerge VN:1.0.0


            The first chromosome after the header is chr1 and not Mt. The order of the rest of the SAM file is chr1, chr10, chr2, etc, followed by Mt, Pt, Unknown. The error message comes when the software reads chr2 after chr10.

            This may be obvious, but I just can't get it to work and has been on it trying all kinds of hacks for the last 24 hours.

            Thanks again!

            [Edit] I changed the headers to follow the order of the chromosomes in the mapping part of the SAM file, i.e. 1,10,2-9,Mt,Pt,Unknown, but still same problem.

            [Edit] Incorrect Python indentation (when I altered the source) caused the header to be different from damiankao's. I've fixed the indentation
            Last edited by Anelda; 06-01-2011, 12:34 AM. Reason: Rerun some analysis

            Comment


            • #7
              Is that exactly how your headers look? My headers look something like this:

              @HD VN:1.0 SO:coordinate
              @SQ SN:scaff000001 LN: 1129481
              @SQ SN:scaff000002 LN: 9211615
              @SQ SN:scaff000003 LN: 8085218
              @SQ SN:scaff000004 LN: 5217226
              @PG ID:cuffmerge VN:1.0.0

              There is no @PG line after every @SQ line.

              Are you sure those MT,Pt, Unknown reference contigs are correctly formatted in the gtf file?

              Comment


              • #8
                That's copying and pasting from the file - exactly.
                BTW I got everything running after (1) changing the code as you suggested, (2) using a gtf reference file that was sorted in the same way as the merged file, (3) changing the merged file's header to have the same order as the sequences below the header (even though the @PG still appears everytime after the @SQ.

                Now I have a new problem. The largest FPKM I get ==1 and most FPKM ==0 in the new transcripts.gtf file??

                Comment


                • #9
                  I dont't think cuffmerge will merge your fpkms. Its really just for merging the annotations.

                  Comment


                  • #10
                    I'm a bit surprised when I look at the run.log. cuffmerge seems to be running cufflinks after sorting the merged file.

                    "cufflinks -o R.last.try/ -F 0.05 -g /home/anelda/working_dir/ZmB73_5a_WGS.gtf -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 -p 7 R.last.try/tmp/mergeSam_file0fzuAM"

                    I was under the impression that cuffmerge "also handles running cuffcompare for you" but I don't see cuffcompare being called. (If I understand the workflow correctly?)

                    My FPKMs has definitely changed from the original files. Do you know what cuffmerge will do with FPKM if an entry was found in all 3 files in the manifest for a single gene? How does it handle the 3 different FPKMs for the different samples? I may not understand the way Cuffmerge work?

                    Thanks.

                    [Edit] I just realized that because I change the merge file in /tmp manually halfway through and then run the commands from the log, I actually never run cuffmerge like it should be run. Now changing cuffmerge source to do my header sort in the same way as what the reads appear in the rest of the file. I know this is a horrible way to fix my problem. Please let me know if someone has better ideas?
                    Last edited by Anelda; 05-31-2011, 10:31 PM.

                    Comment


                    • #11
                      Hi damiankao,

                      Your fix worked for me as well. The ending of the STDERR message is a bit abrupt, though it does not hint at any fatal exceptions. I have pasted the output below. Would you be able to tell me if this is normal? Also, if it is not too hard, could you explain the changes in the code that are made by your fix? I'm not up to speed with Python yet.

                      Thanks again,

                      Shurjo

                      Code:
                      [sensh@kirk cuffmerge]$ less sh.cuffmerge.e767629
                      gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
                      [11:06:09] Loading reference annotation.
                      gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
                      [11:06:13] Loading reference annotation.
                      gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
                      [11:06:17] Loading reference annotation.
                      gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
                      [11:06:21] Loading reference annotation.
                      gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
                      [11:06:25] Loading reference annotation.
                      gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
                      [11:06:29] Loading reference annotation.
                      gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
                      [11:06:32] Loading reference annotation.
                      gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
                      [11:06:36] Loading reference annotation.
                      gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
                      [11:06:40] Loading reference annotation.
                      gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
                      [11:06:44] Loading reference annotation.
                      gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
                      [11:06:48] Loading reference annotation.
                      gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
                      [11:06:52] Loading reference annotation.
                      gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
                      [11:06:55] Loading reference annotation.
                      gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
                      [11:06:59] Loading reference annotation.
                      gtf_to_sam: /usr/lib64/libz.so.1: no version information available (required by gtf_to_sam)
                      [11:07:03] Loading reference annotation.
                      [Tue May 31 11:07:10 2011] Quantitating transcripts
                      cufflinks: /usr/lib64/libz.so.1: no version information available (required by cufflinks)
                      Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
                      [bam_header_read] EOF marker is absent.
                      File ./merged_asm/tmp/mergeSam_fileFf8Txm doesn't appear to be a valid BAM file, trying SAM...
                      [11:07:31] Loading reference annotation.
                      [11:07:35] Inspecting reads and determining fragment length distribution.
                      Processed 19880 loci.
                      Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct paramaters (--frag-len-mean and --frag-len-std-dev) be provided.
                      > Map Properties:
                      >       Total Map Mass: 528465.00
                      >       Read Type: 70637bp single-end
                      >       Fragment Length Distribution: Truncated Gaussian (default)
                      >                     Default Mean: 200
                      >                  Default Std Dev: 80
                      [11:07:49] Assembling transcripts and estimating abundances.
                      Processed 19880 loci.
                      [Tue May 31 11:14:00 2011] Comparing against reference file RefSeqHG18_gene_level.gtf
                      Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
                      Cuffcompare prefix for output files: tmp_meta_asm
                      [Tue May 31 11:15:08 2011] Comparing against reference file RefSeqHG18_gene_level.gtf
                      Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).
                      Cuffcompare prefix for output files: tmp_meta_asm

                      Comment


                      • #12
                        Hi Shurjo,

                        I am by no means an expert on cufflinks or how it works so I can't tell you if that's what you are supposed to get. But I can say the log I got looked similar to yours. Did you get transcripts.gtf and merge.gtf in the output directory? If you did, I think that means it worked.

                        And about the fix I made. The cuffmerge code seems to prepare the gtf files by converting each of your gtf files into sam and then merging all the sams together into one file for further processing.

                        At the merging step, it concats the .sam files and then sorts it. It then appends a header that it previously hashed from all the gtf file's reference contig columns. Then it iterates through each header in the hash and outputs a @SQ line. But hashes aren't sorted in python.

                        So to fix that, I sorted the hash by its keys and then output the @SQ lines by the sorted keys so the @SQ headers are sorted in the same order as the sam mapping entries.

                        Comment


                        • #13
                          Great, thanks!

                          Comment


                          • #14
                            For me the error was due to the collation sequence in linux sort seemed to be different from python's sort - set the environment variable LC_ALL to C fixed the issue.
                            e.g. LC_ALL=C;export LC_ALL

                            Comment


                            • #15
                              I am running cuffmerge to combine some of samples GTF files. And error is showing as follow. Would you please tell me what I should do to fix the problem? Thank you so much!


                              $ /usr/local/cufflinks-1.0.3/bin/cuffmerge -s /reference/hg19.fasta assembly_GTF_list.txt.txt

                              [Fri Jul 1 11:33:03 2011] Beginning transcriptome assembly merge
                              -------------------------------------------

                              [Fri Jul 1 11:33:03 2011] Preparing output location ./merged_asm/
                              Warning: no reference GTF provided!
                              [Fri Jul 1 11:33:03 2011] Converting GTF files to SAM
                              [11:33:03] Loading reference annotation.
                              [11:33:03] Loading reference annotation.
                              [11:33:03] Loading reference annotation.
                              [11:33:03] Loading reference annotation.
                              [11:33:03] Loading reference annotation.
                              [11:33:03] Loading reference annotation.
                              [11:33:03] Loading reference annotation.
                              [11:33:03] Loading reference annotation.
                              [11:33:03] Loading reference annotation.
                              [11:33:03] Loading reference annotation.
                              [11:33:03] Loading reference annotation.
                              [11:33:03] Loading reference annotation.
                              [11:33:03] Loading reference annotation.
                              [11:33:03] Loading reference annotation.
                              [Fri Jul 1 11:33:03 2011] Assembling transcripts
                              cufflinks: /usr/lib64/libz.so.1: no version information available (required by cufflinks)
                              [bam_header_read] EOF marker is absent.
                              File ./merged_asm/tmp/mergeSam_fileaw7mto doesn't appear to be a valid BAM file, trying SAM...
                              [11:33:03] Inspecting reads and determining fragment length distribution.
                              Processed 121 loci. > Map Properties:
                              > Total Map Mass: 968.00
                              > Read Type: 173bp single-end
                              > Fragment Length Distribution: Gaussian (default)
                              > Estimated Mean: 246.97
                              > Estimated Std Dev: 52.49
                              [11:33:03] Assembling transcripts and estimating abundances.
                              Processed 121 loci. [Fri Jul 1 11:33:05 2011] Comparing against reference file None
                              Traceback (most recent call last):
                              File "/usr/local/cufflinks/bin/cuffmerge", line 573, in ?
                              sys.exit(main())
                              File "/usr/local/cufflinks/bin/cuffmerge", line 556, in main
                              compare_meta_asm_against_ref(params.ref_gtf, params.fasta, output_dir+"/transcripts.gtf")
                              File "/usr/local/cufflinks/bin/cuffmerge", line 410, in compare_meta_asm_against_ref
                              f_tmap = open(tmap)
                              IOError: [Errno 2] No such file or directory: './merged_asm/tmp_meta_asm.transcripts.gtf.tmap'

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                Non-Coding RNA Research and Technologies
                                by seqadmin




                                Non-coding RNAs (ncRNAs) do not code for proteins but play important roles in numerous cellular processes including gene silencing, developmental pathways, and more. There are numerous types including microRNA (miRNA), long ncRNA (lncRNA), circular RNA (circRNA), and more. In this article, we discuss innovative ncRNA research and explore recent technological advancements that improve the study of ncRNAs.

                                Nobel Prize for MicroRNA Discovery
                                This week,...
                                10-07-2024, 08:07 AM
                              • seqadmin
                                Recent Developments in Metagenomics
                                by seqadmin





                                Metagenomics has improved the way researchers study microorganisms across diverse environments. Historically, studying microorganisms relied on culturing them in the lab, a method that limits the investigation of many species since most are unculturable1. Metagenomics overcomes these issues by allowing the study of microorganisms regardless of their ability to be cultured or the environments they inhabit. Over time, the field has evolved, especially with the advent...
                                09-23-2024, 06:35 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, 10-02-2024, 04:51 AM
                              0 responses
                              101 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 10-01-2024, 07:10 AM
                              0 responses
                              110 views
                              0 likes
                              Last Post seqadmin  
                              Started by seqadmin, 09-30-2024, 08:33 AM
                              1 response
                              114 views
                              0 likes
                              Last Post EmiTom
                              by EmiTom
                               
                              Started by seqadmin, 09-26-2024, 12:57 PM
                              0 responses
                              20 views
                              0 likes
                              Last Post seqadmin  
                              Working...
                              X