Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Cuffmerge error: loading reference annotation failed?

    Hi,

    I am trying to run cuffmerge for quite some time now but so far I always get the same message:

    Code:
    [Thu Apr 19 18:41:47 2012] Beginning transcriptome assembly merge
    -------------------------------------------
    
    [Thu Apr 19 18:41:47 2012] Preparing output location ./merged_asm/
    [Thu Apr 19 18:41:55 2012] Converting GTF files to SAM
    [18:41:55] Loading reference annotation.
    [18:42:01] Loading reference annotation.
    [Thu Apr 19 18:42:13 2012] Quantitating transcripts
    You are using Cufflinks v1.3.0, which is the most recent release.
    Command line:
    cufflinks -o ./merged_asm/ -F 0.05 -g Homo_sapiens.GRCh37.62.filter1.sorted.gtf -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 --no-5-extend -p 12 ./merged_asm/tmp/mergeSam_file54ILK4 
    [bam_header_read] EOF marker is absent. The input is probably truncated.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    File ./merged_asm/tmp/mergeSam_file54ILK4 doesn't appear to be a valid BAM file, trying SAM...
    [18:42:13] Loading reference annotation.
            [FAILED]
    Error: could not execute cufflinks
    So, reading in the forum here I applied two strategies to try to solve the error:

    I applied the change in the python script for the header_for_chrom_info function as mentioned by damiankao in another threat:
    Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


    Furthermore, I sorted the reference GTF, so that they have all the same order as the merged SAM file.

    I followed so far the tuxedo protocol given by Cole Trapnell et al., my version is:

    cufflinks-1.3.0.

    I guess that I am still missing something but I wonder if there is a script now that works for reference gtf files ( In my case Homo_sapiens.GRCh37.62.gtf where I changed the Chromosome IDs to chr1, chr 2, ... chrM, chrX, chrY.

    Here is the header of the mergeSam file:

    Code:
    @HD     VN:1.0  SO:coordinate
    @SQ     SN:chr1 LN:     249231242
    @SQ     SN:chr10        LN:     135516024
    @SQ     SN:chr11        LN:     134945793
    @SQ     SN:chr12        LN:     133815135
    @SQ     SN:chr13        LN:     115099423
    @SQ     SN:chr14        LN:     107288019
    @SQ     SN:chr15        LN:     102519298
    @SQ     SN:chr16        LN:     90237391
    @SQ     SN:chr17        LN:     81188573
    @SQ     SN:chr18        LN:     78005429
    @SQ     SN:chr19        LN:     59111168
    @SQ     SN:chr2 LN:     243102304
    @SQ     SN:chr20        LN:     62959443
    @SQ     SN:chr21        LN:     48111157
    @SQ     SN:chr22        LN:     51239737
    @SQ     SN:chr3 LN:     197955247
    @SQ     SN:chr4 LN:     191013476
    @SQ     SN:chr5 LN:     180899431
    @SQ     SN:chr6 LN:     171055065
    @SQ     SN:chr7 LN:     159026067
    @SQ     SN:chr8 LN:     146281416
    @SQ     SN:chr9 LN:     141150148
    @SQ     SN:chrM LN:     16023
    @SQ     SN:chrX LN:     155257848
    @SQ     SN:chrY LN:     59002251
    @PG     ID:cuffmerge    VN:1.0.0
    So, I wonder if it is possible, given a bowtie index with chr1, chr2. etc. entries instead of redoing the bowtie index with the ensemble identifiers. Would be great if someone would know the missing link

    Marc

  • #2
    no luck with cuffdiff either

    So, when using cuffdiff and say :

    cuffdiff -N -o diff_out -p 1 -L C1,C2 -u Homo_sapiens.GRCh37.62.gtf ~/s_1_sequence_gtf_tophat_accepted_hits.bam ~/s_2_sequence_gtf_tophat_accepted_hits.bam

    I got the error:
    [11:29:30] Loading reference annotation.
    /opt/sge-6_2u4/default/spool/sunnode30/job_scripts/86547: line 12: 16032 Segmentation fault (core dumped) cuffdiff -N -o diff_out -p $NSLOTS -L C1,C2 -u Homo_sapiens.GRCh37.62.filter1.sorted3.gtf ~/s_1_sequence_gtf_tophat_accepted_hits.bam ~/s_2_sequence_gtf_tophat_accepted_hits.bam

    For some reasons, the problem always points down to my reference. Funny enough, when I leave out chrX chrY and chrM, it is running beyond the Loading reference step and probably cuffmerge would as well.
    Could anyone provide me a reference for human, which worked, given that your bowtie index is:
    bowtie-inspect hg19 --names
    chr1
    chr2
    chr3
    chr4
    chr5
    chr6
    chr7
    chr8
    chr9
    chr10
    chr11
    chr12
    chr13
    chr14
    chr15
    chr16
    chr17
    chr18
    chr19
    chr20
    chr21
    chr22
    chrX
    chrY
    chrM

    Marc

    Comment


    • #3
      ok, the cuttdiff part seems to run now for 2 sets, one having a gtf with only chrX chrY and chrM. The sorting with one gtf file was done according to the header file I got for cuffmerge but is not running? So what could go wrong...

      Given the ensembl reference, I first substitute the chromosome IDs so that they correspond to the bowtie-index. Then I sort out each chromosome:

      Code:
      for i in 1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 3 4 5 6 7 8 9 M X Y; 
      do 
      grep "^chr$i[[:blank:]]" Homo_sapiens.GRCh37.62.filter1.sorted.gtf > chr${i}_Homo_sapiens.GRCh37.62.filter3.sorted.gtf;
      done
      and then I combine in the "right " order with cat.

      I am not sure yet how this could result into a corrupted gtf file.

      Comment


      • #4
        Seems like a very strange error. I've never had to resort GTF files in order to get them to work with Cufflinks/merge/diff. I'm surprised the GTF files is giving you this kind of trouble.

        I almost always use GTF files from UCSC that I edit in the official gene names to make the DE output more readable. Aside from swapping in gene names I don't change anything. I'm also working with bowtie references I built from FASTA files downloaded from UCSC.
        /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
        Salk Institute for Biological Studies, La Jolla, CA, USA */

        Comment


        • #5
          Running now with the newer Homo_sapiens.GRCh37.66.gtf and it is fine so far.
          I guess my old reference file was somehow corrupted.

          Comment

          Latest Articles

          Collapse

          • seqadmin
            Exploring the Dynamics of the Tumor Microenvironment
            by seqadmin




            The complexity of cancer is clearly demonstrated in the diverse ecosystem of the tumor microenvironment (TME). The TME is made up of numerous cell types and its development begins with the changes that happen during oncogenesis. “Genomic mutations, copy number changes, epigenetic alterations, and alternative gene expression occur to varying degrees within the affected tumor cells,” explained Andrea O’Hara, Ph.D., Strategic Technical Specialist at Azenta. “As...
            07-08-2024, 03:19 PM
          • seqadmin
            Exploring Human Diversity Through Large-Scale Omics
            by seqadmin


            In 2003, researchers from the Human Genome Project (HGP) announced the most comprehensive genome to date1. Although the genome wasn’t fully completed until nearly 20 years later2, numerous large-scale projects, such as the International HapMap Project and 1000 Genomes Project, continued the HGP's work, capturing extensive variation and genomic diversity within humans. Recently, newer initiatives have significantly increased in scale and expanded beyond genomics, offering a more detailed...
            06-25-2024, 06:43 AM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by seqadmin, 07-19-2024, 07:20 AM
          0 responses
          28 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 07-16-2024, 05:49 AM
          0 responses
          41 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 07-15-2024, 06:53 AM
          0 responses
          46 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 07-10-2024, 07:30 AM
          0 responses
          43 views
          0 likes
          Last Post seqadmin  
          Working...
          X