  • Tophat -> Cufflinks: BAM header too large

    I am getting this error message when running Cufflinks on a BAM file created by Tophat. Versions: Tophat 1.3.3, Cufflinks 1.1.0, Bowtie 0.12.7 and Samtools 0.1.18.

    Tophat command:
    Code:
    /home/matthew/tophat-1.3.3/tophat -p 16 -r 195 -z pbzip2 --mate-std-dev 50 /media/hd2/tuco/bowtie.index/tuco7 \
    /media/hd2/tuco/unshuff/sgatrim/nocontam/MDM01_index1_qualshuff.nohomo.1.fq /media/hd2/tuco/unshuff/sgatrim/nocontam/MDM01_index1_qualshuff.nohomo.2.fq
    Cufflinks:
    Code:
    /home/matthew/cufflinks/cufflinks -p16 -u -o /media/hd2/tuco/tophat/406A/cuff \
    -b /media/hd2/tuco/bowtie.index/tuco.fa --upper-quartile-norm --max-mle-iterations 20000 \
    --num-importance-samples 10000 /media/hd2/tuco/tophat/406A/tophat_out/accepted_hits.bam
    Tophat finishes without error, but Cufflinks does not:

    You are using Cufflinks v1.1.0, which is the most recent release.
    Warning: BAM header too large
    File /media/hd2/tuco/tophat/406A/tophat_out/accepted_hits.bam doesn't appear to be a valid BAM file, trying SAM...
    [21:18:11] Inspecting reads and determining fragment length distribution.
    SAM error on line 25873: CIGAR op has zero length
    SAM error on line 26633: CIGAR op has zero length
    ...

  • #2
    One other thing I forgot to include above:

    The first few lines of the SAM file:

    Code:
    @HD	VN:1.0	SO:coordinate
    @SQ	SN:10000084.208.674	LN:208
    @SQ	SN:1000016.233.27383	LN:233
    @SQ	SN:10000164.283.623	LN:283
    @SQ	SN:10000188.1527.11468	LN:1527
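To check how close a header like this is to the 4 MB limit discussed in this thread, you can estimate the size of the uncompressed binary BAM header from the SAM header text. The sketch below follows the header layout in the SAM/BAM specification: the magic string, l_text, the header text, n_ref, then l_name, the NUL-terminated name and l_ref for every @SQ line. It assumes Cufflinks compares its limit against roughly this quantity; the exact check in hits.cpp may differ, and the function name bam_header_size is hypothetical.

```python
# Header size limit quoted in this thread for older Cufflinks releases (4 MB).
MAX_HEADER_LEN = 4 * 1024 * 1024


def bam_header_size(sam_header_text: str) -> int:
    """Estimate the uncompressed size in bytes of a BAM header.

    Layout follows the SAM/BAM specification: magic "BAM\\1" (4 bytes),
    l_text (int32), the plain-text header (l_text bytes), n_ref (int32),
    then for each reference: l_name (int32), name + NUL (l_name bytes),
    and l_ref (int32).
    """
    size = 4 + 4 + len(sam_header_text.encode("utf-8")) + 4
    for line in sam_header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Find the SN:<name> field in the tab-separated @SQ line.
        for field in line.split("\t")[1:]:
            if field.startswith("SN:"):
                name = field[3:].encode("utf-8")
                size += 4 + len(name) + 1 + 4
                break
    return size
```

For example, dump the header with samtools view -H accepted_hits.bam > header.sam and pass the file's contents to bam_header_size; a result above MAX_HEADER_LEN would be consistent with the "BAM header too large" warning. With hundreds of thousands of contigs, the per-reference bytes and the @SQ text both grow with the number of sequences and the length of their names.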


    • #3
      Lame... I am having the same issue, and it looks like no one has responded to you. Have you figured this out yourself yet? I am wondering: are you also using this on a highly fragmented de novo assembly with a few hundred thousand contigs/scaffolds? Maybe Cufflinks doesn't work when the assembly has a large number of fragments?


      • #4
        Maybe figured this out

        Hello,
        I noticed that the same general question was posted on Stack Exchange and didn't have an answer there either. To summarize: I changed the maximum header length constant in hits.cpp (line 731 in v1.3.0) from 4 MB to the following:

        Code:
        static const unsigned MAX_HEADER_LEN = 6 * 1024 * 1024; // 6 MB
        After changing that, the program appears to be proceeding normally.
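If you have the Cufflinks source and want to apply the same change without editing hits.cpp by hand, here is a minimal sketch that rewrites the MAX_HEADER_LEN definition. It assumes the line is formatted as quoted in this thread (static const unsigned MAX_HEADER_LEN = 4 * 1024 * 1024;); the helper names and the regex are hypothetical, and Cufflinks still has to be rebuilt from source afterwards.

```python
import re
from pathlib import Path

# Hypothetical pattern for the definition quoted in this thread, e.g.
# static const unsigned MAX_HEADER_LEN = 4 * 1024 * 1024; // 4 MB
_MAX_HEADER_RE = re.compile(
    r"(static\s+const\s+unsigned\s+MAX_HEADER_LEN\s*=\s*)\d+"
    r"(\s*\*\s*1024\s*\*\s*1024\s*;)(?:[ \t]*//[^\n]*)?"
)


def patch_max_header_len(source: str, megabytes: int) -> str:
    """Return `source` with MAX_HEADER_LEN set to `megabytes` MB."""
    new_source, count = _MAX_HEADER_RE.subn(
        lambda m: f"{m.group(1)}{megabytes}{m.group(2)} // {megabytes} MB", source
    )
    if count != 1:
        raise ValueError(f"expected one MAX_HEADER_LEN definition, found {count}")
    return new_source


def patch_file(path: str, megabytes: int) -> None:
    """Rewrite the file in place (keep a backup copy of hits.cpp first)."""
    p = Path(path)
    p.write_text(patch_max_header_len(p.read_text(), megabytes))
```

Raising an error unless exactly one definition matches is deliberate: if the line was reformatted in a later release, it is safer to fail than to silently leave the limit unchanged.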

        To see my full previous post on this go to the stack exchange site:



        good luck!
        Last edited by jstjohn; 01-04-2012, 11:13 AM. Reason: Modified the link to point to my answer on biostar rather than the question.


        • #5
          Problems with warning "BAM header too large" using Cufflinks2 on Linux server

          Hi jstjohn,
          I am having a similar problem to yours when trying to run Cufflinks on the accepted_hits.bam output from TopHat.


          Here is the output of the log file:

          Command line:
          cufflinks -o /outfile_location -p 16 -g /gtf_file_location -v --no-update-check -u -b /ref_fasta_location --max-bundle-frags 1000000000 /accepted_hits.bam_location
          Warning: BAM header too large
          File accepted_hits.bam doesn't appear to be a valid BAM file, trying SAM...
          [17:10:06] Loading reference annotation.
          GFF warning: merging adjacent/overlapping segments of ENSOANT00000031404 on Contig9854 (16061-16163, 16168-16239)
          GFF warning: merging adjacent/overlapping segments of ENSOANT00000023588 on Contig9784 (7502-7816, 7821-8557)
          GFF warning: merging adjacent/overlapping segments of ENSOANT00000023588 on Contig9784 (8582-8714, 8717-8998)


          The genome I am using is very fragmented (it contains about 200,000 contigs in addition to the chromosomes), and the BAM header is around 5.5 MB. However, I read in the Cufflinks2 manual that: " The header size limit in Cufflinks' BAM parser used to have a 4 megabyte limit. This has been removed to allow Cufflinks to be used on assemblies with many contigs. "

          I have looked online for help with this issue, and some people have suggested changing the source code in the hits.cpp file (line 736: static const unsigned MAX_HEADER_LEN = 4 * 1024 * 1024; // 4 MB) in the Windows version, but there does not seem to be an equivalent file in the Linux version.

          Any help will be greatly appreciated.


          • #6
            Hi,

            I have the same problem. I am using Cufflinks v2. Has anyone found a solution to this?
            I don't have the skills to change the Cufflinks source code, unfortunately...

            Jon


            • #7
              Would anyone have a Cufflinks 2 build, compiled from source and modified to allow larger BAM headers, that they would be willing to share? I would need one that runs on Linux x86_64.


              • #8
                The "BAM header too large" issue is caused by the genome file you used to build the Bowtie/Bowtie2 index. To resolve it, clean up the genome file by removing all scaffold sequences that do not appear in your GTF file, then rebuild the index and rerun TopHat.

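A minimal sketch of that clean-up step, assuming the GTF sequence names (column 1) match the first word of each FASTA title; the function and file names are hypothetical. After filtering, the index would be rebuilt (e.g. with bowtie-build) and TopHat rerun. Note that reads originating from the dropped scaffolds may then align elsewhere, so this trades header size for some risk of misplaced reads.

```python
from typing import Iterable, Iterator, Set


def gtf_seqnames(gtf_lines: Iterable[str]) -> Set[str]:
    """Collect sequence names (GTF column 1), skipping comments and blank lines."""
    names: Set[str] = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        names.add(line.split("\t", 1)[0])
    return names


def filter_fasta(fasta_lines: Iterable[str], keep: Set[str]) -> Iterator[str]:
    """Yield lines of FASTA records whose ID (first word after '>') is in `keep`."""
    keeping = False
    for line in fasta_lines:
        if line.startswith(">"):
            words = line[1:].split()
            keeping = bool(words) and words[0] in keep
        if keeping:
            yield line


def filter_genome(gtf_path: str, fasta_path: str, out_path: str) -> None:
    """Hypothetical driver: write only the annotated sequences to out_path."""
    with open(gtf_path) as gtf:
        keep = gtf_seqnames(gtf)
    with open(fasta_path) as fa, open(out_path, "w") as out:
        out.writelines(filter_fasta(fa, keep))
```

Streaming the FASTA line by line keeps memory use low even for a genome with hundreds of thousands of scaffolds.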

                • #9
                  One alternative is to replace the fragmented scaffolds with a pseudochromosome when running TopHat/Cufflinks/Cuffdiff.
                  I don't know whether this is feasible, or whether it would affect the downstream expression estimates and differential expression tests.

                  Is it worth a try?
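One way such a pseudochromosome could be built, sketched under assumptions the post does not state: concatenate the scaffolds with runs of N as spacers, and shift each GTF record by its scaffold's offset so the annotation still lines up. The helper names, the pseudochromosome name and the 100 bp default spacer are hypothetical. One caveat: TopHat may report spliced alignments, and Cufflinks may assemble transcripts, that bridge two scaffolds unless the spacer is longer than the maximum intron length allowed.

```python
from typing import Dict, List, Tuple


def build_pseudochromosome(
    scaffolds: List[Tuple[str, str]], spacer: int = 100
) -> Tuple[str, Dict[str, int]]:
    """Concatenate (name, sequence) scaffolds separated by runs of 'N'.

    Returns the joined sequence and a map from scaffold name to its
    0-based start offset in the pseudochromosome.
    """
    parts: List[str] = []
    offsets: Dict[str, int] = {}
    pos = 0
    for i, (name, seq) in enumerate(scaffolds):
        if i > 0:
            parts.append("N" * spacer)
            pos += spacer
        offsets[name] = pos
        parts.append(seq)
        pos += len(seq)
    return "".join(parts), offsets


def lift_gtf_line(line: str, offsets: Dict[str, int], new_name: str) -> str:
    """Move a GTF record onto the pseudochromosome.

    GTF start/end (columns 4 and 5) are 1-based, so a scaffold position x
    becomes x + offset, where offset is the scaffold's 0-based start.
    """
    cols = line.rstrip("\n").split("\t")
    shift = offsets[cols[0]]
    cols[0] = new_name
    cols[3] = str(int(cols[3]) + shift)
    cols[4] = str(int(cols[4]) + shift)
    return "\t".join(cols) + "\n"
```

Keeping the offsets map also lets you translate results back to scaffold coordinates later.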


                  • #10
                    I also had this problem on a shared machine where I couldn't recompile the code. My transcriptome was pretty poorly assembled, so filtering out sequences with low read counts got the header size down to 4.1 MB. I then used sed to strip regex matches (like Genus_sp) from the FASTA titles in the header, which brought the header size down to 3.9 MB. After reheadering accepted_hits.bam with the truncated titles, Cufflinks ran it just fine...
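The renaming step can be sketched in Python as an alternative to sed. This is a minimal sketch, assuming you work on a header dumped with samtools view -H accepted_hits.bam; the function name and the example pattern are hypothetical. Reference order must be kept, because BAM records point to references by numeric index rather than by name. The new header can then be applied with samtools reheader new_header.sam accepted_hits.bam > accepted_hits.renamed.bam, and any GTF or FASTA passed to Cufflinks must use the same shortened names.

```python
import re


def shorten_header_names(header_text: str, pattern: str) -> str:
    """Remove regex `pattern` from every @SQ SN: name in a SAM header.

    Line order is preserved so each reference keeps its index. Raises
    ValueError if a name becomes empty or collides with an earlier one.
    """
    regex = re.compile(pattern)
    seen = set()
    out_lines = []
    for line in header_text.splitlines(keepends=True):
        if line.startswith("@SQ"):
            fields = line.rstrip("\n").split("\t")
            for i, field in enumerate(fields):
                if field.startswith("SN:"):
                    new_name = regex.sub("", field[3:])
                    if not new_name or new_name in seen:
                        raise ValueError(f"renaming {field[3:]!r} gives an empty or duplicate name")
                    seen.add(new_name)
                    fields[i] = "SN:" + new_name
            line = "\t".join(fields) + ("\n" if line.endswith("\n") else "")
        out_lines.append(line)
    return "".join(out_lines)
```

The duplicate check matters: if two contigs collapse to the same shortened name, Cufflinks would no longer be able to tell them apart.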
