  • Cufflinks, BAM header problem solved... for the moment

    Hi All,

    We were getting this error when running Cufflinks on a BAM file:

    [ross@bioinfo tophat_test]$ cufflinks accepted_hits.bam
    You are using Cufflinks v1.1.0, which is the most recent release.
    Warning: BAM header too large
    File accepted_hits.bam doesn't appear to be a valid BAM file, trying SAM...
    [14:16:27] Inspecting reads and determining fragment length distribution.
    SAM error on line 1678: CIGAR op has zero length
    SAM error on line 2017: CIGAR op has zero length
    SAM error on line 2025: CIGAR op has zero length
    ...............
    ..........
    ....

    I have edited the hits.cpp file (line 590) in the source files from:

    static const unsigned MAX_HEADER_LEN = 4 * 1024 * 1024; // 4 MB
    to
    static const unsigned MAX_HEADER_LEN = 7 * 1024 * 1024; // 7 MB

    and ran "make" again. It seems to have fixed the problem.

    I would appreciate it if anyone has a link to the BAM file format specification, since this edit may cause a problem later on. I had a bit of a hunt around, but couldn't find it.

    Thanks

    Ross

  • #2
    A description of the SAM/BAM file format can be found here



    The BAM details start at section 3.

    Having a BAM header larger than 4 MB seems a bit odd, but I guess not impossible, especially if you have a lot of reference sequences. You could do a "samtools view -H", where -H means output header only, to see if the header is indeed larger than 4 MB.

    Justin
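If samtools is not to hand, the header size can also be read straight from the BAM file. The following is a minimal Python sketch, not anything taken from Cufflinks: it assumes the BAM layout described in the SAM/BAM specification (magic bytes `BAM\1` followed by a 32-bit little-endian header text length, `l_text`) and assumes the 4 MB limit is compared against that length. The function names are made up for illustration.

```python
import gzip
import struct

# 4 MB limit quoted from hits.cpp in the first post (Cufflinks v1.1.0).
MAX_HEADER_LEN = 4 * 1024 * 1024


def bam_header_text_length(path):
    """Return l_text, the byte length of the plain-text header in a BAM file."""
    # BGZF blocks are valid concatenated gzip members, so gzip.open can read them.
    with gzip.open(path, "rb") as handle:
        if handle.read(4) != b"BAM\x01":
            raise ValueError(f"{path} does not start with the BAM magic bytes")
        raw = handle.read(4)
        if len(raw) != 4:
            raise ValueError(f"{path} is truncated before the header length field")
        (l_text,) = struct.unpack("<I", raw)  # little-endian 32-bit integer
    return l_text


def header_exceeds_limit(path, limit=MAX_HEADER_LEN):
    """Assumption: the limit applies to l_text; the exact test in Cufflinks may differ."""
    return bam_header_text_length(path) >= limit
```

Calling `header_exceeds_limit("accepted_hits.bam")` should then give a quick yes/no before deciding whether recompiling with a larger limit is worth trying.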

    • #3
      Thanks Justin,
      I will take a look.

      Ross

      • #4
        I was able to get the same type of warning messages as you (Cufflinks complains that the BAM file does not appear to be in the correct format and there are warnings about invalid or 0 length cigar operations).

        When I do a samtools reheader, the warning messages go away and Cufflinks seems to operate just fine (I just pull off the header of the BAM file and give it back to itself). I know there is some redundancy in how BAM files store sequence names and lengths, so maybe there is something going on there.

        Justin

        • #5
          Hi,
          I am having a similar problem. I ran tophat/bowtie on my reads to generate an accepted_hits.bam file.

          When I try to run that file through cufflinks I get the following:
          .
          .
          .
          SAM error on line 25557292: CIGAR op has zero length
          SAM error on line 25596829: CIGAR op has zero length
          SAM error on line 25604145: invalid CIGAR operation
          SAM error on line 25612881: CIGAR op has zero length
          SAM error on line 25618288: CIGAR op has zero length
          > Processed 0 loci. [*************************] 100%


          I am sorry that I don't really follow how you guys (above) fixed this problem. It seems that having a lot of references can cause this problem (I have a lot! I am using a reference transcriptome to map other RNAseq data). Could somebody please explain in a bit more detail how they fixed their problem?

          Cheers,
          T

          • #6
            This seemed to work for me, although I am not quite sure why . . .

            If A.bam is the problem file, then

            samtools view -H A.bam > header.sam
            samtools reheader header.sam A.bam > B.bam
            Now try running the Cufflinks analysis on B.bam (alternatively, you could create a SAM file and run Cufflinks on that). Both ways seemed to remove those warnings, at least in my case.

            Justin
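For anyone who wants to script this, here is a rough Python sketch of the same two steps, plus the BAM-to-SAM alternative. It only wraps the samtools commands (`view -H`, `reheader`, `view -h`); the function names and the default file names are placeholders chosen for illustration, and samtools is assumed to be on the PATH.

```python
import shutil
import subprocess


def reheader_steps(bam_in="A.bam", header_sam="header.sam", bam_out="B.bam"):
    """Return (command, output file) pairs mirroring the two commands above."""
    return [
        (["samtools", "view", "-H", bam_in], header_sam),            # dump header only
        (["samtools", "reheader", header_sam, bam_in], bam_out),     # re-attach it
    ]


def to_sam_step(bam_in="A.bam", sam_out="A.sam"):
    """Alternative: write a SAM file (with header, -h) and run Cufflinks on that."""
    return (["samtools", "view", "-h", bam_in], sam_out)


def run_steps(steps):
    """Run each command, redirecting its stdout to the paired output file."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools was not found on PATH")
    for argv, out_path in steps:
        with open(out_path, "wb") as out:
            subprocess.run(argv, stdout=out, check=True)
```

Something like `run_steps(reheader_steps("accepted_hits.bam", "header.sam", "reheadered.bam"))` would reproduce the workaround; whether it removes the warnings will depend on what is actually wrong with the header.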

            • #7
              Also, to add to that - my header is less than 4 MB. Cufflinks still has a max header length of 4 MB, so if your header is larger than 4 MB, then you might need to do as rossh suggested and increase the max length and recompile the code. If this is the case, then you should get the warning "Warning: BAM header too large".

              You might be able to instead run the analysis on a SAM file, as it does not appear that there is a maximum header length for a SAM file.
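To get a feel for when a reference set pushes the header past 4 MB, the size of the `@SQ` lines can be estimated from the reference names and lengths alone. This is a back-of-the-envelope Python sketch: it assumes each reference contributes one header line of the form `@SQ<TAB>SN:name<TAB>LN:length`, ignores other header lines (`@HD`, `@PG`, extra `@SQ` tags), and so gives a lower bound. Reading the names and lengths from a samtools `.fai` index is just one convenient option; the helper names are made up.

```python
# 4 MB limit quoted from hits.cpp in the first post.
MAX_HEADER_LEN = 4 * 1024 * 1024


def sq_line(name, length):
    """Build the minimal @SQ header line for one reference sequence."""
    return f"@SQ\tSN:{name}\tLN:{length}\n"


def estimate_sq_header_bytes(references):
    """Sum the byte length of one minimal @SQ line per (name, length) pair."""
    return sum(len(sq_line(name, length).encode("ascii")) for name, length in references)


def parse_fai(lines):
    """Extract (name, length) from .fai lines: name, length, then other columns."""
    refs = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            refs.append((fields[0], int(fields[1])))
    return refs
```

With a transcriptome of a few hundred thousand transcripts and longish names, the estimate can easily exceed `MAX_HEADER_LEN`, which would be consistent with the "lots of references" explanation.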

              • #8
                @BAM

                Thanks, I converted the .bam to .sam and am running that now. So far I have not encountered the problem.

                Cheers,
                T

                • #9
                  @tboothby - glad that did the trick.

                  Here is what I think is going on . . .

                  For me, the BAM file had sequence and length information but not a physical header section (there are potentially two places where BAM files store sequence and length info - a somewhat less attractive feature of the format), so Cufflinks was not getting the sequence and length information. For you, I'm guessing that the header was longer than 4 MB, which the Cufflinks BAM parser can't handle. In either case, Cufflinks was not able to get the full header information and parse the BAM file correctly. Looks like operating on the SAM file solves both of those problems.

                  Justin
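The "two places" can be inspected directly. Below is a small Python sketch that reads both the plain-text header (`@SQ` lines) and the binary reference list from a BAM file so they can be compared. It follows the layout in the SAM/BAM specification (magic, `l_text`, text, `n_ref`, then `l_name`/name/`l_ref` per reference) and is only an illustration; it is not how Cufflinks parses the file, and the function names are invented.

```python
import gzip
import struct


def _read_exact(handle, n):
    """Read exactly n bytes or fail, so a truncated header is reported clearly."""
    data = handle.read(n)
    if len(data) != n:
        raise ValueError("unexpected end of BAM header")
    return data


def _read_u32(handle):
    return struct.unpack("<I", _read_exact(handle, 4))[0]


def read_bam_references(path):
    """Return (text_refs, binary_refs), each a list of (name, length) tuples."""
    with gzip.open(path, "rb") as handle:  # BGZF is readable as multi-member gzip
        if _read_exact(handle, 4) != b"BAM\x01":
            raise ValueError(f"{path} does not start with the BAM magic bytes")
        text = _read_exact(handle, _read_u32(handle)).rstrip(b"\x00").decode("ascii", "replace")
        binary_refs = []
        for _ in range(_read_u32(handle)):  # n_ref entries
            name = _read_exact(handle, _read_u32(handle)).rstrip(b"\x00").decode("ascii", "replace")
            binary_refs.append((name, _read_u32(handle)))
    text_refs = []
    for line in text.splitlines():
        if line.startswith("@SQ"):
            tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
            if "SN" in tags and "LN" in tags:
                text_refs.append((tags["SN"], int(tags["LN"])))
    return text_refs, binary_refs


def missing_from_text(text_refs, binary_refs):
    """Names present in the binary reference list but absent from the @SQ lines."""
    seen = {name for name, _ in text_refs}
    return [name for name, _ in binary_refs if name not in seen]
```

If `missing_from_text` returns every reference, the file has the binary list but no usable text header, which would match the first case described above.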

                  • #10
                    CIGAR op has zero length

                    Edited: "PROBLEM SOLVED"

                    Hello all,

                    I am encountering the same error, "CIGAR op has zero length", when I run the following command:

                    ./cufflinks accepted_hits.bam

                    It displays the error on several lines. I also converted the BAM file to a SAM file and tried to run Cufflinks on that SAM file, but it displays an error message:

                    "AS attribute not supported"

                    I also tried to change the header as mentioned in this thread, but I ran into errors. I also ran Cufflinks on the sorted .bam file, but with no success.

                    When Cufflinks is run on the test_data, it produces .expr and .gtf files, indicating that Cufflinks itself is working (but only for the test data).

                    I ran Cufflinks on Galaxy (web server) and it ran successfully on accepted_hits.bam, but the command line gives more flexibility and options, which is why I am more interested in it.

                    Hopefully the problem will be sorted.

                    Thanks!
                    Last edited by AsoBioInfo; 05-07-2012, 05:36 AM. Reason: "Problem Solved"
