  • mlox
    Junior Member
    • Feb 2011
    • 2

    BAM header too large using cuffdiff

    Hi All,

    I tried to do expression analysis on an Illumina paired end transcriptome run.

    I prepared my reads with tophat:

    tophat -r 150 -o tophat-1 Index reads1-1.fastQ reads1-2.fastq
    tophat -r 150 -o tophat-2 Index reads2-1.fastQ reads2-2.fastq


    Then sorted the BAM files:

    samtools sort tophat-1.bam tophat-sorted-1
    samtools sort tophat-2.bam tophat-sorted-2


    Then used cuffdiff and got a warning :

    cuffdiff transcripts.gff tophat-sorted-1.bam tophat-sorted-2.bam
    You are using Cufflinks v1.0.3, which is the most recent release.
    File tophat-sorted-1.bam doesn't appear to be a valid BAM file, trying SAM...
    Warning: BAM header too large


    All the result files from cuffdiff are empty...
    Can anybody help me on that?
    best
    mlox
  • jdjax
    Member
    • Dec 2010
    • 23

    #2
    Hello --

    When using cufflinks I receive the same error. I assembled my reads using Bowtie, converted my SAM output to BAM and then sorted the BAM file using SAMtools.

    I would appreciate it if someone could help. Thanks.

    cufflinks: /usr/lib64/libz.so.1: no version information available (required by cufflinks)
    You are using Cufflinks v1.0.3, which is the most recent release.
    Warning: BAM header too large
    File trinity_n_trial_inflorescence.sorted.bam doesn't appear to be a valid BAM file, trying SAM...
    jdjax
    Ph.d. Student
    Åarhus University

    Comment

    • DZhang
      Senior Member
      • Jun 2010
      • 177

      #3
      mlox,

      The general work flow is tophat, cufflinks, cuffcompare, and cuffdiff. If you follow the manuals of tophat and cufflinks, it should give you decent results.

      Can you tell me how you came up with your current work flow?

      Comment

      • jdjax
        Member
        • Dec 2010
        • 23

        #4
        We currently do not have TopHat on our server and no one in our research group has any experience with TopHat.

        I do not have a reference genome, so after the assembly was done I used bowtie to align the reads from my various tissue samples to the bowtie index I made from the assembly. Bowtie's output is an unsorted SAM file, so using SAMtools I first convert the file to a BAM file and then sort it. When I then use the sorted BAM file as input for cufflinks, I get this error.

        I have tried using SAMtools reheader and that also did not work. Any other suggestions would be helpful.
        jdjax
        Ph.d. Student
        Åarhus University

        Comment

        • DZhang
          Senior Member
          • Jun 2010
          • 177

          #5
          Hi Jdjax,

          In my experience it is generally less challenging to use the work flow recommended by the author(s). (I am aware that cufflinks supports bam files generated by programs other than tophat, but in your case it complains that your bam file is not valid.)

          In your case, the nice part about tophat is twofold: 1) you can download the binary to your home directory and use it directly; 2) tophat uses bowtie to align, so it can re-use your index files. You may pursue fixing the header complaint or try tophat, whichever achieves your objectives.

          Comment

          • jdjax
            Member
            • Dec 2010
            • 23

            #6
            Thanks for your input DZhang. Do you know of any other dependencies besides Bowtie that are required for TopHat?
            jdjax
            Ph.d. Student
            Åarhus University

            Comment

            • DZhang
              Senior Member
              • Jun 2010
              • 177

              #7
              Not that I am aware of. I believe you will get the results faster if you go with tophat.

              Comment

              • mlox
                Junior Member
                • Feb 2011
                • 2

                #8
                Hi DZhang,
                As I don't have a reference genome, I used a transcriptome assembly for mapping, so I thought there was no need for cufflinks and cuffmerge. I generated a gtf file on my own, as all my reads came from spliced exons.
                I guess the error message results from the large number of transcripts I mapped to. I also tried bwa for mapping and got a similar error.

                Comment

                • DZhang
                  Senior Member
                  • Jun 2010
                  • 177

                  #9
                  Hi mlox,

                  In your case, I strongly recommend using a count-based method. (If possible, I would also recommend mapping the reads to a genome, not a transcriptome.) My pick is to use HTSeq to obtain the read counts and DESeq to identify differentially expressed genes.
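
To illustrate what "count-based" means when the reference is a set of assembled transcripts, here is a minimal sketch that tallies aligned reads per reference sequence from SAM text. It is a crude stand-in for illustration only, not what HTSeq does internally: it ignores annotation features and counts a multi-mapped read once per reported alignment. The function name `count_per_ref` is hypothetical.

```shell
# Count aligned reads per reference sequence from SAM text on stdin.
# Skips header lines (starting with @) and unmapped records
# (FLAG bit 0x4 set, or RNAME equal to "*").
# count_per_ref is a hypothetical helper name, not part of any tool.
count_per_ref() {
    awk -F '\t' '
        !/^@/ && int($2 / 4) % 2 == 0 && $3 != "*" { c[$3]++ }
        END { for (r in c) print r "\t" c[r] }
    ' | sort
}

# Typical use (requires samtools; sample.bam is a placeholder):
#   samtools view sample.bam | count_per_ref > sample.counts.txt
```

A per-sample table produced this way has one row per contig, which is the general shape of input that count-based differential expression tools work from.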

                  Comment

                  • jdjax
                    Member
                    • Dec 2010
                    • 23

                    #10
                    DZhang,

                    I installed TopHat and tried using the accepted_hits.bam output from TopHat in cufflinks. But I received the same error: BAM header too large.

                    Do you have any other suggestions on what I can do?

                    Thanks.
                    jdjax
                    Ph.d. Student
                    Åarhus University

                    Comment

                    • DZhang
                      Senior Member
                      • Jun 2010
                      • 177

                      #11
                      jdjax,

                      Did you sort the sam? The bam file produced by Tophat should be used as is. Please also post your cufflinks command.

                      Comment

                      • jdjax
                        Member
                        • Dec 2010
                        • 23

                        #12
                        DZhang,

                        I did not sort the sam. I am just testing these programs out, so I did not use any options for tophat or cufflinks. Tophat made a file accepted_hits.bam. I used that file as input for cufflinks.

                        My cufflinks command was just: cufflinks accepted_hits.bam

                        I also want to be more descriptive about the errors I am receiving, in the hope of figuring out this problem. This is what the error stated:

                        cufflinks: /usr/lib64/libz.so.1 : no version information available
                        Warning: BAM header too large
                        File accepted_hits does not appear to be a valid BAM file, trying SAM
                        Inspecting reads and determining fragment length distribution.
                        SAM error on line 2880: CIGAR op has zero length
                        SAM error on line 3240: CIGAR op has zero length
                        SAM error on line 3464: CIGAR op has zero length
                        SAM error on line 5063: CIGAR op has zero length
                        SAM error on line 30750: CIGAR op has zero length
                        SAM error on line 51722: CIGAR op has zero length

                        This continues with increasing line numbers until it reaches the end of the file.
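
One way to check whether the alignment records themselves contain zero-length CIGAR operations is to scan the SAM text directly. The sketch below assumes the BAM can be decoded by samtools; the function name `find_zero_cigar` is hypothetical. It only reports whether such records exist and does not by itself explain the header warning.

```shell
# Print the line number and CIGAR string of SAM records whose CIGAR
# contains a zero-length operation (for example 0M or 10M0I5M).
# Header lines (starting with @) are skipped but still counted in the
# reported line numbers.
# find_zero_cigar is a hypothetical helper name, not a samtools command.
find_zero_cigar() {
    awk -F '\t' '!/^@/ && $6 ~ /(^|[MIDNSHPX=])0+[MIDNSHPX=]/ { print NR "\t" $6 }'
}

# Typical use (requires samtools):
#   samtools view -h accepted_hits.bam | find_zero_cigar | head
```

If this prints nothing while cufflinks still reports the errors, the records are probably fine and the messages may come from cufflinks falling back to SAM parsing on a file it could not read as BAM.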
                        I have also checked /usr/lib64/libz.so.1 and it is in /usr/lib64

                        libz.so.1 -> libz.so.1.2.3

                        is what is present on the server.

                        Again thanks for your input. I appreciate any help. =)
                        jdjax
                        Ph.d. Student
                        Åarhus University

                        Comment

                        • DZhang
                          Senior Member
                          • Jun 2010
                          • 177

                          #13
                          Hi jdjax,

                          1) Can you provide some background about your project? Type of reads, type of reference sequence, etc.
                          2) Tophat requires one mandatory parameter besides the read file(s). See below: -r/--mate-inner-dist <int> This is the expected (mean) inner distance between mate pairs. For example, for paired end runs with fragments selected at 300bp, where each end is 50bp, you should set -r to be 200. There is no default, and this parameter is required for paired end runs.

                          How did you set -r ?
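
The arithmetic in the quoted manual example can be sketched in shell. The variable names are illustrative and are not TopHat options; the formula simply restates the example (fragment length minus the two mate lengths).

```shell
# Illustrative values taken from the manual excerpt quoted above.
fragment_len=300   # size-selected fragment length in bp
read_len=50        # length of each mate in bp

# Inner distance = fragment length minus both mates.
inner_dist=$(( fragment_len - 2 * read_len ))
echo "$inner_dist"   # prints 200
```

The result is the value that would be passed to tophat as `-r "$inner_dist"` for a paired end run.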

                          Comment

                          • DZhang
                            Senior Member
                            • Jun 2010
                            • 177

                            #14
                            Hi jdjax,

                            You should check the header information of your bam file. One way to do it is to convert bam to sam using samtools, then check the top portion of the sam files. (e.g., using 'more your.sam'). Let us know what you see in the header.
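
As a rough sketch of that check, `samtools view -H` prints only the header, so there is no need to convert the whole file. The helper below summarizes header text from stdin; its name, `header_summary`, is hypothetical, and `your.bam` is a placeholder.

```shell
# Summarize SAM header text read from stdin: prints the number of @SQ
# (reference sequence) lines and the total header size in characters,
# separated by a tab.
# header_summary is a hypothetical helper name, not a samtools command.
header_summary() {
    awk '/^@SQ/ { n++ }
         { chars += length($0) + 1 }
         END { printf "%d\t%d\n", n, chars }'
}

# Typical use (requires samtools):
#   samtools view -H your.bam | head        # look at the first header lines
#   samtools view -H your.bam | header_summary
```

A very large @SQ count would be expected when the reference is a de novo assembly with many contigs, which is worth reporting alongside the first few header lines.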

                            Comment

                            • jdjax
                              Member
                              • Dec 2010
                              • 23

                              #15
                              DZhang,

                              These are 50 to 200 bp single reads, and the reference sequence I am using is the fasta file of contigs I got from the trinity assembly. This is a de novo project; I do not have a full reference genome. That is why I wanted to use only Bowtie: I did not think TopHat was necessary without a full genome.

                              The option -r is only required for paired end runs.
                              jdjax
                              Ph.d. Student
                              Åarhus University

                              Comment
