  • Cuffmerge: No fasta index found for genome.fa...

    Hello all,

    Before running through the TopHat Cufflinks workflow with my own data, I am trying it with Drosophila_melanogaster RNA Seq data (as in "Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks"). Everything seemed to be working until I got to the Cuffmerge step. Executing the following command gave me the following errors.

    cuffmerge -g genes.gtf -s genome.fa -p 8 assemblies.txt

    [Thu Jun 13 11:10:38 2013] Beginning transcriptome assembly merge
    -------------------------------------------

    [Thu Jun 13 11:10:38 2013] Preparing output location ./merged_asm/
    [Thu Jun 13 11:10:40 2013] Converting GTF files to SAM
    [11:10:41] Loading reference annotation.
    [11:10:41] Loading reference annotation.
    [11:10:42] Loading reference annotation.
    [11:10:43] Loading reference annotation.
    [11:10:44] Loading reference annotation.
    [11:10:45] Loading reference annotation.
    [Thu Jun 13 11:10:46 2013] Quantitating transcripts
    You are using Cufflinks v2.1.1, which is the most recent release.
    Command line:
    cufflinks -o ./merged_asm/ -F 0.05 -g genes.gtf -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 --no-5-extend -p 8 ./merged_asm/tmp/mergeSam_fileaFtxQb
    [bam_header_read] EOF marker is absent.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    File ./merged_asm/tmp/mergeSam_fileaFtxQb doesn't appear to be a valid BAM file, trying SAM...
    [11:10:46] Loading reference annotation.
    [11:10:49] Inspecting reads and determining fragment length distribution.
    Processed 11337 loci.
    > Map Properties:
    > Normalized Map Mass: 69085.00
    > Raw Map Mass: 69085.00
    > Fragment Length Distribution: Truncated Gaussian (default)
    > Default Mean: 200
    > Default Std Dev: 80
    [11:10:50] Assembling transcripts and estimating abundances.
    Processed 11337 loci.
    [Thu Jun 13 11:11:54 2013] Comparing against reference file genes.gtf
    You are using Cufflinks v2.1.1, which is the most recent release.
    No fasta index found for genome.fa. Rebuilding, please wait..
    Fasta index rebuilt.
    Warning: couldn't find fasta record for '2LHet'!
    Warning: couldn't find fasta record for '2RHet'!
    Warning: couldn't find fasta record for '3LHet'!
    Warning: couldn't find fasta record for '3RHet'!
    Warning: couldn't find fasta record for 'U'!
    Warning: couldn't find fasta record for 'XHet'!
    Warning: couldn't find fasta record for 'YHet'!
    Warning: couldn't find fasta record for 'dmel_mitochondrion_genome'!
    [Thu Jun 13 11:12:07 2013] Comparing against reference file genes.gtf
    You are using Cufflinks v2.1.1, which is the most recent release.
    Warning: couldn't find fasta record for '2LHet'!
    Warning: couldn't find fasta record for '2RHet'!
    Warning: couldn't find fasta record for '3LHet'!
    Warning: couldn't find fasta record for '3RHet'!
    Warning: couldn't find fasta record for 'U'!
    Warning: couldn't find fasta record for 'XHet'!
    Warning: couldn't find fasta record for 'YHet'!
    Warning: couldn't find fasta record for 'dmel_mitochondrion_genome'!


    Checking the genome.fa with head, I found that the problem did not seem to be with the fasta file.
    jmwhitha@jmwhitha-OptiPlex-755:~/my_rnaseq_exp$ head genome.fa
    >2L
    CGACAATGCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCCATATTATAGGGAGAAATATGATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCTCTTTGATTTTTTGGCAACCCAAAATGGTGGCGGATGAACGAGATGATAATATATTCAAGTTGCCGCTAATCAGAAATAAATTCATTGCAACGTTAAATACAGCACAATATATGATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCTAATGAGTGCCTCTCGTTCTCTGTCTTATATTACCGCAAACCCAAAAAGACAATACACGACAGAGAGAGAGAGCAGCGGAGATATTTAGATTGCCTATTAAATATGATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCTCTCTATATAATGACTGCCTCTCATTCTGTCTTATTTTACCGCAAACCCAAATCGACAATGCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCCATATTATAGGGAGAAATATGAT

    This genome.fa came from http://cufflinks.cbcb.umd.edu/igenomes.html (Drosophila_melanogaster_Ensembl_BDGP5.25.tar.gz).

    So what's the problem?

    Thank you and God bless,
    Jason

  • #2
    The sequence and index files available from iGenomes appear to contain only the full chromosomes. Special sequences (heterochromatin in this case, "random" in other genomes) are not included, but the GTF file does contain annotations for them. That is why you are getting those warnings.

    Were you able to complete the execution of cuffmerge?

    Cross-reference: see this thread from earlier today: http://seqanswers.com/forums/showthread.php?t=31072
    Last edited by GenoMax; 06-13-2013, 09:33 AM.
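
To see exactly which sequence names are behind those warnings, one can compare column 1 of the GTF with column 1 of the FASTA index. A minimal sketch, assuming a POSIX shell with awk; `missing_in_fasta` is a hypothetical helper name, not part of Cufflinks:

```shell
#!/bin/sh
# Hypothetical helper: list sequence names that occur in a GTF annotation
# but have no entry in a FASTA index (.fai).
# Usage: missing_in_fasta <annotation.gtf> <genome.fa.fai>
missing_in_fasta() {
  # The .fai file is read first so its names are known before the GTF is scanned.
  awk '
    FILENAME == ARGV[1] { in_fasta[$1] = 1; next }   # remember indexed names
    /^#/ { next }                                    # skip GTF comment lines
    !($1 in in_fasta) && !seen[$1]++ { print $1 }    # report each missing name once
  ' "$2" "$1"
}
```

With the files from this thread it would be called as `missing_in_fasta genes.gtf genome.fa.fai`. Names are printed in the order they first appear in the GTF.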


    • #3
      Well, now that you mention it, a .fai file was created.

      jmwhitha@jmwhitha-OptiPlex-755:~/my_rnaseq_exp$ head genome.fa.fai
      2L 23011544 4 60 61
      2R 21146708 23395078 60 61
      3L 24543557 44894236 60 61
      3R 27905053 69846857 60 61
      4 1351857 98216998 60 61
      M 19517 99591389 60 61
      X 22422827 99611235 60 61

      Is that all I need?

      Thanks,
      Jason
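
For reference, each line of a FASTA index in the faidx layout shown above has five columns: sequence name, sequence length, byte offset of the first base, bases per line, and bytes per line. A small sketch that lists which sequences the index covers, with their lengths and the total, assuming a POSIX shell with awk; `fai_summary` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print name and length for every sequence in a .fai
# index, followed by the summed length of all indexed sequences.
# .fai columns: name, length, byte offset, bases per line, bytes per line.
fai_summary() {
  awk '
    { printf "%s\t%s\n", $1, $2; total += $2 }   # one line per indexed sequence
    END { printf "total\t%.0f\n", total }         # %.0f avoids exponent notation
  ' "$1"
}
```

Called as `fai_summary genome.fa.fai`, this makes it easy to confirm that only 2L, 2R, 3L, 3R, 4, M and X are present in the index.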


      • #4
        Oh, I'm sorry, and a merged_asm directory was created!


        • #5
          But now when I do the followup command, I get:

          jmwhitha@jmwhitha-OptiPlex-755:~/my_rnaseq_exp$ cuffdiff -o diff_out -b genome.fa -p 8 -L C1,C2 -u merged_asm/merged.gtf \ ./C1_R1_thout/accepted_hits.bam,./C1_R2_thout/accepted_hits.bam,./C1_R3_thout/accepted_hits.bam \ ./C2_R1_thout/accepted_hits.bam,./C2_R2_thout/accepted_hits.bam,./C2_R3_thout/accepted_hits.bam
          You are using Cufflinks v2.1.1, which is the most recent release.
          open: No such file or directory
          File ./C1_R1_thout/accepted_hits.bam doesn't appear to be a valid BAM file, trying SAM...
          Error: cannot open alignment file ./C1_R1_thout/accepted_hits.bam for reading

          When I "head" the C1_R1 file, it looks like an unreadable BAM file.

          This seems to be a separate issue. Is it? I can open another thread if it is.

          Thanks,
          Jason


          • #6
            Are you in the directory where all these (*thout) directories are present when you run the cuffdiff command?


            • #7
              Yes, I am in the directory that contains the thout directories, and those contain the .bam files.


              • #8
                Is there a file called "merged.gtf" in the merged_asm directory?

                Can you also use the "code" tags around the actual command line that you are posting so we can see it better? (Click on the "Go advanced" button when you are editing a post. Highlight the command line and then click on the "#" icon in the edit bar at the top).

                Like this:
                Code:
                jmwhitha@jmwhitha-OptiPlex-755:~/my_rnaseq_exp$ cuffdiff -o diff_out -b genome.fa -p 8 -L C1,C2 -u merged_asm/merged.gtf \ ./C1_R1_thout/accepted_hits.bam,./C1_R2_thout/accepted_hits.bam,./C1_R3_thout/accepted_hits.bam \ ./C2_R1_thout/accepted_hits.bam,./C2_R2_thout/accepted_hits.bam,./C2_R3_thout/accepted_hits.bam


                • #9
                  Problem solved

                  Thank you so much! PROBLEM SOLVED! Removing the "\" solved the problem.

                  Code:
                  cuffdiff -o diff_out -b genome.fa -p 8 -L C1,C2 -u merged_asm/merged.gtf ./C1_R1_thout/accepted_hits.bam,./C1_R2_thout/accepted_hits.bam,./C1_R3_thout/accepted_hits.bam ./C2_R1_thout/accepted_hits.bam,./C2_R2_thout/accepted_hits.bam,./C2_R3_thout/accepted_hits.bam
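
Why the backslash mattered: in a POSIX shell, a backslash at the very end of a line continues the command on the next line, but a backslash followed by a space produces a literal space that becomes part of the next argument. If a multi-line command is joined onto one line without removing the continuation backslashes, the first BAM list starts with a space, and no file of that name exists. A minimal sketch of the difference (`show_args` is a hypothetical helper, and the paths are shortened from the command above):

```shell
#!/bin/sh
# Hypothetical helper: print each argument in brackets so stray spaces are visible.
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

# Backslash followed by a space: the space is escaped and glued onto the path.
pasted=$(show_args merged.gtf \ ./C1_R1_thout/accepted_hits.bam)

# Backslash as the last character on the line: a plain line continuation.
continued=$(show_args merged.gtf \
  ./C1_R1_thout/accepted_hits.bam)

printf 'pasted on one line:\n%s\n' "$pasted"
printf 'true continuation:\n%s\n' "$continued"
```

In the first case the second argument is printed with a leading space inside the brackets; in the second it is the plain path.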


                  • #10
                    Really, there are two different problems here and two different solutions. Is there some way we can move the second half of this thread to another topic?
