  • zach
    Junior Member
    • Jun 2010
    • 5

    Help needed for running cufflinks

    Hi All,

    I am new to this forum and NG. I have installed tophat/bowtie/cufflinks in my server and it runs ok.

    I encount the following problem when running cufflinks.

    $ cufflinks -G mm_9.gtf accepted_hits.bam
    You are using Cufflinks v1.0.3, which is the most recent release.
    [13:25:25] Loading reference annotation.
    [13:25:31] Inspecting reads and determining fragment length distribution.
    > Processing Locus chr16: 57266251-57292978 [***** ] 20%

    It hangs here indefinitely. 2 out of 3 accepted_hits.bam files from tophat stop at exactly this locus, 57266251-57292978. The third one ran to completion and produced the results described in the Cufflinks manual.

    I checked the BAM files and observed no difference between them.

    Has anybody had similar issues? How am I supposed to fix it? Thanks.

    zach

    I looked at the size of these 3 bam files
  • DZhang
    Senior Member
    • Jun 2010
    • 177

    #2
    Here is the part of the Cufflinks FAQ that may solve your problem:

    I'm trying to assemble a sample. Cufflinks is almost done, but it seems to be hanging at "99% complete". What's going on?

    Cufflinks spawns threads for each locus to assemble and quantitate the "bundle" of reads in that locus. Some loci may have more reads and more complicated alternative splicing than others, which requires more CPU cycles. These bundles can continue processing long after all others have completed, leading to this behavior. You may be able to decrease the number of such bundles by masking out ribosomal and mitochondrial RNA using the -M/--mask-file option described in the Manual.
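For anyone scripting their runs, here is a minimal sketch of adding the mask option to the command from the first post. The helper name `cufflinks_command` and the file name `mask.gtf` are made up for illustration; only `-G` and `-M` come from the thread and the FAQ text above.

```python
def cufflinks_command(annotation_gtf, bam, mask_gtf=None):
    """Build the argument list for a cufflinks run (hypothetical helper).

    Options go before the alignment file, as in the command in post #1.
    """
    cmd = ["cufflinks", "-G", annotation_gtf]
    if mask_gtf is not None:
        # -M/--mask-file is the option named in the FAQ excerpt
        cmd += ["-M", mask_gtf]
    cmd.append(bam)
    return cmd
```

The list can be passed to `subprocess.run(cmd, check=True)`, for example `cufflinks_command("mm_9.gtf", "accepted_hits.bam", mask_gtf="mask.gtf")`.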

    • zach
      Junior Member
      • Jun 2010
      • 5

      #3
      Dzhang,

      Thanks. I first used the tRNA genes as the mask file, which did not work. Then I used the RepeatMasker file downloaded from UCSC, and that worked. My question is whether masking with RepeatMasker will affect the final results, since repeats can occur anywhere and within any gene (to my understanding).

      How am I supposed to get a GTF file of ribosomal and mitochondrial RNA?

      zach

      • DZhang
        Senior Member
        • Jun 2010
        • 177

        #4
        Hi Zach,

        I am glad the RepeatMasker file solved the problem for you. Usually it should not affect the final results, as repetitive sequences are generally thought to carry less information and thus have less impact.

        For ribo and mt RNA GTF files, what I usually do is create the GTF file manually from the master GTF file. It is not that difficult: just search for the gene names and copy the matching records out to a separate file.

        Hope this helps.

        Douglas
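The "search and copy out" step can also be scripted. Below is a rough sketch, assuming you already have a list of the gene IDs you want to mask and that every record carries a `gene_id "..."` attribute. The function name and the file names in the usage note are hypothetical.

```python
import re

# Matches the gene_id attribute in the last GTF column
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')

def filter_gtf_by_gene_ids(lines, wanted_ids):
    """Yield the GTF lines whose gene_id is in wanted_ids."""
    wanted = set(wanted_ids)
    for line in lines:
        if line.startswith("#"):
            continue  # skip comment/header lines
        match = GENE_ID_RE.search(line)
        if match and match.group(1) in wanted:
            yield line
```

Usage would look something like this, with `mask_ids` being your own list of IDs:

    with open("mm_9.gtf") as src, open("mask.gtf", "w") as out:
        out.writelines(filter_gtf_by_gene_ids(src, mask_ids))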

        • zach
          Junior Member
          • Jun 2010
          • 5

          #5
          Douglas,

          The GTF file for mouse from UCSC is in the following format. There is no direct label of which gene is rRNA or mt RNA:

          chr1 mm9_ensGene start_codon 134212807 134212809 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene CDS 134212807 134213049 0.000000 + 0 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene exon 134212703 134213049 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene CDS 134221530 134221650 0.000000 + 0 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene exon 134221530 134221650 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene CDS 134222783 134222806 0.000000 + 2 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene exon 134222783 134222806 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene CDS 134224274 134224425 0.000000 + 2 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene exon 134224274 134224425 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene CDS 134224708 134224773 0.000000 + 0 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
          chr1 mm9_ensGene exon 134224708 134224773 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";

          • DZhang
            Senior Member
            • Jun 2010
            • 177

            #6
            Hi zach,

            You need to check which gene IDs correspond to ribo/mt genes. For ribo genes, there are not too many, so you can check them manually. For mt genes, search the first column, as they reside on the mitochondrial sequence rather than on any of the nuclear chromosomes.
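A small sketch of the first-column search for the mt genes. It assumes the mitochondrial sequence is named `chrM`, which is the UCSC convention; check the sequence names in your own file before relying on it. The function name is made up.

```python
def select_by_seqname(lines, seqnames=("chrM",)):
    """Yield GTF lines whose first column is one of seqnames.

    "chrM" is assumed to be the mitochondrial sequence name (UCSC style).
    """
    keep = set(seqnames)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        # The first whitespace-delimited field is the sequence name
        if line.split(None, 1)[0] in keep:
            yield line
```

Splitting on any whitespace means it works both on a tab-delimited GTF and on records pasted with spaces, like the ones in post #5.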

            • DZhang
              Senior Member
              • Jun 2010
              • 177

              #7
              Go to the UCSC genome website and, in the "position or search term" field, enter "ribosomal RNA"; you will get a list of genes with chromosomal positions. Based on those you can find the gene IDs in your GTF file. I hope there is an easier way, but I do not work with the mouse genome often...
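Going from a list of positions to gene IDs can be sketched as an interval overlap check. This assumes you have copied the positions out by hand as (chrom, start, end) tuples with 1-based inclusive coordinates, the same convention GTF uses. The function name is hypothetical, and the region in the usage note is only an illustration built from the record in post #5, not a real rRNA locus.

```python
import re

GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')

def gene_ids_overlapping(lines, regions):
    """Return the set of gene_ids with a GTF record overlapping any region.

    regions: iterable of (chrom, start, end), 1-based inclusive.
    """
    regions = list(regions)
    hits = set()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # 8 fixed columns, then the attribute column as the remainder
        fields = line.split(None, 8)
        if len(fields) < 9:
            continue
        chrom, start, end = fields[0], int(fields[3]), int(fields[4])
        match = GENE_ID_RE.search(fields[8])
        if not match:
            continue
        for r_chrom, r_start, r_end in regions:
            # Two closed intervals overlap when each starts before the other ends
            if chrom == r_chrom and start <= r_end and end >= r_start:
                hits.add(match.group(1))
                break
    return hits
```

For example, `gene_ids_overlapping(open("mm_9.gtf"), [("chr1", 134212000, 134213000)])` would pick up ENSMUST00000072177 from the records quoted above. The resulting IDs can then be used to pull the matching records into a mask file.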
