
  • cufflinks no FPKM values due to gtf issue

    Hi,

    I am facing an issue with getting cufflinks results with Arabidopsis. The problem seems to be the reference gtf.

    When I ran cufflinks without a reference gtf, I got FPKM values for genes in the genes.fpkm_tracking file.

    ----------------------------------------------
    tracking_id class_code nearest_ref_id gene_id gene_short_name tss_id locus length coverage FPKM FPKM_conf_lo FPKM_conf_hi FPKM_status

    CUFF.1 - - CUFF.1 - - Chr1:3675-3911 - - 49.5395 33.4668 65.6123 OK

    CUFF.2 - - CUFF.2 - - Chr1:3995-4272 - - 30.6876 20.7312 40.6439 OK

    CUFF.3 - - CUFF.3 - - Chr1:4467-5098 - - 21.4671 17.3165 25.6177 OK
    ----------------------------------------------

    But when I ran cufflinks with a reference gtf, I did not get any FPKM values (they are all 0):

    ----------------------------------------------
    tracking_id class_code nearest_ref_id gene_id gene_short_name tss_id locus length coverage FPKM FPKM_conf_lo FPKM_conf_hi FPKM_status

    XLOC_000001 - - XLOC_000001 ANAC001 TSS1 1:3630-5899 - - 0 0 0 OK

    XLOC_003797 - - XLOC_003797 ARV1 TSS3983 1:5927-8737 - - 0 0 0 OK

    XLOC_003798 - - XLOC_003798 NGA3 TSS3984 1:11648-13714 - - 0 0 0 OK
    ----------------------------------------------

    I am aware that cufflinks/cuffdiff need a compatible reference genome format. So for both these cufflinks runs I used a compatible reference.gtf file, which was created as follows:
    1. Downloaded NCBI dataset for Arabidopsis from http://cufflinks.cbcb.umd.edu/igenomes.html
    2. Took the genes.gtf file
    3. Created cuffcmp.combined.gtf using this command: cuffcompare -s /ccmb/CoreBA/BioinfCore/Common/DATA/Cufflinks_Data/Arabidopsis/Arabidopsis_thaliana/NCBI/build9.1/Sequence/WholeGenomeFasta/genome.fa -CG -r ../arabidopsis_genes.gtf ../arabidopsis_genes.gtf
    4. Used the resulting cuffcmp.combined.gtf as the reference.gtf


    I also ran cufflinks using the ncbi genes.gtf (instead of using the cuffcmp.combined.gtf), but I still got no FPKM calculations in the result file.

    I also ran cuffdiff with the cuffcmp.combined.gtf, and here again I did not get any FPKM values, and hence, I am getting a NOTEST.

    I would greatly appreciate your help in figuring out what the problem is.

    Thanks in advance,

    Ash

  • #2
    Please make sure your gtf file is compatible with the genome you aligned to. For example, check that the chromosome names are the same.



    • #3
      Thanks.

      The problem was that the .gtf had just the chromosome numbers (eg: 1, 2, etc.), while my tophat output files had the chromosome names as Chr1, Chr2, etc.

      Once I modified the gtf and made the chromosome numbers the same, cufflinks is running fine.
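The kind of edit described here can be sketched in a few lines of Python. This is only a minimal illustration, not the exact procedure used in this thread: the function name `add_chr_prefix` and the sample GTF record are made up, and it assumes the only difference is a missing "Chr" prefix in the first (seqname) column.

```python
def add_chr_prefix(gtf_lines, prefix="Chr"):
    """Yield GTF lines with `prefix` prepended to the seqname (column 1).

    Hypothetical helper: assumes the GTF uses bare names (1, 2, ...)
    while the alignments use prefixed names (Chr1, Chr2, ...).
    """
    for line in gtf_lines:
        # Leave comment lines and blank lines untouched.
        if line.startswith("#") or not line.strip():
            yield line
            continue
        seqname, sep, rest = line.partition("\t")
        # Skip names that already carry the prefix, so running the
        # function twice does not produce "ChrChr1".
        if not seqname.startswith(prefix):
            seqname = prefix + seqname
        yield seqname + sep + rest


# Made-up record for illustration only.
example = ['1\texample\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n']
for fixed in add_chr_prefix(example):
    print(fixed, end="")
```

To convert a real file, the same generator can be fed an open file handle and its output written to a new gtf, leaving the original untouched.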



      • #4
        Hi Ash:

        I am running into the same problem as you did and I have not found a solution yet.

        Could you post some examples to explain how you fixed it? I would like to know what your original chromosome numbers in the .gtf looked like and what they looked like in the tophat output. And finally, how did you modify it?

        Thanks
        Tao



        • #5
          Please explain briefly.



          • #6
            As ngsbee mentioned:

            You must make the chromosome names consistent. Sometimes 1, 2, 3, ..., 22, X, Y, M are used as chromosome names, and sometimes chr1, chr2, chr3, ..., chrM are used. They must be the same in your gtf file and your mapping file. BTW, you also need to check chrM, because it is sometimes named MT.
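As a rough sketch of that renaming rule, the snippet below converts bare names to the chr-prefixed style and folds the mitochondrial aliases into one name. The function `to_chr_style` and the alias set are assumptions for illustration; which spelling is "correct" depends entirely on what your mapping file uses.

```python
# Assumed set of spellings for the mitochondrial chromosome.
MITO_ALIASES = {"M", "MT", "chrM", "chrMT"}


def to_chr_style(name):
    """Convert a bare chromosome name (1, X, MT) to chr-prefixed style."""
    if name in MITO_ALIASES:
        return "chrM"
    if name.startswith("chr"):
        return name  # already in the target style
    return "chr" + name


print([to_chr_style(n) for n in ["1", "22", "X", "MT", "chr2"]])
# ['chr1', 'chr22', 'chrX', 'chrM', 'chr2']
```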



            • #7
              Sorry for the late reply.
              Tao, like Imf_bill said, you must check your tophat output file (accepted_hits.bam) and check the chromosome format in your reference gtf. If they are not the same, then you have to modify one of the files (I prefer the gtf) and make sure the formats are matching.

              For example, if your tophat output has the chromosome format : Chr1, Chr2,.., ChrMt, ChrUn and your gtf file has the chromosome format : 1,2,..., Mt,Un, then modify your gtf and make the chromosome formats: Chr1, Chr2,.., ChrMt, ChrUn.
              This will ensure that cufflinks/cuffdiff runs properly.
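One way to do this check without eyeballing the files is sketched below. It assumes you have the BAM header as text (for example the output of `samtools view -H accepted_hits.bam`) and the gtf as lines; the function names and the toy input are made up for illustration.

```python
def sam_header_names(header_lines):
    """Collect reference names from the SN: tag of @SQ header lines."""
    names = set()
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        for field in line.rstrip("\n").split("\t")[1:]:
            if field.startswith("SN:"):
                names.add(field[3:])
    return names


def gtf_names(gtf_lines):
    """Collect seqnames (column 1) from GTF lines, skipping comments."""
    names = set()
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def compare_names(header_lines, gtf_lines):
    """Report which chromosome names are shared and which are not."""
    bam = sam_header_names(header_lines)
    gtf = gtf_names(gtf_lines)
    return {
        "shared": sorted(bam & gtf),
        "bam_only": sorted(bam - gtf),
        "gtf_only": sorted(gtf - bam),
    }


# Toy input mimicking the mismatch discussed in this thread.
header = ["@HD\tVN:1.0\tSO:coordinate\n", "@SQ\tSN:Chr1\tLN:1000\n"]
gtf = ['1\texample\texon\t100\t200\t.\t+\t.\tgene_id "g1";\n']
print(compare_names(header, gtf))
# {'shared': [], 'bam_only': ['Chr1'], 'gtf_only': ['1']}
```

An empty "shared" list is the situation where every gene ends up with an FPKM of 0.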

              Another issue I faced is with colons ":" in the chromosome name.
              With rice data, I did the formatting of the gtf files to match the tophat bam files. The tophat bam files had chromosome format: EG:1, EG:2, etc. So I made the gtf format the same. But I still wasn't getting any results.
              Thanks to this post on seqanswers: http://seqanswers.com/forums/showthr...ghlight=colons, I was able to figure it out.

              I have summarized below the issues I faced and how to fix them :
              -------
              (1) If you are creating a Bowtie build from scratch:
              * Please check the chromosome format in your fasta files
              * IMPORTANT!! IF YOUR FASTA FILE HAS COLONS (:) IN IT (eg: rice ensembl fasta: >EG:1) YOU MUST REMOVE COLONS FROM YOUR FASTA FILE. CUFFDIFF WON’T RUN IF COLONS ARE PRESENT IN CHR NAME!!
              * Compare the chr format in the fasta file to the reference gtf file. If they are in the same format (eg: Chr1), you can proceed to create your build.
              * If they are not, format the fasta file to match the reference gtf format. Once the chromosome formats in the fasta and your gtf are the same, you can proceed to create your build.
              * Run bowtie-build.
              * Making sure that the chromosome formats are uniform is a vital step to ensure that your accepted_hits.bam (tophat output) and reference gtf (required for running cuffdiff) are compatible. Only if they are compatible will you get cuffdiff results.
              * Note: Currently the chr format issue is a silent bug. Cuffdiff doesn’t handle this issue nor does it generate an error or warning.
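The colon removal in the list above could be done along these lines. This is a sketch under assumptions: the helper name `strip_colons_from_fasta` is made up, and it only renames the sequence ID (the text between ">" and the first space), leaving any description after it alone. The same renaming then has to be applied to the gtf so the two files still match.

```python
def strip_colons_from_fasta(fasta_lines, replacement=""):
    """Yield FASTA lines with colons removed from each sequence ID.

    Hypothetical helper: ">EG:1" becomes ">EG1" by default; pass
    replacement="_" to get ">EG_1" instead.
    """
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\n")
            # Only the ID (up to the first space) is renamed.
            seq_id, sep, description = header.partition(" ")
            yield ">" + seq_id.replace(":", replacement) + sep + description + "\n"
        else:
            # Sequence lines pass through unchanged.
            yield line


fasta = [">EG:1 example description\n", "ACGTACGT\n"]
print("".join(strip_colons_from_fasta(fasta)), end="")
```

Run this before bowtie-build, so that the index, the tophat output, and the gtf all carry the same colon-free names.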

              (2) If you already have a stable bowtie build (downloaded from bowtie website) and you have used it to run tophat:
              * Check the chr format in your reference gtf file, and make sure you format your gtf to match the chr format in the accepted_hits.bam file
              * This will ensure that your accepted_hits.bam and reference gtf are compatible and you will be able to run cuffdiff without any issues.

              (3) GFF3 issues.
              * DO NOT USE GFF3 FORMAT TO CREATE reference GTFs to run cuffdiff
              * If you use gff3, CUFFCOMPARE program truncates the long string from the gene annotation column, and gene IDs are lost. Hence, when you run cuffdiff, your output file won't have gene IDs.

              (4) When you are working with "sequencing-in-progress" data:
              * IT IS BEST TO USE A STABLE VERSION of GTFs and Fastas AVAILABLE FROM REFERENCE DATABASES (eg: Ensembl) instead of getting data from independent genome sequencing groups. More formatting issues are associated with these files and formats might change in-between versions.
