Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • cufflinks output

    i am using cufflinks for differential gene expression analysis. After mapping of illumina reads on rice genome using tophat, i got accepted_hits.bam file. This files is used for cufflinks.

    I have a control dataset and a experimental dataset. In cufflink, i ran :

    [root@nbri deepika_cuff]# cufflinks -o KT_out --GTF oryza.gtf --label KT accepted_hits.bam

    [root@nbri deepika_cuff]# cufflinks -o KC_out --GTF oryza.gtf --label KC accepted_hits.bam

    after running this command i got 4 files for each and transcript.gtf files are used for differential analysis.

    then i ran:

    cuffdiff /storage/home/cdac/deepika_top/rice_data/Oryza.gtf /storage/home/cdac/deepika_top/KT-24h_output/KT_accepted_hits.bam /storage/home/cdac/deepika_top/KC-24h_output/KC_accepted_hits.bam


    after running this command, i got all the files which are mentioned in cufflink/cuffdiff's manual but most of files are blank. i did not get this.

    In gene_exp.diff...
    test_id gene_id gene locus sample_1 sample_2 status value_1 value_2 log2(fold_change) test_stat p_value q_value significant
    11562.pre-tRNA-Arg-1 11562.pre-tRNA-Arg-1 trnR(UCU)3 Pt:34209-35955 q1 q2 NOTEST 0 0 0 0 1 1 no
    11562.pre-tRNA-Arg-2 11562.pre-tRNA-Arg-2 trnR(ACG)1 Pt:98543-99399 q1 q2 LOWDATA 0 0 0 0 1 1 no
    11562.pre-tRNA-Arg-3 11562.pre-tRNA-Arg-3 trnR(ACG)2 Pt:116153-116227 q1 q2 NOTEST 0 0 0 0 1 1 no
    11562.pre-tRNA-Asn-1 11562.pre-tRNA-Asn-1 trnN(GUU)1 Pt:98543-99399 q1 q2 NOTEST 0 0 0 0 1 1 no
    11562.pre-tRNA-Asn-2 11562.pre-tRNA-Asn-2 trnN(GUU)2 Pt:115830-115902 q1 q2 NOTEST 0 0 0 0 1 1 no
    11562.pre-tRNA-Asp-1 11562.pre-tRNA-Asp-1 trnD(GUC) Pt:16230-16304 q1 q2 NOTEST 0 0 0 0 1 1 no
    11562.pre-tRNA-Cys-1 11562.pre-tRNA-Cys-1 trnC(GCA) Pt:18058-18129 q1 q2 NOTEST 0 0 0 0 1 1 no
    11562.pre-tRNA-Gln-1 11562.pre-tRNA-Gln-1 - Pt:6615-6687 q1 q2 NOTEST 0 0 0 0 1 1 no
    11562.pre-tRNA-Glu-1 11562.pre-tRNA-Glu-1 trnE(UCC) Pt:15649-15722 q1 q2 NOTEST 0 0 0 0 1 1 no
    11562.pre-tRNA-Gly-1 11562.pre-tRNA-Gly-1 trnG(GCC)1 Pt:12330-12401 q1 q2 NOTEST 0 0 0 0 1 1 no
    11562.pre-tRNA-His-1 11562.pre-tRNA-His-1 trnH(GUG)2 Pt:80914-81163 q1 q2 NOTEST 0 0 0 0 1 1 no


    but in cds_exp.diff :
    test_id gene_id gene locus sample_1 sample_2 status value_1 value_2 log2(fold_change) test_stat p_value q_value significant

    i got only this line and there are so many files which are blank. i did not get this problem and when i ran cufflink then got

    Warning: Could not connect to update server to verify current version. Please check at the Cufflinks website (http://cufflinks.cbcb.umd.edu).

    is there something wrong???? then how can i solve it?

    [10:29:31] Loading reference annotation.
    Warning: No conditions are replicated, switching to 'blind' dispersion method
    [10:29:38] Inspecting maps and determining fragment length distributions.
    [10:41:12] Modeling fragment count overdispersion.
    > Map Properties:
    > Normalized Map Mass: 0.00
    > Raw Map Mass: 0.00
    > Fragment Length Distribution: Truncated Gaussian (default)
    > Default Mean: 200
    > Default Std Dev: 80
    > Map Properties:
    > Normalized Map Mass: 0.00
    > Raw Map Mass: 0.00
    > Fragment Length Distribution: Truncated Gaussian (default)
    > Default Mean: 200
    > Default Std Dev: 80
    [10:43:26] Calculating preliminary abundance estimates
    [10:43:26] Testing for differential expression and regulation in locus.
    > Processed 57165 loci. [*************************] 100%
    Performed 0 isoform-level transcription difference tests
    Performed 0 tss-level transcription difference tests
    Performed 0 gene-level transcription difference tests
    Performed 0 CDS-level transcription difference tests
    Performed 0 splicing tests
    Performed 0 promoter preference tests
    Performing 0 relative CDS output tests
    Writing isoform-level FPKM tracking
    Writing TSS group-level FPKM tracking
    Writing gene-level FPKM tracking
    Writing CDS-level FPKM tracking
    Writing isoform-level count tracking
    Writing TSS group-level count tracking
    Writing gene-level count tracking
    Writing CDS-level count tracking
    Writing isoform-level read group tracking
    Writing TSS group-level read group tracking
    Writing gene-level read group tracking
    Writing CDS-level read group tracking
    Writing read group info
    Writing run info


    please solve my problem.

  • #2
    You haven't really indicated how you added the cufflinks output to the cuffdiff input. From the looks of your cufflinks command line you are using -GTF which means no novel transcripts. In that case running cufflinks is entirely pointless, and you didn't need to do it.

    Moving onto cuffdiff. From the looks of it there is a problem with the Oryza.gtf that you are using. cuffdiff expects a number of fields, specifically transcript_id, gene_id, tss_id, and p_id to perform the various comparisons.

    So my first guess is your gtf is malformed/missing fields.
    Last edited by mikep; 09-29-2013, 07:37 PM.

    Comment


    • #3
      Thanks for your reply.....
      Actually i have paired end illumina sequencing data, one is control and other one is experimental and mapped on rice genome using tophat and the command is:

      tophat -G rice_data/Oryza.gtf -p 5 -o K1_tophat_new rice_data/Oryza rice_data/K1_R1_filter.fastq rice_data/K1_R2_filter.fastq

      tophat -G rice_data/Oryza.gtf -p 5 -o K2_tophat_new rice_data/Oryza rice_data/K2_R1_filter.fastq rice_data/K2_R2_filter.fastq

      after running this, i got the error:[2013-09-30 18:54:34] Beginning TopHat run (v2.0.9)
      -----------------------------------------------
      [2013-09-30 18:54:34] Checking for Bowtie
      Bowtie version: 2.1.0.0
      [2013-09-30 18:54:34] Checking for Samtools
      Samtools version: 0.1.19.0
      [2013-09-30 18:54:34] Checking for Bowtie index files (genome)..
      [2013-09-30 18:54:34] Checking for reference FASTA file
      [2013-09-30 18:54:34] Generating SAM header for rice_data/Oryza
      format: fastq
      quality scale: phred33 (default)
      [2013-09-30 18:54:35] Reading known junctions from GTF file
      [2013-09-30 18:54:44] Preparing reads
      left reads: min. length=57, max. length=83, 33510885 kept reads (8 discarded)
      right reads: min. length=57, max. length=83, 33509751 kept reads (1142 discarded)
      [2013-09-30 19:30:03] Building transcriptome data files..
      [2013-09-30 19:30:23] Building Bowtie index from Oryza.fa
      [FAILED]
      Error: Couldn't build bowtie index with err = 1


      But when i ran it without GTF file then it run successfully. Can anyone tell me that where i am wrong?

      and the format of gtf file of rice is:

      Un protein_coding exon 1489 1644 . - . gene_id "13113.t00002"; transcript_id "13113.m00129"; exon_number "1"; transcript_name "13113.m00129"; seqedit "false";
      Un protein_coding CDS 1489 1644 . - 0 gene_id "13113.t00002"; transcript_id "13113.m00129"; exon_number "1"; transcript_name "13113.m00129"; protein_id "13113.m00129";
      Un protein_coding start_codon 1642 1644 . - 0 gene_id "13113.t00002"; transcript_id "13113.m00129"; exon_number "1"; transcript_name "13113.m00129";
      Un protein_coding exon 1193 1357 . - . gene_id "13113.t00002"; transcript_id "13113.m00129"; exon_number "2"; transcript_name "13113.m00129"; seqedit "false";
      Un protein_coding CDS 1196 1357 . - 0 gene_id "13113.t00002"; transcript_id "13113.m00129"; exon_number "2"; transcript_name "13113.m00129"; protein_id "13113.m00129";
      Un protein_coding stop_codon 1193 1195 . - 0 gene_id "13113.t00002"; transcript_id "13113.m00129"; exon_number "2"; transcript_name "13113.m00129";
      Un protein_coding exon 4230 4368 . - . gene_id "13113.t00004"; transcript_id "13113.m00131"; exon_number "1"; transcript_name "13113.m00131"; seqedit "false";
      Un protein_coding CDS 4230 4368 . - 0 gene_id "13113.t00004"; transcript_id "13113.m00131"; exon_number "1"; transcript_name "13113.m00131"; protein_id "13113.m00131";
      Un protein_coding start_codon 4366 4368 . - 0 gene_id "13113.t00004"; transcript_id "13113.m00131"; exon_number "1"; transcript_name "13113.m00131";
      Un protein_coding exon 3151 3521 . - . gene_id "13113.t00004"; transcript_id "13113.m00131"; exon_number "2"; transcript_name "13113.m00131"; seqedit "false";
      Un protein_coding CDS 3151 3521 . - 2 gene_id "13113.t00004"; transcript_id "13113.m00131"; exon_number "2"; transcript_name "13113.m00131"; protein_id "13113.m00131";
      Un protein_coding exon 2986 3001 . - . gene_id "13113.t00004"; transcript_id "13113.m00131"; exon_number "3"; transcript_name "13113.m00131"; seqedit "false";
      Un protein_coding CDS 2986 3001 . - 0 gene_id "13113.t00004"; transcript_id "13113.m00131"; exon_number "3"; transcript_name "13113.m00131"; protein_id "13113.m00131";
      Un protein_coding exon 2314 2900 . - . gene_id "13113.t00004"; transcript_id "13113.m00131"; exon_number "4"; transcript_name "13113.m00131"; seqedit "false";
      Un protein_coding CDS 2317 2900 . - 2 gene_id "13113.t00004"; transcript_id "13113.m00131"; exon_number "4"; transcript_name "13113.m00131"; protein_id "13113.m00131";
      Un protein_coding stop_codon 2314 2316 . - 0 gene_id "13113.t00004"; transcript_id "13113.m00131"; exon_number "4"; transcript_name "13113.m00131";
      Un protein_coding exon 5772 5785 . + . gene_id "13113.t00006"; transcript_id "13113.m00133"; exon_number "1"; transcript_name "13113.m00133"; seqedit "false";
      Un protein_coding CDS 5772 5785 . + 0 gene_id "13113.t00006"; transcript_id "13113.m00133"; exon_number "1"; transcript_name "13113.m00133"; protein_id "13113.m00133";
      Un protein_coding start_codon 5772 5774 . + 0 gene_id "13113.t00006"; transcript_id "13113.m00133"; exon_number "1"; transcript_name "13113.m00133";
      Un protein_coding exon 6126 6308 . + . gene_id "13113.t00006"; transcript_id "13113.m00133"; exon_number "2"; transcript_name "13113.m00133"; seqedit "false";

      is there something wrong in GTF file? and if yes then pls tell me where i got rice GTF file.
      Last edited by deepika123; 09-30-2013, 06:02 AM.

      Comment


      • #4
        OK, so you've listed a few problems, let's start with the one posted originally (cuffdiff)

        I am not familiar with the rice genome or what format it is in or what it's chromosomes are called. However, the GTF file posted above lists the chromsome name as "Un", which I doubt is an actual chromosome. Not sure where you got your GTF from , but the first column in the GTF must match exactly the names of the chromosomes in the fa file.

        Secondly, tophat2, unlike tophat1, builds a bowtie index for the transcriptome, you should read the manual about this, and use --transcriptome-index. It looks as though the index couldn't be built, might be a read permission error, or a path error.

        Thirdly, your tophat commandlines don't look like paired end inputs, since there's only one fq file being mentioned.

        Comment


        • #5
          hello...

          i ran tophat with GTF file of rice but i got error and i do not know how to solve it please anyone can suggest me that how can i solve this problem.

          /usr/local/bin/tophat -G rice_data/Oryza.gtf -p 5 -o KC_tophat_new --transcriptome-index=known rice_data/Oryza rice_data/KC-24Hr_R1_filter.fastq rice_data/KC-24Hr_R2_filter.fastq

          and error is:

          [2013-10-02 12:36:15] Beginning TopHat run (v2.0.9)
          -----------------------------------------------
          [2013-10-02 12:36:15] Checking for Bowtie
          Bowtie version: 2.1.0.0
          [2013-10-02 12:36:15] Checking for Samtools
          Samtools version: 0.1.19.0
          [2013-10-02 12:36:19] Checking for Bowtie index files (genome)..
          [2013-10-02 12:36:19] Checking for reference FASTA file
          [2013-10-02 12:36:19] Generating SAM header for rice_data/Oryza
          format: fastq
          quality scale: phred33 (default)
          [2013-10-02 12:36:19] Reading known junctions from GTF file
          [2013-10-02 12:36:27] Preparing reads
          left reads: min. length=56, max. length=83, 37951504 kept reads (113 discarded)
          right reads: min. length=57, max. length=83, 37949924 kept reads (1693 discarded)
          [2013-10-02 13:09:09] Building transcriptome data files..
          [FAILED]
          Error: gtf_to_fasta returned an error.

          thanks in advance

          Comment


          • #6
            If you look in the run log, you'll find the actual gtf_to_fasta command that was run. Just run that yourself to get a more informative error message.

            Comment


            • #7
              yes i saw run log and run the command:

              /usr/local/bin/gtf_to_fasta --min-anchor 8 --splice-mismatches 0 --min-report-intron 50 --max-report-intron 500000 --min-isoform-fraction 0.15 --output-dir KC_tophat_new/ --max-multihits 20 --max-seg-multihits 40 --segment-length 25 --segment-mismatches 2 --min-closure-exon 100 --min-closure-intron 50 --max-closure-intron 5000 --min-coverage-intron 50 --max-coverage-intron 20000 --min-segment-intron 50 --max-segment-intron 500000 --read-mismatches 2 --read-gap-length 2 --read-edit-dist 2 --read-realign-edit-dist 3 --max-insertion-length 3 --max-deletion-length 3 -z gzip -p5 --inner-dist-mean 50 --inner-dist-std-dev 20 --gtf-annotations known/Oryza.gff --gtf-juncs KC_tophat_new/tmp/Oryza.juncs --no-closure-search --no-coverage-search --no-microexon-search known/Oryza.gff rice_data/Oryza.fa known/Oryza.fa > KC_tophat_new/logs/g2f.out

              and the error is:
              terminate called after throwing an instance of 'std:ut_of_range'
              what(): basic_string::substr
              Aborted

              how to solve this

              Comment


              • #8
                gtf_to_fasta sometimes fails on non-standard chromosome/contig names (see issue #38). Alternatively, this might also be caused by the annotation file describing a feature that goes beyond the bounds of one of the chromosomes/contigs. Short of going through the code, I can't really say which. You might just try to make a truncated annotation file and see if that works. If so, just expand it until you hit the error again. Then you'll know where the problem is.

                Comment


                • #9
                  Thanks dpryan...
                  Ok.. i am trying. Can you send me the link from where i downloaded gtf files of plant because i am trying to download from ensemble but i am unable to download due DAS and i am so good in programming .
                  i can be possible that error is due to annotation file if you send me the link, m thankful to you.

                  Comment


                  • #10
                    Well, you already have the gff file, so just truncate that.

                    Comment


                    • #11
                      thanks dpryan.. now tophat run successfully

                      Comment

                      Latest Articles

                      Collapse

                      • seqadmin
                        Current Approaches to Protein Sequencing
                        by seqadmin


                        Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                        04-04-2024, 04:25 PM
                      • seqadmin
                        Strategies for Sequencing Challenging Samples
                        by seqadmin


                        Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
                        03-22-2024, 06:39 AM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by seqadmin, 04-11-2024, 12:08 PM
                      0 responses
                      30 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 04-10-2024, 10:19 PM
                      0 responses
                      32 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 04-10-2024, 09:21 AM
                      0 responses
                      28 views
                      0 likes
                      Last Post seqadmin  
                      Started by seqadmin, 04-04-2024, 09:00 AM
                      0 responses
                      52 views
                      0 likes
                      Last Post seqadmin  
                      Working...
                      X