
  • #31
    Originally posted by Cole Trapnell
    I just wanted to announce that v0.8.2 of Cufflinks addresses the divide by zero, along with a number of other issues.
    Cole, I created a GTF file off of the latest refgene. Is this valid/viable for Cufflinks?

    Comment


    • #32
      Cufflinks questions

      Hi,

      I have some questions about cufflinks...

      -If I run cufflinks with the --quartile-normalization and --reference-seq options, can/should I also run cuffdiff with these options? I expect that it wouldn't somehow normalize and correct twice, so it should be a good idea to do this.

      -Is --mask-file superfluous, or is it a good idea when using the --quartile-normalization option? Does anyone have an hg19 version of a mask.gtf file, or even a header so I could see what it should look like? It would be great if someone had reference GTF files posted somewhere, since it isn't obvious to beginners how to get these, and there seem to be problems with, e.g., UCSC refGene or refFlat GTF files compared to Ensembl files.

      Thanks,

      Vince

      Comment


      • #33
        I have been encountering problems with this run and have never managed to get it to complete.

        This is my command (it is a single-fragment SOLiD run):

        tophat -C -p 8 -r 100 -I 1200 --library-type fr-secondstrand --microexon-search spruce_est solid0179_20100226_Kuusi_3_ACC_pooli_F3.csfasta

        [Thu Nov 18 14:30:32 2010] Beginning TopHat run (v1.1.2)
        -----------------------------------------------
        [Thu Nov 18 14:30:32 2010] Preparing output location ./tophat_out/
        [Thu Nov 18 14:30:32 2010] Checking for Bowtie index files
        [Thu Nov 18 14:30:32 2010] Checking for reference FASTA file
        Warning: Could not find FASTA file spruce_est.fa
        [Thu Nov 18 14:30:32 2010] Reconstituting reference FASTA file from Bowtie index
        [Thu Nov 18 14:30:40 2010] Checking for Bowtie
        Bowtie version: 0.12.7.0
        [Thu Nov 18 14:30:40 2010] Checking for Samtools
        Samtools version: 0.1.9.0
        [Thu Nov 18 14:30:42 2010] Checking reads
        min read length: 50bp, max read length: 50bp
        format: fasta
        [Thu Nov 18 14:37:57 2010] Mapping reads against spruce_est with Bowtie
        [Thu Nov 18 15:02:17 2010] Joining segment hits
        Traceback (most recent call last):
        File "/v/linux26_x86_64/appl/molbio/tophat/tophat-1.1.2.Linux_x86_64/tophat", line 2201, in ?
        sys.exit(main())
        File "/v/linux26_x86_64/appl/molbio/tophat/tophat-1.1.2.Linux_x86_64/tophat", line 2160, in main
        user_supplied_juncs)
        File "/v/linux26_x86_64/appl/molbio/tophat/tophat-1.1.2.Linux_x86_64/tophat", line 1870, in spliced_alignment
        segment_len)
        File "/v/linux26_x86_64/appl/molbio/tophat/tophat-1.1.2.Linux_x86_64/tophat", line 1593, in split_reads
        split_record(read_name, read_seq, read_quals, output_files, offsets, color)
        File "/v/linux26_x86_64/appl/molbio/tophat/tophat-1.1.2.Linux_x86_64/tophat", line 1526, in split_record
        read_seq_temp = convert_color_to_bp(read_seq)
        File "/v/linux26_x86_64/appl/molbio/tophat/tophat-1.1.2.Linux_x86_64/tophat", line 1500, in convert_color_to_bp
        base = decode_dic[base+ch]

        KeyError: 'CN'

        13780.227u 172.063s 1:03:32.11 365.9% 0+0k 0+0io 2pf+0w

        What is this error about, please?
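One plausible reading of the traceback, offered as an assumption rather than a diagnosis: the decoder looks up the previous base plus the next colour character, and the key 'CN' suggests it met an 'N' where a colour digit (0-3) was expected. The sketch below is a hypothetical re-implementation of two-base colour decoding, not TopHat's own code; `DECODE`, `decode_colors` and `invalid_color_chars` are illustrative names.

```python
# Hypothetical sketch of SOLiD colour-space decoding (not TopHat's code).
# Colour 0 keeps the base; 1 swaps A<->C and G<->T; 2 swaps A<->G and C<->T;
# 3 swaps A<->T and C<->G.
COLOR_TRANSITIONS = {"A": "ACGT", "C": "CATG", "G": "GTAC", "T": "TGCA"}

# Lookup table keyed by previous base + colour digit, e.g. "T2" -> "C".
DECODE = {
    base + str(color): next_base
    for base, row in COLOR_TRANSITIONS.items()
    for color, next_base in enumerate(row)
}


def decode_colors(read):
    """Decode a csfasta read such as 'T0120' (primer base, then colours)."""
    base = read[0]
    decoded = []
    for ch in read[1:]:
        base = DECODE[base + ch]  # raises KeyError for anything but 0-3
        decoded.append(base)
    return "".join(decoded)


def invalid_color_chars(read):
    """Return the characters after the primer base that are not 0-3."""
    return sorted(set(read[1:]) - set("0123"))
```

With this table, `decode_colors("T2N")` fails with `KeyError: 'CN'`, the same key as in the traceback. If the reads really do contain such characters, scanning the csfasta file with something like `invalid_color_chars` before mapping would show which reads are affected.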

        Comment


        • #34
          Have you tried with TopHat version 1.1.4? It is possible the bug you encountered has been fixed in the recent releases.

          Comment


          • #35
            Cleaning up a combined gtf file

            Originally posted by Cole Trapnell
            So the basic workflow we recommend is:

            1) Assemble each sample with cufflinks
            2) Run cuffcompare on the sample transfrags all at the same time, providing a reference annotation if you want to classify your transfrags according to known, novel, etc.
            3) Give the stdout.combined.gtf to cuffdiff, along with your original SAM alignments from the samples. Cuffdiff will re-estimate the abundances of the transfrags in the GTF using the alignments in each sample, and do the differential expression testing at the same time.

            Optionally, you may wish to clean up the stdout.combined.gtf before running cuffdiff, to remove partial transfrags that resulted from low depth of sequencing coverage in one of the samples. We like to perform differential testing only on transcripts that are either already known to annotation or that we've assembled in two different samples independently.

            d.
            Hi Cole,

            In this post you mention cleaning up the combined gtf file. Can you (or anyone else) be more specific about which flags we should filter the file on? When I use a combined gtf file with cuffdiff, I end up with the same gene, with the same coordinates etc., but a different XLOC number, reported many times in the gene expression file (I am currently running it through Galaxy). I assume this is in part because I've not cleaned the combined gtf file.

            Cheers
            David

            Comment


            • #36
              Hi, has anyone had any luck with filtering the combined gtf file to remove partial transfrags? How can these be detected?

              D.
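A rough sketch of the cleanup criterion quoted in the thread (keep transfrags that are known to the annotation or that were assembled in at least two samples). It rests on assumptions that should be checked against your own cuffcompare output: that the .tracking file is tab-separated with the transfrag id in column 1, the class code in column 4, and one column per sample after that with "-" meaning "not assembled in this sample"; and that class code "=" marks a match to the reference. The function names and the `first_sample_col` parameter are hypothetical.

```python
import re

# Matches the transcript_id attribute in a GTF attribute column.
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def supported_transfrags(tracking_lines, min_samples=2, first_sample_col=4):
    """Collect ids of transfrags that match the reference or recur in samples.

    Assumed layout (verify on your file): col 0 = transfrag id,
    col 3 = class code, cols from first_sample_col = one entry per sample.
    """
    keep = set()
    for line in tracking_lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) <= first_sample_col:
            continue  # skip malformed or empty lines
        class_code = cols[3]
        n_samples = sum(1 for c in cols[first_sample_col:] if c != "-")
        if class_code == "=" or n_samples >= min_samples:
            keep.add(cols[0])
    return keep


def filter_gtf(gtf_lines, keep_ids):
    """Yield only the GTF records whose transcript_id is in keep_ids."""
    for line in gtf_lines:
        match = TRANSCRIPT_ID.search(line)
        if match and match.group(1) in keep_ids:
            yield line
```

A typical use would be to read the tracking file, build the id set, and write the lines yielded by `filter_gtf` over stdout.combined.gtf to a new file that is then given to cuffdiff.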

              Comment


              • #37
                Originally posted by Lesley
                Thanks for the info on the reference gtf. I downloaded both fasta and gtf from ensembl and ran into the chr problem. However, now when I run cuffcompare on the reference genome I get tss_ids but no p_ids, even though the original gtf has CDS information.

                I also had the following error when running cuffcompare on cufflinks output and the fixed gtf file that I guess has something to do with the cufflinks gtf files since there are two of them.

                Warning: found 26695 transcripts with undetermined strand.
                Warning: found 44851 transcripts with undetermined strand.

                Cuffcompare then exits.

                Any help on moving forward with cufflinks will be greatly appreciated.

                Cheers,
                Lesley
                Hi Lesley,
                I am running into the same issue. I could get p_ids using the -s option, but I still could not get tss_ids.
                I would appreciate any advice.
                Thanks

                Comment


                • #38
                  Originally posted by Cole Trapnell
                  Without tss_id and p_id attributes, Cufflinks will simply test for differential expression of transcripts and genes. You can attach these attributes to your own GTF file, but for convenience, cuffcompare now outputs a single file containing the "union" of all the assembled transfrags you give it. So the basic workflow we recommend is:

                  1) Assemble each sample with cufflinks
                  2) Run cuffcompare on the sample transfrags all at the same time, providing a reference annotation if you want to classify your transfrags according to known, novel, etc.
                  3) Give the stdout.combined.gtf to cuffdiff, along with your original SAM alignments from the samples. Cuffdiff will re-estimate the abundances of the transfrags in the GTF using the alignments in each sample, and do the differential expression testing at the same time.

                  Optionally, you may wish to clean up the stdout.combined.gtf before running cuffdiff, to remove partial transfrags that resulted from low depth of sequencing coverage in one of the samples. We like to perform differential testing only on transcripts that are either already known to annotation or that we've assembled in two different samples independently.

                  As far as how cuffcompare assigns p_id and tss_id attributes:

                  * p_id is assigned just using the CDS records in the reference GTF. If there are no CDS records, there will be no p_ids. Similarly, if you run cuffcompare without a reference annotation along with your sample assemblies, there will be no p_id attributes in stdout.combined.gtf
                  * tss_id is assigned based on where the 5' ends of transfrags are: two transcripts on the same strand and which share bases have the same TSS iff their 5' ends start within 100bp of each other. This threshold is chosen based on our observation that depth of sequencing doesn't always reach to the end of the true transcript on either end. You can change it with the -d option (which I just realized is not listed in the manual - I will update it).

                  All this is to say that if you're hoping to just use a reference GTF with cuffdiff, you'll need to add those p_id and tss_id attributes yourself. You can do this with cuffcompare too, using a little hack:

                  cuffcompare -r reference.gtf reference.gtf reference.gtf

                  This will spit out a version of reference.gtf in stdout.combined.gtf that has the p_id and tss_id attributes attached.
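The tss_id rule quoted above can be sketched as follows. This is only an illustration of the stated rule, not cuffcompare's implementation: the `Transcript` class and function names are hypothetical, "share bases" is approximated by overlap of the transcript spans rather than of exons, and treating "within 100bp" as an inclusive bound is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    strand: str  # "+" or "-"
    start: int   # leftmost genomic coordinate (1-based, inclusive)
    end: int     # rightmost genomic coordinate (inclusive)


def five_prime_end(t):
    """The 5' end is the left end on '+' and the right end on '-'."""
    return t.start if t.strand == "+" else t.end


def share_bases(a, b):
    """Approximation: the two transcript spans overlap by at least one base."""
    return a.start <= b.end and b.start <= a.end


def same_tss(a, b, max_dist=100):
    """Apply the quoted rule: same strand, shared bases, 5' ends close."""
    return (
        a.strand == b.strand
        and share_bases(a, b)
        and abs(five_prime_end(a) - five_prime_end(b)) <= max_dist
    )
```

Here `max_dist` plays the role of the threshold that the post says can be changed with the -d option.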
                  Hi Cole,
                  I do get the p_ids by using your trick, but I am still not able to get the tss_ids.
                  Please suggest a way forward.
                  Cheers
                  Tani

                  Comment


                  • #39
                    Hi Cole,
                    I am aware of the Jensen-Shannon metric for the detection of differential splicing. It is nicely described in your Cufflinks paper. But I am still not clear on how to calculate a p-value for it. What I understood from the supplementary material of the paper is that "asymptotic" values are calculated for the JS metric, but I am not sure exactly how to calculate them. It would be a great help if you could shed some more light on that, since I am trying to implement it and include it in my analysis scripts.
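For readers following along, here is a minimal sketch of the metric itself: the square root of the Jensen-Shannon divergence between two relative-abundance vectors (for example, isoform fractions of one gene in two conditions). It does not address the p-value question asked above, and it is a textbook formulation rather than Cufflinks' own code; the function names are illustrative, and natural logarithms are an arbitrary choice.

```python
import math


def entropy(p):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)


def js_distance(p, q):
    """Square root of the Jensen-Shannon divergence between p and q.

    p and q are expected to be same-length vectors that each sum to 1.
    """
    m = [(a + b) / 2 for a, b in zip(p, q)]
    jsd = entropy(m) - (entropy(p) + entropy(q)) / 2
    return math.sqrt(max(jsd, 0.0))  # clamp tiny negative rounding errors
```

Identical distributions give a distance of 0, and two distributions with no isoform in common give the maximum, sqrt(ln 2) when natural logs are used.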

                    Comment


                    • #40
                      Divide by 0 error?

                      Hi everyone,
                      I tried to run cuffdiff as shown in the most recent paper (Trapnell, 2012). I am running Cufflinks 1.1.0. This was my command:

                      cuffdiff -o mouse_diff_out -b genome.fa -p 8 -L KO,WT -u merged_asm/merged.gtf \
                      ./KO1_thout/accepted_hits.bam,./KO2_thout/accepted_hits.bam,./KO3_thout/accepted_hits.bam \
                      ./WT1_thout/accepted_hits.bam,./WT2_thout/accepted_hits.bam,./WT3_thout/accepted_hits.bam


                      It has worked this way with another sample set before, but this time it came up with an error (which I believe is a divide by 0 error...).

                      [15:05:06] Inspecting maps and determining fragment length distributions.
                      > Map Properties:
                      > Total Map Mass: 6136.40
                      > Number of Multi-Reads: 2847 (with 7697 total hits)
                      > Read Type: 0bp single-end
                      > Fragment Length Distribution: Truncated Gaussian (default)
                      > Default Mean: 200
                      > Default Std Dev: 80
                      > Map Properties:
                      > Total Map Mass: 7789.56
                      > Number of Multi-Reads: 4369 (with 14182 total hits)
                      > Read Type: 0bp single-end
                      > Fragment Length Distribution: Truncated Gaussian (default)
                      > Default Mean: 200
                      > Default Std Dev: 80
                      > Map Properties:
                      > Total Map Mass: 691124.82
                      > Number of Multi-Reads: 653163 (with 2156382 total hits)
                      > Read Type: 0bp single-end
                      > Fragment Length Distribution: Truncated Gaussian (default)
                      > Default Mean: 200
                      > Default Std Dev: 80
                      > Map Properties:
                      > Total Map Mass: 546.92
                      > Number of Multi-Reads: 213 (with 629 total hits)
                      > Read Type: 0bp single-end
                      > Fragment Length Distribution: Truncated Gaussian (default)
                      > Default Mean: 200
                      > Default Std Dev: 80
                      [15:05:28] Modeling fragment count overdispersion.
                      > Map Properties:
                      > Total Map Mass: 6435.42
                      > Number of Multi-Reads: 4202 (with 12421 total hits)
                      > Read Type: 0bp single-end
                      > Fragment Length Distribution: Truncated Gaussian (default)
                      > Default Mean: 200
                      > Default Std Dev: 80
                      > Map Properties:
                      > Total Map Mass: 328384.74
                      > Number of Multi-Reads: 190518 (with 592361 total hits)
                      > Read Type: 0bp single-end
                      > Fragment Length Distribution: Truncated Gaussian (default)
                      > Default Mean: 200
                      > Default Std Dev: 80
                      [15:05:46] Modeling fragment count overdispersion.
                      [15:05:46] Calculating initial abundance estimates for bias and multi-read correction.
                      > Processed 13207 loci. [*************************] 100%
                      [15:08:30] Learning bias parameters.
                      [15:08:58] Testing for differential expression and regulation in locus.
                      > Processing Locus 1:25124320-25886552 [ ] 0%
                      terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::domain_error> >'
                      what(): Error in function boost::math::pdf(const normal_distribution<d>&, d): Random variate x is nan, but must be finite!
                      Abort



                      Does anyone know what causes this?
                      K

                      Comment


                      • #41
                        This was fixed in a later release. Use version 1.3.0.

                        Comment


                        • #42
                          Thanks for your response! That solved the problem:-)

                          Comment


                          • #43
                            Hi everyone,

                            I was running Cufflinks, but I got the error message below:

                            Code:
                            $ cufflinks -p 8 -M ./ref/tb427.mask.gff -g ./ref/tb427.genes.gff -s ./ref/Bowtie2index/tb427.genome -u -o ./KO/PCF_427_WT.th.cl ./KO/PCF_427_WT.th/accepted_hits.bam
                            You are using Cufflinks v2.0.2, which is the most recent release.
                            [16:34:33] Loading reference annotation.
                            [16:34:34] Loading reference annotation.
                            [16:34:34] Inspecting reads and determining fragment length distribution.
                            > Processed 1930 loci.                         [*************************] 100%
                            terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<std::domain_error> >'
                              what():  Error in function boost::math::normal_distribution<d>::normal_distribution: Scale parameter is 0, but must be > 0 !
                            The RNA-seq data consist of two replicates from T. brucei. The tb427.genome and tb427.genes.gff files were downloaded from TriTrypDB. To speed up the assembly, tb427.mask.gff lists the regions I would like to exclude from assembly, i.e. everything except the CDS and exon regions.

                            I did some searching. It seems other people have had the same problem, but for them it occurred when running cuffdiff. As far as I know, that has been fixed by Cole.

                            I have no idea what went wrong in my case. Does anyone know what's going on and can guide me on what to do?

                            Thanks.

                            Comment


                            • #44
                              I am sorry for confusing everyone.
                              I solved this problem myself; it was caused by a mistyped parameter.

                              Code:
                              cufflinks -p 8 -M ./ref/tb427.mask.gff -g ./ref/tb427.genes.gff -s ./ref/Bowtie2index/tb427.genome -u -o ./KO/PCF_427_WT.th.cl ./KO/PCF_427_WT.th/accepted_hits.bam
                              
                              changed to
                              
                              cufflinks -p 8 -M ./ref/tb427.mask.gff -g ./ref/tb427.genes.gff -b ./ref/Bowtie2index/tb427.genome.fa -u -o ./KO/PCF_427_WT.th.cl ./KO/PCF_427_WT.th/accepted_hits.bam
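A possible explanation for why swapping -s for -b removed the "Scale parameter is 0" error, stated as a guess: in the Cufflinks manual -s is the short form of --frag-len-std-dev, so the genome path was being read as a standard deviation. If that value is parsed leniently (C's atof-style, which is an assumption about Cufflinks' internals), a non-numeric string becomes 0 and a normal distribution with scale 0 is rejected. The helpers below are hypothetical and only imitate that behaviour.

```python
import re

# Leading numeric prefix, roughly what C's atof() would consume.
_NUMERIC_PREFIX = re.compile(r"\s*[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")


def lenient_float(text):
    """Imitate atof(): parse a numeric prefix, or return 0.0 if there is none."""
    match = _NUMERIC_PREFIX.match(text)
    return float(match.group(0)) if match else 0.0


def checked_std_dev(text):
    """Reject a non-positive scale, as a normal distribution requires."""
    value = lenient_float(text)
    if value <= 0:
        raise ValueError(f"scale parameter is {value:g}, but must be > 0")
    return value
```

Under these assumptions, `checked_std_dev("80")` is accepted while `checked_std_dev("./ref/Bowtie2index/tb427.genome")` raises, mirroring the message in the original error.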
                              However, I now get the warning message below:

                              Code:
                              Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct parameters (--frag-len-mean and --frag-len-std-dev) be provided.
                              > Map Properties:
                              >       Normalized Map Mass: 45525182.00
                              >       Raw Map Mass: 45525182.00
                              >       Fragment Length Distribution: Truncated Gaussian (default)
                              >                     Default Mean: 200
                              >                  Default Std Dev: 80
                              I ran TopHat on paired-end data, so I was expecting Cufflinks to estimate the mean and s.d. from the paired-end reads. I don't think I have to set these two parameters, since the Cufflinks manual says: "Note: Cufflinks now learns the fragment length mean for each SAM file, so using this option is no longer recommended with paired-end reads."

                              I've read some previous posts. One of the answers suggests that it is (maybe) caused by a wrong annotation. The annotation was downloaded from TriTrypDB and I didn't modify it (I only removed the FASTA section), so I don't expect the annotation to be the cause.

                              I think I have probably set some parameters incorrectly. My TopHat parameters are below:

                              Code:
                              tophat -p 8 -G ./ref/Bowtie2index/tb427.genes.gff -o ./KO/PCF_427_WT.th ./ref/Bowtie2index/tb427.genome ./KO/PCF_427_WT1 ./KO/PCF_427_WT2
                              Could anyone explain this to me a little?
                              Thanks a lot.
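If the fragment length parameters named in the warning do have to be supplied by hand, one way to get rough values is from the template lengths (TLEN, column 9) of properly paired reads in SAM text, for example the output of `samtools view`. The sketch below is only an approximation under stated assumptions: for spliced RNA-seq alignments TLEN spans introns, so a hypothetical `max_len` cutoff is used to discard implausibly long values, and the function name is illustrative.

```python
import statistics


def fragment_length_stats(sam_lines, max_len=1000):
    """Estimate mean and standard deviation of fragment length from SAM text.

    Uses only properly paired reads (FLAG bit 0x2) with a positive TLEN,
    so each pair is counted once, and ignores TLEN values above max_len.
    """
    lengths = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        if flag & 0x2 and 0 < tlen <= max_len:
            lengths.append(tlen)
    if len(lengths) < 2:
        raise ValueError("not enough properly paired reads to estimate")
    return statistics.mean(lengths), statistics.stdev(lengths)
```

The two returned numbers would correspond to --frag-len-mean and --frag-len-std-dev; whether supplying them is appropriate for a given data set is a separate question from why the estimation failed.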

                              Comment


                              • #45
                                Hi all,

                                According to the description of the cuffdiff output file genes.fpkm_tracking, the value of *_FPKM should be larger than *_conf_lo and smaller than *_conf_hi. But in my results, 12257 of 91991 transcripts in one sample have an FPKM larger than both conf_lo and conf_hi. Is this normal?

                                The cufflinks version 2.1.1 was used.
                                Last edited by pengchy; 04-23-2013, 10:10 PM. Reason: add version information
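To make a count like the one above reproducible, a small check along these lines could be used. It assumes a tab-separated file with a header row that contains a tracking_id column plus <sample>_FPKM, <sample>_conf_lo and <sample>_conf_hi columns, following the *_FPKM naming in the post; the function name and the `sample` argument are hypothetical.

```python
import csv


def rows_outside_interval(lines, sample):
    """Return tracking ids whose FPKM lies outside [conf_lo, conf_hi].

    `lines` is any iterable of text lines (an open file works), and
    `sample` is the column prefix, e.g. "q1" for q1_FPKM.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    outside = []
    for row in reader:
        fpkm = float(row[f"{sample}_FPKM"])
        lo = float(row[f"{sample}_conf_lo"])
        hi = float(row[f"{sample}_conf_hi"])
        if not (lo <= fpkm <= hi):
            outside.append(row["tracking_id"])
    return outside
```

Listing the offending ids, rather than only counting them, makes it easier to see whether they share a status flag or some other feature.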

                                Comment
