
  • #16
    Hi fabrice,

    Here are a few things to consider when troubleshooting:

    1) Use FastQC to check the quality of your reads.
    2) Use the FASTX-Toolkit to check the length distribution of your trimmed reads. One thing to find out is whether most of the trimming is from the 3' end. If it is, you may trim the reads by a fixed number of nucleotides so that the read length stays uniform.
    3) BWA alignment results are useful, but I would use Bowtie to align the reads and check the mapping statistics. TopHat uses Bowtie to map, so those results are more relevant.
    4) Start with the TopHat defaults and mandatory parameters to see if you can get decent results.

    Hope this helps.

    Douglas
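
The length-distribution check in point 2 can also be done without any toolkit. Below is a minimal sketch, assuming a plain 4-line-per-record FASTQ file; the function name and the tiny example reads are made up for illustration and are not part of the FASTX-Toolkit.

```python
from collections import Counter
from io import StringIO


def read_length_distribution(handle):
    """Count read lengths in a 4-line-per-record FASTQ stream."""
    lengths = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record is the sequence
            lengths[len(line.strip())] += 1
    return lengths


# Made-up example: two reads of length 8 and one trimmed down to 5.
example = StringIO(
    "@r1\nACGTACGT\n+\nIIIIIIII\n"
    "@r2\nACGTA\n+\nIIIII\n"
    "@r3\nACGTACGT\n+\nIIIIIIII\n"
)
dist = read_length_distribution(example)
for length in sorted(dist):
    print(length, dist[length])
```

For a real file, pass an open file handle instead of the `StringIO` object. A wide spread of lengths suggests that the trimming was uneven.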



    • #17
      Hi Douglas,

      Thanks for your helpful suggestions.

      1) use fastqc to check the quality of your reads
      Yes, I have done this. I trimmed reads where the quality was below 10. After trimming, the base quality looks good: all above 10.

      2) use fastx toolkit to check the length distribution of your trimmed reads. One thing to find out is whether most of the trimming is from 3'-end. If it is, you may trim the reads by fixed number of nts so you have uniform read length.

      After trimming, the reads are not of uniform length. I did not trim the reads by a fixed number of nucleotides. I also trimmed the adapter sequences and removed Ns at both ends.

      3) BWA alignment results are useful but I would use Bowtie to align the reads and check the mapping statistics, as Tophat uses Bowtie to map so the results are more relevant

      If I feed the BAM file output by BWA into Cufflinks, do you think there will be any problems? I think BWA will do better than Bowtie (in my data, BWA gives more properly paired reads).
      I would then use TopHat only to output junctions.bed, insertions.bed and deletions.bed.
      That is, BWA -> Cufflinks to get the expression values, and TopHat to estimate the junctions.

      I hope there is no potential problem with this, or perhaps you have a better suggestion.

      4) Start with Tophat default and mandatory parameters to see if you can get decent results.

      When I do not trim the reads and run TopHat, it works. So I think the problem is caused by trimming the reads, which makes the -r/--mate-inner-dist parameter incorrect.

      Thank you very much for your time.
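
To make the -r/--mate-inner-dist concern concrete: the inner distance is the fragment length minus the lengths of both mates, so shorter reads after trimming mean a larger inner distance for the same library. The sketch below is only arithmetic; the function name and the numbers (a 300 bp fragment, 50 bp reads, mean trimmed lengths of 42 and 38) are illustrative assumptions, not values from this thread.

```python
def mate_inner_distance(fragment_len, read1_len, read2_len):
    """Inner distance between mates: fragment length minus both read lengths."""
    return fragment_len - read1_len - read2_len


# Untrimmed: 300 bp fragments sequenced as 2 x 50 bp.
untrimmed = mate_inner_distance(300, 50, 50)

# After trimming, use the mean read lengths (hypothetical values here).
trimmed = mate_inner_distance(300, 42, 38)

print(untrimmed, trimmed)
```

With variable-length trimmed reads a single value is only an approximation, which may be one reason the untrimmed run behaved better.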



      • #18
        Hi fabrice,

        1) It is well known to me that, given the same reference sequence(s) and the same set of reads, BWA generally maps more reads than Bowtie. I probably should have mentioned this earlier.

        2) I do not believe TopHat will take a BWA-produced BAM.

        3) My suggestion: if a big portion of the reads have low quality scores at the 3' end, I'd trim the 3' end by a fixed number of nucleotides so that the read length stays uniform. Or go ahead with the untrimmed reads and see whether the results make sense.

        Douglas
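
Fixed-length 3' trimming, as suggested in point 3, can be sketched as follows. This is an illustration on in-memory (name, sequence, quality) tuples with made-up reads; the function name is hypothetical and not part of any toolkit mentioned here.

```python
def trim_three_prime(records, n_bases):
    """Drop n_bases from the 3' end of every (name, seq, qual) record."""
    if n_bases < 0:
        raise ValueError("n_bases must be non-negative")
    trimmed = []
    for name, seq, qual in records:
        keep = max(len(seq) - n_bases, 0)
        # Sequence and quality strings must stay the same length.
        trimmed.append((name, seq[:keep], qual[:keep]))
    return trimmed


# Made-up reads of uniform length 10 with poor quality at the 3' end.
reads = [
    ("r1", "ACGTACGTAC", "IIIIIII###"),
    ("r2", "TTGCATGCAA", "IIIIIII###"),
]
trimmed_reads = trim_three_prime(reads, 3)
print({len(seq) for _, seq, _ in trimmed_reads})
```

Because every read loses the same number of bases, reads that started at a uniform length stay uniform, and the two mates of a pair remain in step.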



        • #19
          Hi,

          I am working on SOLiD data and it's not paired-end. Even so, I get this warning message:
          Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges. It is recommended that correct paramaters (--frag-len-mean and --frag-len-std-dev) be provided.
          > Map Properties:
          > Upper Quartile: 242.20
          > Read Type: 50bp single-end
          > Fragment Length Distribution: Truncated Gaussian (default)
          > Default Mean: 200
          > Default Std Dev: 80

          Is it alright to ignore this?

          Pinki



          • #20
            Douglas,

            Why do you think that Cufflinks cannot take a BWA-produced BAM? The Cufflinks website says that Cufflinks can take BAM files from other mappers. At the moment, I have just run into another problem with sorting the BAM file:

            http://seqanswers.com/forums/showthread.php?t=12939





            • #21
              Pinki,

              I think you need to set these parameters, because you used single-end reads.

              -m/--frag-len-mean <int> This is the expected (mean) fragment length. The default is 200bp.
              Note: Cufflinks now learns the fragment length mean for each SAM file, so using this option is no longer recommended with paired-end reads.
              -s/--frag-len-std-dev <int> The standard deviation for the distribution on fragment lengths. The default is 80bp.
              Note: Cufflinks now learns the fragment length standard deviation for each SAM file, so using this option is no longer recommended with paired-end reads.



              • #22
                Hi,

                I should have been more careful in my wording. You are right that Cufflinks accepts SAM/BAM files generated by other programs. Both BWA and TopHat are mappers, but TopHat considers splicing in its mapping and BWA does not. If you use a BWA-generated BAM for differential expression analysis, you basically throw out the reads that overlap splice junctions, and I do not think that is a good idea.

                Douglas
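
One way to see how many reads are at stake is to count alignments whose CIGAR string contains an N operation (a skipped reference region, which is how a spliced aligner reports an intron). The sketch below works on SAM text lines, for example from `samtools view -h`; the function names and the three example records are made up for illustration.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def is_spliced(cigar):
    """True if the CIGAR string contains an N (skipped region) operation."""
    return any(op == "N" for _, op in CIGAR_OP.findall(cigar))


def count_spliced(sam_lines):
    """Return (spliced, mapped) alignment counts from SAM text lines."""
    spliced = mapped = 0
    for line in sam_lines:
        if line.startswith("@"):  # header line
            continue
        fields = line.rstrip("\n").split("\t")
        flag, cigar = int(fields[1]), fields[5]
        if flag & 4 or cigar == "*":  # unmapped read
            continue
        mapped += 1
        if is_spliced(cigar):
            spliced += 1
    return spliced, mapped


# Made-up records: one spliced, one contiguous, one unmapped.
sam = [
    "@HD\tVN:1.0\tSO:coordinate",
    "r1\t0\tchr1\t100\t50\t20M500N30M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t900\t50\t50M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(count_spliced(sam))
```

Run on a TopHat alignment, the spliced count gives a rough idea of how many reads an unspliced mapper would fail to place across junctions.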




                • #23
                  Douglas,

                  Thank you for your clarification and helpful suggestions.

                  There is something I am still confused about in my RNA-seq data analysis.

                  1. TopHat considers splicing in its mapping and BWA does not, so we would expect TopHat to get more properly paired reads. But in fact BWA gets more. In my analysis, I just want the quantitative expression of each gene in each sample. I think it is always better to consider splicing in mapping, and you said that it is better to consider splicing when mapping for differential expression analysis. Is there some case (or is mine one) where BWA mapping is acceptable? Or is BWA not suitable for mapping RNA-seq reads to a genome?

                  2. If novel junction analysis is not the goal, is it better to map RNA-seq reads to the transcriptome rather than the genome, e.g. for differential expression analysis? Mapping to the transcriptome also has problems, because one gene has several isoforms, which gives the reads multiple hits.

                  Thank you very much for your time.



                  • #24
                    About the Cufflinks / Cuffdiff problem

                    Hi all,

                    I was able to run:

                    cuffdiff -o ./cuffdiff refGene_chr1.GTF B6341/hg19_chr1_seg/accepted_hits.bam 4242/hg19_chr1_seg/accepted_hits.bam


                    The refGene_chr1.GTF file was downloaded from UCSC with these selections:
                    Group: Gene and Gene Prediction Tracks
                    Track: RefSeq Genes
                    Table: refGene

                    I selected only chr1, and the cuffdiff result looks like this:

                    Performed 3204 isoform-level transcription difference tests
                    Performed 0 tss-level transcription difference tests
                    Performed 3179 gene-level transcription difference tests
                    Performed 0 CDS-level transcription difference tests
                    Performed 0 splicing tests
                    Performed 0 promoter preference tests
                    Performing 0 relative CDS output tests
                    Writing isoform-level FPKM tracking
                    Writing TSS group-level FPKM tracking
                    Writing gene-level FPKM tracking
                    Writing CDS-level FPKM tracking

                    I'm not sure if this makes any sense.
                    =====================================================
                    So the overall question is: should we run cuffdiff directly, or run cuffcompare first and then cuffdiff?


                    I welcome any further discussion.

                    fangquan



                    • #25
                      Hi Dario,

                      You are right. But if you don't go through the cuffcompare step, you can still get some results from cuffdiff, like this:

                      Performed 3204 isoform-level transcription difference tests
                      Performed 0 tss-level transcription difference tests
                      Performed 3179 gene-level transcription difference tests
                      Performed 0 CDS-level transcription difference tests
                      Performed 0 splicing tests
                      Performed 0 promoter preference tests
                      Performing 0 relative CDS output tests


                      It's no surprise that some of the test counts are zero, because "Cuffdiff requires that transcripts in the input GTF be annotated with certain attributes in order to look for changes in primary transcript expression, splicing, coding output, and promoter use."
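
A quick way to check whether a GTF carries such attributes is to count how many records have them. The sketch below assumes the attributes in question are `tss_id` and `p_id`, which is my reading of the Cufflinks manual rather than something stated in this thread; the function name and the two example records are made up.

```python
import re

# Matches GTF attribute pairs of the form: key "value";
ATTRIBUTE = re.compile(r'(\S+)\s+"([^"]*)"')


def attribute_coverage(gtf_lines, wanted=("tss_id", "p_id")):
    """Return (record_count, {attribute: records carrying it})."""
    total = 0
    counts = {key: 0 for key in wanted}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:  # not a complete GTF record
            continue
        total += 1
        attrs = dict(ATTRIBUTE.findall(fields[8]))
        for key in wanted:
            if key in attrs:
                counts[key] += 1
    return total, counts


# Made-up records: the first lacks the extra attributes, the second has them.
gtf = [
    'chr1\trefGene\texon\t100\t200\t.\t+\t.\tgene_id "A"; transcript_id "A.1";',
    'chr1\tcuffcompare\texon\t100\t200\t.\t+\t.\t'
    'gene_id "A"; transcript_id "A.1"; tss_id "TSS1"; p_id "P1";',
]
print(attribute_coverage(gtf))
```

If the counts for the wanted attributes are zero, that would be consistent with cuffdiff reporting 0 tss-level, CDS-level, splicing and promoter tests.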


                      fangquan




                      Originally posted by Dario1984 View Post
                      Hi everyone,

                      The answer is found in the cufflinks documentation. You need to run cuffcompare, even if you are using a known annotation, because cuffcompare adds a couple of columns that cuffdiff critically depends on.



                      I did this without the -s option, as I didn't want any of the genes filtered, so you don't need the -s option, if you don't want it.

                      I agree that this is quite obscure and hard to find, especially since the argument description states The other source implies that a standard GTF file from UCSC should work, but this is misleading.

                      --------------------------------------
                      Dario Strbenac
                      Research Assistant
                      Cancer Epigenetics
                      Garvan Institute of Medical Research
                      Darlinghurst NSW 2010
                      Australia



                      • #26
                        The GTF file from UCSC browser is not compatible with cuffdiff. You must download the annotations from the iGenomes project.



                        • #27
                          Hi,

                          Sorry, but I don't understand what you mean by "incompatible". I used the annotation from UCSC; cuffdiff does give me some differential test results, and no major error is reported.

                          On the other hand, the Galaxy exercise examples also use the UCSC annotation GTF file, though they run cuffcompare first.

                          Thanks for the information. I will try iGenomes soon.

                          Let's keep the discussion going.

                          fangquan








                          Last edited by fangquan; 08-16-2011, 11:33 PM. Reason: typo



                          • #28
                            Hi fabrice,
                            I think I figured out the cause of, and a solution for, the warning you get when you run Cufflinks with trimmed paired-end reads.

                            (Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges. It is recommended that correct paramaters (--frag-len-mean and --frag-len-std-dev) be provided.)

                            I think that when you trim paired-end reads for adapters and low quality, both mates need to be in the correct order after trimming. Cutadapt can't re-order the reads after trimming. You need a script that can trim adapters from paired-end reads and restore the order afterwards. I used a program called 'flexible adapter remover' (far) to trim the adapters and to order the paired reads after trimming. Similarly, I used the script 'trim-fastq.pl' from 'Popoolation' to trim the paired-end reads for low quality. This script corrects the order between the paired reads after trimming.

                            I used the correctly ordered, trimmed FASTQ files in TopHat to produce a BAM file. When I used this in Cufflinks I did not get the warning message, and it used the estimated fragment length mean in the analysis.

                            I hope this helps others dealing with this issue.
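
The re-ordering step described above can be sketched as follows. This is not the method used by far or trim-fastq.pl, just a minimal illustration that assumes mates share a read name, optionally suffixed with /1 and /2, and that one file fits in memory; all function names and example reads are made up.

```python
def base_name(header):
    """Read name without the leading '@' or a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def parse_fastq(lines):
    """Yield (header, seq, plus, qual) tuples from FASTQ text lines."""
    lines = [line.rstrip("\n") for line in lines]
    for i in range(0, len(lines) - 3, 4):
        yield tuple(lines[i:i + 4])


def repair(mate1_lines, mate2_lines):
    """Return (paired1, paired2, orphans) with both paired lists in the same order."""
    mate2 = {base_name(rec[0]): rec for rec in parse_fastq(mate2_lines)}
    paired1, paired2, orphans, seen = [], [], [], set()
    for rec in parse_fastq(mate1_lines):
        name = base_name(rec[0])
        if name in mate2:
            paired1.append(rec)
            paired2.append(mate2[name])
            seen.add(name)
        else:
            orphans.append(rec)  # its mate was discarded by trimming
    orphans.extend(rec for name, rec in mate2.items() if name not in seen)
    return paired1, paired2, orphans


# Made-up example: trimming dropped r2 from file 2 and r4 from file 1.
m1 = ["@r1/1", "ACGT", "+", "IIII", "@r2/1", "ACG", "+", "III",
      "@r3/1", "ACGTA", "+", "IIIII"]
m2 = ["@r1/2", "TTGA", "+", "IIII", "@r3/2", "TTG", "+", "III",
      "@r4/2", "TT", "+", "II"]
p1, p2, orphans = repair(m1, m2)
print([base_name(r[0]) for r in p1], [base_name(r[0]) for r in orphans])
```

The two paired lists can then be written out as matching FASTQ files, and the orphans either dropped or mapped separately as single-end reads.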



                            • #29
                              is cuffmerge necessary?

                              Hi Dario,

                              Just a quick question, if I understood this correctly:
                              2) <outprefix>.combined.gtf

                              Cuffcompare reports a GTF file containing the "union" of all transfrags in each sample. If a transfrag is present in both samples, it is thus reported once in the combined gtf.
                              One runs cufflinks, then cuffcompare, and the output already contains a reference GTF file to use with cuffdiff, so cuffmerge is redundant in this case, right?

                              Cheers.



                              • #30
                                It is redundant to merge the transcripts if you use a reference GTF, but the command adds extra columns that are needed to run the isoform estimation step in cuffdiff, so it's also necessary.
