  • leifive
    Member
    • Mar 2013
    • 10

    #31
    Originally posted by fangquan
    Hi Dario,

    You are right. But even if you skip the cuffcompare step, you can still get some results from cuffdiff, like this:

    Performed 3204 isoform-level transcription difference tests
    Performed 0 tss-level transcription difference tests
    Performed 3179 gene-level transcription difference tests
    Performed 0 CDS-level transcription difference tests
    Performed 0 splicing tests
    Performed 0 promoter preference tests
    Performing 0 relative CDS output tests


    It's no surprise that some of these counts are zero, because "Cuffdiff requires that transcripts in the input GTF be annotated with certain attributes in order to look for changes in primary transcript expression, splicing, coding output, and promoter use."


    fangquan
    Hi, fangquan.
    Could you tell me how you solved the problem? I am facing almost the same puzzle. I used merged.gtf from cuffmerge and combined.gtf from cuffcompare as input in turn, but cuffdiff performed 0 splicing/promoter preference/relative CDS output tests every time. Thanks.
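The manual passage quoted above says these tests depend on grouping attributes in the GTF. A quick way to see whether a GTF carries them is to count transcripts that have `tss_id` and `p_id`, which cuffcompare and cuffmerge can add to their output when run with a reference annotation (the exact conditions depend on options and on CDS records in the reference, so check the manual). The sketch below is a minimal stdlib-only check under that assumption; `count_grouping_attrs` is a hypothetical helper name. Having the attributes is necessary but not sufficient: the run in post #36 below shows TSS and CDS tests yet still 0 splicing tests.

```python
import re
from collections import Counter

# Grouping attributes cuffdiff looks for in the input GTF (per the manual):
# tss_id groups isoforms by transcription start site, p_id by coding sequence.
GROUPING_ATTRS = ("tss_id", "p_id")

# Matches key "value" pairs in GTF column 9, e.g. gene_id "XLOC_000001";
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"')


def count_grouping_attrs(gtf_lines):
    """Return (number of transcripts, {attribute: transcripts carrying it})."""
    per_transcript = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        attrs = dict(ATTR_RE.findall(fields[8]))
        tid = attrs.get("transcript_id")
        if tid is None:
            continue
        found = per_transcript.setdefault(tid, set())
        found.update(a for a in GROUPING_ATTRS if a in attrs)
    counts = Counter()
    for found in per_transcript.values():
        counts.update(found)
    return len(per_transcript), {a: counts[a] for a in GROUPING_ATTRS}
```

For a real file, something like `with open("merged.gtf") as fh: print(count_grouping_attrs(fh))` would do; zero for both attributes would explain 0 splicing, promoter and CDS tests.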


    • pengchy
      Senior Member
      • Feb 2009
      • 116

      #32
      Hi all,

      I have run tophat2/cufflinks 2.1.1/cuffmerge successfully. But when I run cuffdiff2 with the merged GTF file, all the output *fpkm_tracking files have zero FPKM values, and the cuffdiff2 messages are:

      Code:
      [11:41:15] Loading reference annotation and sequence.
      Warning: No conditions are replicated, switching to 'blind' dispersion method
      [11:42:42] Inspecting maps and determining fragment length distributions.
      Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct parameters (--frag-len-mean and --frag-len-std-dev) be provided.
      Warning: Using default Gaussian distribution due to insufficient paired-end reads in open ranges.  It is recommended that correct parameters (--frag-len-mean and --frag-len-std-dev) be provided.
      [11:53:13] Modeling fragment count overdispersion.
      > Map Properties:
      >       Normalized Map Mass: 0.50
      >       Raw Map Mass: 0.12
      >       Number of Multi-Reads: 70 (with 71 total hits)
      >       Fragment Length Distribution: Truncated Gaussian (default)
      >                     Default Mean: 200
      >                  Default Std Dev: 80
      > Map Properties:
      >       Normalized Map Mass: 0.50
      >       Raw Map Mass: 1.00
      >       Number of Multi-Reads: 154 (with 158 total hits)
      >       Fragment Length Distribution: Truncated Gaussian (default)
      >                     Default Mean: 200
      >                  Default Std Dev: 80
      [11:55:34] Calculating preliminary abundance estimates
      I have also tested the GTF file produced by cuffcompare, with the same results. Could anyone tell me the reason?

      Thank you.


      • pengchy
        Senior Member
        • Feb 2009
        • 116

        #33
        I have found the reason for this problem: the coordinates in the BAM files are not consistent with the GTF.

        Thank you.
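pengchy does not say exactly what was inconsistent. One frequent form of this problem (an assumption here, not stated in the thread) is that the BAM and GTF use different reference naming, for example Ensembl-style `1` versus UCSC-style `chr1`. The sketch below compares the `@SQ SN:` names from a BAM header, as printed by `samtools view -H`, with the sequence names in column 1 of a GTF. The function names are hypothetical, and a genome build mismatch where names agree but coordinates differ will not be caught this way.

```python
def bam_header_seqnames(header_lines):
    """Reference names from the @SQ lines of a SAM/BAM header
    (e.g. the text printed by `samtools view -H`)."""
    names = set()
    for line in header_lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.add(field[3:])
    return names


def gtf_seqnames(gtf_lines):
    """Distinct sequence names (column 1) used in a GTF."""
    return {line.split("\t", 1)[0] for line in gtf_lines
            if line.strip() and not line.startswith("#")}


def compare_seqnames(bam_names, gtf_names):
    """Names found in both files, only in the GTF, and only in the BAM."""
    return {
        "shared": sorted(bam_names & gtf_names),
        "gtf_only": sorted(gtf_names - bam_names),
        "bam_only": sorted(bam_names - gtf_names),
    }
```

An empty `shared` list with many names on both sides points to a naming mismatch, which would also explain zero FPKM for every transcript.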


        • Charitra
          Member
          • Feb 2013
          • 57

          #34
          So, does this mean that all reference files must come from the same source (Ensembl or UCSC)?
          Is it okay to ignore this warning (Warning: No conditions are replicated, switching to 'blind' dispersion method) and just let cuffdiff continue? What impact will it have?
          I ignored the warning and my cuffdiff run finished. I have everything in my data, such as genes and differential expression data, without errors. I used only Ensembl references but still got the same warning. Why?

          Looking forward to your kind reply.

          Thank you.



          • nsl
            Member
            • Jan 2011
            • 28

            #35
            If the GTF file is incompatible, you will know, because your cufflinks output will show "0 FPKM" for every gene. As mentioned in the posts above, get it from iGenomes just to be safe.

            If cuffdiff switched to the 'blind' dispersion method, it is assuming that you have only 1 replicate per condition, and

            "This method works well when you expect the samples to have very few differentially expressed genes. If there are many differentially expressed genes, Cuffdiff will construct an overly conservative model and you may not get any significant calls. In this case, you will need more replicates in your experiment."

            Check the last paragraph in the Cufflinks manual.
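A toy illustration of why the quoted passage calls the blind model conservative, assuming made-up FPKM values for one gene. This is not cuffdiff's actual dispersion model (which fits a mean-variance relationship across genes); it only shows the arithmetic intuition: with one sample per condition there is no within-condition variance to estimate, so blind mode pools samples across conditions, and a genuine difference between conditions then inflates the pooled variance.

```python
from statistics import StatisticsError, variance

# Hypothetical FPKM values for one gene, one sample per condition.
sensitive = [10.0]
resistant = [40.0]


def within_condition_variance(values):
    """Sample variance inside one condition, or None with fewer than 2 replicates."""
    try:
        return variance(values)
    except StatisticsError:
        return None  # variance is undefined for a single value


def blind_pooled_variance(*conditions):
    """Toy 'blind' estimate: treat every sample as a replicate of one group."""
    pooled = [v for values in conditions for v in values]
    return variance(pooled)


print(within_condition_variance(sensitive))          # None
print(blind_pooled_variance(sensitive, resistant))    # 450.0
```

The pooled value is large precisely because the two conditions differ, so a test using it as "noise" becomes harder to pass.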

            If you have more than 1 replicate but it still runs blind, it could be that you didn't comma-separate your replicates correctly.
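In cuffdiff, the replicates of one condition go into a single positional argument with their BAM (or SAM) files joined by commas, one such argument per condition; a space instead of a comma turns a replicate into a separate condition. The Python sketch below only assembles such a command line and does not run cuffdiff. The paths, labels and thread count are hypothetical; `-o`, `-p` and `-L` are cuffdiff's output-directory, thread-count and label options.

```python
import shlex


def cuffdiff_command(gtf, conditions, out_dir="cuffdiff_out", threads=8):
    """Assemble a cuffdiff argument list.

    `conditions` maps a label to that condition's replicate BAM files.
    Each condition becomes ONE positional argument with its replicates
    joined by commas; -L gives the labels in the same order."""
    cmd = ["cuffdiff", "-o", out_dir, "-p", str(threads),
           "-L", ",".join(conditions), gtf]
    for label, bams in conditions.items():
        if not bams or any("," in b for b in bams):
            raise ValueError(f"bad BAM list for condition {label!r}")
        cmd.append(",".join(bams))
    return cmd


# Hypothetical layout: 2 sensitive and 3 resistant replicates from TopHat.
example = cuffdiff_command(
    "merged_asm/merged.gtf",
    {
        "sensitive": ["S1/accepted_hits.bam", "S2/accepted_hits.bam"],
        "resistant": ["R1/accepted_hits.bam", "R2/accepted_hits.bam",
                      "R3/accepted_hits.bam"],
    },
)
print(shlex.join(example))  # paste into a shell, or pass `example` to subprocess.run
```

Since at least one condition is now replicated, cuffdiff should no longer need the 'blind' fallback.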

            Hope this helps


            • Charitra
              Member
              • Feb 2013
              • 57

              #36
              Thank you so much for your expert comments.
              I still have some questions, and I am searching for the answers in previous posts. I will post my questions here if I cannot find the answers.

              However, there is something I would like to ask you. I am sorry to bother you so much, but I really need to move on. Please answer if possible. Thank you:
              1. After the TopHat alignment, I ran cufflinks on the TopHat-produced .bam file, and cufflinks reported "Warning: doesnt appear to be a .bam file, trying .sam...OK.." and then continued. Do you think this might have something to do with cuffdiff switching to the blind method?
              2. Could you please check this cuffdiff output? I used the iGenomes (Ensembl) reference.
              Warning: couldn't find fasta record for 'HSCHR9_3_CTG35'!
              This contig will not be bias corrected.
              Warning: No conditions are replicated, switching to 'blind' dispersion method
              [17:12:12] Inspecting maps and determining fragment length distributions.
              [17:25:54] Modeling fragment count overdispersion.
              > Map Properties:
              > Normalized Map Mass: 21977740.46
              > Raw Map Mass: 23001324.33
              > Number of Multi-Reads: 493145 (with 1171488 total hits)
              > Fragment Length Distribution: Empirical (learned)
              > Estimated Mean: 233.43
              > Estimated Std Dev: 32.95
              > Map Properties:
              > Normalized Map Mass: 21977740.46
              > Raw Map Mass: 20859001.11
              > Number of Multi-Reads: 430276 (with 1094508 total hits)
              > Fragment Length Distribution: Empirical (learned)
              > Estimated Mean: 242.99
              > Estimated Std Dev: 19.76
              [17:27:41] Calculating preliminary abundance estimates
              > Processed 38664 loci. [*************************] 100%
              [19:01:04] Learning bias parameters.
              [19:24:10] Testing for differential expression and regulation in locus.
              > Processed 38664 loci. [*************************] 100%
              Performed 61095 isoform-level transcription difference tests
              Performed 41310 tss-level transcription difference tests
              Performed 18315 gene-level transcription difference tests
              Performed 28507 CDS-level transcription difference tests
              Performed 0 splicing tests
              Performed 0 promoter preference tests
              Performing 0 relative CDS output tests
              Writing isoform-level FPKM tracking
              Writing TSS group-level FPKM tracking
              Writing gene-level FPKM tracking
              Writing CDS-level FPKM tracking
              Writing isoform-level count tracking
              Writing TSS group-level count tracking
              Writing gene-level count tracking
              Writing CDS-level count tracking
              Writing isoform-level read group tracking
              Writing TSS group-level read group tracking
              Writing gene-level read group tracking
              Writing CDS-level read group tracking
              Writing read group info
              Writing run info

              3. For the cuffdiff run on 5 samples:
              3.1 Without merging:
              CuffSet instance with:
              5 samples
              62149 genes
              273794 isoforms
              146887 TSS
              82429 CDS
              621490 promoters
              1468870 splicing
              192450 relCDS
              diff_expressed_gene_significant: 3183
              3.2 I divided the data into 2 categories: for the first, I merged two .gtf files (1 + 2), and for the second, three .gtf files (3 + 4 + 5). Then I ran cuffdiff and got the following details from cummeRbund:
              CuffSet instance with:
              2 samples
              62149 genes
              273794 isoforms
              146887 TSS
              82429 CDS
              62149 promoters
              146887 splicing
              19245 relCDS
              diff_expressed_gene_significant: 95
              (FPKM expression plot Image attached)
              Based on your experience, does this indicate a good cuffdiff run (even though the blind method was used)?
              4. I think that in 3.1 the cuffdiff run had 1 replicate per sample, and in 3.2 the first condition had 2 replicates and the second had 3. Is that right?
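The promoter, splicing and relCDS counts in 3.1 and 3.2 appear to differ only by the number of pairwise comparisons. Cuffdiff tests every pair of conditions: five samples give C(5,2) = 10 pairs and two conditions give 1, and 621490, 1468870 and 192450 are exactly ten times 62149, 146887 and 19245. The short check below reproduces that arithmetic from the numbers in the post. Reading it as one row per gene (promoters) or per TSS group (splicing) for each pair is an interpretation, not something the cummeRbund summary states.

```python
from math import comb

# Counts copied from the two cummeRbund CuffSet summaries above.
five_samples = {"genes": 62149, "TSS": 146887,
                "promoters": 621490, "splicing": 1468870, "relCDS": 192450}
two_samples = {"genes": 62149, "TSS": 146887,
               "promoters": 62149, "splicing": 146887, "relCDS": 19245}

pairs_five = comb(5, 2)  # 10 pairwise comparisons among 5 samples
pairs_two = comb(2, 2)   # 1 comparison between 2 conditions

for key in ("promoters", "splicing", "relCDS"):
    per_pair = five_samples[key] // pairs_five
    print(f"{key:9s} 5-sample run: {five_samples[key]:>8} / {pairs_five} = {per_pair:>6}; "
          f"2-condition run: {two_samples[key] // pairs_two:>6}")
```

The per-comparison figures match between the two runs, so the drop from 621490 to 62149 promoters reflects fewer comparisons, not fewer features.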

              Thank you in advance.

              • nsl
                Member
                • Jan 2011
                • 28

                #37
                Hi Charitra,
                I'm learning on the job like many others and am not an expert.
                1. That isn't a problem. I've experienced it too, but I'm not sure why that message pops up.
                2. Could it be that there is no FASTA record because it is a pseudogene? I'm not sure about this.
                3. I am not quite sure what you are asking, as I don't know the design of your experiment. Nevertheless, when you run them as 5 separate samples you get 3183 differentially expressed genes, and this number drops drastically to 95 when you run them as replicates. This indicates that there is a lot of variation. You really need replicates to draw any conclusions from your data.
                4. Do your barplots represent 2 different genes? Either way, the FPKM values are low, and I would not concentrate on genes with very low FPKMs (unless, of course, you have prior knowledge and a reason to). Also, the error bars overlap terribly, so there is no significance.
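Point 4 (set aside significant calls where expression is very low) can be applied mechanically to cuffdiff's `gene_exp.diff` table, whose columns include `status`, `value_1`, `value_2`, `q_value` and `significant`. The sketch below keeps genes that cuffdiff marked significant with status `OK` and whose higher FPKM reaches a cutoff. The 1.0 FPKM cutoff, the function name and the two demo rows are hypothetical; the thread gives no threshold, and any choice should be justified for your own data.

```python
import csv
import io

MIN_FPKM = 1.0  # hypothetical cutoff; not a value recommended in the thread


def significant_and_expressed(diff_file, min_fpkm=MIN_FPKM):
    """Yield (gene, value_1, value_2, q_value) for rows of a cuffdiff
    gene_exp.diff file that are significant, tested OK, and where the
    higher of the two FPKM values reaches min_fpkm."""
    for row in csv.DictReader(diff_file, delimiter="\t"):
        if row["status"] != "OK" or row["significant"] != "yes":
            continue
        v1, v2 = float(row["value_1"]), float(row["value_2"])
        if max(v1, v2) >= min_fpkm:
            yield row["gene"], v1, v2, float(row["q_value"])


# Two made-up rows in the gene_exp.diff column layout, for demonstration only.
HEADER = ("test_id\tgene_id\tgene\tlocus\tsample_1\tsample_2\tstatus\t"
          "value_1\tvalue_2\tlog2(fold_change)\ttest_stat\tp_value\tq_value\tsignificant\n")
demo = io.StringIO(
    HEADER
    + "XLOC_1\tXLOC_1\tGENE_A\tchr1:1-100\tS\tR\tOK\t0.2\t0.9\t2.17\t2.1\t0.001\t0.04\tyes\n"
    + "XLOC_2\tXLOC_2\tGENE_B\tchr1:200-900\tS\tR\tOK\t5.0\t25.0\t2.32\t3.5\t0.0001\t0.01\tyes\n"
)
hits = list(significant_and_expressed(demo))
print(hits)
```

With a real run, pass an open file instead: `with open("gene_exp.diff") as fh: hits = list(significant_and_expressed(fh))`.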

                Hope this helps.
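On point 1: a BAM file is BGZF-compressed, which Python's `gzip` module can read, and its decompressed stream starts with the 4 bytes `BAM\1`, while a SAM file is plain text. The sketch below uses that to check whether an input really is BAM. It is a diagnostic under those assumptions only; it does not explain why cufflinks printed the message, and the helper name and demo file names are made up.

```python
import gzip
import os
import tempfile

BAM_MAGIC = b"BAM\x01"  # first 4 bytes of a decompressed BAM stream


def looks_like_bam(path):
    """True if `path` is gzip/BGZF-compressed and starts with the BAM magic."""
    try:
        with gzip.open(path, "rb") as fh:
            return fh.read(4) == BAM_MAGIC
    except (OSError, EOFError):
        # gzip.BadGzipFile (an OSError) for plain-text SAM; EOFError if truncated
        return False


# Demo with two throwaway files.
with tempfile.TemporaryDirectory() as tmp:
    fake_bam = os.path.join(tmp, "fake.bam")
    with gzip.open(fake_bam, "wb") as fh:
        fh.write(BAM_MAGIC + b"\x00\x00\x00\x00")  # magic + zero-length header text
    sam = os.path.join(tmp, "reads.sam")
    with open(sam, "w") as fh:
        fh.write("@HD\tVN:1.0\tSO:coordinate\n")
    demo = (looks_like_bam(fake_bam), looks_like_bam(sam))

print(demo)
```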


                • Charitra
                  Member
                  • Feb 2013
                  • 57

                  #38
                  Dear nsl,
                  Thank you so much for your expert comments. I got the point, and thank you again for your help.
                  I would like to give more detail on your comments 3 and 4:
                  3. My first two samples (1 and 2) belong to the sensitive group, so I merged them. Samples 3, 4 and 5 belong to the resistant group, so I merged them. Now I have two conditions, Sensitive vs Resistant. I then ran cuffdiff and got 93 differentially expressed genes. My questions are:
                  a) Sensitive and Resistant have 2 and 3 replicates, respectively. Is that true in this case?
                  b) If so (2 replicates in Sensitive and 3 in Resistant), should I specify the number of replicates when running cuffdiff/cuffmerge, since (as you may remember) it was switching to the blind method?
                  c) Does cuffmerge/cuffdiff detect replicates automatically (and otherwise switch to blind: "Warning: No conditions are replicated, switching to 'blind' dispersion method"), or must the number of replicates be given on the command line?
                  4. In the attachment, XLOC_006036 is a cuffdiff ID, because cuffdiff does not give the gene name. So it is a single gene, CYP2C9, with cuffdiff ID XLOC_006036. In your view and experience, what FPKM value would you consider high enough to count as differential expression, and what is too low?

                  The most important question for me: I think there are not enough replicates, since there should be at least 3, and the experiments are already done. Is there any way to get something significant out of these data? What would you recommend?

                  Thank you in advance.



                  • jp.
                    Senior Member
                    • Jul 2013
                    • 142

                    #39
                    Could somebody please answer my question?
                    My RNA-seq (PE) was conducted on 2 samples (antibiotic-resistant and sensitive) without any replication.
                    Is it possible to publish the differentially expressed genes and splicing results in a journal? Most researchers said it is not possible.
                    I want an answer from this forum. What do you think I should do?
                    Many thanks


                    • nsl
                      Member
                      • Jan 2011
                      • 28

                      #40
                      jp,

                      I'm afraid that is a fact: without replication, your data would not support a stand-alone publication.


                      • jp.
                        Senior Member
                        • Jul 2013
                        • 142

                        #41
                        One more thing:
                        what if I add duplicates (one more biological replicate sequenced for each of the two samples)? Would duplicates be okay as a minimum?



                        • nsl
                          Member
                          • Jan 2011
                          • 28

                          #42
                          jp,

                          I've been dealing with NGS data for a short 3 years and am not an expert. I started in 2010 with 1 replicate and, after being exposed to SEQanswers and other bioinformatics communities, realized the folly of my ways... we would never rely on no replication for bench work, and the same goes for this stuff. I went on to have 4 replicates each, and did one set at a different time. I see quite a bit of variation in the samples that I did 6 months later. However, I am dealing with a very dynamic stage in development, and the variation may reflect the actual biology. Long story short... 2 reps are better than 1, but also be mindful of the biology you are going after. Cells, tissues and developmental stages can all show true variation at the RNA level, and the last thing you want is false positives due to library prep and sample handling.


                          • jp.
                            Senior Member
                            • Jul 2013
                            • 142

                            #43
                            Dear nsl,
                            Thank you for your valuable advice. Your knowledge and experience are much greater than mine, and I really appreciate your help. It would be very kind of you to answer a few more of my questions below:
                            What is your opinion:
                            1. Which library size is better for a human sample to study differential expression, transcript discovery and splicing with Illumina PE sequencing (150 bp or 50 bp, longer or shorter), or something else?
                            2. What about single-cell sequencing?
                            3. If single-cell sequencing is better, can it be done on the same sequencer (PE Illumina 2000/2500)?
                            4. If possible, please write a little about the differences in procedure between single-cell and normal PE sequencing (just a few points would be fine).
                            5. May I have your contact number so that I can call you at a pre-arranged time? My e-mail address is (med dot rdgmc at g mail dot com).
                            I have read a lot but always get confused; your opinion will help me a lot.
                            My English is not good enough... sorry.

                            Thanks in advance


