  • Just add luciferase to your reference genome. You could also just align the unmapped reads to the luciferase sequence, but that might have decreased accuracy (in practice, the accuracy change is likely minor).



    • Originally posted by dpryan
      Just add luciferase to your reference genome. You could also just align the unmapped reads to the luciferase sequence, but that might have decreased accuracy (in practice, the accuracy change is likely minor).
      Thank you D!
      My questions:

      Q1: How do I do that? So far I only have the luciferase coding sequence in FASTA format. The species is human.

      Q2: So far I have used Tophat with default parameters (on the human genome), but I know that changing the parameters might give different results.
      Q2.1: Besides the obvious ones, e.g. single-end or paired-end reads or library-type (unstranded, firststrand), which parameters are the most important to set? For example, the mapping options include min/max intron length, max multihits, and read mismatches.
      Q2.2: How do I know which parameters to use? Does it depend on the species (e.g. bacteria don't have introns), or on something else?
      Last edited by super0925; 07-03-2014, 01:42 AM.



      • 1. If the luciferase is integrated into the genome then just try to match however that was done. If it's in a plasmid, just create the plasmid sequence and add that to the reference.

        2.1. The defaults are usually acceptable. Have a look at the alignments and alignment statistics and if they seem unacceptable then try to determine why and what parameters might remedy things. There's no boiler-plate solution that can be given for this.

        2.2. If you're doing RNAseq in bacteria then just use bowtie2. Tophat is only useful when there's splicing.



        • Originally posted by dpryan
          1. If the luciferase is integrated into the genome then just try to match however that was done. If it's in a plasmid, just create the plasmid sequence and add that to the reference.

          2.1. The defaults are usually acceptable. Have a look at the alignments and alignment statistics and if they seem unacceptable then try to determine why and what parameters might remedy things. There's no boiler-plate solution that can be given for this.

          2.2. If you're doing RNAseq in bacteria then just use bowtie2. Tophat is only useful when there's splicing.
          Q1:
          I am sorry, I don't fully understand what you mean.
          I have an independent FASTA file like
          >Luciferase
          ATGGAAGACGCCAAAAACATAAAGAAAGGCCCGGCGCCATTCTATCCGCTGGAAGATGGA
          Do you mean I should add this to the human genome (e.g. genome.fa, which has >chr1, >chr2, ...)?
          What about the annotation file (i.e. the .gtf file) and the bowtie2 index?

          Q2.1:
          Thank you I got it.

          Q2.2:
          What about viruses?



          • Originally posted by super0925
            Q1:
            I am sorry, I don't fully understand what you mean.
            Ask a local biologist; it'll be quicker to explain with a quick little drawing on a whiteboard.

            I have an independent FASTA file like
            >Luciferase
            ATGGAAGACGCCAAAAACATAAAGAAAGGCCCGGCGCCATTCTATCCGCTGGAAGATGGA
            Do you mean I should add this to the human genome (e.g. genome.fa, which has >chr1, >chr2, ...)?
            That's how you'd add it if it's coming from a plasmid (though you should add the sequence of the entire construct), yes. You would need to redo the index. You can just add the appropriate lines to your annotation (again, ask a local biologist to help with this if it's unclear what's actually important).

            Q2.2:
            What about viruses?
            If the virus is infecting a eukaryote, then the host transcriptome would be spliced anyway, so tophat would make sense there (even though the virus is rather unlikely to produce any spliced reads). If it's infecting a prokaryote, then bowtie2 would make more sense.
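
The reference-building step described in this reply can be sketched in Python. This is only an illustration under stated assumptions: the function names (`append_construct`, `gtf_exon_line`), the "custom" source label, and the choice of a single exon feature are hypothetical and do not come from the thread. The sketch appends one FASTA record to existing genome text and builds one nine-column GTF line for it.

```python
def append_construct(genome_fasta: str, name: str, sequence: str, width: int = 60) -> str:
    """Return the genome FASTA text with one extra record appended.

    `name` becomes the new reference name (like chr1, chr2, ...).
    """
    # Drop whitespace/newlines from the pasted sequence and normalise case.
    seq = "".join(sequence.split()).upper()
    # Wrap the sequence to fixed-width lines, as FASTA files usually are.
    wrapped = [seq[i:i + width] for i in range(0, len(seq), width)]
    body = genome_fasta
    if body and not body.endswith("\n"):
        body += "\n"
    return body + f">{name}\n" + "\n".join(wrapped) + "\n"


def gtf_exon_line(seqname: str, start: int, end: int, gene_id: str,
                  transcript_id: str, strand: str = "+", source: str = "custom") -> str:
    """Build one tab-separated GTF exon line (1-based, inclusive coordinates).

    Assumption: the transgene is annotated as a single unspliced exon.
    """
    attributes = f'gene_id "{gene_id}"; transcript_id "{transcript_id}";'
    columns = [seqname, source, "exon", str(start), str(end), ".", strand, ".", attributes]
    return "\t".join(columns)


if __name__ == "__main__":
    genome = ">chr1\nACGTACGT\n"  # stand-in for the real genome.fa contents
    print(append_construct(genome, "Luciferase", "ATGGAAGACGCCAAAAACATAAAG"), end="")
    print(gtf_exon_line("Luciferase", 1, 24, "luc", "luc.1"))
```

After the combined FASTA is written to disk, the aligner index has to be rebuilt from it (for bowtie2, with `bowtie2-build`), and the new line appended to the existing .gtf file. The real exon coordinates depend on where the coding sequence sits inside the full construct.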



            • Originally posted by dpryan
              Ask a local biologist; it'll be quicker to explain with a quick little drawing on a whiteboard.


              That's how you'd add it if it's coming from a plasmid (though you should add the sequence of the entire construct), yes. You would need to redo the index. You can just add the appropriate lines to your annotation (again, ask a local biologist to help with this if it's unclear what's actually important).



              If the virus is infecting a eukaryote, then the host transcriptome would be spliced anyway, so tophat would make sense there (even though the virus is rather unlikely to produce any spliced reads). If it's infecting a prokaryote, then bowtie2 would make more sense.


              Hi D,
              Another question:
              When I use DESeq2 on one of my datasets, the PCA plot and heatmap look like those in the attached figure.
              As you can see, I have 6 samples, but condition 1 (C1) and condition 2 (C2) are not clearly separated. So what could I do? Remove the outlier sample?
              Cheers
              Attached Files



              • Given how the math works for PCA, I wouldn't expect conditions to always be nicely separated. It's always best to be very careful when excluding a sample. While one of the C2 samples clusters alone, it doesn't appear to be an outlier. Presuming you have additional samples that you plan to use for validation, you can always get results with and without the possible outlier sample then see which validates better (just choose a few non-overlapping hits from each result set).
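
The with/without comparison suggested here can be sketched in Python. This is only an illustration: `non_overlapping_hits` and the gene names are hypothetical, and it assumes each result set is already a list of gene identifiers ordered by significance.

```python
def non_overlapping_hits(hits_with, hits_without, n=3):
    """Pick up to n hits unique to each analysis, keeping the input order.

    hits_with:    significant genes when the suspect sample is included.
    hits_without: significant genes when the suspect sample is excluded.
    """
    with_set, without_set = set(hits_with), set(hits_without)
    only_with = [gene for gene in hits_with if gene not in without_set][:n]
    only_without = [gene for gene in hits_without if gene not in with_set][:n]
    return only_with, only_without


if __name__ == "__main__":
    # Hypothetical gene identifiers, for illustration only.
    with_sample = ["geneA", "geneB", "geneC", "geneD"]
    without_sample = ["geneB", "geneE", "geneA", "geneF"]
    print(non_overlapping_hits(with_sample, without_sample, n=2))
```

The genes returned for each analysis are the candidates to test in the independent validation samples; whichever set validates better indicates whether keeping or dropping the sample gave the more reliable result.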



                • Originally posted by dpryan
                  Given how the math works for PCA, I wouldn't expect conditions to always be nicely separated. It's always best to be very careful when excluding a sample. While one of the C2 samples clusters alone, it doesn't appear to be an outlier. Presuming you have additional samples that you plan to use for validation, you can always get results with and without the possible outlier sample then see which validates better (just choose a few non-overlapping hits from each result set).
                  How can I tell which sample is the outlier? The heatmap and PCA plot don't show any sample labels. Thx!



                  • @super0925

                    FactoMineR is far superior to DESeq2's plotPCA() function.
                    Look how much more informative the attached plot, generated with FactoMineR, is than the plot generated with DESeq2's plotPCA() function.
                    Among other advantages, the samples are clearly labeled.

                    You can easily identify the samples in the heatmap though, as illustrated in the attached example. You need to check your R code.
                    Attached Files
                    Last edited by blancha; 07-14-2014, 10:19 AM.



                    • Originally posted by blancha
                      @super0925

                      FactoMineR is far superior to DESeq2's plotPCA() function.
                      Look how much more informative the attached plot, generated with FactoMineR, is than the plot generated with DESeq2's plotPCA() function.
                      Among other advantages, the samples are clearly labeled.

                      You can easily identify the samples in the heatmap though, as illustrated in the attached example. You need to check your R code.

                      Thank you, that is what I wanted.
                      I have two questions about the FactoMineR function:
                      I found that the PCA function has quanti.sup and quali.sup parameters. What are these?
                      Could you please give me some suggestions or an example command? Thank you!
                      Last edited by super0925; 07-15-2014, 03:00 AM.



                      • As blancha said, you don't have to use DESeq2's plotting functions. In fact, they're pretty simple to just modify to include sample labels (I've modified them previously to include various batch effects without much effort).



                        • heatmap.2 is actually not a function of FactoMineR or DESeq2, but a function of the package gplots.
                          You just need to set the column names of the matrix passed to heatmap.2 to the sample names for the sample labels to appear on the plot.

                          Here is the R code to make a PCA plot with FactoMineR.
                          You don't need to use quanti.sup or quali.sup.

                          Code:
                          library(DESeq2)
                          library(FactoMineR)
                          library(RColorBrewer)
                          
                          # dds is the DESeqDataSet object created with DESeq2.
                          # rlog: "regularized" log transformation of the counts.
                          rld <- rlog(dds)
                          
                          # Transpose the rlog-transformed matrix.
                          # FactoMineR expects samples in rows and genes in columns, so remember to pass it the transpose.
                          assay.rld.t <- t(assay(rld))
                          
                          ##################
                          # FactoMineR PCA #
                          ##################
                          pca <- PCA(assay.rld.t, graph=FALSE) 
                          
                          # Colors. You'll have to adjust this to the number of conditions and replicates in your experiment.
                          # I highly recommend using the brewer palettes.
                          colors.brewer <- brewer.pal(n=4, name="Set1")
                          colors <- c(rep(colors.brewer[1], 3),
                                      rep(colors.brewer[2], 3),
                                      rep(colors.brewer[3], 3),
                                      rep(colors.brewer[4], 3))
                          
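                          # FactoMineR PCA plot, with one color per sample.
                          # outputDirectoryPlots must already hold the path of the directory where the PDF is saved.
                          pdf(file.path(outputDirectoryPlots, "PCA_with_colors.pdf"))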
                          plot.PCA(pca, habillage="ind", col.hab=colors)
                          dev.off()



                          • Originally posted by dpryan
                            As blancha said, you don't have to use DESeq2's plotting functions. In fact, they're pretty simple to just modify to include sample labels (I've modified them previously to include various batch effects without much effort).

                            Hi D
                            I have two samples (i.e. two .fastq files) from bovine. I want to see whether a sequence (let's call it "GQ2") is expressed in bovine cells (I have the FASTA of this sequence). The sequence is unannotated in the current bovine genome, so it might not have been tested in the analyses thus far.
                            Q1: How could I do it?
                            Q2: My solution (I don't know whether it is correct):
                            First I generate the bowtie/bowtie2 index of "GQ2" from the FASTA sequence, and then map my fastq files to that "GQ2" reference. Is that correct?
                            Thank you!



                            • Originally posted by super0925
                              Hi D
                              I have two samples (i.e. two .fastq files) from bovine. I want to see whether a sequence (let's call it "GQ2") is expressed in bovine cells (I have the FASTA of this sequence). The sequence is unannotated in the current bovine genome, so it might not have been tested in the analyses thus far.
                              We should set up a consultation contract

                              Q1: How could I do it?
                              If you just want a quick and dirty check then just mapping reads to that gene should suffice. Just tweak your settings to only permit perfect or near perfect matches.

                              Q2: My solution (I don't know whether it is correct):
                              First I generate the bowtie/bowtie2 index of "GQ2" from the FASTA sequence, and then map my fastq files to that "GQ2" reference. Is that correct?
                              Thank you!
                              That should suffice. If you need to know exact read numbers or you need the alignments for SNP calling, then this method isn't ideal. In those cases, you would really need to map to the entire genome so as to not bias alignments (this is also why I suggested only accepting near-perfect matches above).
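
The "near-perfect matches only" check can also be applied after alignment. Below is a minimal Python sketch, not a definitive method: `count_near_perfect` is a hypothetical helper, and it assumes the aligner writes the standard SAM `NM:i:` edit-distance tag (bowtie2 does) and that the reference is named "GQ2" as in the question.

```python
def count_near_perfect(sam_lines, target, max_edits=1):
    """Count primary alignments to `target` with edit distance <= max_edits.

    sam_lines: iterable of SAM text lines (header lines are skipped).
    Records without an NM tag are ignored rather than guessed at.
    """
    count = 0
    for line in sam_lines:
        if line.startswith("@"):          # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:              # not a complete alignment record
            continue
        flag = int(fields[1])
        if flag & 0x4:                    # read is unmapped
            continue
        if flag & 0x900:                  # secondary or supplementary alignment
            continue
        if fields[2] != target:           # aligned to another reference
            continue
        # Optional fields start at column 12; NM:i: is the edit distance.
        nm = next((int(f[5:]) for f in fields[11:] if f.startswith("NM:i:")), None)
        if nm is not None and nm <= max_edits:
            count += 1
    return count


if __name__ == "__main__":
    # Tiny hand-written SAM records, for illustration only.
    example = [
        "@SQ\tSN:GQ2\tLN:100",
        "r1\t0\tGQ2\t1\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tNM:i:0",
        "r2\t0\tGQ2\t5\t42\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\tNM:i:3",
    ]
    print(count_near_perfect(example, "GQ2"))
```

For paired-end data each mate is a separate record, so mates are counted individually. As the reply notes, counts obtained this way are a quick check only; exact read numbers need alignment against the entire genome.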



                              • Originally posted by dpryan
                                We should set up a consultation contract



                                If you just want a quick and dirty check then just mapping reads to that gene should suffice. Just tweak your settings to only permit perfect or near perfect matches.



                                That should suffice. If you need to know exact read numbers or you need the alignments for SNP calling, then this method isn't ideal. In those cases, you would really need to map to the entire genome so as to not bias alignments (this is also why I suggested only accepting near-perfect matches above).

                                Thank you soooo much! You are not only my consultant, but also my teacher
                                I will try to do it.
