  • super0925
    replied
    Originally posted by dpryan
    As blancha said, you don't have to use DESeq2's plotting functions. In fact, they're pretty simple to just modify to include sample labels (I've modified them previously to include various batch effects without much effort).

    Hi D,
    I have two samples (i.e. two .fastq files) from bovine cells, and I want to see whether a sequence (call it "GQ2") is expressed in them. I have the FASTA of this sequence. The sequence is unannotated in the current bovine genome, so it might not have been tested in the analyses so far.
    Q1: How could I do this?
    Q2: My proposed solution (I don't know whether it is correct):
    First I generate a bowtie/bowtie2 index of "GQ2" from its FASTA sequence, and then I map my fastq files to that "GQ2" reference. Is that correct?
    Thank you!
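
The workflow proposed in Q2 could be sketched as follows. This is only a minimal sketch, not a verified pipeline: the file names `GQ2.fa`, `sample1.fastq`, `sample1_GQ2.sam` and the index prefix `GQ2_index` are hypothetical, single-end reads are assumed, and the commands are only run if bowtie2 and the input files are found. As noted elsewhere in this thread, aligning only to the extra sequence may be somewhat less accurate than adding it to the full reference.

```r
# Hypothetical file names; adjust to your own data.
fasta  <- "GQ2.fa"           # FASTA of the sequence of interest
prefix <- "GQ2_index"        # prefix for the bowtie2 index files
reads  <- "sample1.fastq"    # single-end reads (assumption)
sam    <- "sample1_GQ2.sam"  # alignment output

# Arguments for: bowtie2-build GQ2.fa GQ2_index
build.args <- c(fasta, prefix)
# Arguments for: bowtie2 -x GQ2_index -U sample1.fastq -S sample1_GQ2.sam
align.args <- c("-x", prefix, "-U", reads, "-S", sam)

# Only run the commands if bowtie2 is installed and the inputs exist.
tools.found  <- nzchar(Sys.which("bowtie2")) && nzchar(Sys.which("bowtie2-build"))
inputs.found <- file.exists(fasta) && file.exists(reads)
if (tools.found && inputs.found) {
  system2("bowtie2-build", build.args)
  system2("bowtie2", align.args)
} else {
  message("bowtie2 or input files not found; commands were not run.")
}
```

bowtie2 prints an alignment summary when it finishes, which gives a first idea of how many reads hit the sequence.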

  • blancha
    replied
    heatmap.2 is actually not a function of FactoMineR or DESeq2, but a function of the package gplots.
    You just need to set the column names of the matrix passed to heatmap.2 to the sample names for the sample labels to appear on the plot.
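
The labelling tip can be sketched as follows. This is an illustration on toy data, not the code used for the attached plot: the matrix, the gene names, and the sample names are made up, and the heatmap is only drawn if gplots is installed.

```r
# Toy matrix standing in for transformed expression values (genes x samples).
set.seed(1)
mat <- matrix(rnorm(24), nrow = 4, ncol = 6)

# Hypothetical sample names, in the same order as the matrix columns.
sample.names <- c("C1_rep1", "C1_rep2", "C1_rep3",
                  "C2_rep1", "C2_rep2", "C2_rep3")
colnames(mat) <- sample.names
rownames(mat) <- paste0("gene", 1:4)

# heatmap.2 uses the column names as column labels by default.
if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(mat, trace = "none")
}
```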

    Here is the R code to make a PCA plot with FactoMineR.
    You don't need to use quanti.sup or quali.sup.

    Code:
    library(DESeq2)
    library(FactoMineR)
    library(RColorBrewer)
    
    # dds is the DESeqDataSet object created with DESeq2.
    # rlog: "Regularized" log transformation
    rld <- rlog(dds)
    
    # Transpose of the matrix of rlog-transformed values.
    # FactoMineR expects samples in rows, so remember to pass it the transpose.
    assay.rld.t <- t(assay(rld))
    
    ##################
    # FactoMineR PCA #
    ##################
    pca <- PCA(assay.rld.t, graph=FALSE) 
    
    # Colors. You'll have to adjust this to the number of conditions and replicates in your experiment.
    # I highly recommend using the brewer palettes.
    colors.brewer <- brewer.pal(n=4, name="Set1")
    colors <- c(rep(colors.brewer[1], 3),
                rep(colors.brewer[2], 3),
                rep(colors.brewer[3], 3),
                rep(colors.brewer[4], 3))
    
    # FactoMineR PCA plot
    # outputDirectoryPlots is assumed to hold the path to an existing output directory.
    pdf(file.path(outputDirectoryPlots, "PCA_with_colors.pdf"))
    plot.PCA(pca, habillage="ind", col.hab=colors)
    dev.off()
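
On the quanti.sup and quali.sup question: in FactoMineR's PCA these take the indices of supplementary quantitative and categorical columns, which are projected onto the PCA but not used to compute the axes. A minimal sketch on toy data follows; the `condition` column and all sample and gene names are hypothetical, and the PCA is only run if FactoMineR is installed.

```r
# Toy data: 6 samples (rows) x 5 genes (columns), standing in for t(assay(rld)).
set.seed(1)
expr <- matrix(rnorm(30), nrow = 6,
               dimnames = list(paste0("sample", 1:6), paste0("gene", 1:5)))

# Add the experimental condition as an extra, categorical column.
pca.input <- data.frame(expr,
                        condition = factor(rep(c("C1", "C2"), each = 3)))
condition.col <- ncol(pca.input)  # index of the supplementary column

if (requireNamespace("FactoMineR", quietly = TRUE)) {
  # quali.sup: the condition is kept out of the PCA computation itself.
  pca.sup <- FactoMineR::PCA(pca.input, quali.sup = condition.col, graph = FALSE)
  # habillage: colour the individuals by the supplementary variable.
  FactoMineR::plot.PCA(pca.sup, habillage = condition.col)
}
```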

  • dpryan
    replied
    As blancha said, you don't have to use DESeq2's plotting functions. In fact, they're pretty simple to just modify to include sample labels (I've modified them previously to include various batch effects without much effort).

  • super0925
    replied
    Originally posted by blancha
    @super0925

    FactoMineR is far superior to DESeq2's plotPCA() function.
    Look how much more informative the plot in attachment, generated with FactoMineR, is than the plot generated with DESeq2's plotPCA() function.
    Amongst other advantages, the samples are clearly labeled.

    You can easily identify the samples in the heatmap though, as illustrated in the attached example. You need to check your R code.

    Thank you, it is what I want.
    I have 2 questions about the PCA function in FactoMineR:
    I found that it has quanti.sup and quali.sup parameters. What are these?
    Could you please give me some suggestions or an example command? Thank you!
    Last edited by super0925; 07-15-2014, 03:00 AM.

  • blancha
    replied
    @super0925

    FactoMineR is far superior to DESeq2's plotPCA() function.
    Look how much more informative the plot in attachment, generated with FactoMineR, is than the plot generated with DESeq2's plotPCA() function.
    Amongst other advantages, the samples are clearly labeled.

    You can easily identify the samples in the heatmap though, as illustrated in the attached example. You need to check your R code.
    Last edited by blancha; 07-14-2014, 10:19 AM.

  • super0925
    replied
    Originally posted by dpryan
    Given how the math works for PCA, I wouldn't expect conditions to always be nicely separated. It's always best to be very careful when excluding a sample. While one of the C2 samples clusters alone, it doesn't appear to be an outlier. Presuming you have additional samples that you plan to use for validation, you can always get results with and without the possible outlier sample then see which validates better (just choose a few non-overlapping hits from each result set).
    How can I tell which sample is the outlier? The heatmap and PCA plot don't show any sample names or labels. Thanks!

  • dpryan
    replied
    Given how the math works for PCA, I wouldn't expect conditions to always be nicely separated. It's always best to be very careful when excluding a sample. While one of the C2 samples clusters alone, it doesn't appear to be an outlier. Presuming you have additional samples that you plan to use for validation, you can always get results with and without the possible outlier sample then see which validates better (just choose a few non-overlapping hits from each result set).
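
Picking non-overlapping hits from the two result sets could be sketched like this. The gene IDs are made up for illustration; in practice they would be the significant genes from a DESeq2 run on all samples and from a run without the suspect sample.

```r
# Hypothetical significant gene IDs from the two analyses.
hits.all     <- c("geneA", "geneB", "geneC", "geneD")  # all samples
hits.no.outl <- c("geneB", "geneC", "geneE", "geneF")  # suspect sample removed

# Hits unique to each analysis, and hits found by both.
only.all     <- setdiff(hits.all, hits.no.outl)
only.no.outl <- setdiff(hits.no.outl, hits.all)
shared       <- intersect(hits.all, hits.no.outl)

# Take up to n validation candidates from each non-overlapping set.
n <- 2
candidates <- list(with_all_samples = head(only.all, n),
                   without_suspect  = head(only.no.outl, n))
print(candidates)
```

Whichever set of candidates validates better points to the more trustworthy analysis.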

  • super0925
    replied
    Originally posted by dpryan
    Ask a local biologist, it'll be quicker to explain with a quick little drawing on a white board.


    That's how you'd add it if it's coming from a plasmid (though you should add the sequence of the entire construct), yes. You would need to redo the index. You can just add the appropriate lines to your annotation (again, ask a local biologist to help with this if it's unclear what's actually important).



    If the virus is infecting a eukaryote, then the host transcriptome would be spliced anyway, so tophat would make sense there (even though the virus is rather unlikely to produce any spliced reads). If it's infecting a prokaryote, then bowtie2 would make more sense.


    Hi D,
    Another question.
    When I run DESeq2 on one of my data sets, the PCA plot and heatmap look like those in the attached figure.
    As you can see, I have 6 samples, but condition 1 (C1) and condition 2 (C2) are not clearly separated. What could I do? Remove the outlier sample?
    Cheers

  • dpryan
    replied
    Originally posted by super0925
    Q1:
    I am sorry, I don't totally get what you mean.
    Ask a local biologist, it'll be quicker to explain with a quick little drawing on a white board.

    I have an independent fasta file like
    >Luciferase
    ATGGAAGACGCCAAAAACATAAAGAAAGGCCCGGCGCCATTCTATCCGCTGGAAGATGGA
    Do you mean add this to the human genome? (e.g. genome.fa which has >chr1,>chr2,....)
    That's how you'd add it if it's coming from a plasmid (though you should add the sequence of the entire construct), yes. You would need to redo the index. You can just add the appropriate lines to your annotation (again, ask a local biologist to help with this if it's unclear what's actually important).

    Q2.2:
    How about virus?
    If the virus is infecting a eukaryote, then the host transcriptome would be spliced anyway, so tophat would make sense there (even though the virus is rather unlikely to produce any spliced reads). If it's infecting a prokaryote, then bowtie2 would make more sense.
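
Adding a construct to the reference and to the annotation could look roughly like the sketch below. It works on toy files in a temporary directory; the file names, the "custom" source label, and the single-exon GTF entry are assumptions, and the sequence is only the fragment posted in this thread (the entire construct should be used in practice).

```r
# Work in a temporary directory so no real files are touched.
work.dir  <- tempdir()
genome.fa <- file.path(work.dir, "genome.fa")  # hypothetical reference FASTA
genes.gtf <- file.path(work.dir, "genes.gtf")  # hypothetical annotation

# Toy stand-ins for the real reference files.
writeLines(c(">chr1", "ACGTACGTACGT"), genome.fa)
writeLines('chr1\ttoy\texon\t1\t12\t.\t+\t.\tgene_id "toyGene"; transcript_id "toyGene";',
           genes.gtf)

construct.name <- "Luciferase"
construct.seq  <- "ATGGAAGACGCCAAAAACATAAAGAAAGGCCCGGCGCCATTCTATCCGCTGGAAGATGGA"

# 1. Append the construct to the FASTA as an extra "chromosome".
write(c(paste0(">", construct.name), construct.seq), file = genome.fa, append = TRUE)

# 2. Append a matching GTF line (9 tab-separated fields) spanning the whole sequence.
gtf.fields <- c(construct.name, "custom", "exon", 1, nchar(construct.seq), ".", "+", ".",
                sprintf('gene_id "%s"; transcript_id "%s";', construct.name, construct.name))
gtf.line <- paste(gtf.fields, collapse = "\t")
write(gtf.line, file = genes.gtf, append = TRUE)
```

After this step the aligner index has to be rebuilt from the extended FASTA, as mentioned above.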

  • super0925
    replied
    Originally posted by dpryan
    1. If the luciferase is integrated into the genome then just try to match however that was done. If it's in a plasmid, just create the plasmid sequence and add that to the reference.

    2.1. The defaults are usually acceptable. Have a look at the alignments and alignment statistics and if they seem unacceptable then try to determine why and what parameters might remedy things. There's no boiler-plate solution that can be given for this.

    2.2. If you're doing RNAseq in bacteria then just use bowtie2. Tophat is only useful when there's splicing.
    Q1:
    I am sorry, I don't totally get what you mean.
    I have an independent fasta file like
    >Luciferase
    ATGGAAGACGCCAAAAACATAAAGAAAGGCCCGGCGCCATTCTATCCGCTGGAAGATGGA
    Do you mean add this to the human genome? (e.g. genome.fa, which has >chr1, >chr2, ...)
    What about the annotation file (i.e. the .gtf file) and the bowtie2 index?

    Q2.1:
    Thank you I got it.

    Q2.2:
    How about virus?

  • dpryan
    replied
    1. If the luciferase is integrated into the genome then just try to match however that was done. If it's in a plasmid, just create the plasmid sequence and add that to the reference.

    2.1. The defaults are usually acceptable. Have a look at the alignments and alignment statistics and if they seem unacceptable then try to determine why and what parameters might remedy things. There's no boiler-plate solution that can be given for this.

    2.2. If you're doing RNAseq in bacteria then just use bowtie2. Tophat is only useful when there's splicing.

  • super0925
    replied
    Originally posted by dpryan
    Just add luciferase to your reference genome. You could also just align the unmapped reads to the luciferase sequence, but that might have decreased accuracy (in practice, the accuracy change is likely minor).
    Thank you D!
    My questions:

    Q1: How do I do that? So far I only have the luciferase coding sequence in FASTA format. The species is human.

    Q2: So far I have run Tophat with default parameters (on the human genome), but I know that changing the parameters might give different results.
    Q2.1: Besides the obvious parameters, e.g. single-end or paired-end reads or library-type (unstranded, firststrand), which parameters are the most important to set?
    For example, the mapping parameters include min/max intron length, max multihits, and read mismatches.
    Q2.2: How do I know which parameter values to use? From the species (e.g. bacteria don't have introns)? Or something else?
    Last edited by super0925; 07-03-2014, 01:42 AM.

  • dpryan
    replied
    Just add luciferase to your reference genome. You could also just align the unmapped reads to the luciferase sequence, but that might have decreased accuracy (in practice, the accuracy change is likely minor).

  • super0925
    replied
    Originally posted by dpryan
    In general, you'll need an annotation file with rRNA, tRNA, etc. in there. Then you can count according to that. I believe that both htseq-count and featureCounts should allow that. For htseq-count, you'd probably need to change the -t option, I don't recall what the equivalent is in featureCounts.
    Hi, another quick question: suppose my library contains 300 human genes and luciferase mRNA. Could I check the expression level of this luciferase? Thank you!

  • dpryan
    replied
    In general, you'll need an annotation file with rRNA, tRNA, etc. in there. Then you can count according to that. I believe that both htseq-count and featureCounts should allow that. For htseq-count, you'd probably need to change the -t option, I don't recall what the equivalent is in featureCounts.
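
Changing the htseq-count feature type could be sketched as below. The file names are hypothetical, and the value "rRNA" assumes that the annotation uses exactly that string in its feature-type column, which has to be checked against the actual file. The command is only run if htseq-count and the inputs are found; otherwise it is just printed.

```r
# Hypothetical inputs and output; adjust to your own files.
alignments   <- "aligned.sam"
annotation   <- "annotation.gtf"
counts.file  <- "rRNA_counts.txt"
feature.type <- "rRNA"  # assumption: feature type (3rd column) used in the annotation

# -f: input format, -t: feature type to count, -i: attribute used as feature ID.
htseq.args <- c("-f", "sam", "-t", feature.type, "-i", "gene_id",
                alignments, annotation)

if (nzchar(Sys.which("htseq-count")) && file.exists(alignments) && file.exists(annotation)) {
  # htseq-count writes the count table to standard output.
  system2("htseq-count", htseq.args, stdout = counts.file)
} else {
  message("Not run: htseq-count ", paste(htseq.args, collapse = " "))
}
```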
