  • mgolo
    replied
    Thanks for your reply Xi

    I'll try all the methods when I have my annotation file. But what are the criteria for deciding which one is best?

    Looking forward to your new version of DEGseq!


  • Xi Wang
    replied
    Hi Maria

    1&2. The choice of DEG detection method, and of the normalization applied beforehand, should depend on how your data are distributed. You may try all of them and choose the best one.

    3. For biological replicates, it's better not to pool them together.

    4. Raw read counts have nothing to do with gene annotation. In our documents, the opposite of 'raw read counts' is RPKM values. For the unannotated non-coding RNAs, you'd better analyze the gene structure first and then the DEGs.


    Btw, we are working on a new version of DEGseq, which will be more suitable for biological replicates.
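
To illustrate the distinction in point 4: RPKM rescales a raw read count by the feature length (in kilobases) and the number of mapped reads (in millions). A minimal R sketch of the standard RPKM formula follows. The function name `rpkm`, the feature names and all numbers are made up for illustration and are not taken from DEGseq.

```r
# RPKM = reads per kilobase of feature per million mapped reads.
# Hypothetical helper for illustration; not a DEGseq function.
rpkm <- function(counts, lengths_bp, total_mapped = sum(counts)) {
  # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
  counts * 1e9 / (lengths_bp * total_mapped)
}

# Made-up raw counts and feature lengths for three features.
counts     <- c(sRNA_1 = 150, sRNA_2 = 30, geneX = 820)
lengths_bp <- c(sRNA_1 = 120, sRNA_2 = 85, geneX = 1500)

# Assume one million mapped reads in this (hypothetical) library.
rpkm_values <- rpkm(counts, lengths_bp, total_mapped = 1e6)
print(round(rpkm_values, 2))
```

The short features end up with high RPKM values despite modest counts, which is the length effect that a raw count alone does not show.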


  • mgolo
    replied
    DEGseq and expression of novel small RNAs

    Hi all!

    I'm new to the NGS business, and right now I have a lot of doubts about DE analysis.

    I have RNA-sequenced a bacterial transcriptome in 2 growth conditions, and I have 3 biological replicates for each condition:

    Condition A : Replicate 1A, Replicate 2A, Replicate 3A
    Condition B : Replicate 1B, Replicate 2B, Replicate 3B

    I have the BAM and pileup files for each replicate.

    Now, my aim is to compare the expression of non-annotated non-coding RNAs between conditions A and B (so I will use a custom annotation file).

    I have read about DEGseq and I would like to use it for my DE analysis. But I have a number of questions about it:

    1. What method would suit my analysis best? I have thought of using MARS...

    2. How do I normalize my replicates? Should I use loess or median? What's the difference between them?

    3. What is better: to pool the 3 replicates of each condition or to analyze DE without pooling them?

    4. Since my transcripts are not annotated I will have to use expression values based on raw read counts, right? Can I use the rawCount argument with the DEGseq function, or is it only valid with the DEGexp function? If I use the MARS method, is it automatically set to analyze raw counts?

    Thanks in advance for your help!

    Maria


  • Xi Wang
    replied
    Hi,

    Please show me the R script you use to run DEGseq. You can email me at [email protected] if you don't want to post the details here.

    Thanks.
    Last edited by Xi Wang; 12-11-2010, 10:12 AM.


  • newbietonextgen
    replied
    Hi Xi,

    I finally figured out what the problem was with DEGseq execution. The R installation on Mac does not come with the Tcl/Tk libraries. Once I downloaded them, it ran fine, as far as loading all the needed libraries goes.

    > library(DEGseq)
    Loading required package: qvalue
    Loading Tcl/Tk interface ... done
    Loading required package: ShortRead
    Loading required package: IRanges

    Attaching package: 'IRanges'

    The following object(s) are masked from 'package:base':

    cbind, eval, Map, mapply, order, paste, pmax, pmax.int, pmin, pmin.int,
    rbind, rep.int, table

    Loading required package: GenomicRanges
    Loading required package: Biostrings
    Loading required package: lattice
    Loading required package: Rsamtools
    Loading required package: samr
    Loading required package: impute

    Now I run into another problem. Please read the output below. First, the mapResultBatch lines don't show any path, unlike in the example, but I am not sure whether that happens on all operating systems. Further down it says that it cannot open the input file, and I am not sure why. All I did was take a sorted BAM file and convert it to BED format using BEDtools. Does it need any other input? Any help is appreciated.


    Thanks

    Please wait...

    mapResultBatch1:

    mapResultBatch2:

    file format: bed
    refFlat:
    Ignore the strand information when count the reads mapped to genes!
    Count the number of reads mapped to each gene ...
    This will take several minutes, please wait patiently!
    Please wait...

    does not exist!
    SampleFiles:
    Count the number of reads mapped to each gene.
    This will take several minutes.
    Please wait ...
    cannot open input file
    There is something wrong!
    Please check !
    There is something wrong!Please check...
    Error in file(file, "rt") : cannot open the connection
    In addition: Warning message:
    In file(file, "rt") :
    cannot open file '/var/folders/Bl/BlOaI4RVFYyvhEI-W+aTz++++TI/-Tmp-//RtmpuyIAOK/DEGseqExample/group1.exp': No such file or directory


  • Xi Wang
    replied
    I found that you are not using the most recent version of DEGseq.
    Please download the newest version from:
    DEGseq is an R package to identify differentially expressed genes from RNA-Seq data.


    Second, in R, variable names can't contain spaces, and you should give it the actual path to your file, not a descriptive sentence.
    E.g.,
    Code:
    sample_A <- "/home/username/data.bed"
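
Building on the example above, a quick check in base R can catch a wrong path before DEGseq is called. This is only a sketch: `check_inputs` is a hypothetical helper, not a DEGseq function, and the temporary file below merely stands in for a real BED file.

```r
# Hypothetical helper (not part of DEGseq): stop early if any input
# path is missing or points to an empty file.
check_inputs <- function(paths) {
  ok <- file.exists(paths)
  ok[ok] <- file.size(paths[ok]) > 0   # only size-check files that exist
  if (!all(ok)) {
    stop("Missing or empty input file(s): ",
         paste(paths[!ok], collapse = ", "))
  }
  invisible(normalizePath(paths))      # return absolute paths
}

# Demonstration: a temporary one-line file stands in for a real BED file.
demo_bed <- tempfile(fileext = ".bed")
writeLines("chr1\t15562\t15637\tread1\t10\t+", demo_bed)

sample_A <- check_inputs(demo_bed)     # a valid R name: no spaces
print(sample_A)
```

With a path that does not exist, `check_inputs` raises an error that names the offending file, which is easier to read than a failure deep inside the read-counting step.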
    Last edited by Xi Wang; 12-06-2010, 08:43 AM.


  • newbietonextgen
    replied
    No. I have tried both ways: giving the full path to the file, and setting the working directory and then naming the file. I am using 64-bit R and I am not sure if that is the problem.

    This is how the console looks:
    >library(DEGseq)
    Loading required package: qvalue
    Loading Tcl/Tk interface
    > sample A <- "path to the file (bed.txt)"
    |

    So there was no screen message after I hit return...


  • Xi Wang
    replied
    I just saw you updated the message.
    Was there any output on the screen?


  • newbietonextgen
    replied
    Thanks Xi for the quick reply. It was a BED format file. I converted using the samTobed tools.


  • Xi Wang
    replied
    Originally posted by newbietonextgen:
    Hello all,

    I have a 1.0 GB data file and was wondering how long it would take for the program to load this data. All I get after giving the path to sample A is a spinning ball (Mac) that keeps going for half an hour. I just kill the process, thinking something is wrong. Do I have to be patient? The computer has 8 GB of RAM, if that helps. So please let me know. Thanks
    What kind of data file did you feed to DEGseq: BED or BAM? Usually, it should not take that long to load 1 GB of data.


  • newbietonextgen
    replied
    Help with DEGseq

    Hello all,

    I have a 1.0 GB data file and was wondering how long it would take for the program to load this data. All I get after giving the path to sample A is a spinning ball (Mac) that keeps going for half an hour. I don't get the R prompt back, so I just kill the process, thinking something is wrong. Do I have to be patient? The computer has 8 GB of RAM, if that helps. So please let me know.

    Sample of the BED file produced by the samtobed script:

    chr1 15562 15637 ILLUMINA-927B2F_0001:1:110:7901:1208#0/1 10 +
    chr1 15564 15636 ILLUMINA-927B2F_0001:1:92:5422:11873#0/1 10 +
    chr1 15564 15636 ILLUMINA-927B2F_0001:1:117:10103:16792#0/1 10 +
    chr1 16084 16159 ILLUMINA-927B2F_0001:1:3:3987:6468#0/1 10 -

    So please let me know if it's the format or if I just need more patience.
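
One way to rule out a format problem is to read the first lines of the file and confirm the six BED columns. The sketch below uses base R only. `check_bed` is a hypothetical helper, and it assumes the real file is tab-delimited (the spaces in the sample above may just be how the forum renders tabs). Note the `comment.char = ""` argument: read names like the ones above contain `#`, which `read.table` would otherwise treat as the start of a comment.

```r
# Hypothetical helper: read the first n lines of a BED file and run
# basic sanity checks on the six standard columns.
check_bed <- function(path, n = 1000) {
  bed <- read.table(path, sep = "\t", nrows = n, header = FALSE,
                    stringsAsFactors = FALSE,
                    comment.char = "",   # read names may contain '#'
                    quote = "")
  stopifnot(ncol(bed) >= 6)
  bed <- bed[, 1:6]
  names(bed) <- c("chrom", "start", "end", "name", "score", "strand")
  bed$strand <- as.character(bed$strand)
  stopifnot(is.numeric(bed$start), is.numeric(bed$end),
            all(bed$start < bed$end),
            all(bed$strand %in% c("+", "-", ".")))
  bed
}

# Demonstration on two of the sample lines, written out tab-delimited.
demo <- tempfile(fileext = ".bed")
writeLines(c(
  "chr1\t15562\t15637\tILLUMINA-927B2F_0001:1:110:7901:1208#0/1\t10\t+",
  "chr1\t16084\t16159\tILLUMINA-927B2F_0001:1:3:3987:6468#0/1\t10\t-"
), demo)

bed <- check_bed(demo)
print(bed)
```

If this check passes on the real file, the format itself is probably not what keeps R busy, and the path or the DEGseq call is the next thing to look at.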
    Last edited by newbietonextgen; 12-06-2010, 08:07 AM.


  • Xi Wang
    replied
    You can use the script below.

    Code:
    library(DEGseq)

    refFlat <- "refFlat.txt"   # gene annotation in refFlat format
    mapResultBatch <- c("sample1", "sample2", "sample3", "...")   # replace the file names accordingly
    geneExpr <- "geneExpr.txt"   # file in which to save the gene expression values
    getGeneExp(mapResultBatch, refFlat = refFlat, output = geneExpr)


  • wdt
    replied
    Many thanks for your quick replies about the DEGseq.

    Once BED files are provided, does DEGseq internally compute "raw counts" that are used for the differential expression analysis?

    Is there a way to output those raw counts (or equivalent numbers) per sample?

    Thanks a lot!


  • Xi Wang
    replied
    Originally posted by wdt:
    I have RNA-seq data analyzed using TopHat, which generated BAM files for each sample.
    Each group (cases/controls) has 5 samples.
    Would the following be the correct way to use DEGseq:
    1. Convert BAMs to BED using sam2bed.pl
    2. Use DEGseq samWrapper to test the 5 samples in one group against the 5 samples in the other to identify differentially expressed genes?

    Thanks a lot!
    Agreed. But please note that you first need to convert BAM to SAM using samtools.


  • wdt
    replied
    I have RNA-seq data analyzed using TopHat, which generated BAM files for each sample.
    Each group (cases/controls) has 5 samples.
    Would the following be the correct way to use DEGseq:
    1. Convert BAMs to SAM to BED using samtools + sam2bed.pl
    2. Use DEGseq samWrapper to test the 5 samples in one group against the 5 samples in the other to identify differentially expressed genes?

    Thanks a lot!
    Last edited by wdt; 11-23-2010, 09:20 PM.
