Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Brian Bushnell
    replied
    You can't determine copy number of genes from RNA-seq data, as the expression is not really determined by the copy-number (and is highly variable anyway). The best way is probably to map WGS DNA reads to the genome and then use bedtools to extract coverage over coding regions, and look for regions with different-than-expected coverage. This will remove a lot of the bias caused by mapping DNA reads to a transcriptome. Also, note that exome-capture will not yield useful quantitative results, either, due to extreme bias, without extensive calibration for a given kit.

    Alternately, it would probably be a lot cheaper to run a SNP chip, or qPCR, or a probe array, or something like that, rather than doing sequencing. I can't remember which one is best for this but at least one is designed to do copy-count quantification without sequencing, across a genome.

    Leave a comment:


  • JamesJoyce
    replied
    Update: Results still have high standard deviations, even after only keeping the longest isoforms of the sequences found in the reference transcript.

    I really like all the features BBMap holds, and as such I would like to keep using BBMap for a foreseeable future. Is there any way I could use BBMap to do this sort of analysis (determining copy number of certain genes within an individual organism), but instead by using RNA-Seq read sequences? Thanks again for your time guys.

    Leave a comment:


  • JamesJoyce
    replied
    Thank you all for your replies. I will be trying some of the things you have all suggested.
    As for running samtools depth with BBMap's outputs, I get 0 coverage and Standard Deviation values. I first convert BBMap's sam output to bam, and then proceed to sort it using samtools. It is then that I try using samtools depth.

    Edit:
    I tried using one read to verify if there was a problem with the random selection of sites for ambiguously-mapped paired reads. However, the results were very similar, leading me to think that I should be focusing on extracting the longest isoform per gene and seeing how that looks like. However, I am unable to produce coverage and STDV directly from BBMap, as it tells me that "hist=" is an unknown command. Thank you again Brian.
    Last edited by JamesJoyce; 10-06-2015, 06:39 AM.

    Leave a comment:


  • GenoMax
    replied
    It is a strange analysis and with all those caveats going to be of limited real world use for eukaryotic genomes.

    Leave a comment:


  • Brian Bushnell
    replied
    Well... it is kind of a strange analysis... and a normal transcriptome file, with many isoforms per gene, will not very useful. But, if you had exactly one transcript per gene in your reference, then theoretically, genes with 2 copies should get double the coverage of genes with 1 copy, if you map uniform WGS data. Using the "local" flag would be helpful because wherever reads go into introns they will not match the transcriptome. Also, pseudogenes will cause confusion. So I'm not sure how much signal there will be compared to the noise, but with those assumptions (one transcript per gene, local alignments, fairly uniform coverage, no pseudogenes), it should work.
    Last edited by Brian Bushnell; 10-05-2015, 11:07 AM.

    Leave a comment:


  • GenoMax
    replied
    @Brian: How is this analysis is going to give information about number of copies of a gene to @DarthMapper/@JamesJoyce? Am I missing something obvious?

    Leave a comment:


  • Brian Bushnell
    replied
    First off, you should expect a high standard deviation in this scenario. If one gene has 10 isoforms and another has 1 isoform, the gene with 1 isoform will have (on average) 10x the coverage of the isoforms from the gene with 10, when ambiguously-mapped reads are placed randomly. You need to use exactly one isoform per gene (preferably the longest).

    However, it's also possible that there's a problem with the random selection of sites for ambiguously-mapped paired reads. Would you mind trying this (mapping with read 1 only):

    bbmap.sh build=1 in1=catreads/read_1.fastq ambig=random local=t out=mRNA.sam covstats=covstats.txt hist=histogram.txt

    Note that bbmap can internally generate coverage information and does not need a second pass from pileup. As for why samtools depth is not working, I have no idea. Does it give some kind of error message?

    Leave a comment:


  • JamesJoyce
    replied
    Thanks for your reply Brian.
    I am using BBMap to align WGS reads to a set of mRNA sequences extracted from the organism's NCBI transcriptome. This is being done with different organisms, such as mammals, reptiles, and fungi. So far, I have tested a few mammals, including Felix Catus (cat) and Rattus Norvegicus (Rat). Again, I was instructed to find the copy number of certain genes within a single organism. My PI told me to map WGS sequences to the transcriptome's mRNA sequences as reference before mapping to individual gene sequences, but we stopped once we saw the results.

    The results for both mammals are very similar, so I will only post one example

    Here is the commandline I am using and the results:

    Index Build
    bbmap.sh ref=mRNA.fa build=1

    Alignment
    bbmap.sh build=1 in1=catreads/read_1.fastq in2=catreads/read_2.fastq ambig=random local=t out=mRNA.sam

    Coverage and Standard Deviation:
    pileup.sh in=mRNA.sam out=covstats.txt hist=histogram.txt

    P.S. I have tried to use samtools depth to calculate coverage and STDV with BBMap's sam outputs but it does not let me - even when I have converted the sam's to bam, and sorted the bam. How can this be achieved?

    CoverageResults
    Set USE_COVERAGE_ARRAYS to true
    Set USE_BITSETS to false
    Note: Coverage capped at 65535

    Average coverage: 15.29
    Standard deviation: 384.14
    Percent scaffolds with any coverage: 99.81
    Percent of reference bases covered: 69.32

    Is there any way to lower that STDV?

    Leave a comment:


  • Brian Bushnell
    replied
    Hi James,

    Can you give me the exact command line you are using?

    Also, the complete output printed to the screen would be helpful, as well as the name of the organism.
    Last edited by Brian Bushnell; 10-04-2015, 05:27 PM.

    Leave a comment:


  • JamesJoyce
    replied
    Hello Brian;
    I am trying to do something similar to what DarthMapper was suggesting.
    My PI had instructed me to determine the copy number of certain genes in a single organism by mapping WGS reads to said gene sequences as reference. I then am supposed to compare coverage values of these individual genes with that obtained from mapping the WGS reads to the whole set of mRNA sequences present in the studied organism's transcriptome. The problem is that I am also getting strange standard deviations when using BBMap.
    What is wrong with my methodology?

    Leave a comment:


  • GenoMax
    replied
    One would need BBMap to create the alignment file that can then be used with bedtools

    Leave a comment:


  • Brian Bushnell
    replied
    I think you can do that with Bedtools and a gene annotation file.

    Leave a comment:


  • DarthMapper
    replied
    Well, usually we map the WGS reads to the genome assembly for quality testing. But, how exactly would we extract the information corresponding to the transcriptome? Would I be able to use BBMap for this?

    Leave a comment:


  • HESmith
    replied
    You should be mapping to the genome, then extracting the information that corresponds to the transcriptome. I'm not surprised that your standard deviations are wonky. It's probably due to repeat elements in the genome that are under-represented in the transcriptome, which would produce much higher-than-expected coverage. You can check by BLASTing a few examples of your high-coverage regions against the genome, and see how many hits you get.

    Leave a comment:


  • DarthMapper
    replied
    I'm so sorry; I was editing my last reply when you responded.

    I forgot to mention that we are trying to determine copy number via read depth analysis, so we need to know the average and SD read mapping to all genes present in the genome (to compare against the ambiguous genes in question).

    We are using WGS reads, which are DNA, and we are mapping these to the transcriptome.

    Leave a comment:

Latest Articles

Collapse

  • seqadmin
    Essential Discoveries and Tools in Epitranscriptomics
    by seqadmin




    The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
    04-22-2024, 07:01 AM
  • seqadmin
    Current Approaches to Protein Sequencing
    by seqadmin


    Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
    04-04-2024, 04:25 PM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, Today, 11:49 AM
0 responses
10 views
0 likes
Last Post seqadmin  
Started by seqadmin, Yesterday, 08:47 AM
0 responses
16 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-11-2024, 12:08 PM
0 responses
61 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-10-2024, 10:19 PM
0 responses
60 views
0 likes
Last Post seqadmin  
Working...
X