BBMap (aligner for DNA/RNAseq) is now open-source and available for download.

Brian Bushnell replied

10-08-2015, 09:25 AM
You can't determine copy number of genes from RNA-seq data, as the expression is not really determined by the copy-number (and is highly variable anyway). The best way is probably to map WGS DNA reads to the genome and then use bedtools to extract coverage over coding regions, and look for regions with different-than-expected coverage. This will remove a lot of the bias caused by mapping DNA reads to a transcriptome. Also, note that exome-capture will not yield useful quantitative results, either, due to extreme bias, without extensive calibration for a given kit.

Alternately, it would probably be a lot cheaper to run a SNP chip, or qPCR, or a probe array, or something like that, rather than doing sequencing. I can't remember which one is best for this but at least one is designed to do copy-count quantification without sequencing, across a genome.
Leave a comment:
JamesJoyce replied

10-08-2015, 07:08 AM
Update: Results still have high standard deviations, even after only keeping the longest isoforms of the sequences found in the reference transcript.

I really like all the features BBMap holds, and as such I would like to keep using BBMap for a foreseeable future. Is there any way I could use BBMap to do this sort of analysis (determining copy number of certain genes within an individual organism), but instead by using RNA-Seq read sequences? Thanks again for your time guys.
Leave a comment:
JamesJoyce replied

10-05-2015, 02:45 PM
Thank you all for your replies. I will be trying some of the things you have all suggested.
As for running samtools depth with BBMap's outputs, I get 0 coverage and Standard Deviation values. I first convert BBMap's sam output to bam, and then proceed to sort it using samtools. It is then that I try using samtools depth.

Edit:
I tried using one read to verify if there was a problem with the random selection of sites for ambiguously-mapped paired reads. However, the results were very similar, leading me to think that I should be focusing on extracting the longest isoform per gene and seeing how that looks like. However, I am unable to produce coverage and STDV directly from BBMap, as it tells me that "hist=" is an unknown command. Thank you again Brian.

Last edited by JamesJoyce; 10-06-2015, 06:39 AM.
Leave a comment:
GenoMax replied

10-05-2015, 11:22 AM
It is a strange analysis and with all those caveats going to be of limited real world use for eukaryotic genomes.
Leave a comment:
Brian Bushnell replied

10-05-2015, 10:59 AM
Well... it is kind of a strange analysis... and a normal transcriptome file, with many isoforms per gene, will not very useful. But, if you had exactly one transcript per gene in your reference, then theoretically, genes with 2 copies should get double the coverage of genes with 1 copy, if you map uniform WGS data. Using the "local" flag would be helpful because wherever reads go into introns they will not match the transcriptome. Also, pseudogenes will cause confusion. So I'm not sure how much signal there will be compared to the noise, but with those assumptions (one transcript per gene, local alignments, fairly uniform coverage, no pseudogenes), it should work.

Last edited by Brian Bushnell; 10-05-2015, 11:07 AM.
Leave a comment:
GenoMax replied

10-05-2015, 10:47 AM
@Brian: How is this analysis is going to give information about number of copies of a gene to @DarthMapper/@JamesJoyce? Am I missing something obvious?
Leave a comment:
Brian Bushnell replied

10-05-2015, 10:42 AM
First off, you should expect a high standard deviation in this scenario. If one gene has 10 isoforms and another has 1 isoform, the gene with 1 isoform will have (on average) 10x the coverage of the isoforms from the gene with 10, when ambiguously-mapped reads are placed randomly. You need to use exactly one isoform per gene (preferably the longest).

However, it's also possible that there's a problem with the random selection of sites for ambiguously-mapped paired reads. Would you mind trying this (mapping with read 1 only):

bbmap.sh build=1 in1=catreads/read_1.fastq ambig=random local=t out=mRNA.sam covstats=covstats.txt hist=histogram.txt

Note that bbmap can internally generate coverage information and does not need a second pass from pileup. As for why samtools depth is not working, I have no idea. Does it give some kind of error message?
Leave a comment:
JamesJoyce replied

10-04-2015, 07:04 PM
Thanks for your reply Brian.
I am using BBMap to align WGS reads to a set of mRNA sequences extracted from the organism's NCBI transcriptome. This is being done with different organisms, such as mammals, reptiles, and fungi. So far, I have tested a few mammals, including Felix Catus (cat) and Rattus Norvegicus (Rat). Again, I was instructed to find the copy number of certain genes within a single organism. My PI told me to map WGS sequences to the transcriptome's mRNA sequences as reference before mapping to individual gene sequences, but we stopped once we saw the results.

The results for both mammals are very similar, so I will only post one example

Here is the commandline I am using and the results:

Index Build
bbmap.sh ref=mRNA.fa build=1

Alignment
bbmap.sh build=1 in1=catreads/read_1.fastq in2=catreads/read_2.fastq ambig=random local=t out=mRNA.sam

Coverage and Standard Deviation:
pileup.sh in=mRNA.sam out=covstats.txt hist=histogram.txt

P.S. I have tried to use samtools depth to calculate coverage and STDV with BBMap's sam outputs but it does not let me - even when I have converted the sam's to bam, and sorted the bam. How can this be achieved?

CoverageResults
Set USE_COVERAGE_ARRAYS to true
Set USE_BITSETS to false
Note: Coverage capped at 65535

Average coverage: 15.29
Standard deviation: 384.14
Percent scaffolds with any coverage: 99.81
Percent of reference bases covered: 69.32

Is there any way to lower that STDV?
Leave a comment:
Brian Bushnell replied

10-04-2015, 04:54 PM
Hi James,

Can you give me the exact command line you are using?

Also, the complete output printed to the screen would be helpful, as well as the name of the organism.

Last edited by Brian Bushnell; 10-04-2015, 05:27 PM.
Leave a comment:
JamesJoyce replied

10-04-2015, 01:23 PM
Hello Brian;
I am trying to do something similar to what DarthMapper was suggesting.
My PI had instructed me to determine the copy number of certain genes in a single organism by mapping WGS reads to said gene sequences as reference. I then am supposed to compare coverage values of these individual genes with that obtained from mapping the WGS reads to the whole set of mRNA sequences present in the studied organism's transcriptome. The problem is that I am also getting strange standard deviations when using BBMap.
What is wrong with my methodology?
Leave a comment:
GenoMax replied

09-24-2015, 09:51 AM
One would need BBMap to create the alignment file that can then be used with bedtools
Leave a comment:
Brian Bushnell replied

09-24-2015, 09:49 AM
I think you can do that with Bedtools and a gene annotation file.
Leave a comment:
DarthMapper replied

09-24-2015, 08:11 AM
Well, usually we map the WGS reads to the genome assembly for quality testing. But, how exactly would we extract the information corresponding to the transcriptome? Would I be able to use BBMap for this?
Leave a comment:
HESmith replied

09-23-2015, 03:01 PM
You should be mapping to the genome, then extracting the information that corresponds to the transcriptome. I'm not surprised that your standard deviations are wonky. It's probably due to repeat elements in the genome that are under-represented in the transcriptome, which would produce much higher-than-expected coverage. You can check by BLASTing a few examples of your high-coverage regions against the genome, and see how many hits you get.
Leave a comment:
DarthMapper replied

09-23-2015, 11:04 AM
I'm so sorry; I was editing my last reply when you responded.

I forgot to mention that we are trying to determine copy number via read depth analysis, so we need to know the average and SD read mapping to all genes present in the genome (to compare against the ambiguous genes in question).

We are using WGS reads, which are DNA, and we are mapping these to the transcriptome.
Leave a comment:

Previous 1 22 29 30 31 32 33 34 template Next

Essential Discoveries and Tools in Epitranscriptomics

by seqadmin

The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
- Channel: Articles
04-22-2024, 07:01 AM
Current Approaches to Protein Sequencing

by seqadmin

Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
- Channel: Articles
04-04-2024, 04:25 PM

Topics	Statistics	Last Post
Expanding the Horizons of Cellular Research with the Single Cell Atlas by seqadmin Started by seqadmin, Today, 11:49 AM	0 responses 10 views 0 likes	Last Post by seqadmin Today, 11:49 AM
Genetic Variants and Diabetes Risk in Childhood Cancer Survivors by seqadmin Started by seqadmin, Yesterday, 08:47 AM	0 responses 16 views 0 likes	Last Post by seqadmin Yesterday, 08:47 AM
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 61 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 60 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM

Seqanswers Leaderboard Ad

Announcement

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Latest Articles

ad_right_rmr

News