**blancha** · 11-15-2015, 09:42 AM

I only work occasionally with prokaryotes, but here is my attempt to answer your questions.
Don't have absolute trust in what I post.
I do make mistakes.

Question 1.

Seems fine. 20,000,000 reads is generally considered sufficient for a human RNA-Seq experiment, so one would think over 3,000,000 reads would be sufficient for a bacterium. I do have a recent E. coli RNA-Seq experiment somewhere, but I'm too lazy to go look up the sequencing depth we used.

There is a way to check whether your coverage is sufficient. It's imperfect and requires a bit of work though. You could randomly subsample your reads to progressively smaller numbers, and check whether the correlation between the replicates decreases as the number of reads decreases.

I would think you actually have too many reads, and could sequence at a lower sequencing depth. If the correlation between the replicates does not decrease as you decrease the number of reads, this would confirm that you could sequence at an even lower depth, and cut costs.
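Here is a minimal sketch of that subsampling check, working on per-gene read counts rather than on the FASTQ files themselves. The counts are made-up toy numbers, and `pearson` and `downsample` are hypothetical helper names, not functions from any RNA-Seq package.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def downsample(counts, fraction, rng):
    """Keep a random `fraction` of the reads, sampling without replacement."""
    # One entry per read, labelled with the index of the gene it came from.
    reads = [gene for gene, n in enumerate(counts) for _ in range(n)]
    kept = rng.sample(reads, int(len(reads) * fraction))
    sub = [0] * len(counts)
    for gene in kept:
        sub[gene] += 1
    return sub


# Hypothetical per-gene read counts for two replicates (toy numbers).
rep1 = [500, 120, 30, 8, 2000, 75, 0, 310]
rep2 = [520, 110, 35, 5, 1900, 80, 1, 290]

rng = random.Random(0)  # fixed seed so the run is repeatable
for fraction in (1.0, 0.5, 0.1, 0.01):
    r = pearson(downsample(rep1, fraction, rng), downsample(rep2, fraction, rng))
    print(f"fraction={fraction:<5} r={r:.4f}")
```

If the correlation stays flat down to small fractions, that would suggest the original depth was more than enough. On real data you would probably compute the correlation on log-transformed counts, so that a few highly expressed genes do not dominate the result.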

I'm too lazy to search for articles, which I'm sure exist, discussing the optimal sequencing depth for bacteria.

It also depends on the level of expression of the genes you are interested in.

Question 2.

Just remove all the reads that map to the rRNA first.
After removing these reads, yes, you can compare the reads from the transcriptome directly, after performing the usual normalization steps relative to the library size, and perhaps the gene length.
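As a rough illustration of those normalization steps, here is a sketch that drops the rRNA counts and then scales by library size (counts per million), and optionally by gene length as well (transcripts per million). The gene names, counts and lengths are invented for the example, and `cpm` and `tpm` are hypothetical helpers; in practice a differential expression package would handle this for you.

```python
def cpm(counts):
    """Counts per million: scales each count by the library size."""
    total = sum(counts.values())
    return {gene: n / total * 1e6 for gene, n in counts.items()}


def tpm(counts, lengths_bp):
    """Transcripts per million: corrects for gene length, then library size."""
    rate = {gene: n / lengths_bp[gene] for gene, n in counts.items()}
    total_rate = sum(rate.values())
    return {gene: r / total_rate * 1e6 for gene, r in rate.items()}


# Hypothetical counts and gene lengths; the rRNA entry is illustrative only.
raw_counts = {"geneA": 900, "geneB": 100, "rrsA": 9000}
gene_lengths = {"geneA": 3000, "geneB": 1000, "rrsA": 1500}
rrna_genes = {"rrsA"}

# Drop rRNA first so it does not inflate the library size.
mrna_counts = {g: n for g, n in raw_counts.items() if g not in rrna_genes}

print(cpm(mrna_counts))
print(tpm(mrna_counts, gene_lengths))
```

Removing the rRNA before computing the library size matters here: with the rRNA left in, 90% of the toy library would be rRNA, and any variation in rRNA depletion between samples would distort every other gene's normalized value.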

Sequencing rRNA is a waste of money, though.
You should remove them before sequencing, if possible.

Question 3.

No. It is not necessary to normalize to a housekeeping gene in RNA-Seq, as opposed to qPCR. You are measuring the level of expression of one gene relative to all the other genes. You are measuring a proportion. No other form of normalization is required, whether relative to the total number of cells or relative to a housekeeping gene.

It's also always interesting to see in RNA-Seq experiments just how much variation there is between housekeeping genes. It's a wonder anyone can use them to do normalization.

There is one important caveat.
No normalization is required only if the total amount of RNA produced per cell in each condition is the same.
If this is not the case, you do need to normalize relative to the total amount of RNA produced per cell.
This is an exceptional case, but it does exist.
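A toy example may make the caveat clearer. The numbers below are invented: in the second condition every gene doubles, so the total RNA per cell doubles too, yet the proportions that RNA-Seq measures are unchanged.

```python
def proportions(molecules):
    """Fraction of the total RNA pool contributed by each gene."""
    total = sum(molecules.values())
    return {gene: n / total for gene, n in molecules.items()}


# Hypothetical transcripts per cell. In condition B every gene doubles,
# so the total RNA per cell doubles too.
cond_a = {"geneA": 100, "geneB": 300, "geneC": 600}
cond_b = {gene: 2 * n for gene, n in cond_a.items()}

print(proportions(cond_a))
print(proportions(cond_b))
# The proportions are identical, so library-size normalization alone
# would report no change even though every gene doubled per cell.
```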

**szy0931** · 11-19-2015, 02:28 PM

Thank you very much!
Which tool should I use to remove rRNA?

**blancha** · 11-20-2015, 10:10 AM

Just use the same aligner you were going to use to align the reads on the genome.
Pick your favorite: BWA, Bowtie2, ...
I use Bowtie2.

An alternative strategy is to align on a reference genome that includes the ribosomal RNA, and then exclude the reads aligning on rRNA from the differential expression analysis.
I prefer removing the rRNA reads completely beforehand, by aligning all the reads on the rRNA sequences.

Just download a FASTA file containing the rRNA sequences, index the FASTA file, and align all the reads. Keep only the reads that do not align on rRNA for the rest of the analysis.

To build the index: bowtie2-build
To align: bowtie2 (with the option --un-conc for paired-end reads, or --un for single-end reads)
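For what it's worth, here is a sketch of the two steps with Bowtie2 for paired-end reads. It only builds and prints the command lines rather than running them, and every file name and index prefix is a hypothetical placeholder. `build_index_cmd` and `filter_rrna_cmd` are made-up helper names.

```python
import shlex


def build_index_cmd(rrna_fasta, index_prefix):
    """Command that indexes the rRNA FASTA file with bowtie2-build."""
    return ["bowtie2-build", rrna_fasta, index_prefix]


def filter_rrna_cmd(index_prefix, reads_1, reads_2, kept_pattern):
    """Command that aligns paired-end reads to the rRNA index.

    Pairs that do not align concordantly are written to files derived
    from `kept_pattern`; the alignments themselves are discarded.
    """
    return [
        "bowtie2",
        "-x", index_prefix,
        "-1", reads_1,
        "-2", reads_2,
        "--un-conc", kept_pattern,
        "-S", "/dev/null",
    ]


# All file names below are hypothetical placeholders.
index_cmd = build_index_cmd("rrna.fasta", "rrna_index")
align_cmd = filter_rrna_cmd(
    "rrna_index", "sample_R1.fastq", "sample_R2.fastq", "sample_norrna.fastq"
)

print(shlex.join(index_cmd))
print(shlex.join(align_cmd))
```

To actually run the commands, you could pass each list to `subprocess.run(cmd, check=True)`. For single-end reads, the equivalent would be `-U` for the input and `--un` for the unaligned output. The files written by `--un-conc` are the ones to use for the rest of the analysis.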

**szy0931** · 11-20-2015, 03:27 PM

Thank you.
There are many ribosomal copies (8x5S, 7x16S and 7x23S) in the reference genome. So, I combined the 22 rRNA sequences into one FASTA file and used that file as the reference to map my reads with bowtie. My purpose is to save the unaligned reads into separate files and then use those files for the subsequent mapping. However, the step of mapping reads to the FASTA file containing the 22 rRNA sequences is very slow. Maybe it did not run at all. Do you think combining the 22 rRNA sequences into one file could cause the problem?

**blancha** · 11-20-2015, 05:14 PM

I do exactly the same thing.

The best way to speed up the alignment is to use multiple cores.
The relationship between the number of cores used and the speed of alignment is roughly linear.
So doubling the number of cores will just about halve the run time.

Of course, you need to have the cores available on the computer on which you are running the analysis.
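As a small sketch of how that could look with Bowtie2, the `-p` option sets the number of threads. The helper below (a hypothetical `with_threads`) caps the requested thread count at the number of cores the machine reports; the command and file names are placeholders.

```python
import os


def with_threads(cmd, requested=None):
    """Return a copy of a bowtie2 command with the -p (threads) option added."""
    available = os.cpu_count() or 1
    threads = available if requested is None else max(1, min(requested, available))
    return cmd + ["-p", str(threads)]


# Hypothetical single-end command; file and index names are placeholders.
base_cmd = ["bowtie2", "-x", "rrna_index", "-U", "sample.fastq",
            "--un", "sample_norrna.fastq", "-S", "/dev/null"]

print(with_threads(base_cmd, requested=4))
```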

basic questions about gene expression compassion
