**Blahah404** · 09-11-2013, 08:11 AM

I used NCBI BLAST to check the identity of the first two sequences. The first one is mammalian ribosomal RNA...

Code:

TPA: Mus musculus ribosomal DNA, complete repeating unit	99.6	99.6	100%	1e-18	100%	BK000964.3
Chain 5, Structure Of The H. Sapiens 60s Rrna	99.6	99.6	100%	1e-18	100%	3J3F_5

The second is Arabidopsis mRNA:

Code:

Arabidopsis thaliana clone 2531 mRNA, complete sequence	99.6	99.6	100%	1e-18	100%	AY086470.1

What species are you sequencing?

**foehn** · 09-11-2013, 06:59 PM

Hi Blahah404, it is rat, so I didn't search other species. It is quite a surprise to learn there may be mouse and Arabidopsis mixed in.

**Blahah404** · 09-12-2013, 12:58 AM

It's not that unusual to get contamination, either at the wet-lab stage or at the sequencing centre. If you don't work on Arabidopsis, you might want to check some more of the overrepresented sequences with NCBI BLAST, and if there's a significant amount of contamination you can filter it out using bowtie2 against the Arabidopsis transcriptome. The same goes for rRNA, using the SILVA rRNA database.

**foehn** · 09-12-2013, 11:17 PM

I've checked with the person who ran our sequencing, and it is now certain that there are contaminants from other species. According to the BLAST results for the top ~20 overrepresented sequences, there are at least Arabidopsis, human, and mouse. Filtering against the Arabidopsis genome may work, but for the human and mouse contamination, would similar alignments also filter out rat reads because of mammalian homology? Any advice would be appreciated, thanks.

**ewels** · 09-12-2013, 11:36 PM

It might be worth having a look at the data with FastQ Screen - we routinely use this along with FastQC to check for potential contamination.

In addition to showing you what other species you have contamination from, it will show you whether the reads matching those species are unique. If so, you can safely ignore them and just map against the reference genome you're interested in. If they come up red (matching multiple genomes) then you'll need to filter them out.
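To make the unique versus multiple-genome distinction concrete, here is a minimal Python sketch of the bookkeeping. It is not how FastQ Screen is implemented; the category names, the `classify_read` and `summarise` helpers, and the example reads are all hypothetical, and it assumes you already know, for each read, which genomes it aligned to.

```python
from collections import Counter


def classify_read(hits, target="rat"):
    """Label one read from the set of genome names it aligned to.

    `hits` is a set such as {"rat", "mouse"}; `target` is the genome
    you actually care about (rat in this thread).
    """
    if not hits:
        return "no_hit"
    if hits == {target}:
        return "target_only"
    if target not in hits:
        # Aligns only to one or more contaminant genomes.
        return "contaminant_only"
    # Aligns to the target and to at least one contaminant,
    # e.g. a conserved mammalian region.
    return "shared"


def summarise(read_hits, target="rat"):
    """Count reads per category for a {read_id: set_of_genomes} mapping."""
    return Counter(classify_read(hits, target) for hits in read_hits.values())


# Hypothetical example data, not taken from the thread.
example = {
    "read1": {"rat"},
    "read2": {"arabidopsis"},
    "read3": {"rat", "mouse", "human"},
    "read4": set(),
    "read5": {"mouse", "human"},
}
counts = summarise(example)
```

The "shared" count is the number to watch when the contaminants are other mammals: those are the reads where homology makes the species of origin unclear.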

Final plug - we use Trim Galore! to remove adapter contamination. If your contaminants are only a few sequences, it's relatively easy to get Trim Galore! to remove these from your library as well.

**foehn** · 09-13-2013, 12:15 AM

Hi tallphil, thanks for introducing the software. The problem is that this sample is not simply contaminated by adapters (only ~2%); there is a huge amount (>40%) of contamination from foreign species, including human and mouse, which may share homology with rat, so it is difficult to decide what to remove.

**fkrueger** · 09-13-2013, 01:05 AM

Hi Foehn,

Do you have any idea where all these contaminants are coming from? In any case, expanding on what Phil has recommended, I'd like to suggest the following strategy:

Running FastQ Screen is normally a good way to get a quick idea of whether you've got contaminating species. However, it has the limitation that it doesn't normally work for bisulfite-converted sequences unless you use specially prepared genomes (and even then you would get problems with methylated sequences). Looking at some of the sequences in the list, however, it would appear that your contaminating sequences are not bisulfite converted, in which case FastQ Screen should work just fine. Since normal genomic sequences look like fully methylated sequences, it is all the more important to remove them, since they could potentially affect the conclusions you draw from your experiment later on.

Here is what I would do:
1) Identify the contaminating species using FastQ Screen or a similar tool (you have already identified human, mouse and Arabidopsis)
2) Align the sequences against a contaminating genome with Bismark using the option --unmapped. This will write out FastQ files of all sequences that did not map against the contaminant; in other words, it removes the sequences that align to the contaminant.
3) Repeat step 2) for all contaminants
4) Use the remaining unmapped FastQ files to align against the rat genome and see if the results make any sense
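Steps 2) and 3) can be chained in a small script. The following is only a sketch under several assumptions: single-end reads, contaminant genome folders already prepared with bismark_genome_preparation, and an unmapped-reads file whose name contains "unmapped_reads" (the exact file name depends on the Bismark version, so the script looks it up rather than guessing it). The helper names and the paths in the usage comment are hypothetical.

```python
import subprocess
from pathlib import Path


def bismark_command(reads, genome_dir, out_dir):
    """Build one Bismark call that also writes out the reads that did not map.

    Uses the --unmapped option mentioned above, --output_dir to keep each
    round separate, and the positional form: bismark [options] <genome> <reads>.
    """
    return ["bismark", "--unmapped", "--output_dir", str(out_dir),
            str(genome_dir), str(reads)]


def filter_contaminants(reads, contaminant_genomes, work_dir="contaminant_filter"):
    """Align against each contaminant in turn, carrying the unmapped reads forward.

    `contaminant_genomes` is a list of (name, genome_dir) pairs.
    Returns the path of the FastQ file left after the last round.
    """
    current = Path(reads)
    for name, genome_dir in contaminant_genomes:
        out_dir = Path(work_dir) / name
        out_dir.mkdir(parents=True, exist_ok=True)
        subprocess.run(bismark_command(current, genome_dir, out_dir), check=True)
        # Assumption: exactly one single-end unmapped-reads file per round.
        leftovers = sorted(out_dir.glob("*unmapped_reads*"))
        if len(leftovers) != 1:
            raise RuntimeError(f"expected one unmapped-reads file in {out_dir}")
        current = leftovers[0]
    return current


# Hypothetical usage (paths are placeholders):
# remaining = filter_contaminants(
#     "sample.fastq",
#     [("arabidopsis", "/genomes/arabidopsis"),
#      ("human", "/genomes/human"),
#      ("mouse", "/genomes/mouse")],
# )
# The file in `remaining` is then aligned against the rat genome (step 4).
```

Keeping each round in its own output folder makes it easy to see how many reads each contaminant removed, which helps judge how much rat sequence may have been lost to the human and mouse rounds.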

**foehn** · 09-13-2013, 01:46 AM

Hi fkrueger, I have no idea about the source, nor do the sequencing staff know for sure; they only told me the contamination may have been introduced after library preparation or during sequencing.

**Blahah404** · 09-13-2013, 02:56 AM

Foehn,

In addition to the good advice given by others above, because you've got over 40% contamination, I would consider asking the sequencing centre to resequence that sample free of charge. We usually get some consideration from the sequencing centre in these cases.

RRBS overrepresented sequences
