**gringer** · 09-15-2015, 09:16 PM

FastQC frequently worries people when there's no need to worry, and doesn't always point out the things that are most important. I've got a few questions:

Are these RNA reads?
What is the expected GC fraction of your target genome?
How much DNA was present in the sample?
Have spike-ins (e.g. ERCC, lambda) been used?
What are the overrepresented sequences?

In a best-case scenario, the double peak in the GC graph and the over-represented sequences could be explained by a spike-in taking up a large proportion of the reads, which would happen if the DNA hadn't been accurately quantified. Alternatively, targeted sequencing of multiple genes might produce a similar effect.

**Saeideh** · 09-15-2015, 09:32 PM

These are cDNA reads (made from RNA).
I don't know the expected GC fraction of the target genome (the data belongs to someone else, and my job is to analyze it and improve it).
No spike-ins were used.
There are three overrepresented sequences:

CGCTCGCCGCTACTACGGGAATCGCTTTTGCTTTCTTTTCCTCTGGCTAC
GATACCTAGGTACCCAGAGACGAGGAAGGGCGTAGCAAGCGACGAAATGC
TGGATACCTAGGTACCCAGAGACGAGGAAGGGCGTAGCAAGCGACGAAAT

**gringer** · 09-15-2015, 10:09 PM

Well, a BLAST of all those sequences returns 100% identity matches to chloroplast genomes (probably rice).
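For anyone wanting to repeat that lookup, here is a minimal sketch that formats the three sequences as FASTA text (ready to paste into the NCBI BLAST web form) and checks how much the second and third overlap. The record names (`overrep_1` and so on) and the helper names are made up for illustration; nothing here talks to BLAST itself.

```python
# The three overrepresented sequences reported by FastQC earlier in the thread.
OVERREPRESENTED = [
    "CGCTCGCCGCTACTACGGGAATCGCTTTTGCTTTCTTTTCCTCTGGCTAC",
    "GATACCTAGGTACCCAGAGACGAGGAAGGGCGTAGCAAGCGACGAAATGC",
    "TGGATACCTAGGTACCCAGAGACGAGGAAGGGCGTAGCAAGCGACGAAAT",
]


def to_fasta(seqs, prefix="overrep"):
    """Format sequences as FASTA text; the record names are hypothetical."""
    return "".join(f">{prefix}_{i}\n{s}\n" for i, s in enumerate(seqs, start=1))


def suffix_prefix_overlap(a, b):
    """Length of the longest suffix of `a` that is also a prefix of `b`."""
    for n in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:n]):
            return n
    return 0


fasta_text = to_fasta(OVERREPRESENTED)
print(fasta_text)

# Sequences 2 and 3 look like shifted copies of each other; measure the overlap.
overlap = suffix_prefix_overlap(OVERREPRESENTED[2], OVERREPRESENTED[1])
print(f"sequence 3 runs into sequence 2 over {overlap} bases")
```

A long overlap like this suggests the second and third sequences come from the same locus read at a slightly shifted start position, so they are really one abundant transcript rather than two.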

My guess is that what you're seeing here is cDNA reads that haven't been properly depleted for high-abundance transcripts, so there is a large amount of contaminant sequences in the data. My ball-park assumption from looking at the GC graph would be that there is about 30% chloroplast sequence in there.
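To make the "double peak" idea concrete, here is a rough sketch of a per-read GC histogram, which is approximately what the FastQC GC plot summarises. It is not FastQC's own implementation; the function names, the 5% bin width, and the toy reads are all assumptions for illustration. A sample that mixes two sources with different GC content should show two clusters of bins.

```python
from collections import Counter


def gc_counts(seq):
    """Return (number of G/C bases, number of A/C/G/T bases); N etc. are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc, gc + at


def gc_percent(seq):
    """Whole-number GC percentage of one read (integer maths avoids float edge cases)."""
    gc, total = gc_counts(seq)
    return 100 * gc // total if total else 0


def gc_histogram(reads, bin_width=5):
    """Count reads per GC-percent bin; keys are the lower edge of each bin."""
    hist = Counter()
    for read in reads:
        # Clamp 100% GC into the top bin so every key is a valid lower edge.
        lower = min(gc_percent(read) // bin_width * bin_width, 100 - bin_width)
        hist[lower] += 1
    return dict(sorted(hist.items()))


# Toy data only: three AT-rich reads and two GC-rich reads, giving two peaks.
toy_reads = ["ATATATATGC"] * 3 + ["GCGCGCGCAT"] * 2
print(gc_histogram(toy_reads))
```

On real data the reads would come from the FASTQ file; the relative size of the two peaks gives only a ball-park idea of the mixture proportions.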

If at all possible, I'd recommend that your collaborator re-sequences these samples including a RiboZero preparation:

http://www.illumina.com/products/ribo-zero-rrna-removal-plant.html

Otherwise, run a mapping only to the chloroplast sequence of the target (e.g. Oryza sativa) and exclude those sequences (e.g. HISAT2 has "--un-conc" and "--un" options for doing precisely that), then re-run FastQC to see if it changes things. Even with that 30% contamination (assuming it's expected), you still should get reasonable results.
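A hedged sketch of that filtering step, written as a small Python helper that only builds the command lines (it does not run them). The `-x`, `-U`, `-1`, `-2`, `-S`, `--un` and `--un-conc` options are from the HISAT2 manual; every file name below is a placeholder, and the helper function names are my own.

```python
import shlex


def hisat2_build_cmd(chloroplast_fasta, index_prefix):
    """Command to index the chloroplast reference sequence."""
    return ["hisat2-build", chloroplast_fasta, index_prefix]


def hisat2_filter_cmd(index_prefix, unmapped_out, sam_out, reads1, reads2=None):
    """Command to map reads to the chloroplast and keep the ones that do NOT align.

    Single-end reads use --un; paired-end reads use --un-conc, which keeps
    pairs that fail to align concordantly.
    """
    cmd = ["hisat2", "-x", index_prefix]
    if reads2 is None:
        cmd += ["-U", reads1, "--un", unmapped_out]
    else:
        cmd += ["-1", reads1, "-2", reads2, "--un-conc", unmapped_out]
    cmd += ["-S", sam_out]
    return cmd


# All paths here are hypothetical placeholders.
build = hisat2_build_cmd("rice_chloroplast.fa", "rice_cp")
single = hisat2_filter_cmd("rice_cp", "no_cp.fastq", "cp_hits.sam", "reads.fastq")
paired = hisat2_filter_cmd(
    "rice_cp", "no_cp.fastq", "cp_hits.sam", "reads_1.fastq", "reads_2.fastq"
)

for cmd in (build, single, paired):
    print(shlex.join(cmd))
```

The printed lines can be pasted into a shell, or passed to `subprocess.run(cmd, check=True)` once HISAT2 is installed. The file given to `--un` or `--un-conc` is the chloroplast-depleted read set to feed back into FastQC.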

**Saeideh** · 09-16-2015, 02:04 AM

Your answer surprised me. Yes, it's rice (Oryza sativa). And the way you found the source of contamination made me excited. Smart answers!

So now I should find the rice chloroplast sequence and then exclude it from the reads, but I don't know how to do it with HISAT as you mentioned. I have to learn it first.

Thank you~Thank you~Thank you

**gringer** · 09-16-2015, 02:30 AM

Originally posted by Saeideh:

"And the way you found the source of contamination made me excited."

Yes, BLAST is very useful. I'm glad that NCBI still provides a service for "where is this sequence from", despite all the newer locally-faster search tools that are available.

"I don't know how to do it with HISAT as you mentioned. I have to learn it first."

Learning HISAT2 would be a good idea, as it's the latest in a new generation of ultra-fast mappers, and has almost identical command-line parameters to Bowtie2. Another option would be STAR, which has a really great manual and might be easier to pick up and use for someone new to high-throughput sequencing bioinformatics.

Sequence Duplication Levels failure

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News