**ZoeG** · 07-24-2013, 09:24 AM

A few more details:

More than 20,000 reads with >40 Cs at the head were found, almost all of them in the left reads.
Only a few reads of this kind were found in the right reads.
And it only happened with C, not A, T, or G.
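A tally like the one above could be sketched as follows, assuming the reads are in the usual four-lines-per-record FASTQ layout. The helper names `leading_run` and `count_leading_runs` are hypothetical, and the threshold of 40 is taken from the description above.

```python
def leading_run(seq, base="C"):
    """Return the length of the run of `base` at the start of `seq`."""
    n = 0
    for ch in seq:
        if ch != base:
            break
        n += 1
    return n


def count_leading_runs(fastq_lines, base="C", threshold=40):
    """Count reads whose sequence starts with more than `threshold` copies of `base`.

    `fastq_lines` is any iterable of lines (for example an open file handle),
    assumed to follow the 4-line FASTQ record layout, so the sequence is the
    second line of every record.
    """
    count = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # sequence line of the record
            if leading_run(line.strip().upper(), base) > threshold:
                count += 1
    return count
```

Passing an open file handle for the left and right read files in turn, and repeating with `base` set to A, T, and G, would reproduce the comparison described above.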

**rskr** · 07-24-2013, 09:37 AM

Those look like low-complexity reads. Sometimes they get marked as low complexity or as having too many multimaps. If they aren't marked, you can take some of the reads and run BLAST to figure out where they map. They are probably real data; there are many regions like that in genomes.
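For a rough sense of how low-complexity a read is, one simple stand-in is the Shannon entropy of its base composition. This is not the filter BLAST itself applies (nucleotide BLAST uses DUST for that); it is only an illustrative sketch, and `base_entropy` is a hypothetical helper name.

```python
import math
from collections import Counter


def base_entropy(seq):
    """Shannon entropy, in bits, of the base composition of `seq`.

    A homopolymer such as a pure run of C scores 0; a read with equal
    amounts of A, C, G, and T scores 2. Empty input returns 0.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    n = len(seq)
    counts = Counter(seq)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())
```

A read that is mostly one base will score close to 0. Composition alone will not catch short tandem repeats such as ACACAC, so this is only a first-pass check.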

**ZoeG** · 07-24-2013, 10:07 AM

Originally posted by rskr:

> Those look like low-complexity reads. Sometimes they get marked as low complexity or as having too many multimaps. If they aren't marked, you can take some of the reads and run BLAST to figure out where they map. They are probably real data; there are many regions like that in genomes.

Yes, I found that miRNAs can have long, continuous runs of C.
But for these 100 bp reads, I tried UCSC BLAT and NCBI BLAST, and the reads seem to match nothing.

Another question: if these are real and I found "CCCC....C" in the left reads, should I find symmetrical reads in the corresponding right reads? Or is that not necessary?

**rskr** · 07-24-2013, 10:34 AM

Originally posted by ZoeG:

> Yes, I found that miRNAs can have long, continuous runs of C.
> But for these 100 bp reads, I tried UCSC BLAT and NCBI BLAST, and the reads seem to match nothing.
>
> Another question: if these are real and I found "CCCC....C" in the left reads, should I find symmetrical reads in the corresponding right reads? Or is that not necessary?

Did you turn off low complexity filtering on BLAST and BLAT?

**ZoeG** · 07-24-2013, 11:21 AM

Originally posted by rskr:

> Did you turn off low complexity filtering on BLAST and BLAT?

After turning off low-complexity filtering, blastn with Megablast found no significant similarity against the Mouse G+T database. Against the Nucleotide collection (nr/nt) database, it returned a list of hits, including one mouse record: Mus musculus BAC clone RP24-289J17 from chromosome 14, complete sequence (coverage 52%, score 84.2, identity 96%).

This seems confusing to me.

**swbarnes2** · 07-24-2013, 03:02 PM

Let's start with the obvious...what's the quality string look like? I bet it's all just noisy garbage.

**rskr** · 07-24-2013, 06:18 PM

Originally posted by ZoeG:

> After turning off low-complexity filtering, blastn with Megablast found no significant similarity against the Mouse G+T database. Against the Nucleotide collection (nr/nt) database, it returned a list of hits, including one mouse record: Mus musculus BAC clone RP24-289J17 from chromosome 14, complete sequence (coverage 52%, score 84.2, identity 96%).
>
> This seems confusing to me.

So you are saying you BLASTed it and it didn't return anything, then you turned off low-complexity filtering and BLAST did return something significant, with 96% identity. Which part matched? Maybe you found a missing chunk of the mouse genome!

**Richard Finney** · 07-24-2013, 06:50 PM

Coverage is only 52%, though; most of it was probably the stretch of Cs.
You can BLAST against the various genomes at NCBI BLAST, and that's as good as you'll get.
It's likely just a junk read. With 90% mapped, it's a good enough run. Don't worry about the junk; it's normal. Sometimes these unmapped reads go to contaminating bacteria or viruses, but in your case it's probably just junk.

**ZoeG** · 07-26-2013, 11:25 AM

The matched part is the stretch. Those Cs were simply thrown out.
Yes, it seems these reads are just junk. The quality strings of these reads show a lot of '#'.
Thanks, all.
It is funny that the machine loves only 'C', not A, T, or G.
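A small sketch for checking how much of a quality string is '#' or similarly low, assuming Phred+33 (Sanger / Illumina 1.8+) encoding, under which '#' decodes to Q2. The encoding is an assumption about this data set, and the helper names `phred33_scores` and `fraction_low_quality` are hypothetical.

```python
def phred33_scores(qual):
    """Decode a FASTQ quality string to Phred scores, assuming Phred+33."""
    return [ord(ch) - 33 for ch in qual]


def fraction_low_quality(qual, max_q=2):
    """Fraction of positions whose Phred score is at or below `max_q`.

    With Phred+33 and the default max_q=2, this is the share of the read
    covered by '#' (Q2) or anything lower.
    """
    scores = phred33_scores(qual)
    if not scores:
        return 0.0
    return sum(q <= max_q for q in scores) / len(scores)
```

Reads where this fraction is high across most of their length are reasonable candidates to discard as junk before mapping.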

interesting unmapped reads
