**apfejes** · 03-10-2008, 08:59 AM

I don't work with small RNA data from Illumina runs (I'm more familiar with the typical single- and paired-end runs), but if you give me a couple of lines showing the header and the tag sequence, I might be able to figure this out.

**beelu** · 03-10-2008, 06:25 PM

Thanks a lot! I have extracted some lines from the file. The number indicates how many times each sequence was detected. Each sequence is 33 bases long. The adaptor sequence is TCGTATGCCGTCTTCTGCTTG. I want to know the difference between the adaptor sequence locations (3' and 5'). Also, what happens if the adaptor sequence is not there? Does this mean the small RNA sequence is longer than 33 bases?

277718 TGAGGTAGTAGATTGTATAGTTTCGTATGCCGT
241250 TGAGGTAGTAGGTTGTATGGTTTCGTATGCCGT
166087 TGAGGTAGTAGGTTGTATAGTTTCGTATGCCGT
54345 AGAGGTAGTAGGTTGCATAGTTTCGTATGCCGT
53950 TGAGGTAGTAGTTTGTACAGTTTCGTATGCCGT
35 TCGTATGCCGTCTTCTGCTTGAAAANNNAAAAN
35 TCGTATGCCGTCTTCTGCTTGAAAAAAAAAATA
1 AAAAAAAAAAAAAAAAAAAAAAAACCCATCCCC
1 AAAAAAAAAAAAAAAAAAAAAAAACCCAACCCC
1 AAAAAAAAAAAAAAAAAAAAAAAACCATTTCCT
1 AAAAAAAAAAAAAAAAAAAAAAAACCATTCCCG
1 AAAAAAAAAAAAAAAAAAAAAAAACCATCTTCT
1 AAAAAAAAAAAAAAAAAAAAAAAACCATCCTCT
1 AAAAAAAAAAAAAAAAAAAAAAAACCATCCCCT

**apfejes** · 03-10-2008, 07:27 PM

Ok, I don't know what your protocol is, but some of what you're seeing is caused by the protocol you're using.

First off, take a look at:

35 TCGTATGCCGTCTTCTGCTTGAAAANNNAAAAN

If you look closely at this particular sequence, you'll see that this is actually the sequence of two concatenated adapter sequences. Thus, there's no tag here. This particular sequence is probably just garbage.

For the sequences that look like:

1 AAAAAAAAAAAAAAAAAAAAAAAACCCATCCCC

You're probably fitting adaptors to the poly-A of some sheared up RNA. Unfortunately, this is likely just a consequence of there being RNA in your sample. I suppose this could be something you're interested in, but most likely, it's also just garbage.

Finally, as for the 3' and 5' sequenced location of the tag, I'll take a wild shot at this and guess a bit about your protocol. If you're ligating adapters to your tags in high concentration, you've probably got excess tag in your reaction. Thus, you likely end up with

Adapter-tag-adapter

configurations. Thus, if the first adapter is used as the sequencing primer (?), and the tag is < 32 bases, you'll end up running into the second adapter and get its sequence. Why you'd get them on the 5' end, I'm not so sure - but I don't see any examples other than the one where you're sequencing an adapter dimer.
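To make the cases above concrete, here is a minimal Python sketch that sorts reads like the ones beelu posted into adapter dimers, poly-A reads, and tag-plus-adapter reads. The function name `classify_read`, the exact-match rule, and the thresholds (`min_adapter=5`, a run of 20 A's) are illustrative assumptions, not part of any Illumina pipeline or anyone's stated method.

```python
ADAPTER = "TCGTATGCCGTCTTCTGCTTG"  # adaptor sequence quoted by beelu


def classify_read(read, adapter=ADAPTER, min_adapter=5, poly_a_run=20):
    """Return (label, tag) for one read, using exact matching only.

    label is "adapter_dimer", "poly_a", "tag_adapter" or "no_adapter".
    min_adapter and poly_a_run are arbitrary, illustrative thresholds.
    """
    # The read opens with the full adapter: no tag sits between the adapters.
    if read.startswith(adapter):
        return "adapter_dimer", ""
    # The read opens with a long run of A: likely a poly-A fragment.
    if read.startswith("A" * poly_a_run):
        return "poly_a", ""
    # Find the leftmost position where the rest of the read (cut to the
    # adapter length) is a prefix of the adapter.
    for start in range(1, len(read) - min_adapter + 1):
        tail = read[start:start + len(adapter)]
        if adapter.startswith(tail):
            return "tag_adapter", read[:start]
    return "no_adapter", read


if __name__ == "__main__":
    lines = [
        "277718 TGAGGTAGTAGATTGTATAGTTTCGTATGCCGT",
        "35 TCGTATGCCGTCTTCTGCTTGAAAANNNAAAAN",
        "1 AAAAAAAAAAAAAAAAAAAAAAAACCCATCCCC",
    ]
    for line in lines:
        count, read = line.split()
        print(count, *classify_read(read))
```

On these three reads the sketch reports a 22-base tag (TGAGGTAGTAGATTGTATAGTT) followed by adapter, an adapter dimer, and a poly-A read. Because it only accepts exact matches, a single sequencing error in the adapter would push a read into `no_adapter`.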

Hopefully that's helpful.

Cheers,

**biliards** · 04-11-2008, 06:54 AM

wow

hi all,
I have the adaptor TCGTATGCCGTCTTC in my small RNA Solexa dataset.
I find that 37.7% of the Solexa sequences align with 100% identity to 9-15 nt of the adaptor. So only 37.7% of my dataset looks like beelu's examples, while about 60% of the sequences look like this:
GGCGGATGTAGCCCCGCGGNTCGCCTCCCGTCC
GACTCTCGGCAACGGCTCTCGTACGCCCCCCCC
GTTTTCTGAATGAGCCGCGCGTACTCGTCTGCC
GAGTGTTTTGACGATCGGGCCTACCGCCTGCCG
GTGCTTGTAGTCGTTGCTCCCTGGTCGCCTGCC
GTCCCTGCTGTCGCCGCCCCCGTCCGCCGNCTT
GGGACGCTGGTGTGGCCCGGTTGGTCGCCCGCC
GTATTTTGTGTAGGTCGTCCGNCGTCGCANGCC
GAACTGTGAAACTGCGCCTGGCTCCCCCGCCCC
GACGCCGTAATTTGTCGCAGCGGGTCCCCTCCC
GCGCCTGTAGCCCAGCGGAACTCGTCTCCCGTC
GCGTCTGTAGTCCCCCGGNTCCGCTTCCCCCGC
GTTGGTTGAATAGTATGGTTTATTTCGTCTGCC
GAGTTGGATGAAAGAGCCGCGGAGTCGCCTGCC
GGGGATCTGGCGAACCCCGNCTGCCCCCCTCCG

The second sequence above, for example, aligns to the adaptor with mismatches (as do some others):
Q: 1 TCGTATGCCGTCTTC 15
S: 19 TCGTACGCCCCCCCC 33

Others, with one alignment program, align to part of the adaptor, but not at the 5' or 3' end.
Is this normal?
I know that microRNAs are 20-21 nt, so I would expect to find an adaptor in each read. Is that right?

**beelu** · 04-19-2008, 01:57 AM

Hi biliards,

What I found after doing computational analysis is that not all of them have adaptors. Any read with a tag-sequence configuration is also automatically discarded (it should be a sequence-tag configuration). To identify the adaptor sequence, I used these constraints:
(1) an adaptor sequence has to be at least 5 bases long
(2) the adaptor sequence has at least 70% identity with the original adaptor sequence.

I find the result is not too bad with this configuration.
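The two constraints above could be sketched in Python roughly as follows. This is only one possible reading of them: the function names, the ungapped base-by-base comparison, and the choice to take the leftmost qualifying position are assumptions, since the post does not say how identity was computed.

```python
ADAPTER = "TCGTATGCCGTCTTCTGCTTG"  # adaptor sequence from earlier in the thread


def find_adapter_start(read, adapter=ADAPTER, min_len=5, min_identity=0.7):
    """Return the leftmost 0-based position where an adaptor seems to start.

    At each candidate position the rest of the read is compared, base by
    base and without gaps, to the start of the adaptor. Returns None when
    no position reaches min_identity over at least min_len bases.
    """
    for start in range(len(read) - min_len + 1):
        overlap = min(len(adapter), len(read) - start)
        window = read[start:start + overlap]
        matches = sum(1 for a, b in zip(window, adapter) if a == b)
        if matches / overlap >= min_identity:
            return start
    return None


def trim_adapter(read, **kwargs):
    """Return the sequence before the adaptor, or None if none is found."""
    start = find_adapter_start(read, **kwargs)
    return None if start is None else read[:start]


if __name__ == "__main__":
    print(trim_adapter("TGAGGTAGTAGATTGTATAGTTTCGTATGCCGT"))
    print(trim_adapter("AAAAAAAAAAAAAAAAAAAAAAAACCCATCCCC"))
```

Note that with only 5 bases of overlap, 70% identity means at least 4 of 5 bases must match, so short adaptor fragments at the very end of a read are the most likely source of false positives.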

beelu

**biliards** · 04-21-2008, 05:22 AM

Thanks for your reply.

**myrna** · 05-05-2008, 01:38 PM

adapter trimming

I have worked with solexa small RNA reads quite a bit recently and have seen some of the same issues you are discussing. I have taken a slightly different approach for adapter trimming. Rather than looking for adapters up-front, I just map the full length reads against the genome and use the end of the alignment to identify where the adapter starts. I have been quite strict in what I accept as a real sequence read (there are so many reads to work with, that you can afford this). Basically, the alignment has to start at base 1 of the read and end near a sequence that can be recognized as adapter. When doing this, you only need to store the longest alignment of a given read in the genome (there may be more than one of the same length for miRNAs with identical mature sequences). By chance, sometimes the first few nt of the adapter also align to the genome, which is why I say 'near' instead of 'at'. Also, you will notice that in many cases, there is an intervening nucleotide between the end of the alignment and the start of the adapter.
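A rough Python sketch of the acceptance rule described above, assuming the aligner has already reported where the longest alignment starts and ends within the read. The function name, the window sizes (`max_overrun=3`, `max_gap=1`), the minimum adaptor length, and the exact-match test are all illustrative assumptions; the adaptor sequence is the one quoted earlier in the thread.

```python
ADAPTER = "TCGTATGCCGTCTTCTGCTTG"  # adaptor sequence from earlier in the thread


def adapter_start_near_alignment(read, aln_start, aln_end, adapter=ADAPTER,
                                 max_overrun=3, max_gap=1, min_adapter=5):
    """Return the 0-based adaptor start near the alignment end, or None.

    aln_start and aln_end are 1-based, inclusive read coordinates of the
    longest genome alignment (as in BLAST-style tabular output).
    """
    if aln_start != 1:
        return None  # strict rule: the alignment must begin at base 1
    # Candidate starts run from a few bases before the end of the alignment
    # (adaptor bases that aligned to the genome by chance) to max_gap bases
    # after it (an intervening nucleotide).
    first = max(1, aln_end - max_overrun)
    last = aln_end + max_gap
    for start in range(first, last + 1):
        tail = read[start:start + len(adapter)]
        if len(tail) >= min_adapter and adapter.startswith(tail):
            return start
    return None


if __name__ == "__main__":
    read = "TGAGGTAGTAGATTGTATAGTTTCGTATGCCGT"
    start = adapter_start_near_alignment(read, 1, 22)
    print(start, read[:start])
```

The caller can then decide whether to keep `read[:start]` or only the aligned part `read[:aln_end]`, which differ when there is an intervening nucleotide or when the alignment ran into the adaptor.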

Ryan

**biliards** · 05-14-2008, 02:23 AM

Thank you for your reply and your tips.

**chris** · 05-16-2008, 06:39 AM

Hi Ryan,

That is an interesting solution. What do you use to map to the genome, and how feasible would it be on larger genomes like human?

I've been working on this too, and I have added a quality score filter as well, since I found many reads to be of poor quality. Together with adaptor trimming, this reduced my search set by a third.
Cheers,

Chris.

**myrna** · 05-16-2008, 08:06 AM

mapping small RNAs

Hi Chris.
I use megablast with a word size of 16. I do this routinely against the human genome, but I use a cluster and map reads in small batches so it is hard to say whether it is a good option for people without access to many CPUs.
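As a hedged illustration of the batching idea, the sketch below splits reads into small batches and builds one megablast command per batch with a word size of 16. It assumes the current BLAST+ `blastn -task megablast` program rather than whichever megablast binary was used here, and the batch size, file names, and database name are hypothetical. The commands are only built, not run.

```python
def batches(records, size):
    """Yield successive lists of at most `size` records."""
    for i in range(0, len(records), size):
        yield records[i:i + size]


def megablast_command(query_fasta, database, out_path, word_size=16):
    """Build a BLAST+ blastn command line for one batch (not executed)."""
    return [
        "blastn", "-task", "megablast",
        "-word_size", str(word_size),
        "-query", query_fasta,   # FASTA file holding one batch of reads
        "-db", database,         # pre-built nucleotide database name
        "-outfmt", "6",          # tabular output, one hit per line
        "-out", out_path,
    ]


if __name__ == "__main__":
    reads = ["read%d" % i for i in range(10)]  # placeholder read names
    for n, batch in enumerate(batches(reads, 4)):
        # Hypothetical file and database names, one job per batch.
        cmd = megablast_command("batch_%03d.fa" % n, "human_genome",
                                "batch_%03d.tsv" % n)
        print(len(batch), " ".join(cmd))
```

Each command could then be submitted as a separate cluster job, for example through `subprocess.run(cmd, check=True)` on the node that holds the batch file.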

Ryan

**chris** · 05-16-2008, 08:12 AM

OK. Thanks for the reply.

**JKing** · 06-15-2008, 11:53 AM

SeqMan NGen effectively trims these adapters and assembles the resulting reads. This is true for small RNA or any target validation run. It's worth a look.

**Nix** · 12-15-2008, 03:38 PM

Check out SOAP for variable adaptor sequence trimming

I've been using the SOAP aligner to trim the variable length adaptor sequences. Works nicely.

**myrna** · 12-15-2008, 03:42 PM

How long does SOAP take (on average) to align a single lane of data on a single CPU?

Thanks,

Ryan

solexa small rna questions
