  • solexa small rna questions

    Just wondering if you can help me,

    With Solexa small RNA data (*tag.txt file), each sequence contains part of an adaptor sequence. For some sequences the adaptor is at the 3' end, and for others it's at the 5' end. What is the implication of the adaptor's location? Thanks a lot!

    beelu

  • #2
    I don't work with small rna data from Illumina runs (I'm more familiar with the typical single and paired end runs), but if you give me a couple of lines, showing the header and what the tag sequence is, I might be able to figure this out.
    The more you know, the more you know you don't know. —Aristotle

    • #3
      Thanks a lot! I have extracted some lines from the file. The number is how many times each sequence was detected. Each sequence is 33 bases long. The adaptor sequence is TCGTATGCCGTCTTCTGCTTG. I want to know the difference between the adaptor being at the 3' end and at the 5' end. Also, what happens if the adaptor sequence is not there? Does this mean the small RNA sequence is longer than 33 bases?

      277718 TGAGGTAGTAGATTGTATAGTTTCGTATGCCGT
      241250 TGAGGTAGTAGGTTGTATGGTTTCGTATGCCGT
      166087 TGAGGTAGTAGGTTGTATAGTTTCGTATGCCGT
      54345 AGAGGTAGTAGGTTGCATAGTTTCGTATGCCGT
      53950 TGAGGTAGTAGTTTGTACAGTTTCGTATGCCGT
      35 TCGTATGCCGTCTTCTGCTTGAAAANNNAAAAN
      35 TCGTATGCCGTCTTCTGCTTGAAAAAAAAAATA
      1 AAAAAAAAAAAAAAAAAAAAAAAACCCATCCCC
      1 AAAAAAAAAAAAAAAAAAAAAAAACCCAACCCC
      1 AAAAAAAAAAAAAAAAAAAAAAAACCATTTCCT
      1 AAAAAAAAAAAAAAAAAAAAAAAACCATTCCCG
      1 AAAAAAAAAAAAAAAAAAAAAAAACCATCTTCT
      1 AAAAAAAAAAAAAAAAAAAAAAAACCATCCTCT
      1 AAAAAAAAAAAAAAAAAAAAAAAACCATCCCCT

      • #4
        Ok, I don't know what your protocol is, but some of what you're seeing is caused by the protocol you're using.

        First off, take a look at:

        35 TCGTATGCCGTCTTCTGCTTGAAAANNNAAAAN

        If you look closely at this particular sequence, you'll see that this is actually the sequence of two concatenated adapter sequences. Thus, there's no tag here. This particular sequence is probably just garbage.

        For the sequences that look like:

        1 AAAAAAAAAAAAAAAAAAAAAAAACCCATCCCC

        You're probably fitting adaptors to the poly-A of some sheared up RNA. Unfortunately, this is likely just a consequence of there being RNA in your sample. I suppose this could be something you're interested in, but most likely, it's also just garbage.

        Finally, as for the 3' and 5' sequenced location of the tag, I'll take a wild shot at this and guess a bit about your protocol. If you're ligating adapters to your tags in high concentration, you've probably got excess tag in your reaction. Thus, you likely end up with

        Adapter-tag-adapter

        configurations. Thus, if the first adapter is used as the sequencing primer (?), and the tag is < 32 bases, you'll end up running into the second adapter and get its sequence. Why you'd get them on the 5' end, I'm not so sure, but I don't see any examples other than the one where you're sequencing an adapter dimer.
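
A minimal sketch of the read-through case described above, assuming the adaptor quoted in post #3 and exact matching only. It looks for the leftmost position where the rest of the read is a prefix of the adaptor (or is the whole adaptor). The function name and the `min_overlap` threshold are illustrative assumptions, not part of any Illumina tool.

```python
ADAPTER = "TCGTATGCCGTCTTCTGCTTG"  # adaptor sequence quoted in post #3

def find_adapter_start(read, adapter=ADAPTER, min_overlap=5):
    """Return the 0-based index where the 3' adaptor starts, or None.

    Exact matching only; min_overlap is an assumed threshold that avoids
    calling very short chance matches at the end of the read.
    """
    for start in range(len(read) - min_overlap + 1):
        tail = read[start:start + len(adapter)]
        # A tail shorter than the adaptor must match the adaptor's prefix;
        # a full-length tail must equal the adaptor.
        if adapter.startswith(tail):
            return start
    return None

# Three of the reads posted in #3 (count, sequence).
for count, read in [
    (277718, "TGAGGTAGTAGATTGTATAGTTTCGTATGCCGT"),
    (35, "TCGTATGCCGTCTTCTGCTTGAAAANNNAAAAN"),
    (1, "AAAAAAAAAAAAAAAAAAAAAAAACCCATCCCC"),
]:
    start = find_adapter_start(read)
    if start is None:
        print(count, read, "no adaptor found")
    elif start == 0:
        print(count, read, "adaptor at position 0, no insert")
    else:
        print(count, read[:start], f"insert of {start} nt")
```

On the first read this reports a 22 nt insert followed by the first 11 bases of the adaptor. The second read starts with the full adaptor, so nothing is left after trimming.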

        Hopefully that's helpful.

        Cheers,
        The more you know, the more you know you don't know. —Aristotle

        • #5
          wow

          Hi all,
          I have the adaptor TCGTATGCCGTCTTC in my Solexa small RNA dataset.
          I find that 37.7% of the sequences align with 100% identity to 9-15 nt of the adaptor. So only 37.7% of my dataset looks like beelu's examples, while about 60% of the sequences look like this:
          GGCGGATGTAGCCCCGCGGNTCGCCTCCCGTCC
          GACTCTCGGCAACGGCTCTCGTACGCCCCCCCC
          GTTTTCTGAATGAGCCGCGCGTACTCGTCTGCC
          GAGTGTTTTGACGATCGGGCCTACCGCCTGCCG
          GTGCTTGTAGTCGTTGCTCCCTGGTCGCCTGCC
          GTCCCTGCTGTCGCCGCCCCCGTCCGCCGNCTT
          GGGACGCTGGTGTGGCCCGGTTGGTCGCCCGCC
          GTATTTTGTGTAGGTCGTCCGNCGTCGCANGCC
          GAACTGTGAAACTGCGCCTGGCTCCCCCGCCCC
          GACGCCGTAATTTGTCGCAGCGGGTCCCCTCCC
          GCGCCTGTAGCCCAGCGGAACTCGTCTCCCGTC
          GCGTCTGTAGTCCCCCGGNTCCGCTTCCCCCGC
          GTTGGTTGAATAGTATGGTTTATTTCGTCTGCC
          GAGTTGGATGAAAGAGCCGCGGAGTCGCCTGCC
          GGGGATCTGGCGAACCCCGNCTGCCCCCCTCCG

          For example, the second sequence above aligns to the adaptor (and so do some others):
          Q: 1 TCGTATGCCGTCTTC 15
          S: 19 TCGTACGCCCCCCCC 33

          With an alignment program, other sequences align to part of the adaptor, but not at the 5' or 3' end.
          Is this normal?
          I know that microRNAs are 20-21 nt long, so I think I should find the adaptor in each read. Is that right?

          • #6
            Hi billiards,

            What I found after doing computational analysis is that not all of the reads have adaptors. Any sequence in the tag-sequence configuration is also automatically discarded (it should be in the sequence-tag configuration). To identify the adaptor sequence, I used these constraints:
            (1) an adaptor sequence has to be at least 5 bases long
            (2) the adaptor sequence has at least 70% identity with the original adaptor sequence.

            I find the result is not too bad with this configuration.
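
One way the two constraints above could be sketched, assuming an ungapped comparison of each read suffix against the start of the adaptor. The function name is hypothetical, and the rule for choosing between candidate positions (highest identity, leftmost on ties) is my assumption; the post does not say how candidates are ranked.

```python
ADAPTER = "TCGTATGCCGTCTTCTGCTTG"  # adaptor sequence quoted in post #3

def best_adapter_start(read, adapter=ADAPTER, min_len=5, min_identity=0.7):
    """Return (start, identity) of the best ungapped adaptor match, or None.

    min_len and min_identity mirror constraints (1) and (2); an 'N' in the
    read simply counts as a mismatch.
    """
    best = None
    for start in range(len(read) - min_len + 1):
        tail = read[start:start + len(adapter)]
        matches = sum(a == b for a, b in zip(tail, adapter))
        identity = matches / len(tail)
        # Keep the highest identity seen so far; ties keep the leftmost start.
        if identity >= min_identity and (best is None or identity > best[1]):
            best = (start, identity)
    return best

hit = best_adapter_start("TGAGGTAGTAGATTGTATAGTTTCGTATGCCGT")
if hit is not None:
    start, identity = hit
    print(f"adaptor starts at base {start + 1}, identity {identity:.0%}")
```

For comparison, the 15 nt alignment shown in post #5 has 10 matching positions out of 15 (about 67%), so that particular alignment would fall just below a 70% cutoff.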

            beelu

            • #7
              Thanks for your reply.

              • #8
                adapter trimming

                I have worked with solexa small RNA reads quite a bit recently and have seen some of the same issues you are discussing. I have taken a slightly different approach for adapter trimming. Rather than looking for adapters up-front, I just map the full length reads against the genome and use the end of the alignment to identify where the adapter starts. I have been quite strict in what I accept as a real sequence read (there are so many reads to work with, that you can afford this). Basically, the alignment has to start at base 1 of the read and end near a sequence that can be recognized as adapter. When doing this, you only need to store the longest alignment of a given read in the genome (there may be more than one of the same length for miRNAs with identical mature sequences). By chance, sometimes the first few nt of the adapter also align to the genome, which is why I say 'near' instead of 'at'. Also, you will notice that in many cases, there is an intervening nucleotide between the end of the alignment and the start of the adapter.
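
A rough sketch of the acceptance rule described above, under several assumptions: hits are given as hypothetical `(query_start, query_end)` pairs in 1-based inclusive coordinates (as a tabular aligner report might provide), the adaptor is the one from post #3, and "near" is taken to mean up to 3 bases before or 1 base after the alignment end. Those window sizes and the function names are illustrative, not Ryan's actual settings.

```python
ADAPTER = "TCGTATGCCGTCTTCTGCTTG"  # adaptor sequence quoted in post #3

def adapter_start_near(read, aln_end, adapter=ADAPTER,
                       max_back=3, max_gap=1, min_overlap=5):
    """Return the 0-based adaptor start close to the alignment end, or None.

    aln_end is the number of read bases covered by the genome alignment.
    max_back allows for adaptor bases that aligned to the genome by chance;
    max_gap allows for an intervening nucleotide before the adaptor.
    """
    lo = max(0, aln_end - max_back)
    hi = min(len(read) - min_overlap, aln_end + max_gap)
    # Try candidate positions closest to the alignment end first.
    for start in sorted(range(lo, hi + 1), key=lambda s: abs(s - aln_end)):
        tail = read[start:start + len(adapter)]
        if adapter.startswith(tail):
            return start
    return None

def insert_from_hits(read, hits, adapter=ADAPTER):
    """Return the bases before the adaptor, or None if the read is rejected."""
    # Only alignments starting at base 1 count; keep the longest one.
    ends = [q_end for q_start, q_end in hits if q_start == 1]
    if not ends:
        return None
    start = adapter_start_near(read, max(ends), adapter)
    return None if start is None else read[:start]

read = "TGAGGTAGTAGATTGTATAGTTTCGTATGCCGT"
print(insert_from_hits(read, [(1, 22), (1, 20)]))
```

Whether an intervening nucleotide should be kept as part of the reported sequence is left open here; this sketch keeps everything before the adaptor.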

                Ryan

                • #9
                  Thank you for your reply and your tips.

                  • #10
                    Hi Ryan,

                    That is an interesting solution. What do you use to map to the genome and how feasible would it be to do on larger genomes like Human?

                    I've been working on this too and have added a quality score filter, since I found many reads to be of poor quality. Together with adaptor trimming, this reduced my search set by a third.
                    Cheers,

                    Chris.

                    • #11
                      mapping small RNAs

                      Hi Chris.
                      I use megablast with a word size of 16. I do this routinely against the human genome, but I use a cluster and map reads in small batches so it is hard to say whether it is a good option for people without access to many CPUs.
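
The batching step could look something like the sketch below. The batch size, the read naming scheme, and the helper names are all assumptions for illustration; the post does not say how the batches are built or submitted to the cluster.

```python
from itertools import islice

def batches(reads, size):
    """Yield successive lists of at most `size` reads."""
    it = iter(reads)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def to_fasta(chunk, first_id=1):
    """Format a batch as FASTA text with hypothetical IDs read1, read2, ..."""
    lines = []
    for offset, seq in enumerate(chunk):
        lines.append(f">read{first_id + offset}")
        lines.append(seq)
    return "\n".join(lines) + "\n"

reads = [
    "TGAGGTAGTAGATTGTATAGTTTCGTATGCCGT",
    "TGAGGTAGTAGGTTGTATGGTTTCGTATGCCGT",
    "TGAGGTAGTAGGTTGTATAGTTTCGTATGCCGT",
]
next_id = 1
for chunk in batches(reads, size=2):
    print(to_fasta(chunk, first_id=next_id), end="")
    next_id += len(chunk)
```

Each batch can then be written to its own file and aligned as a separate job.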

                      Ryan

                      • #12
                        OK. Thanks for the reply.

                        • #13
                          SeqMan NGen trims these adapters and assembles the resulting reads. This works for small RNA or any target validation run. It's worth a look.

                          • #14
                            Check out SOAP for variable adaptor sequence trimming

                            I've been using the SOAP aligner to trim the variable length adaptor sequences. Works nicely.

                            • #15
                              How long does SOAP take (on average) to align a single lane of data on a single CPU?

                              Thanks,

                              Ryan
