Filter ribosomal RNA

  • Filter ribosomal RNA

    Hi

    Although ribosomal depletion is usually performed, some ribosomal sequences remain and end up being sequenced.

    How do you filter out reads (e.g. from 454 Titanium runs) with ribosomal content?

    Many thanks for advice!

  • #2
    I usually align the raw data against a set of ribosomal sequences (and the mitochondrial genome) using ELAND, and use grep with the -v option to remove any reads that find a match. If you are using the standard Illumina pipeline you can configure Gerald to use ANALYSIS:eland_rna to do this automatically without having to do it step-by-step.
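Here is a rough illustration of the "remove any reads that find a match" step for FASTA reads, such as 454 output. In these files one record can span several lines, so applying `grep -v` directly to the FASTA file would only drop the header lines. The minimal Python sketch below assumes you have already collected the IDs of reads that aligned to the ribosomal/mitochondrial reference into a set. How you extract those IDs depends on your aligner's output format, and the function names are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs; sequences may be wrapped over several lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def drop_matched(records: Iterable[Record], matched_ids: Set[str]) -> Iterator[Record]:
    """Keep records whose ID (first word of the header) is NOT in matched_ids.

    Exact set lookup avoids the substring pitfall of pattern matching,
    where an ID like 'r1' would also match 'r10'.
    """
    for header, seq in records:
        read_id = (header.split() or [""])[0]
        if read_id not in matched_ids:
            yield header, seq


# Toy example: r1 is assumed to have aligned to the rRNA reference.
reads = [">r1 len=5", "ACGTA", ">r2", "GGCC", "TT", ">r10", "AAAA"]
kept = list(drop_matched(read_fasta(reads), {"r1"}))
print(kept)  # [('r2', 'GGCCTT'), ('r10', 'AAAA')]
```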

    • #3
      Thanks for your reply, shurjo.

      I am wondering where you get the set of ribosomal sequences (and the mitochondrial genome) from?

      Did you make it yourself or download from somewhere?

      • #4
        Here are the gi numbers for the ribosomal sequences; you can download them from GenBank.

        gi|555853|gb|U13369.1|HSU1336 Human ribosomal DNA complete repeating unit

        gi|23898|emb|X12811.1| Human 5S DNA

        To this I would add the mitochondrial genome, which you can get from the UCSC Genome Browser site (unless you are interested in mitochondrial gene expression).

        HTH,

        Shurjo
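You can also fetch the two GenBank records listed above in a single request through NCBI's E-utilities `efetch` service instead of the web interface. The sketch below only builds the request URL. The download line is left commented out because it needs network access. It assumes that `db=nuccore` with `rettype=fasta` is the format you want for building an aligner index, and the helper name is hypothetical.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accessions):
    """Build an E-utilities efetch URL that returns FASTA for the given accessions."""
    params = {
        "db": "nuccore",
        "id": ",".join(accessions),  # several IDs, comma-separated, in one request
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH + "?" + urlencode(params)


url = efetch_fasta_url(["U13369.1", "X12811.1"])
print(url)

# To download (needs network access), e.g.:
# import urllib.request
# fasta_text = urllib.request.urlopen(url).read().decode()
```

After downloading, you would append the mitochondrial sequence to the saved FASTA before building the index.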

        • #5
          Hello, I was wondering how well conserved ribosomal RNA and ribosomal proteins are. How relevant is it to use the human ribosomal units for matching, for example, waterlily datasets?

          • #6
            Reply to Ikim

            You want to use the closest species you can find. Try the SILVA database (http://www.arb-silva.de/); it has an extensive collection of small- and large-subunit rRNA sequences and a taxonomic browser to help you find what you need.
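Once you have a SILVA download, one way to narrow it to the group closest to your organism is to filter records on their taxonomy path. The sketch below assumes headers shaped like `<id> <rank1>;<rank2>;...`, which is the layout SILVA FASTA exports use as far as I know, so check your file first. It also assumes the sequences may be written in the RNA alphabet (U instead of T). The function name and the toy records are hypothetical, not real SILVA entries.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def select_by_taxon(records: Iterable[Record], taxon: str) -> Iterator[Record]:
    """Keep records whose semicolon-separated taxonomy path contains `taxon` as a whole rank.

    ASSUMPTION: headers look like '<id> <rank1>;<rank2>;...' (SILVA-style);
    verify against your own download before relying on this.
    """
    for header, seq in records:
        _, _, taxonomy = header.partition(" ")
        ranks = [rank.strip() for rank in taxonomy.split(";")]
        if taxon in ranks:
            # Sequences may use the RNA alphabet; most DNA aligners expect T, not U.
            yield header, seq.upper().replace("U", "T")


# Toy records (not real SILVA entries).
toy = [
    ("toy1 Eukaryota;Streptophyta;Nymphaeales", "ACGUACGU"),
    ("toy2 Eukaryota;Streptophyta;Poales", "GGCCUUAA"),
]
selected = list(select_by_taxon(toy, "Nymphaeales"))
print(selected)  # [('toy1 Eukaryota;Streptophyta;Nymphaeales', 'ACGTACGT')]
```

Matching whole ranks, rather than substrings, keeps a query like "Nymph" from silently matching unrelated names.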

            • #7
              Originally posted by shurjo
              I usually align the raw data against a set of ribosomal sequences (and the mitochondrial genome) using ELAND, and use grep with the -v option to remove any reads that find a match. If you are using the standard Illumina pipeline you can configure Gerald to use ANALYSIS:eland_rna to do this automatically without having to do it step-by-step.
              Shurjo,

              How do you configure the grep command so that it strips the sequence IDs that mapped to the rRNA database, AND the following three lines that contain the sequence, the + and the quality string?

              Thanks!
              Carmen
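The easiest way to keep each FASTQ record together is to read the file four lines at a time instead of matching header lines. A quality string can itself begin with '@', so header matching is unreliable. The minimal Python sketch below assumes unwrapped four-line records (the usual Illumina layout) and a set of read IDs that aligned to the rRNA reference. The function names are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Set


def fastq_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group FASTQ lines into 4-line records: header, sequence, '+', quality."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if not record:
            return
        if len(record) != 4 or not record[0].startswith("@"):
            raise ValueError(f"malformed FASTQ record: {record!r}")
        yield record


def remove_ids(lines: Iterable[str], drop: Set[str]) -> Iterator[str]:
    """Yield all four lines of every record whose read ID is not in `drop`."""
    for record in fastq_records(lines):
        read_id = (record[0][1:].split() or [""])[0]
        if read_id not in drop:
            yield from record


# Toy FASTQ; r2's quality line starts with '@', which fools header-based grep tricks.
fastq = ["@r1 1:N:0", "ACGT", "+", "IIII", "@r2", "GGCC", "+", "@III"]
kept = list(remove_ids(fastq, {"r1"}))
print(kept)  # ['@r2', 'GGCC', '+', '@III']
```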

              • #8
                Hi everyone!

                I am analyzing some Illumina libraries that appear to have a lot of ribosomal RNA contamination.

                I'm using Bowtie to align the reads only to a specific set of sequences. Because the amount of rRNA contamination differs between samples, each sample maps a different percentage of its reads to that set (some map about half of what others do), ranging from 0.3% to 1%.

                I wonder whether the amount of rRNA contamination in a library preparation can affect the apparent expression level of a gene, even when its counts are normalized against the total number of mapped reads.

                What's your opinion on this subject?

                Carmen
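This toy calculation uses made-up numbers, not Carmen's data, to show why the choice of denominator matters. Two libraries with identical non-rRNA content but different amounts of rRNA give different counts per million (CPM) if the rRNA reads count toward the library size. They give identical values if the rRNA reads are excluded. The example only illustrates the denominator effect. Heavy contamination also lowers the effective sequencing depth for everything else, which adds sampling noise to low counts.

```python
def per_million(count: int, library_size: int) -> float:
    """Counts per million: a raw count scaled by the chosen library size."""
    return count * 1_000_000 / library_size


# Toy libraries: same gene count and non-rRNA reads, different rRNA contamination.
libraries = {
    "A": {"gene": 500, "non_rrna": 1_000_000, "rrna": 200_000},
    "B": {"gene": 500, "non_rrna": 1_000_000, "rrna": 1_000_000},
}

results = {}
for name, lib in libraries.items():
    with_rrna = per_million(lib["gene"], lib["non_rrna"] + lib["rrna"])
    without_rrna = per_million(lib["gene"], lib["non_rrna"])
    results[name] = (with_rrna, without_rrna)
    print(f"{name}: {with_rrna:.1f} CPM with rRNA in the denominator, "
          f"{without_rrna:.1f} CPM with rRNA excluded")
# A: 416.7 CPM with rRNA in the denominator, 500.0 CPM with rRNA excluded
# B: 250.0 CPM with rRNA in the denominator, 500.0 CPM with rRNA excluded
```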

                • #9
                  Originally posted by carmeyeii
                  Shurjo,

                  How do you configure the grep command so that it strips the sequence IDs that mapped to the rRNA database, AND the following three lines that contain the sequence, the + and the quality string?

                  Thanks!
                  Carmen
                  That's not how you do it. You have the alignment file, which in SAM text form (for a .bam, view it with samtools view) has the read name, the sequence, and what it mapped to all on one line; you filter that. You could do that pretty easily with grep.
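Here is a minimal Python sketch of that filter working on SAM text (for a BAM file, `samtools view -h aln.bam` produces this). Instead of grepping for reference names, it tests bit 0x4 of the FLAG column, which the SAM specification defines as "segment unmapped". Reads that did not align to the rRNA reference are kept and written back out as FASTQ. The sketch assumes the aligner reported unaligned reads in its output, since some aligners can be told to omit them. The function names are hypothetical. From the command line, `samtools view -f 4` applies the same flag test.

```python
from typing import Iterable, Iterator, Tuple

SAM_UNMAPPED = 0x4  # FLAG bit: segment unmapped


def unmapped_reads(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) for SAM records with the unmapped bit set."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header lines (@HD, @SQ, @PG, ...); QNAME cannot start with '@'
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment line
        name, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & SAM_UNMAPPED:
            yield name, seq, qual


def to_fastq(name: str, seq: str, qual: str) -> str:
    """Format one read as a 4-line FASTQ record (QUAL may be '*' if none was stored)."""
    return f"@{name}\n{seq}\n+\n{qual}\n"


sam = [
    "@SQ\tSN:rRNA\tLN:1000",
    "r1\t0\trRNA\t100\t42\t4M\t*\t0\t0\tACGT\tIIII",  # aligned to rRNA: dropped
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\tHHHH",         # unmapped: kept
]
fastq_text = "".join(to_fastq(*r) for r in unmapped_reads(sam))
print(fastq_text, end="")
```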
