  • behoward
    Member
    • Mar 2009
    • 13

    454 read orientation

    Hi everyone,

    I am looking at a 454 dataset and I am wondering whether the read sequences (they are in a FASTA file) are usually in the same orientation as the original mRNAs, or can they be the reverse complement?

    This will determine which BLAT query type I use during alignment: either -q=rna or -q=dna. I think with standard (non-454) ESTs you don't know the orientation, so you have to use -q=dna. However, this can give you unwanted duplicate alignments.
  • kmcarr
    Senior Member
    • May 2008
    • 1181

    #2
    The typical protocol for sequencing RNA with 454 is to make ds cDNA, fragment it (nebulizer, Covaris, etc.), then use a standard genomic library prep kit from Roche. This means polishing (blunting) the ends and attaching the sequencing adapters in a non-directional manner. Thus the reads you get will be a mixture of both orientations.


    • behoward
      Member
      • Mar 2009
      • 13

      #3
      Thanks! I guess I have to use q=dna, then.

      The dataset I am looking at is a public 454 GS20 dataset from the paper "Sampling the Arabidopsis Transcriptome with massively parallel pyrosequencing" (Weber et al, Plant Physiology May 2007). Kmcarr, I think I remember from a previous post that you have some experience with this particular dataset.

      Do you have any guess whether the original researchers used q=rna in the BLAT alignment? I remember they had about 11% of the reads that don't map to the genome. But if I use q=dna, I get a larger percent mapping to TAIR7.

      Also, if I do use q=dna, I guess I will only want to 'count' a read once when it maps to a gene and to that gene's reverse complement. However, I would want to keep both matches when a read maps to multiple genes (say paralogs, or duplicate genes). I'm not sure how to tell these two cases apart. Does anyone have any suggestions?


      • kmcarr
        Senior Member
        • May 2008
        • 1181

        #4
        Man! That dataset just won't die. When I said I had some familiarity with the data I was understating it a bit. I was one of the authors, performing all of the bioinformatics. I used the default BLAT settings for query and target type, i.e. both -q and -t set to dna. However, BLAT will only output a single alignment for a read at a given location; it will not report both the forward and reverse alignment of a read. You don't have to worry about that.

        You are correct that you will find equally good alignments to paralogous genes. You will have to decide how you want to approach assigning or counting those reads.
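To make that counting decision concrete, here is a minimal sketch (not what was done in the paper) that groups BLAT hits by read and merges hits that overlap on the same target sequence into one locus. A read that hits two paralogs then shows up with two loci. It assumes standard 21-column, tab-separated PSL lines with no header (BLAT's -noHead option); the function names `parse_psl_line` and `loci_per_read` are made up for illustration.

```python
from collections import defaultdict

def parse_psl_line(line):
    """Pull the fields needed here out of one tab-separated PSL record."""
    f = line.rstrip("\n").split("\t")
    return {
        "strand": f[8],        # '+' or '-'
        "qname": f[9],         # read name
        "tname": f[13],        # target (e.g. chromosome) name
        "tstart": int(f[15]),  # 0-based start on the target
        "tend": int(f[16]),    # end on the target (exclusive)
    }

def loci_per_read(psl_lines):
    """Map each read name to its list of distinct (tname, start, end) loci.

    Hits on the same target whose spans overlap are merged into one locus,
    regardless of strand. Hits elsewhere stay separate.
    """
    hits = defaultdict(list)
    for line in psl_lines:
        if not line.strip():
            continue
        h = parse_psl_line(line)
        hits[h["qname"]].append((h["tname"], h["tstart"], h["tend"]))

    loci = {}
    for qname, spans in hits.items():
        merged = []
        for tname, start, end in sorted(spans):
            if merged and merged[-1][0] == tname and start < merged[-1][2]:
                # Overlaps the previous locus on the same target: extend it.
                prev = merged[-1]
                merged[-1] = (tname, prev[1], max(prev[2], end))
            else:
                merged.append((tname, start, end))
        loci[qname] = merged
    return loci
```

Reads with more than one locus are the multi-mapping candidates. Whether to count them for every locus, split them fractionally, or drop them is the decision mentioned above.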

        You will also find many poor alignments of reads to the genome. You should play with the pslReps program to filter your initial BLAT output. pslReps is meant to retain only the best alignment when a query sequence aligns to multiple target locations. If there is a group of alignments that are equally good (or nearly so), they will all be retained.
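The "keep the best, plus anything nearly as good" idea can be sketched in a few lines of Python. This is only a rough illustration of the filtering logic, not the actual pslReps algorithm. The score used here (matches + repMatches - misMatches), the `near_top` parameter and its default value, and the function names are all assumptions made for the example. Input is again assumed to be headerless, tab-separated PSL.

```python
from collections import defaultdict

def psl_score(fields):
    """Assumed simple score: matches + repMatches - misMatches."""
    return int(fields[0]) + int(fields[2]) - int(fields[1])

def keep_near_best(psl_lines, near_top=0.01):
    """Return the PSL lines scoring within near_top of each read's best hit.

    near_top is a fraction: 0.01 keeps hits scoring at least 99% of the
    read's top score, so ties between equally good hits all survive.
    """
    by_read = defaultdict(list)
    for line in psl_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        by_read[fields[9]].append((psl_score(fields), line))

    kept = []
    for scored in by_read.values():
        best = max(score for score, _ in scored)
        cutoff = best * (1.0 - near_top)
        kept.extend(line for score, line in scored if score >= cutoff)
    return kept
```

For real work the pslReps program itself is the better choice; the sketch is mainly useful for seeing why a read with two equally good paralog hits keeps both of them after filtering.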


        • behoward
          Member
          • Mar 2009
          • 13

          #5
          Well, thanks again

          I guess I came to the right person! I suppose the good thing about a dataset that won't die is that you must get a ton of citations.

          Cheers,
          Brian

