  • MAQ and short read length (DGE)

    We are currently looking into the viability of Digital Gene Expression (DGE) or mRNA-seq as a possible replacement for expression microarrays in our breast cancer studies. DGE generates reads that are only 17 bases long, so allowing even 1 mismatch is questionable when aligning against the human genome. MAQ doesn't seem to let you set the -n flag to anything less than 1. Is this something that can be changed easily? I would love to align my short reads with MAQ but keep only those that align perfectly.
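One possible workaround, if -n cannot go below 1, is to map with -n 1 and then discard any hit whose read differs from the reference at all. The sketch below recomputes mismatches directly from the reference sequence. The hit record layout `(read_id, read_seq, ref_name, pos, strand)` and the function names are hypothetical and are not MAQ's own output format; it assumes `pos` is the 0-based leftmost reference coordinate of the alignment.

```python
from typing import Dict, List, Tuple

# Hypothetical hit record: (read_id, read_seq, ref_name, pos, strand).
# pos is assumed to be the 0-based leftmost reference coordinate of the hit.
Hit = Tuple[str, str, str, int, str]

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def count_mismatches(read: str, ref: str, pos: int, strand: str) -> int:
    """Count mismatches between a read and the reference window it was placed on."""
    window = ref[pos:pos + len(read)]
    if len(window) < len(read):
        # Hit runs off the end of the reference: never treat it as perfect.
        return len(read)
    # Reverse-strand hits are compared as the reverse complement of the read.
    query = read if strand == "+" else revcomp(read)
    return sum(1 for a, b in zip(query, window) if a != b)


def keep_perfect(hits: List[Hit], refs: Dict[str, str]) -> List[Hit]:
    """Keep only hits whose read matches the reference exactly."""
    return [h for h in hits if count_mismatches(h[1], refs[h[2]], h[3], h[4]) == 0]


refs = {"chr1": "AAACATGGGTTTCCC"}
hits = [
    ("r1", "CATGG", "chr1", 3, "+"),            # exact match
    ("r2", "CATGA", "chr1", 3, "+"),            # one mismatch
    ("r3", revcomp("CATGG"), "chr1", 3, "-"),   # exact match on the reverse strand
]
print([h[0] for h in keep_perfect(hits, refs)])  # ['r1', 'r3']
```

If your aligner already reports a per-hit mismatch count, filtering on that column would be cheaper; recomputing from the reference just avoids depending on any particular output format.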

    Along those lines, if a read maps to more than 1 location, MAQ will randomly pick one of those locations for the placement of that read. Is there any way to customize this function so that it checks against a coordinate file or something like that so we can at least have MAQ select a location for that read that is only in the transcriptome to raise our chances of the placement being 'correct'?

    Thank you for your help
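The "prefer the transcriptome" rule can be sketched as a post-processing step, assuming the aligner can be made to report every candidate location for a read (an assumption here, since MAQ by default picks one at random). The sketch keeps a multi-mapping read only if exactly one of its candidate locations overlaps an annotated interval, such as an exon list loaded from a coordinate file; otherwise the read is left unplaced. The types and function names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical types: a hit is (chrom, 0-based start); the annotation maps each
# chromosome to half-open [start, end) intervals, e.g. exons from a coordinate file.
Hit = Tuple[str, int]
Annotation = Dict[str, List[Tuple[int, int]]]


def in_annotation(hit: Hit, read_len: int, annot: Annotation) -> bool:
    """True if the read's span overlaps any annotated interval on its chromosome."""
    chrom, start = hit
    end = start + read_len
    return any(s < end and start < e for s, e in annot.get(chrom, []))


def choose_placement(hits: List[Hit], read_len: int, annot: Annotation) -> Optional[Hit]:
    """Return the single annotated hit, or None if zero or several qualify."""
    annotated = [h for h in hits if in_annotation(h, read_len, annot)]
    return annotated[0] if len(annotated) == 1 else None


annot = {"chr1": [(100, 200)], "chr2": [(0, 50)]}
print(choose_placement([("chr1", 150), ("chr3", 10)], 17, annot))  # ('chr1', 150)
print(choose_placement([("chr1", 150), ("chr2", 10)], 17, annot))  # None (still ambiguous)
```

Returning None rather than picking at random keeps ambiguous reads out of the counts instead of spreading them arbitrarily; whether to drop them or split them fractionally is a judgment call for the study.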

  • #2
    I have looked at DGE data, and even with 16/17 bp reads, more than 90% map to the tag sequences (all possible 16-mers with the enzyme specificity).
    I am also curious to see how MAQ could be modified; quite a few other tools have tag-specific algorithms to handle these issues.
    --
    bioinfosm
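A set of "all possible 16-mers with the enzyme specificity" can be enumerated by scanning both strands of a reference for the recognition site and taking the bases that follow it. This is a minimal sketch assuming an NlaIII-style site (`CATG`) and assuming the tag is the 16 bases immediately 3' of each site; the post does not say whether the tag includes the site itself, and the function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def anchored_tags(seq: str, site: str = "CATG", tag_len: int = 16) -> set:
    """Collect every tag_len-mer that directly follows `site` on either strand."""
    tags = set()
    for strand_seq in (seq, revcomp(seq)):
        start = strand_seq.find(site)
        while start != -1:
            tag = strand_seq[start + len(site):start + len(site) + tag_len]
            if len(tag) == tag_len:  # skip sites too close to the sequence end
                tags.add(tag)
            start = strand_seq.find(site, start + 1)
    return tags


print(anchored_tags("CATG" + "A" * 16 + "TT"))  # {'AAAAAAAAAAAAAAAA'}
```

Reads that fall in this set can then be counted with a plain set lookup, which is why an exact-match rate over a tag set is quick to compute.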



    • #3
      You really see >90% mapping to "canonical" regions?
      I've been aligning with MAQ with -n set to 1, and >99% of reads map to the genome. I then extend each read 4 bp beyond its 5' end using the reference sequence and keep only reads whose extension is CATG (we cut with NlaIII); only about 50% of our mapped reads survive this step. After that we check the overlap with genic regions, and it is certainly not as high as you report. What do you do differently?
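The CATG filter described above can be sketched as follows, assuming each hit gives a 0-based leftmost reference position, the read length, and the strand. For a forward-strand read the 4 bases before `pos` are checked; for a reverse-strand read the 5' end is at the right edge of the alignment, so the 4 bases after it are taken and reverse-complemented (CATG is its own reverse complement, so this only matters for non-palindromic sites). The function name is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def has_upstream_site(ref: str, pos: int, read_len: int, strand: str,
                      site: str = "CATG") -> bool:
    """True if the bases just 5' of the read, on the read's strand, equal `site`."""
    k = len(site)
    if strand == "+":
        flank = ref[pos - k:pos] if pos >= k else ""
    else:
        # The read's 5' end is the alignment's right edge on the reverse strand.
        flank = revcomp(ref[pos + read_len:pos + read_len + k])
    return flank == site


print(has_upstream_site("GGCATG" + "AAAAA" + "TTT", 6, 5, "+"))  # True
print(has_upstream_site("AAAAA" + "CATG", 0, 5, "-"))            # True
```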



      • #4
        Technically you should not be trying to align your DGE reads to the genome. The tags may not exist as contiguous sequence in the genome; they may span splice sites or polyadenylation sites. To properly interpret DGE data you should first generate a complete set of predicted tags from the genome and transcriptome and then align your reads to that set. To do this you need a well-annotated genome. Please see the thread linked below for the software stack, created by Ariel Paulson at the Stowers Institute, that builds these tag tables, along with scripts to interpret the Eland alignments.

        Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc
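To illustrate the idea of a predicted tag table (this is not the Stowers pipeline, whose code is not shown here), the sketch below assumes each transcript is given 5'→3' on the sense strand and that the expected tag is the 17 bases immediately 3' of the most 3' CATG, as in NlaIII-anchored protocols. It maps each predicted tag to the transcripts that produce it and counts reads that match a tag exactly. Genomic and intergenic tags, which the full approach also includes, are left out for brevity; all names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional, Set


def predicted_tag(transcript: str, site: str = "CATG", tag_len: int = 17) -> Optional[str]:
    """Tag following the 3'-most site, or None if there is no full-length tag."""
    i = transcript.rfind(site)
    if i == -1:
        return None
    tag = transcript[i + len(site):i + len(site) + tag_len]
    # Assumption: a site too close to the 3' end yields no usable tag here.
    return tag if len(tag) == tag_len else None


def build_tag_table(transcripts: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each predicted tag to the set of transcript IDs that produce it."""
    table: Dict[str, Set[str]] = defaultdict(set)
    for tx_id, seq in transcripts.items():
        tag = predicted_tag(seq)
        if tag is not None:
            table[tag].add(tx_id)
    return dict(table)


def count_tag_hits(reads: Iterable[str], table: Dict[str, Set[str]]) -> Counter:
    """Count reads that exactly match a predicted tag."""
    return Counter(r for r in reads if r in table)


tag17 = "ACGT" * 4 + "A"
transcripts = {
    "t1": "GG" + "CATG" + tag17 + "TTT",
    "t2": "CATGAAA",                            # site too close to the 3' end
    "t3": "CATG" + "G" * 17 + "CATG" + tag17,   # only the 3'-most site counts
}
table = build_tag_table(transcripts)
print(table[tag17] == {"t1", "t3"})                            # True
print(count_tag_hits([tag17, tag17, "T" * 17], table)[tag17])  # 2
```

Tags shared by several transcripts (like `tag17` above) show up as sets with more than one ID, which makes ambiguous tags easy to flag before counting.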


        I have used this pipeline for a couple of DGE projects. In one project with Arabidopsis I was able to map 97% of my filtered reads to predicted tags. This was allowing for up to 2 mismatches in the alignment. Counting only perfect matches the hit rate was ~90%. Not all of these were mapped to annotated genes though. Roughly 63% were mapped to genes, the remainder were to intergenic or repetitive regions.
        Last edited by kmcarr; 02-23-2009, 03:29 PM. Reason: correct spelling error
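The difference between the "up to 2 mismatches" and "perfect matches only" rates can be reproduced with a small matcher against a tag set. This is a brute-force sketch (every read is compared with every tag by Hamming distance) and the function names are hypothetical. A read is assigned only if a single tag is strictly closest within `max_mm` mismatches; ties are treated as unassigned.

```python
from typing import Iterable, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_tag(read: str, tags: Iterable[str], max_mm: int = 2) -> Optional[Tuple[str, int]]:
    """Return (tag, mismatches) for the unique closest tag within max_mm, else None."""
    best, best_d, ties = None, max_mm + 1, 0
    for tag in tags:
        if len(tag) != len(read):
            continue
        d = hamming(read, tag)
        if d < best_d:
            best, best_d, ties = tag, d, 1
        elif d == best_d:
            ties += 1
    if best is None or ties > 1:
        return None
    return best, best_d


def mapping_rates(reads, tags, max_mm: int = 2) -> Tuple[float, float]:
    """Fractions of reads matching a tag perfectly and within max_mm mismatches."""
    if not reads:
        return 0.0, 0.0
    hits = [best_tag(r, tags, max_mm) for r in reads]
    perfect = sum(1 for h in hits if h is not None and h[1] == 0)
    tolerant = sum(1 for h in hits if h is not None)
    return perfect / len(reads), tolerant / len(reads)


tags = {"AAAA", "CCCC"}
print(mapping_rates(["AAAA", "AAAT", "CCGG", "GGGG"], tags))  # (0.25, 0.75)
```

For a real tag table the all-against-all comparison is slow; a common alternative is a pigeonhole index, since a read with at most 2 mismatches must match a tag exactly in at least one of 3 non-overlapping segments.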



        • #5
          Originally posted by jms1223
          Along those lines, if a read maps to more than 1 location, MAQ will randomly pick one of those locations for the placement of that read. Is there any way to customize this function so that it checks against a coordinate file or something like that so we can at least have MAQ select a location for that read that is only in the transcriptome to raise our chances of the placement being 'correct'?
          If you only want to have reads mapped to your transcriptome, perhaps just make your reference sequences the transcripts themselves, rather than the genome sequence?

          --Torst
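Using the transcripts as the reference means building their sequences first. A minimal sketch, assuming exon coordinates are 0-based half-open `(start, end)` pairs on the genome and that minus-strand transcripts are obtained by reverse-complementing the joined exons; the function names and the in-memory genome dictionary are hypothetical. The resulting FASTA text could then be indexed by whichever aligner you use.

```python
from typing import Dict, List, Tuple

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse-complement a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def build_transcript(genome: Dict[str, str], chrom: str,
                     exons: List[Tuple[int, int]], strand: str) -> str:
    """Join exons in genomic order; reverse-complement for minus-strand transcripts."""
    seq = "".join(genome[chrom][s:e] for s, e in sorted(exons))
    return seq if strand == "+" else revcomp(seq)


def to_fasta(records: Dict[str, str], width: int = 60) -> str:
    """Format sequences as FASTA text with fixed-width lines."""
    out = []
    for name, seq in records.items():
        out.append(">" + name)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


genome = {"chr1": "AAAACCCCGGGGTTTT"}
tx = build_transcript(genome, "chr1", [(8, 12), (0, 4)], "+")
print(tx)                              # AAAAGGGG
print(to_fasta({"tx1": tx}, width=4))  # >tx1, AAAA, GGGG on separate lines
```

Reads from unannotated or intergenic loci will have nowhere to map against such a reference, so the unmapped fraction is worth checking alongside the mapped one.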



          • #6
            kmcarr and Torst answered that for me, jms1223.
            --
            bioinfosm
