  • short reads missed by aligners

    Is anyone looking into the No Match (NM) ELAND reads, i.e. reads that come off the Solexa but are not mapped to the reference?
    Is anyone running any other kind of contamination control, e.g. against E. coli?

    I was looking into running BLAT against all of nt, but would love to hear what people are using.

    sm
    --
    bioinfosm

  • #2
    Try using something like Velvet to assemble all the unaligned reads against each other, then BLAST the resulting contigs against nr. If they are crummy reads, they won't assemble with each other.

    We tested an in-house clone collection, and I found a fair bit of E. coli contamination. I've also found vector-looking sequences in microbial samples, stuff like that. If your reference has a biggish deletion compared to what you really sequenced, you might find it this way.
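A minimal sketch of the step between assembly and BLAST, assuming the assembler has written its contigs to a FASTA file. It keeps only contigs above a length cutoff so that short, low-information contigs are not sent to BLAST. The function names and the 100 bp default are illustrative assumptions, not something taken from Velvet or from this thread.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def contigs_worth_blasting(lines: Iterable[str], min_length: int = 100) -> Dict[str, str]:
    """Return contigs of at least min_length bases (hypothetical cutoff)."""
    return {name: seq for name, seq in parse_fasta(lines) if len(seq) >= min_length}
```

With a real file this would be called as `contigs_worth_blasting(open("contigs.fa"))`, where `contigs.fa` is a placeholder file name.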



    • #3
      Originally posted by bioinfosm:
      Is anyone looking into the No Match (NM) ELAND reads, i.e. reads that come off the Solexa but are not mapped to the reference?
      Is anyone running any other kind of contamination control, e.g. against E. coli?

      I was looking into running BLAT against all of nt, but would love to hear what people are using.

      sm
      One approach that takes a while but looks exhaustively at all the NM reads: run BLAT against the genome of interest to kick out the reads with gapped hits, then BLAST what is left against nr to find contaminants. I was then thinking of taking the top couple of contaminants and checking whether their matching reads overlap with reads mapped to the genome of interest, since some contaminant reads may also map there. This might be most important for SNP calling.
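A rough sketch of the "top couple of contaminants" tally, assuming the BLAST results are in the standard 12-column tabular format (query ID in column 1, subject ID in column 2, bit score in column 12). It keeps the best-scoring hit per read and counts reads per subject. The function names are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def best_hit_per_read(blast_lines: Iterable[str]) -> Dict[str, str]:
    """Map each read ID to the subject ID of its highest bit-score hit."""
    best: Dict[str, Tuple[float, str]] = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return {query: subject for query, (_, subject) in best.items()}


def top_contaminants(best: Dict[str, str], n: int = 2) -> List[Tuple[str, int]]:
    """Return the n subjects hit by the most reads, with their read counts."""
    return Counter(best.values()).most_common(n)
```

The overlap check mentioned above would then be a set intersection, e.g. `set(best) & mapped_ids`, where `mapped_ids` is a hypothetical set of read IDs that also mapped to the genome of interest.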



      • #4
        Here's a nice comparison of the various short-read aligners, including ELAND.

        http://massgenomics.wordpress.com/20...nd-and-others/



        • #5
          Thanks for your input...

          Edena and Velvet, two de novo assemblers for short-read data, gave very different outputs!

          Velvet gave 2 contigs that pointed to a fragment that was supposedly deleted and should not have been sequenced.

          Edena, on the other hand, gave 10 or so contigs, 100-120 bp long, that align perfectly to E. coli K-12!
          --
          bioinfosm



          • #6
            Reads that aren't matched by ELAND are interesting because we can suppose they are not repeats: ELAND does report reads that match at multiple locations.
            I would say that a read containing gaps would probably be missed by ELAND, so run these reads through a short-read aligner that can find gaps. I've been using novoalign (www.novocraft.com); it can find up to 7 or 8 gaps in a 36 bp read matched to a reference sequence, and it is fast on large references. I've even tested it on simulated data with mutation rates in excess of 15% and it still finds the matches. Use a very high threshold, e.g. -t 200, to find potentially all permutations for your read.
            I'd be interested to know how many more of your ELAND NM reads you are able to match.
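A small sketch of pulling the NM reads out for a second alignment pass and reporting how many were rescued. It assumes a tab-separated ELAND result file with the read name in column 1, the sequence in column 2 and the match code (NM, QC, U0, ...) in column 3; check this against your own files. Both function names are hypothetical.

```python
from typing import Iterable, List


def nm_reads_to_fasta(eland_lines: Iterable[str]) -> List[str]:
    """Return FASTA records for reads whose match code is NM (assumed layout)."""
    records = []
    for line in eland_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed or truncated lines
        name, seq, code = fields[0], fields[1], fields[2]
        if code == "NM":
            records.append(f">{name.lstrip('>')}\n{seq}")
    return records


def rescued_fraction(nm_ids: Iterable[str], realigned_ids: Iterable[str]) -> float:
    """Fraction of NM reads that a second aligner managed to place."""
    nm = set(nm_ids)
    if not nm:
        return 0.0
    return len(nm & set(realigned_ids)) / len(nm)
```

The FASTA records can be written to a file and given to whichever gapped aligner you prefer; `rescued_fraction` then compares the NM read IDs with the IDs that aligner reports as mapped.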



            • #7
              Just a note from my side:

              As you know from other threads, we can map from 10bp onwards, with gaps and PMs. However, before tweaking the unmapped reads to fit the reference genome, look at viral genomes, vectors etc.
              We found numerous perfect matches there. Especially when working on specific cell lines, check the history of that line, how it was immortalized etc. You'll be surprised how many good old retroviral friends you find!

              Cheers

              Klaus
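A toy sketch of the "look for perfect matches in viral genomes and vectors first" idea, assuming the unmapped reads and the candidate contaminant sequences fit in memory. It checks each read and its reverse complement for an exact substring match. This is plain string search for illustration, not how GMS or any other mapper works, and the names are hypothetical.

```python
from typing import Dict

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def screen_reads(reads: Dict[str, str], contaminants: Dict[str, str]) -> Dict[str, str]:
    """Map read ID -> name of the first contaminant containing a perfect match."""
    refs = {name: seq.upper() for name, seq in contaminants.items()}
    hits = {}
    for read_id, seq in reads.items():
        forward = seq.upper()
        reverse = reverse_complement(forward)
        for name, ref in refs.items():
            # A read counts as a hit if either strand occurs exactly in the reference.
            if forward in ref or reverse in ref:
                hits[read_id] = name
                break
    return hits
```

For real data volumes an indexed aligner is the practical choice; the point here is only the order of operations: screen against vectors and viral genomes before loosening the alignment parameters against the reference.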



              • #8
                Interesting note. Have you also looked at whether the retroviral sequences can be remapped to human with mismatches, and whether that seems to be a source of background in alignments?



                • #9
                  Chipper,

                  More on that, with HEK cells, SV40 and adenovirus, is described in our paper.

                  Klaus



                  • #10
                    I just read the Sultan paper, Kmay. Nice work.

                    However, I am a little confused, because it says that the reads were mapped with ELAND: "Illumina deep sequencing was used to generate 27-bp reads from replicate samples for each cell line. Reads were mapped to the human genome (hg18, NCBI build 36.1) using the Eland software, allowing up to two mismatches (see SOM). Of the total reads, 50% matched to unique genomic locations," (http://www.sciencemag.org/cgi/content/full/1160342/DC1)

                    And the actual read data is unavailable. So I'm assuming that you all used the proprietary Genomatix mapper in a separate study? Where can we get this read data?



                    • #11
                      zee,

                      You are right. The original data were mapped with ELAND; in those days our GMS was still under development. Later we took the reads that ELAND had not mapped and ran them against the viral genomes with our GMS. The read data are deposited at GEO.
