  • Concatenating paired-end reads when there are missing reads

    Hi all,

    I am looking for a tool that can take Illumina FASTQ paired-end reads and concatenate them by taking the reverse complement of read 2 and attaching it to the end of read 1. The reads have already been trimmed and quality filtered, so not every read 1 sequence still has a read 2 partner, and vice versa. The reads do not overlap. I'm concatenating them so that when I do database searches (e.g. BLAST), I have more information for determining which organism my amplicons came from (this is metagenomics work).

    Does anyone have such a tool they would be willing to share? I have zero programming experience and our bioinformatician left months ago.

    I've looked at ill2fastq.pl, but it seems to be designed to work only with complete pairs of reads, and it can't handle unpaired reads.
    Last edited by LizBent; 10-05-2012, 05:27 AM. Reason: incomplete
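The concatenation described in the question (reverse-complement read 2, then append it to read 1) can be sketched in Python as follows. The function names and the optional `spacer` argument, which could hold a run of `N`s as an artificial gap between the two ends, are illustrative assumptions, not part of any existing tool.

```python
# Hypothetical helpers illustrating the concatenation; not an existing tool.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement A/C/G/T/N (either case), then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def concatenate_pair(read1: str, read2: str, spacer: str = "") -> str:
    """Attach the reverse complement of read 2 to the end of read 1.

    `spacer` (an assumption) can be used to insert an artificial gap,
    e.g. "NNNNNNNNNN", between the two non-overlapping ends.
    """
    return read1 + spacer + reverse_complement(read2)
```

For example, `concatenate_pair("ACGTTG", "AACC")` returns `"ACGTTGGGTT"`. Characters other than A, C, G, T and N (such as other IUPAC ambiguity codes) are reversed but not complemented in this sketch.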

  • #2
    So what you need to do is:
    1. Interlace your FASTQ files, grouping your paired ends by sequence ID (i.e. the cluster coordinates shared by read /1 and read /2).
    2. De-interlace the paired reads.
    3. Stitch the two sequences of each pair together.

    This can easily be done on a local instance of Galaxy (I don't think the public web portal has the interlacer tool installed).
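Step 1 depends on recognizing which reads belong to the same pair. A minimal sketch in Python, assuming one of two header conventions (the older `/1` and `/2` suffix, or the newer style where the mate number follows a space, so the first word of the header is shared); `pair_key` and `split_pairs` are hypothetical names, not Galaxy tools:

```python
from typing import Iterable, Set, Tuple


def pair_key(header: str) -> str:
    """Return the part of a FASTQ header that both mates share (hypothetical helper).

    Assumes either '@name/1' / '@name/2' headers, or headers where the mate
    number sits after a space ('@name 1:N:0:1').
    """
    name = header.lstrip("@").split()[0]  # first word of the header, without '@'
    if name.endswith(("/1", "/2")):
        name = name[:-2]  # drop the old-style mate suffix
    return name


def split_pairs(
    headers1: Iterable[str], headers2: Iterable[str]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return (keys found in both files, keys only in read 1, keys only in read 2)."""
    keys1 = {pair_key(h) for h in headers1}
    keys2 = {pair_key(h) for h in headers2}
    return keys1 & keys2, keys1 - keys2, keys2 - keys1
```

For example, `split_pairs(["@r1/1", "@r2/1"], ["@r2/2", "@r3/2"])` returns `({"r2"}, {"r1"}, {"r3"})`: one complete pair and one orphan read from each file.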

    However, since your paired ends come from opposite ends of the fragment, I don't see how stitching them together will help you in a BLAST search. You would be creating an artificial sequence, whereas GenBank sequences are individual "real" fragments.
    If I were you, I wouldn't concatenate the PEs. Use the FASTX sequence collapser in Galaxy and batch BLAST your unique reads individually.
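Collapsing identical reads before BLAST amounts to keeping each distinct sequence once, together with how often it occurred. The sketch below is a hypothetical stand-in for that idea, not the FASTX collapser itself; the `rank-count` FASTA header is an assumption made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def collapse(sequences: Iterable[str]) -> List[Tuple[str, int]]:
    """Count identical sequences; most abundant first (ties keep first-seen order)."""
    return Counter(sequences).most_common()


def to_fasta(collapsed: List[Tuple[str, int]]) -> str:
    """Write one FASTA record per unique sequence, headed '>rank-count'."""
    records = []
    for rank, (seq, count) in enumerate(collapsed, start=1):
        records.append(f">{rank}-{count}\n{seq}")
    return "\n".join(records)
```

For example, `collapse(["AC", "GT", "AC"])` gives `[("AC", 2), ("GT", 1)]`, so only two queries would need to be BLASTed instead of three.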

    • #3
      Hi Jackie. Actually, if you BLAST the two ends of a sequence together (with or without an artificial gap in the middle), you get better matches than if you BLAST one end at a time. You might get a nonsense window where the two ends meet, but the best overall matches will be for the longer stretches that match real sequences, so those are the hits that come out on top.

      As for the solution you describe, I was rather hoping to find a script that would allow me to keep track of unpaired read1 and read2 sequences so I can use them as well.

      • #4
        Do you know how to use the command line at all? Post the first read name from each FASTQ file and I'll try to help you out. My solution will require Python.

        • #5
          The solution I posted tracks unpaired reads.
          Otherwise, check out http://sfg.stanford.edu/quality.html
          Their PECombiner.sh has a bug in it; they may have updated it on the site by now.
          If they have not, ask the authors to send you the working script.

          • #6
            It uses a good amount of memory, since it stores one FASTQ file in a dict, but it seems to work:

            GitHub Gist: join paired-end or print unique single-end reads


            Edit: you didn't say anything about preserving the quality scores, so this prints FASTA.
            Last edited by jbrwn; 10-09-2012, 02:28 PM.
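The approach described in this reply (one FASTQ file held in a dict, the other streamed against it) can be sketched roughly as follows. This is not the linked gist but an illustrative reconstruction: the function names, the `/1` and `/2` headers given to orphan reads, and the choice to leave orphan read 2 sequences un-reverse-complemented are all assumptions. Like the reply, it discards qualities and writes FASTA, and its memory use grows with the size of the read 1 file.

```python
import sys
from typing import Dict, Iterator, TextIO, Tuple

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def pair_key(header: str) -> str:
    """Name shared by both mates (assumes '/1', '/2' suffixes or a space before the mate number)."""
    name = header.lstrip("@").split()[0]
    return name[:-2] if name.endswith(("/1", "/2")) else name


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) from a FASTQ file with exactly four lines per record."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return
        seq = handle.readline().rstrip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line, discarded because the output is FASTA
        yield header, seq


def join_pairs(r1: TextIO, r2: TextIO, out: TextIO, spacer: str = "") -> None:
    # Hold all of read 1 in memory, keyed by the name both mates share.
    pending: Dict[str, str] = {pair_key(h): s for h, s in read_fastq(r1)}
    for header, seq in read_fastq(r2):
        key = pair_key(header)
        if key in pending:
            joined = pending.pop(key) + spacer + reverse_complement(seq)
            out.write(f">{key}\n{joined}\n")
        else:
            out.write(f">{key}/2\n{seq}\n")  # read 2 whose mate was filtered out
    for key, seq in pending.items():
        out.write(f">{key}/1\n{seq}\n")  # read 1 whose mate was filtered out


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as r1, open(sys.argv[2]) as r2:
        join_pairs(r1, r2, sys.stdout)
```

It could be run as `python join_pairs.py reads_R1.fastq reads_R2.fastq > joined.fasta` (script and file names hypothetical). Holding read 1 in a dict keeps the logic simple and tolerates files whose reads are in different orders, at the cost of memory.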

            • #7
              Thanks so much, I will try it.
