
  • Paired-read analysis in MEGAN

    I am trying to use blastn output from paired-end files for MEGAN analysis. However, selecting the paired-read checkbox in the "import from blast" window does not seem to work. I have tried renaming the files to follow an R1/R2 suffix scheme, but nothing seems to work. Has anyone been able to get this to work? Also, would there be any way to do this through the MEGAN command line?

    I am using MEGAN 5.1 but I also have MEGAN 5.0.3 and both behave similarly.

    Thanks!

  • #2
    Did you ever figure this out? I am having the same problem and it's driving me nuts. The first issue I ran into was that the read names had spaces in them. That led to an error when I imported the reads and blast files, saying that the read names were not unique. I used awk to remove the spaces from the read names.

    Read 1s are now named with no spaces, as follows:
    >NS500476:25:HLLCNBGXX:1:11101:7199:4320_1:N:0:6/1
    GTCGGAAACGACCGGGTGCTCGGAGTGCCGGTTCTGGTCATCCTCGCCGCGATCTGTTGCATCGTACTGC
    ATTACATGCTGTCGCAGACCCGTTTCGGCCAGCACACCTATGCCATGGGCGCCAGCAAGGCCGCCGCAAG
    CCGCGCCGGCA
    Read 2s are now named with no spaces, as follows:
    >NS500476:25:HLLCNBGXX:1:11101:7199:4320_1:N:0:6/2
    CCAGCGGAAAATCGGCCTGTGTACAGAACCCCCGCGATACCCGCAATGACGGCAGAGAGAATGTAGATCT
    TCAGAGTCAGAATCTTTATGTAGAAGCCTGAGCGACTTTAGGAGTACTTGCTGGCGCCGATGGCATAGGT
    GTGCTGCCAGA
    Blast tabular output for R1, with no spaces in the read names, is as follows:
    NS500476:25:HLLCNBGXX:1:11101:7199:4320_1:N:0:6/1 gi|652682316|ref|WP_027031178.1| 67.6 37 12 0 1 111 187 223 1.6e-05 56.2
    Blast tabular output for R2, with no spaces in the read names, is as follows:
    NS500476:25:HLLCNBGXX:1:11101:18020:4335_1:N:0:6/2 gi|938913364|ref|WP_054696212.1| 68.1 47 15 0 1 141 101 147 2.8e-10 72.0
    To me, the suffixes for read 1 and read 2 look like they should just be '1' and '2'. However, that does not work. I have also tried '/1' and '/2', and '_1:N:0:6/1' and '_1:N:0:6/2', to no avail. The end result is that no reads are ever mapped to the blast hits. What is the correct suffix to enter? Do the read names need to be in a different format with a different suffix?


    • #3
      Well, I emailed Daniel Huson, and he was super helpful with this. It turns out that I had made an error in my tabular blast output file: it was space-delimited instead of tab-delimited, which caused problems.

      So, for me, my original reads were named like this for read 1 of the first pair:
      NS500476:25:HLLCNBGXX:1:11101:7199:4320 1:N:0:6/1

      Read 2 is similar except that it has a /2 on the end. There is not supposed to be a space between the 4320 and the 1. In the tabular blast output file, the read names end where the space is. Similarly, the read names in the fasta files are not unique, because the space causes both members of a pair to have the same name once the suffix part is cut off.
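The name collision described above can be illustrated with a minimal Python sketch. It assumes, as the post reports for the tabular blast output, that a read name ends at the first whitespace; `read_id` is a hypothetical helper, not part of MEGAN or BLAST.

```python
def read_id(header: str) -> str:
    """Return the first whitespace-delimited token of a FASTA header, without '>'."""
    return header.lstrip(">").split()[0]


# Original headers for the two mates of one pair (space before "1:N:0:6").
r1 = ">NS500476:25:HLLCNBGXX:1:11101:7199:4320 1:N:0:6/1"
r2 = ">NS500476:25:HLLCNBGXX:1:11101:7199:4320 1:N:0:6/2"

# With the space present, both mates collapse to the same name.
same_with_spaces = read_id(r1) == read_id(r2)

# With the space replaced by an underscore, the /1 and /2 suffixes survive.
same_without_spaces = read_id(r1.replace(" ", "_")) == read_id(r2.replace(" ", "_"))

print(same_with_spaces, same_without_spaces)  # True False
```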

      Instead of re-running the blast, I filled in the spaces in the read names in the fasta files with underscores. Then, in each tabular blast file, I added '_1:N:0:6/1' to the end of the read names for read 1, and '_1:N:0:6/2' to the end of the read names for read 2. This is where I screwed up and replaced the tabs with spaces in the tabular blast file. Make sure those stay as tabs.
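A rough Python sketch of the two edits described above, for anyone who would rather not do them by hand. The function names and the file names in the usage note are hypothetical. The sketch assumes the blast file is already tab-delimited, and it splits and rejoins only on tabs so the delimiter is never changed.

```python
def fix_fasta_header(line: str) -> str:
    """Replace spaces with underscores in FASTA header lines; pass sequence lines through."""
    if line.startswith(">"):
        return line.replace(" ", "_")
    return line


def add_suffix_to_blast_line(line: str, suffix: str) -> str:
    """Append `suffix` to the query name (first column) of a tab-delimited blast line."""
    fields = line.rstrip("\n").split("\t")
    fields[0] += suffix
    # Rejoin with tabs so the file stays tab-delimited.
    return "\t".join(fields) + "\n"


def rewrite_file(in_path: str, out_path: str, transform) -> None:
    """Apply `transform` to every line of in_path and write the result to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            dst.write(transform(line))
```

Usage would look something like `rewrite_file("R1.fasta", "R1.fixed.fasta", fix_fasta_header)` and `rewrite_file("R1.blast.tab", "R1.fixed.blast.tab", lambda l: add_suffix_to_blast_line(l, "_1:N:0:6/1"))`, with '_1:N:0:6/2' for the read 2 file. The suffix shown is the one from this particular run and will differ for other data.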

      Now, when I import my reads, and I specify the suffix as '/1' for read 1 and '/2' for read 2, it works great. If you run into a problem, make sure your read names have no spaces in them, and if you have mucked around in your files, make sure you haven't messed up the formatting.
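The two checks suggested above (no spaces in read names, tabs intact) could be automated along these lines. This is only a sketch: `check_blast_line` is a hypothetical helper, and the expected count of 12 columns assumes the default BLAST tabular layout, which matches the example lines earlier in the thread.

```python
def check_blast_line(line: str, expected_columns: int = 12) -> list:
    """Return a list of formatting problems found in one tabular blast line."""
    problems = []
    fields = line.rstrip("\n").split("\t")
    if len(fields) != expected_columns:
        problems.append(
            f"expected {expected_columns} tab-separated columns, found {len(fields)}"
        )
    if " " in fields[0]:
        problems.append("query name contains a space")
    return problems


good = "\t".join([
    "NS500476:25:HLLCNBGXX:1:11101:7199:4320_1:N:0:6/1",
    "gi|652682316|ref|WP_027031178.1|",
    "67.6", "37", "12", "0", "1", "111", "187", "223", "1.6e-05", "56.2",
])
# Simulate the mistake from the post: tabs replaced with spaces.
bad = good.replace("\t", " ")

print(check_blast_line(good))
print(check_blast_line(bad))
```

A space-delimited line fails both checks at once, because the whole line is read as a single column that contains spaces.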


      • #4
        In reply to A_sapidissima (#3):
        I never did end up solving this with MEGAN, so thanks for posting your solution! I ended up analyzing each member of the paired reads separately, and then wrote my own script to combine the results: if the two members of a pair had differing assignments, I used the assignment from the read with the lower e-value. I'm sure the way MEGAN does it is a bit more sophisticated.
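The original script was not posted, so the following is only a guess at the idea, written as a minimal Python sketch. It assumes each mate has either no assignment or a (taxon, e-value) tuple. Tie-breaking in favour of read 1 and the handling of a mate with no assignment are assumptions of this sketch, and the taxon names are placeholders.

```python
from typing import Optional, Tuple

Hit = Tuple[str, float]  # (assigned taxon, e-value of the supporting hit)


def combine_pair(r1: Optional[Hit], r2: Optional[Hit]) -> Optional[str]:
    """Pick one assignment for a read pair, preferring the mate with the lower e-value."""
    if r1 is None and r2 is None:
        return None
    if r1 is None:
        return r2[0]
    if r2 is None:
        return r1[0]
    # Both mates are assigned: the lower e-value wins; ties go to read 1 (an assumption).
    return r1[0] if r1[1] <= r2[1] else r2[0]


# Placeholder taxa, with e-values like those in the blast lines earlier in the thread.
print(combine_pair(("Taxon A", 1.6e-05), ("Taxon B", 2.8e-10)))  # Taxon B
```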
