  • Bug in GAP5: Finding spanning read pairs

    Hi all,

    I've found that when I use Find Read Pairs in GAP5 to find read pairs that span contigs in an assembly, every pair listed by the Contig Comparator is reported with both reads in the forwards direction, e.g.

    Read pair:
    From contig DENOVO_c11(#2157181) at 64 reading GAPC_0042_FC:6:104:8598:9213#0/1(#2156633)
    With contig DENOVO_c224(#36154940) at 462 reading GAPC_0042_FC:6:104:8598:9213#0/2(#36162837)
    Direction of first read is forwards
    Direction of second read is forwards
    Length 35

    Read pair:
    From contig DENOVO_c190(#34488635) at 1191 reading GAPC_0042_FC:6:2:17711:1932#0/1(#34493286)
    With contig DENOVO_c354(#38868888) at 2904 reading GAPC_0042_FC:6:2:17711:1932#0/2(#38887671)
    Direction of first read is forwards
    Direction of second read is forwards

    I have three libraries: two paired-end (>--<) and one mate-pair (<---->).
    No matter which library or which type of comparison I choose (end vs end, all vs all, end vs all), the read pairs reported in the "Contig Comparator" are ALWAYS forwards/forwards.

    When I looked at the two examples above manually:
    In the first example both directions are correct, but the read in the second contig is at position 1847, not the reported 462. Is it a coincidence that 462 is one quarter of 1847, rounded up?

    In the second example the reads are positioned correctly, but the second read is in the reverse direction in its contig, not forwards.
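
The quarter relationship in the first example can be checked with a couple of lines of arithmetic. This is only a sketch of the check. The variable names are illustrative, and it says nothing about why GAP5 would report such a value.

```python
import math

reported = 462   # position given by Find Read Pairs for the second read
actual = 1847    # position seen when inspecting the contig manually

# One quarter of the actual position, rounded up to a whole base.
quarter_rounded_up = math.ceil(actual / 4)

print(quarter_rounded_up, quarter_rounded_up == reported)
```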

    In my mate-pair library, which I want to use for scaffolding, Find Read Pairs finds 85000 spanning read pairs (out of 650000 pairs in the library). EVERY spanning pair is reported as forwards/forwards.
    At first I thought that only forwards/forwards pairs were being detected, but after manually checking 10 of the reported spanning pairs, the problem appears to be in how the direction of the second read is reported: it is wrong in half the cases. In one case the reported position of the second read is also wrong, at 1/4 of the true position.
    I also manually found some spanning reverse/reverse pairs using the template status display, and none of these are reported. So Find Read Pairs finds no reverse/reverse pairs, even though they are present in the assembly when I look for spanning reads manually using the template status colours.
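
For anyone wanting to repeat the tally over a full report rather than by eye, here is a minimal sketch. It assumes the Contig Comparator text has been copied into a string in the same layout as the two examples above. The function name `tally_directions` and the regular expression are illustrative and are not part of GAP5.

```python
import re
from collections import Counter

# Matches the two direction lines of one "Read pair:" block, in the
# layout shown in the examples above (an assumption about the format).
PAIR_RE = re.compile(
    r"Direction of first read is (\w+)\s+Direction of second read is (\w+)"
)

def tally_directions(report_text):
    """Count (first, second) direction combinations in the report text."""
    return Counter(PAIR_RE.findall(report_text))

sample = """\
Read pair:
From contig DENOVO_c11(#2157181) at 64 reading GAPC_0042_FC:6:104:8598:9213#0/1(#2156633)
With contig DENOVO_c224(#36154940) at 462 reading GAPC_0042_FC:6:104:8598:9213#0/2(#36162837)
Direction of first read is forwards
Direction of second read is forwards
Length 35

Read pair:
From contig DENOVO_c190(#34488635) at 1191 reading GAPC_0042_FC:6:2:17711:1932#0/1(#34493286)
With contig DENOVO_c354(#38868888) at 2904 reading GAPC_0042_FC:6:2:17711:1932#0/2(#38887671)
Direction of first read is forwards
Direction of second read is forwards
"""

counts = tally_directions(sample)
print(counts)
```

If the directions were reported correctly, a mixed library would be expected to show more than one combination in the counts.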

    I've tried to report the problem on the Staden Sourceforge site but have not had a response.

    Regards
    Robert

  • #2
    Could you add a link to your SourceForge report?

    Which input file format are you giving GAP5, and how was it created (in case the error can be traced to the input data)?



    • #3
      The input was a CAF file generated by MIRA, and I used tg_index to create the database from it. The CAF file came from a MIRA assembly of 454 Titanium reads and Illumina mate-pair and paired-end reads. I used MIRA version 4.04, but this afternoon I checked assemblies from MIRA versions 3.18 and 4.03, with the same result.

      The report was essentially this same message, emailed to the Staden discussion list on SourceForge, as that appeared to be the only way to submit a report about the Staden package.

      I don't think this is related to the input data. It seems to be related to reporting and "Find Read Pairs" as the reads are displayed correctly and read pairs are correct in the template status and in the editors in GAP5.



      • #4
        Originally posted by rwillows
        The report was essentially this same message, emailed to the Staden discussion list on SourceForge, as that appeared to be the only way to submit a report about the Staden package.
        Their issue tracker is here:


        Also they have a public discussion forum:


        Originally posted by rwillows
        I don't think this is related to the input data. It seems to be related to reporting and "Find Read Pairs" as the reads are displayed correctly and read pairs are correct in the template status and in the editors in GAP5.
        OK - if the read pairs are displayed correctly then it probably isn't the input data.



        • #5
          Thanks.
          I've posted a report.



          • #6
            Originally posted by rwillows
            Thanks.
            I've posted a report.
            Staden Bug 102, http://sourceforge.net/p/staden/bugs/102/
