  • Bug in GAP5: Finding spanning read pairs

    Hi all,

    I've discovered that when using Find Read Pairs in GAP5 to find read pairs that span contigs in an assembly, every read pair listed in the information output of the Contig Comparator is ALWAYS reported as forwards and forwards, e.g.

    Read pair:
    From contig DENOVO_c11(#2157181) at 64 reading GAPC_0042_FC:6:104:8598:9213#0/1(#2156633)
    With contig DENOVO_c224(#36154940) at 462 reading GAPC_0042_FC:6:104:8598:9213#0/2(#36162837)
    Direction of first read is forwards
    Direction of second read is forwards
    Length 35

    Read pair:
    From contig DENOVO_c190(#34488635) at 1191 reading GAPC_0042_FC:6:2:17711:1932#0/1(#34493286)
    With contig DENOVO_c354(#38868888) at 2904 reading GAPC_0042_FC:6:2:17711:1932#0/2(#38887671)
    Direction of first read is forwards
    Direction of second read is forwards

    I have three libraries: two paired end (>--<) and one mate paired (<---->).
    No matter which library I choose or which type of comparison I choose (end vs end, all vs all, end vs all), the read pairs reported in the "Contig Comparator" are ALWAYS in the forwards forwards direction.

    When manually looking at the two examples above:
    In the first example both directions are correct, but the read in the second contig is actually at position 1847, not at the reported 462. Is it a coincidence that 462 is 1/4 of 1847, rounded up?
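
    The arithmetic behind that question can be checked quickly. This is only a minimal sketch using the two positions quoted above; it says nothing about how GAP5 computes positions internally, and the variable names are just for illustration.

```python
import math

reported = 462   # position given by Find Read Pairs for the second read
actual = 1847    # position seen when inspecting the contig manually

quarter = actual / 4          # 461.75
rounded_up = math.ceil(quarter)

# The reported position matches a quarter of the real one, rounded up.
print(quarter, rounded_up, rounded_up == reported)
```

    A single matching case does not prove a scaling bug, but it is a pattern worth checking against other mispositioned reads.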

    In the second example the reads are positioned correctly, but the second read is actually in the reverse direction in its contig, not forwards as reported.

    So in my library of mate pairs, which I want to use for scaffolding, "Find Read Pairs" finds 85000 spanning read pairs (of 650000 pairs in the library). EVERY SPANNING pair is reported in the forwards forwards direction.
    At first I thought that only forwards forwards pairs were being detected, but after manually checking 10 of the reported spanning pairs, it seems the problem is with the reported direction of the second read, which is wrong in half the cases. Also, in one case the reported position of the second read is incorrect, being 1/4 of the true position.
    I manually found some spanning reverse reverse pairs using the template status, and none of these are reported. So "Find Read Pairs" found no reverse reverse pairs, even though they are present in the assembly when looking for spanning reads manually using the template status colours.
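
    For anyone wanting to check the same thing on their own data without reading the pairs one by one, the report text can be tallied with a short script. This is a rough sketch that assumes the Contig Comparator output has been copied into a string and follows the exact wording shown in the examples above; `tally_directions` is a hypothetical helper name, not part of the Staden package.

```python
import re
from collections import Counter

# Matches the two direction lines of one "Read pair:" block, as worded in
# the report text quoted in this post.
PAIR_RE = re.compile(
    r"Direction of first read is (\w+)\s+Direction of second read is (\w+)"
)

def tally_directions(report_text):
    """Count (first, second) direction combinations in pasted report text."""
    return Counter(PAIR_RE.findall(report_text))

sample = """\
Read pair:
From contig DENOVO_c11(#2157181) at 64 reading GAPC_0042_FC:6:104:8598:9213#0/1(#2156633)
With contig DENOVO_c224(#36154940) at 462 reading GAPC_0042_FC:6:104:8598:9213#0/2(#36162837)
Direction of first read is forwards
Direction of second read is forwards
Length 35

Read pair:
From contig DENOVO_c190(#34488635) at 1191 reading GAPC_0042_FC:6:2:17711:1932#0/1(#34493286)
With contig DENOVO_c354(#38868888) at 2904 reading GAPC_0042_FC:6:2:17711:1932#0/2(#38887671)
Direction of first read is forwards
Direction of second read is forwards
"""

counts = tally_directions(sample)
for (first, second), n in counts.items():
    print(f"{first}/{second}: {n}")
```

    If the whole report produces only a forwards/forwards line, that matches the behaviour described here.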

    I've tried to report the problem on the Staden Sourceforge site but have not had a response.

    Regards
    Robert

  • #2
    Could you add a link to your SourceForge report?

    Which input file format are you giving GAP5, and how was it created (in case the error can be traced to the input data)?

    • #3
      The input was a CAF file generated by MIRA. I used tg_index to create the database from the CAF file. The CAF file came from a MIRA version 4.04 assembly of 454 Titanium reads and Illumina mate paired and paired end reads, but this afternoon I checked assemblies from MIRA versions 3.18 and 4.03, with the same result.

      The report was essentially this same message, sent by email to the Staden discussion list on SourceForge, as this appeared to be the only way I could find to submit a report about the Staden package.

      I don't think this is related to the input data. It seems to be related to reporting and "Find Read Pairs" as the reads are displayed correctly and read pairs are correct in the template status and in the editors in GAP5.

      • #4
        Originally posted by rwillows
        The report was essentially this same message, sent by email to the Staden discussion list on SourceForge, as this appeared to be the only way I could find to submit a report about the Staden package.
        Their issue tracker is here:


        Also they have a public discussion forum:


        Originally posted by rwillows
        I don't think this is related to the input data. It seems to be related to reporting and "Find Read Pairs" as the reads are displayed correctly and read pairs are correct in the template status and in the editors in GAP5.
        OK - if the read pairs are displayed correctly then it probably isn't the input data.

        • #5
          Thanks.
          I've posted a report.

          • #6
            Originally posted by rwillows
            Thanks.
            I've posted a report.
            Staden Bug 102, http://sourceforge.net/p/staden/bugs/102/
