  • Draft genome scaffolding with RNAseq paired-end reads

    Hello all,

    I used TopHat to map 100 bp paired-end (PE) Illumina transcriptome reads to a draft genome (133,062 contigs).
    Our main goal was SNP mining, but it has been suggested to me that the reads could also be used for scaffolding.

    I have no experience in genome assembly and scaffolding, but I assume that if I can find read pairs where the 2 reads are mapped to different genomic contigs, the 2 genomic contigs could then be connected.

    How can I search the BAM alignments for such read pairs?

    Alternatively, I could use an assembler that can combine different types of reads, such as MIRA, but I thought it would take longer, and the genomic reads are not available anyway.

    Thank you!

  • #2
    Look for reads where the RNAME field (column 3) and the RNEXT field (column 7) differ, and RNEXT is not "=" or "*"; those are reads whose mates mapped to a different contig.
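    A minimal sketch of this test in Python, assuming plain SAM text as input (for example the output of `samtools view accepted_hits.bam`); the function name `cross_contig_pairs` is hypothetical. It also treats an RNEXT that repeats the record's own RNAME as same-contig, since some tools write the reference name instead of "=".

```python
from typing import Iterable, Iterator, Tuple


def cross_contig_pairs(sam_lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (QNAME, RNAME, RNEXT) for records whose mate maps to another contig."""
    for line in sam_lines:
        # Skip header lines ("@...") and blank lines.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, rname, rnext = fields[0], fields[2], fields[6]
        # RNAME "*" means the read itself is unplaced; RNEXT "=" means the
        # mate is on the same reference and "*" means it is unavailable.
        if rname == "*" or rnext in ("=", "*", rname):
            continue
        yield qname, rname, rnext
```

    To use it on a BAM file, one could pipe `samtools view accepted_hits.bam` into a script that calls `cross_contig_pairs(sys.stdin)` and prints each tuple.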



    • #3
      Thank you for the info Brian,
      good starting point, saved me a lot of reading and guesswork.

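      I also thought of filtering for MAPQ = 50 (which should select uniquely mapped reads in TopHat)
      and for paired reads with both mates mapped. Exact FLAG values such as 83|99|147|163 would not work here:
      all four include the proper-pair bit (0x2), which aligners typically leave unset when the mates map to different contigs.

      The following command should then extract the alignments of interest (-f 1 keeps paired reads; -F 268 drops reads that are unmapped, have an unmapped mate, or are secondary alignments):

      samtools view -q 50 -f 1 -F 268 accepted_hits.bam | gawk '$7 != "=" && $7 != "*" && $7 != $3 {print $3, $7}' > output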

      This yields a list of contigs joined by read pairs.
      However, while it is possible to determine which contigs are joined, I assume the length of the N padding between them cannot be determined.
      Not only may the intervening region not be transcribed, but the insert size of paired reads whose mate is on a different contig always appears to be 0 (at least that is what Tablet shows).
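      Because each read pair contributes two records (one per mate), a list like this contains every link twice, in both orders. Below is a minimal sketch of turning SAM records into a deduplicated link table, assuming SAM text input; the function name and the `min_mapq`/`min_support` defaults are illustrative assumptions, not values taken from any particular scaffolder. Scaffolders typically require a minimum number of supporting pairs before joining two contigs, to suppress spurious links from chimeric or mismapped reads.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# FLAG bits from the SAM specification.
PAIRED, UNMAPPED, MATE_UNMAPPED = 0x1, 0x4, 0x8
SECONDARY, SUPPLEMENTARY = 0x100, 0x800


def count_contig_links(sam_lines: Iterable[str], min_mapq: int = 50,
                       min_support: int = 2) -> Dict[Tuple[str, str], int]:
    """Count distinct read pairs linking each unordered pair of contigs.

    A pair is counted only when BOTH of its mate records pass the filters.
    """
    passing = Counter()  # (QNAME, contig pair) -> passing mate records seen
    links = Counter()    # contig pair -> number of supporting read pairs
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        f = line.rstrip("\n").split("\t")
        qname, flag, rname, mapq, rnext = f[0], int(f[1]), f[2], int(f[4]), f[6]
        if not flag & PAIRED:
            continue
        if flag & (UNMAPPED | MATE_UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        if mapq < min_mapq or rnext in ("=", "*", rname):
            continue
        # Order the contig names so that A-B and B-A are the same link.
        key = (min(rname, rnext), max(rname, rnext))
        passing[(qname, key)] += 1
        # Count the pair once, when its second mate record also passes.
        if passing[(qname, key)] == 2:
            links[key] += 1
    return {pair: n for pair, n in links.items() if n >= min_support}
```

      Requiring both mates to pass is stricter than filtering each record on its own MAPQ, which can keep a pair whose other mate is a multi-mapper.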

      Or do I have other options I'm unaware of?
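      The zero insert size is expected: the SAM specification leaves TLEN at 0 when the template length is unavailable, which is the case when mates map to different references. The library's fragment-size distribution can instead be estimated from pairs that land on the same contig. A minimal sketch follows; the function name, the MAPQ threshold, and the `max_tlen` cutoff are assumptions. TLEN is measured in reference coordinates, so a spliced pair's TLEN includes the intron; the sketch skips records whose own CIGAR contains an `N` (splice) operation but does not check the mate's CIGAR, so some spliced pairs may remain. That is one reason to report the median and median absolute deviation rather than the mean.

```python
import statistics
from typing import Iterable, Tuple

PROPER_PAIR = 0x2
# Unmapped, mate unmapped, secondary, supplementary.
EXCLUDE = 0x4 | 0x8 | 0x100 | 0x800


def insert_size_stats(sam_lines: Iterable[str], min_mapq: int = 50,
                      max_tlen: int = 10_000) -> Tuple[float, float]:
    """Return (median, MAD) of TLEN over properly paired, same-contig reads."""
    sizes = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        f = line.rstrip("\n").split("\t")
        flag, rname, mapq, cigar = int(f[1]), f[2], int(f[4]), f[5]
        rnext, tlen = f[6], int(f[8])
        if not flag & PROPER_PAIR or flag & EXCLUDE:
            continue
        if rnext not in ("=", rname) or mapq < min_mapq:
            continue
        if "N" in cigar:  # spliced alignment: TLEN would include the intron
            continue
        # Keep only the record with positive TLEN so each pair counts once.
        if 0 < tlen <= max_tlen:
            sizes.append(tlen)
    if not sizes:
        raise ValueError("no usable same-contig pairs")
    median = statistics.median(sizes)
    mad = statistics.median(abs(s - median) for s in sizes)
    return median, mad
```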



      • #4
        The insert size of reads mapped to different contigs is unknown. Scaffolding tools can use the distribution of insert sizes of pairs on the same contig, or user-supplied insert size numbers, to determine how many Ns to pad.
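        The padding arithmetic can be sketched as follows, under explicit assumptions: a forward-reverse library, both contigs already in forward orientation with contig A upstream of contig B, the forward mate aligned on A and its reverse-strand mate on B. The fragment then covers (len_A - start_A + 1) bases of A, the gap, and end_B bases of B, so gap = insert - (len_A - start_A + 1) - end_B. The function names and the `min_gap` floor are hypothetical; the insert size can come from same-contig pairs or be a user-supplied number. For RNA-seq the fragment measures transcript length, so an intron or other untranscribed sequence between the contigs makes the true genomic gap larger: the estimate is at best a lower bound.

```python
import re
import statistics
from typing import List, Tuple

# CIGAR operations that consume reference bases (SAM specification).
_REF_OPS = set("MDN=X")


def ref_end(pos: int, cigar: str) -> int:
    """1-based rightmost reference position covered by an alignment."""
    ref_len = sum(int(n) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
                  if op in _REF_OPS)
    return pos + ref_len - 1


def estimate_gap(insert_size: float, len_a: int,
                 links: List[Tuple[int, int]], min_gap: int = 1) -> int:
    """Estimate the number of Ns to place between contig A and contig B.

    Each link is (start_a, end_b): the 1-based POS of the forward mate on A,
    and the 1-based rightmost aligned base of the reverse mate on B
    (e.g. ref_end(POS, CIGAR) of that mate).
    """
    if not links:
        raise ValueError("need at least one linking pair")
    gaps = [insert_size - (len_a - start_a + 1) - end_b for start_a, end_b in links]
    # The median is robust to a few mismapped links; negative values
    # (an implied overlap) are floored at min_gap.
    return max(min_gap, round(statistics.median(gaps)))
```

        For example, a forward mate at POS 901 on a 1,000 bp contig A, whose mate ends at base 100 of contig B, implies 300 - 100 - 100 = 100 Ns for an insert size of 300.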

        This might be easier if you just use a standalone scaffolding tool. There are various tools out there, but I don't have a recommendation. Here's a paper comparing some of them:



        • #5
          Thank you again for the input,
          I'll check the paper and see if using the above filtered alignments can work.



          • #6
            Maybe this program will help:



            I have not used it myself though.



            • #7
              L_RNA_scaffolder may also help.

              I have used it with varying success.



              • #8
                Thank you all for your answers,
                I'll try some of the suggested tools, most likely those without too many dependencies.
