  • Unused reads in SOAPdenovo

    Hi all,

    I want to extract the reads that were not used in the assembly process of SOAPdenovo.
    I did not find a straightforward way of doing so. I thought about mapping the reads back to the assembly and taking the unmapped reads. But this seems like a simple output that SOAPdenovo could provide, and I'm not sure the unmapped reads from re-mapping are the same as the reads that were not used for the assembly.

    Does anyone know a better way of getting the unused reads?

    Thanks,
    Rachelly.

  • #2
    Originally posted by Rachelly
    Does anyone know a better way of getting the unused reads?
    There usually isn't any nice way in De Bruijn assemblers - they don't generally track where the k-mers from the reads ended up.

    Unmapped reads from the re-mapping are your best bet. I think the SOAP 'map' step already produces such an output.
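As a minimal sketch of the re-mapping approach, assuming the reads were aligned back to the contigs and the result is available as SAM text: unmapped reads are those whose FLAG field has the 0x4 bit set. The function name `unmapped_read_names` and the three-record SAM below are made up for illustration and are not output from any real run.

```python
import io

def unmapped_read_names(sam_lines):
    """Yield names of reads whose SAM FLAG has the 0x4 (segment unmapped) bit set."""
    for line in sam_lines:
        if line.startswith("@"):        # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:            # SAM records have 11 mandatory fields
            continue
        name, flag = fields[0], int(fields[1])
        if flag & 0x4:                  # bit 0x4: this read is unmapped
            yield name

# Hypothetical toy input: one mapped read, one unmapped single read,
# and one unmapped first-in-pair read (flag 77 = 1 + 4 + 8 + 64).
example_sam = io.StringIO(
    "@SQ\tSN:contig1\tLN:1000\n"
    "read1\t0\tcontig1\t10\t60\t5M\t*\t0\t0\tACGTA\tIIIII\n"
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tTTTTT\tIIIII\n"
    "read3\t77\t*\t0\t0\t*\t=\t0\t0\tGGGGG\tIIIII\n"
)

unmapped = list(unmapped_read_names(example_sam))
print(unmapped)
```

The resulting names could then be used to pull the corresponding records out of the original FASTQ files. Note that this yields reads the aligner could not place, which is not necessarily the same set as the reads the assembler did not use.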


    • #3
      Thanks for your reply tonybloger.
      SOAPdenovo doesn't supply useful data from the "map" step; there is no way to know which reads the indices refer to.

      So I re-mapped the reads to the assembly, but the number of reads that mapped back was totally different from what SOAPdenovo states in the log file. It seems that only about 1/3 of the reads could be re-mapped to the assembly with BWA or Bowtie, while SOAPdenovo reported over 94% mapping!

      SOAPdenovo states:
      Code:
      15646393 out of 16551980 (94.5)% reads mapped to contigs
      While mapping with Bowtie gives:
      Code:
      # reads processed: 8275990
      # reads with at least one reported alignment: 862689 (10.42%)
      # reads that failed to align: 7413301 (89.58%)
      Reported 862689 paired-end alignments to 1 output stream(s)
      And BWA:
      Code:
      16551980 + 0 in total (QC-passed reads + QC-failed reads)
      0 + 0 duplicates
      6108971 + 0 mapped (36.91%:nan%)
      16551980 + 0 paired in sequencing
      8275990 + 0 read1
      8275990 + 0 read2
      3906364 + 0 properly paired (23.60%:nan%)
      4618869 + 0 with itself and mate mapped
      1490102 + 0 singletons (9.00%:nan%)
      1192368 + 0 with mate mapped to a different chr
      1188458 + 0 with mate mapped to a different chr (mapQ>=5)
      I tried to map only one end of the reads to the assembly, to see if the problem has to do with the insert size or pairing and got similar results.
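A quick arithmetic check of the figures quoted above, written as a small sketch; the variable names are made up, and the numbers are copied from the three outputs. It also makes visible that the Bowtie percentage is computed over read pairs (8275990, half of 16551980), while the SOAPdenovo and BWA percentages are computed over individual reads, so the Bowtie figure is not directly comparable to the other two.

```python
# Numbers copied from the SOAPdenovo log, the Bowtie summary and the BWA flagstat output.
total_reads = 16_551_980
soap_mapped = 15_646_393
bowtie_pairs, bowtie_aligned_pairs = 8_275_990, 862_689
bwa_mapped = 6_108_971
bwa_both_mapped, bwa_singletons = 4_618_869, 1_490_102

def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)

soap_pct = pct(soap_mapped, total_reads)              # per read
bowtie_pct = pct(bowtie_aligned_pairs, bowtie_pairs)  # per read PAIR
bwa_pct = pct(bwa_mapped, total_reads)                # per read

# Bowtie's "reads processed" is exactly half the read count, i.e. it counts pairs.
bowtie_counts_pairs = bowtie_pairs * 2 == total_reads
# BWA's mapped total is the sum of "with itself and mate mapped" and "singletons".
bwa_total_consistent = bwa_both_mapped + bwa_singletons == bwa_mapped

print(soap_pct, bowtie_pct, bwa_pct, bowtie_counts_pairs, bwa_total_consistent)
```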

      Does anyone know why there is such a big difference between the mapping reported by SOAPdenovo and the after-assembly mapping?

      Thanks!
      Rachelly.


      • #4
        I also have this problem.

        According to the SOAPdenovo log, about 90% of the reads align to the contigs.

        However, when I use Bowtie to align the raw reads to the contig file, only about 1/3 of the reads align.

        I also don't understand why there is so much difference between the SOAPdenovo log and the after-assembly mapping.

        Thanks!

        Jingjing


        • #5
          I am going to try this soon! Is it possible that the contig sequences referred to in the log file differ from those in the actual output (.contig)?


          • #6
            The contig file will differ from the final scafSeq file, but only because you've done scaffolding, some error correction, and probably gap filling as well. So, all else being equal, you should see more reads, or at least a similar number, map to the final SOAPdenovo output than to the contig file. However, your after-assembly mapping might not use the same mechanism as the SOAP map step. When using bwa or bowtie, you may need to loosen the alignment parameters to obtain the same level of mapping.


            • #7
              Thanks to Wallysb01!
              If, as you say, SOAPdenovo's mapping mechanism differs from the ones used by bwa or bowtie, which parameters should be changed in bwa/bowtie? It is hard for me to come up with an equivalent parameter set, because I do not know what rules SOAPdenovo uses for mapping.
              Thanks!


              • #8
                Originally posted by Rachelly
                Does anyone know why is there such a big difference between the mapping of SOAPdenovo and after-assembly-mapping?
                I have found this to be more common with smaller values of K. It could be that SOAPdenovo counts a read as "used" if even a single k-mer from that read is used, whereas read alignment algorithms require the entire read to be aligned.
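The hypothesis above can be illustrated with a toy sketch. This is not SOAPdenovo's actual mapping rule, which the thread does not establish; the functions `shares_kmer` and `aligns_end_to_end`, the sequences, and the mismatch limit are all made up for illustration, and reverse-complement matches are ignored for brevity.

```python
def kmers(seq, k):
    """Return the set of all k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shares_kmer(read, contig, k):
    """Hypothetical 'used' criterion: the read shares at least one k-mer with the contig."""
    return bool(kmers(read, k) & kmers(contig, k))

def aligns_end_to_end(read, contig, max_mismatches=0):
    """Naive ungapped end-to-end alignment: the whole read must match some
    window of the contig with at most max_mismatches mismatches."""
    n = len(read)
    for start in range(len(contig) - n + 1):
        window = contig[start:start + n]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            return True
    return False

contig = "ACGTACGTTAGCCGATAGGCTA"
good_read = "CGTTAGCCGATA"              # exact substring of the contig
partial_read = "CGTTAGCC" + "TTTTTTTT"  # first half matches, second half does not

for name, read in [("good_read", good_read), ("partial_read", partial_read)]:
    print(name,
          "shares k-mer:", shares_kmer(read, contig, k=5),
          "aligns end-to-end:", aligns_end_to_end(read, contig, max_mismatches=2))
```

Under this toy model, a read that only partly matches still counts as "used" by the k-mer criterion but fails a full-length alignment, and the smaller K is, the easier it is for such a read to share a k-mer by chance.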
