
  • Newbler runMapping via command line

    Hello everyone, I am new to this forum and this is my first post so I hope someone can help me.

    I have some 454 transcriptomic data which I am trying to analyse using Newbler mapping to the human GRCh37.61 cDNA fasta reference. I am having to run Newbler via the command line at the moment as I do not have enough RAM to launch it via Java. However, it seems to be running quite well this way so I am not too bothered about not being able to launch it via Java.

    I am exploring the command line options to try to increase the number of reads that map fully or partially. A large number of reads are still classified as repeats, and I wondered if anyone had any tips on how to improve the quality of my mapping. (The default settings gave me 11% fully mapped, 6% partially mapped, 32% unmapped, 41% repeat, 6.5% chimeric, 3.5% too short.)

    I have tried decreasing the seed length from 16 to 10 and this greatly decreased the number of unmapped reads, but increased the number of repeats (almost 50%). I have also changed the repeat score threshold from default (12) to 0 which has improved it a bit more and has greatly increased the number of contigs generated. I am now playing with the minimum overlap length but am getting more chimeric reads.
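To compare runs like these without reading the percentages off by hand, the per-read status can be tallied with a few lines of Python. This is only a sketch: it assumes a tab-delimited status table with a header row and a column named "Read Status" (the layout I would expect in the mapping output's read status file, but check your own file), and the function name `status_percentages` is made up for this example.

```python
import csv
import io
from collections import Counter


def status_percentages(handle, status_column="Read Status"):
    """Return {status: percent of reads} from a tab-delimited status table."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter(row[status_column].strip() for row in reader)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {status: 100.0 * n / total for status, n in counts.items()}


# Tiny made-up table, only to show the call; real files are much larger.
example = io.StringIO(
    "Accno\tRead Status\n"
    "read1\tFull\n"
    "read2\tRepeat\n"
    "read3\tRepeat\n"
    "read4\tUnmapped\n"
)
for status, pct in sorted(status_percentages(example).items()):
    print(f"{status}: {pct:.1f}%")
```

For a real run, pass an open file handle instead of the in-memory example and adjust `status_column` if the header differs.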

    I am really just arbitrarily changing these numbers and could sit here from now until Christmas doing this, so I wondered if anyone had any advice or tips they could give me.

    In case you are wondering why I am not using the assembler: I don't think I have enough reads to get a good assembly. My dataset contains around 50,000 reads per sample. What do you think?

    Any advice would be very much appreciated. Thank you in advance.

    Helen

  • #2
    Reads marked 'Repeat' map equally well to multiple locations in the reference. The settings you are trying are not going to change that...

    The only thing I can think of is to have more stringent alignment requirements, so that perhaps these reads start mapping uniquely (i.e. reads from different paralogues mapping to just one of the copies). This can be done by

    - increasing the minimum overlap length (-ml). The default is 40 bases; you can raise it or, even better, use '-ml 90%' to require at least 90% of the read length to map (or try 95%).
    - increasing the minimum overlap identity (-mi). The default is 90; you could try '-mi 95' (no % sign here).

    On the other hand, you might get fewer reads mapped this way....

    Good luck anyways!
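Rather than changing the two options above by hand, a small script can list the combinations to try. This is a rough sketch, not a full command line: only `-ml` and `-mi` come from the post above, the helper name `mapping_args` and the chosen grid of values are illustrative, and the rest of the runMapping invocation (output directory, reference, read files) is deliberately left out because it depends on your setup.

```python
from itertools import product


def mapping_args(min_overlap_length, min_overlap_identity):
    """Return the two stringency options as a list of command-line tokens."""
    return ["-ml", str(min_overlap_length), "-mi", str(min_overlap_identity)]


# Values taken from the suggestions above; the grid itself is only an illustration.
overlap_lengths = ["40", "90%", "95%"]
identities = [90, 95]

for ml, mi in product(overlap_lengths, identities):
    # Prepend your own runMapping invocation (output dir, reference, reads).
    print(" ".join(mapping_args(ml, mi)))
```

Running each combination into its own output directory makes it easier to compare the fully mapped, repeat, and unmapped fractions afterwards.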


    • #3
      Thank you for replying so quickly. I have been exploring many options with Newbler mapping.

      Unfortunately, the options you suggested did not improve the number of reads mapped. However, I think I may have worked out the problem. I am using a cDNA fasta reference as I have transcriptome reads. I have had a look at some of the reads which are 'unmapped' and a quick BLAST of a couple shows these are ribosomal RNAs (and as such will not be in my cDNA fasta file).

      I wonder if anyone else has noticed this in the past? Do you know of a fasta file containing rRNA sequences that I could concatenate with my cDNA reference to maybe annotate my 'unmapped' reads?

      Thank you
      Helen
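If a suitable rRNA FASTA file is found, concatenating it with the cDNA reference could look something like the sketch below. The file names and the function `concatenate_fasta` are hypothetical; the only real logic is copying the records, making sure each file ends with a newline so records do not run together, and refusing duplicate sequence IDs, which would make the mapping results ambiguous.

```python
import os
import tempfile


def concatenate_fasta(input_paths, output_path):
    """Write all records from input_paths to output_path; return the record count.

    Raises ValueError if two records share an ID (first word of the header).
    """
    seen = set()
    with open(output_path, "w") as out:
        for path in input_paths:
            last_line = "\n"
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        parts = line[1:].split()
                        seq_id = parts[0] if parts else ""
                        if seq_id in seen:
                            raise ValueError(f"duplicate sequence ID: {seq_id}")
                        seen.add(seq_id)
                    out.write(line)
                    last_line = line
            # Guard against a file that lacks a final newline.
            if not last_line.endswith("\n"):
                out.write("\n")
    return len(seen)


# Made-up records, just to exercise the function.
with tempfile.TemporaryDirectory() as tmp:
    cdna = os.path.join(tmp, "cdna.fa")
    rrna = os.path.join(tmp, "rrna.fa")
    combined = os.path.join(tmp, "combined.fa")
    with open(cdna, "w") as f:
        f.write(">transcript1\nACGT")  # no trailing newline on purpose
    with open(rrna, "w") as f:
        f.write(">rRNA1\nGGCC\n")
    print(concatenate_fasta([cdna, rrna], combined))
```

Reads that then map to the rRNA records can be counted separately from those mapping to the cDNA records by looking at the reference ID each read was assigned to.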
