Hello everyone, I am new to this forum and this is my first post so I hope someone can help me.
I have some 454 transcriptomic data which I am trying to analyse using Newbler mapping to the human GRCh37.61 cDNA fasta reference. I am having to run Newbler via the command line at the moment as I do not have enough RAM to launch it via Java. However, it seems to be running quite well this way so I am not too bothered about not being able to launch it via Java.
I am now exploring the command line options to try to increase the number of reads that map fully or partially. I am still getting a large number of reads classified as repeats, and I wondered if anyone had any tips on how to improve the quality of my mapping. (The default settings gave me 11% fully mapped, 6% partially mapped, 32% unmapped, 41% repeat, 6.5% chimeric and 3.5% too short.)
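As a rough sketch, the default-settings breakdown above can be tabulated in Python, which makes it easier to compare how each parameter change moves reads between categories. The dictionary layout and the `summarise` helper are illustrative names only, not anything Newbler itself produces.

```python
# Read-status percentages reported above for the default Newbler mapping run.
default_run = {
    "fully mapped": 11.0,
    "partially mapped": 6.0,
    "unmapped": 32.0,
    "repeat": 41.0,
    "chimeric": 6.5,
    "too short": 3.5,
}


def summarise(breakdown):
    """Return (total, mapped) percentages for one run's read-status breakdown.

    `summarise` is a hypothetical helper for this post, not part of Newbler.
    """
    total = sum(breakdown.values())
    mapped = breakdown["fully mapped"] + breakdown["partially mapped"]
    return total, mapped


total, mapped = summarise(default_run)
print(f"categories sum to {total:.1f}%")
print(f"fully + partially mapped: {mapped:.1f}%")

# List the categories from largest to smallest share of reads.
for status, pct in sorted(default_run.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{status:>16}: {pct:4.1f}%")
```

Adding one such dictionary per parameter setting would let the runs be compared side by side rather than from memory.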
I have tried decreasing the seed length from 16 to 10 and this greatly decreased the number of unmapped reads, but increased the number of repeats (almost 50%). I have also changed the repeat score threshold from default (12) to 0 which has improved it a bit more and has greatly increased the number of contigs generated. I am now playing with the minimum overlap length but am getting more chimeric reads.
I am really just arbitrarily changing these numbers and could sit here from now until Christmas doing this, so I wondered if anyone had any advice or tips they could give me.
Before you ask why I am not using the assembler, well I just don't think I have enough reads to get a good assembly. My dataset contains around 50,000 reads per sample. What do you think?
Any advice would be very much appreciated. Thank you in advance.
Helen