  • Newbler Warning - Primer contamination

    Hello,
    I am using Newbler 2.6 to do a de novo cDNA assembly with 454 reads. The program is giving me a warning about possible primer contamination (i.e. TGTTTTTTTTTCT). I checked the assembly and found about 250 contigs (out of 20,000 in total) containing the reported primer sequence. We used the MINT cDNA synthesis kit, and the reported sequence seems to be part of the MINT kit primers. I could use the -vt flag with runAssembly and provide a FASTA file with the primer sequences to trim the reads, but would this be correct?

    Reason for not trimming:
    > the primer is part of the mRNA and therefore should not be removed - I would lose some information

    Reasons for trimming:
    > the primer sequence could lead to incorrect assemblies
    > the primer sequence might be part of the mRNA but not the protein - this region could cause false positives in blast searches

    I know that RNA-seq data have a characteristic bias of this kind (e.g. from random hexamer primers), but I don't think anybody trims the reads because of it. Alternatively, I could assemble the reads without trimming and then remove the contigs in which the primer sequence is not at the end.
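
    The contig-filtering idea above could be sketched roughly as follows. This is only an illustration, not part of Newbler: the 50 bp end window and the function names are assumptions, and the motif is simply the one quoted in the warning, searched on both strands.

```python
# Sketch: classify contigs by where the reported primer motif occurs.
PRIMER = "TGTTTTTTTTTCT"  # motif quoted in the Newbler warning
END_WINDOW = 50  # assumption: a hit touching the first/last 50 bp counts as terminal


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_hits(seq, motif=PRIMER):
    """Return sorted start positions of the motif or its reverse complement."""
    seq = seq.upper()
    hits = []
    for pattern in {motif, reverse_complement(motif)}:
        start = seq.find(pattern)
        while start != -1:
            hits.append(start)
            start = seq.find(pattern, start + 1)
    return sorted(hits)


def classify(seq, motif=PRIMER, window=END_WINDOW):
    """Return 'none', 'terminal' or 'internal' for one contig sequence."""
    hits = find_hits(seq, motif)
    if not hits:
        return "none"
    for start in hits:
        end = start + len(motif)
        # A hit lying entirely outside both end windows is treated as internal.
        if start >= window and end <= len(seq) - window:
            return "internal"
    return "terminal"


def report(path):
    """Print one 'name<TAB>class' line per contig in a FASTA file."""
    with open(path) as handle:
        for name, seq in read_fasta(handle):
            print(name, classify(seq), sep="\t")
```

    Calling report() on the contig FASTA file of the assembly would list which contigs carry the motif internally; those are the candidates for removal or closer inspection.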

    Is anybody willing to share their thoughts or experience on this? I would appreciate your help. Thanks!

  • #2
    I had something similar, and got better assemblies removing the MINT primer.


    • #3
      Dear maubp,

      thanks for the answer. I was wondering whether you trimmed the reads or the contigs? Did you use the -v option or a different program for the read trimming? Would you mind specifying what you mean by "better assembly"? How did you assess the quality improvement?
      Last edited by loba17; 06-14-2012, 02:22 AM.


      • #4
        Originally posted by maubp
        I had something similar, and got better assemblies removing the MINT primer.
        Same here: we had a lot of EST libraries created with the MINT system, and we always removed the primer sequences (as I would do for any other library as well).


        • #5
          I think I tried both the -v option (for Newbler) and trimming the reads (for Newbler and MIRA).

          Without trimming the reads I got some very strange coverage patterns, where a MINT adapter at one end of a contig was heavily over-represented. For a MIRA EST example, see Figure 5 in Milne et al. 2012: http://dx.doi.org/10.1093/bib/bbs012


          • #6
            Dear sklages, dear maubp,

            thanks for your help.

            I have been reading more about the trimming step. It seems that the -vt flag is the best way to proceed. People also recommend using the -vs flag to remove rRNA sequences. For this part I could download the RNAmmer FASTA file from the CBS website and use it with the -vs flag. This would, however, only cover prokaryotes. Any suggestions for eukaryotic rRNA sequences? I could get the ribosomal sequences from NCBI, but I guess the file would be rather large.

            In addition, I found references recommending the -urt flag, but this seems to be controversial. I tried it and it produced three times as many contigs as before. I therefore think it is best not to use it, at least for my assembly. Are there other reasons (not) to use it?
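
            To make the flag combinations discussed here explicit, the following sketch only builds the runAssembly command line (it does not run it). The -vt, -vs and -urt flags are the ones named in this thread; -cdna and -o, the helper name and all file names are assumptions that should be checked against the Newbler manual for your version.

```python
import shlex


def build_run_assembly(reads, out_dir, trim_fasta=None, screen_fasta=None,
                       use_read_tips=False):
    """Return the runAssembly argument list for one cDNA assembly run.

    Hypothetical helper: it only assembles the argument list and does not
    check that the files exist or that Newbler is installed.
    """
    cmd = ["runAssembly", "-cdna", "-o", out_dir]  # -cdna / -o: assumed options
    if trim_fasta:
        cmd += ["-vt", trim_fasta]    # primer/adapter sequences to trim from reads
    if screen_fasta:
        cmd += ["-vs", screen_fasta]  # sequences to screen against (e.g. rRNA)
    if use_read_tips:
        cmd.append("-urt")            # the flag debated in this thread
    cmd.append(reads)
    return cmd


if __name__ == "__main__":
    # Placeholder file names; print the command for inspection before running it.
    cmd = build_run_assembly("reads.sff", "asm_trimmed",
                             trim_fasta="mint_primers.fasta",
                             screen_fasta="rRNA.fasta")
    print(shlex.join(cmd))
```

            Building the same command once with and once without use_read_tips=True, into two different output directories, gives two runs that can be compared directly.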


            • #7
              Regarding the '-urt' flag: it is supposed to give more complete transcripts. It could be that there are more contigs per isogroup ('gene'), but you would have to check that. So the flag may actually be useful; I would compare the results with and without it (not just the contig number) to make sure.
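
              As a starting point for such a comparison, here is a minimal sketch that summarises a contig FASTA file by more than just the contig count (total bases, longest contig, N50). The function names are illustrative assumptions; counting contigs per isogroup would additionally need Newbler's isotig output and is not covered here.

```python
def fasta_lengths(lines):
    """Return the sequence length of each record in an iterable of FASTA lines."""
    lengths = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            lengths.append(0)          # start a new record
        elif line and lengths:
            lengths[-1] += len(line)   # extend the current record
    return lengths


def n50(lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


def summarize(lengths):
    """Collect a few simple assembly statistics in a dict."""
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }


def summarize_file(path):
    """Summarise one contig FASTA file, e.g. one per assembly run."""
    with open(path) as handle:
        return summarize(fasta_lengths(handle))
```

              Running summarize_file() on the contig files from both runs shows whether the extra contigs are mostly short fragments or whether the total assembled sequence and N50 changed as well.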
