  • Overrepresented sequences in Genomic DNA sequence data from Illumina

    Good morning everyone,
    I am new to whole-genome sequencing analysis; if there is already a thread on this type of problem, I would be grateful if you could point me to it. I am currently working on a comparative analysis of plant genome sequences (DNA). We received paired-end sequence data from Illumina, checked the quality with FastQC, and found > 0.20% overrepresented sequences (from TruSeq adapters). I have some questions about those overrepresented sequences.
    1) Do I need to remove those overrepresented sequences from the raw genomic DNA reads before proceeding to downstream analysis?
    2) If I remove them, the paired files (R1 and R2) may end up with unequal numbers of reads. And if I then remove the unpaired reads, I will discard a big chunk of single reads from the R1 and R2 files. Is there any way to use those single reads from both files in downstream analysis, for instance, mapping to a reference genome and annotation?

    Thank you in advance.
    akashrestha

  • #2
    0.2% is not a lot.

    Whether you remove adapters depends on what you are going to do with your data. It is more important if, say, you don't have a reference genome and you're going to do de novo assembly.

    Depending on how many reads/what level of coverage you have, you can leave out reads that remain unpaired after trimming. Some software may be able to use both the paired and unpaired reads (in separate files).

    I like to use Trimmomatic for removing adapters, but there are other programs.
    Trimmomatic will separate your trimmed reads into paired and unpaired.
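To make the paired/unpaired split concrete, here is a minimal Python sketch of the idea. It is an illustration only, not Trimmomatic's implementation: the function name `split_paired_unpaired`, the `(read_id, sequence)` record layout, and the toy reads are all assumptions made for this example.

```python
def split_paired_unpaired(r1_records, r2_records):
    """Split reads that survived trimming into pairs and orphans by read ID.

    Each record is a (read_id, sequence) tuple. Mate IDs are assumed to be
    identical in both files (no /1 or /2 suffix), which is a simplification.
    """
    r2_by_id = dict(r2_records)
    r1_ids = {read_id for read_id, _ in r1_records}

    paired, r1_unpaired = [], []
    for read_id, seq in r1_records:
        if read_id in r2_by_id:
            # Both mates survived: keep them together.
            paired.append((read_id, seq, r2_by_id[read_id]))
        else:
            # The R2 mate was discarded: R1 becomes a singleton.
            r1_unpaired.append((read_id, seq))

    # R2 reads whose R1 mate was discarded.
    r2_unpaired = [(rid, seq) for rid, seq in r2_records if rid not in r1_ids]
    return paired, r1_unpaired, r2_unpaired


# Toy data: read "b" lost its R2 mate and read "c" lost its R1 mate.
r1 = [("a", "ACGT"), ("b", "GGCC")]
r2 = [("a", "TTAA"), ("c", "CCGG")]
paired, r1_only, r2_only = split_paired_unpaired(r1, r2)
```

The paired list keeps mates in the same order, which is what paired-end mappers generally expect; the two singleton lists can be mapped separately as single-end reads or left out.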



    • #3
      Thank you mastal for your reply,

      I am going to do a comparative analysis of the sequences to identify structural variations, indels and SNPs.

      You mentioned that some software can use paired and unpaired files separately. Could you please provide a link to that software?

      Thanks.



      • #4
        I was thinking of Velvet, for de novo assembly.

        Other software will have their own particular requirements.



        • #5
          I am going to align to a reference genome rather than use Velvet. So, is there any software that can use unpaired reads in addition to paired reads when mapping to a reference genome?



          • #6
            There is a whole thread on the subject of aligning paired and unpaired reads together with BWA on Biostars.


            The gist is that you are making your life unnecessarily complicated.
            Just trim with Trimmomatic, and align the remaining paired reads.

            If you absolutely want to align the few unpaired reads remaining after trimming, you can do so following the instructions in the thread posted above. The benefit is dubious, however.



            • #7
              Most mapping programs work with either paired or unpaired reads. With BBMap, for example, you would run the program twice (once for paired reads, once for unpaired reads) and merge the resulting mapped output.

              However, there is no reason to have singletons left over after adapter-trimming. Adapter-trimming paired reads should yield paired reads of the same length, since if read 1 has adapter at position X, read 2 will also have adapter at position X. If you use BBDuk for trimming as at the top of this thread, you will not end up with any singletons.
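The "adapter at position X in both reads" argument can be checked with a small simulation. The Python sketch below is an illustration under stated assumptions, not how BBDuk works internally: the helper names are hypothetical, the adapter string is a toy stand-in used for both reads, and sequencing errors and quality trimming are ignored.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def simulate_pair(insert, adapter, read_length):
    """Simulate error-free paired reads from an adapter-flanked insert.

    R1 reads the insert forward and then runs into the adapter; R2 reads
    the reverse complement of the insert and then runs into the adapter.
    """
    r1 = (insert + adapter)[:read_length]
    r2 = (reverse_complement(insert) + adapter)[:read_length]
    return r1, r2


def adapter_start(read, adapter, min_overlap=3):
    """Index where the adapter (or a prefix of it at the read end) begins."""
    idx = read.find(adapter)
    if idx != -1:
        return idx
    # Look for a truncated adapter at the 3' end, longest prefix first.
    for k in range(min(len(adapter), len(read)) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return len(read) - k
    return None


# Toy example: a 12 bp insert sequenced with 20 bp reads.
TOY_ADAPTER = "AGATCGGAAGAGC"  # assumption: stand-in adapter for both reads
insert = "ACGTTGCAAGTC"
r1, r2 = simulate_pair(insert, TOY_ADAPTER, read_length=20)

pos1 = adapter_start(r1, TOY_ADAPTER)
pos2 = adapter_start(r2, TOY_ADAPTER)
trimmed_r1, trimmed_r2 = r1[:pos1], r2[:pos2]
```

In this idealized case both positions equal the insert length, so trimming leaves two reads of the same length and neither mate is dropped. Singletons would arise only from other steps, such as quality trimming or a minimum-length filter applied to each read independently.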
