  • Overrepresented sequences in Genomic DNA sequence data from Illumina

    Good morning everyone,
    I am new to whole-genome sequencing analysis; if there is already a thread on this type of problem, I would be grateful if you could point me to it. I am currently working on a comparative analysis of plant genome (DNA) sequences. We received paired-end sequence data from Illumina, checked the quality with FastQC, and found > 0.20% overrepresented sequences (from TruSeq adapters). I have some questions about those overrepresented sequences.
    1) Do I need to remove those overrepresented sequences from the raw genomic DNA reads before proceeding to downstream analysis?
    2) If I remove them, the paired files (R1 and R2) may end up with unequal numbers of reads. Removing the unpaired reads would then discard a large number of single reads from both files. Is there any way to use those single reads in downstream analysis, for instance when mapping to a reference genome and annotating?

    Thank you in advance.
    akashrestha

  • #2
    0.2% is not a lot.

    Whether you remove adapters depends on what you are going to do with your data; it matters more if, say, you don't have a reference genome and are going to do de novo assembly.

    Depending on how many reads/what level of coverage you have, you can leave out reads that remain unpaired after trimming. Some software may be able to use both the paired and unpaired reads (in separate files).

    I like to use Trimmomatic for removing adapters, but there are other programs.
    Trimmomatic will separate your trimmed reads into paired and unpaired.
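
To make the paired/unpaired split concrete, here is a minimal Python sketch of the bookkeeping involved. It is not Trimmomatic's code: the function name `split_pairs`, the `Read` tuple layout, and the `min_len` threshold are all hypothetical, and it assumes R1 and R2 list the same fragments in the same order, as paired FASTQ files normally do.

```python
from typing import List, Tuple

# Hypothetical record layout: (read id, sequence, quality string).
Read = Tuple[str, str, str]


def split_pairs(
    r1_reads: List[Read], r2_reads: List[Read], min_len: int = 36
) -> Tuple[List[Read], List[Read], List[Read], List[Read]]:
    """Sort already-trimmed reads into paired and unpaired sets.

    A read "survives" if its trimmed sequence is at least min_len long.
    Both mates survive  -> both go to the paired lists (order preserved).
    One mate survives   -> it goes to the matching unpaired list.
    Neither survives    -> the pair is dropped.
    """
    paired_1, paired_2 = [], []
    unpaired_1, unpaired_2 = [], []
    for read1, read2 in zip(r1_reads, r2_reads):
        keep1 = len(read1[1]) >= min_len
        keep2 = len(read2[1]) >= min_len
        if keep1 and keep2:
            paired_1.append(read1)
            paired_2.append(read2)
        elif keep1:
            unpaired_1.append(read1)
        elif keep2:
            unpaired_2.append(read2)
    return paired_1, paired_2, unpaired_1, unpaired_2


if __name__ == "__main__":
    # Toy data: pair "a" survives, "b" loses its R1 mate, "c" loses its R2 mate.
    r1 = [("a", "ACGTACGT", "IIIIIIII"), ("b", "ACG", "III"), ("c", "ACGTAC", "IIIIII")]
    r2 = [("a", "TTGCAATG", "IIIIIIII"), ("b", "TTGCAA", "IIIIII"), ("c", "TT", "II")]
    p1, p2, u1, u2 = split_pairs(r1, r2, min_len=5)
    print(len(p1), len(p2), len(u1), len(u2))
```

The two paired lists always stay the same length and in the same order, which is what downstream paired-end mappers expect; the unpaired lists can be kept in separate files or left out.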


    • #3
      Thank you mastal for your reply,

      I am going to do a comparative analysis between the sequences to identify structural variations, indels and SNPs.

      You mentioned that some software can use paired and unpaired files separately; could you please provide a link to the software?

      Thanks.


      • #4
        I was thinking of Velvet, for de novo assembly.

        Other software will have their own particular requirements.


        • #5
          I am going to align to a reference genome rather than use Velvet. So, is there any software that can use unpaired reads in addition to paired reads when mapping to a reference genome?


          • #6
            You have a whole thread on the subject of aligning paired and unpaired reads together with BWA on Biostars.


            The gist is that you are making your life unnecessarily complicated.
            Just trim with Trimmomatic, and align the remaining paired reads.

            If you absolutely want to align the few unpaired reads remaining after trimming, you can do so following the instructions in the thread posted above. The benefit is dubious, however.


            • #7
              Most mapping programs work with either paired or unpaired reads. With BBMap, for example, you would run the program twice (once for paired reads, once for unpaired reads) and merge the resulting mapped output.

              However, there is no reason to have singletons left over after adapter-trimming. Adapter-trimming paired reads should yield paired reads of the same length, since if read 1 has adapter at position X, read 2 will also have adapter at position X. If you use BBDuk for trimming as at the top of this thread, you will not end up with any singletons.
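
The "adapter at position X in both reads" point can be checked with a toy simulation. This is only a sketch of the read geometry, not how BBDuk or any other trimmer is implemented; the function names and the adapter strings below are illustrative assumptions, so substitute the adapter sequences for your own kit.

```python
# Illustrative adapter-like strings (assumptions, not taken from the thread).
ADAPTER_R1 = "AGATCGGAAGAGCACAC"
ADAPTER_R2 = "AGATCGGAAGAGCGTCG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def sequence_pair(insert: str, read_len: int) -> tuple:
    """Simulate an error-free read pair from one fragment.

    Read 1 starts at one end of the insert, read 2 at the other end on the
    opposite strand. If the insert is shorter than read_len, each read runs
    off the end of the insert and into its adapter.
    """
    read1 = (insert + ADAPTER_R1)[:read_len]
    read2 = (revcomp(insert) + ADAPTER_R2)[:read_len]
    return read1, read2


def adapter_start(read: str, adapter: str, min_match: int = 6) -> int:
    """Position where the adapter prefix begins in the read, or -1 if absent."""
    return read.find(adapter[:min_match])


if __name__ == "__main__":
    insert = "ACGTTGCAAGGCTTAGCCTA"  # 20 bp fragment, shorter than the read
    read1, read2 = sequence_pair(insert, read_len=30)
    print(adapter_start(read1, ADAPTER_R1), adapter_start(read2, ADAPTER_R2))
```

In this error-free toy case both reads hit adapter at the insert length, so trimming both at that position leaves two mates of equal length and no singleton. Real reads contain sequencing errors, and length or quality filters applied after trimming can still drop one mate.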
