  • SeqPrep missing merged reads between 40-50bp - any suggestions?

    Hello, I was wondering if anyone could help me?

    I've been trying to adapter-trim and merge my dataset using SeqPrep, but when I plot the read lengths after merging, most of the reads between 40 and 50 bp are missing. I can't work out why, or whether I'm doing something wrong!

    So: read length plots resemble this:

    (read length plot; image not preserved)
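For anyone wanting to reproduce this kind of read length plot, a minimal sketch of tallying read lengths from a FASTQ file might look like the following. It assumes plain 4-line FASTQ records; the function name `read_length_counts` and the tiny in-memory example are illustrative only and are not taken from the post. For a real merged file, open it with `open(path)` (or `gzip.open(path, "rt")` if it is gzip-compressed) instead of the `io.StringIO` stand-in.

```python
from collections import Counter
import io


def read_length_counts(handle):
    """Count read lengths in a FASTQ stream with 4 lines per record."""
    counts = Counter()
    for i, line in enumerate(handle):
        if i % 4 == 1:  # second line of each record is the sequence
            counts[len(line.strip())] += 1
    return counts


# Hypothetical three-read example standing in for a merged FASTQ file.
example = io.StringIO(
    "@r1\nACGTACGT\n+\nIIIIIIII\n"
    "@r2\nACGT\n+\nIIII\n"
    "@r3\nACGTACGT\n+\nIIIIIIII\n"
)
counts = read_length_counts(example)
for length in sorted(counts):
    print(length, counts[length])
```

The resulting length-to-count table can be passed to any plotting tool to draw the histogram.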

    I'm running SeqPrep as follows:

    SeqPrep -f L120_1.qual.fastq -r L120_2_.qual.fastq -1 L120-R1.qual.unmerged.fastq -2 L120-R2.qual.unmerged.fastq -3 L120_NeutCap_2-R1.qual.discarded.fastq -4 L120_NeutCap_2-R2.qual.discarded.fastq -L 30 -q 15 -A AGATCGGAAGAGCACACGTC -B GGAAGAGCGTCGTGTAGGGA -s L120_NeutCap_2.qual.merged.fastq -E L120_NeutCap_2.qual.readable_alignment.txt -o 10

    You'll notice that the first adapter is the standard Illumina one, but the second is a modified one, missing the first 5 bp. You can see both adapters present in the file if you grep the sequences (indicated below in bold)…

    Read 1, quality trimmed, for L120_2 above:

    @HISEQ:268:C8TMGANXX:2:1101:1430:1965 1:N:0:NTCGTCGGNCGCAACG
    CAGGCACTCCCTGGAAACTCTAAGGGGCAGTTCTACTCTAGATCGGAAGA
    +
    A@B0BGGGGGGGCFGGGGGGGGGGGEGGGGGGGGGGCGG@1E@FGD/CEF
    @HISEQ:268:C8TMGANXX:2:1101:1457:1992 1:N:0:TTCGTCGGNCGCAACG
    CTAGACCGCGAATACACACAAGATCGGAAGAGCACACGTCTGAACTCCAG
    +
    33<<BGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGBGGGGGGGG
    @HISEQ:268:C8TMGANXX:2:1101:1684:1955 1:N:0:TTCGTCGGCCGCAACG
    NTGATATGTCCGGAGTGCATCGTATGGCGCTTTCAATGAATTTGAGATCG
    +
    #3<<@EGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGG
    @HISEQ:268:C8TMGANXX:2:1101:1619:1977 1:N:0:TTCGTCGGCCGCAACG
    CGGTGCCATCGAGCCTGTTCTGTCTCATAGTGACCCTAGATCGGAAGAGC
    +
    33@>@GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
    @HISEQ:268:C8TMGANXX:2:1101:1574:1983 1:N:0:TTCGTCGGCCGCAACG
    CCATCCTAGTGGGGGGAAATAGATCGGAAGAGCACACGTCTGAACTCCAA
    +
    <330<E1EFFCGGGGGFGECDGEGGFGBDCDDGEGGGGCD0DDCDG=EBC


    Read 2, quality trimmed, for L120_2 above.

    @HISEQ:268:C8TMGANXX:2:1101:1430:1965 2:N:0:NTCGTCGGNCGCAACG
    AGAGTAGAACTGCCCCNNNNAGTTTCCAGGGAGTGCCTGGGAAGAGCGTC
    +
    BB@BBGGDFGGGGGGG####==EFGDFFGGGGGGGGGGGGEGGGGGGGGF
    @HISEQ:268:C8TMGANXX:2:1101:1457:1992 2:N:0:TTCGTCGGNCGCAACG
    TGTGTGTATTCGCGGTCTATGGAAGAGCGTCGTGTAGGGAAAGAGTGTCG
    +
    CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
    @HISEQ:268:C8TMGANXX:2:1101:1684:1955 2:N:0:TTCGTCGGCCGCAACG
    CAAATTCATTGAAAGNNNNNTACGATGCACTCCGGACATATCATGGAAGA
    +
    CCCCCGGGGGGGGGG#####@=EFGGGGGGGGGGGGGGGGGGGGGGGGGG
    @HISEQ:268:C8TMGANXX:2:1101:1619:1977 2:N:0:TTCGTCGGCCGCAACG
    AGGGTCACTATGAGACAGAACAGGCTCGATGGCACCTGGAAGAGCGTCGT
    +
    CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
    @HISEQ:268:C8TMGANXX:2:1101:1574:1983 2:N:0:TTCGTCGGCCGCAACG
    ATTTCCCCCCACTAGGATGTGGAAGAGCGTCGTGTAGGGAAAGAGTGTCG
    +
    BCCCCGGGGGDGGGGGGGGGGGGGGGGGGGDGGGGGGGGGGGGGGGGGFG
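A plain grep for the full 20 bp adapter only finds reads where the whole adapter fits inside the 50 bp read, so reads that end partway through the adapter are missed. As a rough check, the sketch below looks for the adapter or a truncated prefix of it running off the 3' end of the read. This is exact matching only and is not how SeqPrep itself detects adapters; the function name `adapter_start` and the `min_overlap` parameter are assumptions made for illustration.

```python
def adapter_start(read, adapter, min_overlap=5):
    """Return the first index where the adapter begins in the read.

    A match may be the full adapter, or an adapter prefix of at least
    min_overlap bases that runs up to the 3' end of the read.
    Returns -1 if no exact match is found.
    """
    for i in range(len(read) - min_overlap + 1):
        tail = read[i:i + len(adapter)]
        # tail is shorter than the adapter only when the read ends first
        if adapter.startswith(tail):
            return i
    return -1


ADAPTER_A = "AGATCGGAAGAGCACACGTC"  # the -A adapter from the SeqPrep command above

# Two Read 1 sequences copied from the listing above.
full_hit = "CTAGACCGCGAATACACACAAGATCGGAAGAGCACACGTCTGAACTCCAG"
partial_hit = "CAGGCACTCCCTGGAAACTCTAAGGGGCAGTTCTACTCTAGATCGGAAGA"

print(adapter_start(full_hit, ADAPTER_A))     # 20: full adapter inside the read
print(adapter_start(partial_hit, ADAPTER_A))  # 39: only 11 adapter bases fit
```

The returned index is the insert length implied by an exact adapter match, so tallying it over all reads gives a distribution that can be compared with the merged read lengths.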


    The only time I've seen such a dip is when I got the adapter sequences wrong in the SeqPrep command. When I corrected them it went away. But I think the adapter sequences are correct, so I can't explain why there's a dip in the read length frequency. Is this a quirk of SeqPrep? Can anyone offer any explanation?

    I'd be very grateful for any help!
    Many thanks.

  • #2
    I should also add that the depth of this dip differs between samples (i.e. some samples have barely any reads between 40 and 50 bp, whereas others have hardly any missing). The only thing that differs between samples is the 8 bp index, found within the adapter sequence. I'm not sure how SeqPrep removes the adapter sequence, but I don't think this should affect it? Again, any thoughts welcome.
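To put a number on how the dip differs between samples, one option is to compute the fraction of merged reads whose length falls in the 40 to 50 bp window for each sample. The sketch below assumes a length-to-count table per sample has already been built; the sample names and counts are made up for illustration and are not data from this thread.

```python
def fraction_in_window(length_counts, low=40, high=50):
    """Fraction of reads with low <= length <= high (0.0 if there are no reads)."""
    total = sum(length_counts.values())
    if total == 0:
        return 0.0
    in_window = sum(n for length, n in length_counts.items() if low <= length <= high)
    return in_window / total


# Hypothetical length -> count tables for two samples.
samples = {
    "sample_A": {35: 50, 45: 10, 60: 40},
    "sample_B": {35: 40, 45: 40, 60: 20},
}
for name, length_counts in samples.items():
    print(name, round(fraction_in_window(length_counts), 3))
```

Comparing this fraction across samples, alongside each sample's index sequence, would show whether the depth of the dip tracks the index.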
