poly-G in NextSeq


  • poly-G in NextSeq

    Hi,
    I just received NextSeq paired-end results (45 bp first read and 40 bp second read) and I noticed (using FastQC) that about 1-2% of the second reads are poly-G. I know that G has no "color", so it probably means that these spots were not detected in the paired read, but what is the cause of that? Is it common to get this number of failing paired reads? Has anyone run into this before?
    Thanks
    By the way, the first read also contains poly-G but for very few reads.

  • #2
    Hi Asaf

    I am also noticing this in our datasets. This is my first time analysing data from NextSeq, and FastQC says that there are overrepresented poly-G sequences in Read 2.

    Did you figure out what was going on?


    • #3
      I emailed Illumina's representatives here in Israel but didn't get an answer. I think that the explanation I gave above is reasonable (maybe low efficiency of RT in the cluster?). With v.2 chemistry we had better results but we only ran 1 sample so I can't tell for sure.
      What I do is remove reads that are more than 80% G and/or use a DUST filter to remove low-complexity reads. Beware that besides pure poly-G reads you'll probably also have poly-G reads with a few other nucleotides scattered through the sequence (which might even map to the genome); this is why I remove them before mapping.
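
The 80% G filter described above could be sketched roughly as follows. This is only an illustration, not the script actually used: the function names (`g_fraction`, `read_fastq`, `filter_pairs`) are made up for this example, and dropping the whole pair when either mate is G-rich is an assumption made so that the two files stay in sync.

```python
def g_fraction(seq):
    """Return the fraction of bases in seq that are G (0.0 for an empty read)."""
    if not seq:
        return 0.0
    return seq.upper().count("G") / len(seq)


def read_fastq(lines):
    """Group an iterable of lines into (header, sequence, separator, quality) tuples.

    Assumes plain 4-line FASTQ records; trailing incomplete records are dropped.
    """
    stripped = (line.rstrip("\n") for line in lines)
    # Passing the same generator four times makes zip consume 4 lines per record.
    return zip(stripped, stripped, stripped, stripped)


def filter_pairs(r1_records, r2_records, max_g=0.8):
    """Yield (read1, read2) pairs in which neither mate is more than max_g G.

    Assumption: a pair is discarded as a whole if either mate fails.
    """
    for rec1, rec2 in zip(r1_records, r2_records):
        if g_fraction(rec1[1]) > max_g or g_fraction(rec2[1]) > max_g:
            continue
        yield rec1, rec2
```

With real data you would pass two open file handles (for example from `open()` or `gzip.open(..., "rt")`) to `read_fastq` and write the surviving pairs back out. A read with a few non-G bases scattered through it can still fall under the 80% threshold, which is why a low-complexity filter such as DUST is useful in addition.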


      • #4
        A tool for this is available on GitHub

        There is a tool available on GitHub for removing PolyA, PolyT, PolyC and PolyG reads:

        https://github.com/OpenGene/after

        AFTER: Automatic Filtering, Trimming, and Error Removing for fastq data.
        Currently it supports the Illumina 1.8 or newer format.
        AFTER goes through all the fastq files in a folder and writes a good folder and a bad folder, which contain the good reads and bad reads of each fastq file.

        Besides removing PolyX, it can also:
        Trim reads at the front and tail according to bad per-base sequence content
        Detect and eliminate bubble artifacts caused by fluid dynamics issues in the sequencer
        Filter low-quality reads
        Last edited by [email protected]; 12-10-2015, 12:50 AM.
        OpenGene(Libraries and tools for NGS data analysis),AfterQC(Fastq Filtering and QC)
        FusionDirect.jl( Detect gene fusion), SeqMaker.jl(Next Generation Sequencing simulation)


        • #5
          Use AFTER to do filtering

          AFTER works well with nextseq500 data
          Last edited by [email protected]; 08-05-2015, 12:17 AM. Reason: duplicate


          • #6
            I have noticed the same thing with NextSeq data. Mostly poly-G, but some other homopolymers as well (even poly-N). I tried this tool After to remove these reads, but it doesn't seem to work. What other program can work with paired-end reads and remove poly-X reads?


            • #7
              Originally posted by Holinder View Post
              I have noticed the same thing with NextSeq data. Mostly poly-G, but some other homopolymers as well (even poly-N). I tried this tool After to remove these reads, but it doesn't seem to work. What other program can work with paired-end reads and remove poly-X reads?
              What error did you get when using AFTER? Let me know and I will help you fix it.


              • #8
                With default settings it marked almost all the reads as bad. And good reads had a minimum length of 24 bp, however the default should have been 35 bp.


                • #9
                  Originally posted by Holinder View Post
                  With default settings it marked almost all the reads as bad. And good reads had a minimum length of 24 bp, however the default should have been 35 bp.
                  cd to the folder that contains your fastq files, and try running:

                  Code:
                  python after.py -f0 -t0 -s24
                  -f0 means no trimming in the front
                  -t0 means no trimming in the tail
                  -s24 means set the min read length to 24 bp


                  • #10
                    And because your reads are extremely short, you should also set the following parameters:

                    -p POLY_SIZE_LIMIT, --poly_size_limit=POLY_SIZE_LIMIT
                    if exists one polyX(polyG means GGGGGGGGG...), and its length is >= POLY_SIZE_LIMIT, then this read/pair is bad. Default is 40
                    -a ALLOW_MISMATCH_IN_POLY, --allow_mismatch_in_poly=ALLOW_MISMATCH_IN_POLY
                    the count of allowed mismatches when evaluating poly_X. Default 5 means disallow any mismatches

                    The following options may work:

                    python after.py -f0 -t0 -s24 -p15 -a2

                    That means any read containing a 15 bp polyX stretch with no more than 2 other bases in it will be discarded.

                    For example:
                    ******AAAAAAAAAATACAA****** will be treated as BAD
                    ******AAACAAAAAATACAA****** will be treated as GOOD
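
To make the rule above concrete, here is a rough sliding-window sketch of that kind of check. It is not AFTER's actual code: the function name `has_poly_x` and its argument names are invented for this example, and the default values simply mirror the `-p15 -a2` command above rather than the tool's own defaults.

```python
def has_poly_x(seq, poly_size_limit=15, allow_mismatch=2, bases="ACGT"):
    """Return True if some window of poly_size_limit bases is a single
    repeated base with at most allow_mismatch other bases in it.

    Hypothetical helper for illustration only.
    """
    seq = seq.upper()
    if len(seq) < poly_size_limit:
        return False
    for base in bases:
        # Count the non-matching bases in the first window.
        mismatches = sum(1 for c in seq[:poly_size_limit] if c != base)
        if mismatches <= allow_mismatch:
            return True
        # Slide the window one base at a time, updating the count.
        for end in range(poly_size_limit, len(seq)):
            if seq[end] != base:
                mismatches += 1
            if seq[end - poly_size_limit] != base:
                mismatches -= 1
            if mismatches <= allow_mismatch:
                return True
    return False
```

Applied to the two 15 bp stretches in the example, `AAAAAAAAAATACAA` has 2 non-A bases and is flagged, while `AAACAAAAAATACAA` has 3 and is not.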
                    Last edited by [email protected]; 12-10-2015, 05:14 PM.