  • when do you pre-process Illumina reads before analysis?

    I have some PE Illumina reads that I want to analyze with TopHat.
    By looking at the quality plot, I see some deterioration of quality at the 3' end.

    Is it advisable to trim the reads before feeding them to TopHat? If so, what criteria do I use to decide where to trim? Do I trim all reads at the same length?

    Thanks
    PFS

  • #2
    Originally posted by PFS
    Is it advisable to trim the reads before feeding them to TopHat? If so, what criteria do I use to decide where to trim? Do I trim all reads at the same length?
    Although alignments are less likely to break than de novo assembly, I'd still recommend trimming reads (unless the alignment tool itself does it).

    Each read should be trimmed on its own merits, based on the quality score.

    Typically I use an adapter removal step, a hard trim of all 'B' quality bases from the tail, removal of N calls from both ends, and a multi-base sliding window, typically cutting off when the average score per base drops below 10-20, depending on the application.

    I also usually drop reads that fall below a certain minimum length after this process (typically something like 36 bases, to give a 40-base read a reasonable chance of survival), since shorter reads are not usually informative. This gives me both paired reads and unpaired reads, where the partner has not survived the cull.
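The per-read steps described above (tail 'B' trim, N removal from both ends, minimum length) could be sketched roughly as follows. This is only an illustration, not the poster's tool: the function name `trim_read` and its `min_len` parameter are made up here, adapter removal and the sliding window are left out, and the 'B' trim assumes the older Illumina quality encoding in which 'B' marks a low-quality tail.

```python
def trim_read(seq, qual, min_len=36):
    """Return the trimmed (seq, qual), or None if the read is too short to keep.

    Hypothetical helper: assumes seq and qual are strings of equal length.
    """
    # Hard trim: drop the run of 'B' quality characters at the 3' tail.
    keep = len(qual.rstrip("B"))
    seq, qual = seq[:keep], qual[:keep]

    # Remove N calls from both ends, keeping the quality string in step.
    left = len(seq) - len(seq.lstrip("N"))
    right = len(seq.rstrip("N"))
    seq, qual = seq[left:right], qual[left:right]

    # Discard reads that end up below the minimum length.
    if len(seq) < min_len:
        return None
    return seq, qual
```

For example, `trim_read("NACGTN", "IIIIBB", min_len=3)` first cuts the two 'B' positions, then strips the leading N, and returns the remaining three bases with their qualities.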



    • #3
      Originally posted by tonybolger
      This gives me both paired reads and unpaired reads, where the partner has not survived the cull.
      Thanks tonybolger!

      One more question: when you are left with unpaired reads, do you try to remove them or do you keep them in the analysis and maybe use SAM flags to identify them?

      THANKS
      PFS



      • #4
        Originally posted by PFS
        One more question: when you are left with unpaired reads, do you try to remove them or do you keep them in the analysis and maybe use SAM flags to identify them?
        After filtering, I have 4 fastq files per lane: forward paired, reverse paired, forward unpaired and reverse unpaired.

        The pipeline from then on generally treats the paired / unpaired data differently, e.g. with alignment tools I'd use paired mode vs single mode, but depending on the purpose, it might not make sense to use the unpaired data at all (e.g. scaffolding). On the other hand, sometimes I treat all the reads as single ended (e.g. verifying de novo assembly, where I don't want the bias of assuming the pairing is correct to force a non-optimal alignment).

        If I'm creating SAM files against a reference, I'll typically end up with 3 - one for the paired data, and one for each of the unpaired data files.
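The four-way split described above can be sketched as follows, assuming the two input files are read in lock-step so that mates arrive together. The names `split_pairs` and `trim` are hypothetical, and the outputs are plain lists standing in for the four fastq files.

```python
def split_pairs(pairs, trim):
    """Sort trimmed mates into paired and unpaired groups.

    pairs: iterable of (forward, reverse) reads, in the same order in both files
    trim:  function returning a trimmed read, or None if the read is dropped
    """
    out = {"fwd_paired": [], "rev_paired": [],
           "fwd_unpaired": [], "rev_unpaired": []}
    for fwd, rev in pairs:
        f, r = trim(fwd), trim(rev)
        if f is not None and r is not None:
            # Both mates survived: they stay in the paired outputs, in order.
            out["fwd_paired"].append(f)
            out["rev_paired"].append(r)
        elif f is not None:
            out["fwd_unpaired"].append(f)
        elif r is not None:
            out["rev_unpaired"].append(r)
        # If both mates were dropped, nothing is written.
    return out
```

Because both paired lists are appended to in the same iteration, the forward and reverse paired outputs stay in matching order, which paired-mode aligners generally expect.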



        • #5
          Originally posted by tonybolger
          After filtering, I have 4 fastq files per lane: forward paired, reverse paired, forward unpaired and reverse unpaired.

          The pipeline from then on generally treats the paired / unpaired data differently, e.g. with alignment tools I'd use paired mode vs single mode, but depending on the purpose, it might not make sense to use the unpaired data at all (e.g. scaffolding). On the other hand, sometimes I treat all the reads as single ended (e.g. verifying de novo assembly, where I don't want the bias of assuming the pairing is correct to force a non-optimal alignment).

          If I'm creating SAM files against a reference, I'll typically end up with 3 - one for the paired data, and one for each of the unpaired data files.
          Hi TonyBolger,

          Please can you tell me what software you use to do the trimming with? And did you write custom scripts to separate the paired vs unpaired into different files?

          Thanks!
          Anelda
          Last edited by Anelda; 04-01-2011, 02:37 AM. Reason: Wrong person addressed



          • #6
            Ideally we'd like to be able to leave the data alone and let the aligners use the quality values to determine how best to align the sequences. However, in practice we usually just trim off really bad sequence (where the majority of the library has dropped to somewhere close to Q0), since this means we can use more stringent parameters when mapping, which can greatly reduce the time taken to do the mapping. Fortunately, these days most runs stay at high quality past 50 bp, which is enough for the types of experiment we run.
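One way to pick a single cut position for the whole library, in the spirit of the approach above, is to look at the median quality per cycle and stop at the first cycle where it collapses. The sketch below is an assumption about how that could be done, not the poster's actual procedure: the name `uniform_trim_length`, the `min_median` threshold of 5 and the Phred+33 default offset are all illustrative (older Illumina pipelines used an offset of 64).

```python
from statistics import median


def uniform_trim_length(quals, min_median=5, offset=33):
    """Return how many leading cycles keep a median quality >= min_median.

    quals: list of quality strings, one per read (hypothetical input format).
    """
    if not quals:
        return 0
    n_cycles = max(len(q) for q in quals)
    for cycle in range(n_cycles):
        # Collect the score at this cycle from every read long enough to have one.
        scores = [ord(q[cycle]) - offset for q in quals if len(q) > cycle]
        if median(scores) < min_median:
            # At least half the reads are near Q0 here: cut everything from this cycle on.
            return cycle
    return n_cycles
```

Every read would then be truncated to the returned length, which is the "same length for all reads" style of trimming rather than trimming each read on its own merits.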



            • #7
              Originally posted by tonybolger
              Typically I use an adapter removal step, a hard trim of all 'B' quality bases from the tail, removal of N calls from both ends, and a multi-base sliding window, typically cutting off when the average score per base drops below 10-20, depending on the application.
              Can you please elaborate a little on the sliding window stage?
              What size of window do you use and do you use any existing tool to do it?
              thanks!



              • #8
                Originally posted by Anelda
                Hi TonyBolger,

                Please can you tell me what software you use to do the trimming with? And did you write custom scripts to separate the paired vs unpaired into different files?
                It's an all-in-one custom app, which I plan to make publicly available (this week if I can get the time) since many people seem to want it.

                You give it the input file(s), and a set of filtering steps, and it creates paired and unpaired output files with the appropriate trimming done.



                • #9
                  Originally posted by tonybolger
                  It's an all-in-one custom app, which I plan to make publicly available (this week if I can get the time) since many people seem to want it.

                  You give it the input file(s), and a set of filtering steps, and it creates paired and unpaired output files with the appropriate trimming done.
                  Would be great :-))



                  • #10
                    Originally posted by reut
                    Can you please elaborate a little on the sliding window stage?
                    What size of window do you use and do you use any existing tool to do it?
                    Normally I use a window width of 4 bases, and a threshold of between 10 and 20 for the average quality per base within the window. It's a custom-written tool, soon to be made publicly available.
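A sliding-window trim with those settings might look something like the sketch below. It is not the tool mentioned above, whose internals are not described in this thread: the name `sliding_window_trim`, the choice to cut at the start of the first failing window, and the Phred+33 default offset are assumptions (use `offset=64` for older Illumina encodings).

```python
def sliding_window_trim(qual, window=4, min_avg=15, offset=33):
    """Return how many leading bases to keep.

    Scans from the 5' end and cuts at the start of the first window whose
    average quality falls below min_avg.
    """
    scores = [ord(c) - offset for c in qual]
    if len(scores) < window:
        # Too short to evaluate a full window: leave the read untouched.
        return len(scores)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_avg:
            return start
    return len(scores)
```

With a quality string of six Q40 bases followed by four Q2 bases, the first window averaging below 15 starts at index 5, so five bases are kept. Cutting at the window start is a conservative choice, since it can discard a good base or two just before the drop.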



                    • #11
                      thanks

                      thanks, please let us know when you publish the tool, it will be useful for us as well.



                      • #12
                        @Tonybolger
                        Yes, such a tool would be nice to have! Thanks in advance!



                        • #13
                          Originally posted by tonybolger
                          It's an all-in-one custom app, which I plan to make publicly available (this week if I can get the time) since many people seem to want it.

                          You give it the input file(s), and a set of filtering steps, and it creates paired and unpaired output files with the appropriate trimming done.
                          Ah look, it's been almost a month already

                          Anyway, the Trimmomatic is ready for release.

                          Just one issue: does anyone know if Illumina adapter and other sequences can be included in such a tool? I assume I would need to get specific clearance for this. Otherwise each user would need to find / organise the clipping sequences themselves, which is a bit of a pain.



                          • #14
                            FastQC includes adapters

                            I don't know if you can use the Illumina adapters in your tool,
                            but I do know the FastQC tool by Simon Andrews includes a library of adapters and possible contaminants.
                            If it's of any help...



                            • #15
                              BTW, we've been using fastx for the adapter clipping, N removal and 3' trimming (no window though). It works fast and well. The only part missing, which we wrote in-house, is a pass over the files afterwards to see which pairs aren't pairs anymore.

                              Like tonybolger, when reads fall below ~30bp we discard them, so some pairs don't stay paired.

                              Our script creates 3 files: pair1, pair2 and singles.
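When the two files have been filtered independently, as described above, the mates no longer line up and have to be matched by read ID. A rough sketch of such a re-pairing pass is given below. It is not the in-house script itself: the name `repair` and the tuple layout are hypothetical, and it assumes mates share an identical ID (any `/1` or `/2` suffix already stripped) and that the reverse reads fit in memory.

```python
def repair(fwd_reads, rev_reads):
    """Split two independently filtered read lists into pair1, pair2 and singles.

    Each read is a (read_id, seq, qual) tuple; mates share the same read_id.
    """
    rev_by_id = {read[0]: read for read in rev_reads}
    pair1, pair2, singles = [], [], []
    paired_ids = set()
    for read in fwd_reads:
        mate = rev_by_id.get(read[0])
        if mate is not None:
            # Both mates survived filtering: keep them in matching order.
            pair1.append(read)
            pair2.append(mate)
            paired_ids.add(read[0])
        else:
            singles.append(read)
    # Reverse reads whose forward mate was discarded also become singles.
    singles.extend(read for read in rev_reads if read[0] not in paired_ids)
    return pair1, pair2, singles
```

For large lanes, holding one whole file in a dictionary may be too memory-hungry; since both files usually preserve the original read order, a merge-style walk over the two files would be an alternative.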
