
  • #16
    Here is the histogram:



    So the aberrant inserts of approx 300bp are clearly caused by an error in how Bowtie classifies insert length.

    I think this may affect those inserts in the 280-300bp range whose adapters were not trimmed in the FASTQ. What do you think?

    If so, it would explain why there is a dip in apparent insert lengths in the 280-300bp range, and the anomalous peak at 300bp.

    Comment


    • #17
      Since you now have BBMap installed, you can easily check with BBDuk to see how many reads still have adapter contamination. I assume you have not done any trimming on these reads. Specify the appropriate adapter file when you trim. Standard Illumina adapter files are in "/path_to/bbmap-xx.xx/bbmap/resources/".
      Last edited by GenoMax; 02-05-2015, 06:11 AM.

      Comment


      • #18
        Trimming was performed automatically by Illumina prior to FASTQ download, but I suspect there will be short tails left on a subset of fragments.

        Comment


        • #19
          No harm in trying a pass through BBDuk to verify (especially if you want to get rid of those tails).

          Comment


          • #20
            Initial:
            Memory: free=1039m, used=21m

            Added 16767 kmers; time: 0.225 seconds.
            bbduk output:

            Memory: free=1032m, used=28m

            Input is being processed as paired
            Started output streams: 0.010 seconds.
            Processing time: 14.997 seconds.

            Input: 2546118 reads 645049609 bases.
            KTrimmed: 2543559 reads (99.90%) 633504163 bases (98.21%)
            Result: 467188 reads (18.35%) 11545446 bases (1.79%)

            Time: 15.238 seconds.
            Reads Processed: 2546k 167.09k reads/sec
            Bases Processed: 645m 42.33m bases/sec

            Comment


            • #21
              Hmmm... I don't think I have bbduk working correctly.

              Original Fastq file:

              @M00561:19:000000000-ABAUW:1:1101:10946:1435 1:N:0:9
              TCTCCCTTTTATCTTTACATACTGTCGTTCATTATCCTCTTATCTTATCAAACCTTGCTTTTCATCTTTCTTTTTTTTTTTTTTCTTTTCTCTTTCTTTTCTCCTTTCTACCCTCTATTTTGTTTTCTTTTTTTCTTCAGTTTTTCATCTTTTATTATTCTCTTTTATCTCTGTTTGGATAGCGCCTGACTGAAGTTATAACTTCGGTCATGTTATCTGT
              +
              CBCCCFFFFE9FAFFG9@,C,C@,CC8FF<8CC,CC@F,CC,C,;C,C@,,,;6CE,6,<CC@,,;<CC@,<C,,+78@++7++6<C,<6<,,,<<C?,,5,,,,,<,,,9,,:9,,,9,,,::,:@???<=++8,88,,,,:<8,,,5,,,:,,:,,,5,,,::A,,575,,:8,,+,,,,3+38++,8>,,,,,,,,,,7>@=**66*,,77,@@>,,
              @M00561:19:000000000-ABAUW:1:1101:13213:1452 1:N:0:9
              CCGTATTACCTGCCGCATCATTGTGAGTTGAAGATACATGTGCGGTTGATTTTATCTGGCTAGGCTACGTATTTCTATTTTTTTTCTCCCTTTCTTTTCCTTTTTTCTTTTTTTTTTCTTTTTTCTTTTCTTCTTCTGTTTTTTTTTATTTTTTCTTTTCTTACTCTTTTACTCCAGTGCCTTCAGATGTCTTTTTCTTCGCATTTTCCATTCTTTTTTATTTTTCCTATTTCCTTTATTTCCTCTACCCATCGTTAATATCACTTTCGTTTACTTTCACGTTAGGTTATACCG
              +
              8@<A9EF-C@C<AC+@+B@,CF9C9,,CF,,,,,C,C,C,<,6,+BF+,ECEF,C@E,,;C,,,6,,;,,,,,,69,<,9,,8++:96,::,,6<CC,969,,9,+4,,,<9+++4+4<,,9:@4,<8<8<,,,,5,,<,8,:6++8,::B,,+5,:,:7>,,7,7@,,7,7@>3,,7,77,77,,,7,,7@@<DB>,,6***66@,42,66,6,6,,3*5++5+35++++++3++5+5+5+3+54+2+*+30**2**+**0*2::C***09*2*2*1***0*)))19*2)/*)
              @M00561:19:000000000-ABAUW:1:1101:15428:1464 1:N:0:9
              CACCACCTCTTTTCATGGTACCATTTGCACGCTCCAAACTTGCATAGTGACCTTTTTCGATTAATTGACCAAAGTCAACATTATAACCGTCCTTTTTTTCATCCCTCTTTTCCTCCTCTCTCTTCTCCTCCTCTCCCATCCCCCTATTCAACAGGCCATCTCGTTTTCCTTCTTCTTCTTTTTAACCAATATTTTCTTTTCTTTTCCTTCTTCTTTTTTCCTTTTCCTTTTTTTTTTTTCTCTCTTTTCTCTTCTTTTTTTCCCTCTTTTCTCCTTTTTTTCTTTCTTTTTTCTTCTTT
              +
              8-BB9FFGFFGGGG9F9-C<EE9FFFACAF,@@@@<89DFE9F9F,<C69CCFFEFFE+,<C,,<,,,<;,,,,<,,,9,,,,,,,<,+88@@C,,:++9,,<,:,9:,<<66994=,<:5:9C,<,5,4,94994,,,9:4+8++,,:,,,,,,4+,448:,:+6+:7,:8???@,7A<,,,,,,,,+,,,,,,,,,,,,33,,,,,,,,,,,,,,,+,,,,,,,,,,,,,*******,1++++++++++++++++++))+**+**0*0***0**/**)))********)))****-*
              @M00561:19:000000000-ABAUW:1:1101:12573:1467 1:N:0:9
              CAGCTTATCACCCCGGAATTGGTTTATCCGGAGATGGGGTCTTATGGCTGGAAGAGGCCAGCACCTTTTCTCCCTCCTTTTCTCTTCTGCCGGCCCTTTATATTCCACTCGTATTTTTTGTTTTCTTTCCCTTTCTTACTTTTAACCTCTTCTTGTCTCCTATGTGACCAGCCTCTATTTTTTATTATAATTTTGATAACGTTTGTCTGCTCTTTATCTCCTTCACTTCTTGTTACCTATTTTCTCTCTTCTTCGTGTTTTTAGTGCCTTGGTCTGCCGCAGCGGGCGTGCTTGTTGAC

              Cleaned FASTQ:


              @M00561:19:000000000-ABAUW:1:1101:10946:1435 1:N:0:9
              TCTCCCTTTTATCTT
              +
              CBCCCFFFFE9FAFF
              @M00561:19:000000000-ABAUW:1:1101:13609:1492 1:N:0:9
              TTGTAAAGCATCG
              +
              BC<CCCD<F9FF>
              @M00561:19:000000000-ABAUW:1:1101:17917:1554 1:N:0:9
              TGCTGGACCTGTG
              +
              6-AAB9EFGG8F,
              @M00561:19:000000000-ABAUW:1:1101:10142:1572 1:N:0:9
              TTACTGGCGTCCTTGCTTTCTCCTTC


              It appears to have truncated all my reads by a massive amount.
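A quick way to confirm this kind of over-trimming is to summarize read lengths before and after cleaning. The following is a minimal sketch, assuming standard 4-line FASTQ records; the function names and the shortened read names (`read1` to `read3`) are illustrative only, and the sequences are the first three complete records from the cleaned output above.

```python
from statistics import median

def read_lengths(fastq_text):
    """Return the sequence length of every record in 4-line FASTQ text."""
    lines = [ln.strip() for ln in fastq_text.strip().splitlines()]
    # In 4-line FASTQ the sequence is the 2nd line of each record, so step
    # by 4 instead of searching for '@' (quality lines may also start with '@').
    return [len(lines[i]) for i in range(1, len(lines), 4)]

def summarize(lengths):
    """Collect a few summary statistics for a list of read lengths."""
    return {
        "n": len(lengths),
        "min": min(lengths),
        "median": median(lengths),
        "max": max(lengths),
    }

# First three records of the cleaned FASTQ (read names shortened for the example).
cleaned = """@read1
TCTCCCTTTTATCTT
+
CBCCCFFFFE9FAFF
@read2
TTGTAAAGCATCG
+
BC<CCCD<F9FF>
@read3
TGCTGGACCTGTG
+
6-AAB9EFGG8F,
"""

print(summarize(read_lengths(cleaned)))
```

For a 2x300bp run, a median length this small after adapter trimming suggests a parameter problem, not genuine adapter contamination.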

              Comment


              • #22
                Yikes!

                Can you post your command line for BBDuk? You are trimming the original files, correct (not the merged one)?

                Comment


                • #23
                  Sorry, my fault. I wrote:

                  k=28 k=12

                  instead of

                  k=28 mink=12

                  New Output:


                  Memory: free=1041m, used=19m

                  Added 126482 kmers; time: 0.091 seconds.
                  Memory: free=1032m, used=28m

                  Input is being processed as paired
                  Started output streams: 0.008 seconds.
                  Processing time: 69.585 seconds.

                  Input: 2546118 reads 645049609 bases.
                  KTrimmed: 11853 reads (0.47%) 597009 bases (0.09%)
                  Result: 2546034 reads (100.00%) 644452600 bases (99.91%)

                  Time: 69.691 seconds.
                  Reads Processed: 2546k 36.53k reads/sec
                  Bases Processed: 645m 9.26m bases/sec
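The two runs can be told apart from the summary lines alone. Below is a rough sketch that pulls the read and base counts out of the `Input`, `KTrimmed` and `Result` lines and computes the fraction of bases removed. The regular expression is written only against the log format shown in this thread, and the function names are hypothetical.

```python
import re

# Matches lines such as:
#   KTrimmed: 11853 reads (0.47%) 597009 bases (0.09%)
STAT = re.compile(r"^\s*(Input|KTrimmed|Result):\s+(\d+) reads.*?(\d+) bases", re.M)

def parse_bbduk_stats(log_text):
    """Return {label: (reads, bases)} for the summary lines found in the log."""
    return {name: (int(reads), int(bases))
            for name, reads, bases in STAT.findall(log_text)}

def fraction_bases_removed(stats):
    """Fraction of the input bases that were k-trimmed away."""
    return stats["KTrimmed"][1] / stats["Input"][1]

first_run = """
Input: 2546118 reads 645049609 bases.
KTrimmed: 2543559 reads (99.90%) 633504163 bases (98.21%)
Result: 467188 reads (18.35%) 11545446 bases (1.79%)
"""

second_run = """
Input: 2546118 reads 645049609 bases.
KTrimmed: 11853 reads (0.47%) 597009 bases (0.09%)
Result: 2546034 reads (100.00%) 644452600 bases (99.91%)
"""

for label, log in (("first run", first_run), ("second run", second_run)):
    frac = fraction_bases_removed(parse_bbduk_stats(log))
    print(f"{label}: {frac:.2%} of bases removed")
```

Losing almost all bases, as in the first run, is a strong hint that the trimming parameters are wrong.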

                  Comment


                  • #24
                    Much better. So a few had some adapters left over.

                    Comment


                    • #25
                      If I understand it correctly, adapter fragments shorter than 12 bases will still be left even after this cleaning process. Should I specify a smaller value for mink than 12 to deal with this?

                      Or perhaps there is a different tool for trimming based upon the degree of overlap between read pairs?

                      i.e. IF overlap is less than read length THEN truncate reads to overlap length.
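The rule described above can be sketched in a few lines. This is only an illustration of the idea, not how BBDuk or BBMerge implement it: it assumes both reads start at opposite ends of the insert, so when the insert is shorter than the read length, the first `insert` bases of read 1 are the reverse complement of the first `insert` bases of read 2. The function names, `min_insert` and `max_mismatch_rate` are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def trim_by_overlap(read1, read2, min_insert=8, max_mismatch_rate=0.1):
    """If the pair implies an insert shorter than the reads, truncate both
    reads to the insert length; otherwise return them unchanged."""
    n = min(len(read1), len(read2))
    # Try candidate insert sizes from longest to shortest.
    for insert in range(n - 1, min_insert - 1, -1):
        a = read1[:insert]
        b = revcomp(read2[:insert])
        if mismatches(a, b) <= max_mismatch_rate * insert:
            return read1[:insert], read2[:insert]
    return read1, read2

# Toy pair: a 20 bp insert read with 26 bp reads, so each read runs
# 6 bases past the insert (the tails here are made-up filler).
insert_seq = "ACGTTGCATGCCGATAGCTA"
r1 = insert_seq + "AGATCG"
r2 = revcomp(insert_seq) + "TTTTTT"

t1, t2 = trim_by_overlap(r1, r2)
print(len(r1), len(t1), len(t2))
```

Because this relies only on the pair overlapping, it can remove adapter tails that are too short for k-mer matching to detect.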

                      Comment


                      • #26
                        Actually it looks like bbmerge can do this with the tbo flag.


                        I wonder if there is any benefit to processing our reads through this pipeline prior to aligning to the reference?

                        Comment


                        • #27
                          Having reads free of extraneous sequence is going to benefit all downstream analysis. So you should do that trimming. Brian had mentioned about the tbo flag in his earlier post.

                          Comment


                          • #28
                            In case anyone is interested, the 2x300bp FASTQ files cleaned up with bbduk using the parameters k=28 mink=12 resulted in two cleaned read files, which, when realigned with Bowtie2 and then opened in SeqMonk, create a read distribution that looks like this:



                            So clearly the residual adapter sequences are causing some issues with Bowtie correctly calling insert length, something that is partially corrected by bbduk (but not fully, probably due to the mink=12 parameter).

                            If we have time tomorrow, we'll redo with the tbo flag.
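To illustrate why mink=12 can leave short tails behind, here is a simplified sketch of right-end k-mer trimming. It is not BBDuk's implementation: it matches only the start of a single adapter, exactly, whereas BBDuk indexes all k-mers of the reference file. The adapter sequence is an assumed example (the thread does not say which adapter was used), and the function name is hypothetical.

```python
# Assumed example adapter; the thread does not state which one applies.
ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCA"

def ktrim_right(read, adapter=ADAPTER, k=28, mink=12):
    """Trim the adapter and everything after it from the 3' end of a read."""
    # Full-length case: the first k bases of the adapter occur inside the read.
    if len(adapter) >= k:
        pos = read.find(adapter[:k])
        if pos != -1:
            return read[:pos]
    # Partial case: the read ends with the first j bases of the adapter,
    # for j below k but no shorter than mink.
    for j in range(min(k - 1, len(adapter), len(read)), mink - 1, -1):
        if read.endswith(adapter[:j]):
            return read[:-j]
    # Adapter tails shorter than mink are not detected and stay in the read.
    return read

genomic = "TTGACC" * 4                # 24 bp of made-up insert sequence

full_tail = genomic + ADAPTER         # whole adapter present
long_tail = genomic + ADAPTER[:15]    # 15 bp adapter tail, at least mink
short_tail = genomic + ADAPTER[:8]    # 8 bp adapter tail, below mink

for read in (full_tail, long_tail, short_tail):
    print(len(read), len(ktrim_right(read)))
```

In this sketch, calling `ktrim_right(short_tail, mink=8)` does remove the 8 bp tail, but the shorter the minimum k-mer, the more likely it is to match genuine sequence by chance, which is one reason overlap-based trimming is attractive for the shortest tails.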

                            Comment


                            • #29
                              Originally posted by GenoMax:
                              Having reads free of extraneous sequence is going to benefit all downstream analysis. So you should do that trimming. Brian had mentioned about the tbo flag in his earlier post.
                              Oh dear. My apologies for not reading/understanding it all. Yes, thanks to Brian and GenoMax. It's been a tiring day, but thanks for all the help given to a complete novice!

                              Comment


                              • #30
                                A tiring day perhaps, but I am glad it ended well.

                                Good Luck with the rest of your analysis.

                                Comment
