



  • fkrueger
    That's weird, then I can't even blame Python for it... Have you tried the version of Trim Galore I attached?


  • bowen
    cutadapt worked fine: it wrote a single out.fq file that appears to be the right size, etc.


  • bowen
    Will try the cutadapt command you graciously suggested. The Python process invoked by trim_galore is still running; it has just gone to 0% CPU. It may be something quirky about the way the last instance of Python was installed on my machine. I may look into that.


  • fkrueger
    I have to admit that I don't exactly know what is going on, but so far I can't see any indication that (or why) Trim Galore would be failing. Just in case I am attaching the latest development version which you might want to give a whirl.

    Alternatively, there is a chance that Python or Cutadapt is somehow stalling, i.e. not finishing but also no longer using a noticeable chunk of the CPU. Could you try running Cutadapt on its own on the file, with the same command that Trim Galore is invoking, to see if that runs to completion?

    cutadapt -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC -o out.fq index21_GTTTCG_L001-L002_R1_001.fastq
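To tell a stalled run from a merely slow one, it can help to check whether the output file is still growing while the process sits at 0% CPU. Below is a minimal Python 3 sketch of that check; `is_still_growing` and its polling parameters are hypothetical and not part of Trim Galore or Cutadapt.

```python
import os
import time


def is_still_growing(path, interval=5.0, checks=3):
    """Return True as soon as the file at `path` grows between two polls.

    Polls the file size `checks` times, `interval` seconds apart, and
    returns False if the size never increased (a hint that the writer
    has stalled rather than just being slow).
    """
    last_size = os.path.getsize(path)
    for _ in range(checks):
        time.sleep(interval)
        size = os.path.getsize(path)
        if size > last_size:
            return True
        last_size = size
    return False
```

For example, `is_still_growing("out.fq", interval=30)` returning False while the process is still alive would point towards a stall rather than slow progress.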

  • bowen
    And here's a sample of the process, taken while it's at 0% CPU:

    Sampling process 19136 for 3 seconds with 1 millisecond of run time between samples
    Sampling completed, processing symbols...
    Analysis of sampling Python (pid 19136) every 1 millisecond
    Process: Python [19136]
    Path: /System/Library/Frameworks/Python.framework/Versions/2.7/Resources/
    Load Address: 0x10ab3c000
    Identifier: Python
    Version: ???
    Code Type: X86-64
    Parent Process: perl5.18 [19132]

    Date/Time: 2016-04-14 13:04:45.752 -0400
    Launch Time: 2016-04-14 10:18:54.276 -0400
    OS Version: Mac OS X 10.11.4 (15E65)
    Report Version: 7
    Analysis Tool: /usr/bin/sample

    Call graph:
    2909 Thread_1674560 DispatchQueue_1: (serial)
    2909 start (in libdyld.dylib) + 1 [0x7fff90e945ad]
    2909 Py_Main (in Python) + 3137 [0x10abf7011]
    2909 PyRun_SimpleFileExFlags (in Python) + 698 [0x10abe5634]
    2909 PyRun_FileExFlags (in Python) + 133 [0x10abe5ae5]
    2909 ??? (in Python) load address 0x10ab43000 + 0xa2a42 [0x10abe5a42]
    2909 PyEval_EvalCode (in Python) + 54 [0x10abc5d8c]
    2909 PyEval_EvalCodeEx (in Python) + 1583 [0x10abc63c1]
    2909 PyEval_EvalFrameEx (in Python) + 11609 [0x10abc930c]
    2909 ??? (in Python) load address 0x10ab43000 + 0x894ae [0x10abcc4ae]
    2909 PyEval_EvalCodeEx (in Python) + 1583 [0x10abc63c1]
    2909 PyEval_EvalFrameEx (in Python) + 11609 [0x10abc930c]
    2909 ??? (in Python) load address 0x10ab43000 + 0x894ae [0x10abcc4ae]
    2909 PyEval_EvalCodeEx (in Python) + 1583 [0x10abc63c1]
    2909 PyEval_EvalFrameEx (in Python) + 13400 [0x10abc9a0b]
    2909 ??? (in Python) load address 0x10ab43000 + 0x810af [0x10abc40af]
    2909 PyFile_WriteObject (in Python) + 338 [0x10ab63fb3]
    2909 ??? (in Python) load address 0x10ab43000 + 0x39b09 [0x10ab7cb09]
    2909 ??? (in Python) load address 0x10ab43000 + 0x42a31 [0x10ab85a31]
    2909 fwrite (in libsystem_c.dylib) + 153 [0x7fff8984f34a]
    2909 __sfvwrite (in libsystem_c.dylib) + 194 [0x7fff8984edcb]
    2909 _swrite (in libsystem_c.dylib) + 87 [0x7fff89854202]
    2909 __write_nocancel (in libsystem_kernel.dylib) + 10 [0x7fff810d5612]

    Total number in stack (recursive counted multiple, when >=5):

    Sort by top of stack, same collapsed (when >= 5):
    __write_nocancel (in libsystem_kernel.dylib) 2909

    Binary Images:
    0x10ab3c000 - 0x10ab3cfff org.python.python (2.7.10 - 2.7.10) <307E6E15-ECF7-3BB2-AF06-3E8D23DFDECA> /System/Library/Frameworks/Python.framework/Versions/2.7/Resources/
    0x10ab43000 - 0x10ac34ff7 org.python.python (2.7.10 - 2.7.10) <83AFAAA7-BDFA-354D-8A7A-8F40A30ACB91> /System/Library/Frameworks/Python.framework/Versions/2.7/Python
    0x10affb000 - 0x10affcfff (94) <4394AC91-22AE-3D7D-85C4-792A4F35F3F2> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/
    0x10b081000 - 0x10b093ffb (0) <85EBC770-BB23-375D-99F8-85B587E4DC9C> /Users/nbowen/Library/Python/2.7/lib/python/site-packages/cutadapt/
    0x10b0a3000 - 0x10b0a5fff (94) <5FEB3871-0B8F-3233-876C-0E81CF581963> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/
    0x10b0ac000 - 0x10b0affff (94) <D60F7C86-DED4-34F8-BA1B-106E044B6F83> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/
    0x10b0b6000 - 0x10b0bafff (94) <889782F7-5414-3881-BAAB-83CACDFDF0C5> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/
    0x10b0c4000 - 0x10b0c5fff (94) <9200023E-75BA-3F20-843C-398C3709CA88> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/
    0x10b0cb000 - 0x10b0ccff7 (94) <94E8BF2A-7841-32AD-8722-6B2526999CA1> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/
    0x10b113000 - 0x10b116ff7 (94) <44D8B4D6-D536-31EE-94EA-4F3C0FC773FA> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/
    0x10b11c000 - 0x10b11dfff (94) <49B479ED-A07D-322D-9A29-AFF4CA084219> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/
    0x10b162000 - 0x10b165fff (94) <0DCC6B47-A763-3AA6-82C5-B6A58073286B> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/
    0x10b16c000 - 0x10b16dfff (94) <EC2054BE-E4CD-38B3-BBFB-4FEFB76CF1EF> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/
    0x10b2b3000 - 0x10b2b5fff (94) <72EB0E79-95F2-316C-B49C-A259FEA56658> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/
    0x10b2bb000 - 0x10b2cafff (94) <39FEF2EC-8D20-33A6-B91F-EF7B2FAE9009> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/
    0x10b2db000 - 0x10b2ddfff (94) <22170D1C-40EF-303A-8BB7-A48E783F9350> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/
    0x10b2e4000 - 0x10b2e5fff (94) <419069D5-A61F-3925-B320-EA7B9E38F44B> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/
    0x10b2ea000 - 0x10b2ecfff (94) <9044E1C3-221F-3B79-847A-C9C3D8FEA9FD> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/
    0x10b2f1000 - 0x10b2f4fff (94) <435D683B-3940-3669-8CF8-AF280F0B5B9C> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/
    0x10b2fb000 - 0x10b308fff (0) <026B8553-7FE9-3560-B184-D7D2B49AF1DC> /Users/nbowen/Library/Python/2.7/lib/python/site-packages/cutadapt/
    0x10b356000 - 0x10b358fff (0) <16A07A2B-280F-3822-AA42-7B44F426CEF4> /Users/nbowen/Library/Python/2.7/lib/python/site-packages/cutadapt/
    0x7fff657a7000 - 0x7fff657de0d7 dyld (0.0 - ???) <D9B236BC-4AC1-325F-B3EF-3F06DBDA7119> /usr/lib/dyld
    0x7fff810be000 - 0x7fff810dcff7 libsystem_kernel.dylib (3248.40.184) <88C17B7F-1CD8-3979-A1A9-F7BDB4FCE789> /usr/lib/system/libsystem_kernel.dylib
    0x7fff81390000 - 0x7fff81395ff7 libmacho.dylib (875.1) <318264FA-58F1-39D8-8285-1F6254EE410E> /usr/lib/system/libmacho.dylib
    0x7fff81396000 - 0x7fff8139efff libsystem_networkextension.dylib (385.40.36) <66095DC7-6539-38F2-95EE-458F15F6D014> /usr/lib/system/libsystem_networkextension.dylib
    0x7fff8139f000 - 0x7fff813a7fff libcopyfile.dylib (127) <A48637BC-F3F2-34F2-BB68-4C65FD012832> /usr/lib/system/libcopyfile.dylib
    0x7fff818da000 - 0x7fff818dbfff libsystem_secinit.dylib (20) <32B1A8C6-DC84-3F4F-B8CE-9A52B47C3E6B> /usr/lib/system/libsystem_secinit.dylib
    0x7fff82082000 - 0x7fff82082ff7 libunc.dylib (29) <DDB1E947-C775-33B8-B461-63E5EB698F0E> /usr/lib/system/libunc.dylib
    0x7fff83006000 - 0x7fff83011ff7 libcommonCrypto.dylib (60075.40.2) <B9D08EB8-FB35-3F7B-8A1C-6FCE3F07B7E7> /usr/lib/system/libcommonCrypto.dylib
    0x7fff83248000 - 0x7fff8325fff7 libsystem_asl.dylib (323.40.3) <007F9094-317A-33EA-AF62-BAEAAB48C0F7> /usr/lib/system/libsystem_asl.dylib
    0x7fff8326c000 - 0x7fff83275ff3 libsystem_notify.dylib (150.40.1) <D48BDE34-0F7E-34CA-A0FF-C578E39987CC> /usr/lib/system/libsystem_notify.dylib
    0x7fff83a9a000 - 0x7fff83ab6ff7 libsystem_malloc.dylib (67.40.1) <5748E8B2-F81C-34C6-8B13-456213127678> /usr/lib/system/libsystem_malloc.dylib
    0x7fff8459c000 - 0x7fff84602ff7 libsystem_network.dylib (583.40.20) <269E5ADD-6922-31E2-8D55-7B777263AC0D> /usr/lib/system/libsystem_network.dylib
    0x7fff8461f000 - 0x7fff84623fff libcache.dylib (75) <9548AAE9-2AB7-3525-9ECE-A2A7C4688447> /usr/lib/system/libcache.dylib
    0x7fff858c9000 - 0x7fff858cafff libsystem_blocks.dylib (65) <1244D9D5-F6AA-35BB-B307-86851C24B8E5> /usr/lib/system/libsystem_blocks.dylib
    0x7fff85f29000 - 0x7fff85f31fef libsystem_platform.dylib (74.40.2) <29A905EF-6777-3C33-82B0-6C3A88C4BA15> /usr/lib/system/libsystem_platform.dylib
    0x7fff85fe4000 - 0x7fff85fe6ff7 libquarantine.dylib (80) <0F4169F0-0C84-3A25-B3AE-E47B3586D908> /usr/lib/system/libquarantine.dylib
    0x7fff86398000 - 0x7fff863c7ffb libsystem_m.dylib (3105) <08E1A4B2-6448-3DFE-A58C-ACC7335BE7E4> /usr/lib/system/libsystem_m.dylib
    0x7fff87175000 - 0x7fff87382fff libicucore.A.dylib (551.51) <35315A29-E21C-3CC5-8BD6-E07A3AE8FC0D> /usr/lib/libicucore.A.dylib
    0x7fff89766000 - 0x7fff89768fff libsystem_coreservices.dylib (19.2) <1B3F5AFC-FFCD-3ECB-8B9A-5538366FB20D> /usr/lib/system/libsystem_coreservices.dylib
    0x7fff89810000 - 0x7fff8989dfff libsystem_c.dylib (1082.20.4) <CDEBF2BB-A578-30F5-846F-96274951C3C5> /usr/lib/system/libsystem_c.dylib
    0x7fff89d4a000 - 0x7fff89d61ff7 libsystem_coretls.dylib (83.40.5) <C90DAE38-4082-381C-A185-2A6A8B677628> /usr/lib/system/libsystem_coretls.dylib
    0x7fff8a196000 - 0x7fff8a1a7ff7 libsystem_trace.dylib (201.10.3) <25104542-5251-3E8D-B14A-9E37207218BC> /usr/lib/system/libsystem_trace.dylib
    0x7fff8af3c000 - 0x7fff8af4dff7 libz.1.dylib (61.20.1) <B3EBB42F-48E3-3287-9F0D-308E04D407AC> /usr/lib/libz.1.dylib
    0x7fff8b64f000 - 0x7fff8b657ffb libsystem_dnssd.dylib (625.40.20) <86A05653-DCA0-3345-B29F-F320029AA05E> /usr/lib/system/libsystem_dnssd.dylib
    0x7fff8bd80000 - 0x7fff8bda9fff libc++abi.dylib (125) <DCCC8177-3D09-35BC-9784-2A04FEC4C71B> /usr/lib/libc++abi.dylib
    0x7fff8c2e0000 - 0x7fff8c2e7ff7 libcompiler_rt.dylib (62) <A13ECF69-F59F-38AE-8609-7B731450FBCD> /usr/lib/system/libcompiler_rt.dylib
    0x7fff8cce5000 - 0x7fff8cce6ffb libremovefile.dylib (41) <552EF39E-14D7-363E-9059-4565AC2F894E> /usr/lib/system/libremovefile.dylib
    0x7fff8d19f000 - 0x7fff8d216feb libcorecrypto.dylib (335.40.8) <9D300121-CAF8-3894-8774-DF38FA65F238> /usr/lib/system/libcorecrypto.dylib
    0x7fff8d3c9000 - 0x7fff8d3cafff libDiagnosticMessagesClient.dylib (100) <4243B6B4-21E9-355B-9C5A-95A216233B96> /usr/lib/libDiagnosticMessagesClient.dylib
    0x7fff8d70d000 - 0x7fff8d716ff7 libsystem_pthread.dylib (138.10.4) <3DD1EF4C-1D1B-3ABF-8CC6-B3B1CEEE9559> /usr/lib/system/libsystem_pthread.dylib
    0x7fff8d83a000 - 0x7fff8d83aff7 liblaunch.dylib (765.40.36) <1CD7619D-AF2E-34D1-8EC6-8021CF473D9B> /usr/lib/system/liblaunch.dylib
    0x7fff90b54000 - 0x7fff90b55ffb libSystem.B.dylib (1226.10.1) <CD307E99-FC5C-3575-BCCE-0C861AA63124> /usr/lib/libSystem.B.dylib
    0x7fff90c9a000 - 0x7fff90cc7fff libdispatch.dylib (501.40.12) <C7499857-61A5-3D7D-A5EA-65DCC8C3DF92> /usr/lib/system/libdispatch.dylib
    0x7fff90ceb000 - 0x7fff90cf0ff3 libunwind.dylib (35.3) <F6EB48E5-4D12-359A-AB54-C937FBBE9043> /usr/lib/system/libunwind.dylib
    0x7fff90e91000 - 0x7fff90e94ffb libdyld.dylib (360.21) <8390E026-F7DE-3C32-9486-3DFF6BD131B0> /usr/lib/system/libdyld.dylib
    0x7fff91bb5000 - 0x7fff9202bfff (6.9 - 1258.1) <943A1383-DA6A-3DC0-ABCD-D9AEB3D0D34D> /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
    0x7fff92038000 - 0x7fff9203aff7 libsystem_configuration.dylib (802.40.13) <3DEB7DF9-6804-37E1-BC83-0166882FF0FF> /usr/lib/system/libsystem_configuration.dylib
    0x7fff920d9000 - 0x7fff9212cff7 libc++.1.dylib (120.1) <8FC3D139-8055-3498-9AC5-6467CB7F4D14> /usr/lib/libc++.1.dylib
    0x7fff9212e000 - 0x7fff92499657 libobjc.A.dylib (680) <D55D5807-1FBE-32A5-9105-44D7AFE68C27> /usr/lib/libobjc.A.dylib
    0x7fff9249a000 - 0x7fff924e0ff7 libauto.dylib (186) <999E610F-41FC-32A3-ADCA-5EC049B65DFB> /usr/lib/libauto.dylib
    0x7fff9283a000 - 0x7fff92863ff7 libxpc.dylib (765.40.36) <2CC7CF36-66D4-301B-A6D8-EBAE7405B008> /usr/lib/system/libxpc.dylib
    0x7fff92e6d000 - 0x7fff92e7bff7 libbz2.1.0.dylib (38) <28E54258-C0FE-38D4-AB76-1734CACCB344> /usr/lib/libbz2.1.0.dylib
    0x7fff9335a000 - 0x7fff93383fff libsystem_info.dylib (477.40.5) <6B01C09E-A3E5-3C71-B370-D0CABD11A436> /usr/lib/system/libsystem_info.dylib
    0x7fff9352b000 - 0x7fff9352bff7 libkeymgr.dylib (28) <8371CE54-5FDD-3CE9-B3DF-E98C761B6FE0> /usr/lib/system/libkeymgr.dylib
    0x7fff943bc000 - 0x7fff943bffff libsystem_sandbox.dylib (460.40.33) <30671DCC-265F-325A-B33D-11CD336B3DA3> /usr/lib/system/libsystem_sandbox.dylib
    Sample analysis of process 19136 written to file /dev/stdout


  • bowen
    Here are the open files for the Python process (parent process: Perl).


  • bowen
    Looks like it does stop using the CPU: it's still running in Activity Monitor (Python) but at 0% CPU usage. The tmp directory getting full sounds probable. Would the tmp dir be on the drive where the script is housed, or on the drive where the outputs are being generated? Thanks for the help. It's all local, so I don't think the network would cause any issues. Also, thanks for the offer of FTP, but hopefully I'll get this figured out soon.


  • fkrueger
    Hi Nathan,
    It is difficult to guess what is going on (or rather wrong) because everything seems fine and there are no error messages at all. Just generally, in a first pass Trim Galore generates two _trimmed.fq files, then validates them (length constraints etc.) to produce two val.fq files. Once that has finished, the _trimmed.fq files should be deleted again.
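The second (validation) pass can be pictured roughly as follows. This is a simplified Python 3 sketch, not Trim Galore's own Perl code: it assumes well-formed four-line FASTQ records and the 20 bp default length cutoff shown in the trimming report further down, and `read_fastq`/`validate_pairs` are hypothetical names.

```python
from itertools import islice, zip_longest


def read_fastq(handle):
    """Yield (header, sequence, plus, quality) tuples from a FASTQ text handle."""
    lines = iter(handle)
    while True:
        record = [line.rstrip("\n") for line in islice(lines, 4)]
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def validate_pairs(records_1, records_2, min_length=20):
    """Keep a pair only if BOTH trimmed reads are at least `min_length` bp long.

    Returns (kept_pairs, number_of_removed_pairs).
    """
    kept, removed = [], 0
    for rec1, rec2 in zip_longest(records_1, records_2):
        # R1 and R2 must stay in sync; a missing mate means the files differ.
        if rec1 is None or rec2 is None:
            raise ValueError("R1 and R2 contain different numbers of reads")
        if len(rec1[1]) >= min_length and len(rec2[1]) >= min_length:
            kept.append((rec1, rec2))
        else:
            removed += 1
    return kept, removed
```

With real data you would open both _trimmed.fq files and write the kept pairs to the two val files; this sketch keeps everything in memory for brevity.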

    Trim Galore doesn't use a lot of memory, so that should not be the problem. Have you checked that you are not running out of disk space? You should be able to gzip the input files and/or specify --gzip to keep file sizes smaller. And have you checked (maybe by running 'top' in another terminal) whether Trim Galore is still running or has been killed? Maybe it is still running, just very, very slowly, e.g. because of a network connection problem or a tmp drive getting full... If it helps, I could set up an FTP site for you and try running it on your files over here to see if there is something unusual. Best, Felix
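A quick way to check free space on both the output location and the system temporary directory is Python 3's standard library. This is a minimal sketch that assumes output goes to the current working directory; which locations actually fill up depends on where Trim Galore and Cutadapt write, and `report_free_space` is a hypothetical helper.

```python
import shutil
import tempfile


def free_space_gb(path):
    """Return the free space (in GB) on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


def report_free_space(output_dir="."):
    """Report free space for the output directory and the system temp dir."""
    locations = {"output": output_dir, "temp": tempfile.gettempdir()}
    return {name: free_space_gb(path) for name, path in locations.items()}
```

Calling `print(report_free_space("."))` before and during a run shows whether either location is close to full.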


  • bowen
    Dear Dr. Krueger,
    Thanks for a wonderful piece of software.
    I am running trim_galore (new version) on paired-end reads on Mac OS X, and it seems that Python sometimes stops after about 30M sequences have been processed. I have 64 GB of RAM. It stops and never finishes: I get an output file for the first FASTQ of the pair but never for the second one. I have gotten one set of PE reads to finish and create two trimmed .fq files, so I thought I had it all working. I'm not sure why it stopped; maybe it's just a simple command error. Thanks again.
    Command pasted below:
    Nathan Bowen

    CHR1:RNASeq_working nbowen$ trim_galore index21_GTTTCG_L001-L002_R1_001.fastq index21_GTTTCG_L001-L002_R2_001.fastq --path_to_cutadapt /Users/nbowen/Library/Python/2.7/bin/cutadapt --paired
    No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)

    Path to Cutadapt set as: '/Users/nbowen/Library/Python/2.7/bin/cutadapt' (user defined)
    Cutadapt seems to be working fine (tested command '/Users/nbowen/Library/Python/2.7/bin/cutadapt --version')

    Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> index21_GTTTCG_L001-L002_R1_001.fastq <<)

    Found perfect matches for the following adapter sequences:
    Adapter type Count Sequence Sequences analysed Percentage
    Illumina 239964 AGATCGGAAGAGC 1000000 24.00
    smallRNA 22 TGGAATTCTCGG 1000000 0.00
    Nextera 12 CTGTCTCTTATA 1000000 0.00
    Using Illumina adapter for trimming (count: 239964). Second best hit was smallRNA (count: 22)

    Writing report to 'index21_GTTTCG_L001-L002_R1_001.fastq_trimming_report.txt'

    Input filename: index21_GTTTCG_L001-L002_R1_001.fastq
    Trimming mode: paired-end
    Trim Galore version: 0.4.1
    Cutadapt version: 1.9.1
    Quality Phred score cutoff: 20
    Quality encoding type selected: ASCII+33
    Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
    Maximum trimming error rate: 0.1 (default)
    Minimum required adapter overlap (stringency): 1 bp
    Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp

    Writing final adapter and quality trimmed output to index21_GTTTCG_L001-L002_R1_001_trimmed.fq

    >>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file index21_GTTTCG_L001-L002_R1_001.fastq <<<
    10000000 sequences processed
    20000000 sequences processed
    30000000 sequences processed
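The auto-detection step in the log above (perfect matches in the first 1 million sequences, best hit wins) can be approximated as follows. This is a simplified Python 3 sketch, not Trim Galore's implementation: it assumes plain four-line FASTQ and simple substring matching, and the function names are hypothetical. The adapter sequences are the three listed in the log.

```python
from itertools import islice

# Adapter sequences as listed in the Trim Galore output above.
ADAPTERS = {
    "Illumina": "AGATCGGAAGAGC",
    "smallRNA": "TGGAATTCTCGG",
    "Nextera": "CTGTCTCTTATA",
}


def sequence_lines(handle, limit):
    """Yield the sequence line (2nd of every 4) of the first `limit` FASTQ records."""
    for i, line in enumerate(islice(handle, 4 * limit)):
        if i % 4 == 1:
            yield line.rstrip("\n")


def detect_adapter(handle, limit=1_000_000):
    """Count reads with a perfect match to each adapter; return (name, count) ranked."""
    counts = {name: 0 for name in ADAPTERS}
    for seq in sequence_lines(handle, limit):
        for name, adapter in ADAPTERS.items():
            if adapter in seq:
                counts[name] += 1
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)
```

With a file handle, `detect_adapter(open("reads.fastq"))` returns a ranking whose first entry is the best hit and whose second entry is the "second best hit" reported in the log.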


  • fkrueger
    Hi Kevin,

    The 27% of sequences that get trimmed is an aggregate value: it is the sum of all sequences that had anything trimmed, even if it is as little as 1, 2 or 3 bp. The Trim Galore report gives you the whole picture of what has been trimmed, but it normally looks something like this:

    Sequence trimmed   Count
    G                  1000000
    GC                  600000
    GCT                  56671
    GCTC                  4232
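To see why very short matches can add up, here is a back-of-the-envelope calculation for the reverse-complemented adapter discussed in this thread, under the idealised assumption that read bases are independent and uniformly distributed (real bisulfite libraries are not, so treat this as an illustration only). It ignores adapter copies inside the read and the mismatches allowed by the error rate.

```python
ADAPTER = "GCTCTTCCGATCT"  # reverse-complemented adapter discussed in this thread


def chance_overlap_probabilities(adapter):
    """P(last k read bases == first k adapter bases) for k = 1..len(adapter).

    Assumes independent, uniformly distributed bases (an idealisation).
    """
    return {k: 0.25 ** k for k in range(1, len(adapter) + 1)}


probs = chance_overlap_probabilities(ADAPTER)
lower = probs[1]             # reads whose last base equals the adapter's first base
upper = sum(probs.values())  # union bound over all overlap lengths
print(f"{lower:.3f} <= P(any 3' overlap) <= {upper:.3f}")
# prints: 0.250 <= P(any 3' overlap) <= 0.333
```

Under these assumptions, somewhere between a quarter and a third of reads would receive a short 3' trim by chance alone with a 1 bp stringency, which illustrates how an aggregate percentage can be dominated by 1 to 3 bp trims.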
    Good luck with resuming your projects!

    By the way, we have started aggregating sequencing-related failures at the QC Fail website; it might be worth visiting every now and then to see if there is anything that might explain problems you are seeing. Best, Felix


  • kevinrue
    Dear Felix,

    Thanks for the quick answer, always appreciated!

    I ran "grep -c GCTCTTCCGATCT" on the first million forward reads and got 5597 (0.56%)
    To be comprehensive, I ran it on the first million reverse reads and got 10258 (1%)

    By the way, as you probably gathered, the 53% and 27% values in my previous post are those reported by trim_galore.

    Now, I wonder where the massive difference (27% reported by trim_galore vs. 0.56% exact matches found by grep) comes from. Is it due to partial rev_comp_adapter matches being detected? That seems like a huge difference (see below for a comparison with the original adapter sequence), especially considering that the reverse-complemented sequence tends to appear at the 5' end of the read, so truncation should not be an issue for detection.
    Illustration of trim_galore report for the rev-comp sequence:

    Adapter sequence: 'GCTCTTCCGATCT' ()
    Maximum trimming error rate: 0.1 (default)
    Optional adapter 2 sequence (only used for read 2 of paired-end files): 'GCTCTTCCGATCT'
    Minimum required adapter overlap (stringency): 1 bp
    Total reads processed: 9,812,098
    Reads with adapters: 2,662,297 (27.1%)
    Reads written (passing filters): 9,812,098 (100.0%)
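The gap between the two numbers is what you would expect from the two different matching rules. A toy illustration in Python 3 (hypothetical reads and helper names, error-free matching only): `grep -c` counts reads containing the complete 13-mer, whereas with a stringency of 1 bp a read whose last base happens to be the adapter's first base already counts as a read "with adapter".

```python
def has_exact_match(read, adapter):
    """What `grep -c ADAPTER` counts: the full adapter occurs somewhere in the read."""
    return adapter in read


def three_prime_overlap(read, adapter, min_overlap=1):
    """Length of the longest read suffix equal to an adapter prefix, or 0.

    Error-free matching only. Cutadapt additionally tolerates mismatches
    (controlled by the error rate) and also trims full adapter copies found
    inside the read, which this toy function ignores.
    """
    for k in range(min(len(read), len(adapter)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return k
    return 0


ADAPTER = "GCTCTTCCGATCT"
reads = [
    "ACGTGCTCTTCCGATCTAA",  # full copy inside the read: counted by grep
    "ACGTACGTAG",           # ends in G: 1 bp overlap
    "ACGTACGTGC",           # ends in GC: 2 bp overlap
    "ACGTACGTTT",           # no match of either kind
]
exact = sum(has_exact_match(r, ADAPTER) for r in reads)
partial = sum(three_prime_overlap(r, ADAPTER) > 0 for r in reads)
print(exact, partial)  # prints: 1 2
```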

    Also, running "grep -c AGATCGGAAGAGC" on the first million forward and reverse reads, respectively, returns:
    243583 (24.4%)
    238858 (23.9%)
    In this case, I am confident that the difference between 53% (trim_galore) and ~25% (grep) is due to partial adapter sequences.

    I am looking at the TruSeq® DNA Methylation Kit and can clearly see how the adapter sequence, ligated to the 3' end of ssDNA fragments, can appear when reading through short fragments.
    Conversely, the rev_comp should clearly not appear in this context. I am not fully familiar with the chemistry of adapters, but the only way I can picture the rev_comp being sequenced is if the adapter gets ligated to the 5' end of a fragment (or to another adapter).
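For reference, the two sequences discussed in this thread (AGATCGGAAGAGC and GCTCTTCCGATCT) are reverse complements of each other, which is easy to double-check with a couple of lines of Python 3. This is a generic helper, not part of Trim Galore.

```python
# Map each base to its complement; N stays N.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("AGATCGGAAGAGC"))  # prints GCTCTTCCGATCT
```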

    I guess I will do as you suggest:
    • align the adapter-trimmed paired reads
    • FastQC the aligned read mates, hoping that the reads showing rev_comp sequences did not map, and therefore do not appear in the K-mer content plot

    By the way: we were already in touch about this project a while ago, because I had surprisingly low mapping efficiency (link). That might have been the issue. I just had to park the whole project for a while, and only got back to it recently.



  • fkrueger
    Hi Kevin,

    Thanks for providing so many details with your question; you don't get this very often...

    As a general recommendation I would suggest that you simply run Trim Galore in its default mode (which you already did and which seems to have worked just fine), and just ignore the reverse complement of the adapter.

    If you draw out a sketch of how libraries are constructed, you will see that both reads (R1 and R2 of a library) should have a single copy of the sequence AGATCGGAAGAGC, which then extends into the rest of either the R1 or R2 adapter sequence (the sequence will then diverge for R1 and R2, but for trimming purposes it doesn't matter how the sequence continues). You will further see that no sequence should have the reverse complement of the adapter sequence as a result of read-through adapter contamination. I would expect to find this reverse-complemented sequence only as part of some weird processes that happened during library preparation, e.g. adapter or primer dimerisation or concatenation etc. (it should not be present in mammalian genomes at least).

    The only thing that slightly caught me out is the figure of 27%, which seems very high. This might mean either that you've got quite a lot of dimerisation happening, or could it be that this is an aggregate number of all kinds of shorter sequences that have been removed, like G, GC, GCT, GCTC, ...? In this case the trimmed bases are probably just random genomic sequence. If you run
    grep -c GCTCTTCCGATCT your_file
    is the number really that high?

    In any case, I would recommend you just run Trim Galore in default mode and then proceed straight to mapping, I am pretty sure that it is the right thing to do.
    Best wishes, Felix


  • kevinrue
    Dear Felix and fellow trim_galore users,

    I am having a problem/confusion regarding trimming of Illumina adapters from paired-end bisulfite (and some non-bisulfite) samples of a whole-genome bisulfite sequencing project.
    I am not sure what is going on here, and how I should process my reads.

    First, running FastQC on my raw read pairs, I see both:
    • adapter contamination increasing toward the 3' of both forward and reverse reads (link (forward))
    • K-mer content clearly indicating over-representation of the reverse complemented Illumina adapter sequence in both forward and reverse reads (link (forward))

    When I use trim_galore using either auto-detect or --illumina options:
    • trim_galore detects the adapter sequence in ~53% of the reads, making FastQC extremely happy with a perfect absence of adapter contamination in the trimmed files
    • reverse-complemented adapter sequence still shows up in the K-mer content

    Basically, trim_galore cleanly removes any trace of the adapter sequence (AGATCGGAAGAGC), and ignores its reverse complement (well, it did its job!)

    Without much surprise, when I told trim_galore to trim the reverse complement of the adapter sequence, the result of FastQC is:
    • trim_galore detects/removes the sequence in ~27% of the reads
    • Adapter contamination still detected toward the 3' of the reads
    • K-mer content does not show any more sign of reverse complemented adapter over-representation (link (forward))

    A final piece of information that might help is that "grep"ing the adapter and rev_comp_adapter sequences in my raw reads, I observed the following:
    • the adapter sequence is typically found only once per read (when found), generally toward the 3' end
    • the reverse complemented sequence can be found multiple times within a single read (typically 1-3 times), generally toward the 5' of the read
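Tallying how often, and where, a motif occurs in each read (as in the grep observations above) can be scripted rather than eyeballed. A minimal Python 3 sketch with hypothetical helper names; positions are 0-based from the 5' end of the read as written in the FASTQ, and overlapping hits are counted.

```python
def find_occurrences(read, motif):
    """Return the 0-based start positions of all (possibly overlapping) motif hits."""
    positions = []
    start = read.find(motif)
    while start != -1:
        positions.append(start)
        start = read.find(motif, start + 1)
    return positions


def relative_positions(read, motif):
    """Express each hit's start as a fraction of read length (0.0 = 5' end)."""
    return [pos / len(read) for pos in find_occurrences(read, motif)]
```

Applied to every read, the length of `find_occurrences(read, motif)` gives the per-read copy number and `relative_positions` shows whether hits cluster toward the 5' or the 3' end.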

    Respective examples:

    In a nutshell, my question is: "What procedure is recommended in this case?"

    I tried to do my homework, and saw earlier in this thread that trim_galore cannot be given multiple adapter sequences (I was thinking of giving both the adapter and its rev_comp).

    Any piece of advice would be greatly appreciated!
    Kind regards,


  • fkrueger
    The temporarily created trimmed.fq files get deleted in the end, leaving only the _val_ files left for further analysis.


  • rakarnik
    *_trimmed vs. *_val

    I am using trim_galore in "--paired" mode. I know that trim_galore runs a second validation step after first running through the first and second reads separately, and is creating *val files as a result. Do these get renamed back to the *trimmed files afterwards, or should downstream analyses use the *val files?
    Thanks for the help!

