-
That's weird, then I can't even blame Python for it... Have you tried the version of Trim Galore I attached?
-
cutadapt worked fine, wrote a single out.fq file that appears to be right size, etc.
thanks,
Nathan
-
Thanks,
I will try the cutadapt command you graciously suggested. The Python process invoked by trim_galore is still running; it has just gone to 0% CPU. It may be something quirky about the way the last instance of Python was installed on my machine. I may look into that.
-
I have to admit that I don't exactly know what is going on, but so far I can't see any indication that (or why) Trim Galore would be failing. Just in case I am attaching the latest development version which you might want to give a whirl.
Alternatively there is a chance that Python or Cutadapt are somehow stalling, so not finishing but also not using a noticeable chunk of the CPU anymore. Could you try to run Cutadapt on its own on the file with the same command that Trim Galore is invoking to see if that runs to completion?
Code:
cutadapt -f fastq -e 0.1 -q 20 -O 1 -a AGATCGGAAGAGC -o out.fq index21_GTTTCG_L001-L002_R1_001.fastq
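One way to script this check is to launch the command with a time limit, so that a stalled process gets reported instead of being waited on indefinitely. This is only a minimal Python sketch: it assumes `cutadapt` is on the PATH, and the helper name `run_with_timeout` and any timeout value are illustrative choices, not part of Trim Galore or Cutadapt.

```python
import subprocess

# The same Cutadapt command as above, split into an argument list.
CUTADAPT_CMD = [
    "cutadapt", "-f", "fastq", "-e", "0.1", "-q", "20", "-O", "1",
    "-a", "AGATCGGAAGAGC", "-o", "out.fq",
    "index21_GTTTCG_L001-L002_R1_001.fastq",
]

def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of arguments); return its exit code, or None on timeout."""
    try:
        # subprocess.run kills the child and raises TimeoutExpired at the limit.
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return None
    return result.returncode
```

Calling `run_with_timeout(CUTADAPT_CMD, 3600)` would return the exit code if Cutadapt finishes within an hour and `None` otherwise; the one-hour figure is arbitrary and would need adjusting to the size of the input.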
-
and here's a sample of the process now while it's at CPU 0%.
Sampling process 19136 for 3 seconds with 1 millisecond of run time between samples
Sampling completed, processing symbols...
Analysis of sampling Python (pid 19136) every 1 millisecond
Process: Python [19136]
Path: /System/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
Load Address: 0x10ab3c000
Identifier: Python
Version: ???
Code Type: X86-64
Parent Process: perl5.18 [19132]
Date/Time: 2016-04-14 13:04:45.752 -0400
Launch Time: 2016-04-14 10:18:54.276 -0400
OS Version: Mac OS X 10.11.4 (15E65)
Report Version: 7
Analysis Tool: /usr/bin/sample
----
Call graph:
2909 Thread_1674560 DispatchQueue_1: com.apple.main-thread (serial)
2909 start (in libdyld.dylib) + 1 [0x7fff90e945ad]
2909 Py_Main (in Python) + 3137 [0x10abf7011]
2909 PyRun_SimpleFileExFlags (in Python) + 698 [0x10abe5634]
2909 PyRun_FileExFlags (in Python) + 133 [0x10abe5ae5]
2909 ??? (in Python) load address 0x10ab43000 + 0xa2a42 [0x10abe5a42]
2909 PyEval_EvalCode (in Python) + 54 [0x10abc5d8c]
2909 PyEval_EvalCodeEx (in Python) + 1583 [0x10abc63c1]
2909 PyEval_EvalFrameEx (in Python) + 11609 [0x10abc930c]
2909 ??? (in Python) load address 0x10ab43000 + 0x894ae [0x10abcc4ae]
2909 PyEval_EvalCodeEx (in Python) + 1583 [0x10abc63c1]
2909 PyEval_EvalFrameEx (in Python) + 11609 [0x10abc930c]
2909 ??? (in Python) load address 0x10ab43000 + 0x894ae [0x10abcc4ae]
2909 PyEval_EvalCodeEx (in Python) + 1583 [0x10abc63c1]
2909 PyEval_EvalFrameEx (in Python) + 13400 [0x10abc9a0b]
2909 ??? (in Python) load address 0x10ab43000 + 0x810af [0x10abc40af]
2909 PyFile_WriteObject (in Python) + 338 [0x10ab63fb3]
2909 ??? (in Python) load address 0x10ab43000 + 0x39b09 [0x10ab7cb09]
2909 ??? (in Python) load address 0x10ab43000 + 0x42a31 [0x10ab85a31]
2909 fwrite (in libsystem_c.dylib) + 153 [0x7fff8984f34a]
2909 __sfvwrite (in libsystem_c.dylib) + 194 [0x7fff8984edcb]
2909 _swrite (in libsystem_c.dylib) + 87 [0x7fff89854202]
2909 __write_nocancel (in libsystem_kernel.dylib) + 10 [0x7fff810d5612]
Total number in stack (recursive counted multiple, when >=5):
Sort by top of stack, same collapsed (when >= 5):
__write_nocancel (in libsystem_kernel.dylib) 2909
Binary Images:
0x10ab3c000 - 0x10ab3cfff org.python.python (2.7.10 - 2.7.10) <307E6E15-ECF7-3BB2-AF06-3E8D23DFDECA> /System/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
0x10ab43000 - 0x10ac34ff7 org.python.python (2.7.10 - 2.7.10) <83AFAAA7-BDFA-354D-8A7A-8F40A30ACB91> /System/Library/Frameworks/Python.framework/Versions/2.7/Python
0x10affb000 - 0x10affcfff _locale.so (94) <4394AC91-22AE-3D7D-85C4-792A4F35F3F2> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_locale.so
0x10b081000 - 0x10b093ffb +_align.so (0) <85EBC770-BB23-375D-99F8-85B587E4DC9C> /Users/nbowen/Library/Python/2.7/lib/python/site-packages/cutadapt/_align.so
0x10b0a3000 - 0x10b0a5fff _collections.so (94) <5FEB3871-0B8F-3233-876C-0E81CF581963> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_collections.so
0x10b0ac000 - 0x10b0affff operator.so (94) <D60F7C86-DED4-34F8-BA1B-106E044B6F83> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/operator.so
0x10b0b6000 - 0x10b0bafff itertools.so (94) <889782F7-5414-3881-BAAB-83CACDFDF0C5> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/itertools.so
0x10b0c4000 - 0x10b0c5fff _heapq.so (94) <9200023E-75BA-3F20-843C-398C3709CA88> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_heapq.so
0x10b0cb000 - 0x10b0ccff7 time.so (94) <94E8BF2A-7841-32AD-8722-6B2526999CA1> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/time.so
0x10b113000 - 0x10b116ff7 strop.so (94) <44D8B4D6-D536-31EE-94EA-4F3C0FC773FA> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/strop.so
0x10b11c000 - 0x10b11dfff _functools.so (94) <49B479ED-A07D-322D-9A29-AFF4CA084219> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_functools.so
0x10b162000 - 0x10b165fff _struct.so (94) <0DCC6B47-A763-3AA6-82C5-B6A58073286B> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_struct.so
0x10b16c000 - 0x10b16dfff cStringIO.so (94) <EC2054BE-E4CD-38B3-BBFB-4FEFB76CF1EF> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/cStringIO.so
0x10b2b3000 - 0x10b2b5fff zlib.so (94) <72EB0E79-95F2-316C-B49C-A259FEA56658> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/zlib.so
0x10b2bb000 - 0x10b2cafff _io.so (94) <39FEF2EC-8D20-33A6-B91F-EF7B2FAE9009> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_io.so
0x10b2db000 - 0x10b2ddfff select.so (94) <22170D1C-40EF-303A-8BB7-A48E783F9350> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/select.so
0x10b2e4000 - 0x10b2e5fff fcntl.so (94) <419069D5-A61F-3925-B320-EA7B9E38F44B> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/fcntl.so
0x10b2ea000 - 0x10b2ecfff binascii.so (94) <9044E1C3-221F-3B79-847A-C9C3D8FEA9FD> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/binascii.so
0x10b2f1000 - 0x10b2f4fff bz2.so (94) <435D683B-3940-3669-8CF8-AF280F0B5B9C> /System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/bz2.so
0x10b2fb000 - 0x10b308fff +_seqio.so (0) <026B8553-7FE9-3560-B184-D7D2B49AF1DC> /Users/nbowen/Library/Python/2.7/lib/python/site-packages/cutadapt/_seqio.so
0x10b356000 - 0x10b358fff +_qualtrim.so (0) <16A07A2B-280F-3822-AA42-7B44F426CEF4> /Users/nbowen/Library/Python/2.7/lib/python/site-packages/cutadapt/_qualtrim.so
0x7fff657a7000 - 0x7fff657de0d7 dyld (0.0 - ???) <D9B236BC-4AC1-325F-B3EF-3F06DBDA7119> /usr/lib/dyld
0x7fff810be000 - 0x7fff810dcff7 libsystem_kernel.dylib (3248.40.184) <88C17B7F-1CD8-3979-A1A9-F7BDB4FCE789> /usr/lib/system/libsystem_kernel.dylib
0x7fff81390000 - 0x7fff81395ff7 libmacho.dylib (875.1) <318264FA-58F1-39D8-8285-1F6254EE410E> /usr/lib/system/libmacho.dylib
0x7fff81396000 - 0x7fff8139efff libsystem_networkextension.dylib (385.40.36) <66095DC7-6539-38F2-95EE-458F15F6D014> /usr/lib/system/libsystem_networkextension.dylib
0x7fff8139f000 - 0x7fff813a7fff libcopyfile.dylib (127) <A48637BC-F3F2-34F2-BB68-4C65FD012832> /usr/lib/system/libcopyfile.dylib
0x7fff818da000 - 0x7fff818dbfff libsystem_secinit.dylib (20) <32B1A8C6-DC84-3F4F-B8CE-9A52B47C3E6B> /usr/lib/system/libsystem_secinit.dylib
0x7fff82082000 - 0x7fff82082ff7 libunc.dylib (29) <DDB1E947-C775-33B8-B461-63E5EB698F0E> /usr/lib/system/libunc.dylib
0x7fff83006000 - 0x7fff83011ff7 libcommonCrypto.dylib (60075.40.2) <B9D08EB8-FB35-3F7B-8A1C-6FCE3F07B7E7> /usr/lib/system/libcommonCrypto.dylib
0x7fff83248000 - 0x7fff8325fff7 libsystem_asl.dylib (323.40.3) <007F9094-317A-33EA-AF62-BAEAAB48C0F7> /usr/lib/system/libsystem_asl.dylib
0x7fff8326c000 - 0x7fff83275ff3 libsystem_notify.dylib (150.40.1) <D48BDE34-0F7E-34CA-A0FF-C578E39987CC> /usr/lib/system/libsystem_notify.dylib
0x7fff83a9a000 - 0x7fff83ab6ff7 libsystem_malloc.dylib (67.40.1) <5748E8B2-F81C-34C6-8B13-456213127678> /usr/lib/system/libsystem_malloc.dylib
0x7fff8459c000 - 0x7fff84602ff7 libsystem_network.dylib (583.40.20) <269E5ADD-6922-31E2-8D55-7B777263AC0D> /usr/lib/system/libsystem_network.dylib
0x7fff8461f000 - 0x7fff84623fff libcache.dylib (75) <9548AAE9-2AB7-3525-9ECE-A2A7C4688447> /usr/lib/system/libcache.dylib
0x7fff858c9000 - 0x7fff858cafff libsystem_blocks.dylib (65) <1244D9D5-F6AA-35BB-B307-86851C24B8E5> /usr/lib/system/libsystem_blocks.dylib
0x7fff85f29000 - 0x7fff85f31fef libsystem_platform.dylib (74.40.2) <29A905EF-6777-3C33-82B0-6C3A88C4BA15> /usr/lib/system/libsystem_platform.dylib
0x7fff85fe4000 - 0x7fff85fe6ff7 libquarantine.dylib (80) <0F4169F0-0C84-3A25-B3AE-E47B3586D908> /usr/lib/system/libquarantine.dylib
0x7fff86398000 - 0x7fff863c7ffb libsystem_m.dylib (3105) <08E1A4B2-6448-3DFE-A58C-ACC7335BE7E4> /usr/lib/system/libsystem_m.dylib
0x7fff87175000 - 0x7fff87382fff libicucore.A.dylib (551.51) <35315A29-E21C-3CC5-8BD6-E07A3AE8FC0D> /usr/lib/libicucore.A.dylib
0x7fff89766000 - 0x7fff89768fff libsystem_coreservices.dylib (19.2) <1B3F5AFC-FFCD-3ECB-8B9A-5538366FB20D> /usr/lib/system/libsystem_coreservices.dylib
0x7fff89810000 - 0x7fff8989dfff libsystem_c.dylib (1082.20.4) <CDEBF2BB-A578-30F5-846F-96274951C3C5> /usr/lib/system/libsystem_c.dylib
0x7fff89d4a000 - 0x7fff89d61ff7 libsystem_coretls.dylib (83.40.5) <C90DAE38-4082-381C-A185-2A6A8B677628> /usr/lib/system/libsystem_coretls.dylib
0x7fff8a196000 - 0x7fff8a1a7ff7 libsystem_trace.dylib (201.10.3) <25104542-5251-3E8D-B14A-9E37207218BC> /usr/lib/system/libsystem_trace.dylib
0x7fff8af3c000 - 0x7fff8af4dff7 libz.1.dylib (61.20.1) <B3EBB42F-48E3-3287-9F0D-308E04D407AC> /usr/lib/libz.1.dylib
0x7fff8b64f000 - 0x7fff8b657ffb libsystem_dnssd.dylib (625.40.20) <86A05653-DCA0-3345-B29F-F320029AA05E> /usr/lib/system/libsystem_dnssd.dylib
0x7fff8bd80000 - 0x7fff8bda9fff libc++abi.dylib (125) <DCCC8177-3D09-35BC-9784-2A04FEC4C71B> /usr/lib/libc++abi.dylib
0x7fff8c2e0000 - 0x7fff8c2e7ff7 libcompiler_rt.dylib (62) <A13ECF69-F59F-38AE-8609-7B731450FBCD> /usr/lib/system/libcompiler_rt.dylib
0x7fff8cce5000 - 0x7fff8cce6ffb libremovefile.dylib (41) <552EF39E-14D7-363E-9059-4565AC2F894E> /usr/lib/system/libremovefile.dylib
0x7fff8d19f000 - 0x7fff8d216feb libcorecrypto.dylib (335.40.8) <9D300121-CAF8-3894-8774-DF38FA65F238> /usr/lib/system/libcorecrypto.dylib
0x7fff8d3c9000 - 0x7fff8d3cafff libDiagnosticMessagesClient.dylib (100) <4243B6B4-21E9-355B-9C5A-95A216233B96> /usr/lib/libDiagnosticMessagesClient.dylib
0x7fff8d70d000 - 0x7fff8d716ff7 libsystem_pthread.dylib (138.10.4) <3DD1EF4C-1D1B-3ABF-8CC6-B3B1CEEE9559> /usr/lib/system/libsystem_pthread.dylib
0x7fff8d83a000 - 0x7fff8d83aff7 liblaunch.dylib (765.40.36) <1CD7619D-AF2E-34D1-8EC6-8021CF473D9B> /usr/lib/system/liblaunch.dylib
0x7fff90b54000 - 0x7fff90b55ffb libSystem.B.dylib (1226.10.1) <CD307E99-FC5C-3575-BCCE-0C861AA63124> /usr/lib/libSystem.B.dylib
0x7fff90c9a000 - 0x7fff90cc7fff libdispatch.dylib (501.40.12) <C7499857-61A5-3D7D-A5EA-65DCC8C3DF92> /usr/lib/system/libdispatch.dylib
0x7fff90ceb000 - 0x7fff90cf0ff3 libunwind.dylib (35.3) <F6EB48E5-4D12-359A-AB54-C937FBBE9043> /usr/lib/system/libunwind.dylib
0x7fff90e91000 - 0x7fff90e94ffb libdyld.dylib (360.21) <8390E026-F7DE-3C32-9486-3DFF6BD131B0> /usr/lib/system/libdyld.dylib
0x7fff91bb5000 - 0x7fff9202bfff com.apple.CoreFoundation (6.9 - 1258.1) <943A1383-DA6A-3DC0-ABCD-D9AEB3D0D34D> /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
0x7fff92038000 - 0x7fff9203aff7 libsystem_configuration.dylib (802.40.13) <3DEB7DF9-6804-37E1-BC83-0166882FF0FF> /usr/lib/system/libsystem_configuration.dylib
0x7fff920d9000 - 0x7fff9212cff7 libc++.1.dylib (120.1) <8FC3D139-8055-3498-9AC5-6467CB7F4D14> /usr/lib/libc++.1.dylib
0x7fff9212e000 - 0x7fff92499657 libobjc.A.dylib (680) <D55D5807-1FBE-32A5-9105-44D7AFE68C27> /usr/lib/libobjc.A.dylib
0x7fff9249a000 - 0x7fff924e0ff7 libauto.dylib (186) <999E610F-41FC-32A3-ADCA-5EC049B65DFB> /usr/lib/libauto.dylib
0x7fff9283a000 - 0x7fff92863ff7 libxpc.dylib (765.40.36) <2CC7CF36-66D4-301B-A6D8-EBAE7405B008> /usr/lib/system/libxpc.dylib
0x7fff92e6d000 - 0x7fff92e7bff7 libbz2.1.0.dylib (38) <28E54258-C0FE-38D4-AB76-1734CACCB344> /usr/lib/libbz2.1.0.dylib
0x7fff9335a000 - 0x7fff93383fff libsystem_info.dylib (477.40.5) <6B01C09E-A3E5-3C71-B370-D0CABD11A436> /usr/lib/system/libsystem_info.dylib
0x7fff9352b000 - 0x7fff9352bff7 libkeymgr.dylib (28) <8371CE54-5FDD-3CE9-B3DF-E98C761B6FE0> /usr/lib/system/libkeymgr.dylib
0x7fff943bc000 - 0x7fff943bffff libsystem_sandbox.dylib (460.40.33) <30671DCC-265F-325A-B33D-11CD336B3DA3> /usr/lib/system/libsystem_sandbox.dylib
Sample analysis of process 19136 written to file /dev/stdout
-
here are the open files for the Python process (parent process Perl)
/Volumes/CHR1_BIOINF_WORKING/RNASeq_working
/System/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python
/System/Library/Frameworks/Python.framework/Versions/2.7/Python
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_locale.so
/Users/nbowen/Library/Python/2.7/lib/python/site-packages/cutadapt/_align.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_collections.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/operator.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/itertools.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_heapq.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/time.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/strop.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_functools.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_struct.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/cStringIO.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/zlib.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/_io.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/select.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/fcntl.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/binascii.so
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/lib-dynload/bz2.so
/Users/nbowen/Library/Python/2.7/lib/python/site-packages/cutadapt/_seqio.so
/Users/nbowen/Library/Python/2.7/lib/python/site-packages/cutadapt/_qualtrim.so
/usr/lib/dyld
/private/var/db/dyld/dyld_shared_cache_x86_64
->0x99a5dbc1f67c4a4f
->0x99a5dbc1f67c51cf
/Volumes/CHR1_BIOINF_WORKING/RNASeq_working/index11_GGCTAC_L001-L002_R1_001.fastq
-
It looks like it does stop using the CPU: Python is still running in Activity Monitor, but with 0% CPU usage. tmp getting full sounds probable. Would the tmp dir be on the drive where the script is housed or on the drive where the outputs are being generated? Thanks for the help. It's all local, so I don't think the network would cause any issues. Also, thanks for the offer of FTP, but hopefully I'll get this figured out soon.
-
Hi Nathan,
It is difficult to guess what is going on (or rather wrong) because all seems fine and there are no error messages at all. Just generally, Trim Galore would in a first pass generate two _trimmed.fq files, and then validate these afterwards (length constraints etc.) and give rise to two val.fq files. Once that has finished, the trimmed.fq files should be deleted again.
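The validation pass described here can be pictured with a small Python sketch. It is a simplification, not Trim Galore's actual code: it assumes a pair is dropped as soon as either read falls below the length cutoff (20 bp in the run parameters quoted further down), and the function names are made up for illustration.

```python
def validate_pair(read1, read2, min_length=20):
    """Keep a pair only if both trimmed reads have at least min_length bases."""
    return len(read1) >= min_length and len(read2) >= min_length

def filter_pairs(pairs, min_length=20):
    """Return the pairs that pass validation, preserving their order."""
    return [(r1, r2) for r1, r2 in pairs if validate_pair(r1, r2, min_length)]

# Toy data: the second pair is dropped because its R2 is only 5 bp long.
pairs = [("A" * 25, "C" * 30), ("A" * 25, "C" * 5)]
print(len(filter_pairs(pairs)))
```

Because both reads of a pair are kept or removed together, the two validated files stay in the same read order, which is what paired-end aligners expect.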
Trim Galore doesn't use a lot of memory so that should not be the problem. Have you checked that you are not running out of disk space? You should be able to gzip the input files, and/or specify --gzip to keep file sizes smaller. And have you checked (maybe run 'top' in another terminal) if Trim Galore is still running or if it has been killed? Maybe it is still running but just very, very slowly, e.g. if a network connection is slow or a tmp drive is getting full or the like... If it helps I could create an FTP site for you and try running it on your files over here to see if there is something unusual?
Best, Felix
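A quick way to check free disk space from Python is `shutil.disk_usage`. The sketch below looks at the current working directory and the system temporary directory; which locations actually matter for a given Trim Galore run is an assumption to verify, and the helper names are illustrative.

```python
import shutil
import tempfile

def free_space_gib(path):
    """Return the free space, in GiB, on the volume that holds path."""
    return shutil.disk_usage(path).free / 1024**3

def report_free_space(paths):
    """Map each path to the free space (GiB, one decimal) on its volume."""
    return {p: round(free_space_gib(p), 1) for p in paths}

# "." stands in for the output directory; gettempdir() is the system temp dir.
print(report_free_space([".", tempfile.gettempdir()]))
```

Running this before and during a long trimming job shows whether either volume is close to full.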
-
Dear Dr. Krueger,
Thanks for a wonderful piece of software.
I am running trim_galore (new version) on paired-end reads on Mac OS X, and it seems that Python sometimes stops after 30M sequences have been processed. I have 64GB of RAM. It will stop and not finish: I get an output for the first fq of the pair but never a second one. I have gotten one set of PE reads to finish and create two trimmed .fq files, so I thought I had it all working. I am not sure why it may have stopped; maybe just a simple command error. Thanks again.
command pasted below
best,
Nathan Bowen
CHR1:RNASeq_working nbowen$ trim_galore index21_GTTTCG_L001-L002_R1_001.fastq index21_GTTTCG_L001-L002_R2_001.fastq --path_to_cutadapt /Users/nbowen/Library/Python/2.7/bin/cutadapt --paired
No quality encoding type selected. Assuming that the data provided uses Sanger encoded Phred scores (default)
Path to Cutadapt set as: '/Users/nbowen/Library/Python/2.7/bin/cutadapt' (user defined)
1.9.1
Cutadapt seems to be working fine (tested command '/Users/nbowen/Library/Python/2.7/bin/cutadapt --version')
AUTO-DETECTING ADAPTER TYPE
===========================
Attempting to auto-detect adapter type from the first 1 million sequences of the first file (>> index21_GTTTCG_L001-L002_R1_001.fastq <<)
Found perfect matches for the following adapter sequences:
Adapter type   Count    Sequence        Sequences analysed   Percentage
Illumina       239964   AGATCGGAAGAGC   1000000              24.00
smallRNA       22       TGGAATTCTCGG    1000000              0.00
Nextera        12       CTGTCTCTTATA    1000000              0.00
Using Illumina adapter for trimming (count: 239964). Second best hit was smallRNA (count: 22)
Writing report to 'index21_GTTTCG_L001-L002_R1_001.fastq_trimming_report.txt'
SUMMARISING RUN PARAMETERS
==========================
Input filename: index21_GTTTCG_L001-L002_R1_001.fastq
Trimming mode: paired-end
Trim Galore version: 0.4.1
Cutadapt version: 1.9.1
Quality Phred score cutoff: 20
Quality encoding type selected: ASCII+33
Adapter sequence: 'AGATCGGAAGAGC' (Illumina TruSeq, Sanger iPCR; auto-detected)
Maximum trimming error rate: 0.1 (default)
Minimum required adapter overlap (stringency): 1 bp
Minimum required sequence length for both reads before a sequence pair gets removed: 20 bp
Writing final adapter and quality trimmed output to index21_GTTTCG_L001-L002_R1_001_trimmed.fq
>>> Now performing quality (cutoff 20) and adapter trimming in a single pass for the adapter sequence: 'AGATCGGAAGAGC' from file index21_GTTTCG_L001-L002_R1_001.fastq <<<
10000000 sequences processed
20000000 sequences processed
30000000 sequences processed
-
Hi Kevin,
The 27% of sequences that get trimmed is an aggregate value, calculated as the sum of all sequences that had anything at all trimmed, even if it is as little as 1, 2, 3 bp etc. The Trim Galore report would give you the whole picture of what has been trimmed, but it normally looks something like this:
Code:
sequence        trimmed
G               1000000
GC              600000
GCT             56671
GCTC            4232
...
GCTCTTCCGATCT   4
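To see why the aggregate is so much larger than the number of full-length hits, here is a rough Python sketch that tallies, per read, how much of the adapter matched. It is a deliberate simplification of what Cutadapt does (exact matches only, no error tolerance), and the function names and toy reads are made up for illustration.

```python
from collections import Counter

ADAPTER = "GCTCTTCCGATCT"

def matched_adapter_length(read, adapter=ADAPTER):
    """Length of adapter matched: full adapter anywhere, else a prefix at the 3' end."""
    if adapter in read:
        return len(adapter)
    # With a minimum overlap of 1 bp, even a single trailing 'G' counts as a hit.
    for k in range(min(len(adapter) - 1, len(read)), 0, -1):
        if read.endswith(adapter[:k]):
            return k
    return 0

def tally(reads, adapter=ADAPTER):
    """Count reads by the length of adapter sequence that matched."""
    return Counter(matched_adapter_length(r, adapter) for r in reads)

reads = ["ACGTG", "ACGTGC", "AAGCTCTTCCGATCTAA", "ACGTA"]
counts = tally(reads)
any_trimmed = sum(n for k, n in counts.items() if k > 0)
full_length = counts[len(ADAPTER)]
print(any_trimmed, full_length)
```

Three of the four toy reads count as "trimmed", but only one contains the full 13-mer, which is the kind of gap a grep for the complete sequence would reveal.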
By the way, we have started aggregating sequencing-related fails at the QC Fail website; it might be worth visiting every now and then to see if there is anything useful that might explain problems you are seeing.
Best, Felix
-
Dear Felix,
Thanks for the quick answer, always appreciated!
I ran "grep -c GCTCTTCCGATCT" on the first million forward reads and got 5597 (0.56%)
To be comprehensive, I ran it on the first million reverse reads and got 10258 (1%)
By the way, you certainly understood that the 53% and 27% values in my previous post are those reported by trim_galore.
Now, I wonder where the massive difference (27% reported by trim_galore; 0.56% exact matches grep-ed) comes from. Is it due to partial rev_comp_adapter matches being detected? This seems a huge difference (see below for a comparison with the original adapter sequence), especially considering that the reverse complemented sequence tends to appear at the 5' of the read and therefore truncation should not be an issue for detection.
Illustration of trim_galore report for the rev-comp sequence:
Adapter sequence: 'GCTCTTCCGATCT' ()
Maximum trimming error rate: 0.1 (default)
Optional adapter 2 sequence (only used for read 2 of paired-end files): 'GCTCTTCCGATCT'
Minimum required adapter overlap (stringency): 1 bp
...
Total reads processed: 9,812,098
Reads with adapters: 2,662,297 (27.1%)
Reads written (passing filters): 9,812,098 (100.0%)
Also, running "grep -c AGATCGGAAGAGC" on the first million forward and reverse reads, respectively, returns:
243583 (24.4%)
238858 (23.9%)
In this case, I am confident that the difference between 53% (trim_galore) and 25% (grep) is due to partial adapter sequences.
I am looking at the TruSeq® DNA Methylation Kit and can clearly see how the adapter sequence, ligated to the 3' of ssDNA fragments, can appear by reading through short fragments.
Conversely, the rev_comp should clearly not appear in this context. I am not fully familiar with the chemistry of adapters, but the only way I can picture the rev_comp being sequenced is if the adapter gets ligated to the 5' of a fragment (or another adapter).
I guess I will do as you suggest:
- align the adapter-trimmed paired reads
- FastQC the aligned read mates, hoping that the reads showing rev_comp sequences did not map, and therefore do not appear in the K-mer content plot
By the way: we were already in touch about this project a while ago, because I had surprisingly low mapping efficiency (link). That might have been the issue. I just had to park the whole project for a while, and only got back to it recently.
Thanks!
-
Hi Kevin,
thanks for providing so many details for your question, you don't get this very often...
As a general recommendation I would suggest that you simply run Trim Galore in its default mode (which you already did and which seems to have worked just fine), and just ignore the reverse complement of the adapter.
If you draw out a sketch of how libraries are constructed you will see that both reads (R1 and R2 of a library) should have a single copy of the sequence AGATCGGAAGAGC which then further extends into the rest of either the R1 or R2 adapter sequence (the sequence will then divert for R1 and R2, but for trimming purposes it doesn't matter how the sequence continues). You will further see that no sequence should have the reverse complement of the adapter sequence as a result of read-through adapter contamination. I would expect to find this reverse complemented sequence only as part of some weird processes that happened during library preparation, e.g. adapter or primer dimerisation or concatenation etc. (it should not be present in mammalian genomes at least).
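The read-through argument can be illustrated with a toy Python simulation. It assumes a read is simply the insert followed by the start of the adapter, cut off at the read length; the real adapter continues past the 13-mer, and `simulate_read` is a hypothetical helper, not part of any tool.

```python
ADAPTER = "AGATCGGAAGAGC"

def simulate_read(insert, read_length, adapter=ADAPTER):
    """Sequence the insert and run on into the adapter, up to read_length bases."""
    # Simplification: the adapter is truncated to its first 13 bases here.
    return (insert + adapter)[:read_length]

long_insert = "ACGT" * 10   # 40 bp insert, longer than the read
short_insert = "ACGT" * 4   # 16 bp insert, shorter than the read

for insert in (long_insert, short_insert):
    read = simulate_read(insert, read_length=30)
    # find() reports where the adapter starts, or -1 if it is absent.
    print(len(insert), read, read.find(ADAPTER))
```

Only the short insert produces a read containing the adapter, and it sits right after the insert toward the 3' end. Nothing in this process generates the reverse complement of the adapter.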
The only thing that slightly caught me out is the occurrence of 27% which seems very high. This might mean that either you've got quite a lot of dimerisation happening, or could it be that this is an aggregate number of all kinds of shorter sequences that have been removed, like G, GC, GCT, GCTC,....? In this case the sequences are probably just some random genomic sequences. If you run
Code:
grep -c GCTCTTCCGATCT your_file
you will get the number of lines in the file that contain the full-length sequence.
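For anyone who prefers Python over grep, a rough counterpart is sketched below. It assumes plain four-line FASTQ records (no wrapped sequences) and looks only at the sequence lines, whereas `grep -c` counts any matching line; the function name and the one-million-read limit are illustrative.

```python
from itertools import islice

def count_reads_with_motif(fastq_lines, motif, max_reads=1_000_000):
    """Count reads whose sequence contains motif, over the first max_reads records."""
    # A FASTQ record is four lines; the sequence is the second line of each.
    seq_lines = islice(fastq_lines, 1, None, 4)
    return sum(motif in seq for seq in islice(seq_lines, max_reads))

# Toy input: two records, only the first contains the full 13-mer.
toy = [
    "@read1", "AAGCTCTTCCGATCTAA", "+", "IIIIIIIIIIIIIIIII",
    "@read2", "ACGTACGT", "+", "IIIIIIII",
]
print(count_reads_with_motif(toy, "GCTCTTCCGATCT"))
```

With a real file, an open file handle can be passed in place of the list, e.g. `with open("your_file") as fh: count_reads_with_motif(fh, "GCTCTTCCGATCT")`.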
In any case, I would recommend you just run Trim Galore in default mode and then proceed straight to mapping, I am pretty sure that it is the right thing to do.
Best wishes, Felix
-
Dear Felix and fellow trim_galore users,
I am having a problem/confusion regarding trimming of Illumina adapters from paired-end bisulfite (and some non-bisulfite) samples of a whole-genome bisulfite sequencing project.
I am not sure what is going on here, and how I should process my reads.
First, running FastQC on my raw read pairs, I see both:
- adapter contamination increasing toward the 3' of both forward and reverse reads (link (forward))
- K-mer content clearly indicating over-representation of the reverse complemented Illumina adapter sequence in both forward and reverse reads (link (forward))
When I use trim_galore using either auto-detect or --illumina options:
- trim_galore detects the adapter sequence in ~53% of the reads, making FastQC extremely happy with a perfect absence of adapter contamination in the trimmed files
- the reverse complemented adapter sequence still shows up in the K-mer content
Basically, trim_galore cleanly removes any trace of the adapter sequence (AGATCGGAAGAGC), and ignores its reverse complement (well, it did its job!)
Without much surprise, when I told trim_galore to trim the reverse complement of the adapter sequence, the result of FastQC is:
- trim_galore detects/removes the sequence in ~27% of the reads
- Adapter contamination still detected toward the 3' of the reads
- K-mer content does not show any more sign of reverse complemented adapter over-representation (link (forward))
A final piece of information that might help is that "grep"ing the adapter and rev_comp_adapter sequences in my raw reads, I observed the following:
- the adapter sequence is typically found only once per read (when found), generally toward the 3' end
- the reverse complemented sequence can be found multiple times within a single read (typically 1-3 times), generally toward the 5' of the read
Respective examples:
AGAATTCTGGTCTGTCCTATGGTTGTATGGCCCAGAACCATCCTCTTTTTGCTTTTCGTCAGCTCTCAGTTGGTTTCTCAGGGCCAGGTATTTGAAGGGCCTTGGGGTGATGTCATCCTTTAGTAATGACAGATCGGAAGAGCACACGTC
and
AATAGGGTGTGCTCTTCCGATCTATTCTGGTGTGCTCTTCCGATCTTGGCCGGTGTGCTCTTCCGACTTCTCTGTGACCCCAAGATCGGAAGAGCACACCGTGAAGCCCTCAAGAATGGGATACTGCCCTTTAAAAGAGGCCCCCGGGAG
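The positions of both sequences in a read can be listed with a few lines of Python. This sketch uses exact matching only, applied to the second example read above; `reverse_complement` and `find_all` are small illustrative helpers, not functions from trim_galore or Cutadapt.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_all(read, motif):
    """Return the 0-based start positions of every exact occurrence of motif."""
    positions, start = [], read.find(motif)
    while start != -1:
        positions.append(start)
        start = read.find(motif, start + 1)
    return positions

ADAPTER = "AGATCGGAAGAGC"
REV_COMP = reverse_complement(ADAPTER)  # GCTCTTCCGATCT

# Second example read from above.
read = ("AATAGGGTGTGCTCTTCCGATCTATTCTGGTGTGCTCTTCCGATCTTGGCCGGTGTGCTCTTCCGACTT"
        "CTCTGTGACCCCAAGATCGGAAGAGCACACCGTGAAGCCCTCAAGAATGGGATACTGCCCTTTAAAAG"
        "AGGCCCCCGGGAG")

print("adapter:", find_all(read, ADAPTER))
print("rev_comp:", find_all(read, REV_COMP))
```

Exact matching finds two copies of the reverse complement near the 5' end of this read; the third copy differs slightly (GCTCTTCCGACTT), so it would only be picked up by a search that tolerates mismatches.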
In a nutshell, my question is: "What procedure is recommended in this case?"
I tried to do my homework, and saw earlier in this thread that trim_galore cannot be given multiple adapter sequences (I was thinking of giving both adapter and rev_comp).
Any piece of advice would be greatly appreciated!
Kind regards,
Kevin
-
The temporarily created trimmed.fq files get deleted in the end, leaving only the _val_ files for further analysis.
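For downstream scripts, the validated files can be collected by name. The sketch below assumes the paired-end outputs end in `_val_1.fq` and `_val_2.fq` (optionally followed by `.gz`); that naming is an assumption to check against your own output directory, and `find_val_files` is a hypothetical helper.

```python
from pathlib import Path

def find_val_files(directory):
    """Return (read-1 files, read-2 files) among the validated outputs, sorted by name."""
    d = Path(directory)
    # The trailing * also picks up gzipped outputs such as *_val_1.fq.gz.
    r1 = sorted(p.name for p in d.glob("*_val_1.fq*"))
    r2 = sorted(p.name for p in d.glob("*_val_2.fq*"))
    return r1, r2
```

Calling `find_val_files(".")` in the output directory returns two lists that can then be passed pairwise to an aligner; any leftover `*_trimmed.fq` files are ignored.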
-
*_trimmed vs. *_val
Hi,
I am using trim_galore in "--paired" mode. I know that trim_galore runs a second validation step after first running through the first and second reads separately, and is creating *val files as a result. Do these get renamed back to the *trimmed files afterwards, or should downstream analyses use the *val files?
Thanks for the help!