Hi All,
I have exactly this problem as well, but with FASTA files. Does anybody know of a program that works with FASTA, or could someone modify 'mergeShuffledFastqSeqs.pl' so that it works on that format as well?
Much appreciated.
Originally posted by kmcarr:
Thanks for the acknowledgement. Here is a link to the thread. If you go there you'll see that I just posted an update. Due to a limitation in cdbfasta my method will not work for large input fastq files. The only work-around at the moment is to split the input up into smaller chunks.
Originally posted by sklages:
Maybe of interest as well, PairedreadFinder:
from FAR, http://sourceforge.net/apps/mediawik...itle=Main_Page
Sven
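For the FASTA question above, a minimal sketch of the same re-pairing idea in Python may be a useful starting point. It is not a port of mergeShuffledFastqSeqs.pl; the function names read_fasta and pair_fasta are made up for this example, and it assumes that the read ID is the header text up to the first whitespace and is identical in both files (names ending in /1 and /2 would need trimming first).

```python
def read_fasta(handle):
    """Yield (read_id, record_text) for each FASTA record; sequences may span lines."""
    read_id, lines = None, []
    for line in handle:
        if line.startswith(">"):
            if read_id is not None:
                yield read_id, "".join(lines)
            # Assumption: the ID is the first whitespace-delimited token after '>'
            read_id, lines = line[1:].split()[0], [line]
        elif read_id is not None:
            lines.append(line)
    if read_id is not None:
        yield read_id, "".join(lines)


def pair_fasta(handle1, handle2):
    """Return (pairs, unpaired); pairs is a list of (record1, record2) tuples."""
    first = dict(read_fasta(handle1))  # the whole first file is held in memory
    pairs, unpaired = [], []
    for read_id, record in read_fasta(handle2):
        mate = first.pop(read_id, None)
        if mate is None:
            unpaired.append(record)
        else:
            pairs.append((mate, record))
    unpaired.extend(first.values())  # reads from file 1 that never found a mate
    return pairs, unpaired
```

Like the FASTQ script, this keeps the first file in RAM, so the same memory caveat applies to large inputs.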
-
I think Trimmomatic pairs reads by their order in the input files, not by the lane_position_... combination. I tested it on one unmatched dataset and it could not handle it.
-
Thanks, everybody. I got it resolved.
-
Hi,
It most probably is a memory issue.
The script loads only the first file into memory and then matches it against the entries in the second file. You'll have to monitor the memory usage ('top' or 'free -m').
I just ran a test and Perl used 220 GB of RAM for two 33 GB fastq files.
I'll soon start looking for more memory-efficient approaches in Perl to improve the script. I'll let you know.
-Adhemar
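One possible direction for the memory problem described above, sketched in Python: keep only each read's ID and the byte offset of its record in RAM, and seek back into the first file when a mate turns up. This is not what mergeShuffledFastqSeqs.pl does; index_offsets and fetch are hypothetical names, and the sketch assumes well-formed 4-line records whose ID is the first whitespace-delimited token of the header.

```python
def index_offsets(path):
    """Map read ID (bytes) -> byte offset of its record in the file."""
    offsets = {}
    with open(path, "rb") as fh:
        while True:
            pos = fh.tell()
            header = fh.readline()
            if not header:
                break
            # Drop the leading '@' and keep the first token as the ID
            offsets[header[1:].split()[0]] = pos
            for _ in range(3):  # skip the sequence, '+', and quality lines
                fh.readline()
    return offsets


def fetch(fh, offset):
    """Return the 4-line record that starts at the given byte offset."""
    fh.seek(offset)
    return b"".join(fh.readline() for _ in range(4))
```

The trade-off is many random seeks in exchange for a much smaller footprint; with hundreds of millions of reads the ID dictionary alone can still take several GB.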
-
Hi,
My process got killed:
perl mergeShuffledFastqSeqs.pl -f1 2044-BH-1_1_sequence.txt -f2 2044-BH-1_2_sequence.txt -r '^@(\S+)\s[1|2]\S+$' -o 2044-BH-1 -t
Loading the first file...Killed
2044-BH-1_1_sequence.txt is 18 GB and the other file is 17 GB. We have a server with 32 dual-core CPUs and 192 GB of memory. I wonder why it got killed.
Thanks
-
Hi dejavu2010.
For a header like
@HWI-ST829:138071VACXX:1:1101:1131:2048 1:N:0:ATCACG.
you can use: '^@(\S+)\s[1|2]\S+$'
This assumes that 1 or 2 appears right after the space character:
'@' 'ID' 'space' '1 or 2' '...'
I'll add this example to the script.
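To see what the expression captures, here is a small check in Python (the pattern syntax used here is the same in Perl and Python). The read-2 header is an assumed counterpart of the example above, and read_id is just a helper name for this sketch.

```python
import re

# Pattern from the post: capture the ID between '@' and the first space.
# Inside a character class '|' is a literal, so [1|2] also accepts '|';
# [12] would be the stricter spelling.
HEADER_RE = re.compile(r'^@(\S+)\s[1|2]\S+$')


def read_id(header_line):
    """Return the shared read ID, or None if the header does not match."""
    match = HEADER_RE.match(header_line.rstrip("\n"))
    return match.group(1) if match else None


r1 = "@HWI-ST829:138071VACXX:1:1101:1131:2048 1:N:0:ATCACG"
r2 = "@HWI-ST829:138071VACXX:1:1101:1131:2048 2:N:0:ATCACG"  # assumed mate header
print(read_id(r1))
print(read_id(r1) == read_id(r2))
```

Both headers reduce to the part before the space, which is what lets the two mates be matched.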
-
Hi azneto, how do I set up the regular expression for a header like the following?
@HWI-ST829:138071VACXX:1:1101:1131:2048 1:N:0:ATCACG.
Thanks.
-
Maybe of interest as well, PairedreadFinder:
Usage: PairedreadFinder, Version 1.01. This tool takes two fasta/q files and looks for matching readnames in both files. [OPTION]...
-h, --help displays this help message
-v, --version return program version
-s1, --source1 input file 1
-s2, --source2 input file 2
-f, --format input file format
-t1, --target1 target file 1
-t2, --target2 target file 2
-n, --nr-threads nr of threads to use (default 1)
-is, --suffix-ignore nr of characters to ignore from the END of the readname (in case paired reads are named like /1 /2 it should be set to 2) (default 0)
-ip, --prefix-ignore nr of characters to ignore from the BEGINNING of the readname (in case paired reads are named like s_1.. s_2.. it should be set to 3) (default 0)
Sven
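The effect of the --suffix-ignore and --prefix-ignore options described in the help text can be illustrated with a few lines of Python. This only mirrors the description above, not PairedreadFinder's actual code; pairing_key and the read names are invented for the example.

```python
def pairing_key(read_name, prefix_ignore=0, suffix_ignore=0):
    """Drop characters from both ends of a read name before comparing mates."""
    end = len(read_name) - suffix_ignore
    return read_name[prefix_ignore:end]


# Mates named like .../1 and .../2: ignore the last 2 characters
print(pairing_key("read7/1", suffix_ignore=2) == pairing_key("read7/2", suffix_ignore=2))

# Mates named like s_1... and s_2...: ignore the first 3 characters
print(pairing_key("s_1_read7", prefix_ignore=3) == pairing_key("s_2_read7", prefix_ignore=3))
```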
-
I wrote a script exactly to tackle that issue. You'll find a copy attached.
The script will output either an interleaved mate pair fastq or two fastq files. The unpaired reads will also be saved in a separate file. The script uses a regular expression to identify the ID, so let me know if you need help with that. It requires at least as much RAM as the size of the first file. Feel free to use it, and please let me know how we can improve it.
Adhemar
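For readers without the attachment, the approach described above can be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the attached Perl script: the names read_fastq and repair are made up, records are assumed to be exactly 4 lines, and the ID is taken as the header text up to the first whitespace (adjust ID_RE for /1 and /2 style names).

```python
import re

# Assumption: the ID is the header text between '@' and the first whitespace
ID_RE = re.compile(r'^@(\S+)')


def read_fastq(handle):
    """Yield (read_id, record_text) for each 4-line FASTQ record."""
    while True:
        lines = [handle.readline() for _ in range(4)]
        if not lines[0]:
            return
        match = ID_RE.match(lines[0])
        if match is None:
            raise ValueError("unexpected header: " + lines[0].strip())
        yield match.group(1), "".join(lines)


def repair(path1, path2, out1, out2, out_unpaired):
    """Write matched pairs to out1/out2 and all other reads to out_unpaired."""
    with open(path1) as fh:
        first = dict(read_fastq(fh))  # the whole first file is held in RAM
    with open(path2) as fh2, open(out1, "w") as o1, \
            open(out2, "w") as o2, open(out_unpaired, "w") as single:
        for read_id, record in read_fastq(fh2):
            mate = first.pop(read_id, None)
            if mate is None:
                single.write(record)
            else:
                o1.write(mate)
                o2.write(record)
        for record in first.values():  # file 1 reads that never found a mate
            single.write(record)
```

Paired output follows the order of the second file, and the dictionary of the first file is why memory use grows with the size of that file.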
-
The trimmer Trimmomatic outputs files with intact pairs as well as files with single reads. It should be able to split your files in the way you want, as well as do trimming at the same time if you wish.
-
Originally posted by mgg:
I remember a nice contribution from kmcarr a while back which can probably help; search for thread 10392. (incidentally I can't recommend my own contribution to that thread - it is hideously slow)
best
m
Thanks for the acknowledgement. Here is a link to the thread. If you go there you'll see that I just posted an update. Due to a limitation in cdbfasta my method will not work for large input fastq files. The only work-around at the moment is to split the input up into smaller chunks.
-
re-pairing PE files
I remember a nice contribution from kmcarr a while back which can probably help; search for thread 10392. (incidentally I can't recommend my own contribution to that thread - it is hideously slow)
best
m
-
Program to make paired-end files have an equal number of sequences
I have a PE100 run. The problem is that some tiles of read 2 are corrupted, which caused an uneven number of reads in read 1 and read 2. Are there any programs that can match up sequences from both reads and keep just the matched ones for downstream analysis? Thanks
mike