Originally posted by bioliyezhang
View Post
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
Dear btmb,
I'm afraid I still cannot run it. Sorry to keep bothering?
I have corrected tabs and spaces to avoid getting the Unexpected indent Error,
but now I get:
Traceback (most recent call last):
File "fastq_mate_pair", line 61, in <module>
if count==1:
NameError: name 'count' is not defined
Carmen
Comment
-
Hi bmtb,
I hope this is what you are looking for. Even I had the same problem where reads in forward and reverse files were not in order. I use this shell script to extract common reads and orphan reads. It produces 4 files forward_matched, reverse_matched, forward_orphan and reverse_orphan.
Usage is simple
Code:filteresCommonPEreads.sh forward.fastq reverse.fastq
P.S : change file extension to sh.Attached Files
Comment
-
btmb,
I got your edited script to run today, thank you. However, it prints nothing out to the created files. I get the following error.
finished reading in: 46na_1_trimmed_clipped.fastq
finished reading in: 46na_1_trimmed_clipped.fastq
number of paired reads: 37819703
number of single reads left from 46na_1_trimmed_clipped.fastq: 3984165
number of single reads left from 46na_2_trimmed_clipped.fastq: 4582643
Traceback (most recent call last):
File "/usr/bin/fastqcombinepairedend_edit.py", line 106, in <module>
SIN.write('@' + str(label) + '1\n' + str(seqs1[label]) + '\n+' + str(label) + '1\n' + str(quals1[label]) + '\n')
KeyError: 'HWI-ST594:1:2110:2163:8239#TTAGGCTTAGGC'
print 'number of single reads left from %s: %d'%(setoffiles[1],len(single2))
for label in single1:
SIN.write('@' + str(label) + '1\n' + str(seqs1[label]) + '\n+' + str(label) + '1\n' + str(quals1[label]) + '\n')
Comment
-
Hi, maybe I'm tot late but for someone looking into this. this neat one liner does the job as well:
Code:paste file1_PE_DATA.fastq file2_PE_DATA.fastq | awk '{ printf("%s",$0); n++; if(n%4==0) { printf("\n");} else { printf("\t\t");} }' | shuf | head | sed 's/\t\t/\n/g' | awk '{print $1 > "file1.fastq"; print $2 > "file2.fatsq"}'
best,
Phil
Comment
-
Originally posted by sphil View PostHi, maybe I'm tot late but for someone looking into this. this neat one liner does the job as well:
Code:paste file1_PE_DATA.fastq file2_PE_DATA.fastq | awk '{ printf("%s",$0); n++; if(n%4==0) { printf("\n");} else { printf("\t\t");} }' | shuf | head | sed 's/\t\t/\n/g' | awk '{print $1 > "file1.fastq"; print $2 > "file2.fatsq"}'
best,
Phil
Maybe I didn't understand but I used the "cat" command in front of the line and ran it. It produced two files, but the script just finished without error (thought it was done) after writing only a couple of reads in file1 and just some "2:N:0:ACTGAT" lines in file2. What is the problem here do you think?
-Bo
Comment
-
try this one:
paste pe_file1.fastq pe_file2.fastq | awk '{ printf("%s",$0); n++; if(n%4==0) { printf("\n");} else { printf("\t\t");} }' | sed 's/\t\t/\n/g' | awk '{print $1 > "file1.fastq"; print $2 > "file2.fatsq"}'
the shuf and head command kinda crashed it. sry..
Comment
-
count and name are not defined
I have the same problem as carmeyeii.
After declare count=0 name=None in fastqcombinepairedend.py, it gives me the empty files as following:
./PECombiner.sh
finished reading in: /fastqData/SRR579533_1_trimmed_clipped.fastq
finished reading in: /fastqData/SRR579533_2_trimmed_clipped.fastq
number of paired reads: 0
number of single reads left from /fastqData/SRR579533_1_trimmed_clipped.fastq: 0
number of single reads left from /fastqData/SRR579533_2_trimmed_clipped.fastq: 0
Is the way I declare the variable is not right?
Something went wrong here, is anybody can help?
Thank you in advance,
Originally posted by carmeyeii View PostDear btmb,
I have tried the code and it seems not to be working 'name' and 'count' are not defined before they are evaluated it seems, and the 'count' block seems to me as it sets to 0 when the header is encountered, but I'm not sure. Are there any python-saavy bioinformaticians that would care to help us out with this code?
Thanks!Last edited by newuserofrnaseq; 01-13-2014, 11:17 AM.
Comment
-
I also struggled to get that script to work and I was a little frustrated once I did get it working. Mainly because it uses a lot of memory, as someone commented previously, but also it strips the pair information off the output and creates hardcoded file names.
I ended up writing my own tool for pairing reads called Pairfq. The problem I kept running into is that most approaches assume 4 line Fastq as input and the sequence name has to be in a certain format. That means you have to come up with different ways to solve this simple task if you are using Fasta or your sequence names are a little different. It was my aim to try and solve these problems.
Here is an example of the usage:
Code:$ pairfq makepairs -f s_1_1_trimmed.fq \ -r s_1_2_trimmed.fq \ -fp s_1_1_trimmed_p.fq \ -rp s_1_2_trimmed_p.fq \ -fs s_1_1_trimmed_s.fq \ -rs s_1_2_trimmed_s.fq
Code:$ pairfq makepairs -f s_1_1_trimmed.fq \ -r s_1_2_trimmed.fq \ -fp s_1_1_trimmed_p.fq \ -rp s_1_2_trimmed_p.fq \ -fs s_1_1_trimmed_s.fq \ -rs s_1_2_trimmed_s.fq \ --index
The input can be Fasta or Fastq, compressed (with gzip or bzip2) or uncompressed, and the sequence identifiers can be in Casava 1.4 or 1.8+ format as explained on the project wiki (note that pairing the reads is just one of the functions of Pairfq). The outputs are separate files of paired and unpaired forward and reverse reads (which can be optionally compressed).
Hopefully, this will save you some time and help to avoid crafting custom shell commands for this task.
Comment
-
I also have a tool for fixing paired reads from interleaved files. It's extremely fast (over 200MB/s) and uses little memory, but cannot fix 2-file input, only single-file interleaved. It's based on the fact that if you corrupt an interleaved file by throwing away some of the reads, all remaining pairs will still be adjacent to their mate. So, you never need to store more than 1 read in memory.
To run it, download BBMap and us use this:
bbsplitpairs.sh in=broken.fq out=pairs.fq outsingle=singletons.fq fixpairs
It reads/writes fasta or fastq, raw or gzipped. It can also do trimming and quality filtering, for example:
bbsplitpairs.sh in=read1.fq in2=read2.fq out=trimmed1.fq out2=trimmed2.fq outsingle=singletons.fq qtrim=rl trimq=10 minlen=40
This will quality-trim the right and left sides of reads to Q10 and discard anything that ends up less than 40bp. For pairs, if one read is shorter than 40bp and the mate is longer, the longer one will be written to the 'outsingle' destination and the shorter will be discarded. in2 and out2 are only needed for paired 2-file input. If a single file is input, then whether it is interleaved or single-ended will be autodetected based on read names, but this can be overridden with the 'int=t' or 'int=f' flag.Last edited by Brian Bushnell; 02-24-2014, 10:29 AM.
Comment
-
Originally posted by Brian Bushnell View PostI also have a tool for fixing paired reads from interleaved files. It's extremely fast (over 200MB/s) and uses little memory, but cannot fix 2-file input, only single-file interleaved. It's based on the fact that if you corrupt an interleaved file by throwing away some of the reads, all remaining pairs will still be adjacent to their mate. So, you never need to store more than 1 read in memory.
To run it, download BBMap and us use this:
bbsplitpairs.sh in=broken.fq out=pairs.fq outsingle=singletons.fq fixpairs
It reads/writes fasta or fastq, raw or gzipped. It can also do trimming and quality filtering, for example:
bbsplitpairs.sh in=read1.fq in2=read2.fq out=trimmed1.fq out2=trimmed2.fq outsingle=singletons.fq qtrim=rl trimq=10 minlen=40
This will quality-trim the right and left sides of reads to Q10 and discard anything that ends up less than 40bp. For pairs, if one read is shorter than 40bp and the mate is longer, the longer one will be written to the 'outsingle' destination and the shorter will be discarded. in2 and out2 are only needed for paired 2-file input. If a single file is input, then whether it is interleaved or single-ended will be autodetected based on read names, but this can be overridden with the 'int=t' or 'int=f' flag.
Code:$ ./bbsplitpairs.sh in=bbsplittest_trimmed_cat.fastq out=bbsplit_test_p.fq outsingle=bbsplit_test_s.fq fixpairs : invalid option Usage: /bin/bash [GNU long option] [option] ... /bin/bash [GNU long option] [option] script-file ... GNU long options: --debug --debugger --dump-po-strings --dump-strings --help --init-file --login --noediting --noprofile --norc --posix --protected --rcfile --rpm-requires --restricted --verbose --version --wordexp Shell options: -irsD or -c command or -O shopt_option (invocation only) -abefhkmnptuvxBCHP or -o option
Code:$ bash bbsplitpairs.sh in=bbsplittest_trimmed_cat.fastq out=bbsplit_test_p.fq outsingle=bbsplit_test_s.fq fixpairs : command not foundne 3: : command not foundne 6: 'bsplitpairs.sh: line 8: syntax error near unexpected token `{ 'bsplitpairs.sh: line 8: `calcXmx () {
Comment
-
Originally posted by SES View PostAs a side note, you might want to put your scripts in directory because people get kind of annoyed when you untar a project and it inflates a bunch of scripts.
The shellscripts are for bash, so you may have to say "bash bbsplitpairs.sh". If your environment does not have bash, then instead of the shellscript, you would do this:
java -Xmx200m -cp /path/to/current/ jgi.SplitPairsAndSingles in=broken.fq out=pairs.fq outsingle=singletons.fq fixpairs
Comment
-
Originally posted by Brian Bushnell View PostThe shellscripts are for bash, so you may have to say "bash bbsplitpairs.sh". If your environment does not have bash, then instead of the shellscript, you would do this:
java -Xmx200m -cp /path/to/current/ jgi.SplitPairsAndSingles in=broken.fq out=pairs.fq outsingle=singletons.fq fixpairs
Comment
-
Originally posted by Brian Bushnell View PostAhh, sorry about that; I'll change it.
The shellscripts are for bash, so you may have to say "bash bbsplitpairs.sh". If your environment does not have bash, then instead of the shellscript, you would do this:
java -Xmx200m -cp /path/to/current/ jgi.SplitPairsAndSingles in=broken.fq out=pairs.fq outsingle=singletons.fq fixpairs
Comment
-
Can you post the top 16 or so lines from the input file?
The expected input is something like this:
@blahA /1
ACGT
+
????
@blahA /2
ACGT
+
????
@blahB /2
ACGT
+
????
@blahC /1
ACGT
+
????
@blahC /2
ACGT
+
????
In this case, "blahA /1" and "blahA /2" would be output as a pair, as would "blahC /1" and "blahC /2", while "blahB /2" would be output to singletons.
If the reads don't contain a " /1" and " /2" or a " 1:" and a " 2:", it won't work.
Comment
Latest Articles
Collapse
-
by seqadmin
Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...-
Channel: Articles
03-22-2024, 06:39 AM -
-
by seqadmin
The field of conservation genomics centers on applying genomics technologies in support of conservation efforts and the preservation of biodiversity. This article features interviews with two researchers who showcase their innovative work and highlight the current state and future of conservation genomics.
Avian Conservation
Matthew DeSaix, a recent doctoral graduate from Kristen Ruegg’s lab at The University of Colorado, shared that most of his research...-
Channel: Articles
03-08-2024, 10:41 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 03-27-2024, 06:37 PM
|
0 responses
13 views
0 likes
|
Last Post
by seqadmin
03-27-2024, 06:37 PM
|
||
Started by seqadmin, 03-27-2024, 06:07 PM
|
0 responses
11 views
0 likes
|
Last Post
by seqadmin
03-27-2024, 06:07 PM
|
||
Started by seqadmin, 03-22-2024, 10:03 AM
|
0 responses
53 views
0 likes
|
Last Post
by seqadmin
03-22-2024, 10:03 AM
|
||
Started by seqadmin, 03-21-2024, 07:32 AM
|
0 responses
69 views
0 likes
|
Last Post
by seqadmin
03-21-2024, 07:32 AM
|
Comment