matching up paired-end reads after fastx-toolkit filtering
-
Last edited by SES; 02-24-2014, 08:28 PM.
-
Originally posted by SES:
I also struggled to get that script to work, and I was a little frustrated once I did get it working, mainly because it uses a lot of memory, as someone commented previously, but also because it strips the pair information from the output and writes to hardcoded file names.
I ended up writing my own tool for pairing reads called Pairfq. The problem I kept running into is that most approaches assume 4-line Fastq as input and require the sequence names to be in a certain format. That means you have to come up with different ways to solve this simple task if you are using Fasta or your sequence names are a little different. My aim was to solve these problems.
Here is an example of the usage:
Code:
$ pairfq makepairs -f s_1_1_trimmed.fq \
    -r s_1_2_trimmed.fq \
    -fp s_1_1_trimmed_p.fq \
    -rp s_1_2_trimmed_p.fq \
    -fs s_1_1_trimmed_s.fq \
    -rs s_1_2_trimmed_s.fq
Code:
$ pairfq makepairs -f s_1_1_trimmed.fq \
    -r s_1_2_trimmed.fq \
    -fp s_1_1_trimmed_p.fq \
    -rp s_1_2_trimmed_p.fq \
    -fs s_1_1_trimmed_s.fq \
    -rs s_1_2_trimmed_s.fq \
    --index
The input can be Fasta or Fastq, compressed (with gzip or bzip2) or uncompressed, and the sequence identifiers can be in Casava 1.4 or 1.8+ format as explained on the project wiki (note that pairing the reads is just one of the functions of Pairfq). The outputs are separate files of paired and unpaired forward and reverse reads (which can be optionally compressed).
Hopefully, this will save you some time and help to avoid crafting custom shell commands for this task.
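As a rough illustration of the identifier handling described above, here is a minimal Python sketch of deriving a shared pair key from Casava 1.4 style names (mate number after a slash) and Casava 1.8+ style names (mate number after a space). This is not Pairfq's code; the function name pair_key and the example read names are assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

# Casava 1.4 style:  @HWUSI-EAS100R:6:73:941:1973#0/1
# Casava 1.8+ style: @EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG
_OLD_STYLE = re.compile(r"^(?P<base>\S+)/(?P<mate>[12])$")
_NEW_STYLE = re.compile(r"^(?P<base>\S+)\s+(?P<mate>[12]):\S*")


def pair_key(header: str) -> Tuple[str, Optional[str]]:
    """Return (base name, mate number) for a Fasta/Fastq header line.

    The mate number is None when the header matches neither naming style.
    """
    name = header.strip().lstrip("@>")
    for pattern in (_OLD_STYLE, _NEW_STYLE):
        match = pattern.match(name)
        if match:
            return match.group("base"), match.group("mate")
    # Unknown format: fall back to the first whitespace-delimited token.
    parts = name.split()
    return (parts[0] if parts else ""), None
```

Two reads belong together when their base names are equal and their mate numbers are 1 and 2, regardless of which naming style the files use.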
I am trying to install the dependencies. I could not find the version of Berkeley DB you listed (tar -xzvf db-5.1.19.tar.gz), so I installed the closest one I could find, db-5.1.29.tar.gz.
However, when I run perl Makefile.PL I get the following:
perl Makefile.PL
WARNING: MIN_PERL_VERSION is not a known parameter.
WARNING: CONFIGURE_REQUIRES is not a known parameter.
WARNING: BUILD_REQUIRES is not a known parameter.
WARNING: LICENSE is not a known parameter.
Checking if your kit is complete...
Looks good
Warning: prerequisite BerkeleyDB 0.54 not found.
Warning: prerequisite IPC::System::Simple 1.21 not found.
Warning: prerequisite List::MoreUtils 0.33 not found.
'BUILD_REQUIRES' is not a known MakeMaker parameter name.
'CONFIGURE_REQUIRES' is not a known MakeMaker parameter name.
'LICENSE' is not a known MakeMaker parameter name.
'MIN_PERL_VERSION' is not a known MakeMaker parameter name.
Writing Makefile for bin/pairfq
I still have to install IPC::System::Simple 1.21 and List::MoreUtils 0.33, as I did not know these were dependencies until I ran the file. But is it failing to find BerkeleyDB 0.54 because I have a newer version?
-
Hi Smiller85, the immediate problem is that your version of ExtUtils::MakeMaker is too old to recognize those parameters. From what I can tell, those features were added in EU::MM version 6.48, which is about 6 years old. You can check your version with this command:
Code:perl -MExtUtils::MakeMaker -e 'print ExtUtils::MakeMaker->VERSION'
Let me know if you have any other questions. Feel free to send me an email, or post an issue on the project site.
Last edited by SES; 03-18-2014, 09:03 AM.
-
SES. Right after I sent you the error I noticed the perl version requirement. My version is 5.8.8. Also, it looks like you are right about ExtUtils::MakeMaker being too old. My version is 6.30.
I ran perl -MExtUtils::MakeMaker -e 'print BerkeleyDB->VERSION' but did not get any output.
I had to install BerkeleyDB manually because the server does not recognize cpanminus. I downloaded db-5.1.29.tar.gz and ran the tar command. I then ran the following commands to install it:
../dist/configure --prefix=/home/smiller/blast/bin/pipeline-work/db-5.1.29/build_unix
make
make install
I also wondered whether having pairfq in its own folder (home/smiller/blast/bin/pipeline-work/pairfq) was where I went wrong, but then I noticed my outdated version of perl, and now, from the command you gave, that MakeMaker is outdated too.
My school is currently on Spring Break, so I don't know how quickly the administrator will respond about updating things like perl and ExtUtils::MakeMaker.
-
Code:
perl -MBerkeleyDB -e 1
Let me know if you are able to get help from your Sys Admin. I could make a version with no requirements if this is an issue, and that may serve most use cases. Though, my original goal was to solve the problem of having to pair hundreds of millions of reads and removing the deps would not solve that issue with the current design.
-
Originally posted by smiller85:
SES. Right after I sent you the error I noticed the perl version requirement. my version is 5.8.8. Also, looks like you are right about the ExtUtils::MakeMaker being too old. My version is 6.30.
Last edited by SES; 03-20-2014, 12:40 PM.
-
Hi everybody, one piece of advice for this discussion: do NOT use fastx_toolkit for paired-end libraries.
According to the authors, the tool was designed for SHORT molecules only (e.g. shorter than 50 bp or 100 bp, depending on your sequencer's read length).
FASTQ/A short-reads pre-processing tools
-
Originally posted by ericaramos:
Hi Carmen,
I'm facing the same problem when running the script. Did you receive an answer about your problem?
If yes, could you share with us?
Thanks!
Did you try the tool Pairfq that was mentioned in the thread above? I'd be happy to help with this if you run into any issues. We can help with the other approach as well, but it is hard to see what the issue is and it's also a challenge to keep code updated on a forum such as this.
-
Originally posted by carmeyeii:
Dear btmb,
I'm afraid I still cannot run it. Sorry to keep bothering you.
I have corrected tabs and spaces to avoid getting the Unexpected indent Error,
but now I get:
Thanks again for any help,
Carmen
Originally posted by SES:
If you look through the discussion above you can see that a number of people had similar issues, and this script doesn't appear to be maintained. I think the best solution may be to find another approach unless you want to work on that shell/python code.
Did you try the tool Pairfq that was mentioned in the thread above? I'd be happy to help with this if you run into any issues. We can help with the other approach as well, but it is hard to see what the issue is and it's also a challenge to keep code updated on a forum such as this.
OK, I haven't tried Pairfq yet, but I will.
Thank you for the answer!
-
Originally posted by SES:
If you look through the discussion above you can see that a number of people had similar issues, and this script doesn't appear to be maintained. I think the best solution may be to find another approach unless you want to work on that shell/python code.
Did you try the tool Pairfq that was mentioned in the thread above? I'd be happy to help with this if you run into any issues. We can help with the other approach as well, but it is hard to see what the issue is and it's also a challenge to keep code updated on a forum such as this.
Pairfq worked pretty well!! Thank you!
-
After removing the adapters with cutadapt, I got unsymmetrical paired-end files, so I am looking for a script that removes the orphan reads and makes the data symmetric. I wrote one using a hash, but it is very slow. The above-mentioned script is showing an error.
-
Originally posted by ranu1:
After removing the adapters with cutadapt, I got unsymmetrical paired-end files, so I am looking for a script that removes the orphan reads and makes the data symmetric. I wrote one using a hash, but it is very slow. The above-mentioned script is showing an error.
Also, what do you mean when you say the script is showing error? It is not possible to know what the issue is based on that information alone.
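For reference, a hash-based approach of the kind mentioned in the question can be sketched in Python as follows. This is only a minimal sketch under stated assumptions: 4-line FASTQ input, read names that differ only by a trailing /1 or /2 or by a comment after whitespace, and enough memory to hold the forward file. The function names read_fastq, base_name, and repair are hypothetical and not taken from any tool in this thread.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (header, seq, plus, qual) tuples from an iterable of lines."""
    it = iter(lines)
    while True:
        rec = [line.rstrip("\n") for line in islice(it, 4)]
        if len(rec) < 4:
            return  # end of input (or a truncated final record)
        yield tuple(rec)


def base_name(header):
    """Drop the leading '@', any comment after whitespace, and a trailing /1 or /2."""
    parts = header[1:].split()
    name = parts[0] if parts else ""
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def repair(forward_lines, reverse_lines):
    """Return (pairs, forward_singles, reverse_singles).

    Only the forward reads are held in a dict; the reverse reads are streamed.
    """
    forward = {base_name(rec[0]): rec for rec in read_fastq(forward_lines)}
    pairs, reverse_singles = [], []
    for rec in read_fastq(reverse_lines):
        mate = forward.pop(base_name(rec[0]), None)
        if mate is None:
            reverse_singles.append(rec)
        else:
            pairs.append((mate, rec))
    # Whatever is left in the dict never found a mate.
    return pairs, list(forward.values()), reverse_singles
```

In practice the arguments would be open file handles, and the results would be written out rather than collected in lists. Memory use grows with the size of the forward file, so for very large data sets an on-disk index or a dedicated tool is likely the better choice.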
-
BBTools has a tool to quickly re-pair arbitrarily disordered reads based on their names.
For interleaved reads:
repair.sh in=reads.fq out=fixed.fq outsingle=single.fq
For paired reads in two files:
repair.sh in1=read1.fq in2=read2.fq out1=fixed1.fq out2=fixed2.fq outsingle=single.fq
You can also repair simple broken interleaving much faster and with less memory, but this will not fix arbitrarily disordered reads, just reads that were interleaved and had some of the reads thrown away:
bbsplitpairs.sh in=reads.fq out=fixed.fq outsingle=single.fq fixinterleaving
Last edited by Brian Bushnell; 02-13-2015, 10:31 AM.
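To illustrate why repairing broken interleaving can need so little memory, here is a minimal Python sketch of a streaming pass that only compares each read with the one before it. It is not the BBTools implementation; the function names split_name and fix_interleaving are hypothetical, and read names are assumed to carry the mate number either as a trailing /1 or /2 or as a "1:" or "2:" prefix in the comment.

```python
def split_name(header):
    """Return (base name, mate) where mate is '1', '2', or None."""
    parts = header[1:].split()
    name = parts[0] if parts else ""
    if name.endswith(("/1", "/2")):
        return name[:-2], name[-1]
    if len(parts) > 1 and parts[1][:2] in ("1:", "2:"):
        return name, parts[1][0]
    return name, None


def fix_interleaving(records):
    """Yield ('pair', r1, r2) or ('single', r) from an iterable of records.

    Each record is a tuple whose first element is the header line.
    Only one record is held in memory at a time.
    """
    held = None
    for rec in records:
        if held is None:
            held = rec
            continue
        held_base, held_mate = split_name(held[0])
        base, mate = split_name(rec[0])
        if base == held_base and held_mate == "1" and mate == "2":
            yield ("pair", held, rec)
            held = None
        else:
            # The held read lost its mate; the current read becomes the new candidate.
            yield ("single", held)
            held = rec
    if held is not None:
        yield ("single", held)
```

Because a read is only ever compared with its immediate neighbor, this works when mates are still adjacent and some reads were simply dropped, but it cannot recover pairs whose mates ended up far apart in the file.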