Last edited by SES; 02-24-2014, 08:28 PM.
-
Originally posted by SES:
I also struggled to get that script to work, and I was a little frustrated once I did get it working, mainly because it uses a lot of memory, as someone commented previously, but also because it strips the pair information off the output and creates hardcoded file names.
I ended up writing my own tool for pairing reads called Pairfq. The problem I kept running into is that most approaches assume 4 line Fastq as input and the sequence name has to be in a certain format. That means you have to come up with different ways to solve this simple task if you are using Fasta or your sequence names are a little different. It was my aim to try and solve these problems.
Here is an example of the usage:
Code:
$ pairfq makepairs -f s_1_1_trimmed.fq \
    -r s_1_2_trimmed.fq \
    -fp s_1_1_trimmed_p.fq \
    -rp s_1_2_trimmed_p.fq \
    -fs s_1_1_trimmed_s.fq \
    -rs s_1_2_trimmed_s.fq
Code:
$ pairfq makepairs -f s_1_1_trimmed.fq \
    -r s_1_2_trimmed.fq \
    -fp s_1_1_trimmed_p.fq \
    -rp s_1_2_trimmed_p.fq \
    -fs s_1_1_trimmed_s.fq \
    -rs s_1_2_trimmed_s.fq \
    --index
The input can be Fasta or Fastq, compressed (with gzip or bzip2) or uncompressed, and the sequence identifiers can be in Casava 1.4 or 1.8+ format as explained on the project wiki (note that pairing the reads is just one of the functions of Pairfq). The outputs are separate files of paired and unpaired forward and reverse reads (which can be optionally compressed).
Hopefully, this will save you some time and help to avoid crafting custom shell commands for this task.
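To illustrate why the identifier format matters for pairing, here is a minimal Python sketch of splitting a read header into a base name and a mate number for the two Illumina styles mentioned above. It is not Pairfq's code; the function name `split_read_name` and both regular expressions are assumptions made for this example.

```python
import re

# Casava 1.4 style, mate number after a slash:
#   @HWUSI-EAS100R:6:73:941:1973#0/1
# Casava 1.8+ style, mate number after a space:
#   @EAS139:136:FC706VJ:2:2104:15343:197393 1:Y:18:ATCACG
_OLD = re.compile(r"^(?P<base>\S+)/(?P<mate>[12])$")
_NEW = re.compile(r"^(?P<base>\S+)\s+(?P<mate>[12]):[YN]:\d+(?::\S*)?$")


def split_read_name(header):
    """Return (base_name, mate) for a FASTA/FASTQ header line.

    mate is 1 or 2 when the header matches one of the two styles,
    otherwise None and the first whitespace-delimited token is returned.
    """
    name = header.lstrip("@>").strip()
    for pattern in (_OLD, _NEW):
        match = pattern.match(name)
        if match:
            return match.group("base"), int(match.group("mate"))
    return (name.split()[0] if name else name), None
```

Two reads are mates when their base names are equal and their mate numbers differ, so a pairing tool only has to compare the first element of the returned tuple.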
I am trying to install the dependencies. I could not find the version of Berkeley DB you listed (tar -xzvf db-5.1.19.tar.gz), so I installed the closest one I could find, db-5.1.29.tar.gz.
However, when I run perl Makefile.PL I get the following:
perl Makefile.PL
WARNING: MIN_PERL_VERSION is not a known parameter.
WARNING: CONFIGURE_REQUIRES is not a known parameter.
WARNING: BUILD_REQUIRES is not a known parameter.
WARNING: LICENSE is not a known parameter.
Checking if your kit is complete...
Looks good
Warning: prerequisite BerkeleyDB 0.54 not found.
Warning: prerequisite IPC::System::Simple 1.21 not found.
Warning: prerequisite List::MoreUtils 0.33 not found.
'BUILD_REQUIRES' is not a known MakeMaker parameter name.
'CONFIGURE_REQUIRES' is not a known MakeMaker parameter name.
'LICENSE' is not a known MakeMaker parameter name.
'MIN_PERL_VERSION' is not a known MakeMaker parameter name.
Writing Makefile for bin/pairfq
I still have to install IPC::System::Simple 1.21 and List::MoreUtils 0.33, as I did not know these were dependencies until I ran the file. But is it failing to find BerkeleyDB 0.54 because I have an updated version?
Comment
-
Hi smiller85, the immediate problem is that your version of ExtUtils::MakeMaker is too old to recognize those parameters. From what I can tell, those features were added in EU::MM version 6.48, which is about 6 years old. You can check your version with this command:
Code: perl -MExtUtils::MakeMaker -e 'print ExtUtils::MakeMaker->VERSION'
Let me know if you have any other questions. Feel free to send me an email, or post an issue on the project site.
Last edited by SES; 03-18-2014, 09:03 AM.
Comment
-
SES, right after I sent you the error I noticed the Perl version requirement. My version is 5.8.8. Also, it looks like you are right about ExtUtils::MakeMaker being too old: my version is 6.30.
I ran perl -MExtUtils::MakeMaker -e 'print BerkeleyDB->VERSION' but did not get any output.
I had to install Berkeley DB manually because the server does not recognize cpanminus. I downloaded db-5.1.29.tar.gz and ran the tar command. I then ran the following commands to install it:
..dist/configure prefix=/home/smiller/blast/bin/pipeline-work/db-5.1.29/build_unix
make
make install
I also figured that since pairfq is in its own folder, home/smiller/blast/bin/pipeline-work/pairfq, maybe that is where I went wrong, but then I noticed my outdated version of Perl, and now, from the command you gave, that MakeMaker is outdated too.
My school is currently on spring break, so I don't know how quickly the administrator will respond about updating things like Perl and ExtUtils::MakeMaker.
Comment
-
Originally posted by smiller85:
I ran perl -MExtUtils::MakeMaker -e 'print BerkeleyDB->VERSION' but did not get any output.
Code: perl -MBerkeleyDB -e 1
Let me know if you are able to get help from your sysadmin. I could make a version with no dependencies if this is an issue, and that may serve most use cases. However, my original goal was to solve the problem of pairing hundreds of millions of reads, and removing the dependencies would not solve that with the current design.
Comment
-
Originally posted by smiller85:
SES, right after I sent you the error I noticed the Perl version requirement. My version is 5.8.8. Also, it looks like you are right about ExtUtils::MakeMaker being too old: my version is 6.30.
Last edited by SES; 03-20-2014, 12:40 PM.
Comment
-
Hi everybody, to help the discussion, my advice is NOT to use fastx_toolkit for paired-end libraries.
According to the authors, this tool was designed for SHORT MOLECULES only (e.g. shorter than 50 bp or 100 bp, depending on your sequencer's read length).
FASTQ/A short-reads pre-processing tools
Comment
-
Originally posted by ericaramos:
Hi Carmen,
I'm facing the same problem when running the script. Did you receive any answer about your problem?
If yes, could you share with us?
Thanks!
Did you try the tool Pairfq that was mentioned in the thread above? I'd be happy to help with this if you run into any issues. We can help with the other approach as well, but it is hard to see what the issue is and it's also a challenge to keep code updated on a forum such as this.
Comment
-
Originally posted by carmeyeii:
Dear btmb,
I'm afraid I still cannot run it. Sorry to keep bothering you.
I have corrected tabs and spaces to avoid getting the Unexpected indent error,
but now I get:
Thanks again for any help,
Carmen
Originally posted by SES:
If you look through the discussion above you can see that a number of people had similar issues, and this script doesn't appear to be maintained. I think the best solution may be to find another approach unless you want to work on that shell/python code.
Did you try the tool Pairfq that was mentioned in the thread above? I'd be happy to help with this if you run into any issues. We can help with the other approach as well, but it is hard to see what the issue is and it's also a challenge to keep code updated on a forum such as this.
OK, I didn't try using Pairfq, but I will.
Thank you for the answer!
Comment
-
Pairfq worked pretty well!! Thank you!
Comment
-
After removing the adapters with cutadapt I got asymmetric paired-end files, so I am looking for a script that removes the orphan reads and makes the data symmetric. I wrote one using a hash, but it is very slow. The script mentioned above gives an error.
Comment
-
Originally posted by ranu1:
After removing the adapters with cutadapt I got asymmetric paired-end files, so I am looking for a script that removes the orphan reads and makes the data symmetric. I wrote one using a hash, but it is very slow. The script mentioned above gives an error.
Also, what do you mean when you say the script is showing error? It is not possible to know what the issue is based on that information alone.
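For reference, the orphan-removal task described in the quoted post can be sketched in Python as below. This is only an illustration, not any of the tools discussed in this thread: the function names and the output file suffixes are made up for the example, and it assumes 4-line FASTQ, names ending in /1 and /2 (or sharing the first token), and that both files kept their original relative order, which is normally the case when a trimmer only drops reads. Only read names are held in memory, not sequences or qualities.

```python
def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples from a 4-line FASTQ handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def base_name(header):
    """Drop the leading '@', any comment, and a trailing /1 or /2."""
    name = header[1:].split()[0]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def read_names(path):
    """First pass: collect the set of base names present in one file."""
    with open(path) as handle:
        return {base_name(rec[0]) for rec in read_fastq(handle)}


def filter_pairs(in_path, shared, paired_path, single_path):
    """Second pass: route each read to the paired or the orphan file."""
    kept = 0
    with open(in_path) as src, \
            open(paired_path, "w") as paired, \
            open(single_path, "w") as single:
        for rec in read_fastq(src):
            if base_name(rec[0]) in shared:
                out = paired
                kept += 1
            else:
                out = single
            out.write("\n".join(rec) + "\n")
    return kept


def repair_pairs(fwd_path, rev_path, prefix):
    """Write <prefix>_1_p.fq, _1_s.fq, _2_p.fq, _2_s.fq; return paired counts."""
    shared = read_names(fwd_path) & read_names(rev_path)
    n_fwd = filter_pairs(fwd_path, shared, prefix + "_1_p.fq", prefix + "_1_s.fq")
    n_rev = filter_pairs(rev_path, shared, prefix + "_2_p.fq", prefix + "_2_s.fq")
    return n_fwd, n_rev
```

With hundreds of millions of reads even the name sets can become large, which is the situation where an on-disk index or a dedicated tool is the better choice.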
Comment
-
BBTools has a tool to quickly re-pair arbitrarily disordered reads based on their names.
For interleaved reads:
repair.sh in=reads.fq out=fixed.fq outsingle=single.fq
For paired reads in two files:
repair.sh in1=read1.fq in2=read2.fq out1=fixed1.fq out2=fixed2.fq outsingle=single.fq
You can also repair simple broken interleaving much faster and with less memory, but this will not fix arbitrarily disordered reads, just reads that were interleaved and had some of the reads thrown away:
bbsplitpairs.sh in=reads.fq out=fixed.fq outsingle=single.fq fixinterleaving
Last edited by Brian Bushnell; 02-13-2015, 10:31 AM.
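To show why fixing broken interleaving can be done quickly and with little memory, here is a rough Python sketch of the idea. It is not the BBTools implementation; `mate_key` and `fix_interleaving` are hypothetical names, and the sketch assumes mates share a base name that differs only in a trailing /1 or /2 or in a comment after the first space. Because mates are still adjacent wherever both survived, only one pending read has to be held at a time.

```python
def mate_key(name):
    """Base name shared by both mates: first token without '@' and /1 or /2."""
    key = name.lstrip("@").split()[0]
    if key.endswith(("/1", "/2")):
        key = key[:-2]
    return key


def fix_interleaving(records):
    """Stream over (name, seq, qual) records from a broken interleaved file.

    Yields ("pair", first, second) when two adjacent records share a base
    name, and ("single", record) for a record whose mate was thrown away.
    """
    pending = None
    for rec in records:
        if pending is None:
            pending = rec
        elif mate_key(pending[0]) == mate_key(rec[0]):
            yield ("pair", pending, rec)
            pending = None
        else:
            # The previous read has no adjacent mate, so it is an orphan.
            yield ("single", pending)
            pending = rec
    if pending is not None:
        yield ("single", pending)
```

This only works when the surviving reads are still in their original order; arbitrarily disordered reads need a name lookup across the whole file, which is what costs the extra memory.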
Comment