
  • #31
    Originally posted by Brian Bushnell View Post
    Can you post the top 16 or so lines from the input file?

    The expected input is something like this:

    @blahA /1
    ACGT
    +
    ????
    @blahA /2
    ACGT
    +
    ????
    @blahB /2
    ACGT
    +
    ????
    @blahC /1
    ACGT
    +
    ????
    @blahC /2
    ACGT
    +
    ????


    In this case, "blahA /1" and "blahA /2" would be output as a pair, as would "blahC /1" and "blahC /2", while "blahB /2" would be output to singletons.

    If the reads don't contain a " /1" and " /2" or a " 1:" and a " 2:", it won't work.
    I see, so the trimmed files have to be interleaved prior to running this pairing program? I was trying to test it out, but to be honest this makes it kind of difficult to work with if you have to run a script or format the data beforehand. I'll try to test this and report back but I don't have time to keep working on this right now. Thanks.
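For reference, the pairing rule described in the quoted post can be sketched in a few lines of Python. This only illustrates the rule (an adjacent "/1" and "/2" read with the same name form a pair, anything else goes to singletons); it is not the code of the tool under discussion. The names `split_name`, `read_fastq`, and `fix_interleaving` are made up for this example, and it assumes plain 4-line FASTQ.

```python
import io
import re


def split_name(header):
    """Return (base_name, mate) where mate is "1", "2", or None."""
    name = header[1:].strip()  # drop the leading '@'
    m = re.match(r"^(\S+?)\s*/([12])$", name)  # "blahA /1" or "blahA/1"
    if m:
        return m.group(1), m.group(2)
    m = re.match(r"^(\S+)\s+([12]):", name)  # "blahA 1:..." style
    if m:
        return m.group(1), m.group(2)
    return name, None


def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples; assumes 4-line records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def fix_interleaving(handle):
    """Split an interleaved stream into adjacent pairs and singletons."""
    pairs, singles = [], []
    pending = None  # a /1 read still waiting for its /2 mate
    for rec in read_fastq(handle):
        base, mate = split_name(rec[0])
        if mate == "2" and pending is not None and pending[0] == base:
            pairs.append((pending[1], rec))
            pending = None
            continue
        if pending is not None:  # the waiting /1 read has no mate
            singles.append(pending[1])
            pending = None
        if mate == "1":
            pending = (base, rec)
        else:
            singles.append(rec)
    if pending is not None:
        singles.append(pending[1])
    return pairs, singles


DEMO = (
    "@blahA /1\nACGT\n+\n????\n"
    "@blahA /2\nACGT\n+\n????\n"
    "@blahB /2\nACGT\n+\n????\n"
    "@blahC /1\nACGT\n+\n????\n"
    "@blahC /2\nACGT\n+\n????\n"
)
pairs, singles = fix_interleaving(io.StringIO(DEMO))
```

With the five example records above, `pairs` holds the blahA and blahC pairs and `singles` holds only "blahB /2". For real data one would write records out as they are classified rather than collect them in lists.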
    Last edited by SES; 02-24-2014, 08:28 PM.

    Comment


    • #32
      Originally posted by SES View Post
      I also struggled to get that script to work and I was a little frustrated once I did get it working. Mainly because it uses a lot of memory, as someone commented previously, but also it strips the pair information off the output and creates hardcoded file names.

      I ended up writing my own tool for pairing reads called Pairfq. The problem I kept running into is that most approaches assume 4 line Fastq as input and the sequence name has to be in a certain format. That means you have to come up with different ways to solve this simple task if you are using Fasta or your sequence names are a little different. It was my aim to try and solve these problems.

      Here is an example of the usage:

      Code:
      $ pairfq makepairs -f s_1_1_trimmed.fq \
      -r s_1_2_trimmed.fq \
      -fp s_1_1_trimmed_p.fq \
      -rp s_1_2_trimmed_p.fq \
      -fs s_1_1_trimmed_s.fq \
      -rs s_1_2_trimmed_s.fq
      My observations are that the above command uses about 43% as much memory as the Python script listed above in the thread. This command is a bit slower because it is not making any assumptions about the format (see below). It is also possible to specify that an index should be used. For example,

      Code:
      $ pairfq makepairs -f s_1_1_trimmed.fq \
      -r s_1_2_trimmed.fq \
      -fp s_1_1_trimmed_p.fq \
      -rp s_1_2_trimmed_p.fq \
      -fs s_1_1_trimmed_s.fq \
      -rs s_1_2_trimmed_s.fq \
      --index
      This will result in almost no memory being used (15 MB RAM actually). The execution will be much slower with this option, but this is the only method to my knowledge that can handle pairing really large sequence sets without a big memory machine.
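The memory/speed trade-off of an on-disk index can be sketched roughly as follows. This is not Pairfq's code: it uses Python's built-in `sqlite3` module as a stand-in for an on-disk key/value store, and the function names (`build_index`, `pair_against_index`, `remaining_forward`) and the `(name, record)` input shape are assumptions made for the illustration.

```python
import sqlite3


def build_index(db_path, forward_records):
    """Store forward reads on disk, keyed by base read name.

    forward_records is an iterable of (name, record) tuples.
    """
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS fwd (name TEXT PRIMARY KEY, record TEXT)"
    )
    with con:  # commit once after all inserts
        con.executemany("INSERT OR REPLACE INTO fwd VALUES (?, ?)", forward_records)
    return con


def pair_against_index(con, reverse_records):
    """Stream reverse reads and look each one up in the on-disk index."""
    for name, rev in reverse_records:
        row = con.execute(
            "SELECT record FROM fwd WHERE name = ?", (name,)
        ).fetchone()
        if row is None:
            yield ("reverse_single", name, None, rev)
        else:
            # Remove matched reads so that only unpaired forward reads remain.
            con.execute("DELETE FROM fwd WHERE name = ?", (name,))
            yield ("pair", name, row[0], rev)


def remaining_forward(con):
    """Forward reads that never found a mate."""
    return con.execute("SELECT name, record FROM fwd ORDER BY name").fetchall()
```

Only one record per file is held in memory at a time, which is why this style of approach is slow but frugal: every lookup goes through the database instead of an in-memory hash.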

      The input can be Fasta or Fastq, compressed (with gzip or bzip2) or uncompressed, and the sequence identifiers can be in Casava 1.4 or 1.8+ format as explained on the project wiki (note that pairing the reads is just one of the functions of Pairfq). The outputs are separate files of paired and unpaired forward and reverse reads (which can be optionally compressed).
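As a rough illustration of what accepting Fasta or Fastq, compressed or not, involves, the sketch below picks a reader from the file's first bytes and then guesses the format from the first character. It is a minimal example under stated assumptions, not how Pairfq does it; `open_reads` and `sniff_format` are hypothetical helper names.

```python
import bz2
import gzip


def open_reads(path):
    """Open a plain, gzip, or bzip2 file as text, chosen by magic bytes."""
    with open(path, "rb") as fh:
        magic = fh.read(3)
    if magic[:2] == b"\x1f\x8b":  # gzip signature
        return gzip.open(path, "rt")
    if magic == b"BZh":  # bzip2 signature
        return bz2.open(path, "rt")
    return open(path, "r")


def sniff_format(path):
    """Return "fasta" or "fastq" based on the first character of the file."""
    with open_reads(path) as fh:
        first = fh.read(1)
    if first == ">":
        return "fasta"
    if first == "@":
        return "fastq"
    raise ValueError("unrecognized sequence format: %r" % first)
```

Checking the magic bytes rather than the file extension means a file named `reads.fq` that is actually gzipped is still read correctly.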

      Hopefully, this will save you some time and help to avoid crafting custom shell commands for this task.
      SES,

      I am trying to install dependencies. I could not find the version of Berkeley DB you listed with tar -xzvf db-5.1.19.tar.gz, so I installed the next closest one, db-5.1.29.tar.gz.

      However, when I run perl Makefile.PL I get the following:
      perl Makefile.PL
      WARNING: MIN_PERL_VERSION is not a known parameter.
      WARNING: CONFIGURE_REQUIRES is not a known parameter.
      WARNING: BUILD_REQUIRES is not a known parameter.
      WARNING: LICENSE is not a known parameter.
      Checking if your kit is complete...
      Looks good
      Warning: prerequisite BerkeleyDB 0.54 not found.
      Warning: prerequisite IPC::System::Simple 1.21 not found.
      Warning: prerequisite List::MoreUtils 0.33 not found.
      'BUILD_REQUIRES' is not a known MakeMaker parameter name.
      'CONFIGURE_REQUIRES' is not a known MakeMaker parameter name.
      'LICENSE' is not a known MakeMaker parameter name.
      'MIN_PERL_VERSION' is not a known MakeMaker parameter name.
      Writing Makefile for bin/pairfq

      I still have to install IPC::System::Simple 1.21 and List::MoreUtils 0.33, as I did not know these were dependencies until I ran the file. But is it failing to find BerkeleyDB 0.54 because I have an updated version?

      Comment


      • #33
        Hi Smiller85, The immediate problem is that your version of ExtUtils::MakeMaker is too old to recognize those parameters. From what I can tell, those features were added to EU::MM version 6.48, which is about 6 years old. You can check your version with this command:

        Code:
        perl -MExtUtils::MakeMaker -e 'print ExtUtils::MakeMaker->VERSION'
        Thanks for noting this. I have never actually seen these warnings, so I filed an issue about this on the project site; that should be a quick fix. For now, please run the same command above, but replace "ExtUtils::MakeMaker" with "BerkeleyDB" in both places so I can see what is happening on your system. I don't think you have BerkeleyDB installed. To be clear, you need both the database backend and the Perl bindings, and the message is saying you don't have the Perl package (called "BerkeleyDB") installed. I wouldn't try to do this manually; do it through the CPAN shell or, better yet, use cpanminus, which will install all the deps for you. Also, please run "perl -v" so I can see what version of Perl you have.

        Let me know if you have any other questions. Feel free to send me an email, or post an issue on the project site.
        Last edited by SES; 03-18-2014, 09:03 AM.

        Comment


        • #34
          SES. Right after I sent you the error I noticed the Perl version requirement. My version is 5.8.8. Also, it looks like you are right about ExtUtils::MakeMaker being too old. My version is 6.30.

          I ran perl -MExtUtils::MakeMaker -e 'print BerkeleyDB->VERSION' and did not get any output.

          I had to install Berkeley DB manually because the server does not recognize cpanminus. I downloaded db-5.1.29.tar.gz and ran the tar command. I then used the following commands to install it:
          ..dist/configure prefix=/home/smiller/blast/bin/pipeline-work/db-5.1.29/build_unix
          make
          make install

          I also figured that maybe since pairfq is in its own folder home/smiller/blast/bin/pipeline-work/pairfq that maybe that is where I went wrong, but then I also noticed my outdated version of perl, and now from the other code the MakeMaker is outdated.

          My school is currently on Spring Break, so I don't know how quick of a response I will get from the administrator on updating things like perl and the ExtUtils::MakeMaker.

          Comment


          • #35
            Originally posted by smiller85 View Post
            SES. Right after I sent you the error I noticed the Perl version requirement. My version is 5.8.8. Also, it looks like you are right about ExtUtils::MakeMaker being too old. My version is 6.30.

            I ran perl -MExtUtils::MakeMaker -e 'print BerkeleyDB->VERSION' and did not get any output.

            I had to install Berkeley DB manually because the server does not recognize cpanminus. I downloaded db-5.1.29.tar.gz and ran the tar command. I then used the following commands to install it:
            ..dist/configure prefix=/home/smiller/blast/bin/pipeline-work/db-5.1.29/build_unix
            make
            make install

            I also figured that maybe since pairfq is in its own folder home/smiller/blast/bin/pipeline-work/pairfq that maybe that is where I went wrong, but then I also noticed my outdated version of perl, and now from the other code the MakeMaker is outdated.

            My school is currently on Spring Break, so I don't know how quick of a response I will get from the administrator on updating things like perl and the ExtUtils::MakeMaker.
            Thanks for the response. The version of EUMM you have is not even on CPAN anymore, meaning it is quite old and not supported. Though, I did add a check for this to solve that issue. Also, Perl version 5.10 or greater is required at this time, sorry about that (it is documented at least, under the installation instructions). This version was first released in 2007 but I know a lot of people are stuck with really old systems in academia (I know because I am). I will think about incorporating changes to allow older versions but that creates other problems. By the way, your command above is not quite correct (you were specifying two different modules). If you want to see if a module is installed, just try:
            Code:
            perl -MBerkeleyDB -e 1
            and if it prints nothing, it is installed. If it prints "Can't locate ... in @INC ..." then the module is not installed.
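The point about the earlier command (the module named after `-M` must be the same one whose `VERSION` is printed) can be captured in a small helper. This is only a sketch: `perl_check_cmd` and `perl_module_version` are hypothetical names, and the helper simply wraps the same `perl -MModule -e '...'` invocation shown in this thread.

```python
import shutil
import subprocess


def perl_check_cmd(module):
    """Build the version check, using the SAME module name in both places."""
    return ["perl", "-M" + module, "-e", "print " + module + "->VERSION"]


def perl_module_version(module):
    """Return the installed version string, or None if perl or the module is missing."""
    if shutil.which("perl") is None:
        return None
    result = subprocess.run(perl_check_cmd(module), capture_output=True, text=True)
    if result.returncode != 0:  # e.g. "Can't locate ... in @INC ..."
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    for mod in ("ExtUtils::MakeMaker", "BerkeleyDB"):
        print(mod, perl_module_version(mod))
```

A result of `None` for BerkeleyDB would mean the Perl bindings are not visible to that `perl`, even if the database library itself has been built.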

            Let me know if you are able to get help from your Sys Admin. I could make a version with no requirements if this is an issue, and that may serve most use cases. Though, my original goal was to solve the problem of having to pair hundreds of millions of reads and removing the deps would not solve that issue with the current design.

            Comment


            • #36
              Originally posted by smiller85 View Post
              SES. Right after I sent you the error I noticed the Perl version requirement. My version is 5.8.8. Also, it looks like you are right about ExtUtils::MakeMaker being too old. My version is 6.30.
              This should not be a problem anymore because I have created a standalone script (called "pairfq_lite.pl") that has no dependencies and I have tested it with Perl 5.6.2. If this is still of interest, you may want to try this script that is now part of Pairfq. I should note that this has fewer features, mainly no indexing function, but it will still handle FASTA/FASTQ and compressed or uncompressed data. The only real limitation will be memory if you have very large read sets and little RAM available on your computer. In that case, it would be worthwhile to install the one dependency of the main application and then try to install as before. Let me know if anything is unclear or if any issues arise.
              Last edited by SES; 03-20-2014, 12:40 PM.

              Comment


              • #37
                Hi everybody, to help your discussion, my advice is NOT to use fastx_toolkit for paired-end libraries.
                According to the authors, this tool was designed for SHORT MOLECULES only (e.g. shorter than 50 bp or 100 bp, depending on your sequencer's read length).
                FASTQ/A short-reads pre-processing tools

                Comment


                • #38
                  Wrong message...see below
                  Last edited by ericaramos; 04-24-2014, 11:25 AM.

                  Comment


                  • #39
                    Hi Carmen,
                    I'm facing the same problem when running the script. Did you receive an answer to your problem?
                    If so, could you share it with us?

                    Thanks!

                    Comment


                    • #40
                      Originally posted by ericaramos View Post
                      Hi Carmen,
                      I'm facing the same problem when running the script. Did you receive an answer to your problem?
                      If so, could you share it with us?

                      Thanks!
                      If you look through the discussion above you can see that a number of people had similar issues, and this script doesn't appear to be maintained. I think the best solution may be to find another approach unless you want to work on that shell/python code.

                      Did you try the tool Pairfq that was mentioned in the thread above? I'd be happy to help with this if you run into any issues. We can help with the other approach as well, but it is hard to see what the issue is and it's also a challenge to keep code updated on a forum such as this.

                      Comment


                      • #41
                        Originally posted by carmeyeii View Post
                        Dear btmb,
                        I'm afraid I still cannot run it. Sorry to keep bothering you.

                        I have corrected tabs and spaces to avoid getting the Unexpected indent Error,

                        but now I get:



                        Thanks again for any help,

                        Carmen
                        Originally posted by SES View Post
                        If you look through the discussion above you can see that a number of people had similar issues, and this script doesn't appear to be maintained. I think the best solution may be to find another approach unless you want to work on that shell/python code.

                        Did you try the tool Pairfq that was mentioned in the thread above? I'd be happy to help with this if you run into any issues. We can help with the other approach as well, but it is hard to see what the issue is and it's also a challenge to keep code updated on a forum such as this.



                        Ok, I didn't try using Pairfq, but I will.

                        Thank you for the answer!

                        Comment


                        • #42
                          Originally posted by SES View Post
                          If you look through the discussion above you can see that a number of people had similar issues, and this script doesn't appear to be maintained. I think the best solution may be to find another approach unless you want to work on that shell/python code.

                          Did you try the tool Pairfq that was mentioned in the thread above? I'd be happy to help with this if you run into any issues. We can help with the other approach as well, but it is hard to see what the issue is and it's also a challenge to keep code updated on a forum such as this.
                          Pairfq worked pretty well!! Thank you!

                          Comment


                          • #43
                            After removing the adapters with cutadapt, I got asymmetrical paired-end files, so I am looking for a script that can remove the orphan reads and make the data symmetric. I wrote one using a hash, but it's very slow. The above-mentioned script is showing an error.

                            Comment


                            • #44
                              Originally posted by ranu1 View Post
                              After removing the adapters with cutadapt, I got asymmetrical paired-end files, so I am looking for a script that can remove the orphan reads and make the data symmetric. I wrote one using a hash, but it's very slow. The above-mentioned script is showing an error.
                              We will need some more details in order to help. For example, which script are you referring to, the Python script mentioned on the first page of this thread? If that is the script you are attempting to use, I don't think you'll be able to get it working without some code changes, as mentioned above.

                              Also, what do you mean when you say the script is showing error? It is not possible to know what the issue is based on that information alone.

                              Comment


                              • #45
                                BBTools has a tool to quickly re-pair arbitrarily disordered reads based on their names.

                                For interleaved reads:

                                repair.sh in=reads.fq out=fixed.fq outsingle=single.fq

                                For paired reads in two files:

                                repair.sh in1=read1.fq in2=read2.fq out1=fixed1.fq out2=fixed2.fq outsingle=single.fq

                                You can also repair simple broken interleaving much faster and with less memory, but this will not fix arbitrarily disordered reads, just reads that were interleaved and had some of the reads thrown away:

                                bbsplitpairs.sh in=reads.fq out=fixed.fq outsingle=single.fq fixinterleaving
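For readers who want to see the idea behind name-based re-pairing of two files, here is a minimal in-memory sketch. It is not how repair.sh is implemented; `base_name` and `repair` are names invented for the example, and the records are assumed to be already parsed into `(header, rest)` tuples. All forward reads are held in a dictionary, so memory grows with the size of the forward file.

```python
import re


def base_name(header):
    """Strip the leading '@' or '>' and any mate suffix from a read header."""
    name = header.lstrip("@>").strip()
    name = re.sub(r"\s*/[12]$", "", name)  # "read/1" or "read /1"
    name = re.sub(r"\s+[12]:\S*$", "", name)  # "read 1:N:0:ACGT"
    return name


def repair(forward, reverse):
    """Re-pair arbitrarily ordered reads by name.

    Returns (pairs, forward_singletons, reverse_singletons).
    """
    fwd = {base_name(h): (h, rest) for h, rest in forward}
    pairs, rev_single = [], []
    for h, rest in reverse:
        mate = fwd.pop(base_name(h), None)
        if mate is None:
            rev_single.append((h, rest))
        else:
            pairs.append((mate, (h, rest)))
    # Whatever is left in the dictionary never found a mate.
    return pairs, list(fwd.values()), rev_single


forward = [("@a/1", "AAAA"), ("@b/1", "CCCC"), ("@d/1", "TTTT")]
reverse = [("@d/2", "GGGG"), ("@c/2", "ACGT"), ("@a/2", "TTTT")]
pairs, fwd_single, rev_single = repair(forward, reverse)
```

In this toy input, reads "a" and "d" are paired (in the order they appear in the reverse file), "b" ends up in the forward singletons, and "c" in the reverse singletons.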
                                Last edited by Brian Bushnell; 02-13-2015, 10:31 AM.

                                Comment
