  • Fastx-Toolkit - Analyze multiple files in directory (Linux/Qiime)

    I am trying to run the fastq_quality_filter on a directory of .fastq files (~500) where each file has a unique name. I simply cannot get it to work on the Linux Command Line within the Qiime VirtualBox. As an output, I am looking to have a new folder with each quality-filtered .fastq file having the same unique name it previously had. I have successfully run the script on a single .fastq file with the following command…

    fastq_quality_filter –Q33 -q 19 -p 89 –i /home/qiime/Desktop/Hilo_New/Mock_community.fastq -o /home/qiime/Desktop/Hilo_New/Mock_community_fqf.fastq

    I would like to use this for all the files in a directory, however, I cannot. Also, doing this for 500 files seems quite daunting. Is there a way to make this happen? Here is the code I have tried…

    fastq_quality_filter –Q33 -q 19 -p 89 –i “/home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join/*.fastq” -o /home/qiime/Desktop/Hilo_New/fastX-filterFastq/

    and I receive the following error…

    fastq_quality_filter: input file (-) has unknown file format (not FASTA or FASTQ), first character =
    (10)

    The file format is just fine, as it works with an individual file. I just can't seem to figure it out.

    Finally, I would like to do the same directory analysis with the FastQ Artifacts Filter.

    Any help would be much appreciated. Thanks.

  • #2
    Try this:
    Code:
    for i in `ls -1 /home/qiime/Desktop/Hilo_New/*.fastq | sed 's/.fastq//'`; do fastq_quality_filter –Q33 -q 19 -p 89 –i /home/qiime/Desktop/Hilo_New/$i.fastq -o /home/qiime/Desktop/Hilo_New/fastX-filterFastq/$i\_fqf.fastq; done
    Make sure /home/qiime/Desktop/Hilo_New/fastX-filterFastq is pre-created before you run this code.

    Edit: This does not work with Fastx_toolkit. A more modern program like BBMap handles this fine. Example in post #12 below.
    Last edited by GenoMax; 08-02-2017, 04:02 AM.

    • #3
      Thanks. Quick follow up... should I be pasting the entirety of the code as written? The coding language I don't understand is the "for i in 'ls -1" etc.

      Is this all one command?

      Thanks.

      • #4
        This is a small bash script that uses a for loop.

        "ls -1 /home/qiime/Desktop/Hilo_New/*.fastq | sed 's/.fastq//'" takes the listing of files in your source directory (one at a time), removes the .fastq from the end of each file name using the stream editor "sed" (for the reason mentioned below), and then assigns the first part of the file name to a "variable" called i.

        Variable i is then used to construct the fastx command line (one file at a time). For -i option we are adding the ".fastq" back on the variable i so the original file name is recreated. For -o (output file name) we are appending "_fqf.fastq" (which is what you had in your example) to make up a new file name while retaining the sample name (which is being saved in new output directory).

        This process will iterate until all files in source directory are processed.
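
The naming logic described above can be sketched for a single file name taken from the first post. This is only an illustration: it uses shell parameter expansion (`${f%.fastq}`) instead of `ls | sed`, which is a swapped-in technique rather than what the loop in post #2 does, and the `echo` just prints the command that would be run. The variable names `in_name` and `out_name` are made up for the example.

```shell
# One input name, as in the first post (any .fastq name would do)
f="Mock_community.fastq"

# Strip the trailing ".fastq"; this is the part stored in the variable i
i="${f%.fastq}"

# -i gets ".fastq" added back, -o gets "_fqf.fastq" appended
in_name="$i.fastq"
out_name="fastX-filterFastq/${i}_fqf.fastq"   # ${i}_fqf is equivalent to $i\_fqf

# Dry run: print the command instead of executing it
echo fastq_quality_filter -Q33 -q 19 -p 89 -i "$in_name" -o "$out_name"
```

The backslash in `$i\_fqf` (or the braces in `${i}_fqf`) keeps the shell from reading `i_fqf` as the variable name.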

        • #5
          Okay, thanks for the clarification. Now I'm getting the following when I put in the two commands...

          fastq_quality_filter: input file (-) has unknown file format (not FASTA or FASTQ), first character =
          (10)

          I'm not sure why this is happening since I can push a single file through the filter...

          • #6
            Which two commands are you referring to? I only have a single command line up there. I have not used fastx toolkit recently, so I am assuming your command line is correct. Have you made sure your fastq files are in the correct format?

            • #7
              Nice command GenoMax - think I'm learning something about loops! But couldn't you just do something like this:

              Move into the Hilo_New folder, create a new folder called "fastX-filterFastq", then do:

              for i in *.fastq; do fastq_quality_filter –Q33 -q 19 -p 89 –i $i -o ./fastX-filterFastq/$(basename $i .fastq)_fqf.fastq; done

              I was also under the impression that these bash loops didn't iterate until the current job was complete, so didn't need a 'sleep' call? Could be wrong there..

              • #8
                I just loaded the fastQValidator and pulled a subsample out of my original folder. I checked 43 .fastq files with fastQValidator and they all passed with the following result...

                qiime@qiime-190-virtual-box:~/fastQValidator$ /home/qiime/fastQValidator/bin/profile/fastQValidator --file /home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled/MockComm_fastqjoin.join.fastq

                Finished processing /home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled/MockComm_fastqjoin.join.fastq with 58084 lines containing 14521 sequences.
                There were a total of 0 errors.
                Returning: 0 : FASTQ_SUCCESS

                I also checked the header of my files and they all seem to look good
                Code:
                qiime@qiime-190-virtual-box:~$ head /home/qiime/Desktop/Hilo2_fastqjoin.join.fastq
                @M01498:340:000000000-B86MB:1:1101:22408:1708 1:N:0:2
                GTGAATCATCAAATTTTTGAACGCACCTTGCGCTCTCTGGTATTCCGGAGAGCACGTCTGTCTGAGTGTCGCTTTACTCTCAACGACCGAGTTTTTGTTAACTCGGGAGTTGGATCTTGAGCGCTGCCGGGTTCCTTGGGATCGTTGGCTCGCTTTAAAAGCTCGGATTGTGTCTTCGAGGTCGTTAATCCTAGTCGACGTGTAATTAGATTTATCGTTGGCGTTACGGAGGCCTCTTAACGGACCTTTCTCCCCTATCGTGCTCTTTAGGAGTGCAACTTTTGAACTTTTGACCTCAGATCAGTCGGGACTACCCGCTGAACTTAAGCATATCAATAAGCGGAGGA
                +
                CCCCCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGGEFGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGDGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGCCCCC
                @M01498:340:000000000-B86MB:1:1101:9873:2268 1:N:0:2
                GTGAATCATCAAATCTTTGAACGCACCTTGCGCTCTCTGGTATTCCGGAGAGCACGTCTGTCTGAGTGTCGCTTTACTCT
                Then I made a folder to analyze only these 43 files with the command you suggested and it just hangs and does nothing for a while. When I press enter again the same error comes out...

                fastq_quality_filter: input file (-) has unknown file format (not FASTA or FASTQ), first character =
                (10)

                I was getting frustrated so I just tried a single file again to make sure it worked and it doesn't. I am getting the same error as above... I guess I don't know what's going on at all.
                Last edited by GenoMax; 08-02-2017, 03:48 AM.

                • #9
                  neavemj - same story with the code you posted. Thanks.

                  • #10
                    If you can see something wrong, here is the .fastq file. I couldn't figure out how to post it, so just delete .pdf from the end (I think that will work).
                    Attached Files

                    • #11
                      Huh, not sure. It seems like it's ignoring the -i flag and waiting for something from the STDIN. You could try giving it an opened file instead of the input flag, like so:

                      Code:
                      for i in *.fastq; do cat $i | fastq_quality_filter –Q33 -q 19 -p 89 -o ./fastX-filterFastq/$(basename $i .fastq)_fqf.fastq; done

                      Something like that might work. I haven't tested it though.

                      Cheers,

                      Matt.
                      Last edited by GenoMax; 08-02-2017, 04:14 AM.

                      • #12
                        It looks like fastx_toolkit does not want to behave like a normal unix program.

                        @j.cappellazzi: You can use @neavemj's suggestion above if you want to stick with fastx_toolkit. Otherwise, I suggest that you use bbduk.sh from the BBMap suite (a more current program) for this.

                        Code:
                        for i in `ls -1 *.fq | sed 's/.fq//'`; do bbduk.sh qin=33 qtrim=r trimq=19 in=$i.fq out=$i\_fqf.fq; done
                        @neavemj: Multiple ways to skin the cat. You are right that we didn't need the sleep option.
                        Last edited by GenoMax; 08-02-2017, 04:16 AM.

                        • #13
                          @Genomax @neavemj

                          Well, that was frustrating and silly. It wasn't recognizing the -i because it was a long "–" and not a short "-". It must have come from copying from Word into the command line. WOW!
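
The "input file (-)" in the earlier error message seems consistent with this: without a recognizable -i option the tool appears to have been waiting on standard input, as suspected in post #11. A minimal sketch of normalizing a pasted command before running it follows. The variable names are made up for illustration, and the command is only printed, not executed.

```shell
# A command pasted from a word processor: the dashes before Q33 and i are en dashes (U+2013)
pasted='fastq_quality_filter –Q33 -q 19 -p 89 –i Mock_community.fastq -o Mock_community_fqf.fastq'

# Replace every en dash with a plain ASCII hyphen
fixed=$(printf '%s\n' "$pasted" | sed 's/–/-/g')

# Report whether anything had to be changed
if [ "$fixed" != "$pasted" ]; then
    echo "Typographic dashes were found and replaced:"
fi
echo "$fixed"
```

Typing the options by hand, or pasting through a plain-text editor first, avoids the problem altogether.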

                          Now it works just fine with individual files again (I must have mucked that up as well while copying last night). However, there is still a problem with @Genomax's loop script. I entered...

                          for i in `ls -1 /home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled/*.fastq | sed 's/.fastq//'`; do fastq_quality_filter -Q33 -q 19 -p 89 -i /home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled/$i.fastq -o /home/qiime/Desktop/Hilo_New/fastX-filterFastq/$i\_fqf.fastq; done

                          There is an output folder created and empty, patiently awaiting 43 new files, however, this is the response I receive...

                          fastq_quality_filter: failed to open input file '/home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled//home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends-join_labeled/MockComm_fastqjoin.join.fastq.fastq': No such file or directory

                          It does this for every single file in the directory. The issue I can see is that "MockComm_fastqjoin.join.fastq.fastq" is not a file, as the input file will have only one ".fastq" in the name. I tried playing around with the script but honestly don't understand all the details and other errors kept popping up.

                          Any further help on this would be greatly appreciated. Thanks.
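
Two things in the loop appear to combine to produce that path. `ls -1 /path/*.fastq` already returns full paths, so prefixing the directory again doubles it. The pattern in `sed 's/.fastq//'` is also unanchored: it removes the first match of any character followed by "fastq", which here is the "_fastq" inside the directory name and not the extension. A small sketch reproducing this with the path from the error message, next to `basename` as one possible alternative (the variable names are illustrative):

```shell
path="/home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled/MockComm_fastqjoin.join.fastq"

# Unanchored pattern: "." matches any character, and only the first match is removed
i=$(printf '%s\n' "$path" | sed 's/.fastq//')
echo "$i"
# prints /home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends-join_labeled/MockComm_fastqjoin.join.fastq

# basename drops the directory and the literal ".fastq" suffix
sample=$(basename "$path" .fastq)
echo "$sample"
# prints MockComm_fastqjoin.join
```

Adding ".fastq" back onto the first result gives the ".fastq.fastq" name seen in the error, under a directory that does not exist.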

                          • #14
                            @neavemj

                            I also tried the code you suggested after getting into the proper directory at the command line and fixing the "-i" issue...

                            qiime@qiime-190-virtual-box:~/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled$ for i in *.fastq; do fastq_quality_filter -Q33 -q 19 -p 89 -i $i -o ./fastX-filterFastq/$(basename $i .fastq)_fqf.fastq; done

                            I made a folder in the following directory titled "FastX-filterFastq"

                            qiime@qiime-190-virtual-box:~/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled

                            It seemed to begin working but for each of my 43 .fastq files in that folder it gave me the following error message...

                            fastq_quality_filter: Failed to create output file (./fastX-filterFastq/MockComm_fastqjoin.join_fqf.fastq): No such file or directory

                            Thanks for any further help on this.

                            • #15
                              It worked!

                              Well, now I understand my coding limitations. I didn't know the "./" meant I needed to provide the path to the output folder. I messed around with the code and did...

                              qiime@qiime-190-virtual-box:~/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled$ for i in *.fastq; do fastq_quality_filter -Q33 -q 19 -p 89 -i $i -o /home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled/FastX-filterFastq/$(basename $i .fastq)_fqf.fastq; done

                              It worked like a charm. I frequently run into problems like this, when it's just my lack of coding knowledge that makes even the simplest tasks frustrating. Thank you so much for creating that code. I truly appreciate it.
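
One possible reading of the error in post #14: the folder was created as "FastX-filterFastq" while the command wrote to "./fastX-filterFastq". Linux file names are case-sensitive, so that mismatch alone may explain the "No such file or directory" message; "./" simply means the current directory. A hedged sketch that keeps the folder name in one variable and creates it before looping is shown below. It runs in a scratch directory with empty placeholder files and only prints the commands, so it does not need fastx_toolkit installed; `workdir`, `outdir` and `count` are names invented for the example.

```shell
# Scratch directory with two empty placeholder inputs (names borrowed from the thread)
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch MockComm_fastqjoin.join.fastq Hilo2_fastqjoin.join.fastq

# Define the output folder once and create it, so the spelling cannot drift
outdir="./FastX-filterFastq"
mkdir -p "$outdir"

count=0
for f in *.fastq; do
    out="$outdir/$(basename "$f" .fastq)_fqf.fastq"
    # Dry run: print the command instead of executing it
    echo fastq_quality_filter -Q33 -q 19 -p 89 -i "$f" -o "$out"
    count=$((count + 1))
done
echo "$count commands prepared"
```

Removing the `echo` in front of `fastq_quality_filter` would run the filter for real in a directory of actual .fastq files.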
