  • Fastx-Toolkit - Analyze multiple files in directory (Linux/Qiime)

    I am trying to run the fastq_quality_filter on a directory of .fastq files (~500) where each file has a unique name. I simply cannot get it to work on the Linux Command Line within the Qiime VirtualBox. As an output, I am looking to have a new folder with each quality-filtered .fastq file having the same unique name it previously had. I have successfully run the script on a single .fastq file with the following command…

    fastq_quality_filter –Q33 -q 19 -p 89 –i /home/qiime/Desktop/Hilo_New/Mock_community.fastq -o /home/qiime/Desktop/Hilo_New/Mock_community_fqf.fastq

    I would like to use this for all the files in a directory, however, I cannot. Also, doing this for 500 files seems quite daunting. Is there a way to make this happen? Here is the code I have tried…

    fastq_quality_filter –Q33 -q 19 -p 89 –i “/home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join/*.fastq” -o /home/qiime/Desktop/Hilo_New/fastX-filterFastq/

    and I receive the following error…

    fastq_quality_filter: input file (-) has unknown file format (not FASTA or FASTQ), first character =

    The file format is just fine, as it works with an individual file. I just can't seem to figure it out.

    Finally, I would like to do the same directory analysis for the FastQ Artifacts Filter

    Any help would be much appreciated. Thanks.

  • #2
    Try this :
    for i in `ls -1 /home/qiime/Desktop/Hilo_New/*.fastq | sed 's/.fastq//'`; do fastq_quality_filter –Q33 -q 19 -p 89 –i /home/qiime/Desktop/Hilo_New/$i.fastq -o /home/qiime/Desktop/Hilo_New/fastX-filterFastq/$i\_fqf.fastq; done
    Make sure /home/qiime/Desktop/Hilo_New/fastX-filterFastq is pre-created before you run this code.

    Edit: This does not work with Fastx_toolkit. A more modern program like BBMap handles this fine. Example in post #12 below.
    Last edited by GenoMax; 08-02-2017, 04:02 AM.


    • #3
      Thanks. Quick follow up... should I be pasting the entirety of the code as written? The coding language I don't understand is the "for i in 'ls -1" etc.

      Is this all one command?



      • #4
        This is a small bash script that uses a for loop.

        "ls -1 /home/qiime/Desktop/Hilo_New/*.fastq | sed 's/.fastq//'" - Takes the listing of files in your source directory (one at a time), removes the .fastq from the end of each file name using the stream editor "sed" (for the reason mentioned below), and then assigns the first part of the file name to a "variable" called i.

        Variable i is then used to construct the fastx command line (one file at a time). For the -i option we add ".fastq" back onto variable i so the original file name is recreated. For -o (the output file name) we append "_fqf.fastq" (which is what you had in your example) to make a new file name that retains the sample name; the file is saved in the new output directory.

        This process will iterate until all files in source directory are processed.
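
The name-building steps described above can be previewed without running the filter at all. This is a minimal sketch, not the command from the post: it creates a throwaway directory with two empty, hypothetically named files (sampleA.fastq, sampleB.fastq) and prints what `i`, the `-i` value and the `-o` value would be for each. It assumes the loop is started from inside the source directory, so that `ls` returns bare file names without spaces, and it anchors the sed pattern (`\.fastq$`) so that only the trailing extension is removed. `preview_names` is a made-up helper name.

```shell
#!/usr/bin/env bash
# Preview how the loop builds file names; nothing is filtered here.

preview_names() {
    # $1 = directory holding the .fastq files (hypothetical demo data below)
    (
        cd "$1" || exit 1
        # List the files, strip only the trailing ".fastq", loop over the stems.
        # Relies on word splitting, so file names must not contain spaces.
        for i in $(ls -1 *.fastq | sed 's/\.fastq$//'); do
            echo "i=$i  -i $i.fastq  -o fastX-filterFastq/${i}_fqf.fastq"
        done
    )
}

demo_dir=$(mktemp -d)
touch "$demo_dir/sampleA.fastq" "$demo_dir/sampleB.fastq"
preview_names "$demo_dir"
```

Each printed line shows the stem held in `i` next to the input and output names rebuilt from it, which makes it easy to spot a wrong path before any real work is done.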


        • #5
          Okay, thanks for the clarification. Now I'm getting the following when I put in the two commands...

          fastq_quality_filter: input file (-) has unknown file format (not FASTA or FASTQ), first character =

          I'm not sure why this is happening since I can push a single file through the filter...


          • #6
            What two commands are you referring to? I only have a single command line up there. I have not used the fastx toolkit recently, so I am assuming your command line is correct. Have you made sure your fastq files are in the correct format?


            • #7
              Nice command GenoMax - think I'm learning something about loops! But couldn't you just do something like this:

              Move into the Hilo_New folder, create a new folder called "fastX-filterFastq", then do:

              for i in *.fastq; do fastq_quality_filter –Q33 -q 19 -p 89 –i $i -o ./fastX-filterFastq/$(basename $i .fastq)_fqf.fastq; done

              I was also under the impression that these bash loops didn't iterate until the current job was complete, so didn't need a 'sleep' call? Could be wrong there..


              • #8
                I just loaded the fastQValidator and pulled a subsample out of my original folder. I checked 43 .fastq files with fastQValidator and they all passed with the following result...

                qiime@qiime-190-virtual-box:~/fastQValidator$ /home/qiime/fastQValidator/bin/profile/fastQValidator --file /home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled/MockComm_fastqjoin.join.fastq

                Finished processing /home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled/MockComm_fastqjoin.join.fastq with 58084 lines containing 14521 sequences.
                There were a total of 0 errors.
                Returning: 0 : FASTQ_SUCCESS

                I also checked the header of my files and they all seem to look good
                qiime@qiime-190-virtual-box:~$ head /home/qiime/Desktop/Hilo2_fastqjoin.join.fastq
                @M01498:340:000000000-B86MB:1:1101:22408:1708 1:N:0:2
                @M01498:340:000000000-B86MB:1:1101:9873:2268 1:N:0:2
                Then I made a folder to analyze only these 43 files with the command you suggested and it just hangs and does nothing for a while. When I press enter again the same error comes out...

                fastq_quality_filter: input file (-) has unknown file format (not FASTA or FASTQ), first character =

                I was getting frustrated so I just tried a single file again to make sure it worked and it doesn't. I am getting the same error as above... I guess I don't know what's going on at all.
                Last edited by GenoMax; 08-02-2017, 03:48 AM.


                • #9
                  neavemj - same story with the code you posted. Thanks.


                  • #10
                    If you can see something wrong, here is the .fastq file. I couldn't figure out how to post it, so just delete .pdf from the end (I think that will work).


                    • #11
                      Huh, not sure. It seems like it's ignoring the -i flag and waiting for something from STDIN. You could try giving it an opened file instead of the input flag, like so:

                      for i in *.fastq; do cat $i | fastq_quality_filter –Q33 -q 19 -p 89 -o ./fastX-filterFastq/$(basename $i .fastq)_fqf.fastq; done

                      Something like that might work. I haven't tested it though.


                      Last edited by GenoMax; 08-02-2017, 04:14 AM.


                      • #12
                        It looks like fastx_toolkit does not want to behave like a normal unix program.

                        @ j.cappellazzi: You can use @neavemj's suggestion above if you want to stick with fastx_toolkit. Otherwise, I suggest that you use from BBMap suite (a more current program) for this.

                        for i in `ls -1 *.fq | sed 's/.fq//'`; do qin=33 qtrim=r trimq=19 in=$i.fq out=$i\_fqf.fq; done
                        @neavemj: Multiple ways to skin the cat. You are right that we didn't need the sleep option.
                        Last edited by GenoMax; 08-02-2017, 04:16 AM.


                        • #13
                          @Genomax @neavemj

                          Well, that was frustrating and silly. It wasn't recognizing the -i because it was a long "-" not a short one. Must have been from copying from Word into the command line. WOW!
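
A pasted command can be checked for this kind of character before it is run. The sketch below is one way to do it, assuming a grep that handles bracket ranges byte by byte in the C locale (GNU and BSD grep both do). `has_non_ascii` is a hypothetical helper name and `in.fastq` / `out.fastq` are placeholder file names. It flags any byte outside printable ASCII, which includes the en dash that word processors substitute for a hyphen.

```shell
#!/usr/bin/env bash
# Flag characters outside printable ASCII in a command string.

has_non_ascii() {
    # In the C locale grep works byte by byte; [^ -~] matches any byte
    # that is not between space and tilde. Exit status 0 means "found one".
    printf '%s\n' "$1" | LC_ALL=C grep -q '[^ -~]'
}

# First string uses en dashes before Q33 and i, as pasted from a word processor.
pasted='fastq_quality_filter –Q33 -q 19 -p 89 –i in.fastq -o out.fastq'
# Second string uses plain ASCII hyphens throughout.
typed='fastq_quality_filter -Q33 -q 19 -p 89 -i in.fastq -o out.fastq'

for cmd in "$pasted" "$typed"; do
    if has_non_ascii "$cmd"; then
        echo "non-ASCII character found: $cmd"
    else
        echo "plain ASCII: $cmd"
    fi
done
```

Retyping the flagged dashes by hand in the terminal, rather than pasting them, avoids the problem.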

                          Now it works just fine with individual files again (must have mucked that up as well while copying last night), however, there is still a problem with @Genomax's loop script. I entered...

                          for i in `ls -1 /home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled/*.fastq | sed 's/.fastq//'`; do fastq_quality_filter -Q33 -q 19 -p 89 -i /home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled/$i.fastq -o /home/qiime/Desktop/Hilo_New/fastX-filterFastq/$i\_fqf.fastq; done

                          There is an output folder created and empty, patiently awaiting 43 new files, however, this is the response I receive...

                          fastq_quality_filter: failed to open input file '/home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled//home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends-join_labeled/MockComm_fastqjoin.join.fastq.fastq': No such file or directory

                          It does this for every single file in the directory. The issue I can see is that "MockComm_fastqjoin.join.fastq.fastq" is not a file, as the input file will have only one ".fastq" in the name. I tried playing around with the script but honestly don't understand all the details and other errors kept popping up.

                          Any further help on this would be greatly appreciated. Thanks.
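
The doubled path in that error message can be reproduced with `echo` and `sed` alone. This is a sketch of what appears to be going on, using the path from the post. First, `ls` with a full-path glob returns full paths, so `$i` already contains the directory and the loop then prepends the directory a second time. Second, in `s/.fastq//` the unescaped dot matches any character and only the first match is replaced, so `_fastq` is cut out of the directory name while the real extension stays in place. `strip_ext` is a hypothetical helper name, and the escaped, anchored pattern is one possible fix rather than the one used in the thread.

```shell
#!/usr/bin/env bash
# Reproduce the name mangling with plain text; no FASTQ files are needed.

path='/home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled/MockComm_fastqjoin.join.fastq'

# Pattern from the thread: "." matches any character and only the first match
# is replaced, so "_fastq" is removed from the directory name.
mangled=$(echo "$path" | sed 's/.fastq//')
echo "$mangled"

# Hypothetical helper: drop the directory, then strip only a trailing ".fastq".
strip_ext() {
    basename "$1" | sed 's/\.fastq$//'
}
strip_ext "$path"
```

The first line printed has "ends-join_labeled" in place of "ends_fastq-join_labeled" and still ends in ".fastq", which matches the second half of the path in the error message once the loop appends another ".fastq".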


                          • #14

                            I also tried the code you suggested after getting into the proper directory at the command line and fixing the "-i" issue...

                            qiime@qiime-190-virtual-box:~/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled$ for i in *.fastq; do fastq_quality_filter -Q33 -q 19 -p 89 -i $i -o ./fastX-filterFastq/$(basename $i .fastq)_fqf.fastq; done

                            I made a folder in the following directory titled "FastX-filterFastq"


                            It seemed to begin working but for each of my 43 .fastq files in that folder it gave me the following error message...

                            fastq_quality_filter: Failed to create output file (./fastX-filterFastq/MockComm_fastqjoin.join_fqf.fastq): No such file or directory

                            Thanks for any further help on this.


                            • #15
                              It worked!

                              Well, now I understand my coding limitations. I didn't know the "./" meant I needed to provide the path to the output folder. I messed around with the code and did...

                              qiime@qiime-190-virtual-box:~/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled$ for i in *.fastq; do fastq_quality_filter -Q33 -q 19 -p 89 -i $i -o /home/qiime/Desktop/Hilo_New/Hilo_join_paired_ends_fastq-join_labeled/FastX-filterFastq/$(basename $i .fastq)_fqf.fastq; done

                              It worked like a charm. I frequently run into problems like this, when it's just my lack of coding knowledge that makes even the simplest tasks frustrating. Thank you so much for creating that code. I truly appreciate it.
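
Worth noting for anyone repeating this: the folder in post #14 was created as "FastX-filterFastq" while the command wrote to "fastX-filterFastq". Linux file names are case-sensitive, so that mismatch, rather than the "./" itself, is a likely cause of the "Failed to create output file" error. Below is a minimal sketch that sidesteps the problem by keeping the folder name in one variable and creating it with `mkdir -p`. `preview_filter_cmds` and the demo file names are hypothetical, and the `echo` makes this a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Dry run: print one fastq_quality_filter command per input file.

preview_filter_cmds() {
    # $1 = source directory, $2 = output directory (created if missing)
    local src=$1 outdir=$2 f
    mkdir -p "$outdir"
    for f in "$src"/*.fastq; do
        [ -e "$f" ] || continue   # skip the literal pattern if nothing matched
        echo fastq_quality_filter -Q33 -q 19 -p 89 -i "$f" \
            -o "$outdir/$(basename "$f" .fastq)_fqf.fastq"
    done
}

demo=$(mktemp -d)
touch "$demo/sampleA.fastq" "$demo/sampleB.fastq"
preview_filter_cmds "$demo" "$demo/FastX-filterFastq"
```

Once the printed commands look right, removing the `echo` runs the filter for real, with the same options that worked in the post above.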

