

FastQ Screen: Does your library contain what you think it does?




  • #61
    I am responsible for developing FastQ Screen.

    The standard way to remove contamination is:
    Run FastQ Screen (latest version) with --subset 0 (to process the entire dataset) and --nohits. In the config file, include the Bowtie 1 or Bowtie 2 indices of all the potential contaminants (human genome indices should not be included).

    A FastQ file should then be produced containing all the reads that did not map to any of the contaminants.
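A minimal sketch of that setup in Python, offered as an illustration rather than as part of FastQ Screen. It builds a contaminants-only configuration and assembles the fastq_screen call. The library names, index paths and helper names (`CONTAMINANTS`, `build_conf`, `nohits_command`) are hypothetical placeholders. Only the tab-separated `DATABASE` line layout and the `--subset`, `--nohits` and `--conf` options come from FastQ Screen itself.

```python
from typing import Dict, List

# Hypothetical contaminant libraries: names and Bowtie 2 index prefixes are placeholders.
CONTAMINANTS: Dict[str, str] = {
    "Ecoli": "/path/to/indices/Ecoli",
    "PhiX": "/path/to/indices/PhiX",
    "Adapters": "/path/to/indices/Adapters",
}


def build_conf(databases: Dict[str, str]) -> str:
    """Return config text with one tab-separated DATABASE line per contaminant.

    The host genome (e.g. human) is deliberately absent, so host reads are
    not counted as hits and end up in the --nohits output.
    """
    lines = [f"DATABASE\t{name}\t{prefix}" for name, prefix in databases.items()]
    return "\n".join(lines) + "\n"


def nohits_command(fastq: str, conf_path: str) -> List[str]:
    """Argument list for screening the whole file (--subset 0) and writing unmapped reads."""
    return ["fastq_screen", "--subset", "0", "--nohits", "--conf", conf_path, fastq]
```

The text could be saved with `Path("contaminants.conf").write_text(build_conf(CONTAMINANTS))` and the call run with `subprocess.run(nohits_command("reads.fastq.gz", "contaminants.conf"), check=True)`. The host index is left out on purpose: if it were listed, host reads would count as hits and drop out of the no-hits file.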

    I am wondering why you need to remove only the hits classified as 'one-hit/one-library' AND 'multiple-hits/one-library'. Also, is this single-end data?

    Please feel free to contact me directly to discuss this further.

    Kindest regards,
    Steven W


    • #62
      Thanks Steven

      I apologize for the delayed reply.

      Yes, I am doing something similar to what you described. I have two config files in place:
      - one with just the contaminants, for filtering the FastQ file as you described
      - one with the contaminants and mammalian genomes, to generate a figure with your tool that shows the level of contamination relative to real hits in a mammal (similar to the example plot on the FastQ Screen webpage).

      This is single-end data.

      My question about the single-hit categories arose because we have some contaminant-like sequences (custom to our study) that we want to quantify but not remove if they have homologs in mammalian genomes. Using a similar approach to the one above, however, I can handle that stepwise.
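The stepwise logic discussed above can be sketched as follows. This is a hypothetical illustration, not FastQ Screen's internal code. Each read is represented as a dictionary mapping library name to its number of alignments in that library. `classify` assigns the per-library category using the thread's 'one-hit/one-library' style of naming. `should_remove` drops a read only when all of its hits fall in a single contaminant library. A read that also hits a mammalian genome is therefore kept.

```python
from typing import Dict, Optional, Set


def classify(hits: Dict[str, int], library: str) -> Optional[str]:
    """Category of one read with respect to `library`.

    `hits` maps library name -> number of alignments (0 or absent = no hit).
    Returns None if the read does not hit `library` at all.
    """
    n = hits.get(library, 0)
    if n == 0:
        return None
    hits_elsewhere = any(count > 0 for lib, count in hits.items() if lib != library)
    multiplicity = "one hit" if n == 1 else "multiple hits"
    scope = "multiple libraries" if hits_elsewhere else "one library"
    return f"{multiplicity}/{scope}"


def should_remove(hits: Dict[str, int], contaminants: Set[str]) -> bool:
    """Remove only 'one hit/one library' or 'multiple hits/one library' reads
    whose single library is a contaminant; reads with homologs elsewhere stay."""
    hit_libraries = [lib for lib, count in hits.items() if count > 0]
    return len(hit_libraries) == 1 and hit_libraries[0] in contaminants
```

For example, a read with `{"Ecoli": 1, "Mouse": 2}` is 'one hit/multiple libraries' for Ecoli and is kept. A read with `{"Ecoli": 3}` is 'multiple hits/one library' and is removed.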

      Thanks for a very useful tool.


      • #63
        Excellent, so everything is fine?

        PS I'm releasing a new version of the software today.


        • #64
          Hey, whatever happened to the --paired option? Is it still possible to screen paired-end data?


          • #65
            Thanks for your message; I am part of the team responsible for developing FastQ Screen.
            We removed the --paired option from the script in a recent update because we felt it was unnecessary and was causing confusion. Mapping the forward and reverse reads independently should be perfectly adequate to ascertain whether there is contamination, and it also tells the user whether the forward reads are generally more prone to contamination than the reverse reads (or vice versa). Some users also reported that the script sometimes failed to detect contamination in --paired mode, for example when a read pair did not come from a contiguous region of DNA, or when the mates were separated by a large distance (as in RNA-seq).
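The last point can be made concrete with a simplified Python model. It assumes a paired hit is counted only when both mates align to the same sequence within a maximum fragment length. This assumption is for illustration and is not a description of how the old --paired code worked. The `max_fragment` value and the helper names are hypothetical. Under this model, RNA-seq mates flanking an intron fail the paired test even though each mate maps fine when the files are screened separately.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MateHit:
    chrom: str
    start: int  # 0-based leftmost aligned position
    end: int    # exclusive end position


def pair_reported(m1: Optional[MateHit], m2: Optional[MateHit], max_fragment: int) -> bool:
    """Assumed paired-mode rule: both mates on the same sequence and the
    fragment they span no longer than max_fragment."""
    if m1 is None or m2 is None or m1.chrom != m2.chrom:
        return False
    span = max(m1.end, m2.end) - min(m1.start, m2.start)
    return span <= max_fragment


def independent_hits(m1: Optional[MateHit], m2: Optional[MateHit]) -> int:
    """Screening each file on its own: every mate that maps counts separately."""
    return int(m1 is not None) + int(m2 is not None)
```

With `max_fragment=500`, mates at 1000-1100 and 1200-1300 on the same chromosome form a reported pair. If the second mate instead lies 5 kb downstream, the pair is not reported, yet `independent_hits` still counts both mates.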
            So we now recommend that you screen both read files independently.
            Is there any particular reason you need to use --paired mode?
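Screening both read files independently can be scripted along these lines. This is a sketch, not an official recipe. It groups files by mate using a hypothetical `_R1`/`_R2` naming convention and builds one fastq_screen call per file, so the forward and reverse reports can be compared side by side. The config file name and the regular expression are assumptions to adapt to your own files.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, List

# Hypothetical naming convention: sample_R1.fastq.gz or sample_L001_R2_001.fastq.gz
MATE_PATTERN = re.compile(r"_R([12])(?=[._])")


def commands_per_mate(fastq_files: Iterable[str],
                      conf: str = "fastq_screen.conf") -> Dict[str, List[List[str]]]:
    """Group files into R1/R2 and build one independent fastq_screen call per file."""
    by_mate: Dict[str, List[List[str]]] = {"R1": [], "R2": []}
    for f in sorted(fastq_files):
        match = MATE_PATTERN.search(Path(f).name)
        if match is None:
            raise ValueError(f"cannot tell which mate {f} is")
        by_mate["R" + match.group(1)].append(["fastq_screen", "--conf", conf, f])
    return by_mate
```

Running each mate separately also produces a separate report per file. That makes it easy to see whether one read direction carries more contamination than the other.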


            • #66
              FastQ Screen for bisulfite samples


              I'm trying to use FastQ Screen on bisulfite sequencing samples. The test data run fine, but I get a filehandle error when running my bisulfite samples.
              My command:

              fastq_screen --bisulfite G3.S22.fastq.gz

              Output:
              Using fastq_screen v0.11.2
              Defaulting to Bowtie 2 for --bisulfite mode
              Reading configuration from 
              Using '/usr/lib/bowtie2/bin/bowtie2' as Bowtie 2 path
              '/data/Bismark/bismark' as Bismark path
              Adding database Daphnia
              Using 8 threads for searches
              --subset set to 100000 reads
              Processing G3
              Counting sequences in G3
              Making reduced sequence file with ratio 69
              Searching G3.S22.fastq.gz_temp_subset.fastq against Daphnia
              No such file or directory
              [main_samview] fail to open "/data/Bismark/fastq_screen_v0.11.2/Daphnia.G3.S22.fastq.gz_temp_subset_bismark_bt2.bam" for reading.
              Cannot close filehandle on '/data/Bismark/fastq_screen_v0.11.2/Daphnia.G3.S22.fastq.gz_temp_subset_bismark_bt2.bam' :  at fastq_screen line 1059.
              I do get the output file and the mapping report for the subsample against the first database, so the mapping itself seems to work. The error occurs regardless of which databases I use. When I run my samples in non-bisulfite mode against the regular genome indices, however, it does not happen, so I do not think my sample file is the issue. I also know my Bismark genome indices are fine, as I have used them with Bismark directly.
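One quick check when an error like this appears is to compare the BAM path named in the samtools message with the BAM files actually present in that directory. The helper below is a hypothetical sketch for that comparison and is not part of FastQ Screen or Bismark. It only reports what exists and makes no assumption about why the names differ.

```python
from pathlib import Path
from typing import List, Tuple


def check_expected_bam(expected: str) -> Tuple[bool, List[str]]:
    """Report whether the BAM the screen tried to open exists, and list the
    BAM files that are actually present in the same directory."""
    path = Path(expected)
    present = sorted(p.name for p in path.parent.glob("*.bam"))
    return path.exists(), present
```

Called with the path from the error message above, a `False` result together with a non-empty list would show that an alignment file was written under a different name than the one FastQ Screen looked for.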

              Any ideas on what is wrong or why this is happening?



              • #67
                FastQ Screen Bisulfite Problem

                Hi jaas,

                I am one of the developers of FastQ Screen. Hopefully we can get this problem resolved quickly.

                Would you be able to send me the configuration file you used when running FastQ Screen? That will help me track down the problem.

                Many thanks,



                • #68
                  Here's the config file. I couldn't figure out how to send it to you privately, so I changed the extension to .txt to be able to upload it here.

                  Thanks in advance for your help

