Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Hi to everyone.

    I have an issue with bbduk.sh and I,m looking for some help. At this moment I'm calculating the percentage of reads with a quality equal or higher than 30. Using bbduk.sh, I'm running this command:

    ```../bbmap/bbduk.sh in1=../t01-R1.fastq in2=../t01-R2.fastq trimq=30 qtrim=rl ````

    and the output looks like:

    ````

    Max memory cannot be determined. Attempting to use 1400 MB.
    If this fails, please add the -Xmx flag (e.g. -Xmx24g) to your command,
    or run this program qsubbed or from a qlogin session on Genepool, or set ulimit to an appropriate value.
    java -ea -Xmx1400m -Xms1400m -cp /Users/rodolfochavez/RNASeq/bbmap/current/ jgi.BBDuk in1=../t01-R1.fastq in2=../t01-R2.fastq trimq=30 qtrim=rl
    Executing jgi.BBDuk [in1=../t01-R1.fastq, in2=../t01-R2.fastq, trimq=30, qtrim=rl]
    Version 38.59

    No output stream specified. To write to stdout, please specify 'out=stdout.fq' or similar.
    0.019 seconds.
    Initial:
    Memory: max=1468m, total=1468m, free=1439m, used=29m

    Input is being processed as paired
    Processing time: 2.250 seconds.

    Input: 3000000 reads 225505684 bases.
    QTrimmed: 2167934 reads (72.26%) 96158343 bases (42.64%)
    Total Removed: 444188 reads (14.81%) 96158343 bases (42.64%)
    Result: 2555812 reads (85.19%) 129347341 bases (57.36%)

    Time: 2.252 seconds.
    Reads Processed: 3000k 1331.91k reads/sec
    Bases Processed: 225m 100.12m bases/sec

    ```
    And the result of the analysis is printed in the screen of my terminal. My question is, how can I save the statistics Results from this analysis in a file?

    Thanks a lot

    Comment


    • You can simply capture the STDERR output (where this is bring written) to a file.

      Comment


      • GenoMax,

        Thanks a lot, you solved my issue!

        Comment


        • I've made a mistake that I did not use certain version of BBduk.
          Last edited by pilyric; 11-15-2019, 08:27 PM. Reason: Problem solved

          Comment


          • I would like to extract paired reads with exact matches to a list of 25bp kmers and bbduk seemed like a good tool to use. However, bbduk seems to be returning some 'outm' reads that don't have matches to the kmer list.

            I've attached a file containing my bbduk command, a single pair of reads and a reference fasta containing the 2 kmers (including reverse complemented sequences) that the bbduk stats file reports as matching.

            Using grep to search the input reads for each kmer sequence produces no matches. Please let me know if I have an incorrect bbduk option set.
            Attached Files

            Comment


            • Good afternoon Brian and GenoMax,

              Hope you two are doing well all things considered

              I have many metagenomes to assemble these days and am trying to make fewer and longer scaffolds and reduce wall clock time by reducing errors in my 150x2 reads.

              The following commands result in an aqhist distribution that bothers me. It has a gradual incline from Q17-33 (from ~0 to 3.5 million reads), an obvious dip around Q35 (down to ~0.5 million reads), and a rapid rise to Q38 (~10 million reads). My Phred linear base quality (qhist) seems good with a gradually decline from ~38 to ~34 and a small up swing to ~35.5 at the end.

              #Cleaning commands:
              #Paths removed for simplicity.

              #Force Trim Poor Quality Ends
              bbduk.sh -Xmx1g in=input.fastq out=trim151.fq ftr=149
              bbduk.sh -Xmx1g in=trim151.fq out=ftrimmed.fq ftl=7

              #Kmer Trim Adapters
              bbduk.sh -Xmx1g in=ftrimmed.fq out=ktrimmed.fq ktrim=r k=23 mink=11 hdist=1 ref=truseq.fa.gz tbo tpe

              #Quality Trimming
              /usr/local/usrapps/bioinfo/env_jmw/bbtools/lib/bbduk.sh -Xmx1g in=ktrimmed.fq out=qtrimmed.fq minlen=100 qtrim=rl trimq=10

              #Contaminant Removal
              bbduk.sh -Xmx1g in=qtrimmed.fq out=bbclean.fq k=31 ref=sequencing_artifacts.fa.gz, phix_adapters.fa.gz, phix174_ill.ref.fa.gz

              Questions:
              1) Why is this happening?
              2) Should it be fixed?
              3) How can it be fixed (suggested command please)?

              I greatly appreciate your help and any additional comments for improvement.

              Thank you,
              Jason

              Comment


              • Good morning,

                It seems that the aqhist for my raw fastq file has the same dip at Q35. So, please disregard the previous questions. If you could tell me if peaks and dips are common or not uncommon, and if you have any insights about why the read quality scores have peaks and dips, I'd really appreciate it.

                Thank you,
                Jason

                Comment


                • When quality trimming raw reads to a phred score there is often conflicting advice surrounding to what Q score one should filter too.

                  I have read many of your post and replies and from what I gather there really is no right answer as it depends on what you want to do.

                  Say for example I have some illumina short reads and some pacbio long reads that I will to do the following with:
                  1. Hybrid assembly
                  2. Illumina only de novo assembly
                  3. Pacbio only de novo assembly


                  I have been previously told that a score of Q30 is highly typical to use however I have read that anything above 27 is just unnecessary and potentially damaging to generating good assemblies. On the bbduk help pages on Seqanswers Q10 is used in every example.

                  Thus is Q10 a good general base point for quality trimming to provide decent assemblies or is this also to high?

                  How one one work out the above on their own without needing to come here to answers?

                  Secondly, should we trim pacbio reads in either a hybrid assembly or pacbio de novo assembly?

                  Comment


                  • What exactly "artifacts" mean in bbduk?

                    Comment


                    • Hello everyone,

                      I've installed bbmap throw 'sudo apt-get install bbmap' and also by downloading the package, untaring it and adding the path to the directory in the .bashrc file. However, when running "bbduk.sh --version" or a script "bbduk.sh -Xmx ..." I get the following error:

                      /usr/bin/bbduk.sh: línea 344: /usr/share/bbmap/calcmem: No existe el archivo o el directorio
                      /usr/bin/bbduk.sh: línea 345: setEnvironment: orden no encontrada
                      /usr/bin/bbduk.sh: línea 346: parseXmx: orden no encontrada
                      /usr/bin/bbduk.sh: línea 350: freeRam: orden no encontrada
                      java -Xmxm -Xmsm -cp /usr/share/java/bbmap.jar jgi.BBDuk --version
                      Invalid maximum heap size: -Xmxm
                      Error: Could not create the Java Virtual Machine.
                      Error: A fatal exception has occurred. Program will exit.

                      I've tried -Xmx with different amounts of memory and receive the same error. Also, the folder /calcmem does indeed do not exist. Any ideas on how to solve this problem?

                      Thank you!

                      Comment


                      • BBDUK trimming and poor quality fastqc

                        Hi

                        I am very new to bioinformatics and I am attempting to trim reads using BBDUK. My libraries were prepared using the lexogen quantseq 3' mRNA fwd kit.

                        According to their recommendation I have performed the following:

                        runID*R1_001.fastq; do cat $i | bbduk.sh in=stdin.fq out=${i}_trimmed_clean ref=/data/resources/polyA.fa.gz,/data/resources/truseq.fa.gz k=13 ktrim=r useshortkmers=t mink=5 qtrim=t trimq=10 minlength=20; done

                        However my reads quality on fastqc has not improved much. Do I need to go ahead and remove illumina sequences? Does anyone happen to know the specific illumina adapter sequence?

                        Thank you.

                        Comment


                        • Adaptor trimming is not working

                          This is what my script looks like:

                          Ordered=t #Set to true to output reads in same order as input
                          Ktrim=r #once a reference kmer is matched in a read, that kmer and all the bases to the right will be trimmed
                          K=21 #specifies the kmer size
                          Mink=8 #"mink" allows it to use shorter kmers at the ends of the read
                          Hdist=2 #number of permitted mismatches


                          for Prefix in `ls -1 *_R1.fastq.gz | sed 's/_R1.fastq.gz//'`
                          do

                          bbduk.sh -Xmx128g in1=$Prefix\_R1.fastq.gz in2=$Prefix\_R2.fastq.gz out1=$Prefix\_clean_R1.fastq.gz out2=$Prefix\_clean_R2.fastq.gz ref=$adapters ordered=$Ordered ktrim=$Ktrim k=$K mink=$Mink hdist=$Hdist tpe tbo

                          done

                          My R1 and R2 files consist of concatenated sequences from different runs. Do you think this could be the reason?

                          Many thanks

                          Comment


                          • I have repeated Brian Bushnell's comparison among adaptor trimmers, this time including the most recent versions of Cutadapt, Trimmomatic, and fastp. Here are my commands:

                            Code:
                            # Cutadapt 3.4:
                            time cutadapt -m 21 -j 0 -b "file:gruseq.fa" -B "file:gruseq.fa" -o cutadapt_R1.fq.gz -p cutadapt_R2.fq.gz dirty_R1.fq.gz dirty_R2.fq.gz
                            
                            # Trimmomatic 0.39:
                            time trimmomatic PE -phred33 dirty_R1.fq.gz dirty_R2.fq.gz trimmomatic_R1.fq.gz trimmomatic_U1.fq.gz trimmomatic_R2.fq.gz trimmomatic_U2.fq.gz ILLUMINACLIP:gruseq.fa:2:28:10:2:keepBothReads MINLEN:21
                            
                            # fastp 0.22.0:
                            time fastp -w 8 -Q -l 21 --adapter_fasta gruseq.fa --detect_adapter_for_pe --in1 dirty_R1.fq.gz --in2 dirty_R2.fq.gz --out1 fastp_R1.fq.gz --out2 fastp_R2.fq.gz
                            
                            # bbduk.sh 38.92:
                            time bbduk.sh in=dirty_R#.fq.gz out=bbduk_R#.fq.gz ref=gruseq.fa ktrim=r mink=12 hdist=1 minlen=21 tpe tbo
                            
                            # bbduk.sh 38.92 (x2):
                            time bbduk.sh ktrim=r minlength=21 interleaved=f tpe tbo ref=gruseq.fa in=dirty_R#.fq.gz out=stdout.fq k=21 mink=11 hdist=2 | bbduk.sh ktrim=r minlength=21 interleaved=f tpe tbo ref=gruseq.fa in=stdin.fq out=bbduk_x2_R#.fq.gz k=19 mink=9 hdist=1
                            And these were the results:
                            No code has to be inserted here.I still prefer bbdu.sh for its speed and high accuracy. fastp had slightly higher accuracy but it sometimes mistakes genomic sequence for adaptor (see this post). However Cutadapt now is clearly more accurate (but the slowest by far). I wonder if anybody can recommend some settings that could increase bbduk's accuracy a little more?

                            Comment


                            • bbduk hdist=0 does not work

                              Hi

                              I want to search CCGG in reads, so I tried following command


                              bbduk.sh in=test.fq literal=CCGG k=4 hdist=0 overwrite=t out=test.result.fq

                              in test.fq, only one read, no CCGG

                              @A00582:707:HJ3NTDSX2:2:1101:5267:1658 1:N:0:AACCTC
                              TATTAAGCAGAAGGGCAGGCTGGAAAATCCTCTTCAGCAGAACGGTGGACTGAGGCTCACTGCTATCAAGGTGGACAGGCTTCTCTGCTCAGCAAACCAGGCTCACCCAGGGGTGCTCTACACAGACTCGGGCTCGCTAA
                              +
                              FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

                              but , result this read is matched

                              BBDuk version 37.62
                              Initial:
                              Memory: max=210091m, free=200226m, used=9865m

                              Added 1 kmers; time: 0.151 seconds.
                              Memory: max=210091m, free=192553m, used=17538m

                              Input is being processed as unpaired
                              Started output streams: 0.269 seconds.
                              Processing time: 0.116 seconds.

                              Input: 1 reads 140 bases.
                              Contaminants: 1 reads (100.00%) 140 bases (100.00%)
                              Total Removed: 1 reads (100.00%) 140 bases (100.00%)
                              Result: 0 reads (0.00%) 0 bases (0.00%)

                              Time: 0.564 seconds.
                              Reads Processed: 1 0.00k reads/sec
                              Bases Processed: 140 0.00m bases/sec

                              it seems hdist does not work

                              Could you give some advice

                              Comment


                              • @lituan you will need to use k=2 or less with such a short pattern (CCGG). Even then it may not work well.

                                This may be a case where you would want to use a different package called Seqkit. Specific tool would be "seqkit grep".

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Genetic Variation in Immunogenetics and Antibody Diversity
                                  by seqadmin



                                  The field of immunogenetics explores how genetic variations influence immune responses and susceptibility to disease. In a recent SEQanswers webinar, Oscar Rodriguez, Ph.D., Postdoctoral Researcher at the University of Louisville, and Ruben Martínez Barricarte, Ph.D., Assistant Professor of Medicine at Vanderbilt University, shared recent advancements in immunogenetics. This article discusses their research on genetic variation in antibody loci, antibody production processes,...
                                  11-06-2024, 07:24 PM
                                • seqadmin
                                  Choosing Between NGS and qPCR
                                  by seqadmin



                                  Next-generation sequencing (NGS) and quantitative polymerase chain reaction (qPCR) are essential techniques for investigating the genome, transcriptome, and epigenome. In many cases, choosing the appropriate technique is straightforward, but in others, it can be more challenging to determine the most effective option. A simple distinction is that smaller, more focused projects are typically better suited for qPCR, while larger, more complex datasets benefit from NGS. However,...
                                  10-18-2024, 07:11 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, 11-08-2024, 11:09 AM
                                0 responses
                                222 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 11-08-2024, 06:13 AM
                                0 responses
                                163 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 11-01-2024, 06:09 AM
                                0 responses
                                80 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 10-30-2024, 05:31 AM
                                0 responses
                                27 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X