  • Hi Brian,

    In this case I'm trying to trim vector and low qual from Sanger (BAC End Seq) reads (~850bp) to run through SSPACE. Will your script have problems with this?

    Comment


    • No, that will work... but generally I would recommend quality-trimming using "qtrim=r trimq=15" or similar, using the vector sequence as the reference with "ktrim=r" or "kmask=N" for vector trimming. "ref=vector kmask=N k=31 edist=1" would, for example, mask all of the vector sequence with Ns, which you could then remove via subsequent quality-trimming on both ends with "qtrim=rl trimq=2" (Ns have quality 0).

      If you just want to remove the first and last 125bp the command would be "ftl=125 ftr2=125".
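To make the two approaches above concrete, here is a minimal Python sketch of what they do to a single read. It is not BBDuk's implementation: the function names are hypothetical, and the N-stripping stands in for quality-trimming only because masked bases carry quality 0. Real quality-trimming works on the quality scores, not on the base letters.

```python
def force_trim(seq, left=125, right=125):
    """Drop `left` bases from the start and `right` bases from the end,
    mirroring the effect described for "ftl=125 ftr2=125"."""
    if len(seq) <= left + right:
        return ""  # nothing would remain
    return seq[left:len(seq) - right]


def strip_masked_ends(seq, mask="N"):
    """Remove runs of the mask character from both ends.
    Masked bases in the interior of the read are left in place."""
    return seq.strip(mask)


# A read whose ends were masked as vector; one interior N remains.
read = "NNNN" + "ACGTNACGT" + "NN"
print(strip_masked_ends(read))        # ACGTNACGT

# An ~850 bp Sanger read with 125 bp removed from each end.
print(len(force_trim("A" * 850)))     # 600
```

Masking followed by end-trimming only removes what was identified as vector, whereas the fixed 125 bp trim removes the same amount from every read regardless of content.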

      Comment


      • Hi Brian:

        I ran your "repair.sh" script yesterday and it generated only the "singletons.fq" file; the "r1.fq" and "r2.fq" files are empty. The command that I ran was the same one you mentioned on page 2:

        cat SRR867646_1.fastq SRR867646_2.fastq | repair.sh -Xmx4g in=stdin.fq out1=r1.fq out2=r2.fq outs=singletons.fq
        I ran it on the SRR867646 reads, which seem to have been trimmed before being uploaded to the SRA. In fact, I am not able to do an alignment because R1 has 10919266 sequences and R2 has 2177589 sequences.

        But... I don't know how to interpret that. Why are the r1.fq and r2.fq files empty while the singletons.fq file is not? Should I treat the reads as single-end using the singletons.fq file?

        Also, the singletons.fq file contains 13096855 sequences, the same number as R1+R2 from the original reads.

        How should I handle that?
        What recommendations do you have?


        Thanks in advance.
        ~g

        Comment


        • @germelcar's question has been answered over at Biostars.

          Comment


          • Thanks, Genomax!

            Comment


            • BBDuk - Remove singletons?

              Hi Brian,

              I'm using BBDuk to adapter and quality trim Illumina reads that were prepped with the Nextera kit. I'm using the Plugin within Geneious, and I have a few questions:

              1) I'm getting a different number of reads in the two output files of a pair. I tried using removeifeitherbad=t, but I still get a different number of reads in each file. Would this command normally result in an equal number of reads for each file-pair? If so, I'll inquire with Geneious.

              2) When trimming adapters, what is the benefit of selecting only the right or left end? I'm guessing my reads will only have adapter sequence on the 5' end, correct? This assumes that the insert size is longer than the read length, as I think in principle I should have adapters on both ends? Why not just trim both ends to be safe? Does it result in non-specific trimming of sample sequence?

              Next, I intend to pair and merge the reads, and then de novo assemble.

              Thanks for any help.

              Jake

              Comment


              • Hi Jake,

                1) When using paired reads with BBDuk, if the reads are in two files, you must run BBDuk just once using both files as input (using the in1= and in2= flags), rather than on one file at a time. As long as both files are used as input together, pairs will always be kept together.

                2) Adapter-trimming should only be done on the right (3') end for fragment libraries. Left-trimming is only for special circumstances like specific long-mate pair protocols and amplicons with custom inline barcodes. For fragment libraries, the original molecule has adapters on both ends, but reading starts just after the adapter, so the reads have no adapter sequence on the left end, and only have adapter sequence on the right end if the insert size was shorter than the read length. If you left-trim a fragment library after right-trimming it, nothing will happen except that, as you note, you will get occasional random trimming of genomic sequence, though that will be very rare. Also note that BBDuk can't do left and right trimming simultaneously: because of how it trims (when a reference kmer is found, trim that kmer and everything to the left or right), it would trim all bases in the entire read.
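The trimming rule described above (find a reference kmer, remove it and everything on one side) can be sketched in a few lines of Python. This is a simplified illustration, not BBDuk's code: it uses exact matching only, ignores reverse complements and mismatch tolerance, and the function names, the example insert, and k=8 are all assumptions chosen for the demonstration.

```python
def kmer_set(reference, k):
    """All kmers of length k present in the reference (adapter) sequence."""
    return {reference[i:i + k] for i in range(len(reference) - k + 1)}


def match_positions(read, ref_kmers, k):
    """Start positions in the read where a reference kmer occurs."""
    return [i for i in range(len(read) - k + 1) if read[i:i + k] in ref_kmers]


def ktrim_right(read, ref_kmers, k):
    """Remove the first matching kmer and everything to its right."""
    hits = match_positions(read, ref_kmers, k)
    return read[:hits[0]] if hits else read


def ktrim_left(read, ref_kmers, k):
    """Remove the last matching kmer and everything to its left."""
    hits = match_positions(read, ref_kmers, k)
    return read[hits[-1] + k:] if hits else read


K = 8
adapter = "CTGTCTCTTATACACATCT"   # example adapter string for illustration
insert = "ACGTTGCAAGGCTAGC"       # hypothetical genomic insert
read = insert + adapter           # insert shorter than the read length
kmers = kmer_set(adapter, K)

right_trimmed = ktrim_right(read, kmers, K)
print(right_trimmed == insert)                        # True
# Left-trimming the already right-trimmed read finds nothing to remove.
print(ktrim_left(right_trimmed, kmers, K) == insert)  # True
# Left-trimming the untrimmed read removes the adapter and all bases before it.
print(repr(ktrim_left(read, kmers, K)))               # ''
```

The last line shows why applying both directions to the same kmer match would leave nothing: the bases to the left of the match are the genomic insert, and the bases to the right are the rest of the adapter.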

                Comment


                • Originally posted by Brian Bushnell
                  Hi Jake,

                  1) When using paired reads with BBDuk, if the reads are in two files, you must run BBDuk just once using both files as input (using the in1= and in2= flags), rather than on one file at a time. As long as both files are used as input together, pairs will always be kept together.

                  Thanks Brian. I found that the unequal read number following trimming was due to the singletons that were being generated by the read length cut-off. I was specifying a minimum read length of 30, but it was not removing the read's mate. It could be an issue with Geneious. I can quality- and adapter-trim and get the same read numbers in each file. I'll just set the minimum read length at a later step.

                  Jake

                  Comment


                  • Hi Jake,

                    It's still important to process both files together even if you have no minimum length cutoff, because the output order of BBDuk is not guaranteed to be the same as the input order (unless you add the "ordered" flag). So, I guess, if Geneious is running BBDuk on paired files individually, please add the "ordered" flag, and report that issue to the Geneious developers - it should process them together.
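One way to confirm that two output files are still in sync is to compare read names record by record. The sketch below is only an illustration in Python and is not part of BBTools; it assumes well-formed four-line FASTQ records, and the function names are hypothetical.

```python
import io
from itertools import zip_longest


def read_names(handle):
    """Yield the name of each FASTQ record (four lines per record assumed)."""
    for i, line in enumerate(handle):
        if i % 4 == 0:
            # Drop the leading '@' and anything after the first whitespace.
            name = line.rstrip("\n")[1:].split()[0]
            # Older naming schemes mark mates with a /1 or /2 suffix.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def pairs_in_sync(handle1, handle2):
    """True only if both files hold the same read names in the same order."""
    pairs = zip_longest(read_names(handle1), read_names(handle2))
    return all(a == b for a, b in pairs)


r1 = "@read1 1:N:0:53\nACGT\n+\nIIII\n@read2 1:N:0:53\nACGT\n+\nIIII\n"
r2_same_order = "@read1 2:N:0:53\nTTGA\n+\nIIII\n@read2 2:N:0:53\nTTGA\n+\nIIII\n"
r2_reordered = "@read2 2:N:0:53\nTTGA\n+\nIIII\n@read1 2:N:0:53\nTTGA\n+\nIIII\n"

print(pairs_in_sync(io.StringIO(r1), io.StringIO(r2_same_order)))  # True
print(pairs_in_sync(io.StringIO(r1), io.StringIO(r2_reordered)))   # False
```

With real data the two handles would come from `open()` on the R1 and R2 files. A file with extra or missing reads also returns False, because the shorter file runs out first.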

                    Comment


                    • Hi Brian, I'll report the issue to Geneious. I'm using the "keep order" feature in Geneious. I've attached a screenshot. The check boxes and insert fields that Geneious created are nice for those of us that don't code, but only if they actually do what they say :P

                      Jake

                      Never mind, forums have some pretty stringent rules on picture attachment dimension...painful.

                      Comment


                      • Originally posted by Brian Bushnell
                        Hi Jake,

                        It's still important to process both files together even if you have no minimum length cutoff, because the output order of BBDuk is not guaranteed to be the same as the input order (unless you add the "ordered" flag). So, I guess, if Geneious is running BBDuk on paired files individually, please add the "ordered" flag, and report that issue to the Geneious developers - it should process them together.

                        Hey Brian. I heard back from Geneious. It turns out that I had to pair the reads before running BBDuk. After pairing I'm left with a single file in Geneious, though the reads have not been merged. Running BBDuk on the 'combined' file results in the removal of both reads of a pair, yay!

                        Comment


                        • Hi Brian,

                          I just recently started using BBDuk to adapter- and quality-trim. I then use the reads to assemble in Spades, but ran into an issue. The error correction software that Spades uses cannot recognize the read names.

                          The input read format is:
                          M00281:69:000000000-D22HU:1:1101:15164:1363 1:N:0:53

                          The output read format is:
                          M00281:69:000000000-D22HU:1:1101:15164:1363_1:N:0:53

                          The underscore isn't recognized by BWA, which is activated when the "careful" mode is used in Spades. Any way to switch that _ back to a space?

                          Thanks
                          Jake

                          Comment


                          • Hi Jake,

                            BBDuk does not add an underscore to read names. Reformat can, if you run it with the flag "addunderscore", but doesn't by default. Can you list all of the steps you are doing prior to running Spades and verify that the underscores are not present for the BBDuk input?
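For checking headers, or for restoring the space if the underscores turn out to come from another step, a small Python sketch like the following could be used. It is not a BBTools feature; the function names and the regular expression are assumptions based on the two header examples quoted above.

```python
import re

# Matches an underscore directly before an Illumina-style comment field
# such as "1:N:0:53" at the end of the header.
_JOINED_COMMENT = re.compile(r"_(?=[12]:[YN]:\d+:\S*$)")


def has_joined_comment(header):
    """True if the name and comment are joined by an underscore."""
    return bool(_JOINED_COMMENT.search(header))


def restore_space(header):
    """Replace the joining underscore with a space; other text is unchanged."""
    return _JOINED_COMMENT.sub(" ", header, count=1)


original = "M00281:69:000000000-D22HU:1:1101:15164:1363 1:N:0:53"
exported = "M00281:69:000000000-D22HU:1:1101:15164:1363_1:N:0:53"

print(has_joined_comment(original))         # False
print(has_joined_comment(exported))         # True
print(restore_space(exported) == original)  # True
```

Running `has_joined_comment` over the headers of each intermediate file shows which step introduced the underscore.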

                            Comment


                            • Hi Brian,

                              You are correct. While the read names were correct in the program interface, the "_" characters were introduced upon export. I apologize for not opening the fastq file in a text editor to be sure.

                              Jake

                              Comment


                              • No problem!

                                Comment
