  • Hi Brian,

    In this case I'm trying to trim vector and low qual from Sanger (BAC End Seq) reads (~850bp) to run through SSPACE. Will your script have problems with this?


    • No, that will work. Generally, though, I would recommend quality-trimming with "qtrim=r trimq=15" or similar, and vector trimming with "ktrim=r" or "kmask=N", using the vector sequence as the reference. For example, "ref=vector kmask=N k=31 edist=1" would mask all of the vector sequence with Ns, which you could then remove via subsequent quality-trimming on both ends with "qtrim=rl trimq=2" (Ns have quality 0).

      If you just want to remove the first and last 125bp the command would be "ftl=125 ftr2=125".
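
The two-pass approach described above (mask vector with Ns, then quality-trim the Ns away) could be sketched roughly as follows. The function name and the file names in the usage note are placeholders, not part of BBTools; the flags are the ones quoted above plus the usual in=/out=.

```shell
# Hypothetical wrapper around two BBDuk passes; only the flags come from the post above.
# $1 = input reads, $2 = vector FASTA, $3 = masked intermediate, $4 = final output
mask_then_trim() {
  # Pass 1: mask vector-matching k-mers with N (k=31, up to 1 edit allowed).
  bbduk.sh in="$1" out="$3" ref="$2" kmask=N k=31 edist=1 || return 1
  # Pass 2: quality-trim both ends; Ns have quality 0, so trimq=2 removes them.
  bbduk.sh in="$3" out="$4" qtrim=rl trimq=2
}
```

It would be called as, for example, mask_then_trim reads.fq vector.fa masked.fq trimmed.fq. Only Ns at the read ends are removed by the second pass.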


      • Hi Brian:

        I ran your "repair.sh" script yesterday and it generated only the "singletons.fq" file; the "r1.fq" and "r2.fq" files are empty. The command that I ran was the same one you mentioned on page 2:

        cat SRR867646_1.fastq SRR867646_2.fastq | repair.sh -Xmx4g in=stdin.fq out1=r1.fq out2=r2.fq outs=singletons.fq
        I ran it on SRR867646's reads, which seem to have been trimmed before being uploaded to the SRA. In fact, I am not able to do an alignment because R1 has 10919266 sequences and R2 has 2177589 sequences.

        But... I don't know how to interpret that. Why are the r1.fq and r2.fq files empty while singletons.fq is not? Should I treat the reads as single-end and use the singletons.fq file?

        Also, the singletons.fq file contains 13096855 sequences, the same number as R1 + R2 from the original reads.

        How should I handle that?
        What recommendations do you have?


        Thanks in advance.
        ~g
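
The read counts quoted in the post above can be reproduced with a small helper along these lines. This is a minimal sketch that assumes strict four-line FASTQ records (no wrapped sequence lines); count_reads is a made-up name, not a BBTools command.

```shell
# Count FASTQ records in a file, assuming exactly 4 lines per record.
# A non-integer result means the file is truncated or not 4-line FASTQ.
count_reads() {
  awk 'END { print NR / 4 }' "$1"
}
```

Running count_reads on SRR867646_1.fastq and SRR867646_2.fastq should give the 10919266 and 2177589 reported above, and 10919266 + 2177589 = 13096855, which matches the size of singletons.fq. In other words, every read was written out as a singleton and none were paired.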


        • @germelcar's question has been answered over at Biostars.


          • Thanks, Genomax!


            • BBDuk - Remove singletons?

              Hi Brian,

              I'm using BBDuk to adapter and quality trim Illumina reads that were prepped with the Nextera kit. I'm using the Plugin within Geneious, and I have a few questions:

              1) I'm getting a different number of reads in the output files for each pair. I tried using removeifeitherbad=t, but I still get a different number of reads in each file. Would this command normally result in an equal number of reads for each file pair? If so, I'll inquire with Geneious.

              2) When trimming adapters, what is the benefit of selecting only the right or left end? I'm guessing my reads will only have adapter sequence on the 5' end, correct? This assumes that the insert size is longer than the read length, as I think in principle I should have adapters on both ends? Why not just trim both ends to be safe? Does it result in non-specific trimming of sample sequence?

              Next, I intend to pair and merge the reads, and then de novo assemble.

              Thanks for any help.

              Jake


              • Hi Jake,

                1) When using paired reads with BBDuk, if the reads are in two files, you must run BBDuk just once using both files as input (using the in1= and in2= flags), rather than on one file at a time. As long as both files are used as input together, pairs will always be kept together.

                2) Adapter-trimming should only be done on the right (3') end for fragment libraries. Left-trimming is only for special circumstances such as specific long-mate-pair protocols and amplicons with custom inline barcodes. For fragment libraries, the original molecule has adapters on both ends, but reading starts just after the adapter, so the reads have no adapter sequence on the left end and only have adapter sequence on the right end if the insert size was shorter than the read length. If you left-trim a fragment library after right-trimming it, nothing will happen except that, as you note, you will occasionally get random trimming of genomic sequence, though that will be very rare. Also note that BBDuk can't do left and right trimming simultaneously: because of how it trims (when a reference kmer is found, trim that kmer and everything to the left or right), it would trim all bases in the entire read.
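
Run outside Geneious, the advice above (both files in a single BBDuk run, right-end adapter trimming only) might look roughly like this. The wrapper name and the file names are placeholders, and k=23 is only an illustrative kmer length, not a value given in this thread.

```shell
# Hypothetical wrapper: one BBDuk run over both files so pairs stay together.
# $1,$2 = input R1/R2; $3,$4 = output R1/R2; $5 = adapter FASTA (assumed path)
trim_adapters_paired() {
  # ktrim=r trims the matching kmer and everything to its right.
  # k=23 is an assumed, illustrative value.
  bbduk.sh in1="$1" in2="$2" out1="$3" out2="$4" ref="$5" ktrim=r k=23
}
```

For example: trim_adapters_paired r1.fq r2.fq clean1.fq clean2.fq adapters.fa.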


                • Originally posted by Brian Bushnell:
                  Hi Jake,

                  1) When using paired reads with BBDuk, if the reads are in two files, you must run BBDuk just once using both files as input (using the in1= and in2= flags), rather than on one file at a time. As long as both files are used as input together, pairs will always be kept together.

                  Thanks Brian. I found that the unequal read numbers after trimming were due to singletons that were being generated by the read-length cutoff. I was specifying a minimum read length of 30, but it was not removing the read's mate. It could be an issue with Geneious. I can quality- and adapter-trim and get the same read numbers in each file. I'll just set the read length at a later step.

                  Jake


                  • Hi Jake,

                    It's still important to process both files together even if you have no minimum length cutoff, because the output order of BBDuk is not guaranteed to be the same as the input order (unless you add the "ordered" flag). So, I guess, if Geneious is running BBDuk on paired files individually, please add the "ordered" flag, and report that issue to the Geneious developers - it should process them together.
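
Whether two output files have stayed in the same order can be checked with something like the sketch below. It assumes strict four-line FASTQ and Illumina-style headers in which both mates share the same name up to the first space. check_pairs_in_sync is a made-up helper, not a BBTools command, and it holds all names from the first file in memory.

```shell
# Compare read names (text before the first space) record by record.
# Prints the number of out-of-sync records and returns non-zero if there are any.
check_pairs_in_sync() {
  awk '
    FNR % 4 == 1 {
      i = (FNR + 3) / 4                      # record index within the current file
      if (NR == FNR) { name[i] = $1; n1 = i }                 # first file: remember names
      else           { n2 = i; if (name[i] != $1) bad++ }     # second file: compare
    }
    END {
      if (n1 > n2) bad += n1 - n2            # records in file 1 with no partner
      print bad + 0
      exit (bad > 0)
    }
  ' "$1" "$2"
}
```

A result of 0 for check_pairs_in_sync r1.fq r2.fq means the files line up. Anything else suggests they were processed separately or reordered.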


                    • Hi Brian, I'll report the issue to Geneious. I'm using the "keep order" feature in Geneious. I've attached a screenshot. The check boxes and insert fields that Geneious created are nice for those of us that don't code, but only if they actually do what they say :P

                      Jake

                      Never mind, forums have some pretty stringent rules on picture attachment dimension...painful.


                      • Originally posted by Brian Bushnell:
                        Hi Jake,

                        It's still important to process both files together even if you have no minimum length cutoff, because the output order of BBDuk is not guaranteed to be the same as the input order (unless you add the "ordered" flag). So, I guess, if Geneious is running BBDuk on paired files individually, please add the "ordered" flag, and report that issue to the Geneious developers - it should process them together.

                        Hey Brian. I heard back from Geneious. It turns out that I had to pair the reads before running BBDuk. After pairing I'm left with a single file in Geneious, though the reads have not been merged. Running BBDuk on the 'combined' file results in the removal of both reads of a pair, yay!


                        • Hi Brian,

                          I just recently started using BBDuk to adapter- and quality-trim. I then use the reads to assemble in Spades, but ran into an issue. The error correction software that Spades uses cannot recognize the read names.

                          The input read format is:
                          M00281:69:000000000-D22HU:1:1101:15164:1363 1:N:0:53

                          The output read format is:
                          M00281:69:000000000-D22HU:1:1101:15164:1363_1:N:0:53

                          The underscore isn't recognized by BWA, which is activated when the "careful" mode is used in Spades. Any way to switch that _ back to a space?

                          Thanks
                          Jake


                          • Hi Jake,

                            BBDuk does not add an underscore to read names. Reformat can, if you run it with the flag "addunderscore", but doesn't by default. Can you list all of the steps you are doing prior to running Spades and verify that the underscores are not present for the BBDuk input?


                            • Hi Brian,

                              You are correct. While the read names were correct in the program interface, the underscores were introduced upon export. I apologize for not opening the fastq file in a text editor to be sure.

                              Jake
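
If an exported file already contains the underscore and cannot easily be re-exported, the space could be put back with a sketch like the one below. It assumes four-line FASTQ and Illumina-style headers of the form shown earlier in the thread ("..._1:N:0:53"). restore_header_space is a hypothetical helper name.

```shell
# Replace the underscore before the "1:N:0:..." comment with a space.
# Only header lines (every 4th line, starting at line 1) are touched,
# so underscores in quality strings are left alone.
restore_header_space() {
  awk '
    NR % 4 == 1 && match($0, /_[12]:[YN]:[0-9]+:/) {
      $0 = substr($0, 1, RSTART - 1) " " substr($0, RSTART + 1)
    }
    { print }
  ' "$1"
}
```

Usage would be restore_header_space exported.fq > fixed.fq, where both file names are placeholders.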


                              • No problem!
