
  • Originally posted by JVGen
    Thanks Brian,

    I took the de novo approach to try to eliminate contaminating sequences. We're all working on HIV, and it's highly mutated: it's hard to remove contaminants by DNA sequence alone. However, since I know the approximate size of my PCR amplicons (it's different in each case because of DNA deletions), I know what size my contig should be after de novo assembly. By using stringent assembly parameters that require ~30 bp overlap with maybe 1 mismatch, I can force contaminants to be assembled into a separate contig. I can then extract consensus sequences from each of these contigs, and filter them so that only contigs greater than ~150 bp are used in a subsequent map to reference, which should remove any minor contaminants that were present. I can then extract the consensus from this alignment, which should represent the original PCR amplicon - any contaminants that might have made it into my contig should be lost, as they are outnumbered by the hundreds.

    Does this workflow make sense for my application? I'm working on a desktop Mac, so I have limited options. I've been told that Spades might be a better assembler for me, but I think I'd need to purchase a server. With the little coding experience I have, I'm a bit nervous to invest the money, lest we never get the thing to work.

    Are you available for hire as a consultant?

    Thanks,
    Jake
    Hi Jake,

    It might be helpful in this case if you could generate a kmer frequency histogram to see whether the contaminant and non-contaminant sequence is easily separable by depth alone. If so, there are a couple of easy ways to remove it. You can generate a kmer-frequency histogram with kmercountexact or BBNorm; just attach the text file to this thread. Normally I look at it in a log-log plot.

    What assembler are you currently using, by the way? I've had poor results with Spades on viruses, and better results with Tadpole. But this was raw viral sequence and amplicon sequence may give different results.

    As for consultancy, I've sent you a pm.

    -Brian
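For anyone who wants to inspect such a histogram before posting it, here is a rough Python sketch. It assumes the histogram is a tab-separated text file with a depth column and a count column plus `#` header lines; check your actual file, since the layout may differ between tools and versions. The function names and the example numbers are made up for illustration. It converts the values to log10 (for a log-log view) and reports the first local minimum, which is one crude way to guess a depth cutoff between a low-depth contaminant or error peak and the main peak. Noisy real data will usually need smoothing first.

```python
import math

# Invented example data: a low-depth peak at depth 1 and a main peak at depth 6.
EXAMPLE_KHIST = "#Depth\tCount\n1\t5000\n2\t800\n3\t120\n4\t90\n5\t400\n6\t2500\n7\t900\n"


def parse_khist(text):
    """Parse histogram text into a list of (depth, count) pairs.

    Assumes tab-separated depth and count columns and '#' header lines.
    Rows with a zero depth or count are skipped so log10 is always defined.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        depth, count = int(fields[0]), int(fields[1])
        if depth > 0 and count > 0:
            rows.append((depth, count))
    return rows


def log_log(rows):
    """Return (log10(depth), log10(count)) pairs for a log-log plot."""
    return [(math.log10(d), math.log10(c)) for d, c in rows]


def first_valley(rows):
    """Return the depth of the first local minimum in count, or None."""
    for i in range(1, len(rows) - 1):
        if rows[i][1] < rows[i - 1][1] and rows[i][1] <= rows[i + 1][1]:
            return rows[i][0]
    return None


if __name__ == "__main__":
    rows = parse_khist(EXAMPLE_KHIST)
    print("log-log points:", log_log(rows))
    print("candidate depth cutoff:", first_valley(rows))
```

If the two peaks overlap heavily there will be no clean valley, which is itself a useful answer: depth alone will not separate the contaminant.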

    Comment


    • Originally posted by moistplus
      I've run pileup for contig coverage estimation after assembly.

      The output is :

      ID Avg_fold Length Ref_GC Base_Coverage Read_GC

      1) The coverage is the Avg_fold right ?
      Yes, that's correct.

      2) If yes, I have some negative or 0 coverage ... How can it be possible ?
      Zero coverage is certainly possible, since mapping and assembly results can differ. Negative coverage should be impossible. Can you send me the output file, and post your exact command line?

      Thanks,
      Brian
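To check a coverage table for suspicious values before sending it, something like the following Python sketch could help. It assumes a tab-separated file whose header contains the columns quoted above, with the ID column named either `ID` or `#ID`; the function name and the example rows are invented for illustration, not taken from real pileup output.

```python
import csv
import io

# Invented example rows using the column names quoted in the post.
EXAMPLE_COVSTATS = (
    "ID\tAvg_fold\tLength\tRef_GC\tBase_Coverage\tRead_GC\n"
    "contig_1\t35.2\t1200\t0.41\t1200\t0.42\n"
    "contig_2\t-3.1\t800\t0.39\t0\t0.00\n"
    "contig_3\t0.0\t500\t0.45\t0\t0.00\n"
)


def flag_coverage(text):
    """Return contig names with negative and with zero Avg_fold.

    Assumes a tab-separated table with a header row containing 'Avg_fold'
    and an ID column called 'ID' or '#ID'.
    """
    flagged = {"negative": [], "zero": []}
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    for row in reader:
        name = row.get("#ID") or row.get("ID")
        avg_fold = float(row["Avg_fold"])
        if avg_fold < 0:
            flagged["negative"].append(name)
        elif avg_fold == 0:
            flagged["zero"].append(name)
    return flagged


if __name__ == "__main__":
    print(flag_coverage(EXAMPLE_COVSTATS))
```

Zero entries are expected now and then; any name in the negative list is worth reporting along with the exact command line.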

      Comment


      • Indeed, I see 3 entries with negative coverage, which should not happen. However, v32.15 was released about 2.5 years ago and there have been thousands of changes since then, including hundreds of bug fixes. Could you try the latest version and see if that fixes the problem?

        Comment


        • Originally posted by moistplus
          With the new version, I no longer have any negative coverage!

          Another question:
          Using this command :



          in1 and in2 are the forward and reverse reads, right? So it keeps the pairs mated at the end?
          It should do that.

          Comment


          • Yes, it's random.

            Comment


            • But can it output chimerically mapped reads to a separate output file, like STAR can? There's no mention of chimeric reads in the user guide, so I'm not sure if it's even suitable for that case.

              Comment


              • No, BBMap does not output chimerically-mapped reads to a separate file, though it can be used to separate properly-paired reads from improper (possibly chimeric) pairs.
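As an illustration of the proper versus improper split, here is a small Python sketch that works on SAM output from any aligner rather than on BBMap's own options. It relies only on the standard SAM flag bits (0x1 = read is paired, 0x2 = mapped in a proper pair). The function name and the example records are invented, and an improper pair is only a candidate chimera: it could equally be an unusual insert size or orientation.

```python
PAIRED = 0x1       # SAM flag bit: template has multiple segments
PROPER_PAIR = 0x2  # SAM flag bit: each segment properly aligned

# Invented, minimal SAM records (header line plus four alignments).
EXAMPLE_SAM = (
    "@HD\tVN:1.6\n"
    "r1\t99\tref\t100\t40\t50M\t=\t300\t250\t*\t*\n"
    "r1\t147\tref\t300\t40\t50M\t=\t100\t-250\t*\t*\n"
    "r2\t65\tref\t100\t40\t50M\t=\t9000\t0\t*\t*\n"
    "r2\t129\tref\t9000\t40\t50M\t=\t100\t0\t*\t*\n"
)


def split_by_pairing(sam_text):
    """Split paired alignments into proper and improper read-name lists.

    Header lines (starting with '@') and unpaired reads are skipped.
    """
    proper, improper = [], []
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        name, flag = fields[0], int(fields[1])
        if not flag & PAIRED:
            continue
        if flag & PROPER_PAIR:
            proper.append(name)
        else:
            improper.append(name)
    return proper, improper


if __name__ == "__main__":
    proper, improper = split_by_pairing(EXAMPLE_SAM)
    print("proper:", proper)
    print("improper:", improper)
```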

                Comment


                • Originally posted by Brian Bushnell
                  No, BBMap does not output chimerically-mapped reads to a separate file, though it can be used to separate properly-paired reads from improper (possibly chimeric) pairs.
                  Is that functionality on the "feature request" list?

                  Comment


                  • Originally posted by GenoMax
                    Is that functionality on the "feature request" list?
                    Oh, very well, I'll add it to the list

                    Comment


                    • Originally posted by Brian Bushnell
                      Oh, very well, I'll add it to the list
                      Hi Brian,

                      I am trying to use BBMap to align 150 bp paired-end reads to a 10 kb reference. The reference is an HIV genome, and my sequencing input is PCR-amplified HIV proviruses (which means I get lots of coverage).

                      I used BBDuk to adapter- and quality-trim my reads. I then used BBNorm to normalize coverage to ~150. Then I used BBMap to map the reads to the reference.

                      Deletions in HIV proviruses are common, and I noticed that BBMap seems to get hung up near the deletions. For instance, if a read spans the deletion, BBMap doesn't seem to insert a gap into the read so that it aligns on the other side of the deletion. I've attached some pictures. The first pic is the full alignment, the second is 5' of the deletion, and the third is 3' of the deletion. Note that the "CTGAGGGGACAGAT" sequence is present on the reads on the 5' side of the deletion, and should extend to the 3' side. In this case, most of these reads were trimmed so that the consensus would reflect the correct deletion, but I do worry that this might not always be the case.

                      Are there any settings to adjust this to allow the reads to span the deletion?

                      Thanks,
                      Jake





                      Comment


                      • @JVGen: Are you using default alignment settings for bbmap? You may need to adjust maxindel (which defaults to 16000) and intronlen settings.

                        Comment


                        • Originally posted by GenoMax
                          @JVGen: Are you using default alignment settings for bbmap? You may need to adjust maxindel (which defaults to 16000) and intronlen settings.
                          Hi GenoMax,

                          The entirety of my reference is only 9000 bp, so I think the default maxindel size is appropriate. What does intronlen do?

                          Thanks!
                          JV

                          Comment


                          • The default maxindel should be fine in this case. If there are reads spanning the deletion, they will be mapped spanning the deletion. I'm not familiar with the viewer you are using... perhaps you could try IGV?

                            "intronlen=10" will, for example, replace "D" (deletion) symbols in cigar strings with "N" (skipped) symbols, for deletions of at least 10bp in length. Some programs and viewers prefer N over D for whatever reason. I consider them equivalent. But, it's possible the viewer you are using does not properly display reads with "D" symbols in the cigar string, so using IGV or remapping with the "intronlen=10" flag might be helpful. Or, if you send me the sam file and reference I can look at it.

                            most of these reads were trimmed so that the consensus would reflect the correct deletion
                            I'm not sure what you mean by that - can you clarify? Also, can you give the exact command you used for BBMap? If you use the "local" flag, long deletions might get erased.

                            I've honestly never heard a concern before that BBMap was unwilling to map reads spanning long deletions - only the opposite. In your last picture, it looks to me like all of the reads are mapped with a long deletion extending off the screen to the left; or am I misinterpreting it?
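To make the "intronlen" description above concrete, here is a minimal Python sketch of the D-to-N rewrite on a single cigar string. This is only an illustration of the effect as described in this post, not BBMap's actual code; the function name and the example cigar strings are hypothetical.

```python
import re

# One cigar operation: a length followed by a SAM operation character.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def deletions_to_skips(cigar, intronlen=10):
    """Rewrite 'D' operations of at least `intronlen` bases as 'N'.

    Shorter deletions and all other operations are left unchanged.
    """
    out = []
    for length, op in CIGAR_OP.findall(cigar):
        if op == "D" and int(length) >= intronlen:
            op = "N"
        out.append(length + op)
    return "".join(out)


if __name__ == "__main__":
    # A read spanning a long deletion, and one with a short deletion.
    print(deletions_to_skips("70M3500D80M"))
    print(deletions_to_skips("70M3D77M"))
```

A viewer that handles "N" but not long "D" operations would then draw the first read as two blocks joined across the gap, which is the behavior to look for when comparing viewers.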

                            Comment


                            • Originally posted by Brian Bushnell
                              The default maxindel should be fine in this case. If there are reads spanning the deletion, they will be mapped spanning the deletion. I'm not familiar with the viewer you are using... perhaps you could try IGV?
                              Hi Brian, that's Geneious I'm viewing in. I downloaded and tried using IGV, but I get an error when trying to load the SAM file. I shared the SAM file with you on Google Drive. I included the trimmed, normalized, unassembled reads and the reference as a separate FASTA as well, just in case.

                              Originally posted by Brian Bushnell
                              "intronlen=10" will, for example, replace "D" (deletion) symbols in cigar strings with "N" (skipped) symbols, for deletions of at least 10bp in length. Some programs and viewers prefer N over D for whatever reason. I consider them equivalent. But, it's possible the viewer you are using does not properly display reads with "D" symbols in the cigar string, so using IGV or remapping with the "intronlen=10" flag might be helpful. Or, if you send me the sam file and reference I can look at it.
                              This could be, but they replace no coverage with gaps, so I don't know what the program is doing behind the scenes (if it's logging D's or N's). I will try repeating with this flag and see if it changes the outcome.

                              Originally posted by Brian Bushnell
                              I'm not sure what you mean by that - can you clarify? Also, can you give the exact command you used for BBMap? If you use the "local" flag, long deletions might get erased.
                              I'm running it in Geneious with the following parameters (in picture). I'm going to try ticking the "discard trimmed regions", though I think it is unnecessary, because I don't think BBDuk keeps trimmed information (which is how I trimmed the reads). Dissolve contigs is redundant - no contigs have yet been assembled. Quirk of the program.



                              Originally posted by Brian Bushnell
                              I've honestly never heard a concern before that BBMap was unwilling to map reads spanning long deletions - only the opposite. In your last picture, it looks to me like all of the reads are mapped with a long deletion extending off the screen to the left; or am I misinterpreting it?
                              There is a ~3.5kb deletion on the 3' end of the HIV genome. The HIV reference sequence is depicted in the faded yellow box. Reads assembled to the reference are depicted below as black rectangles (which, when I zoom in, show their sequence). A coverage map is shown above the reference in red. A consensus sequence for the assembled reads is provided above the coverage map. Within the consensus, black represents a mismatch to the reference (this many mismatches is not uncommon for HIV, as it's a retrovirus and reverse transcription introduces many mutations).

                              Thanks for any help!

                              JV

                              Comment


                              • Originally posted by JVGen
                                Hi Brian, that's Geneious I'm viewing in. I downloaded and tried using IGV, but I get an error when trying to load the SAM file.
                                IGV needs a sorted, indexed bam file. It won't accept sam.

                                I shared the SAM file with you on google drive. I included the trimmed, normalized, unassembled reads and the reference as a separate FASTA as well, just in case.
                                Please send me the links, and I'll look at them.

                                I'm running it in Geneious with the following parameters (in picture). I'm going to try ticking the "discard trimmed regions", though I think it is unnecessary, because I don't think BBDuk keeps trimmed information (which is how I trimmed the reads). Dissolve contigs is redundant - no contigs have yet been assembled. Quirk of the program.
                                Hmmm, I'm not really sure what Geneious is doing behind the scenes here with regard to trimming, but it doesn't look like it would have any kind of effect that would suppress long deletions.

                                Comment
