
  • @mewu3: Since paired-end reads are aligned together, you should use a single "out=output.sam". If you want to capture unmapped reads in separate files, use "outu1=R1.unmapped.fq outu2=R2.unmapped.fq".

    You may be able to write them out as an unmapped sam file "outu=unmapped.sam" but then again you should use only one output for that. This is untested.

    Comment


    • Originally posted by HESmith
      "Exception in thread "main" java.lang.AssertionError: Attempting to output paired reads to different sam files."

      Typically, BBMap tools keep paired reads together. You're attempting to write aligned and unaligned reads to separate files, which violates that function.
      Thank you !

      Comment


      • pileup.sh explanation

        Hello,

        Can someone please explain the output files of pileup.sh?
        • basecov
        • bincov
        • covstats

        How is the coverage calculated?

        Comment


        • Finding mapped rate for rpkm output

          Hi Brian,

          I'm running BBMap with the rpkm output option and would like to know how to see mapped rate for each read file. I run multiple files consecutively with nohup. Here's my code:

          bbmap.sh ref=data/Assembly.fasta \
          in1=data/clean/A_1.clean.fq.gz \
          in2=data/clean/A_2.clean.fq.gz \
          rpkm=data/fpkm/A.fpkm \
          t=5 &

          bbmap.sh ref=data/Assembly.fasta \
          in1=data/clean/B_1.clean.fq.gz \
          in2=data/clean/B_2.clean.fq.gz \
          rpkm=data/fpkm/B.fpkm \
          t=5

          etc etc

          So far it's worked really well. However, while the stdout file shows the mapped rates it doesn't tell me which read files relate to which stats. It just has a number of repeated --- Results 1 ---- records and I don't know which is which.

          Is there a flag I need to add to ensure I can see which stats are for which files?

          Thanks!

          Lisa

          Comment


          • Originally posted by lmusgrove
            Hi Brian,

            I'm running BBMap with the rpkm output option and would like to know how to see mapped rate for each read file. I run multiple files consecutively with nohup. Here's my code:

            bbmap.sh ref=data/Assembly.fasta \
            in1=data/clean/A_1.clean.fq.gz \
            in2=data/clean/A_2.clean.fq.gz \
            rpkm=data/fpkm/A.fpkm \
            t=5 &

            bbmap.sh ref=data/Assembly.fasta \
            in1=data/clean/B_1.clean.fq.gz \
            in2=data/clean/B_2.clean.fq.gz \
            rpkm=data/fpkm/B.fpkm \
            t=5

            etc etc

            So far it's worked really well. However, while the stdout file shows the mapped rates it doesn't tell me which read files relate to which stats. It just has a number of repeated --- Results 1 ---- records and I don't know which is which.

            Is there a flag I need to add to ensure I can see which stats are for which files?

            Thanks!

            Lisa
            You should be able to use
            Code:
            covstats=<file>         Per-scaffold coverage info.
            for each of your commands. You can also capture the stderr/out to a file to get statistics. https://linuxize.com/post/bash-redirect-stderr-stdout/
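To make the redirect suggestion concrete, here is a minimal sketch of a Python driver that runs one sample at a time and sends each run's stdout and stderr to its own log file, so every block of results is tied to a sample name. The bbmap.sh arguments are the ones from the post above plus covstats; the covstats output path, the logs directory, and the function names are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def build_bbmap_command(sample, ref="data/Assembly.fasta", threads=5):
    """Return the bbmap.sh argument list for one sample (paths mirror the post)."""
    return [
        "bbmap.sh",
        f"ref={ref}",
        f"in1=data/clean/{sample}_1.clean.fq.gz",
        f"in2=data/clean/{sample}_2.clean.fq.gz",
        f"rpkm=data/fpkm/{sample}.fpkm",
        # Assumed output location for the per-scaffold coverage table.
        f"covstats=data/fpkm/{sample}.covstats",
        f"t={threads}",
    ]


def run_sample(sample, log_dir="logs"):
    """Run one sample and write its stdout and stderr to <log_dir>/<sample>.log."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    log_path = Path(log_dir) / f"{sample}.log"
    with open(log_path, "w") as log:
        # stderr is merged into stdout so the mapping statistics land in the log.
        subprocess.run(
            build_bbmap_command(sample),
            stdout=log,
            stderr=subprocess.STDOUT,
            check=True,
        )
    return log_path
```

Calling run_sample("A") and then run_sample("B") would leave logs/A.log and logs/B.log, each holding the statistics for one pair of read files.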

            Comment


            • bbmap ignores minid

              bbmap ignores minid parameter

              This is for version 38.93 and 38.92


              Download BBMap for free. BBMap short read aligner, and other bioinformatic tools. This package includes BBMap, a short read aligner, as well as various other bioinformatic tools. It is written in pure Java, can run on any platform, and has no dependencies other than Java being installed (compiled for Java 6 and higher).
              Last edited by silask; 09-28-2021, 11:44 PM.

              Comment


              • All "N" reads in paired reads not being filtered if other read is "good"

                Version 38.94

                This thread details the "bug":

                Comment


                • Insert mutations in a VCF

                  I see mutate.sh takes a vcf and inserts mutations in a reference genome. I need something a little different. I want to specify a mutation rate and potentially a sample, and insert mutations at that specified rate in the specified sample. Anything in bbmap that might help with this?

                  Comment


                  • You can do it in two steps: 1) use mutate.sh on the reference genome; 2) use randomreads.sh of the mutated reference to generate the data (I assume that's what you mean by sample).

                    Comment


                    • Originally posted by HESmith
                      You can do it in two steps: 1) use mutate.sh on the reference genome; 2) use randomreads.sh of the mutated reference to generate the data (I assume that's what you mean by sample).
                      Not exactly. I literally want to replace the 0/1s or 0/0s or 1/1s in the VCF, not generate reads.

                      Comment


                      • Originally posted by turnersd
                        Not exactly. I literally want to replace the 0/1s or 0/0s or 1/1s in the VCF, not generate reads.
                        Those genotype calls are based on the read data, which are summarized in the INFO fields of the VCF. If you change the genotypes, those field data will be inconsistent.

                        It would be useful if you explained more clearly what you're trying to accomplish.

                        Comment


                        • Originally posted by HESmith
                          Those genotype calls are based on the read data, which are summarized in the INFO fields of the VCF. If you change the genotypes, those field data will be inconsistent.

                          It would be useful if you explained more clearly what you're trying to accomplish.
                          Yes, completely correct about GT not matching up with INFO fields. I'll probably end up removing all the info fields altogether because GT is really the only thing I'm interested in.

                          I've used a published simulation framework to simulate pedigrees having genotype data, and what I'm trying to do is specify a rate at which I want to mutate genotypes, i.e., to mimic allele drop-out or drop-in, each at specified rates. Mimicking a het to hom-ref would be easy if variant alleles were all I cared about (ie just deleting random rows from the VCF), but I _do_ care about genotypes at invariant sites.

                          My plan was to write some custom bash/python/bcftoolsy things to get out genotypes and mutate at a specified rate then stick them back into the VCF. I was just wondering if there was some existing tool to do something like this in bbmap (or elsewhere). I can't seem to find anything.
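The custom approach described above could look roughly like the following. This is only a sketch under stated assumptions, not an existing BBMap or bcftools feature: drop-out is modelled as a het call becoming 0/0, drop-in as a 0/0 call becoming 0/1, and the function names, the seed parameter, and the optional sample argument are all hypothetical. INFO and the other FORMAT fields are left untouched, so they will no longer agree with the new genotypes.

```python
import random


def mutate_genotype(gt, dropout_rate, dropin_rate, rng):
    """Return a possibly altered diploid GT string.

    Het calls (0/1, 1/0) become 0/0 with probability dropout_rate;
    hom-ref calls (0/0) become 0/1 with probability dropin_rate.
    Anything else (1/1, missing, multiallelic) is returned unchanged.
    """
    sep = "|" if "|" in gt else "/"
    alleles = gt.split(sep)
    if sorted(alleles) == ["0", "1"] and rng.random() < dropout_rate:
        return sep.join(["0", "0"])
    if alleles == ["0", "0"] and rng.random() < dropin_rate:
        return sep.join(["0", "1"])
    return gt


def mutate_vcf_lines(lines, dropout_rate, dropin_rate, sample=None, seed=0):
    """Yield VCF lines with GT values mutated in one sample or in all samples."""
    rng = random.Random(seed)
    target_cols = None  # None means "every sample column"
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            yield line
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            if sample is not None:
                target_cols = [fields.index(sample)]
            yield line
            continue
        if len(fields) < 10:  # sites-only record, nothing to mutate
            yield line
            continue
        fmt = fields[8].split(":")
        if "GT" in fmt:
            gt_idx = fmt.index("GT")
            cols = target_cols if target_cols is not None else range(9, len(fields))
            for col in cols:
                parts = fields[col].split(":")
                parts[gt_idx] = mutate_genotype(
                    parts[gt_idx], dropout_rate, dropin_rate, rng
                )
                fields[col] = ":".join(parts)
        yield "\t".join(fields)
```

Because the random generator is seeded, repeated runs with the same seed give the same mutated VCF, which helps when comparing simulation replicates.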

                          Comment


                          • How is coverage calculated in bbmap?

                            Hi Brian and everyone else,

                            I am using BBMap for my metagenomes. Currently, I am comparing the coverage of different genes across metagenomes. BBMap has the rpkm flag, which calculates fold coverage, RPKM and FPKM. But I couldn't find the formulas showing exactly how these are calculated. When I tried to recalculate these values myself, I got different results.
                            Thank you very much in advance! I hope this thread is at the right position in SEQanswers.

                            Best,
                            Franz
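For anyone trying to reproduce the numbers, the textbook definitions can be sketched as below. These are the standard formulas, not code taken from BBMap. The thread does not say whether the normalizing total is all input reads or only mapped reads, so the total is left as a parameter and both choices are worth trying when the results disagree. All function and parameter names are illustrative.

```python
def fold_coverage(mapped_bases, ref_length):
    """Average depth: bases aligned to the sequence divided by its length."""
    return mapped_bases / ref_length


def rpkm(reads_on_ref, ref_length, total_reads):
    """Reads per kilobase of sequence per million reads in the normalizing total."""
    return reads_on_ref * 1e9 / (ref_length * total_reads)


def fpkm(frags_on_ref, ref_length, total_frags):
    """Same as RPKM, but counting fragments (read pairs) instead of reads."""
    return frags_on_ref * 1e9 / (ref_length * total_frags)
```

As a quick check, 100 reads on a 1,000 bp gene with 1,000,000 reads in the normalizing total gives an RPKM of 100, and 1,500 aligned bases on the same gene gives a fold coverage of 1.5.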

                            Comment


                            • Entropy filtering

                              Hello,
                              I am working with shotgun metagenomic sequencing data from gut microbiome samples. We're planning to do taxonomic abundance estimates. For data preprocessing we are going to trim and filter sequences for quality and adapter content with bbduk, as well as remove host sequences with bbsplit. We are also considering an additional entropy filtering step with bbduk after our other preprocessing steps. I was hoping I could ask you for some more information about how this entropy filtering process works, and if you might have a recommendation in our case.
                              • In some of our samples, we have an overrepresentation of G homopolymers. We’re confident that these are technical artifacts from the NextSeq sequencing protocol. I know we can filter these out with an entropy threshold of 0.1. However, I’ve seen in some metagenomic studies they filter out repetitive sequences that are not sequencing artifacts. Would you recommend raising this entropy threshold in our case, and if so to what new value?
                              • On the more technical side of how bbduk implements entropy filtering…
                                • How does the sliding window traverse a read? Is it one base at a time, or does it move by the whole window size?
                                • If a read has a region of low complexity sequences at the beginning/end, are only these sections filtered or is the entire read removed?
                                • How might a read with an internal region of low complexity be treated?
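To make the questions above concrete, here is one common way to compute windowed k-mer Shannon entropy. It is a sketch for illustration and is not BBDuk's actual implementation: the window size, k, the step of one base, and the normalization to the range 0 to 1 are all assumptions chosen here.

```python
import math
from collections import Counter


def window_entropy(seq, k=5):
    """Shannon entropy of the k-mers in seq, scaled to the range 0..1."""
    n = len(seq) - k + 1
    if n <= 1:
        return 0.0
    counts = Counter(seq[i:i + k] for i in range(n))
    h = sum((c / n) * math.log2(n / c) for c in counts.values())
    # Divide by the largest entropy this window could have.
    return h / math.log2(min(n, 4 ** k))


def sliding_entropy(seq, window=50, k=5, step=1):
    """Entropy of each window; a read no longer than the window gives one value."""
    if len(seq) <= window:
        return [window_entropy(seq, k)]
    return [
        window_entropy(seq[i:i + window], k)
        for i in range(0, len(seq) - window + 1, step)
    ]
```

A pure G homopolymer scores 0 in every window, which is why a low threshold such as 0.1 catches it. Whether a read is then judged on its minimum, its average, or per window is exactly the open question in the post.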

                              Comment


                              • BBmap Error

                                I am getting this error. How can I solve it?

                                java -ea -Xmx4393m -Xms4393m -cp /home/szweda/bbmap/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 in=PY53_contigs.fasta out=PY53_bbmapped.sam bamscript=bs.sh
                                Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, in=PY53_contigs.fasta, out=PY53_bbmapped.sam, bamscript=bs.sh]
                                Version 38.95

                                Retaining first best site only for ambiguous mappings.
                                Found samtools 1.10
                                Exception in thread "main" java.lang.RuntimeException: Can't find file ref/genome/1/summary.txt
                                at fileIO.ReadWrite.getRawInputStream(ReadWrite.java:933)
                                at fileIO.ReadWrite.getInputStream(ReadWrite.java:898)
                                at fileIO.TextFile.open(TextFile.java:280)
                                at fileIO.TextFile.<init>(TextFile.java:123)
                                at dna.Data.setGenome2(Data.java:823)
                                at dna.Data.setGenome(Data.java:769)
                                at align2.BBMap.loadIndex(BBMap.java:316)
                                at align2.BBMap.main(BBMap.java:32)
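The trace shows BBMap looking for ref/genome/1/summary.txt relative to the working directory, and the command line contains no ref= argument. One possible reading, which is an assumption and not confirmed in this thread, is that no index has been built in that directory. The sketch below checks for the index file before assembling a command and asks for a reference when it is missing. The function names and the choice to raise an error are illustrative only.

```python
from pathlib import Path


def index_summary_path(build=1, root="."):
    """Path of the index summary file named in the error message."""
    return Path(root) / "ref" / "genome" / str(build) / "summary.txt"


def bbmap_args(reads, out, ref=None, build=1, root="."):
    """Return bbmap.sh arguments, refusing to proceed without an index or ref=."""
    if ref is None and not index_summary_path(build, root).exists():
        raise FileNotFoundError(
            f"No index at {index_summary_path(build, root)}; "
            "pass ref=<reference fasta> so an index can be built."
        )
    args = ["bbmap.sh", f"build={build}", f"in={reads}", f"out={out}"]
    if ref is not None:
        # With ref= present, the reference is given explicitly.
        args.insert(1, f"ref={ref}")
    return args
```

Which FASTA file should serve as the reference is not stated in the post, so that choice is left to the caller.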

                                Comment
