  • darthsequencer
    replied
    Originally posted by Brian Bushnell
    Ahh... sorry, but nope. That would require reading the file twice, so it would not be easy to implement.
    Oh yeah - duh

  • Brian Bushnell
    replied
    Ahh... sorry, but nope. That would require reading the file twice, so it would not be easy to implement.
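A minimal Python sketch of why a relative cutoff needs two passes, assuming plain 4-line FASTQ and a seekable input: the threshold (a fraction of the longest read) is unknown until every record has been seen, so the input is scanned once to find the maximum length and again to filter. The names `read_fastq` and `filter_by_fraction` are hypothetical; this is not a BBMap or BBTools option.

```python
import io
from typing import Iterator, List, TextIO, Tuple

# (header, sequence, separator, qualities)
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from plain 4-line FASTQ (no wrapped sequence lines)."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def filter_by_fraction(handle: TextIO, fraction: float) -> List[Record]:
    """Keep reads whose length is at least `fraction` times the longest read's length."""
    # Pass 1: the cutoff depends on the longest read, known only after a full scan.
    longest = max((len(rec[1]) for rec in read_fastq(handle)), default=0)
    cutoff = fraction * longest
    # Pass 2: rewind and read every record again, keeping the long-enough ones.
    handle.seek(0)
    return [rec for rec in read_fastq(handle) if len(rec[1]) >= cutoff]


if __name__ == "__main__":
    fastq = io.StringIO(
        "@r1\nACGTACGTAC\n+\nIIIIIIIIII\n"
        "@r2\nACGTA\n+\nIIIII\n"
        "@r3\nACGTACGT\n+\nIIIIIIII\n"
    )
    kept = filter_by_fraction(fastq, 0.75)
    print([rec[0] for rec in kept])  # ['@r1', '@r3']
```

Returning a list keeps the sketch short; a real tool would more likely stream the second pass straight to an output file rather than hold the kept reads in memory.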

  • darthsequencer
    replied
    Originally posted by Brian Bushnell
    Actually, while it's not documented, the flag "minlength" also works with BBMap. Reads shorter than that will be discarded completely (they won't be output as unmapped).
    Thanks for the tip - that saves me some space, since I don't have to make additional files. I deal with a lot of libraries with different read lengths.

    Is there a way to make BBMap require the shortest read to be some fraction of the longest read length? I know that's a niche use, but BBMap always surprises me with its built-in functions.

  • Brian Bushnell
    replied
    Actually, while it's not documented, the flag "minlength" also works with BBMap. Reads shorter than that will be discarded completely (they won't be output as unmapped).

  • GenoMax
    replied
    You could pre-filter input data with
    Code:
    reformat.sh in=file.fq out=filt.fq minlength=N
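Because a fixed `minlength=N` cutoff is known in advance, each read can be judged as it streams past. The sketch below is a hypothetical Python illustration of that kind of single-pass filter, assuming uncompressed 4-line FASTQ records; the function name is made up and this is not how reformat.sh is implemented.

```python
from typing import Iterable, Iterator


def min_length_filter(lines: Iterable[str], min_length: int) -> Iterator[str]:
    """Yield the four lines of every FASTQ record whose sequence has >= min_length bases."""
    it = iter(lines)
    # zip over the same iterator four times groups the stream into
    # (header, sequence, '+', qualities) records without buffering the file.
    # An incomplete trailing record is silently dropped by zip.
    for record in zip(it, it, it, it):
        if len(record[1].rstrip("\n")) >= min_length:
            yield from record


if __name__ == "__main__":
    import sys

    fastq = [
        "@r1\n", "ACGTACGT\n", "+\n", "IIIIIIII\n",
        "@r2\n", "ACG\n", "+\n", "III\n",
    ]
    sys.stdout.writelines(min_length_filter(fastq, 5))  # prints only the r1 record
```

With a real file, an open file object can be passed directly as `lines`, since Python file objects iterate line by line.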

  • darthsequencer
    replied
    Hi Brian,
    Is there a way to set a minimum input fragment length when mapping with BBMap?

  • Brian Bushnell
    replied
    To do that, you'd need to trim all soft-clipped bases. I don't have any programs that will do so, but it looks like I could add it to Reformat without too much difficulty.
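As a rough idea of what trimming soft-clipped bases involves, here is a hypothetical Python sketch that works on one SAM record's SEQ, QUAL, and CIGAR fields. It assumes the aligner actually soft-clipped the non-matching ends (bbmap does not soft-clip by default) and that the read is mapped with at least one aligned operation; the function names are made up and this is not an existing BBTools feature. Since SAM POS already points at the first aligned base, POS does not need to change when leading soft clips are removed.

```python
import re
from typing import Iterable, List, Tuple

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> List[Tuple[int, str]]:
    """Split a CIGAR string into (length, op) pairs, rejecting malformed input."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def _soft_clip_len(ops: Iterable[Tuple[int, str]]) -> int:
    """Length of the soft clip at the start of `ops`, looking past a hard clip."""
    for n, op in ops:
        if op == "S":
            return n
        if op != "H":
            return 0
    return 0


def trim_soft_clips(seq: str, qual: str, cigar: str) -> Tuple[str, str, str]:
    """Drop soft-clipped bases from both ends of SEQ/QUAL and remove S ops from the CIGAR.

    Hard clips (H) are kept because their bases are not stored in SEQ anyway.
    A QUAL of '*' (qualities absent) is passed through unchanged.
    """
    ops = parse_cigar(cigar)
    left = _soft_clip_len(ops)
    right = _soft_clip_len(reversed(ops))
    end = len(seq) - right
    new_qual = qual if qual == "*" else qual[left:end]
    new_cigar = "".join(f"{n}{op}" for n, op in ops if op != "S")
    return seq[left:end], new_qual, new_cigar


if __name__ == "__main__":
    # Four unmatched bases on the left, like the NNNN prefix in jweger1988's example.
    print(trim_soft_clips("TTTTACGTACGTAC", "!!!!IIIIIIIIII", "4S10M"))
    # ('ACGTACGTAC', 'IIIIIIIIII', '10M')
```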

  • jweger1988
    replied
    Hi Brian,

    Is there any way to use bbmap (or any of your other tools) to map a read to a reference file and then trim anything to the left of the reference sequence?

    For example

    My reference is
    XXXXXXXXXXXXXXXXXX

    And my reads are

    NNNNXXXXXXXXXXXXXXXXXX
    NNNXXXXXXXXXXXXXXXXXX
    NNXXXXXXXXXXXXXXXXXX

    I want them to be
    XXXXXXXXXXXXXXXXXX
    XXXXXXXXXXXXXXXXXX
    XXXXXXXXXXXXXXXXXX

    Basically, I just want to trim anything on the left of the reads that doesn't match my reference. Thanks in advance.

  • sdriscoll
    replied
    Oh yes! Thanks Brian.

  • Brian Bushnell
    replied
    You can use the flag "intronlen" to control this. For example, "intronlen=100" will change the reporting so that all deletions of at least 100bp will be reported using 'N' instead of 'D' in the CIGAR string.
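The reporting rule described above can be sketched in Python as a post-processing step on a CIGAR string. This is a hypothetical illustration of the rule (deletions of at least `intronlen` bases reported as N), not BBMap's own code; the function name is made up, and the regex is the spliced pattern quoted in sdriscoll's question.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# A simple one-intron spliced alignment: matches, skipped region, matches.
SPLICED = re.compile(r"[0-9]+M[0-9]+N[0-9]+M")


def deletions_to_introns(cigar: str, intronlen: int) -> str:
    """Report every deletion of at least `intronlen` bases as 'N' instead of 'D'."""
    out = []
    for length, op in CIGAR_OP.findall(cigar):
        if op == "D" and int(length) >= intronlen:
            op = "N"
        out.append(f"{length}{op}")
    return "".join(out)


if __name__ == "__main__":
    for cigar in ("50M250D50M", "40M3D60M"):
        converted = deletions_to_introns(cigar, intronlen=100)
        print(cigar, "->", converted, bool(SPLICED.fullmatch(converted)))
    # 50M250D50M -> 50M250N50M True
    # 40M3D60M -> 40M3D60M False
```

Short deletions stay as D, so genuine small indels are not mistaken for introns.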

  • sdriscoll
    replied
    I was testing something with BBMap today and realized I had forgotten something about how to use it and couldn't figure it out. I was mapping RNA-Seq and thought it would report spliced alignment CIGARs (using SAM 1.3) as /[0-9]+M[0-9]+N[0-9]+M/, but it was reporting the introns as deletions with the D CIGAR operation. Is that right, or is there a way to make it not do that?

    Thanks-

  • Brian Bushnell
    replied
    Hi Jweger,

    Sorry, I don't have anything that will do that. Clumpify allows something sort of similar: you can create a consensus sequence from raw reads and map those. But that loses per-base depth information, which is important for variant calling, so I don't think it's what you want.

  • jweger1988
    replied
    Hi Brian,

    I'm wondering if there is any way to get bbmap (or another of your tools) to give a consensus sequence of an alignment. I want to map to a reference sequence, use that alignment to create a consensus sequence, map to the consensus again, and then call variants.

    Thanks!
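To make the idea concrete, here is a hypothetical Python sketch of a simple majority-vote consensus from reads already placed on a reference. It assumes gapless alignments given as 0-based start offsets and ignores indels and base qualities, so it is far cruder than a real consensus caller and is not a BBTools feature. It also returns the per-position depth, the information Brian notes is lost when reads are collapsed into consensus sequences before mapping.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def majority_consensus(
    ref: str, reads: Sequence[Tuple[int, str]], min_depth: int = 1
) -> Tuple[str, List[int]]:
    """Majority-vote consensus over gapless reads placed at 0-based reference offsets.

    Positions covered by fewer than `min_depth` bases keep the reference base.
    Returns the consensus string and the per-position read depth.
    """
    columns: List[Counter] = [Counter() for _ in ref]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(ref):  # ignore bases hanging off either end
                columns[pos][base] += 1

    consensus, depth = [], []
    for ref_base, counts in zip(ref, columns):
        total = sum(counts.values())
        depth.append(total)
        if total and total >= min_depth:
            # Ties go to the base seen first at this position (Counter ordering).
            consensus.append(counts.most_common(1)[0][0])
        else:
            consensus.append(ref_base)
    return "".join(consensus), depth


if __name__ == "__main__":
    reference = "ACGTACGT"
    reads = [(0, "ACGAAC"), (1, "CGAACG"), (2, "GAACGT")]
    print(majority_consensus(reference, reads))
    # ('ACGAACGT', [1, 2, 3, 3, 3, 3, 2, 1])
```

In the toy example the T at offset 3 becomes A because all three covering reads agree, while the depth list still shows that both ends rest on a single read.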

  • Kristian
    replied
    Hi Brian,

    Originally posted by Brian Bushnell
    Is this a Nextera long-mate pair library?
    This is a Nextera XT prep, and the adapters/linker were trimmed along with demultiplexing directly through the Illumina RTA pipeline. On my end, bbduk2 left and right trimming had almost no effect, and splitnextera also had only a few hits (0.1%) to the adapter, so I'd guess the dataset is (for the most part) trimmed.
    Last edited by Kristian; 04-18-2017, 05:21 AM.

  • Brian Bushnell
    replied
    Hi Kristian,

    Is this a Nextera long-mate pair library? Those need special processing before they can be mapped. Or... can you give me any more information about the library construction, and the trimming methodology? The library has an extremely high error rate (particularly with read 2), less than half of the reads map to the mito, and it appears that both adapters and transposase are still present.... also, I'm measuring the median insert size as 159 (BBMap) or 133 (BBMerge), so there are a lot of pairs with insert size shorter than the sequenced read length; those might be displayed differently in IGV depending on whether the adapter portion was soft-clipped (which bwa would do by default) or not (bbmap does not soft-clip by default).

    I adapter-trimmed the reads and error-corrected them, but still under 50% map. I'm not really sure what's wrong with the library. But, I don't see anything unusual about the pairing orientations. I get 45.5670% properly paired with "rcs=f" (require correct strand = false) and 45.5481% with "rcs=t", so only 0.02% map in the wrong orientation.
    Last edited by Brian Bushnell; 04-17-2017, 02:38 PM.
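The point about inserts shorter than the read length can be made concrete with a little arithmetic: when a fragment of length `insert` is sequenced with reads of length `read_length`, each read runs roughly `read_length - insert` bases past the end of the fragment into adapter. The Python sketch below is a hypothetical helper under that simple model (no trimming, idealized insert sizes); the function names are made up, and the toy values in the usage example are invented, not measured from this library.

```python
from statistics import median
from typing import Iterable, Tuple


def adapter_bases(insert: int, read_length: int) -> int:
    """Approximate read-through length: bases sequenced past the end of the fragment."""
    return max(0, read_length - insert)


def readthrough_summary(insert_sizes: Iterable[int], read_length: int) -> Tuple[float, float]:
    """Return (median insert size, fraction of pairs with insert shorter than the read)."""
    sizes = list(insert_sizes)
    if not sizes:
        raise ValueError("no insert sizes given")
    short = sum(1 for size in sizes if size < read_length)
    return median(sizes), short / len(sizes)


if __name__ == "__main__":
    # Toy numbers for illustration only.
    print(readthrough_summary([120, 150, 159, 200, 300], read_length=151))  # (159, 0.4)
    print(adapter_bases(133, read_length=151))  # 18
```

Whether those read-through bases appear soft-clipped or as mismatches in a viewer such as IGV depends on the aligner, which is the display difference described above.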
