BBMap (aligner for DNA/RNAseq) is now open-source and available for download.

mcmc replied

01-16-2018, 03:58 PM
Originally posted by sdriscoll View Post

mcmc- what are you setting the idfilter value to? Because reads still take a hit to their alignment score for soft-clipping. That is to say a read that's aligned with no mismatches but it was soft-clipped should have a lower score than a read that was mapped perfectly without soft-clipping. This strategy doesn't make sense when you're read is being soft-clipped due to adapter sequence at the end of the read...but for biological purposes it does make sense. Maybe a more appropriate question should be how to get rid of those "couple bases of adapter remaining" after trimming them.

yes I know the scores may be different... but I'm simply wondering whether the idfilter and subfilter post-filtering options work on the soft-clipped alignments or the non-clipped alignments. I want to be able to filter with, say, 2 MM or 98% ID *after* the soft clipping.

Thanks,
MCMC
Leave a comment:
sdriscoll replied

01-11-2018, 03:53 PM
mcmc- what are you setting the idfilter value to? Because reads still take a hit to their alignment score for soft-clipping. That is to say a read that's aligned with no mismatches but it was soft-clipped should have a lower score than a read that was mapped perfectly without soft-clipping. This strategy doesn't make sense when you're read is being soft-clipped due to adapter sequence at the end of the read...but for biological purposes it does make sense. Maybe a more appropriate question should be how to get rid of those "couple bases of adapter remaining" after trimming them.
Leave a comment:
mcmc replied

01-11-2018, 01:48 PM
bbmap local + idfilter

Brian et al., I am using the 'local' option with bbmap because I sometimes have a couple bases of adapter remaining (after bbduk mink=6) and my understanding is that 'local' will trim the ends to achieve a better score.

My question: does the idfilter option work on the local or global alignment? I have reads with 100% ID after trimming ends but they are not passing the idfilter. I'm curious whether there is another way to filter based on the local alignment percent id?

thanks!
MCMC
Leave a comment:
jweger1988 replied

12-05-2017, 07:30 AM
Hi Brian,

I'm using BBtools to call indels from small RNA virus genomes using both bbmap and callvariants. It's doing a great job of calling these. We are interested in the possibility of also calling inversion events from this data. I know this is somewhat common in transcriptomics data. Do these programs have this functionality?

To be more specific, is it possible to identify reads that are aligning to both the sense and antisense of the given reference?

Thanks in advance!

James
Leave a comment:
santiagorevale replied

11-07-2017, 10:06 AM
So, I've spoken with 10X support. For now they are not doing or planning to do any filtering step on the rawdata. However, thanks to my email they've opened a software feature request. Running "cellranger count" without the Index file will just skip the metric on that file, nothing else.

In summary, I'm so sorry for making you waste your time! If appropriate, we could even delete all this posts because we have spoken more about 10X than bbduk.

Thanks GenoMax for your feedback! Cheers!
Leave a comment:
santiagorevale replied

11-06-2017, 04:31 AM
Yes, that's how I'm running mkfastq. The sequencing is currently producing 2x75 bp, but by running mkfastq with the above mention cycle pattern: Read 1 is being trimmed to 26 bp, and for Read 2, we are only getting 75 bp, which is still enough data for doing alignment. Even though this is not the recommended way of sequencing this 10X libraries, 10X support team tell us they had other customers doing it this way with good results.

Regarding the N thing, that's not the case. These high N proportion reads are not being filtered at all by the first step (not by the bcl2fastq nor by the mkfastq wrapper).

I have contacted 10X and they are saying that this reads are not being filtered, but they are not being mapped also thus downstream analysis is not being affected, which is something we already knew.

However, they told me that for v2 chemistry the Index read is not being used any more after the demultiplexing step. I replied them asking how are they calculating "Q30 Bases in Sample Index" metric then without that file. In the meanwhile, I'm running a test so I'll be back later with both the results and the reply. So lets see. If this works, then I won't be needing any change on bbduk. Lets hope for the best!

Cheers!
Leave a comment:
GenoMax replied

11-03-2017, 09:08 AM
So how are you doing the base calling then?

10x recommends that you run 26x8x98 but you are running this as a 75x8x75 run instead? If that is the case my recommendation above would cut the read 1 down to 26 as expected by 10x. The index read is correct length and the last read, which should be 98 bp is only 75 in your case. Is this how you are running cellranger mkfq?

With above base mask you should get reads that don't have any N's. Or so I would think.
Leave a comment:
santiagorevale replied

11-03-2017, 09:02 AM
Oh, sorry if I was not clear enough. Those are the sequencing cycles not how I'm doing the base calling.

I've been working on 10X projects for several months now and evaluating them in different platforms. That's why I also included the additional question above.
Leave a comment:
GenoMax replied

11-03-2017, 08:51 AM
Yikes! You are certainly doing something completely off-label here :-)

In the cellranger mkfastq run, you could use a base-mask such as --use-bases-mask=Y26n*,I8,Y*. That should produce reads in the format 10x wants. If this is a 2D, 4000 run then you may have to do --use-bases-mask=Y26n*,I8,n*,Y*, Your second read is going to be short by about 20 bases. I am not sure if the software will like that.

@Brian would have to comment on your original request. For him, being a programmer, anything would be possible to implement.
Leave a comment:
santiagorevale replied

11-03-2017, 08:35 AM
Regarding the issue that produces the Ns, it's not supported by them, because it occurs when sequencing on a HiSeq 4000 using 75,8,0,75 bp cycle pattern instead of their recommended 26,8,98 bp pattern.

Regarding the N filtering (or any other quality filtering) I agree with you that they should be doing it in advance and incorporate it into their pipeline. I know that so far it's not being done and I have just sent them an email about it.

I'll see what their answer is and be back to you with it. Let's see what their feedback is.

In the meantime, do you think this would be something complicated to implement? If not as a permanent thing on bbduk, could you walk me on how to make it work for me? Because I was unable to match your scripts performance (I tried coding things in perl, python and bash).

Thanks for this quick replies!
Leave a comment:
GenoMax replied

11-03-2017, 08:26 AM
While I have not worked with cellranger (I have only used longranger for WGS data) this sounds like an odd behavior. 10x should be accounting for presence of N's in the reads in their software.

If you have not talked with their software tech support then I would suggest that you give that a try. They may have some specific suggestions or will need to implement a fix in their software. Support has been responsive to me in past.
Leave a comment:
santiagorevale replied

11-03-2017, 08:21 AM
cellranger pipeline is splitted in two steps: "cellranger mkfastq" which produces the three above mentioned files (R1, R2, I1) and "cellranger count" which uses this 3 files as input.

The reason why I need to do the filtering first is that some platforms fail sometimes when calling reads from the TOP surface of the flowcell for Read 2, resulting in reads which will be fully or partly composed by Ns. Then, "cellranger count" will use all of the reads (including those that are 100% Ns) to do their calculations, ending up with aberrant metrics (like "mean reads per cell", which is determined as the raw number of reads divided by the number of called cells).
Leave a comment:
GenoMax replied

11-03-2017, 08:13 AM
If you are using the cellranger pipeline then you should not need to do anything to the data outside the pipeline. Cellranger pipeline takes in raw Illumina flowcell data folder and does the entire analysis (including demultiplexing/alignments/counting etc).

Last edited by GenoMax; 11-03-2017, 08:17 AM.
Leave a comment:
santiagorevale replied

11-03-2017, 08:04 AM
Hi Genomax,

I'm using 10Xs cellranger pipeline.

In this particular configuration, they include both "cell barcode" and "UMI" in Read 1 while the current RNA data is encoded in Read 2. The Index 1 file is used as part of the demultiplexing process because 10X uses 4 Illumina indexes per sample instead of just one.

One additional question: in this scenario, Read 1 is only 26 bp while Read 2 is 75 bp. I want to apply a basic filtering (only maxns) on Read 2, is there a way of doing that? I've tried skipr1=t but that only applies to kmer opperations.

Thanks again.
Leave a comment:
GenoMax replied

11-03-2017, 07:50 AM
Can you give examples of programs you are referring to? I assume the index read here is not the standard Illumina index but some sort of UMI?

BTW: Generation of index read in a separate file is an atypical application which requires a separate option for the bcl2fastq pipeline.

Last edited by GenoMax; 11-03-2017, 07:52 AM.
Leave a comment:

Previous 1 7 8 9 10 11 12 13 20 34 template Next

Current Approaches to Protein Sequencing

by seqadmin

Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
- Channel: Articles
04-04-2024, 04:25 PM
Strategies for Sequencing Challenging Samples

by seqadmin

Despite advancements in sequencing platforms and related sample preparation technologies, certain sample types continue to present significant challenges that can compromise sequencing results. Pedro Echave, Senior Manager of the Global Business Segment at Revvity, explained that the success of a sequencing experiment ultimately depends on the amount and integrity of the nucleic acid template (RNA or DNA) obtained from a sample. “The better the quality of the nucleic acid isolated...
- Channel: Articles
03-22-2024, 06:39 AM

Topics	Statistics	Last Post
Cancer Metastasis: A Deep Dive into Cellular Plasticity by seqadmin Started by seqadmin, 04-11-2024, 12:08 PM	0 responses 30 views 0 likes	Last Post by seqadmin 04-11-2024, 12:08 PM
Proteogenomic Profiles Offer New Clues in Prostate Cancer by seqadmin Started by seqadmin, 04-10-2024, 10:19 PM	0 responses 32 views 0 likes	Last Post by seqadmin 04-10-2024, 10:19 PM
Novel Diagnostic Assay Enhances Ovarian Cancer Detection by seqadmin Started by seqadmin, 04-10-2024, 09:21 AM	0 responses 28 views 0 likes	Last Post by seqadmin 04-10-2024, 09:21 AM
Evolutionary Dynamics of Centromeres: A Comparative Genomic Analysis by seqadmin Started by seqadmin, 04-04-2024, 09:00 AM	0 responses 53 views 0 likes	Last Post by seqadmin 04-04-2024, 09:00 AM

Seqanswers Leaderboard Ad

Announcement

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Leave a comment:

Latest Articles

ad_right_rmr

News