  • Robin
    replied
    Originally posted by Brian Bushnell View Post
    You have a different number of read 1 and read 2. It looks like the pairing got corrupted in preprocessing, which would explain why none are proper pairs. You should reprocess the raw reads using both files at the same time, and only pair-aware tools such as BBDuk, to keep the order intact. Then remap. And if you are interested in indels, I suggest using BBMap for mapping. Alternately, if the pairs are all supposed to overlap, you can get more accurate indel calls by merging them first and mapping the merged reads rather than mapping them as pairs.
    The fastq files have low quality scores and contamination, so I used the fastx-toolkit to trim some of the reads based on their quality scores. As a result, the PE reads no longer match. I will use BBDuk to make the PE reads match again, and then realign them with bwa-mem.

    You suggested merging the two reads into one fastq file and mapping them as single-end reads. I am not sure that freebayes will work with a single-end reads file.

    Thanks

    R


  • Brian Bushnell
    replied
    You have a different number of read 1 and read 2. It looks like the pairing got corrupted in preprocessing, which would explain why none are proper pairs. You should reprocess the raw reads using both files at the same time, and only pair-aware tools such as BBDuk, to keep the order intact. Then remap. And if you are interested in indels, I suggest using BBMap for mapping. Alternately, if the pairs are all supposed to overlap, you can get more accurate indel calls by merging them first and mapping the merged reads rather than mapping them as pairs.
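A quick way to confirm that pairing is broken before remapping is to walk both FASTQ files in step and compare read names. The following is only a minimal sketch, assuming well-formed 4-line FASTQ records and an optional old-style `/1` or `/2` mate suffix; the function names are hypothetical and not part of any tool mentioned in this thread.

```python
from itertools import zip_longest


def read_names(fastq_lines):
    """Yield the read name of each 4-line FASTQ record, without mate suffix."""
    for i, line in enumerate(fastq_lines):
        if i % 4 == 0:  # header line of each record
            name = line.strip().split()[0].lstrip("@")
            # Drop an old-style /1 or /2 mate suffix if present
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_pairing_mismatch(r1_lines, r2_lines):
    """Return the 0-based record index where the two files fall out of sync.

    Returns None when every record pairs up and both files have equal length.
    """
    pairs = zip_longest(read_names(r1_lines), read_names(r2_lines))
    for idx, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return idx
    return None


# Toy data: read "b" was dropped from the read 2 file by a non-pair-aware trimmer
r1 = ["@a/1", "ACGT", "+", "IIII", "@b/1", "ACGT", "+", "IIII", "@c/1", "ACGT", "+", "IIII"]
r2 = ["@a/2", "ACGT", "+", "IIII", "@c/2", "ACGT", "+", "IIII"]
print(first_pairing_mismatch(r1, r2))
```

With real data, the two arguments would be open file handles for the read 1 and read 2 files.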


  • Robin
    replied
    Originally posted by alexholman View Post
    There are two flags that I'm using that are not in your command.
    --no-snps
    suppresses calling SNPs, because for CRISPR analysis you don't really care about them
    and
    --use-duplicate-reads
    I have a feeling that this one is what you need. When you sequence amplicons with next-gen sequencing, most of your reads are going to be duplicates, simply because you are sequencing identical amplicons. You need to keep these in to get proper depth for the analysis. Make sure your alignment pipeline is not removing duplicates, and use the above flag in freebayes to make sure the indel analysis is using them.
    I just tried with the two additional parameters on the command line, and the result is still the same as with my earlier command line. Just to let you know, my paired-end reads (250bp) completely overlap each other along the amplicon sequence region. I aligned them with BWA-MEM against the genomic reference hg38-chr5.fa only.

    Here is the samtools flagstat info:

    -bash-4.1$ samtools flagstat test_clean.sorted.bam
    18255 + 0 in total (QC-passed reads + QC-failed reads)
    0 + 0 duplicates
    18243 + 0 mapped (99.93%:-nan%)
    18255 + 0 paired in sequencing
    9104 + 0 read1
    9151 + 0 read2
    0 + 0 properly paired (0.00%:-nan%)
    18231 + 0 with itself and mate mapped
    12 + 0 singletons (0.07%:-nan%)
    0 + 0 with mate mapped to a different chr
    0 + 0 with mate mapped to a different chr (mapQ>=5)

    You can see there are no properly paired reads in the sample.
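The two warning signs in this flagstat output (unequal read1/read2 counts and zero proper pairs) can be checked automatically. This is a rough sketch that parses the text posted above; the helper names are hypothetical, and the parsing assumes the older flagstat layout shown in this post.

```python
import re

# flagstat output copied from the post above
FLAGSTAT = """\
18255 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
18243 + 0 mapped (99.93%:-nan%)
18255 + 0 paired in sequencing
9104 + 0 read1
9151 + 0 read2
0 + 0 properly paired (0.00%:-nan%)
18231 + 0 with itself and mate mapped
12 + 0 singletons (0.07%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
"""


def parse_flagstat(text):
    """Map each flagstat category to its QC-passed read count."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"\s*(\d+) \+ (\d+) (.+)$", line)
        if m:
            # Strip the trailing "(...)" annotation from the label
            label = m.group(3).split(" (")[0]
            # setdefault keeps the first of two lines that share a label
            counts.setdefault(label, int(m.group(1)))
    return counts


def pairing_problems(counts):
    """Return human-readable warnings about pairing."""
    problems = []
    if counts.get("read1") != counts.get("read2"):
        problems.append(
            f"read1/read2 counts differ: {counts.get('read1')} vs {counts.get('read2')}"
        )
    if counts.get("paired in sequencing", 0) > 0 and counts.get("properly paired", 0) == 0:
        problems.append("no properly paired reads")
    return problems


counts = parse_flagstat(FLAGSTAT)
for problem in pairing_problems(counts):
    print(problem)
```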
    Thanks

    R


  • alexholman
    replied
    There are two flags that I'm using that are not in your command.
    --no-snps
    suppresses calling SNPs, because for CRISPR analysis you don't really care about them
    and
    --use-duplicate-reads
    I have a feeling that this one is what you need. When you sequence amplicons with next-gen sequencing, most of your reads are going to be duplicates, simply because you are sequencing identical amplicons. You need to keep these in to get proper depth for the analysis. Make sure your alignment pipeline is not removing duplicates, and use the above flag in freebayes to make sure the indel analysis is using them.
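For anyone scripting this across many samples, the two flags can be added when assembling the freebayes command. The sketch below only builds and prints the argument list and does not run anything; the wrapper function is hypothetical, and the paths and region are the ones from Robin's original command in this thread.

```python
import shlex


def freebayes_command(reference, bam, region=None, indels_only=True, keep_duplicates=True):
    """Build a freebayes argument list using only flags mentioned in this thread."""
    args = ["freebayes", "-f", reference]
    if region:
        args += ["--region", region]
    if indels_only:
        args.append("--no-snps")  # skip SNP calls, keep indels
    if keep_duplicates:
        args.append("--use-duplicate-reads")  # amplicon reads are mostly duplicates
    args.append(bam)
    return args


cmd = freebayes_command(
    "/home/db/chr5.fa",
    "my_sorted.bam",
    region="chr5:112818960-112819204",
)
# Print the shell-quoted command; redirect stdout to a VCF file when running it
print(shlex.join(cmd) + " > results.vcf")
```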


  • Robin
    replied
    Originally posted by alexholman View Post
    Well, after much frustration, I stumbled onto the package freebayes "Bayesian haplotype-based polymorphism discovery and genotyping"
    GitHub - freebayes/freebayes: Bayesian haplotype-based genetic polymorphism discovery and genotyping.

    The caller seems to work well on amplicon data and ended up being the cleanest and most complete VCF file (with ref and alt allele frequencies).
    I have a CRISPR dataset with PE 250bp reads, and I tried freebayes, but I see very few indels in the VCF result file, even though I can see the indels in IGV when the BAM file is loaded. Could you share some details with me?
    1) I used bwa-mem as the aligner. Which one did you use?
    2) my freebayes command line:
    freebayes -f /home/db/chr5.fa --region chr5:112818960-112819204 my_sorted.bam > results.vcf


  • gerbarinov
    replied
    Originally posted by alexholman View Post
    I am attempting to detect indels from a panel of clones resulting from CRISPR targeted deletion. Regions around the target were PCR amplified to produce a roughly 160bp amplicon, which was then sequenced as a PE150 run.
    I've been banging my head against finding a tool that can:

    1) Detect which clones have indels
    2) Identify the location of these indels, ideally in a VCF or similar file such that the full panel of 96 clones can be visualized (i.e. in IGV).
    3) Provide annotation details about the read depth at the indel position and percentage of the sequences that contain an indel.
    4) Is a stand-alone tool that I can install on my Unix bo
    Thank you, for your very informative topic with a happy ending


  • maxsalm
    replied
    You may also find Scalpel useful (http://scalpel.sourceforge.net/) which uses an assembly step during indel calling (http://www.ncbi.nlm.nih.gov/pubmed/25128977 ) that may help with some of the alignment-derived false negatives.


  • Matt Kearse
    replied
    Originally posted by sarvidsson View Post
    Would it be possible to bulk import thousands of (typically paired) FASTQ files and assign sample IDs to them?
    If, prior to import, you give the FASTQ files names that match their sample ID, then the file name becomes the effective sample ID. Paired files should have a suffix (e.g. 1 or 2), which is stripped from the name when you pair them within Geneious; this can be done in bulk.

    It's probably best you just try it with a sample or two to start with to see if Geneious gives acceptable results on your data.
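Before a bulk import, it can help to verify that every sample really has both mates under the naming scheme described above. This is a sketch only: the `<sample>_1.fastq` / `<sample>_2.fastq` pattern is an assumed convention for illustration, not something Geneious is documented here to require, and `group_pairs` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Assumed naming convention: <sample>_1.fastq / <sample>_2.fastq, optionally .fq or .gz
MATE_SUFFIX = re.compile(r"^(?P<sample>.+)_(?P<mate>[12])\.(?:fastq|fq)(?:\.gz)?$")


def group_pairs(filenames):
    """Return {sample_id: (read1_file, read2_file)} for samples with both mates."""
    mates = defaultdict(dict)
    for name in filenames:
        m = MATE_SUFFIX.match(name)
        if m:
            mates[m.group("sample")][m.group("mate")] = name
    return {
        sample: (files["1"], files["2"])
        for sample, files in mates.items()
        if "1" in files and "2" in files
    }


files = ["cloneA_1.fastq", "cloneA_2.fastq", "cloneB_1.fastq", "notes.txt"]
print(group_pairs(files))
```

Samples that appear in the directory listing but not in the result (here `cloneB`) are missing a mate and are worth checking before import.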


  • sarvidsson
    replied
    Originally posted by Matt Kearse View Post
    Unfortunately no, there isn't a Geneious command line interface. You can align or variant call in bulk by selecting all the data sets and choosing the options once.
    Or if that's not sufficient you can put together workflows with optional custom code fragments. See https://www.youtube.com/watch?v=uvgB2_YBmD4 for a short demo of workflows.
    I'll have a look at the custom workflows. Would it be possible to bulk import thousands of (typically paired) FASTQ files and assign sample IDs to them?

    Originally posted by Matt Kearse View Post
    Also, one limitation of Geneious is that you can't yet export to VCF format so you'll have to settle for CSV export for now.
    VCF would be nice but is not a must. If the CSV format contains enough data, I could generate VCF from it where needed. First I'd like to compare the aligner/caller to our current amplicon re-sequencing pipeline.
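Generating a minimal VCF from a CSV export could look roughly like the sketch below. The CSV column names (`chrom`, `pos`, `ref`, `alt`, `depth`, `freq`) are purely hypothetical placeholders, since the actual Geneious export layout is not described in this thread, and the rows are assumed to already follow the VCF convention of including the anchoring base for indels.

```python
import csv
import io

VCF_HEADER = "\n".join([
    "##fileformat=VCFv4.2",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total read depth">',
    '##INFO=<ID=AF,Number=A,Type=Float,Description="Allele frequency">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
])


def csv_to_vcf(csv_text):
    """Convert CSV rows (hypothetical columns, see lead-in) into VCF text."""
    lines = [VCF_HEADER]
    for row in csv.DictReader(io.StringIO(csv_text)):
        info = f"DP={row['depth']};AF={row['freq']}"
        # ID, QUAL and FILTER are unknown here, so they are written as "."
        fields = [row["chrom"], row["pos"], ".", row["ref"], row["alt"], ".", ".", info]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


# Made-up example row, for illustration only
example_csv = "chrom,pos,ref,alt,depth,freq\nchr5,100,AT,A,200,0.5\n"
vcf_text = csv_to_vcf(example_csv)
print(vcf_text, end="")
```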


  • Matt Kearse
    replied
    Originally posted by sarvidsson View Post
    One of my colleagues has a Geneious license, so I might try it. However, I'm a command line guy - is there a way to automate Geneious for amplicon data? (I typically have thousands of samples with custom inline barcoding schemes)
    Unfortunately no, there isn't a Geneious command line interface. You can align or variant call in bulk by selecting all the data sets and choosing the options once.

    Or if that's not sufficient you can put together workflows with optional custom code fragments. See https://www.youtube.com/watch?v=uvgB2_YBmD4 for a short demo of workflows.

    Also, one limitation of Geneious is that you can't yet export to VCF format so you'll have to settle for CSV export for now.


  • sarvidsson
    replied
    Originally posted by Matt Kearse View Post
    For indels like this, I recommend aligning using either Geneious, or BBMap which both successfully span large indels. Or maybe other aligners have settings to tweak that will improve results around indels.
    One of my colleagues has a Geneious license, so I might try it. However, I'm a command line guy - is there a way to automate Geneious for amplicon data? (I typically have thousands of samples with custom inline barcoding schemes)


  • sarvidsson
    replied
    Originally posted by Matt Kearse View Post
    I also ran his data through FreeBayes, which also found the obvious indels he expected, but it didn't find other 'obvious' indels he wasn't aware of.
    That's my experience with Freebayes as well; I haven't found a good set of parameters to work with the amplicon data I have. I typically use GMAP (when I have Sanger data) and GSNAP (for Illumina data) to align this kind of data, and with my own custom caller on pileups from samtools I am quite happy...
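The custom caller mentioned above is not shown in the thread. As a hypothetical illustration of the simplest possible version of the idea, the sketch below tallies indels directly from SAM POS and CIGAR fields and reports the fraction of reads carrying each one. It uses the total read count as the denominator rather than per-position depth, which is a simplification that is only reasonable when every read covers the whole amplicon.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def indels_in_read(pos, cigar):
    """Return [(ref_pos, op, length)] for each I or D in one alignment.

    pos is the 1-based SAM POS. ref_pos is the first deleted reference base
    for a deletion, or the reference base following an insertion.
    """
    events = []
    ref_pos = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "ID":
            events.append((ref_pos, op, length))
        if op in "MDN=X":  # operations that consume the reference
            ref_pos += length
    return events


def indel_frequencies(alignments):
    """alignments: iterable of (pos, cigar). Returns {event: fraction of reads}."""
    alignments = list(alignments)
    tally = Counter()
    for pos, cigar in alignments:
        tally.update(set(indels_in_read(pos, cigar)))
    total = len(alignments)
    return {event: n / total for event, n in tally.items()} if total else {}


# Toy alignments: two reads with a 10bp deletion, one with a 3bp insertion, one clean
reads = [(100, "50M10D90M"), (100, "50M10D90M"), (100, "70M3I77M"), (100, "150M")]
for event, fraction in sorted(indel_frequencies(reads).items()):
    print(event, fraction)
```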


  • Matt Kearse
    replied
    Alex shared some of his data with me to run through Geneious and we corresponded a bit by email, so I thought I'd share the results with anyone else who's interested.

    Geneious called the indels although it split them into multiple adjacent indels in some situations which isn't ideal. I hope to improve this soon.

    I also ran his data through FreeBayes, which also found the obvious indels he expected, but it didn't find other 'obvious' indels he wasn't aware of.

    The main problem was that the data was poorly aligned. For example, in one sample, one allele had a 29bp deletion and the other allele a 44bp deletion in the same region. The alignment created using BWA mem had failed to span the 44bp deletion, so neither Geneious nor FreeBayes would call this indel from this alignment. I generated a better alignment using Geneious, and then Geneious called both indels, although it split them in a way that made it difficult to infer the two alleles. FreeBayes still failed to identify the two alleles in this case even when provided with an improved alignment.

    For indels like this, I recommend aligning using either Geneious, or BBMap which both successfully span large indels. Or maybe other aligners have settings to tweak that will improve results around indels.

    And for variant calling on this type of data, both Geneious and FreeBayes do OK, although neither works perfectly on the data Alex provided even when I generated a better alignment.
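One way to screen an existing alignment for the kind of failure described above, where the aligner does not span a large deletion, is to look for reads that are heavily clipped but report no deletion. That heavy clipping is a symptom of an unspanned deletion is an assumption made here, not something stated in this thread, and both the helper names and the 20-base threshold are arbitrary choices for illustration.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def clipped_bases(cigar):
    """Total soft- or hard-clipped bases in a CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in "SH")


def longest_deletion(cigar):
    """Length of the longest D operation, or 0 if there is none."""
    return max((int(n) for n, op in CIGAR_OP.findall(cigar) if op == "D"), default=0)


def looks_unspanned(cigar, min_clip=20):
    """True when a read is heavily clipped yet reports no deletion."""
    return clipped_bases(cigar) >= min_clip and longest_deletion(cigar) == 0


# Toy CIGARs: a clipped read, a read spanning a 44bp deletion, and a clean read
for cigar in ["60M90S", "60M44D90M", "150M"]:
    print(cigar, looks_unspanned(cigar))
```

A high proportion of flagged reads in a sample would be a hint to try an aligner that tolerates longer indels before blaming the variant caller.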


  • alexholman
    replied
    sorry, double post
    Last edited by alexholman; 02-24-2015, 07:03 AM. Reason: double post


  • alexholman
    replied
    Well, after much frustration, I stumbled onto the package freebayes "Bayesian haplotype-based polymorphism discovery and genotyping"
    GitHub - freebayes/freebayes: Bayesian haplotype-based genetic polymorphism discovery and genotyping.

    The caller seems to work well on amplicon data and ended up being the cleanest and most complete VCF file (with ref and alt allele frequencies).
