Originally posted by fkrueger
We strongly recommend adapter and quality trimming of sequencing files before the alignments are carried out in the first place, and indeed we run all our samples through Trim Galore to do this (a protocol is available here). If you adhere to this procedure there is no need to filter for good quality basecalls afterwards.
GEEEEEFFCFFEEEEEECEEEGGGGEFFFFDGDGFGGGGDGFGDFGGFDG#CCCCCBBBAA.
There is one base with a low quality score ("#") in this read. If this base aligns uniquely to a cytosine in the reference genome, it may affect the accuracy of the methylation level, so we need to ignore this base.
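To make the concern concrete, here is a minimal sketch of how such positions could be flagged. It assumes the quality string is Phred+33 encoded (which fits a string containing '#'), that the trailing period in the post is punctuation rather than part of the string, and that a cutoff of Phred 20 is appropriate; the function name and the cutoff are illustrative, not part of Bismark.

```python
def low_quality_positions(quality, min_phred=20, offset=33):
    """Return 0-based read positions whose Phred score is below min_phred.

    Assumes an ASCII quality string with the given offset (33 for Phred+33).
    """
    return [i for i, ch in enumerate(quality) if ord(ch) - offset < min_phred]


# Quality string from the post (trailing period treated as punctuation).
qual = "GEEEEEFFCFFEEEEEECEEEGGGGEFFFFDGDGFGGGGDGFGDFGGFDG#CCCCCBBBAA"

flagged = low_quality_positions(qual)
print(flagged)        # read positions one might skip when calling methylation
print(ord("#") - 33)  # Phred score encoded by '#'
```

A methylation call at a flagged position could then be skipped before it is counted, at the cost of one extra lookup per call.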
Comment
-
Originally posted by my_bio
Thank you for your prompt reply. Trim Galore is a powerful tool for quality control, and I have run our data through Trim Galore. However, a read may still contain a few bases with low sequencing quality after quality trimming. For example, the quality string of a read after trimming may look like this:
GEEEEEFFCFFEEEEEECEEEGGGGEFFFFDGDGFGGGGDGFGDFGGFDG#CCCCCBBBAA.
There is one base with a low quality score ("#") in this read. If this base aligns uniquely to a cytosine in the reference genome, it may affect the accuracy of the methylation level, so we need to ignore this base.
After all, it would only matter if the base in question was a cytosine position in the genome (so ~20% of the time). And even then, the call might have been correct (albeit with a poor quality score), or might be one of the other bases that are not involved in methylation calling at all anyway, i.e. A, G or N (N is quite likely if the score was '#'). So overall I agree that this *might* result in an incorrect methylation call very occasionally, however if there were 10 such calls in ~5 billion correct calls (which you may easily get from one lane of HiSeq), I believe this is something one could live with (especially since quality checking would slow down the methylation extraction process quite noticeably...). Don't you think?
Comment
-
Originally posted by fkrueger
Some individual bases with a low base call quality can make it through indeed; this is due to the way the quality trimming is performed by Cutadapt. While I wouldn't have a certain number to go with it I would imagine that the amount of these is tiny (my guess is well below 0.1%). In the example you linked, the '#' is probably the Illumina flag for expressing that the pipeline had trouble determining the base signal, and this does not equal a general poor quality call.
After all, it would only matter if the base in question was a cytosine position in the genome (so ~20% of the time). And even then, the call might have been correct (albeit with a poor quality score), or might be one of the other bases that are not involved in methylation calling at all anyway, i.e. A, G or N (N is quite likely if the score was '#'). So overall I agree that this *might* result in an incorrect methylation call very occasionally, however if there were 10 such calls in ~5 billion correct calls (which you may easily get from one lane of HiSeq), I believe this is something one could live with (especially since quality checking would slow down the methylation extraction process quite noticeably...). Don't you think?
Comment
-
Filtering out poorly converted reads
I'm using the non-CpG context cytosines as a measure of conversion efficiency for my sample, and I'd like to filter out any reads with a particularly low efficiency. This might be a bigger problem for my application (different sequencing platform, longer reads and locus-specific) than for most users doing RRBS, but I imagine this type of filter would be good for any bisulfite mapping application...
Ideally, you would be able to adjust a threshold in the command line and select which context to use as the measure (non-CpG vs. only CHH?).
Does this sound like something that other users would find useful?
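As a rough illustration of what such a filter might look like, the sketch below scores a single read from its methylation call string. It assumes the Bismark-style encoding (z/Z for CpG, x/X for CHG, h/H for CHH, upper case meaning methylated, '.' for positions that are not cytosines). The function names, the 0.2 threshold and the minimum of 3 sites are hypothetical choices for illustration, not existing Bismark options.

```python
def non_cpg_methylation_fraction(calls, contexts="xXhH"):
    """Fraction of methylated calls among the selected non-CpG contexts.

    calls: methylation call string for one read (assumed encoding:
    x/X = CHG, h/H = CHH, upper case = methylated).
    contexts: letters to count; pass "hH" to use CHH only.
    Returns None if the read has no call in those contexts.
    """
    selected = [c for c in calls if c in contexts]
    if not selected:
        return None
    methylated = sum(1 for c in selected if c.isupper())
    return methylated / len(selected)


def passes_conversion_filter(calls, max_fraction=0.2, min_sites=3,
                             contexts="xXhH"):
    """Keep a read unless its non-CpG methylation suggests poor conversion."""
    n_sites = sum(1 for c in calls if c in contexts)
    if n_sites < min_sites:
        return True  # too few informative sites to judge; keep the read
    return non_cpg_methylation_fraction(calls, contexts) <= max_fraction


well_converted = "..h..x..Z..h...h.."
poorly_converted = "..H..X..Z..H...h.."
print(passes_conversion_filter(well_converted))    # non-CpG sites all unmethylated
print(passes_conversion_filter(poorly_converted))  # 3 of 4 non-CpG sites methylated
```

Note that a filter like this would also discard reads from regions with genuine non-CpG methylation, so the threshold and context choice matter for the organism being studied.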
Comment
-
We have just released a new version of Bismark (version 0.7.7), which mainly extends the functionality of the Bismark methylation extractor, as recently discussed here on SeqAnswers. The methylation extractor does now include the functionality of the two additional scripts genome_methylation_bismark2bedGraph as well as genome_wide_cytosine_report; this means that it can, in addition to the standard methylation extractor output, generate sorted bedGraph and/or genome-wide cytosine report output files directly using the options --bedGraph or --cytosine_report, respectively.
Here are all changes in more detail:
Bismark
• When reading in the genome file Bismark does now automatically remove \r line ending characters as well. This sometimes caused problems when genome files had been edited on Windows machines.
• Added support for the Bowtie 2 options '--rdg int1,int2' and '--rfg int1,int2' to adjust the gap open and extension penalties for both read and reference sequence. This might be useful for very special conditions (e.g. PacBio data...)
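For readers wondering why the \r handling matters, here is a small sketch (not Bismark's own code, which is written in Perl) of a FastA reader that strips both line-ending characters. The function name `read_fasta` is made up for this example.

```python
import io


def read_fasta(handle):
    """Parse FastA records from a text handle into {header: sequence}.

    Both newline and stray carriage-return characters are stripped, so a
    file saved with Windows (CRLF) line endings yields the same sequences
    as one saved with Unix line endings.
    """
    parts = {}
    name = None
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {header: "".join(chunks) for header, chunks in parts.items()}


windows_file = io.StringIO(">chr1\r\nACGT\r\nTTGA\r\n")
unix_file = io.StringIO(">chr1\nACGT\nTTGA\n")
print(read_fasta(windows_file) == read_fasta(unix_file))
```

Without the strip, each sequence line would carry a hidden \r character, which shifts every coordinate after the first line and breaks sequence comparisons.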
Bismark methylation extractor
• Renamed methylation_extractor to bismark_methylation_extractor
• Added new function '-o/--output' to specify an output directory. This became necessary for integration into Galaxy
• Added new function '--no_header' to suppress the Bismark version header in the output files if plain alignment data is more desirable
• Added option '--bedGraph' to produce a bedGraph output file once the methylation extraction has finished; this reports the genomic location of a cytosine and its methylation state (in %). By default, only cytosines in CpG context will be sorted/reported
• Implemented option '--cutoff threshold' to set the minimum number of times a methylation state has to be seen for that nucleotide before its methylation percentage is reported
• Implemented option '--counts' which adds two additional columns to the bedGraph output file to enable further calculations:
Column 5: count of methylated calls per position
Column 6: count of unmethylated calls per position
• Implemented option '--CX_context' so that the sorted bedGraph output file contains information on every single cytosine that was covered in the experiment irrespective of its sequence context
• Added option '--cytosine_report' which produces a genome-wide methylation report for all cytosines. By default, the output uses 1-based chromosome coordinates and reports CpG context only. The output considers all Cs on both forward and reverse strands and reports their position, strand, trinucleotide content and methylation state
• Option '--CX_context' applies to the cytosine report as well. The output file will contain information on every single cytosine in the genome irrespective of its context. This applies to both forward and reverse strands
• Implemented option '--zero_based' to use zero-based coordinates as used in e.g. bed files instead of 1-based coordinates
• Implemented option '--genome_folder PATH' to be used to extract sequences from. Accepted formats are FastA files ending with '.fa' or '.fasta'
• Added an option '--split_by_chromosome' which writes the cytosine report output to individual chromosome files instead of to one single very large file
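To illustrate how the '--cutoff', '--counts' and '--zero_based' ideas fit together, here is a minimal sketch that summarises individual methylation calls into bedGraph-like rows. It is not the extractor's implementation: the function name, the input tuple layout, the reading of the cutoff as a minimum total number of calls, and the exact coordinate convention are all assumptions made for this example.

```python
from collections import defaultdict


def methylation_bedgraph(calls, cutoff=1, counts=False, zero_based=True):
    """Summarise methylation calls into bedGraph-like rows.

    calls: iterable of (chrom, pos, is_methylated), with 1-based pos.
    cutoff: minimum number of calls a position needs before it is reported
            (assumed here to mean methylated plus unmethylated calls).
    counts: if True, append methylated and unmethylated counts
            as columns 5 and 6.
    zero_based: if True, emit 0-based half-open intervals (assumed layout).
    """
    tally = defaultdict(lambda: [0, 0])  # (chrom, pos) -> [meth, unmeth]
    for chrom, pos, is_methylated in calls:
        tally[(chrom, pos)][0 if is_methylated else 1] += 1

    rows = []
    # Plain tuple sort: chromosome names are ordered lexicographically.
    for (chrom, pos), (meth, unmeth) in sorted(tally.items()):
        total = meth + unmeth
        if total < cutoff:
            continue
        start, end = (pos - 1, pos) if zero_based else (pos, pos)
        row = [chrom, start, end, round(100.0 * meth / total, 2)]
        if counts:
            row += [meth, unmeth]
        rows.append(tuple(row))
    return rows


example_calls = [
    ("chr1", 100, True), ("chr1", 100, True), ("chr1", 100, False),
    ("chr1", 250, False),
    ("chr2", 5, True), ("chr2", 5, True),
]
for row in methylation_bedgraph(example_calls, cutoff=2, counts=True):
    print(*row, sep="\t")
```

In this example the position chr1:250 is dropped because it was seen only once, which is the kind of low-coverage noise a cutoff is meant to suppress.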
Bismark is available for download at www.bioinformatics.babraham.ac.uk/projects/
Comment
-
Hello everyone,
I'm new here and I have a lot of questions about BS-seq, and more precisely about Bismark.
I see that with paired-end sequencing we can use Bismark to do the mapping, but do I first have to remove adapters and short fragments?
If so, are there programs to do that, or should I just remove them directly from my reads?
Thanks
Comment
-
It is highly recommended to remove adapters and poor quality portions from reads to increase the mapping efficiency and confidence in the methylation data.
A typical workflow would be:
Raw data --> FastQC (quality control) --> Trim Galore (adapter/quality trimming) --> Bismark (alignments) --> deduplication --> downstream analysis of your choice
Here is a guide-document explaining all these steps in more detail.
Best,
Felix
Comment
-
Originally posted by shadow19c
Hello,
thank you very much.
I have a question about deduplication: what does it mean?
Comment
-
I would personally use the defaults to start with (0-500 bp), since the size selection step often does not do quite what you would expect. Only come back and change them if you are trying to track down problems such as low mapping efficiency.
Comment
-
Hello,
thank you for your answer. I ran the mapping with default parameters:
Bismark report for: /data/a2e/kassam/BS-seq-WT/1.fq and /data/a2e/kassam/BS-seq-WT/2.fq (version: v0.7.7)
Bowtie was run against the bisulfite genome of /import/gr_a2e/TAIR9/ with the specified options: -q -n 1 -k 2 --best --maxins 500 --chunkmbs 512
1) Is it normal to get just one SAM file? I only have 1.fq_bismark_pe.sam.
-------------
Sorry, I found the answer: yes, it is.
------------------------------------------------------
2) I have a question concerning the description of the vertical coverage: how can I do that after the mapping and the filtering?
Thanks
Last edited by shadow19c; 10-14-2012, 11:52 PM.
Comment
-
Originally posted by shadow19c
Hello,
2) I have a question concerning the description of the vertical coverage: how can I do that after the mapping and the filtering?
Thanks
Comment