-
Hello,
thank you very much.
I have a question about deduplication: what does it mean?
If I am doing BS-seq in Arabidopsis thaliana, is deduplication needed?
-
It is highly recommended to remove adapters and poor quality portions from reads to increase the mapping efficiency and confidence in the methylation data.
A typical workflow would be:
Raw data --> FastQC (quality control) --> Trim Galore (adapter/quality trimming) --> Bismark (alignments) --> deduplication --> downstream analysis of your choice
Here is a guide-document explaining all these steps in more detail.
Best,
Felix
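As a rough illustration of the deduplication step in the workflow above, the sketch below keeps only the first alignment seen for each combination of chromosome, strand, start and (for paired-end fragments) end position. This is a common way to define duplicates, but the `Alignment` record and the `deduplicate` function are hypothetical names for illustration and are not part of Bismark; the actual deduplication script may differ in detail.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class Alignment(NamedTuple):
    """Hypothetical minimal alignment record (not a Bismark data structure)."""
    read_id: str
    chromosome: str
    strand: str
    start: int
    end: Optional[int] = None  # only set for paired-end fragments


def deduplicate(alignments: Iterable[Alignment]) -> Iterator[Alignment]:
    """Yield the first alignment seen at each position and skip later copies."""
    seen = set()
    for aln in alignments:
        key = (aln.chromosome, aln.strand, aln.start, aln.end)
        if key in seen:
            continue  # same position and strand as an earlier fragment
        seen.add(key)
        yield aln
```

Fragments that map to exactly the same place are treated as likely PCR copies of one original molecule, so counting them only once avoids giving that molecule extra weight in the methylation calls.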
-
Hello everyone,
I'm new here and I have a lot of questions about BS-seq, and more precisely about Bismark.
I see that we can use Bismark to map paired-end reads, but do I first have to remove adapters and short fragments?
If so, are there programs that do this, or should I just remove them directly from my reads?
Thanks
-
We have just released a new version of Bismark (version 0.7.7), which mainly extends the functionality of the Bismark methylation extractor, as recently discussed here on SeqAnswers. The methylation extractor now includes the functionality of the two additional scripts genome_methylation_bismark2bedGraph and genome_wide_cytosine_report. This means that, in addition to the standard methylation extractor output, it can generate sorted bedGraph and/or genome-wide cytosine report output files directly using the options --bedGraph or --cytosine_report, respectively.
Here are all changes in more detail:
Bismark
• When reading in the genome file, Bismark now also automatically removes \r line ending characters. These sometimes caused problems when genome files had been edited on Windows machines.
• Added support for the Bowtie 2 options '--rdg int1,int2' and '--rfg int1,int2' to adjust the gap open and extension penalties for both read and reference sequence. This might be useful for very special conditions (e.g. PacBio data...)
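To illustrate the line-ending fix described above, here is a minimal sketch of a FastA reader that tolerates Windows-style "\r\n" line endings. It is not Bismark's own code (Bismark is written in Perl); the function name `read_fasta` and the choice to use the first word of the header as the sequence name are assumptions made for this example.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FastA lines into {name: sequence}, dropping \\r and \\n endings."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for raw in lines:
        line = raw.rstrip("\r\n")  # handles both "\n" and "\r\n"
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""  # assumption: first word is the name
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line.upper())
    return {n: "".join(parts) for n, parts in chunks.items()}
```

Without the `rstrip`, a stray "\r" would end up inside the sequence (or the chromosome name), which is the kind of problem the release note refers to.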
Bismark methylation extractor
• Renamed methylation_extractor to bismark_methylation_extractor
• Added new function '-o/--output' to specify an output directory. This became necessary for integration into Galaxy
• Added new function '--no_header' to suppress the Bismark version header in the output files if plain alignment data is more desirable
• Added option '--bedGraph' to produce a bedGraph output file once the methylation extraction has finished; this reports the genomic location of a cytosine and its methylation state (in %). By default, only cytosines in CpG context will be sorted/reported
• Implemented option '--cutoff threshold' to set the minimum number of times a methylation state has to be seen for that nucleotide before its methylation percentage is reported
• Implemented option '--counts' which adds two additional columns to the bedGraph output file to enable further calculations:
Column 5: count of methylated calls per position
Column 6: count of unmethylated calls per position
• Implemented option '--CX_context' so that the sorted bedGraph output file contains information on every single cytosine that was covered in the experiment irrespective of its sequence context
• Added option '--cytosine_report' which produces a genome-wide methylation report for all cytosines. By default, the output uses 1-based chromosome coordinates and reports CpG context only. The output considers all Cs on both forward and reverse strands and reports their position, strand, trinucleotide context and methylation state
• Option '--CX_context' applies to the cytosine report as well. The output file will contain information on every single cytosine in the genome irrespective of its context. This applies to both forward and reverse strands
• Implemented option '--zero_based' to use zero-based coordinates, as used in e.g. BED files, instead of 1-based coordinates
• Implemented option '--genome_folder PATH' to specify the folder from which genome sequences are extracted. Accepted formats are FastA files ending with '.fa' or '.fasta'
• Added an option '--split_by_chromosome' which writes the cytosine report output to individual chromosome files instead of to one single very large file
Bismark is available for download at www.bioinformatics.babraham.ac.uk/projects/
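To make the '--bedGraph', '--cutoff', '--counts' and '--zero_based' options a little more concrete, here is a sketch of the arithmetic behind them: calls are tallied per position, positions seen fewer times than the cutoff are skipped, and the methylation percentage is methylated / (methylated + unmethylated) x 100. The function name, the input layout, the reading of the cutoff as total coverage, and the way coordinates are written are all assumptions for illustration; the real extractor output may differ.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

# Hypothetical input: (chromosome, 1-based position, call is methylated?)
Call = Tuple[str, int, bool]


def summarise_calls(calls: Iterable[Call], cutoff: int = 1,
                    counts: bool = False, zero_based: bool = False) -> List[tuple]:
    """Return sorted rows of (chromosome, start, end, percent[, meth, unmeth])."""
    tally = defaultdict(lambda: [0, 0])  # (chrom, pos) -> [methylated, unmethylated]
    for chrom, pos, methylated in calls:
        tally[(chrom, pos)][0 if methylated else 1] += 1

    rows = []
    for (chrom, pos), (meth, unmeth) in sorted(tally.items()):
        total = meth + unmeth
        if total < cutoff:
            continue  # position not seen often enough to report
        percent = 100.0 * meth / total
        # Assumption: zero-based output uses a half-open [pos - 1, pos) interval.
        start, end = (pos - 1, pos) if zero_based else (pos, pos)
        row = (chrom, start, end, percent)
        if counts:
            row += (meth, unmeth)  # columns 5 and 6
        rows.append(row)
    return rows
```

Keeping the two counts alongside the percentage is what allows later steps to pool positions or samples correctly, since percentages alone cannot be averaged without knowing the coverage behind them.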
-
Filtering out poorly converted reads
I'm using the non-CpG context cytosines as a measure of conversion efficiency for my sample, and I'd like to filter out any reads with a particularly low efficiency. This might be a bigger problem for my application (different sequencing platform, longer reads and locus specific) than most users doing RRBS, but I imagine this type of filter would be good for any bisulfite mapping application...
Ideally, you would be able to adjust a threshold in the command line and select which context to use as the measure (non-CpG vs. only CHH?).
Does this sound like something that other users would find useful?
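A minimal sketch of such a filter is shown below. It assumes each read comes with a Bismark-style methylation call string in which Z/z mark CpG, X/x mark CHG and H/h mark CHH positions, with upper case meaning methylated (that is, unconverted). The function names, the threshold default and the `context` keys are hypothetical; this is not an existing Bismark option.

```python
from typing import Optional

# Lower-case call letters counted for each selectable context (assumed encoding).
CONTEXT_LETTERS = {"non_cpg": "xh", "chh": "h"}


def unconverted_fraction(call_string: str, context: str = "non_cpg") -> Optional[float]:
    """Fraction of calls in the chosen context that look unconverted (methylated)."""
    letters = CONTEXT_LETTERS[context]
    methylated = sum(call_string.count(c.upper()) for c in letters)
    unmethylated = sum(call_string.count(c) for c in letters)
    total = methylated + unmethylated
    return methylated / total if total else None  # None: no informative positions


def passes_conversion_filter(call_string: str, max_unconverted: float = 0.1,
                             context: str = "non_cpg") -> bool:
    """Keep a read unless too many of its non-CpG cytosines are unconverted."""
    fraction = unconverted_fraction(call_string, context)
    return fraction is None or fraction <= max_unconverted
```

Note that this treats every methylated non-CpG call as a conversion failure, which is the premise of the post above; it would discard genuine non-CpG methylation in samples where that exists.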
-
Originally posted by fkrueger
Some individual bases with a low base call quality can indeed make it through; this is due to the way Cutadapt performs the quality trimming. While I don't have an exact number, I would imagine that the amount of these is tiny (my guess is well below 0.1%). In the example you linked, the '#' is probably the Illumina flag for expressing that the pipeline had trouble determining the base signal, and this does not equal a generally poor quality call.
After all, it would only matter if the base in question was a cytosine position in the genome (so ~20% of the time). And even then, the call might have been correct (albeit with a poor quality score), or might be one of the other bases that are not involved in methylation calling at all anyway, i.e. A, G or N (N is quite likely if the score was '#'). So overall I agree that this *might* result in an incorrect methylation call very occasionally, however if there were 10 such calls in ~5 billion correct calls (which you may easily get from one lane of HiSeq), I believe this is something one could live with (especially since quality checking would slow down the methylation extraction process quite noticeably...). Don't you think?
-
Originally posted by my_bio
Thank you for your prompt reply. Trim Galore is a powerful tool for quality control, and I have run our data through it. However, a read may still contain a few lower-quality bases after quality trimming. For example, the quality string of a read after trimming may look like this:
GEEEEEFFCFFEEEEEECEEEGGGGEFFFFDGDGFGGGGDGFGDFGGFDG#CCCCCBBBAA.
One base in this read has a lower quality ("#"). If this base is uniquely aligned to a cytosine in the reference genome, it may affect the accuracy of the methylation level, so we need to ignore this base.
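For reference, the quality string in this example can be decoded as sketched below. The sketch assumes Phred+33 encoding (the thread does not state the encoding) and treats the final "." in the post as sentence punctuation, not as part of the string; the function names are hypothetical.

```python
from typing import List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Convert an ASCII quality string to Phred scores (assumed offset: 33)."""
    return [ord(ch) - offset for ch in quality]


def low_quality_positions(quality: str, min_quality: int = 20,
                          offset: int = 33) -> List[int]:
    """Return 0-based indices of bases whose score is below min_quality."""
    scores = phred_scores(quality, offset)
    return [i for i, score in enumerate(scores) if score < min_quality]


quality = "GEEEEEFFCFFEEEEEECEEEGGGGEFFFFDGDGFGGGGDGFGDFGGFDG#CCCCCBBBAA"
flagged = low_quality_positions(quality)
```

Under this assumption '#' decodes to a score of 2 and is the only base in the read below 20. With a Phred+64 offset the same characters would decode very differently, so the encoding needs to be confirmed before applying any threshold.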
-
Originally posted by fkrueger
We strongly recommend adapter and quality trimming of sequencing files before the alignments are carried out in the first place, and indeed we run all our samples through Trim Galore to do this (a protocol is available here). If you adhere to this procedure there is no need to filter for good quality basecalls afterwards.
-
Originally posted by my_bio
To calculate the methylation level of a cytosine accurately, it is necessary to add another option that filters on sequencing quality. That is to say, if a base's sequencing quality is lower than 20, the methylation extractor would ignore it.
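The requested behaviour could be sketched as follows. This is only an illustration of the idea and not an existing option of the methylation extractor: it assumes a per-read methylation call string of the same length as the quality string, Phred+33 encoding, and "." as the symbol for "no call". The function name is hypothetical.

```python
def mask_low_quality_calls(call_string: str, quality: str,
                           min_quality: int = 20, offset: int = 33) -> str:
    """Replace methylation calls at low-quality bases with '.' (no call)."""
    if len(call_string) != len(quality):
        raise ValueError("call string and quality string differ in length")
    return "".join(
        "." if ord(q) - offset < min_quality else call
        for call, q in zip(call_string, quality)
    )
```

Masking the call instead of discarding the whole read keeps the remaining, well-supported cytosines of that read available for the methylation calculation.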
-
If a new version of the methylation extractor is released, please let us know. Thanks.
-
Originally posted by my_bio
It seems to work all right now, and I strongly suggest that you add these functions to the methylation extractor. By the way, in my opinion, the output should be split into separate files for each chromosome, so that we can process chromosomes in parallel in subsequent analysis.
Last edited by fkrueger; 09-21-2012, 08:14 AM.
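Until such an option exists, the split can be done after the fact. The sketch below groups report lines by their first column so that each group can be written to its own file or handed to a separate worker. It assumes whitespace-delimited lines with the chromosome name in the first field; the function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_by_chromosome(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group report lines by the chromosome name in their first column."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        chromosome = stripped.split(None, 1)[0]
        groups[chromosome].append(stripped)
    return dict(groups)
```

This version holds the whole report in memory, which is fine for a sketch; for a very large genome-wide report it would be preferable to stream lines to one open file per chromosome.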
-
Originally posted by fkrueger
Hi my_bio,
I have now changed the output to be in the following format:
<chromosome> <position> <strand> <count methylated> <count non-methylated> <C context> <trinucleotide context>
I also fixed the compile errors; strangely enough, it ran without any warnings on our system... I hope it'll work nicely now.
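A line in the seven-column format described above could be parsed along these lines. The sketch assumes the fields are separated by whitespace (the post does not state the delimiter); `CytosineRecord` and `parse_report_line` are hypothetical names.

```python
from typing import NamedTuple, Optional


class CytosineRecord(NamedTuple):
    chromosome: str
    position: int
    strand: str
    methylated: int
    unmethylated: int
    context: str
    trinucleotide: str

    @property
    def percent_methylation(self) -> Optional[float]:
        """Methylation level in percent, or None if the position has no calls."""
        total = self.methylated + self.unmethylated
        return 100.0 * self.methylated / total if total else None


def parse_report_line(line: str) -> CytosineRecord:
    """Parse one whitespace-delimited line of the seven-column report."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, found {len(fields)}")
    chrom, pos, strand, meth, unmeth, context, tri = fields
    return CytosineRecord(chrom, int(pos), strand, int(meth), int(unmeth), context, tri)
```

Reporting both counts, not just a percentage, lets positions without any coverage be told apart from positions that are covered but fully unmethylated.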