We've just released a new version of Bismark (v0.14.1), which mainly addresses a few bugs/glitches:
Bismark: Fixed the cleaning up stage in a --multicore run when --gzip had been specified as well
Bismark: Fixed the handling of files in a --multicore run when the input files had been specified including file path information
Bismark: Please note that the option -B/--basename in conjunction with --multicore is currently not supported (as in: disabled), but we are aiming to address this soon
Methylation Extractor: Fixed a bug with the position adjustment of paired-end reads when the reads should have been trimmed from their 3' ends (option --ignore_3prime)
deduplicate_bismark: Now also removing newline characters from the read conversion tag in case other programs interfered with the tag ordering and put this tag into the very last column
Download is available from the Bismark project page: http://www.bioinformatics.babraham.a...jects/bismark/
-
Hello again,
So, I decided to try a single-end dataset because I think paired-end takes too much time. With the current dataset, I've successfully used Bismark for the alignment, with a mapping efficiency of 66%. But I still have a problem. I read the paper published together with the dataset; it states that they obtained a mapping efficiency of 72% and a higher percentage of CpG methylation. So, is it possible to increase the mapping efficiency by changing some parameters in Bismark or Trim Galore? In the paper, they didn't use Bismark or Trim Galore. They developed their own code for the trimming and methylation calling and used Bowtie as the aligner. Thank you.
-
Fastq-dump is painfully slow. Sometimes you can get lucky and either ENA or DDBJ (conveniently located in Japan) has the files in fastq format. That speeds things up quite a bit, but unfortunately these other sources sometimes only have the SRA file (e.g., for this experiment). Whenever you're dealing with large files, this can start taking a while. This is why many of us push jobs onto clusters where we use multiple machines at once (e.g., I used 13 nodes on ours to perform the alignment of this dataset in 30-40 minutes, which would have been less had someone else not decided to start 1000 jobs at the same time...).
-
OK, thank you for your guidance. I will check the first million sequences to see which arguments are best suited for the analysis, so that I don't need to wait for hours to find out whether I made a mistake.
By the way, does analysis with Bismark really take this long? My computer has an i7 (8 cores) and 28 GB of RAM. The whole process (fastq-dump, Trim Galore and Bismark) seems to take too long: around 7 hours, or maybe more, to complete.
Last edited by barbarian; 03-18-2015, 07:21 PM.
-
For the sake of comparison, below are some metrics that I get using local alignment on that dataset. You won't get identical metrics (we're using different tools and trimming differently), but they won't be that terribly different (except CpG methylation). So >75% alignment is definitely possible with this dataset (at least after removing very low quality reads...of which there are many).
Code:
Alignment:
    76660262 total reads analysed
    59596692 paired-end reads mapped ( 77.74%).
    27577367 concordant pairs
    1545115 discordant pairs
    1351728 reads aligned as singletons

Number of hits aligning to each of the orientations:
    11086480    14.46%    OT (original top strand)
    10666480    13.91%    OB (original bottom strand)
    19288820    25.16%    CTOT (complementary to the original top strand)
    18554912    24.20%    CTOB (complementary to the original bottom strand)

Cytosine Methylation (N.B., statistics from overlapping mates are added together!):
    Number of C's in a CpG context: 188158298
    Percentage of methylated C's in a CpG context: 44.23%
    Number of C's in a CHG context: 173622589
    Percentage of methylated C's in a CHG context: 2.27%
    Number of C's in a CHH context: 348611484
    Percentage of methylated C's in a CHH context: 7.04%
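The strand counts in the report above can be reduced to a single number: the share of mapped hits that fall on the two complementary strands (CTOT and CTOB). The following is only a rough sketch using awk; the function name `complementary_share` is made up for illustration, and what counts as a "considerable" share is a judgment call rather than a fixed threshold.

```shell
# Hypothetical helper: percentage of mapped hits on the complementary
# strands (CTOT + CTOB), given the four per-strand hit counts.
# Arguments: OT OB CTOT CTOB
complementary_share() {
    awk -v ot="$1" -v ob="$2" -v ctot="$3" -v ctob="$4" \
        'BEGIN { printf "%.1f\n", 100 * (ctot + ctob) / (ot + ob + ctot + ctob) }'
}

# Counts taken from the report above
complementary_share 11086480 10666480 19288820 18554912   # prints 63.5
```

For a directional library one would expect this share to be close to zero, so a value well above half points to a non-directional library.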
-
It seems I made a mistake when running Trim Galore. Thank you so much. I will add the non-directional parameter tonight and look at the result again tomorrow.
-
The simplest method is to just take a million or so reads and align them in a non-directional manner. If there's considerable alignment to the CTOB and CTOT strands, then it's non-directional.
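As a minimal sketch of the subsetting step: a FASTQ record is four lines, so the first million reads of an uncompressed file are its first four million lines. The function name `first_n_reads` and the file names in the comments are placeholders, not anything from this thread.

```shell
# Hypothetical helper: print the first N reads of an uncompressed FASTQ file.
# Each read occupies exactly 4 lines (header, sequence, '+', qualities).
# Arguments: N fastq_file
first_n_reads() {
    head -n "$(( $1 * 4 ))" "$2"
}

# Example usage (placeholder file names; <ref> is the Bismark genome folder):
#   first_n_reads 1000000 sample_trimmed.fq > subset.fq
#   bismark --bowtie2 --non_directional <ref> subset.fq
```

The strand breakdown in the resulting Bismark report then shows whether CTOT and CTOB receive a substantial number of hits.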
-
I'll also note that you can probably get >75% alignment rate with this dataset, at least I did with a subset of it and using local alignment. This would probably be 80-85% if I included all of the multimappers in the metrics.
-
Thank you for the info. I always thought it was directional. It seems I need to change the command again. Can you tell me how to check whether a dataset is directional or not?
-
FYI, this is a non-directional dataset, so make sure to use the appropriate options.
-
Thank you for your reply. I only realized that this afternoon. Now I'm waiting for the result. I may have another question tomorrow, because the run usually doesn't finish the same day. Good luck with your meeting.
-
For paired-end files you need to run Trim Galore in paired-end mode like this:
trim_galore --rrbs --paired <fastq1> <fastq2>
If you run it twice in single-end mode, it will break the sequence-by-sequence order of the files, which then results in very low mapping efficiency.
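One rough way to see whether two trimmed files are still in step is to compare the read IDs record by record. This is only a sketch under a few assumptions: both files are uncompressed FASTQ, the read ID is the first whitespace-separated field of each header line, and an optional trailing /1 or /2 can be ignored. The function name `count_mismatched_pairs` is hypothetical.

```shell
# Hypothetical helper: count positions at which the read IDs of two
# uncompressed FASTQ files differ. Reads that exist only at the end of
# the first file are not counted. All IDs of the first file are held in
# memory, so try this on a subset of a large file first.
# Arguments: fastq1 fastq2
count_mismatched_pairs() {
    awk '
        FNR % 4 != 1 { next }                 # keep header lines only
        { id = $1; sub(/\/[12]$/, "", id) }   # drop a trailing /1 or /2
        NR == FNR { ids[FNR] = id; next }     # first file: remember the ID
        { if (ids[FNR] != id) n++ }           # second file: compare
        END { print n + 0 }
    ' "$1" "$2"
}

# Example usage (placeholder file names):
#   count_mismatched_pairs sample_1_trimmed.fq sample_2_trimmed.fq
```

A result of 0 means every compared pair of records has matching IDs; anything else suggests the files are out of step.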
I am in a meeting all day but can take a look myself at the file in question tonight or tomorrow.
-
OK, it's strange. I tried with another sample. The mapping efficiency with both files is 0.1%, and with only one file it's 13.5%. Before this step, I ran:
fastq-dump --split-files <sra file>
trim_galore --rrbs <fastq1>
trim_galore --rrbs <fastq2>
For both files:
bismark --bowtie2 <ref> -1 <fastq1> -2 <fastq2>
For 1 file:
bismark --bowtie2 <ref> <fastq1>
As for the reference, I'm sure that I built it with Bowtie 2, and I have checked it with the Bismark sample data, where the result is similar to the documentation. I'm now trying the next sample to see whether the problem is the sample or my commands. Any suggestions? By the way, I downloaded the sample from NCBI. Here is the link: http://www.ncbi.nlm.nih.gov/geo/quer...i?acc=GSE61150
The sample that I checked is the first sample. Here : http://www.ncbi.nlm.nih.gov/geo/quer...acc=GSM1498453
Thank you for your help.
Additional:
I checked again using FastQC after trimming. For both FASTQ files the result is about 50-50; not every module passes. The problem modules are per tile sequence quality, per base sequence content, sequence duplication levels and Kmer content.
Last edited by barbarian; 03-17-2015, 06:06 PM.
-
Thanks Devon for jumping in. Here is a protocol that is worth reading in order to achieve good mapping results in most cases: http://www.epigenesys.eu/en/protcols...q-data-prot-57
-
OK, I will try now. I may have another question tomorrow once the result is out.