I didn't, because the manual says --gzip is for zipping the temp files, not for unzipping the input fastq.gz files. And without --gzip, the run was successful until the end.
-
Originally posted by chxu02:
Hi Felix,
I'm reporting a bug in v0.14.0. When I ran bismark --multicore on fastq files in gz format, Bismark failed at the end to assemble the separate files into one. The files were initially named *.fastq.gz_*, but at the end of the run Bismark tried to assemble files named *.fastq_*, which obviously failed. Hope it helps.
Youyou
Attached is the latest development version of Bismark which should also understand the option --gzip.
-
Ah good, that explains it. As I said a few posts before, --gzip was a corner case that wasn't handled properly, so it was not intended that the merging went wrong... If you use the development version I attached in the last post, --gzip should be working now.
-
Hello Felix,
Thank you for your response. I have installed samtools, but I found another problem. From Biostar, I learned that I should generate two separate fastq files for paired-end reads, so I used fastq-dump --split-files to extract them from my SRA file. I got two fastq files and that part seems fine (before, I dumped everything into one file and Bismark reported a duplicate ID error). The only problem is that I got just one BAM file from Bismark. Its name is that of the first fastq file with a .bam extension, so I expected a second BAM file named after the second fastq file, but there is only the one. Is this normal or wrong? I used bismark <options> -1 first_1.fq -2 second_2.fq, and the result is only first_1.bam.
This is my final alignment report:
Sequence pairs analysed in total: 29829521
Number of paired-end alignments with a unique best hit: 4156425
Mapping efficiency: 13.9%
Sequence pairs with no alignments under any condition: 23649277
Sequence pairs did not map uniquely: 2023819
Sequence pairs which were discarded because genomic sequence could not be extracted: 0
Mapping efficiency is really low. What do you think caused it?
For Bismark example data, I got this result:
Final Alignment report
======================
Sequences analysed in total: 10000
Number of alignments with a unique best hit from the different alignments: 4732
Mapping efficiency: 47.3%
Sequences with no alignments under any condition: 4279
Sequences did not map uniquely: 989
Sequences which were discarded because genomic sequence could not be extracted: 0
So I think my human genome reference is not bad.
Last edited by barbarian; 03-16-2015, 05:44 PM.
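The figures in both reports can be cross-checked with a little arithmetic. The sketch below assumes that mapping efficiency is the number of unique best hits divided by the total number of sequences (or pairs) analysed, which matches the percentages quoted above; the function and variable names are illustrative only.

```python
def mapping_efficiency(unique_hits, total):
    """Return unique best hits as a percentage of all analysed sequences."""
    return 100.0 * unique_hits / total


# Paired-end run from the report above.
total_pairs = 29829521
unique_pairs = 4156425
no_alignment = 23649277
not_unique = 2023819

# The three outcome categories should account for every analysed pair.
accounted_for = unique_pairs + no_alignment + not_unique

paired_efficiency = round(mapping_efficiency(unique_pairs, total_pairs), 1)
example_efficiency = round(mapping_efficiency(4732, 10000), 1)
```

Here `accounted_for` equals `total_pairs`, so no pairs are missing from the report, and the two rounded values reproduce the 13.9% and 47.3% shown in the reports.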
-
I just replied to you on Biostars, but producing 1 BAM file from paired-end reads is the appropriate result. The reads from each file are indicated appropriately in the BAM format.
The low mapping efficiency is a different question then. There are a number of likely causes of that, the most common being fastq files that are out of sync. Try mapping fastq_1.fq by itself and see if the mapping efficiency jumps up.
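One way to test the out-of-sync hypothesis before remapping is to compare the read IDs of the two files record by record. This is a minimal sketch, assuming standard four-line FASTQ records and that both mates share the same ID up to the first whitespace (optionally followed by /1 or /2); the function names are made up for illustration and are not part of Bismark.

```python
import gzip
from itertools import zip_longest


def read_ids(path):
    """Yield the read ID of each four-line FASTQ record in `path`."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0:  # header line of each record
                parts = line[1:].split()  # drop the leading "@"
                read_id = parts[0] if parts else ""
                # Drop a trailing /1 or /2 mate suffix if present.
                if read_id.endswith(("/1", "/2")):
                    read_id = read_id[:-2]
                yield read_id


def first_mismatch(fastq_1, fastq_2):
    """Return (record_index, id_1, id_2) for the first out-of-sync pair, or None."""
    pairs = zip_longest(read_ids(fastq_1), read_ids(fastq_2))
    for index, (id_1, id_2) in enumerate(pairs):
        if id_1 != id_2:
            return index, id_1, id_2
    return None
```

Calling `first_mismatch("first_1.fq", "second_2.fq")` returns `None` when every record lines up. Otherwise it reports the zero-based index of the first record where the IDs differ, with `None` in place of an ID if one file ends early.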
-
Thanks Devon for jumping in. Here is a protocol that is worth reading in order to achieve good mapping results in most cases: http://www.epigenesys.eu/en/protcols...q-data-prot-57
-
Ok, it's strange. I tried with another sample data set. The mapping efficiency with both files is 0.1%, and with only one file it's 13.5%. Before this step, I ran:
fastq-dump --split-files <sra file>
trim_galore --rrbs <fastq1>
trim_galore --rrbs <fastq2>
For both files:
bismark --bowtie2 <ref> -1 <fastq1> -2 <fastq2>
For 1 file:
bismark --bowtie2 <ref> <fastq1>
As for the reference, I'm sure I built it with bowtie2; I checked it with the Bismark sample data and the result is similar to the documentation. I'm going to try the next sample to see whether the fault is in the sample or in my commands. Any suggestions? By the way, I downloaded the sample from NCBI. Here is the link: http://www.ncbi.nlm.nih.gov/geo/quer...i?acc=GSE61150
The sample that I checked is the first sample, here: http://www.ncbi.nlm.nih.gov/geo/quer...acc=GSM1498453
Thank you for your help.
Additional:
I checked the files again with FastQC after trimming. For both fastq files the result is about 50-50, not all good. The failing modules are per tile sequence quality, per base sequence content, sequence duplication levels, and Kmer content.
Last edited by barbarian; 03-17-2015, 06:06 PM.
-
For paired-end files you need to run Trim Galore in paired-end mode like this:
trim_galore --rrbs --paired <fastq1> <fastq2>
If you run it twice in single-end mode, it will break the sequence-by-sequence order of the files, which then results in very low mapping efficiency.
I am in a meeting all day but can take a look myself at the file in question tonight or tomorrow.
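To see why two single-end runs break the pairing, consider a toy model in which a trimmer discards any read that ends up shorter than a cutoff. This is only an illustration under that assumption, not Trim Galore's actual code; `MIN_LENGTH` and both function names are hypothetical.

```python
MIN_LENGTH = 20  # hypothetical length cutoff for this toy model


def trim_single_end(reads):
    """Filter one file on its own: drop every read shorter than the cutoff."""
    return [(name, seq) for name, seq in reads if len(seq) >= MIN_LENGTH]


def trim_paired_end(reads_1, reads_2):
    """Filter both files together: keep a pair only if both mates pass."""
    kept_1, kept_2 = [], []
    for mate_1, mate_2 in zip(reads_1, reads_2):
        if len(mate_1[1]) >= MIN_LENGTH and len(mate_2[1]) >= MIN_LENGTH:
            kept_1.append(mate_1)
            kept_2.append(mate_2)
    return kept_1, kept_2


reads_1 = [("pair1", "A" * 30), ("pair2", "A" * 5), ("pair3", "A" * 30)]
reads_2 = [("pair1", "C" * 30), ("pair2", "C" * 30), ("pair3", "C" * 30)]

# Independent runs: pair2 vanishes from file 1 only, so the second record of
# file 1 (pair3) now sits opposite the second record of file 2 (pair2).
single_1 = [name for name, _ in trim_single_end(reads_1)]
single_2 = [name for name, _ in trim_single_end(reads_2)]

# Paired run: pair2 is removed from both files, so the order stays in sync.
paired_1, paired_2 = trim_paired_end(reads_1, reads_2)
```

After the independent runs the two files no longer have the same number of records, and every record after the first dropped read is matched with the wrong mate. The paired run keeps both files the same length and in the same order.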