I didn't, because the manual says --gzip is for compressing the temporary files, not for decompressing the input fastq.gz files. Without --gzip, the run was successful right up to the end.
-
Originally posted by chxu02:
Hi Felix,
I'm reporting a bug in v0.14.0. When I ran bismark --multicore on fastq files in gz format, bismark failed at the end to merge all the separate files into one. The files were initially named *.fastq.gz_*, but at the end of the run bismark tried to merge files named *.fastq_*, which obviously failed. Hope it helps.
Youyou
Attached is the latest development version of Bismark, which should also understand the option --gzip.
-
Ah good, that explains it. As I said a few posts before, --gzip was a corner case that wasn't handled properly, so it was not intended that the merging went wrong... If you use the development version I attached in the last post, --gzip should be working now.
-
Hello Felix,
Thank you for your response. I have installed samtools, but I found another problem. On Biostars, I learned that I should generate two separate fastq files for paired-end reads, so I used fastq-dump --split-files to extract them from my SRA file. I got two fastq files and have had no problems with that so far (before, I dumped everything into one file and bismark reported a duplicate ID error). The only problem is that I got just one BAM file from Bismark. Its name is that of the first fastq file with a bam extension, so I assumed there should also be a BAM file named after the second file, but there is only the one for the first fastq. Is this normal or wrong? I ran bismark <options> -1 first_1.fq -2 second_2.fq. The result is only first_1.bam.
This is my final alignment report:
Sequence pairs analysed in total: 29829521
Number of paired-end alignments with a unique best hit: 4156425
Mapping efficiency: 13.9%
Sequence pairs with no alignments under any condition: 23649277
Sequence pairs did not map uniquely: 2023819
Sequence pairs which were discarded because genomic sequence could not be extracted: 0
The mapping efficiency is really low. What do you think caused it?
For Bismark example data, I got this result:
Final Alignment report
======================
Sequences analysed in total: 10000
Number of alignments with a unique best hit from the different alignments: 4732
Mapping efficiency: 47.3%
Sequences with no alignments under any condition: 4279
Sequences did not map uniquely: 989
Sequences which were discarded because genomic sequence could not be extracted: 0
So I think my human genome reference is not bad.
Last edited by barbarian; 03-16-2015, 05:44 PM.
-
I just replied to you on Biostars, but producing 1 BAM file from paired-end reads is the appropriate result. The reads from each file are indicated appropriately in the BAM format.
The low mapping efficiency is a different question then. There are a number of likely causes of that, the most common being fastq files that are out of sync. Try mapping fastq_1.fq by itself and see if the mapping efficiency jumps up.
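One quick way to test the out-of-sync hypothesis before re-mapping is to compare the read IDs of the two files record by record. The following is only a minimal sketch: the helper names fastq_ids and check_sync are made up for illustration, and it assumes uncompressed four-line FASTQ records whose ID is the first whitespace-delimited token of the header, optionally followed by /1 or /2.

```shell
# Hypothetical helper: print the read ID of every FASTQ record.
# Assumes 4 lines per record; the ID is the first token of the header line,
# with a trailing /1 or /2 stripped so that mates compare equal.
fastq_ids() {
    awk 'NR % 4 == 1 { id = $1; sub(/\/[12]$/, "", id); print id }' "$1"
}

# Hypothetical helper: report whether two FASTQ files list the same reads
# in the same order.
check_sync() {
    ids1=$(mktemp)
    ids2=$(mktemp)
    fastq_ids "$1" > "$ids1"
    fastq_ids "$2" > "$ids2"
    if cmp -s "$ids1" "$ids2"; then
        echo "in sync"
    else
        echo "out of sync"
    fi
    rm -f "$ids1" "$ids2"
}
```

It would be called as check_sync first_1.fq second_2.fq. An "out of sync" result points at an upstream step that processed the two files independently.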
-
Thanks Devon for jumping in. Here is a protocol that is worth reading in order to achieve good mapping results in most cases: http://www.epigenesys.eu/en/protcols...q-data-prot-57
-
OK, this is strange. I tried with another sample. The mapping efficiency is 0.1% with both files and 13.5% with only one file. Before this step, I ran:
fastq-dump --split-files <sra file>
trim_galore --rrbs <fastq1>
trim_galore --rrbs <fastq2>
For both files:
bismark --bowtie2 <ref> -1 <fastq1> -2 <fastq2>
For 1 file:
bismark --bowtie2 <ref> <fastq1>
As for the reference, I'm sure that I built it with bowtie2; I checked it with the Bismark sample data and the result is similar to the documentation. I'm now trying the next sample to see whether the fault lies with the sample or with my commands. Any suggestions? By the way, I downloaded the sample from NCBI. Here is the link: http://www.ncbi.nlm.nih.gov/geo/quer...i?acc=GSE61150
The sample that I checked is the first sample, here: http://www.ncbi.nlm.nih.gov/geo/quer...acc=GSM1498453
Thank you for your help.
Additional:
I checked again with FastQC after trimming. The result for both fastq files is about 50-50, not all good. The failing modules are per tile sequence quality, per base sequence content, sequence duplication levels, and Kmer content.
Last edited by barbarian; 03-17-2015, 06:06 PM.
-
For paired-end files you need to run Trim Galore in paired-end mode like this:
trim_galore --rrbs --paired <fastq1> <fastq2>
If you run it twice in single-end mode, it will break the sequence-by-sequence order of the files, which then results in very low mapping efficiency.
I am in a meeting all day but can take a look myself at the file in question tonight or tomorrow.
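To illustrate why two single-end runs break the order, here is a toy sketch. It is not Trim Galore's actual code: filter_single and filter_paired are hypothetical stand-ins that only apply a minimum-length cutoff, and the assumption is that a pair-aware mode keeps or drops both mates together.

```shell
# Toy stand-in for a trimmer's length filter; NOT Trim Galore's actual code.
# Single-end style: drop any record whose sequence is shorter than MIN.
filter_single() {   # usage: filter_single MIN in.fq > out.fq
    awk -v min="$1" '
        NR % 4 == 1 { id = $0 }
        NR % 4 == 2 { seq = $0 }
        NR % 4 == 3 { plus = $0 }
        NR % 4 == 0 { if (length(seq) >= min) print id "\n" seq "\n" plus "\n" $0 }
    ' "$2"
}

# Pair-aware style: keep a pair only if BOTH mates pass, so the two
# output files still list the same reads in the same order.
filter_paired() {   # usage: filter_paired MIN in_1.fq in_2.fq out_1.fq out_2.fq
    : > "$4"        # create empty outputs even if no pair survives
    : > "$5"
    awk -v min="$1" -v mate="$3" -v out1="$4" -v out2="$5" '
        {
            # Read the corresponding line of the second file in lockstep.
            if ((getline other < mate) <= 0) exit
            n = NR % 4
            rec1[n] = $0
            rec2[n] = other
            if (n == 0 && length(rec1[2]) >= min && length(rec2[2]) >= min) {
                print rec1[1] "\n" rec1[2] "\n" rec1[3] "\n" rec1[0] > out1
                print rec2[1] "\n" rec2[2] "\n" rec2[3] "\n" rec2[0] > out2
            }
        }
    ' "$2"
}
```

With filter_single run on each file separately, a read that is too short in only one file disappears from that file alone, so every later record is paired with the wrong mate. filter_paired removes the whole pair, which is the behaviour the --paired recommendation above relies on.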