  • dpryan
    replied
    That's correct. You essentially act as though you have a single-end dataset. If the mapping efficiency jumps to a more reasonable level when doing that, then either the fastq files are out of sync or there's something weird with fastq_2.fq.


  • barbarian
    replied
    oh, do you mean only use the first file, not together with the second file?


  • dpryan
    replied
    "by itself", not "to itself", big difference. This is purely to diagnose the cause of the low mapping efficiency.


  • barbarian
    replied
    If mapping fastq_1.fq to itself, is there any biological meaning behind that? Will the result still represent the actual methylation condition? Thank you.


  • dpryan
    replied
    I just replied to you on Biostars, but producing 1 BAM file from paired-end reads is the appropriate result. The reads from each file are indicated appropriately in the BAM format.

    The low mapping efficiency is a different question then. There are a number of likely causes of that, the most common being fastq files that are out of sync. Try mapping fastq_1.fq by itself and see if the mapping efficiency jumps up.
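
    A minimal sketch of one way to check whether two paired FASTQ files are in sync before mapping, assuming plain 4-line FASTQ records and read names that match once an optional /1 or /2 suffix is dropped. The helper names read_ids and first_mismatch are hypothetical, and other naming schemes would need their own normalisation.

```python
import io
from itertools import zip_longest


def read_ids(handle):
    """Yield the read ID of each 4-line FASTQ record, without a mate suffix."""
    for i, line in enumerate(handle):
        if i % 4 == 0 and line.strip():  # header line of each record
            # Drop the leading '@' and anything after the first whitespace.
            name = line.strip()[1:].split()[0]
            # Assumption: mates are marked with a trailing /1 or /2, if at all.
            if name.endswith(("/1", "/2")):
                name = name[:-2]
            yield name


def first_mismatch(handle_1, handle_2):
    """Return (record_index, id_1, id_2) for the first out-of-sync pair, else None."""
    pairs = zip_longest(read_ids(handle_1), read_ids(handle_2))
    for index, (id_1, id_2) in enumerate(pairs):
        if id_1 != id_2:  # also catches one file being shorter (ID is None)
            return index, id_1, id_2
    return None


# Toy data: the second record differs between the two files.
fq1 = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nACGT\n+\nIIII\n")
fq2 = io.StringIO("@r1/2\nTTTT\n+\nIIII\n@r3/2\nTTTT\n+\nIIII\n")
print(first_mismatch(fq1, fq2))  # (1, 'r2', 'r3')
```

    For real files, the two handles would come from open("fastq_1.fq") and open("fastq_2.fq"); a result of None means every record lined up.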


  • barbarian
    replied
    Hello Felix,

    Thank you for your response. I have installed Samtools, but I found another problem. From Biostars, I learned that I should generate two separate FASTQ files for paired-end reads, so I used fastq-dump --split-files to extract them from my SRA file. I got two FASTQ files and have had no problems so far (before, I dumped everything into one file and Bismark reported a duplicate ID error). The only problem is that I got only one BAM file from Bismark. Its name is that of the first FASTQ file with a .bam extension, so I assumed there would also be a BAM file named after the second file, but there is only the one for the first FASTQ file. Is that normal or wrong? I used bismark <options> -1 first_1.fq -2 second_2.fq. The result is only first_1.bam.

    This is my final alignment report:

    Sequence pairs analysed in total: 29829521
    Number of paired-end alignments with a unique best hit: 4156425
    Mapping efficiency: 13.9%
    Sequence pairs with no alignments under any condition: 23649277
    Sequence pairs did not map uniquely: 2023819
    Sequence pairs which were discarded because genomic sequence could not be extracted: 0

    The mapping efficiency is really low. What do you think caused it?

    For the Bismark example data, I got this result:
    Final Alignment report
    ======================
    Sequences analysed in total: 10000
    Number of alignments with a unique best hit from the different alignments: 4732
    Mapping efficiency: 47.3%
    Sequences with no alignments under any condition: 4279
    Sequences did not map uniquely: 989
    Sequences which were discarded because genomic sequence could not be extracted: 0

    So I think my human genome reference is not bad.
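
    As a sanity check on the figures above, here is a small sketch that parses the "label: count" lines of the paired-end report and recomputes the mapping efficiency as unique best hits divided by pairs analysed. The function names are hypothetical, and the parsing only assumes the line layout shown in this post.

```python
import re

REPORT = """\
Sequence pairs analysed in total: 29829521
Number of paired-end alignments with a unique best hit: 4156425
Mapping efficiency: 13.9%
Sequence pairs with no alignments under any condition: 23649277
Sequence pairs did not map uniquely: 2023819
Sequence pairs which were discarded because genomic sequence could not be extracted: 0
"""


def parse_counts(report_text):
    """Map each 'label: integer' line to its count; other lines are skipped."""
    counts = {}
    for line in report_text.splitlines():
        match = re.match(r"\s*(.+?):\s*(\d+)\s*$", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def mapping_efficiency(total, unique_hits):
    """Percentage of analysed sequences (or pairs) with a unique best hit."""
    return 100.0 * unique_hits / total


counts = parse_counts(REPORT)
total = counts["Sequence pairs analysed in total"]
unique = counts["Number of paired-end alignments with a unique best hit"]
print(f"{mapping_efficiency(total, unique):.1f}%")  # 13.9%

# The four outcome categories should add up to the total number of pairs.
print(sum(counts.values()) - total == total)  # True
```

    The same function gives 47.3% for the example data (4732 of 10000), so both reports are internally consistent and the low figure reflects the data rather than a reporting error.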
    Last edited by barbarian; 03-16-2015, 05:44 PM.


  • fkrueger
    replied
    Ah good, that explains it. As I said a few posts before, --gzip was a corner case that wasn't handled properly, so it was not intended that the merging went wrong... If you use the development version I attached in the last post, --gzip should be working now.


  • chxu02
    replied
    Sorry Felix, my bad. I checked my run history yesterday and found that I did use --gzip. But why did this happen, if the purpose of --gzip is just to zip the temporary conversion files?


  • fkrueger
    replied
    Originally posted by chxu02:
    Hi Felix,

    I'm reporting a bug from v0.14.0. When I ran bismark --multicore on gzipped FASTQ files, Bismark ultimately failed to assemble the separate files into one. The files were initially named *.fastq.gz_*, but at the end of the run Bismark unambiguously tried to assemble files named *.fastq_*. Obviously it failed. Hope it helps.

    Youyou
    Hmm, in case you didn't use --gzip, I don't think I quite understand the error you are reporting. Running files ending in either .fastq or .fastq.gz works fine for me here. Would you mind sending me the entire error message you are seeing by email?

    Attached is the latest development version of Bismark which should also understand the option --gzip.

  • chxu02
    replied
    I didn't, because the manual says --gzip is for zipping the temp files, not for unzipping the input fastq.gz files. And without --gzip, the run was successful until the end.


  • fkrueger
    replied
    Have you specified --gzip for this run?


  • chxu02
    replied
    Hi Felix,

    I'm reporting a bug from v0.14.0. When I ran bismark --multicore on gzipped FASTQ files, Bismark ultimately failed to assemble the separate files into one. The files were initially named *.fastq.gz_*, but at the end of the run Bismark unambiguously tried to assemble files named *.fastq_*. Obviously it failed. Hope it helps.

    Youyou


  • fkrueger
    replied
    Hi Barbarian,

    I am afraid the problem you are seeing is indeed caused by the fact that you don't have Samtools installed. In a bid to get multicore processing working in a reasonable time I assumed that everyone is running Samtools already so it is currently not designed to deal with sam.gz files.

    Just as a heads-up, the initial release (v0.14.0) also doesn't deal correctly with some corner cases such as the --gzip option for temp files (which is already fixed in the development version), and the -B (basename) option (which is yet to be looked at).

    So bottom line: if you install Samtools and don't try to run all corner cases at the same time (--gzip, -B), it should work nicely.


  • barbarian
    replied
    Hello, I am using the latest Bismark release. I tried to run the multicore command

    bismark --multicore 4 <genome> -n 1 <filename>

    The result is 4 split sam.gz files (I haven't installed Samtools). After that, I tried to call the methylation extractor

    bismark_methylation_extractor -p --comprehensive *.sam.gz

    The *.sam.gz parameter is meant to select all 4 sam.gz files, but this failed. Which file should I use to call the methylation extractor command? Thank you.


    *Note:
    I noticed there is one .bam file. I don't know how this file was generated, because I haven't installed Samtools, but it is only 16 kb in size and contains nothing.
    I also noticed that the alignment was done assuming single-end data, but my data is paired-end. How can I set up the alignment so that it uses paired-end mode?
    Last edited by barbarian; 03-15-2015, 05:53 PM.


  • fkrueger
    replied
    Bismark finally supporting parallel alignments

    We would like to announce that a new version of Bismark (v0.14.0) has just been released. This version adds a parallelization switch to the Bismark alignment step, and also changes a couple of other issues detailed below:

    o Bismark: Finally added parallelization to the Bismark alignment step using the option '--multicore <int>', which sets the number of parallel instances of Bismark to be run concurrently. At least in this first distribution, this is achieved by forking the Bismark alignment step very early on so that each individual Spawn of Bismark (SoB?) processes only every n-th sequence (n being set by --multicore). Once all processes have completed, the individual BAM files, mapping reports, and unmapped or ambiguous FastQ files are merged into single files in very much the same way as they would have been generated running Bismark conventionally with only a single instance.

    If system resources are plentiful this is a viable option to speed up the alignment process (we observed a near linear speed increase for up to --multicore 8 tested so far). However, please note that a typical Bismark run will use several cores already (Bismark itself, 2 or 4 threads of Bowtie/Bowtie2, Samtools, gzip etc...) and ~10-16GB of memory depending on the choice of aligner and genome. WARNING: Bismark Parallel (BP?) is resource hungry! Each value of --multicore specified will effectively lead to a linear increase in compute and memory requirements, so --multicore 4 for e.g. the GRCm38 mouse genome will probably use ~20 cores and eat ~40GB of RAM, but at the same time reduce the alignment time to ~25-30%. You have been warned...

    o Bismark: Changed the default output to BAM. SAM output may be requested using the option --sam

    o Bismark: No longer generates a piechart (.png) with the alignment stats. bismark2report generates a much nicer report anyway

    o Methylation Extractor: To detect paired-end alignment mode from the @PG header line, white spaces before and after -1 and -2 are now required. In some instances files containing e.g. -1-2 in their filename might previously have been identified as paired-end incorrectly

    o deduplicate_bismark: To detect paired-end alignment mode from the @PG header line, white spaces before and after -1 and -2 are now required

    o deduplicate_bismark: Added option --version so that Clusterflow can report a version number

    o bismark2bedGraph: Fixed path handling for cases where the input files were given with path information and an output directory had been specified as well

    o coverage2cytosine: Fixed a typo in the shebang which prevented coverage2cytosine from running
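
    To illustrate why the whitespace requirement for -1 and -2 in the Methylation Extractor and deduplicate_bismark items matters, here is a rough sketch of such a check. It is not Bismark's own code; the function name looks_paired_end and both example @PG lines are made up for illustration.

```python
import re


def looks_paired_end(pg_line):
    """True only if both -1 and -2 occur as whitespace-delimited tokens."""
    has_mate_1 = re.search(r"\s-1\s", pg_line) is not None
    has_mate_2 = re.search(r"\s-2\s", pg_line) is not None
    return has_mate_1 and has_mate_2


# Hypothetical header lines, not copied from real Bismark output.
paired = '@PG\tID:Bismark\tCL:"bismark genome/ -1 reads_1.fq -2 reads_2.fq"'
single = '@PG\tID:Bismark\tCL:"bismark genome/ sample-1-2.fq"'

print(looks_paired_end(paired))  # True
print(looks_paired_end(single))  # False
```

    A looser test that only searched for the substrings "-1" and "-2" would flag the second line as paired-end because of its filename, which is the kind of misidentification the changelog describes.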

    Even though we have tried out several corner cases this release is still somewhat experimental and we would appreciate any comments! Bismark can be downloaded from the Babraham Bioinformatics website.
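
    The "every n-th sequence" scheme behind --multicore in the first item above can be pictured with a toy sketch. This only illustrates the idea and is not how Bismark is implemented; the per-instance figures of 5 cores and 10 GB are assumptions back-calculated from the "--multicore 4 uses ~20 cores and ~40GB" example, and all names are hypothetical.

```python
def records_for_instance(records, instance, n_instances):
    """Yield the records one instance handles: every n-th, offset by its index."""
    for index, record in enumerate(records):
        if index % n_instances == instance:
            yield record


def estimate_resources(n_instances, cores_per_instance=5, ram_gb_per_instance=10):
    """Linear estimate of (cores, RAM in GB); the defaults are assumptions."""
    return n_instances * cores_per_instance, n_instances * ram_gb_per_instance


reads = [f"read_{i}" for i in range(10)]
n = 4
shares = [list(records_for_instance(reads, k, n)) for k in range(n)]

print(shares[0])  # ['read_0', 'read_4', 'read_8']
print(shares[3])  # ['read_3', 'read_7']

# Every read lands in exactly one share, so merging loses nothing.
merged = [read for share in shares for read in share]
print(sorted(merged) == sorted(reads))  # True

print(estimate_resources(4))  # (20, 40)
```

    Each instance sees roughly 1/n of the reads, which matches the near-linear speed-up reported above, while the cost grows by one full set of per-instance resources for each extra instance.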
