Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • fkrueger
    replied
    Hi Reco,
    RRBS reads should start (and end) at MspI restriction sites in the genome. The read distribution is therefore governed by the length of the genomic MspI fragments, and whether reads overlap or not depends on the read length you used. If you would generate an annotation track of all theoretical MspI fragments in the genome (which are typically in the size range of 70-220bp) you should see that all reads line up nicely with these fragments. RRBS is very different to 'normal' sequencing, and you wouldn't necessarily expect reads to form contigs.

    Leave a comment:


  • reco
    replied
    low coverage of mapping of reads from RRBS library

    The problem is with low coverage (NOT a significant overlapping between the single-end reads) with RRBS library. Bismark was run with the options -N 1 -L 20 --bowtie2 with an outcome of 40% mapping efficiency (Uniquely mapped) and about 25% Ambiguous reads in all the 7 samples in the RRBS Experiment. Adapters were removed before the mapping but it turned out to be that mapping efficiency was the same even without removal.

    In all locations that were checked there is often multiple copies of a read aligning at a location but not a overlap between different reads (such as formation of a contig). So the mapping seems to be erroneous somehow.

    Thanks.
    Last edited by reco; 06-13-2014, 04:32 PM.

    Leave a comment:


  • fkrueger
    replied
    Originally posted by reco View Post
    Please set the number of multiseed mismatches for Bowtie 2 with '-N <int>' (where <int> can be 0 or 1
    As the error message hinted already, -N can only be 0 or 1, but not 3 (nor 5...). Cheers, Felix
    Last edited by fkrueger; 06-13-2014, 11:27 PM. Reason: The closing bracket was in fact there, so not a bug...

    Leave a comment:


  • reco
    replied
    I seem to be having a rare problem with bismark; it might not be accepting the Bowtie2 option, the multiseed alignment mismatch Number, greater than 1 (one). Following is the command line run and the reported error,

    bismark -L 10 -N 3 -p 4 --un --non_bs_mm --ambiguous --bowtie2 Sequence/Chromosomes/ SRR001725.fastq -o Cleandata/Output/OutputL15

    ERROR reported : Please set the number of multiseed mismatches for Bowtie 2 with '-N <int>' (where <int> can be 0 or 1

    Thanks.

    Leave a comment:


  • fkrueger
    replied
    Even if you wanted to use the --no-mixed option you would still have to supply paired-end files that have the same number of sequences in Read 1 and Read 2. Bowtie 2 would then report singleton alignments if a read pair fails to align in paired-end mode. This is however not supported by the current way Bismark works, but you could run Bismark in paired-end mode first, write out unmapped reads using --unmapped and then run it again in SE mode for R1 (default) and R2 (--pbat mode) individually if you really wanted to.

    First you need to get your paired-end files fixed though. The easiest way would be to use Trim Galore (with --paired) or another adapter/quality trimmer that is paired-end aware and does not leave orphan singleton reads in the FastQ file. Cheers, Felix

    Leave a comment:


  • sunggong
    replied
    how to disable '--no-mixed'

    Hi,
    Is it possible to disable '--no-mixed' option? I wanted to rescue some short reads filtered in previous bismark run, but not all reads are paired. With '--no-mixed' on, bowtie2 spits following errors:
    Error, fewer reads in file specified with -1 than in file specified with -2
    terminate called after throwing an instance of 'int'
    (ERR): bowtie2-align died with signal 6 (ABRT)


    If '--no-mixed' cannot be disabled, I might need to remove those problematic orphan reads.

    Thanks.

    Leave a comment:


  • blakeoft
    replied
    With the help of Felix, I determined that the problem had to do with the way I saved the test dataset. To anyone who also wants to use the test data, I recommend right clicking the link to the test dataset, and selecting "Save Link As..." instead of opening the link, selecting all text and pasting it into a word processor.

    Thank you Felix!

    Leave a comment:


  • fkrueger
    replied
    Originally posted by blakeoft View Post
    I've downloaded the test dataset from the Bismark homepage and tried aligning the reads. The mapping efficiency is 0%. Here's my command line:
    Code:
    bismark -n 1 -l 50 /home/user/genome/hg19 test_dataset.fastq
    Any idea why none of the reads mapped?
    Can you please link or send me the text you see on your screen to [email protected]?

    Cheers, Felix

    Leave a comment:


  • blakeoft
    replied
    I've downloaded the test dataset from the Bismark homepage and tried aligning the reads. The mapping efficiency is 0%. Here's my command line:
    Code:
    bismark -n 1 -l 50 /home/user/genome/hg19 test_dataset.fastq
    Any idea why none of the reads mapped?

    Leave a comment:


  • fkrueger
    replied
    Hi Andrew,

    thanks for this, it looks alright to me so I already changed it in the current development version. I wanted to change a few things regarding the NM, MD and XX tags in the SAM output, but once this is done I'll put out a new release.

    Cheers, Felix

    Leave a comment:


  • adeirossi
    replied
    Hi Felix,

    On the topic of bismark_methylation_extractor, I have run into an error when I try to run it on BAM files with very few reads:
    $ perl bismark_methylation_extractor --paired-end --no_overlap --ignore 2 --ignore_r2 33 --output $outDir --report --samtools_path /mnt/ngs/analysis/software/samtools-0.1.19 --gzip --bedGraph --buffer_size 4G 2_267_5_CTTGTA_021.bam
    *** Bismark methylation extractor version v0.12.2 ***

    Summarising Bismark methylation extractor parameters:
    ===============================================================
    Bismark paired-end SAM format specified (default)
    First 2 bp will be disregarded when processing the methylation call string of Read 1
    First 33 bp will be disregarded when processing the methylation call string of Read 2
    Output path specified as: meth/ctl_results/A73L06H9O/2_267_5_CTTGTA_021/

    ...

    Processed 8 lines from meth/ctl_results/A73L06H9O/2_267_5_CTTGTA_021/2_267_5_CTTGTA_021.bam in total
    Total number of methylation call strings processed: 16

    Determining maximum read lengths for M-Bias plots
    Maximum read length of Read 1: 73
    Maximum read length of Read 2: 0

    No data sets or points at /mnt/ngs/analysis/software/bismark/bismark_v0.12.2/bismark_methylation_extractor line 649, <IN> line 20.

    The script dies when plot() fails to generate a read 2 M-bias plot. Since the absence of M-bias plots does not particularly bother me, I came up with this patch (also see attached) so that plot() failure causes a warning instead of a fatal error:

    Code:
    --- a/bismark_v0.12.2/bismark_methylation_extractor
    +++ b/bismark_methylation_extractor-0.12.2-mod
    @@ -558,11 +558,14 @@ sub produce_mbias_plots{
    
         $graph1->set_legend('CpG methylation','CHG methylation','CHH methylation','CpG total calls','CHG total calls','CHH total calls');
    
    -    my $gd1 = $graph1->plot(\@mbias_read1) or die $graph1->error;
    +    if (my $gd1 = $graph1->plot(\@mbias_read1)) {
    +      open (MBIAS_G1,'>',$mbias_graph_1) or die "Failed to write to file for M-bias plot 1: $!\n\n";
    +      binmode MBIAS_G1;
    +      print MBIAS_G1 $gd1->png;
    +    } else {
    +      warn "WARNING: Cannot generate read 1 M-bias plot: " . $graph1->error;
    +    }
    
    -    open (MBIAS_G1,'>',$mbias_graph_1) or die "Failed to write to file for M-bias plot 1: $!\n\n";
    -    binmode MBIAS_G1;
    -    print MBIAS_G1 $gd1->png;
       }
    
       if ($paired){
    @@ -646,11 +649,13 @@ sub produce_mbias_plots{
                      ) or die $graph2->error;
    
           $graph2->set_legend('CpG methylation','CHG methylation','CHH methylation','CpG total calls','CHG total calls','CHH total calls');
    -      my $gd2 = $graph2->plot(\@mbias_read2) or die $graph2->error;
    -
    -      open (MBIAS_G2,'>',$mbias_graph_2) or die "Failed to write to file for M-bias plot 2: $!\n\n";
    -      binmode MBIAS_G2;
    -      print MBIAS_G2 $gd2->png;
    +      if (my $gd2 = $graph2->plot(\@mbias_read2)) {
    +        open (MBIAS_G2,'>',$mbias_graph_2) or die "Failed to write to file for M-bias plot 2: $!\n\n";
    +        binmode MBIAS_G2;
    +        print MBIAS_G2 $gd2->png;
    +      } else {
    +        warn "WARNING: Cannot generate read 2 M-bias plot: " . $graph2->error;
    +      }
    
         }
       }
    There may be a more elegant way to avoid the error, but this patch at least allowed the extractor to get past the plotting step and generate bedGraph output, so I thought I'd share. I'd be curious to hear your thoughts.

    Thanks,
    Andrew
    Attached Files

    Leave a comment:


  • blancha
    replied
    Hi Felix,

    Looks great. Thanks.
    I'll try it out.

    Thanks.

    Alexis

    Leave a comment:


  • fkrueger
    replied
    Originally posted by blancha View Post
    Hi,

    Would it be possible to add an option to bismark_methylation_extractor to generate a .cov file with 0-based start coords and 1-based end coords, in the same format as the .bedGraph file?

    That way, I could do a downstream analysis with bedtools directly.
    After extracting the methylation counts, I could also generate the bigWig files without first having to convert the start genomic coordinates.

    There is a --zero_based parameter, but my understanding of the documentation is that it only applies to the genome-wide cytosine methylation report. I'm also not sure if the end-coords will still be 1-based. If I use this option, will the report have 0-based start coords and 1-based end-coords, or will both start and end coords be 0-based?

    It seems to me that it would be nice to have the option to generate all the files with 0-based start coordinates and 1-based genomic coordinates, to respect the bedtools and internal UCSC format. Otherwise, having files in different formats becomes quite confusing, and requires an extra step to convert the genomic coordinates to the desired format.

    Thank you.
    I have just changed the '--zero_based' option of the methylation extractor to write out an additional coverage file (ending in .zero.cov) which uses the UCSC zero-based, half-open standard. The cytosine report only writes out a single position (as it a single cytosine), which uses 1-based coords by default but can be changed to 0-based if desired. Please find the files attached, I hope this is what you were after.
    Attached Files

    Leave a comment:


  • fkrueger
    replied
    Hi blancha, that sounds like a sensible suggestion, I'll look into this when I am back from my holidays.

    Leave a comment:


  • blancha
    replied
    Hi,

    Would it be possible to add an option to bismark_methylation_extractor to generate a .cov file with 0-based start coords and 1-based end coords, in the same format as the .bedGraph file?

    That way, I could do a downstream analysis with bedtools directly.
    After extracting the methylation counts, I could also generate the bigWig files without first having to convert the start genomic coordinates.

    There is a --zero_based parameter, but my understanding of the documentation is that it only applies to the genome-wide cytosine methylation report. I'm also not sure if the end-coords will still be 1-based. If I use this option, will the report have 0-based start coords and 1-based end-coords, or will both start and end coords be 0-based?

    It seems to me that it would be nice to have the option to generate all the files with 0-based start coordinates and 1-based genomic coordinates, to respect the bedtools and internal UCSC format. Otherwise, having files in different formats becomes quite confusing, and requires an extra step to convert the genomic coordinates to the desired format.

    Thank you.

    Leave a comment:

Latest Articles

Collapse

  • seqadmin
    Latest Developments in Precision Medicine
    by seqadmin



    Technological advances have led to drastic improvements in the field of precision medicine, enabling more personalized approaches to treatment. This article explores four leading groups that are overcoming many of the challenges of genomic profiling and precision medicine through their innovative platforms and technologies.

    Somatic Genomics
    “We have such a tremendous amount of genetic diversity that exists within each of us, and not just between us as individuals,”...
    05-24-2024, 01:16 PM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, 06-03-2024, 06:55 AM
0 responses
12 views
0 likes
Last Post seqadmin  
Started by seqadmin, 05-30-2024, 03:16 PM
0 responses
26 views
0 likes
Last Post seqadmin  
Started by seqadmin, 05-29-2024, 01:32 PM
0 responses
29 views
0 likes
Last Post seqadmin  
Started by seqadmin, 05-24-2024, 07:15 AM
0 responses
216 views
0 likes
Last Post seqadmin  
Working...
X