Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • akramdi
    replied
    Hi Felix,

    Thanks for the quick response!

    I have another question: I recently tried to understand why my paired-end BS-seq data maps with a mapping efficiency of 40% (it seemed rather low to me), so I conducted some tests as suggested in previous messages dealing with low mappability efficiency. One thing I did is map a subset (-u 1000000) of read 1 and read 2 individually in single end mode to see if the mapping efficiency would go up. Plus, to be safe, I ran the mapping in --non_directional mode because I am not sure of the library type.

    Here's the part I don't understand: I have different OT, OB, CTOT, CTOB ratios between read1 and read2.

    Mapping report for read1
    Final Alignment report
    ======================
    Sequences analysed in total: 1000000
    Number of alignments with a unique best hit from the different alignments: 602427
    Mapping efficiency: 60.2%
    Sequences with no alignments under any condition: 177212
    Sequences did not map uniquely: 220361
    Sequences which were discarded because genomic sequence could not be extracted: 0

    Number of sequences with unique best (first) alignment came from the bowtie output:
    CT/CT: 299725 ((converted) top strand)
    CT/GA: 302357 ((converted) bottom strand)
    GA/CT: 150 (complementary to (converted) top strand)
    GA/GA: 195 (complementary to (converted) bottom strand)

    Mapping report for read2
    Final Alignment report
    ======================
    Sequences analysed in total: 1000000
    Number of alignments with a unique best hit from the different alignments: 457737
    Mapping efficiency: 45.8%
    Sequences with no alignments under any condition: 365712
    Sequences did not map uniquely: 176551
    Sequences which were discarded because genomic sequence could not be extracted: 1

    Number of sequences with unique best (first) alignment came from the bowtie output:
    CT/CT: 3950 ((converted) top strand)
    CT/GA: 3937 ((converted) bottom strand)
    GA/CT: 223725 (complementary to (converted) top strand)
    GA/GA: 226124 (complementary to (converted) bottom strand)

    Have you come across a similar situation? what could cause such discordance?

    Thanks again,
    Amira

    Leave a comment:


  • fkrueger
    replied
    Hi Amira,

    The numbers do indeed suggest that the library was constructed in a directional manner. The alignments to the CTOT and CTOB strands are < 0.3% so this is not terrible (and it is quite expected).

    To move on I would either suggest you re-align the data in the default mode, i.e. without --non_directional specified, or if you wanted to continue with the data as it is I would discard the CTO* files before continuing with the bismark2bedGraph step.

    Cheers, Felix

    Leave a comment:


  • akramdi
    replied
    Hi Felix,

    I was sent some bismark outputs (+ reports) to analyze. I am looking at the mapping reports, and I wondering if the ratio of mapped reads between OT, OB, CTOT, CTOB is strange or not. Can I conclude that the library was constructed in a directionnal manner? If so, I would have expected to see less reads aligned to CTOT, CTOB.


    Final Alignment report
    ======================
    Sequences analysed in total: 44844469
    Number of alignments with a unique best hit from the different alignments: 30974987
    Mapping efficiency: 69.1%
    Sequences with no alignments under any condition: 4171455
    Sequences did not map uniquely: 9698027
    Sequences which were discarded because genomic sequence could not be extracted: 36

    Number of sequences with unique best (first) alignment came from the bowtie output:
    CT/CT: 15485820 ((converted) top strand)
    CT/GA: 15393574 ((converted) bottom strand)
    GA/CT: 47834 (complementary to (converted) top strand)
    GA/GA: 47723 (complementary to (converted) bottom strand)

    Thanks!

    Amira

    Leave a comment:


  • alim123
    replied
    Sir, one more question, as you suggest to trim the sequences further..So would i used raw seq or already trimmed sequence for further trimming.

    Thank you

    Leave a comment:


  • fkrueger
    replied
    Thanks for sending some QC plots via email. The images were fairly small but it looks like you are using 2x100bp reads generated with the EpiGnome kit (there is a hefty bias at the start of reads that needs to be removed before mapping.

    Also you are using a version of Bismark that is almost 3 years and 22 releases outdated. Just get the latest versions here: https://github.com/FelixKrueger/Bismark or here: http://www.bioinformatics.babraham.a...jects/bismark/.

    You then need to run Trim Galore with --clip_r1 6 --clip_r2 6 (potentially more), and then run the alignments using Bismark v0.16.1 in default mode.

    Also please make sure to read up on biases using paired-end PBAT at QC Fail. I am pretty sure that you will see nice mapping efficiencies after following the steps in the cookbooks attached. Cheers, Felix

    Leave a comment:


  • fkrueger
    replied
    Originally posted by alim123 View Post
    Dear Felix

    I am getting really low mapping efficiencies nearly 37% when i am using Bismark ( bowtie1, n=3).
    when I am using BSmap with same sequences (pair end) on the same parameter, i am getting mapping efficiency nearly 80%.

    But the problem is with bsmap i am further struggling for data analysis, like how to get methylated and non methylated reads from the .txt file.

    So plz suggest me for better mapping efficiency. it will be really helpful for me.

    Thank you
    I am copying below the reply I have sent you via email already:

    Hi Alim,

    In order to give you a better diagnosis of the issues it would be good to have more information about your experiment, such as the sequencing length, the type of library, the genome in question and the version numbers of all programs involved etc. It would also be good to get access to the FastQC profiles of your samples.

    General rules you should adhere to are:

    1) Adapter and quality trimming, e.g. using Trim Galore
    2) Use Bowtie2 mode (which is the default mode anyway)
    3) Update to the latest version (v0.16.1)

    More detailed hands-on information may also be found here:

    or here:


    Hope this helps,
    Best, Felix

    Leave a comment:


  • alim123
    replied
    Dear Felix

    I am getting really low mapping efficiencies nearly 37% when i am using Bismark ( bowtie1, n=3).
    when I am using BSmap with same sequences (pair end) on the same parameter, i am getting mapping efficiency nearly 80%.

    But the problem is with bsmap i am further struggling for data analysis, like how to get methylated and non methylated reads from the .txt file.

    So plz suggest me for better mapping efficiency. it will be really helpful for me.

    Thank you

    Leave a comment:


  • fkrueger
    replied
    Given that the majority of reads come from the CTOT and CTOB strands it is to be expected that C is high while G is low. The fraction aligning to the OT and OB strands would do the exact opposite, so I think overall the picture you are seeing is quite alright.

    If you were operating on SAM files then cat should have worked (because this works on a line by line basis and doesn't do any clever (or weird) things as samtools merge does without -n. You only might have extra header lines in the middle somewhere but they should be ignored by the methylation extractor. Merging right from the start is probably the cleanest solution though.

    Leave a comment:


  • johnstonL
    replied
    Felix, thanks for the reply.

    Yes this library was prepared with the Zymo Meth-Seq kit as you have guessed. I have been thinking perhaps the library was not prepared that well. I ran a assessment on a raw sample in Fastqc and got the following base sequence content:



    The % of cytosine is awfully high compared to other reports I have seen and considering this is RRBS data.


    Actually, I have been simply concatenating the output files from the alignment step for input into the methylation extractor rather than using any sort of merging tool (i.e. cat sample1_1.sam sample1_2.sam > sample1). Would this cause the extractor to make incorrect methylation calls? I'm going to switch to combining fastq (with cat) instead like you have suggested to avoid this issue altogether.
    Attached Files

    Leave a comment:


  • fkrueger
    replied
    This is indeed a somewhat odd ratio of strands, it reminds me very much of the single-cell BS-Seq technique which undergoes several rounds of PBAT-type pull-down and strand-regeneration. I have not heard of an RRBS-method that would be doing this, do you happen to know how these libraries were generated (e.g. using the Zymo Pico Methyl-Seq kit?).
    I would probably just try to align only Read 1 (also in --non-directional mode) just to see if the mapping efficiency goes up a bit because 38% seems rather low).

    In principle it is fine to merge files at the BAM level and then feed the merged file to the methylation extractor. A word of caution is in order for paired-end end files though if you are using Samtools merge because this merging does not normally guarantee that Read 1 and Read 2 are kept in the same order in the following file unless you specify the option -n:
    Code:
    Options: -n       sort by read names
    Similarly, already merged files my be sorted by read name using
    Code:
    samtools sort -n
    to bring them back into the correct format.

    If the R1/R2 order is muddled with then the overlap detection and the strand assignments would go a little mad so this is to be avoided.

    I generally tend to merge files right from the start at the FastQ file level because this also reduces the number of files generated in the process considerably. Hope this helps, Felix

    Leave a comment:


  • johnstonL
    replied
    Hi Felix,

    I am working on analyzing some pair ended non-directional RRBS libraries with Bismark and have come across a few confusions. I begin by trimming the pair end files with Trim Galore (making use of the -rrbs -non-directional options and everything else default) and then aligning with bismark. What strikes me the most from the alignment report is that I get a strange ratio of mapped reads between OT, OB, CTOT, CTOB:

    Final Alignment report
    ======================
    Sequence pairs analysed in total: 12664432
    Number of paired-end alignments with a unique best hit: 4529206
    Mapping efficiency: 35.8%
    Sequence pairs with no alignments under any condition: 5506764
    Sequence pairs did not map uniquely: 2628462
    Sequence pairs which were discarded because genomic sequence could not be extracted: 0

    Number of sequence pairs with unique best (first) alignment came from the bowtie output:
    CT/GA/CT: 608018 ((converted) top strand)
    GA/CT/CT: 1626505 (complementary to (converted) top strand)
    GA/CT/GA: 1673560 (complementary to (converted) bottom strand)
    CT/GA/GA: 621123 ((converted) bottom strand)


    Final Cytosine Methylation Report
    =================================
    Total number of C's analysed: 117899629

    Total methylated C's in CpG context: 7232502
    Total methylated C's in CHG context: 362290
    Total methylated C's in CHH context: 1218612
    Total methylated C's in Unknown context: 2


    Total unmethylated C's in CpG context: 13074777
    Total unmethylated C's in CHG context: 29946985
    Total unmethylated C's in CHH context: 66064463
    Total unmethylated C's in Unknown context: 3


    C methylated in CpG context: 35.6%
    C methylated in CHG context: 1.2%
    C methylated in CHH context: 1.8%
    C methylated in unknown context (CN or CHN): 40.0%


    Also, the way the library was given to me is that each sample is split into 2 pairs (sample1_pair1_forward, sample1_pair1_reverse, sample1_pair2_forward, sample1_pair2_reverse). If I wanted to run the methylation extractor on the samples, would it be a problem if I simply gave it the concatenated output of the alignment pairs?

    i.e
    sample1_pair1_aligned + sample1_pair2_aligned > sample1_aligned
    methylation extractor sample1_aligned ...other samples

    Thanks

    Leave a comment:


  • hsiehph
    replied
    Originally posted by fkrueger View Post
    Hi hsiehph,

    I believe this problem has been fixed by now in this issue. You can get the latest development version of Bismark by cloning it from Github.
    Thanks Felix. I will test it with the latest development version.

    Leave a comment:


  • fkrueger
    replied
    Hi hsiehph,

    I believe this problem has been fixed by now in this issue. You can get the latest development version of Bismark by cloning it from Github.

    Leave a comment:


  • hsiehph
    replied
    Originally posted by fkrueger View Post
    Hi Dipro,

    This was indeed a typo which will be fixed in the next release which is actually due out today or tomorrow (and will finally support parallel alignments – so stay tuned!).

    A couple of things about the command you used:

    bismark_methylation_extractor -s -o --samtools_path --bedGraph --counts --remove_spaces --buffer_size --cytosine_report --genome_folder

    'Failed to read from file /path/to/file_fq.gz_bismark_bt2.bismark.cov: No such file or directory'
    Sorry if it is a stupid question, but did you change the ‘/path/to/file’ by a valid path of the file on your system?

    -s: not necessary (will be determined automatically)
    -o /requires/path/to/output/folder
    --samtools_path /requires/path/to/samtools/executable
    --counts: not necessary (used by default)
    --remove_spaces: only use this if really necessary, will otherwise cost time and temporary space
    --buffer_size: requires input, e.g. 10G
    --genome_folder /requires/path/to/genome/folder

    input file is required

    If you still struggle can you just send me the onscreen-text via email? This would make spotting mistakes in the command much easier. Cheers, Felix
    I also have a similar error message:
    gzip: output_folder/input.bismark.cov.gz: No such file or directory
    No last chromosome was defined, something must have gone wrong while reading the data in (e.g. specified wrong file path for a gzipped coverage file?). Please check your command!

    However, I can see the input_folder.bismark.cov.gz file exist in my output_folder/.

    The command I used as follows:
    bismark_methylation_extractor -p --no_overlap --bedGraph --counts --buffer_size 10G --cytosine_report --CX --split_by_chromosome -o output_folder/ --genome_folder genome_bowtie1/ --multicore 6 input_folder/input_folder.sam

    Interestingly, when I removed the -o output_folder/ (i.e. to the current directory), the script will finish properly. Does anyone have similar experience? The version of bismark is the latest one. Thanks for help.

    Leave a comment:


  • fkrueger
    replied
    Thanks for spotting that, and yes you are absolutely right it needs to be methylated / (methylated + unmethylated) *100.

    Leave a comment:

Latest Articles

Collapse

  • seqadmin
    Essential Discoveries and Tools in Epitranscriptomics
    by seqadmin




    The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
    04-22-2024, 07:01 AM
  • seqadmin
    Current Approaches to Protein Sequencing
    by seqadmin


    Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
    04-04-2024, 04:25 PM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, 04-25-2024, 11:49 AM
0 responses
19 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-24-2024, 08:47 AM
0 responses
18 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-11-2024, 12:08 PM
0 responses
62 views
0 likes
Last Post seqadmin  
Started by seqadmin, 04-10-2024, 10:19 PM
0 responses
60 views
0 likes
Last Post seqadmin  
Working...
X