  • fkrueger
    replied
    The data used as a test data set is from the 2009 Lister et al. paper; the reads were not specifically trimmed for adapters but just shortened to 50bp. Still, a lot of the reads suffer from poor-quality sequence (as was the norm back in those days) and possibly adapter contamination. I am sure that if you removed these you would also see an increased mapping efficiency. If you follow this QC and trimming guide you should see fairly good results for your application (very long reads might need some specific attention though, e.g. using Bowtie2 for mapping).

    The test dataset is meant as a quick test that the program runs correctly after installation, and was not intended to showcase a staggeringly high mapping efficiency of Bisulfite-Seq in general.


  • litc
    replied
    Sorry fkrueger, I drew the wrong conclusion because I wrote the OB(-) and OB(+) sequences incorrectly.

    In my post above, OB(-) should be TTAGTGT and OB(+) should be ACACTAA.
    OB(C>T) ATATTAA
    OB(G>A) ACACTAA, which can map to Genome(G>A)

    So in this case, all strands (OT, OB) can theoretically map to either Genome(C>T) or Genome(G>A), regardless of whether the original strands contained methylated bases. The mapping efficiency of BS-seq should therefore not be inherently low.
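
The corrected example can be checked with a minimal Python sketch. The helper names below are hypothetical and only illustrate the in-silico C>T / G>A conversion idea discussed in this thread; this is not Bismark's actual code. The sequences are the ones from the example (genome ACGCTGA), plus one extra bottom-strand read in which the CpG cytosine is assumed to be methylated.

```python
# Illustrative helpers (hypothetical names, not part of Bismark).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def c_to_t(seq):
    """Fully convert C to T (in-silico bisulfite conversion)."""
    return seq.replace("C", "T")

def g_to_a(seq):
    """Fully convert G to A (conversion as seen from the opposite strand)."""
    return seq.replace("G", "A")

genome = "ACGCTGA"
genome_ct = c_to_t(genome)   # Genome(C>T)
genome_ga = g_to_a(genome)   # Genome(G>A)

# OT read: the CpG cytosine is methylated and therefore kept as C.
ot_read = "ACGTTGA"
ot_matches = c_to_t(ot_read) == genome_ct

# OB read written 5'->3', all cytosines unmethylated (converted to T).
ob_read = "TTAGTGT"
ob_matches = g_to_a(reverse_complement(ob_read)) == genome_ga

# Assumed variant: the same OB read with the CpG cytosine methylated.
ob_read_methylated = "TTAGCGT"
ob_methylated_matches = g_to_a(reverse_complement(ob_read_methylated)) == genome_ga

print(genome_ct, genome_ga, ot_matches, ob_matches, ob_methylated_matches)
```

The key step that was missing in the earlier post is the reversal: OB(+) is the reverse complement of OB(-), not just its complement.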

    I finally found out why my data's mapping efficiency was 0.1%: my data is 250PE, and the adapter sequence in the last part of the reads caused the mapping to fail. After trimming the reads to 50bp, the mapping efficiency rose to 76%. But I don't know why the Bismark test dataset (http://www.bioinformatics.babraham.a...d.html#bismark) has a low mapping efficiency of 47.6%; it confused me and gave me the impression that the mapping efficiency of BS-seq is low.
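
For illustration, hard-trimming reads to a fixed length could look roughly like the sketch below. It assumes plain 4-line FASTQ records, and `hard_trim_fastq` is a hypothetical helper, not part of Bismark or any trimming tool. Hard trimming is a blunt approach; adapter- and quality-aware trimming, as suggested elsewhere in this thread, would likely retain more usable sequence.

```python
def hard_trim_fastq(lines, length=50):
    """Yield FASTQ lines with sequence and quality truncated to `length`.

    Assumes strict 4-line records: header, sequence, '+', quality.
    """
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, plus, qual = record
            # Truncate sequence and quality together so they stay the same length.
            yield from (header, seq[:length], plus, qual[:length])
            record = []

example = ["@read1\n", "ACGTACGTAC\n", "+\n", "IIIIIIIIII\n"]
trimmed = list(hard_trim_fastq(example, length=5))
print("\n".join(trimmed))
```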


  • litc
    replied
    Thanks fkrueger, you are very kind. I would like to understand the reason in more depth.

    here is an example:
    genome sequence is: ACGCTGA
    the real sample's sequence is:
    ACGCTGA
    TGCGACT
    the Red"C" is methylated base.

    then:
    Genome(C>T) is ATGTTGA
    Genome(G>A) is ACACTAA
    OT(+) is ACGTTGA
    OB(-) is TGTGATT
    OB(+) is AATCACA

    In a directional library, both the OT and OB strands can be sequenced.
    OT(C>T) ATGTTGA, which can be mapped to Genome(C>T)
    OB(C>T) AATTATA, cannot be mapped to Genome(C>T) or Genome(G>A)
    OB(G>A) AATCACA, cannot be mapped to Genome(C>T) or Genome(G>A)

    So in this example only OT can be aligned and OB cannot. Is this the cause of the low mapping efficiency of BS-seq?
    Last edited by litc; 09-11-2013, 06:07 PM. Reason: formatting for easy reading.


  • fkrueger
    replied
    The mapping efficiency for very short bisulfite converted sequences is substantially lower than for 'normal' sequencing, but for read lengths of 40bp or longer the difference is only a few percent. Fig. 2a of this review compares the mapping efficiencies of BS-Seq vs. normal alignments as a function of read length.

    A mapping efficiency of 0.1% sounds very, very low; this is something you would probably see if you aligned sequences to the wrong genome (e.g. human/mouse).


  • litc
    replied
    I want to know about the mapping efficiency of bisulfite sequencing. I ran the test data (Bismark test dataset at http://www.bioinformatics.babraham.a...d.html#bismark) and its mapping efficiency is 47.6%. My own bisulfite-sequencing data has a mapping efficiency of 0.1% (this may mostly be caused by the lab staff using a wrong protocol).

    Is the mapping efficiency of bisulfite sequencing lower than that of normal sequencing? Can the C>T and G>A versions of every template's OT and OB strands map to Genome(C>T) and Genome(G>A)?


  • fkrueger
    replied
    Originally posted by frozenlyse
    Ah sure, that makes sense - I may test it out for myself on the unaligned read from a cancer cell line with some known translocations and see if anything falls apart - if I get around to testing it I'll let you know how it goes.
    Great, I'll be interested to hear about the outcome!
    Last edited by fkrueger; 09-11-2013, 12:54 AM. Reason: typo


  • frozenlyse
    replied
    Ah sure, that makes sense - I may test it out for myself on the unaligned read from a cancer cell line with some known translocations and see if anything falls apart - if I get around to testing it I'll let you know how it goes.


  • fkrueger
    replied
    Hi Aaron,
    I have to admit that I haven't spent any time thinking about whether it would be possible, or how difficult it would be, to allow these settings. I would imagine that just enabling these options in the code would probably lead to some other part failing in some way, even though it is difficult to predict how. This is something that sounds very straightforward to implement, but might turn out to be surprisingly difficult ...


  • frozenlyse
    replied
    Hi,

    I was wondering about using WGBS data for structural variant prediction. According to the Bismark manual, the Bowtie2 paired-end options --no-mixed and --no-discordant are always switched on. Is there any way of disabling them apart from editing the source code? Perhaps these options could be changed to --allow-mixed and --allow-discordant so that the default behaviour does not change? It seems a bit odd to have options which are impossible to turn off!
    Cheers,
    Aaron


  • dpryan
    replied
    Offhand, I can't think of any application where this would cause a problem. With genome viewers, you need to coordinate sort anyway and the pairing isn't done at the read-name level (there's no fast index for querying the position of reads in BAM files by name).
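
As a small illustration of why the /1 and /2 suffixes are not needed: in SAM/BAM, the mate identity is stored in the FLAG field (bit 0x40 for first in pair, bit 0x80 for second in pair), independent of the read name. A minimal sketch, where `mate_label` is a hypothetical helper and the example flag values are typical values chosen for illustration rather than taken from the files discussed here:

```python
# Bit values as defined in the SAM specification.
FLAG_PAIRED = 0x1
FLAG_FIRST_IN_PAIR = 0x40
FLAG_SECOND_IN_PAIR = 0x80

def mate_label(flag):
    """Classify an alignment from its SAM FLAG, ignoring the read name."""
    if not flag & FLAG_PAIRED:
        return "unpaired"
    if flag & FLAG_FIRST_IN_PAIR:
        return "first in pair"
    if flag & FLAG_SECOND_IN_PAIR:
        return "second in pair"
    return "paired, mate unknown"

# Illustrative flags: 83 = paired + proper pair + reverse strand + first in pair,
# 163 = paired + proper pair + mate reverse strand + second in pair.
for flag in (83, 163, 0):
    print(flag, mate_label(flag))
```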


  • oria34
    replied
    Hi all,

    Running the latest version of Bismark and looking at the read names (we discussed this a while ago here), I have found that the names of the two reads in a pair no longer end in /1 and /2. In my case both members of a pair are named "...../1" and "....../1".

    I know it doesn't matter too much, since Bismark does the methylation call properly, but I was wondering whether it could interfere with other downstream applications or genome viewers.



    Left alignment
    ----------------------
    Read name = FCD1LHLACXX:8:2308:5026:30317#ACCAGACT/1
    Location = groupXXI:767
    Alignment start = 756 (+)
    Cigar = 99M
    Mapped = yes
    Mapping quality = 255
    ----------------------
    Base = A
    Base phred quality = 39
    ----------------------
    Pair start = groupXXI:900 (-)
    Pair is mapped = yes
    Insert size = 242
    Pair orientation = F2R1
    ----------------------
    Second in pair
    -------------------
    XG = GA
    NM = 16
    XM = ...........x.............h..........x...xh......hh..xh..........x.h...x........h..x.....x.h........
    XR = GA
    XX = 11G13G10G3GG6GG2GG10G1G3G8G2G5G1G8
    -------------------

    Right alignment
    ----------------------
    Read name = FCD1LHLACXX:8:2308:5026:30317#ACCAGACT/1
    Location = groupXXI:767
    Alignment start = 900 (-)
    Cigar = 98M
    Mapped = yes
    Mapping quality = 255
    ----------------------
    ----------------------
    Pair start = groupXXI:756 (+)
    Pair is mapped = yes
    Insert size = -242
    Pair orientation = F2R1
    ----------------------
    First in pair
    -------------------
    XG = GA
    NM = 19
    XM = ........x..hh..x.Z..xh....x.....xh.....h.........Z...xh......x............xh..xh.......x.....Z....
    XR = CT
    XX = 8G2GG2G4GG4G5GG5G13GG6G12GG1TGG7G10
    -------------------


  • fkrueger
    replied
    Seems like a point worth addressing... But now I shall focus on holiday!


  • gerald2545
    replied
    Just a follow-up: the same file took 1500 minutes to sort with the -k3,3 parameter.

    gerald


  • gerald2545
    replied
    Thank you Felix for your time, but don't forget: you are on holiday.

    Gérald


  • fkrueger
    replied
    Sorting by chromosome in addition to the position might indeed be a relic of former versions of the script, back when files weren't sorted into individual chromosome files. I'll take a look at this once I am back, but for the moment you should be fine just deleting the -k 3,3 from the sort command.
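
To illustrate why the extra key is redundant for per-chromosome files, here is a toy sketch in Python. The records are made up, and the column layout (chromosome in column 3, position in column 4) is an assumption for illustration only. When every record shares the same chromosome, sorting on (chromosome, position) and on position alone gives the same order.

```python
# Made-up records: (read_id, strand, chromosome, position).
# The column layout is an assumption, not the script's confirmed format.
records = [
    ("read3", "+", "chr1", 900),
    ("read1", "-", "chr1", 120),
    ("read2", "+", "chr1", 455),
    ("read4", "-", "chr1", 455),
]

# Sort with the chromosome key included.
by_chrom_and_pos = sorted(records, key=lambda r: (r[2], r[3]))

# Sort on position only.
by_pos_only = sorted(records, key=lambda r: r[3])

# Both sorts are stable, so ties on position keep their input order in each.
same_order = by_chrom_and_pos == by_pos_only
print(same_order)
```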
