**kshankar** · 06-22-2012, 12:29 PM

We have recently run into a problem with our RRBS libraries, and I wonder whether someone could shed some light on it or has encountered something similar. The library preparation looks fine, except that only 30% of all sequences start with CGG/TGG. The quality of the DNA is good and shows a high-molecular-weight band on both agarose and PAGE gels. We cut 500 ng of DNA using 50 U of MspI overnight at 37 °C. After ligation with methylated adapters and two rounds of BS conversion, the ligated DNA is amplified using pfu Turbo Hotstart. I assume that you would expect a much greater proportion of reads beginning with CGG/TGG. Any suggestions on what we are missing?

**kalyankpy** · 06-23-2012, 02:13 AM

@kshankar

It is not uncommon to find reads that don't start with CGG or TGG. However, observing only 30% of reads with CGG or TGG as the starting triplet is a concern. In our experience, around 10-15% of reads start with a different triplet. This could be due to non-specific activity of the enzyme or to degraded or fragmented DNA at the starting step. I noticed that you use 50 U of MspI for just 500 ng. This may be too much for this reaction. We generally follow the Gu et al. 2011 protocol, which uses 10 U for 300 ng and works fine. We have also tested Fermentas FastDigest MspI (1 µl for 500 ng) for 2 hrs, which also works fine for us! You may also refer to the recent paper by Altuna Akalin et al. 2012, which explains a similar kind of observation.
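To put a number on the proportion discussed above, the start triplets can be tallied directly from the FASTQ file. This is only a minimal sketch: the function name `start_triplet_fraction` is made up for illustration, and it assumes plain four-line FASTQ records (no wrapped sequence lines), optionally gzipped.

```python
import gzip
from collections import Counter


def start_triplet_fraction(fastq_path, expected=("CGG", "TGG")):
    """Return (fraction of reads starting with an expected triplet,
    Counter of all observed start triplets).

    Assumes four-line FASTQ records; the name and signature are
    illustrative, not part of any existing tool.
    """
    opener = gzip.open if fastq_path.endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        for line_number, line in enumerate(handle):
            # The sequence is the second line of every four-line record.
            if line_number % 4 == 1:
                counts[line[:3].upper()] += 1
    total = sum(counts.values())
    hits = sum(counts[triplet] for triplet in expected)
    return (hits / total if total else 0.0), counts
```

Calling `counts.most_common(10)` on the returned counter shows which other triplets dominate, which may help distinguish non-specific cutting from fragmented input DNA.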

**fkrueger** · 07-16-2012, 04:22 AM

We have just released Bismark version 0.7.5. This version incorporates changes that were made to the latest release of Bowtie 2 (2.0.0-beta7):

- Trailing read ID segment numbers (e.g. /1,/2 or /3) are now removed internally for Bowtie 2 alignments in paired-end mode as this might have caused no reads to align at all if the segment number was not 1 or 2. As of Bowtie 2 version 2.0.0-beta7 this behavior has been disabled for unpaired reads
- The Bowtie 2 option -M is now deprecated (as of Bowtie 2 version 2.0.0-beta7). What used to be called -M mode is still the default mode, but adjusting the -M setting is deprecated. The options -D and -R should be used to adjust the effort expended to find valid alignments
- Changed the default seed mismatch parameter (controlled by -n) to 1 (down from 2). This increases alignment speed noticeably and typically produces very similar results for good quality read data
- Fixed a bug where the chromosomal sequence could not be extracted for very short genomic sequences for alignments with Bowtie 2
- The methylation extractor and the Bismark alignment output deduplication script now read both raw and gzipped (.gz) Bismark mapping files

Bismark is available for download from http://www.bioinformatics.babraham.a...jects/bismark/.

**yuggoth** · 07-25-2012, 11:10 PM

Running Bismark (Bowtie 2), I got the following SAM record:
read1 0 chrPt 61863 255 16M1I33M * 0 0 GGTTTTATAAATGGTATTTTTTTGATATTGTATTTGAAGTAGTTGTTAAA FFFFHHGHHJJJJJCHHIJJJJJJIJJJJJIJJJJJJJJHIIIJJJJJIJ NM:i:11 XX:Z:4CC1C3C4NC2CC5C11C10 XM:Z:....hh.h...z.....h..hx.....x...........x.......... XR:Z:CT XG:Z:CT
Running the Bismark methylation_extractor on this SAM file gave the following calls:
chrPt 61883 h and chrPt 61884 x
However, these chromosome positions differ from what I expected:
chrPt 61882 h and chrPt 61883 x

61863 |||||||0||||5|||-|0||||5||||0||||5||||0||||5||||0|
61863 GGTTCCACAAACGGTA-CTTCCTGATACTGTATTTGAAGCAGTTGTTAAA reference
61863 GGTTTTATAAATGGTATTTTTTTGATATTGTATTTGAAGTAGTTGTTAAA read
61863 ....hh.h...z.....h..hx.....x...........x..........

Can the methylation_extractor not handle alignments that contain indels?

**dpryan** · 07-26-2012, 01:32 AM

Oddly, it appears that the answer is no, it can't. Looking at the copy of the methylation_extractor that I have, the CIGAR string isn't passed into, for example, the print_individual_C_methylation_states_paired_end_files function. So the address of the methylation calls is set to start+index, which will be off for reads containing indels. This is easy enough to fix by first parsing the CIGAR string and creating a nucleotide position array from it.
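The fix described above (parse the CIGAR string, then build a per-base position array) can be sketched roughly as follows. This is not the actual Bismark code, just an illustration in Python; the function names are hypothetical, and it assumes a forward-strand read whose methylation call string has one character per read base.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def read_to_reference_positions(start, cigar):
    """Map each read base to a 1-based reference position.

    Inserted or soft-clipped bases have no reference position and
    are reported as None.
    """
    positions = []
    ref_pos = start
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op in "M=X":      # consumes both read and reference
            positions.extend(range(ref_pos, ref_pos + length))
            ref_pos += length
        elif op in "IS":     # consumes read only
            positions.extend([None] * length)
        elif op in "DN":     # consumes reference only
            ref_pos += length
        # H and P consume neither read nor reference.
    return positions


def methylation_calls(start, cigar, call_string):
    """Pair every non-'.' call with its reference position."""
    positions = read_to_reference_positions(start, cigar)
    return [
        (pos, call)
        for pos, call in zip(positions, call_string)
        if call != "." and pos is not None
    ]
```

For the read posted earlier in the thread (start 61863, CIGAR 16M1I33M), the calls at read offsets 20 and 21 map to 61882 (h) and 61883 (x) rather than start+index, which matches the positions yuggoth expected.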

@Felix: As part of the methylation extractor I wrote to deal with my non-standard data I included the ability to deal with this issue (at least in the circumstances present in my data). I'd be happy to send you some code if it's helpful (though I wrote it in C, so it's probably easier to just write it from scratch in perl).

**fkrueger** · 07-26-2012, 02:05 AM

This is indeed a shortcoming of the methylation extractor which I have simply not thought about... I'll try to fix this as soon as possible. In the meantime, could someone please send me a few lines of Bismark (Bowtie2) alignment output with and without indels via email (just 10 lines will do)? I could make something up myself but I am currently not at work so it would be a lot easier.

@Devon: I wouldn't mind taking a look at your code, but maybe you are right and it's quicker to just write it from scratch (it's too hot to go outside anyway...).

Thanks,
Felix

**dpryan** · 07-26-2012, 04:19 AM

Check your inbox, I sent the source code as well as some example aligned reads.

Best,
Devon

**fkrueger** · 07-26-2012, 06:23 AM

Thanks for that, I am on it already. Cheers, Felix

**drdna** · 07-26-2012, 12:07 PM

Dear Felix,

My colleague and I have been trying to run the bismark program and it has been driving us up the wall! Everything appears to proceed just fine up to reading in the fastq file (for example, it finds the two "preliminary" alignments) but then it seems to skip the actual "bismarking" and proceeds immediately to produce an empty results report:

Reading in the sequence file 010CA_KUEY_108_trim230.fastq
Processed 1000000 sequences so far
Processed 2000000 sequences so far
Processed 2115193 sequences in total

Successfully deleted the temporary file 010CA_KUEY_108_trim230.fastq_C_to_T.fastq

Final Alignment report
======================
Sequences analysed in total: 2115193
Number of alignments with a unique best hit from the different alignments: 0
Mapping efficiency: 0.0%

Sequences with no alignments under any condition: 2115193
Sequences did not map uniquely: 0
Sequences which were discarded because genomic sequence could not be extracted: 0

Number of sequences with unique best (first) alignment came from the bowtie output:
CT/CT: 0 ((converted) top strand)
CT/GA: 0 ((converted) bottom strand)
GA/CT: 0 (complementary to (converted) top strand)
GA/GA: 0 (complementary to (converted) bottom strand)

Number of alignments to (merely theoretical) complementary strands being rejected in total: 0

Final Cytosine Methylation Report
=================================
Total number of C's analysed: 0

Total methylated C's in CpG context: 0
Total methylated C's in CHG context: 0
Total methylated C's in CHH context: 0

Total C to T conversions in CpG context: 0
Total C to T conversions in CHG context: 0
Total C to T conversions in CHH context: 0

Can't determine percentage of methylated Cs in CpG context if value was 0
Can't determine percentage of methylated Cs in CHG context if value was 0
Can't determine percentage of methylated Cs in CHH context if value was 0

Not sure what we are doing wrong, but the following is our command line:

bismark_v0.7.5/bismark -n 1 -l 20 --bowtie2 --path_to_bowtie /usr/local/bin AGTC_CLIENTS/CHEN/Rattus 010CA_KUEY_108_trim230.fastq

I'd appreciate any insights you might have.

Thanks.

**fkrueger** · 07-26-2012, 12:18 PM

Hi drdna,

This looks indeed like the alignment part is being skipped completely, and it is almost certainly caused by the read ID of your reads (e.g. trailing /3 or so).

Could you send me a few lines (10 will do) of your FastQ file via mail so I can take a look at it tomorrow?

**fkrueger** · 07-26-2012, 12:36 PM

Just as another thought, have you upgraded to the latest version of Bowtie 2 (beta7)? This might fix the problem as the authors have changed the way read IDs are handled for single-end files...
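If upgrading is not an option, one way to test the read ID suspicion is to strip any trailing segment number from the FastQ headers before alignment. The following is only a sketch under that assumption (the cause has not been confirmed for this file), and the helper name `strip_segment_number` is made up for illustration.

```python
import re

# A trailing "/<number>" at the end of the read ID, e.g. /1, /2 or /3.
TRAILING_SEGMENT = re.compile(r"/\d+$")


def strip_segment_number(header):
    """Remove a trailing /<number> from the read ID of a FastQ header.

    The read ID is taken to be the first space-delimited token; any
    description after the first space is left untouched.
    """
    header = header.rstrip("\n")
    read_id, separator, description = header.partition(" ")
    return TRAILING_SEGMENT.sub("", read_id) + separator + description
```

Applied to every fourth line of the file (the header line of each record), this turns `@read1/3` into `@read1` and leaves IDs without a segment number unchanged.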

**fkrueger** · 07-30-2012, 08:10 AM

Dear Yuggoth,

I have now spent quite some time adapting the methylation extractor to handle InDels correctly. I have done some testing here already where it seemed to work as expected but it is possible that I have missed something. Could you run the attached version on your file and see if it fixes your problems? If it does I'll release a new version as soon as possible.

Best,
Felix

Attached Files

methylation_extractor_indel_aware.pl (111.5 KB, 30 views)

**fkrueger** · 07-31-2012, 07:26 AM

We have just released a new version of Bismark (v0.7.6). This version mainly fixes the way in which SAM files (both single and paired-end) are handled in the methylation extractor, because reads containing insertions or deletions would result in slightly offset methylation calls. Reads containing InDels, which may be generated by Bismark using Bowtie 2, are now handled as intended. Bismark users employing Bowtie 2 for alignments are strongly encouraged to upgrade to this version.

We have also changed the way in which the methylation extractor identifies the read and genome conversion flags in SAM output. This might become relevant if the Bismark SAM mapping output was compressed/decompressed with CRAM or Goby at some point, since these tools may change the order of optional tags in a SAM entry. Thanks to Z. Zeno for pointing this out and contributing a patch.
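To illustrate why tag order matters, the sketch below looks up the conversion flags by tag name rather than by column position. It is not the patch that went into Bismark, just a minimal Python illustration with hypothetical function names; it relies only on the SAM convention that optional fields start at column 12 and have the form TAG:TYPE:VALUE.

```python
def parse_optional_tags(sam_line):
    """Return the optional SAM fields as a {tag: value} dictionary.

    Values are kept as strings regardless of the declared type.
    """
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # columns 1-11 are mandatory
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def conversion_flags(sam_line):
    """Return (read conversion, genome conversion) from the XR and XG tags."""
    tags = parse_optional_tags(sam_line)
    return tags["XR"], tags["XG"]
```

Because the lookup is keyed on the tag name, the result is the same whether XR and XG appear last, first, or in between other tags.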

Bismark is available from here: http://www.bioinformatics.babraham.a...jects/bismark/ (you might have to force a cache update with Shift + refresh).

**shawpa** · 08-13-2012, 05:40 AM

error with Prinseq output file to bismark?

I was wondering if anyone has used prinseq to trim fastq files then tried to put those fastq files into bismark for alignment and had any issues. I generated some trimmed fastq files and when I try and put those into bismark I keep getting the error of "no such file or directory" for the fastq file. The name tab completes in my command line so I know it is there and the name is correct. The fastq files that I generated can be uploaded into fastqc just fine. Just wondering if it is a formatting issue that I need to adjust. Any suggestions as how I can move forward would be great.

**shawpa** · 08-21-2012, 04:55 AM

temp directory for bismarktobedgraph script

I am using the bismarktobedgraph script from the website. Because it doesn't have a temp directory option I am running it from the directory that has the space. It seems that it is running out of memory still and my standard error file says "sort: write failed: /tmp/sortMpgIOA: No space left on device". I am not sure why it is still writing to this directory. Is there something in the script that needs to be adjusted?
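The error message suggests that the script hands the sorting to the system `sort` command, which writes its temporary files to /tmp regardless of the current working directory. GNU `sort` uses the directory named in the TMPDIR environment variable when it is set (or the one passed with its -T option), so one possible workaround is to launch the script with TMPDIR pointing at a disk with enough space. The wrapper below is only a sketch under that assumption; `run_with_tmpdir` is a made-up helper, and the command passed to it is whatever command line you already use.

```python
import os
import subprocess


def run_with_tmpdir(command, tmpdir):
    """Run a command with TMPDIR set, so that child processes such as
    GNU sort place their temporary files in tmpdir instead of /tmp.

    `command` is a list of arguments, exactly as you would type them.
    """
    env = dict(os.environ, TMPDIR=tmpdir)
    return subprocess.run(command, env=env, check=True)
```

The same effect can be had without Python by setting TMPDIR in the shell before starting the script.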
