That looks like it is working! Thanks a lot.
I did see that flag in the source code but I used "null" instead which was an invalid argument.
My new BAM file is being written to disk now. I will check results and hopefully it makes sense this time.
Thanks again.
-
Hi Dan
Try adding "VALIDATION_STRINGENCY=LENIENT" to the end of your command line and see if this solves the problem.
Aligners often set incorrect flags or fields in their output, and downstream tools such as Picard can reject those records under their default strict validation.
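For reference, here is a rough sketch of the original call with the flag appended. The jar path and file names are copied from the original post (reading the output name as 101119_first4lanes_bwaNoDup.bam); the function name mark_duplicates_cmd is only a hypothetical helper that prints the command so it can be checked before running. With LENIENT, Picard should report validation problems as warnings instead of aborting.

```shell
# Path to the Picard jar, as given in the original post.
PICARD_JAR=/home/bornmand/tools/picard/MarkDuplicates.jar

# Hypothetical helper: prints the full command line (a dry run) rather than
# executing it, so the arguments can be inspected first.
mark_duplicates_cmd() {
  echo java -jar "$PICARD_JAR" \
    INPUT=101119_first4lanes_bwa.bam \
    OUTPUT=101119_first4lanes_bwaNoDup.bam \
    METRICS_FILE=101119_first4lanes_duplicateINFO.txt \
    REMOVE_DUPLICATES=true \
    ASSUME_SORTED=true \
    VALIDATION_STRINGENCY=LENIENT
}

mark_duplicates_cmd
# To actually run it once the printed command looks right:
# eval "$(mark_duplicates_cmd)"
```

If the warnings are too noisy, VALIDATION_STRINGENCY=SILENT is the other non-strict setting.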
Alternatively you can edit your BAM file to correct the record which is throwing up this error.
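If you go the editing route, one possible approach (a sketch only, not something tested on your data) is to stream the file through samtools as SAM text and blank the mate fields on every read whose paired bit (0x1) is not set. The function name fix_unpaired_mate_fields and the file names in.bam and fixed.bam are made up for illustration, and I am assuming that clearing columns 7 to 9 (mate reference, mate position, insert size) is enough to satisfy the validator.

```shell
# Hypothetical filter: reads SAM text on stdin, writes SAM text on stdout.
# For unpaired reads (FLAG bit 0x1 not set, i.e. FLAG is even) it resets
# column 7 (mate reference) to "*" and columns 8 and 9 (mate position,
# insert size) to 0. Header lines and paired reads pass through unchanged.
fix_unpaired_mate_fields() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^@/ { print; next }
       ($2 % 2) == 0 { $7 = "*"; $8 = 0; $9 = 0 }
       { print }'
}

# Intended use (needs samtools on the PATH; older samtools versions may
# also need -S on the second call to read SAM input):
# samtools view -h in.bam | fix_unpaired_mate_fields | samtools view -b -o fixed.bam -
```

Keep the original BAM until you have checked that the fixed file passes validation.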
Hope this helps.
-
Picard - MarkDuplicates (remove PCR duplicates)
There seem to be only a few options for removing PCR duplicates from Illumina fastq data and/or alignment data. I have used FASTX (fastx_collapser) to remove duplicates in fastq files, but this takes a long time when run against human data of ~10Gb per lane.
The other option I have tried is SAMTOOLS (rmdup), but the documentation admits that it doesn't work well (or at all) with single end data. I have also noticed strange results when using it on my own data sets (e.g. reads removed that were clearly not duplicates).
The alternative suggested on the SAMTOOLS site is to use PICARD. Implementation is straightforward and uses precompiled jar files with well-documented options.
Unfortunately, I am getting a strange exception when I run the MarkDuplicates jar file. It seems to recognize something in my data that makes it treat the reads as paired end. However, my data are single end reads.
Does anyone have any experience using this command in Picard? Here are my call and the error being thrown.
Call:
Code:
java -jar /home/bornmand/tools/picard/MarkDuplicates.jar INPUT=101119_first4lanes_bwa.bam OUTPUT=101119_first4lanes_bwaNoDup.bam METRICS_FILE=101119_first4lanes_duplicateINFO.txt REMOVE_DUPLICATES=true ASSUME_SORTED=true
Code:
INFO 2011-03-16 08:09:46 MarkDuplicates Start of doWork freeMemory: 62375408; totalMemory: 63111168; maxMemory: 935854080
INFO 2011-03-16 08:09:46 MarkDuplicates Reading input file and constructing read end information.
INFO 2011-03-16 08:09:46 MarkDuplicates Will retain up to 3713706 data points before spilling to disk.
INFO 2011-03-16 08:09:46 MarkDuplicates Assuming input is coordinate sorted.
[Wed Mar 16 08:09:46 EDT 2011] net.sf.picard.sam.MarkDuplicates done.
Runtime.totalMemory()=92864512
Exception in thread "main" java.lang.RuntimeException: SAM validation error: ERROR: Record 1, Read name 4:20:14143:2730:Y, MRNM should not be set for unpaired read.
    at net.sf.samtools.SAMUtils.processValidationErrors(SAMUtils.java:334)
    at net.sf.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:449)
    at net.sf.samtools.BAMFileReader$BAMFileIterator.<init>(BAMFileReader.java:413)
    at net.sf.samtools.BAMFileReader$BAMFileIterator.<init>(BAMFileReader.java:403)
    at net.sf.samtools.BAMFileReader.getIterator(BAMFileReader.java:206)
    at net.sf.samtools.SAMFileReader.iterator(SAMFileReader.java:288)
    at net.sf.samtools.SAMFileReader.iterator(SAMFileReader.java:37)
    at net.sf.picard.sam.MarkDuplicates.buildSortedReadEndLists(MarkDuplicates.java:271)
    at net.sf.picard.sam.MarkDuplicates.doWork(MarkDuplicates.java:113)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:156)
    at net.sf.picard.sam.MarkDuplicates.main(MarkDuplicates.java:97)
Thanks.
Dan