Oh, wait, it says "truncated", so presumably the problem is at the end of the file. Can you run "tail" on the file and post the last two lines?
-
HISEQHI:525:HCYWJADXX:2:2213:8924:55099 256 * 942639 0 43M * 0 0 CAAAGGGCTGAGAAGCACTTGAAAAAATGTTCAACATCCTTAA CCCFFFFFHHHHHJJJJJJJJJJJJJJJJIIJJJJJJJJJJJJ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:43 YT:Z:UU NH:i:20 CC:Z:chrX CP:i:128687718 XS:A:+ HI:i:17
HISEQLN:122:HCW3JADXX:2:2207:7052:25724 272 * 944767 0 43M * 0 0 TACTTACATATAATAAATAAATAAATAAATATTTTTTAAAAAA IFIIGJIJIIIGGIJIJIGFFCIHGIGIIHDHFFHFFDDF@@@ AS:i:0 XN:i:0 XM:i:0 XO:i:0 XG:i:0 NM:i:0 MD:Z:43 YT:Z:UU NH:i:11 CC:Z:chr6 CP:i:52981629 XS:A:- HI:i:9
HISEQLN:121:HCYV3ADXX:1:1203:18633:64996 0 * 949324 043M * 0 0 CAGAACCCCTGAAATTGGCAAGATAGACGTCAGTGTTAGCAGA CCCFFFFFHHHHHJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJ AS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:5G37 YT:Z:UU NH:i:20 CC:Z:chr6 CP:i:6419658 XS:A:+ HI:i:12
HISEQLN:122:HCW3JADXX:1:1112:13385:80114 272 * 949722 043M * 0 0 GGTGTCCGCTAGTGTCCTGAGGCCTGAGCGAGGGGCTCCTCTC ##A7'?DFD;BD:3GGDDDIHG@EFFEFADB?<7DD::@=1 AS:i:-2 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:11T31 YT:Z:UU NH:i:20 CC:Z:chr6 CP:i:71166409 XS:A:- HI:i:15
-
Assuming everything that looks like a space is actually a tab (sorry, tabs often get replaced by spaces on the console), I don't see anything wrong with the SAM file, so I don't know what the problem is. It may be that a negative number is being detected where a positive number is expected, but I'm just speculating.
You could try Picard rather than Samtools, and see if you have better luck. Or, try the most recent version of Samtools, or else v0.1.19. Sometimes there's a problem with a specific version.
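The tabs-versus-spaces question above can be checked mechanically rather than by eye. Below is a minimal Python sketch, not part of Samtools or Picard, that tests a few basic invariants of a single SAM alignment line: at least 11 tab-separated fields, a well-formed CIGAR whose query length matches SEQ, and SEQ and QUAL of equal length. The function name check_sam_line and the exact message strings are made up for illustration, and the checks are far from a full validation of the format.

```python
import re

# One CIGAR operation: a length followed by an operator letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def check_sam_line(line):
    """Return a list of problems found in one SAM alignment line (hypothetical helper)."""
    problems = []
    fields = line.rstrip("\n").split("\t")

    # SAM requires 11 mandatory tab-separated fields; spaces do not count.
    if len(fields) < 11:
        problems.append(
            f"expected at least 11 tab-separated fields, found {len(fields)}"
        )
        return problems

    mapq, cigar, seq, qual = fields[4], fields[5], fields[9], fields[10]

    if not mapq.isdigit():
        problems.append(f"MAPQ is not a non-negative integer: {mapq!r}")

    if cigar != "*":
        if re.fullmatch(r"(\d+[MIDNSHP=X])+", cigar) is None:
            problems.append(f"malformed CIGAR: {cigar!r}")
        elif seq != "*":
            # Only M, I, S, = and X consume bases of the read.
            query_len = sum(
                int(n) for n, op in CIGAR_OP.findall(cigar) if op in "MIS=X"
            )
            if query_len != len(seq):
                problems.append(
                    f"CIGAR query length {query_len} != SEQ length {len(seq)}"
                )

    if seq != "*" and qual != "*" and len(seq) != len(qual):
        problems.append(f"SEQ length {len(seq)} != QUAL length {len(qual)}")

    return problems
```

To apply it to the end of a file, one could pass each of the last few lines (skipping header lines that start with "@") to check_sam_line and print whatever it returns. An empty list only means these particular checks passed, not that the file is valid.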
-
Hi,
Sorry to revive this thread, but I have a similar desire to filter based on length and was excited to learn about reformat!
I've run into an issue, but I'm pretty dumb, so I'm sure I've just confused something simple.
I've downloaded bbmap and have tried to get reformat to work but I'm not having any luck.
When I try the following:
sh ~/tools/bbmap/reformat.sh in=input.bam out=output.bam minlength=1 maxlength=100
I get the following error message:
Found samtools.
Input is being processed as unpaired
[samopen] SAM header is present: 84 sequences.
java.lang.AssertionError
at stream.SamLine.toShortMatch(SamLine.java:1257)
at stream.SamLine.toRead(SamLine.java:1879)
at stream.SamLine.toRead(SamLine.java:1749)
at stream.SamReadInputStream.toReadList(SamReadInputStream.java:119)
at stream.SamReadInputStream.fillBuffer(SamReadInputStream.java:90)
at stream.SamReadInputStream.nextList(SamReadInputStream.java:74)
at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:656)
at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:635)
Input: 110600 reads 16384426 bases
Short Read Discards: 110034 reads (99.49%) 16340390 bases (99.73%)
Output: 566 reads (0.51%) 44036 bases (0.27%)
Time: 1.287 seconds.
Reads Processed: 110k 85.94k reads/sec
Bases Processed: 16384k 12.73m bases/sec
Exception in thread "main" java.lang.RuntimeException: ReformatReads terminated in an error state; the output may be corrupt.
at jgi.ReformatReads.process(ReformatReads.java:1098)
at jgi.ReformatReads.main(ReformatReads.java:43)
I'm still really excited by the potential of reformat, any advice would be greatly appreciated.
-
Wow! Thanks for the quick reply GenoMax!
Sadly that doesn't alleviate my issue:
Exception in thread "main" java.lang.RuntimeException: ReformatReads terminated in an error state; the output may be corrupt.
at jgi.ReformatReads.process(ReformatReads.java:1098)
at jgi.ReformatReads.main(ReformatReads.java:43)
-
It appears that there was some problem processing the line's MD tag. In this case, since you are just filtering based on length, that should not matter and you can just add the flag "-da" to ignore the error, which does not affect the output in this case. I added code to print out the problematic line when that happens in the future. If it's a very small bam file you could email it to me so I can see what the problem is.
-
Brian,
Would it be possible to use reformat.sh to filter on the fragment length rather than the read length? I'm looking for a way to split paired-end ATAC-Seq .sam files into "nucleosome-free" and "nucleosome-bound" regions based on size of the fragment, and the proposed solutions I've found elsewhere have been a dead end. Thanks!
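Whether reformat.sh can filter on fragment length is not answered in this thread. As a stand-alone illustration of the idea, here is a minimal Python sketch that splits SAM lines on the absolute value of the TLEN column (the ninth mandatory field), which aligners normally set to the inferred fragment length for properly paired reads. The function name split_by_fragment_length and the 100 bp cutoff are assumptions for illustration only; the right cutoff depends on the experiment.

```python
def split_by_fragment_length(lines, cutoff=100):
    """Split SAM lines into (short, long) lists by |TLEN| (hypothetical helper).

    Header lines (starting with "@") are copied to both outputs.
    Records with TLEN == 0 (unpaired, or mates on different references)
    are dropped, since no fragment length is available for them.
    The default cutoff of 100 is a placeholder, not a recommendation.
    """
    short_frags, long_frags = [], []
    for line in lines:
        if line.startswith("@"):
            short_frags.append(line)
            long_frags.append(line)
            continue

        fields = line.split("\t")
        # TLEN is signed: positive for the leftmost mate, negative for the
        # rightmost, so both mates of a pair share the same absolute value.
        tlen = abs(int(fields[8]))
        if tlen == 0:
            continue

        if tlen <= cutoff:
            short_frags.append(line)
        else:
            long_frags.append(line)
    return short_frags, long_frags
```

Because both mates carry the same absolute TLEN, a pair always lands in the same output. For a large file the same loop could write to two open file handles instead of building lists in memory.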