Seqanswers Leaderboard Ad
Collapse
X
-
Cool, glad that's working for you. I totally forgot the -h in my example and only added the header stuff after the fact, which is why I didn't notice the missing continue :P Glad you were able to fix that properly!
-
-
Thanks a LOT Devon,
I added -h to the upstream samtools view cmd to forward the sam header and 'continue' to the code to process header lines and directly go to the next loop
HTML Code:for read in f : #deal with the header if(read[0] == '@') : of.write("%s" % read) continue
HTML Code:samtools view -h <name_sorted.bam> | \ bam_re-pair.py | \ samtools view -bSo <name_sorted.filtered.bam> -
Thanks you really for this code, it helped me a lot
S
### Perl translation of Devon python code
HTML Code:#!/usr/bin/perl -w # filter unpaired reads from a - read-name sorted - BAM file # bam_re-pair.pl # author: Stephane Plaisance (translated from python version by Devon Ryan # http://seqanswers.com/forums/showthread.php?p=118936#post118936 # usage: # samtools view -h <name_sorted.bam> | \ # bam_re-pair.pl | \ # samtools view -bSo <name_sorted.filtered.bam> - use warnings; use strict; # variables my $read = ""; my $read1 = "none"; my $read2 = "none"; my $name1 = "none"; my $name2 = "none"; my ($ln,$ok,$no)=(0,0,0); while (my $read = <>) { # forward header lines if ($read =~ /^@/){ print STDOUT $read; next; } # process data $ln++; if( $name1 eq "none" ){ $read1 = $read; $name1 = (split("\t", $read1))[0]; } else { $name2 = (split("\t", $read))[0]; if( $name1 eq $name2 ){ # is paired $ok++; print STDOUT sprintf("%s%s", $read1, $read); $read1 = "none"; $name1 = "none"; } else { # is not paired $no++; $read1 = $read; $name1 = (split("\t", $read1))[0]; } } } # report counts print STDERR sprintf("\n########################\n# Results\n# processed:\t%8d\n# passed:\t%8d\n# rejected\t%8d\n", $ln, $ok, $no); exit 0;
Leave a comment:
-
-
Fair enough. The follow isn't perl, which I generally loathe, but it's a simple python solution:
Code:#!/usr/bin/env python import sys f = sys.stdin of = sys.stdout read1 = None name1 = None for read in f : #deal with the header if(read[0] == '@') : of.write("%s" % read) if(name1 == None) : read1 = read name1 = read1.split("\t")[0] else : name2 = read.split("\t")[0] if(name1 == name2) : of.write("%s%s" % (read1, read)) read1 = None name1 = None else : read1 = read name1 = read1.split("\t")[0]
Code:samtools view name_sorted.bam | blah.py | samtools view -bSo name_sorted.filtered.bam -
Edit: A perl solution could be similar. You'd use while(<>) for the loop and then probably just chomp() that to split things, though perhaps there are more appropriate perl methods than those. The general work-flow could be the same, though.Last edited by dpryan; 10-16-2013, 02:14 AM.
Leave a comment:
-
-
I would like to learn how to clean that file in order to be able to redo such operation with future data having similar issues.
Thanks for the links anyway.
Leave a comment:
-
-
cleaning partial PE sam data
Hello there,
I obtained PE data from Illumina (chr21 subset of NA18507 - ftp://webdata:[email protected]..._100_chr21.bam).
After a lot of misery and systematic LENIENT use of picard I could manage to use the data to extract fastQ paired reads from it and remap them to another reference build.
BUT
I discovered (among many other serious SAM compliancy problems) that not all reads are present in that file and many pairs have lost one end (which probably did not map on chr21 and was filtered out uncleanly!)
My question is how to clean such SAM/BAM where FLAGS indicate paired reads but one read of the pair is not present anymore.
- I cannot use the SAM flags to do it because they are erroneous
- I could not fix the flags to reflect the true paired status
Below is the head of the original name-sorted reads showing the absence of one of the 'EAS51_0210:7:33:5109:13959' reads (many 1000's like that)
Thanks for your suggestions on which command to use and how to eliminate these reads to obtain a fully paired file on which bam2fastq will run smoothly. Preferentially using picard and not with some fancy perl code keeping only '*' in the 7th column.
Thanks a lot for your lights,
Stephane
Code:EAS51_0210:3:6:3797:7459 165 chr21 9719702 255 * * 0 0 AACCTTTGTTTGGATGGAGCAGTTTGTAAACAATCCTTTTGTAGAATCTGCAAAGGTATATTTCTGAGCCCATTGAGGCCTATGGTGAAATACGAAATAT GGGGGGGGGGGGGFGGFEGGGGEGGGEGGGFDFBGGEFEFGEEGEGFEGGEGEEED?EEEGEEGBEBDGEEEEED=DCCCEBEEEEEEEAAC@DDB:CCC H0:i:0 H1:i:0 H2:i:2 SM:i:-1 AS:i:0 EAS51_0210:3:6:3797:7459 89 chr21 9719702 73 100M * 0 0 ATATTTGGAGCGCTTTGAGGCCTATGGTAAAAAAGGAAATACCATCACATAAAAATTCGATGGAAGAATTCTGAGAAACTTCTTTGTGAGGGTTGGATTC DDC@BEEEEEEGEEBFGEGG@EEDDBEEGEGGGGFFFFGEECGFGGEGGGGGGGGDFGGGFFGGGGFGFEGGGFGGGGGGBGGEGGFGGGGGGGGGGGGG XD:Z:NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN34 SM:i:73 AS:i:0 EAS51_0210:7:33:5109:13959 145 chr21 9719707 254 100M chrY 10653706 0 TGGAGCGCTTTGAGGCCTATGGTAAAAAAGGAAATACCATCACATAAAAATTCGATGGAAGAATTCTGAGAAACTTCTTTGTGAGGGTTGGATTCATCTC FEGDGEGEEEEGFEGEEGEEDFEGGGGGGEFGFGFAFFFEGEGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGFGGGGGGGGGGGGGGGGGGGGGGGG XD:Z:NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN36T2 SM:i:461 AS:i:0 EAS25_0078:8:23:14907:11377 165 chr21 9719708 255 * * 0 0 CTTTTGTAGAATCTGCAAAGGTATATTTCTGAGCCCATTGAGGCCTATGGTGAAATACGAAATATCTTCCCATAAAAACTAGACAGAAGGTTTCTAAGAA GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGEGEFGGGGGGGGGEGGGGGGGGGGFGFGDGGGGGFDFBFGCEBFFFEGFF H0:i:1 H1:i:6 H2:i:60 SM:i:-1 AS:i:0 EAS25_0078:8:23:14907:11377 89 chr21 9719708 254 100M * 0 0 TGGAGCGCTTTGAGGCCTATGGTAAAAAAGGAAATACCATCACATGAAATTCGATGGAAGAATTCTGAGAAACTTCTTTGTGAGGGTTGGATTCATCTCA EGEEEEGEBGFGGEEEEEGGDEBGFGGGGGGGGGGAGFGGGEGGGGGGGGGGGGGGGGGGGFGGGGGGGGEEGGGGGGGGGGGGGGGGGGGGGGGGGGGG XD:Z:NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN36T3 SM:i:461 AS:i:0 EAS25_0078:3:4:1830:9254 101 chr21 9719709 255 * * 0 0 TTGAACCTTTGTTTGGATGGAGCAGTTTGTAAACAATCCTTTTGTAGAATCTGCAAAGGTATATTTTTGAGCCCATTGAGGCCTATGGTGAAATACGAAA GGGGGGGGGGGGGGGGGGFGBGGGGGGGFEGGGGEGGGGGGGGGEGGGGGGEFGGEFGGEFFFFF/&8?@EEECCFGFGGFDFGFEGF?DEEDEFEEFEE H0:i:0 H1:i:0 H2:i:3 SM:i:-1 AS:i:0 EAS25_0078:3:4:1830:9254 153 chr21 9719709 254 100M * 0 0 GGAGCGCTTTGAGGCCTATGGTAAAAAAGGAAATACCATCACATGAAATTCGATGGAAGAATTCTGAGAAACTTCTTTGTGAGGGTTGGATTCATCTCAC EEBE?EEEEEEEEGEEBEEEBGGGEGGGEFAFFECEEE=EDGGFGGGGDGFFGGGGGGGGGEEGGGGDGGGEGFGFGFGGGGGGGGGDGGGEGGFGGGGG XD:Z:NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN36T4 SM:i:461 AS:i:0
Tags: None
Latest Articles
Collapse
-
by seqadmin
The COVID-19 pandemic highlighted the need for proactive pathogen surveillance systems. As ongoing threats like avian influenza and newly emerging infections continue to pose risks, researchers are working to improve how quickly and accurately pathogens can be identified and tracked. In a recent SEQanswers webinar, two experts discussed how next-generation sequencing (NGS) and machine learning are shaping efforts to monitor viral variation and trace the origins of infectious...-
Channel: Articles
03-24-2025, 11:48 AM -
-
by seqadmin
This year’s Advances in Genome Biology and Technology (AGBT) General Meeting commemorated the 25th anniversary of the event at its original venue on Marco Island, Florida. While this year’s event didn’t include high-profile musical performances, the industry announcements and cutting-edge research still drew the attention of leading scientists.
The Headliner
The biggest announcement was Roche stepping back into the sequencing platform market. In the years since...-
Channel: Articles
03-03-2025, 01:39 PM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 03-20-2025, 05:03 AM
|
0 responses
41 views
0 reactions
|
Last Post
by seqadmin
03-20-2025, 05:03 AM
|
||
Started by seqadmin, 03-19-2025, 07:27 AM
|
0 responses
49 views
0 reactions
|
Last Post
by seqadmin
03-19-2025, 07:27 AM
|
||
Started by seqadmin, 03-18-2025, 12:50 PM
|
0 responses
36 views
0 reactions
|
Last Post
by seqadmin
03-18-2025, 12:50 PM
|
||
Started by seqadmin, 03-03-2025, 01:15 PM
|
0 responses
191 views
0 reactions
|
Last Post
by seqadmin
03-03-2025, 01:15 PM
|
Leave a comment: