Yes, still interested!
I was just about to start doing this myself, but I thought I'd check this forum again... Actually, I would really like to use BFAST for SOLiD, but I just don't have the computing resources available to me...
anyway, thanks again...
-
Just in case anyone's still interested:
I wrote a small (rather dirty) Python script to integrate those tags. It is slow and relies on the fact that the bwa solid2fastq, bwa aln and bwa samse steps don't change the order of the reads. Anyway, it might be of interest to someone...
Code:
#!/usr/bin/env python3
# Add CS and CQ tags from the original csfasta and csqual files
import sys

# get the file names from the command line
try:
    SAMfile = sys.argv[1]
    csfastafile = sys.argv[2]
    qualfile = sys.argv[3]
    outputfile = sys.argv[4]
except IndexError:
    print("Usage: ./add_CSCQ.py <input SAM> <input csfasta> <input csqual> <output SAM>")
    sys.exit(1)

# open the files given on the command line
try:
    SAM = open(SAMfile)
    csfasta = open(csfastafile)
    qual = open(qualfile)
    output = open(outputfile, "w")
except IOError:
    print("Couldn't open one of the input/output files")
    sys.exit(1)

# read the first line of the SAM file
SAMline = SAM.readline()

# skip leading '#' comment lines; this consumes the first read-name line
cs = csfasta.readline()
while cs[0:1] == '#':
    cs = csfasta.readline()
cq = qual.readline()
while cq[0:1] == '#':
    cq = qual.readline()

count = 0
# walk through the three files in parallel and add CS / CQ tags to the reads,
# assuming solid2fastq didn't change the order of the reads
while SAMline:
    start = SAMline[0:1]
    alignment = SAMline.rstrip('\n')
    if count % 100000 == 0 and count != 0:
        print(count, "alignments processed")
    if start == '@':
        # header section: copy through unchanged
        output.write(alignment + '\n')
    else:
        # alignment section: the next line in each file is the data for this read
        cs = "\tCS:Z:" + csfasta.readline().rstrip('\n')
        # encode the integer quals as Sanger (Phred+33) characters
        intquals = qual.readline().split()
        asciiquals = "".join(chr(int(quality) + 33) for quality in intquals)
        cq = "\tCQ:Z:" + asciiquals
        # write the alignment with both tags appended
        output.write(alignment + cs + cq + '\n')
        # consume the read-name line of the next record
        csfasta.readline()
        qual.readline()
    SAMline = SAM.readline()
    count = count + 1

SAM.close()
csfasta.close()
qual.close()
output.close()
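The quality conversion in the script can be tried on its own. Below is a minimal sketch of that one step, assuming the qual line holds space-separated integers. The function name `quals_to_sanger` is made up for illustration, and clamping negative values to 0 is an assumption added here (the script above does not do it).

```python
def quals_to_sanger(qual_line):
    """Convert a line of space-separated integer qualities to Phred+33 text.

    Assumption (not in the original script): negative values, if present,
    are clamped to 0 so every character stays in the printable range.
    """
    return "".join(chr(max(0, int(q)) + 33) for q in qual_line.split())


if __name__ == "__main__":
    # 27 -> '<', 26 -> ';', 9 -> '*'
    print(quals_to_sanger("27 26 27 26 9"))
```

The output string is what goes after `CQ:Z:` in the SAM record.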
-
bfast+bwa only replaces the match step. Postprocess is the same as in traditional bfast. You will have those tags.
-
Thanks Kevin! I will take a look.
BTW, did you ever try the BWA alignment from within BFAST (bfast+bwa)? That would seem to solve the problem -- assuming it includes the CS and CQ tags. I will be trying that soon.
Todd
-
Hi Todd,
Unfortunately, I decided to switch mappers for SOLiD reads in the end.
I am trying to find the fastq indexer program that I intended to use for this, along with scripts to post-process the BAM,
but I can only find this: http://ivory.idyll.org/blog/mar-10/s...ving-sequences
Hope it helps!
-
Solution?
Hi Kevin,
Did you work out a program to integrate the color space data into the SAM files for BWA alignments? If so, could you share it? I am running into the same issue and would like to avoid re-implementing it if possible.
Thanks,
Todd
-
Thanks Nils,
I have an example of CS CQ tags
VAB_S1332068_1358_1351 131 1 227 255 25M = 1373 1171 CTAACCCCTAACCCTAACCCTAAAC !A@B?@@@CAC?@?AAC???B@AA! RG:Z:TG133 CS:Z:G3230100023010023010023001 CQ:Z:<<<<<<<<<<;:<;* MD:Z:25 OQ:Z:!@@@@@@@@@@@@@@@@@@@@@@@!
So am I correct in saying that CS and CQ are essentially the original csfasta and qual lines?
Or at least that bfast outputs them this way?
Additionally, if I were to do the same for bwa, I might have problems because bwa trims reads?
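To inspect the tags on a record like the one above, here is a small sketch that pulls the optional fields out of a tab-separated SAM line and compares the CS and CQ lengths. The function names are hypothetical, and the length check assumes one quality value per color, as in a csfasta/qual pair (so CS, which also carries the primer base, is one character longer than CQ).

```python
def optional_tags(sam_line):
    """Return the optional TAG:TYPE:VALUE fields of a SAM record as a dict."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # the first 11 columns are mandatory
        # maxsplit=2 keeps ':' characters that occur inside the value
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def cs_cq_consistent(tags):
    """True if CS (primer base + colors) is one longer than CQ.

    Assumption: one quality per color, as in the csfasta/qual files.
    """
    cs, cq = tags.get("CS"), tags.get("CQ")
    if cs is None or cq is None:
        return False
    return len(cs) == len(cq) + 1


# The record from the post, rebuilt with tab separators.
record = "\t".join([
    "VAB_S1332068_1358_1351", "131", "1", "227", "255", "25M", "=",
    "1373", "1171", "CTAACCCCTAACCCTAACCCTAAAC",
    "!A@B?@@@CAC?@?AAC???B@AA!",
    "RG:Z:TG133", "CS:Z:G3230100023010023010023001",
    "CQ:Z:<<<<<<<<<<;:<;*", "MD:Z:25",
    "OQ:Z:!@@@@@@@@@@@@@@@@@@@@@@@!",
])
tags = optional_tags(record)

if __name__ == "__main__":
    print(tags["CS"], tags["CQ"], cs_cq_consistent(tags))
```

The check fails for the record as pasted, because the CQ value shown in the post has only 15 characters against 25 colors in CS. That is probably a side effect of pasting `<` characters into the forum, so it is worth running this on the actual file.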
-
I see...
So far, this is the only information I have. So presumably I have to reverse the order of the original CS and CQ if the read is mapped to the other strand?
The trimming bit might indeed be a problem, though, even if constructing a query db of the original csfasta and qual files isn't computationally intensive.
CS  Z  Color read sequence on the same strand as the reference (note 4)
CQ  Z  Color read quality on the same strand as the reference; encoded in the same way as <QUAL> (note 4)
Note 4: On a raw SOLiD read, the first nucleotide is the primer base and the first color is the one between the primer base and the first nucleotide from the sample being sequenced. The primer base and the first color must be present in CS.
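A rough sketch of the "query db" idea mentioned above: load the csfasta and qual files into a dictionary keyed by read name, so tags can be looked up per read instead of relying on file order. The function names are hypothetical. It assumes each record is a `>name` line followed by a single data line, keeps everything in memory, and leaves out how SAM read names map back to csfasta names, since that depends on the conversion tool.

```python
import io


def read_records(handle):
    """Yield (name, data) pairs from a csfasta or qual stream, skipping '#' comments."""
    name = None
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            name = line[1:]
        elif name is not None:
            yield name, line
            name = None


def build_lookup(csfasta_handle, qual_handle):
    """Map read name -> (color string, qual string) for reads present in both files."""
    colors = dict(read_records(csfasta_handle))
    quals = dict(read_records(qual_handle))
    return {name: (colors[name], quals[name]) for name in colors if name in quals}


# Tiny in-memory stand-ins for real csfasta and qual files.
csfasta = io.StringIO("# Title: demo\n>1_2_3_F3\nT0123\n>1_2_4_F3\nT3210\n")
qual = io.StringIO("# Title: demo\n>1_2_3_F3\n10 20 30 40\n>1_2_4_F3\n5 6 7 8\n")
lookup = build_lookup(csfasta, qual)

if __name__ == "__main__":
    print(lookup["1_2_3_F3"])
```

For a full SOLiD run, an in-memory dictionary may be too large; an on-disk index would be the next step, but that is beyond this sketch.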
-
Originally posted by KevinLam:
Hi Nils, can you share how bfast generates the CS tag?
I think I will probably have to write a post-alignment script to add this tag.