  • aldino
    replied
    Yes, still interested!

    I was just about to start doing this myself, but I thought I'd check this forum again... Actually, I'd really like to use BFAST for SOLiD, but I just don't have the computing resources available to me...

    anyway, thanks again...


  • ulz_peter
    replied
    Yes. Works nicely with GATK (at least in my case).


  • KevinLam
    replied
    Nice! Have you tried it with GATK?


  • ulz_peter
    replied
    Just in case anyone's still interested:
    I wrote a small (rather dirty) Python script to integrate those tags. It is slow and relies on the bwa solid2fastq, bwa aln and bwa samse steps not changing the order of the reads. Anyway, it might be of interest to someone...


    Code:
    #!/usr/bin/env python

    # Add CS and CQ tags to a SAM file from the original csfasta and csqual files.
    # Assumes solid2fastq, bwa aln and bwa samse kept the reads in their original
    # order, and that every csfasta/qual record is a header line plus one data line.

    from __future__ import print_function
    import sys

    # Get the file names from the command line
    try:
      SAMfile = sys.argv[1]
      csfastafile = sys.argv[2]
      qualfile = sys.argv[3]
      outputfile = sys.argv[4]
    except IndexError:
      print("Usage: ./add_CSCQ.py <input SAM> <input csfasta> <input csqual> <output SAM>")
      sys.exit(1)

    # Open the files named on the command line
    try:
      SAM = open(SAMfile)
      csfasta = open(csfastafile)
      qual = open(qualfile)
      output = open(outputfile, "w")
    except IOError as err:
      print("Couldn't open file:", err)
      sys.exit(1)

    # Skip leading '#' comment lines; afterwards cs and cq hold the first '>' header line
    cs = csfasta.readline()
    while cs.startswith('#'):
      cs = csfasta.readline()

    cq = qual.readline()
    while cq.startswith('#'):
      cq = qual.readline()

    count = 0

    # Walk through the SAM file and append CS / CQ tags to every alignment
    for SAMline in SAM:
      alignment = SAMline.rstrip('\n')

      # Copy header lines unchanged
      if SAMline.startswith('@'):
        output.write(alignment + '\n')
        continue

      # The previous read consumed a header line, so the next line holds the data
      cs = "\tCS:Z:" + csfasta.readline().rstrip('\n')

      # Encode the integer qualities as Sanger (Phred+33) characters
      intquals = qual.readline().split()
      asciiquals = "".join(chr(int(quality) + 33) for quality in intquals)
      cq = "\tCQ:Z:" + asciiquals

      # Write the tagged alignment, then consume the next record's header lines
      output.write(alignment + cs + cq + '\n')
      csfasta.readline()
      qual.readline()

      count = count + 1
      if count % 100000 == 0:
        print(count, "alignments processed")

    SAM.close()
    csfasta.close()
    qual.close()
    output.close()
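
The quality conversion in the script above maps each integer in a csqual line to a Sanger (Phred+33) character. Below is a minimal sketch of that step as a standalone function. The name `quals_to_sanger` is hypothetical, and clamping negative values to 0 is an assumption for qual files that use -1 for a missing color call; the script above does not do this.

```python
def quals_to_sanger(qual_line):
    """Convert a whitespace-separated line of integer qualities to Phred+33 text."""
    chars = []
    for value in qual_line.split():
        # Assumption: a negative value (e.g. -1 for a missing call) becomes 0
        quality = max(int(value), 0)
        chars.append(chr(quality + 33))
    return "".join(chars)


print(quals_to_sanger("27 27 26 25 -1"))  # prints <<;:!
```

Without the clamp, a value of -1 would become a space character (code 32) inside the CQ tag, which some downstream tools may not expect.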


  • drio
    replied
    bfast+bwa only replaces the match step. Postprocess is the same as in traditional bfast. You will have those tags.


  • Todd Scheetz
    replied
    Thanks Kevin! I will take a look.

    BTW, did you ever try the BWA alignment from within BFAST (bfast+bwa)? That would seem to solve the problem -- assuming it includes the CS and CQ tags. I will be trying that soon.

    Todd


  • KevinLam
    replied
    Ah found it!
    This is a long-overdue tool for those trying to do non-typical analysis with their reads. Finally you can index and compress your NGS reads...


  • KevinLam
    replied
    Hi Todd,
    Unfortunately, I decided to switch mappers for SOLiD reads in the end.
    I am trying to find the FASTQ indexer program that I intended to use for this, together with scripts to post-process the BAM,
    but I can only find this http://ivory.idyll.org/blog/mar-10/s...ving-sequences

    Hope it helps!


  • Todd Scheetz
    replied
    Solution?

    Hi Kevin,

    Did you work out a program to integrate the color space data into the SAM files for BWA alignments? If so, could you share it? I am running into the same issue, and would like to avoid re-implementation if possible.

    Thanks,
    Todd


  • nilshomer
    replied
    Originally posted by KevinLam
    Thanks Nils,
    I have an example of CS CQ tags

    VAB_S1332068_1358_1351 131 1 227 255 25M = 1373 1171 CTAACCCCTAACCCTAACCCTAAAC !A@B?@@@CAC?@?AAC???B@AA! RG:Z:TG133 CS:Z:G3230100023010023010023001 CQ:Z:<<<<<<<<<<;:<;* MD:Z:25 OQ:Z:!@@@@@@@@@@@@@@@@@@@@@@@!

    So am I correct in saying that CS and CQ are essentially the original csfasta and qual line?
    or at least bfast outputs it in this way?

    Additionally, if I were to do the same, might I have problems because bwa does trimming?
    They should be the qualities from the csfasta/qual lines. I don't think they have to be matched to the trimming. When it says "original", it means the unaltered original values, IMHO.
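
If CS and CQ are to hold the original, untrimmed values, one way to fetch them is by read name rather than by file position. Below is a minimal sketch, assuming each csfasta/qual record is a `>` header line followed by one data line and that both files fit in memory. `read_records` and `index_originals` are hypothetical names, not part of BFAST or BWA, and the read name in the example is made up.

```python
def read_records(lines):
    """Yield (name, data) pairs from csfasta- or qual-style lines."""
    name = None
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comments
        if line.startswith(">"):
            name = line[1:]
        elif name is not None:
            yield name, line
            name = None


def index_originals(csfasta_lines, qual_lines):
    """Map read name -> (color string, list of integer qualities)."""
    quals = dict(read_records(qual_lines))
    index = {}
    for name, colors in read_records(csfasta_lines):
        values = [int(q) for q in quals[name].split()]
        # Untrimmed data: one quality per color, plus the leading primer base
        if len(values) != len(colors) - 1:
            raise ValueError("length mismatch for read %s" % name)
        index[name] = (colors, values)
    return index


csfasta = ["# comment\n", ">1_2_3_F3\n", "T0123\n"]
qual = ["# comment\n", ">1_2_3_F3\n", "20 21 22 23\n"]
index = index_originals(csfasta, qual)
print(index["1_2_3_F3"])
```

The length check is a cheap way to confirm that the stored values are the full-length originals rather than a trimmed copy.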


  • KevinLam
    replied
    Thanks Nils,
    I have an example of CS CQ tags

    VAB_S1332068_1358_1351 131 1 227 255 25M = 1373 1171 CTAACCCCTAACCCTAACCCTAAAC !A@B?@@@CAC?@?AAC???B@AA! RG:Z:TG133 CS:Z:G3230100023010023010023001 CQ:Z:<<<<<<<<<<;:<;* MD:Z:25 OQ:Z:!@@@@@@@@@@@@@@@@@@@@@@@!

    So am I correct in saying that CS and CQ are essentially the original csfasta and qual line?
    or at least bfast outputs it in this way?

    Additionally, if I were to do the same, might I have problems because bwa does trimming?


  • nilshomer
    replied
    Originally posted by KevinLam
    I see...
    So far, I only have this info, so presumably I have to reverse the order of the original CS and CQ if the read is mapped in the other direction?

    The trimming bit might indeed be a problem, though, even if constructing a query db of the original csfasta and qual files isn't computationally intensive.


    Tag  Type  Description
    CS   Z     Color read sequence on the same strand as the reference [4]
    CQ   Z     Color read quality on the same strand as the reference; encoded in the same way as <QUAL> [4]

    [4] On a raw SOLiD read, the first nucleotide is the primer base and the first color is the one between the primer base and the first nucleotide from the sample being sequenced. The primer base and the first color must be present in CS.
    I don't ever match the direction of the CS/CQ tags to the reference, since the reverse (not complement) is not symmetric. The adapter would then be the last base.
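
The asymmetry can be seen on a toy read. Below is a minimal sketch, assuming the standard SOLiD color code, in which a color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3). The primer base, the sample read and the helper names are made up for illustration.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def colors_between(bases):
    """Encode each adjacent pair of bases as one color digit."""
    return "".join(str(CODE[a] ^ CODE[b]) for a, b in zip(bases, bases[1:]))


def reverse_complement(bases):
    return "".join(COMPLEMENT[b] for b in reversed(bases))


primer, sample = "T", "TGAT"                    # made-up primer base and sample read
cs = primer + colors_between(primer + sample)   # CS string as it would appear in csfasta

# Colors between sample bases survive a strand flip, just in reverse order
flipped = colors_between(reverse_complement(sample))

# Reversing the whole CS string, however, moves the primer base to the end
naive = cs[::-1]

print(cs, flipped, naive)  # prints T0123 321 3210T
```

The reversed string no longer starts with a primer base followed by the first color, so it is not a valid CS value.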


  • KevinLam
    replied
    I see...
    So far, I only have this info, so presumably I have to reverse the order of the original CS and CQ if the read is mapped in the other direction?

    The trimming bit might indeed be a problem, though, even if constructing a query db of the original csfasta and qual files isn't computationally intensive.


    Tag  Type  Description
    CS   Z     Color read sequence on the same strand as the reference [4]
    CQ   Z     Color read quality on the same strand as the reference; encoded in the same way as <QUAL> [4]

    [4] On a raw SOLiD read, the first nucleotide is the primer base and the first color is the one between the primer base and the first nucleotide from the sample being sequenced. The primer base and the first color must be present in CS.
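
The primer base and the first color are required because together they determine the first sample base. Below is a minimal decoding sketch, assuming the standard SOLiD color code (a color is the XOR of the 2-bit codes of adjacent bases, with A=0, C=1, G=2, T=3). `decode_colors` is a hypothetical helper and does not handle missing calls (`.`). The input is the CS value from the example alignment earlier in this thread.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def decode_colors(cs):
    """Decode a CS string (primer base + colors) into the sample bases."""
    current = CODE[cs[0]]          # start from the primer base
    decoded = []
    for color in cs[1:]:
        current ^= int(color)      # each color links the previous base to the next
        decoded.append(BASES[current])
    return "".join(decoded)


print(decode_colors("G3230100023010023010023001"))
# prints CTAACCCCTAACCCTAACCCTAAAC
```

The decoded bases match the SEQ field of that alignment, which is on the forward strand.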


  • nilshomer
    replied
    Originally posted by KevinLam
    Hi Nils, can you share how bfast generates the CS tag?

    I think I probably will have to write a post alignment script to add this tag
    The CSFASTQs I use as input are not "double-encoded" but instead contain the original color sequence (unadulterated) and color qualities. This allows the CS/CQ tags to be filled out easily. BWA "double-encodes" the color sequence and does some trimming too, thus making it impossible to recover the CS/CQ without going back to the original data (CSFASTA and QUAL).
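
Below is a minimal sketch of why double-encoded reads cannot be turned back into CS values. It assumes the common mapping of colors `0123.` to the pseudo-bases `ACGTN`, and that the trimming removes the primer base and the first color. Both are assumptions about the conversion step and should be checked against the BWA version in use. The reads are made up.

```python
# Assumed color -> pseudo-base mapping used for double encoding
DOUBLE_ENCODE = {"0": "A", "1": "C", "2": "G", "3": "T", ".": "N"}


def double_encode(cs, trim=2):
    """Drop the first `trim` characters (assumed: primer base and first color),
    then rewrite the remaining colors as pseudo-bases."""
    return "".join(DOUBLE_ENCODE[c] for c in cs[trim:])


a = double_encode("T0123")
b = double_encode("G3123")
print(a, b)  # prints CGT CGT
```

Two different original reads give the same double-encoded string, so the dropped primer base and first color can only come from the original CSFASTA file.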


  • KevinLam
    replied
    Originally posted by nilshomer
    It doesn't. Contact the developer if you want this feature in the future.
    Hi Nils, can you share how bfast generates the CS tag?

    I think I probably will have to write a post alignment script to add this tag
