  • individual position stats for pileups/contigs of SOLiD reads

    Hello,
    I'm trying to get statistics on an individual position basis in a pileup/contig of SOLiD reads mapped to a reference sequence (e.g. what percentage of reads have nucleotide G at position 236 of the reference sequence).

    I'm starting with a SOLiD .csfasta.ma file (output of SOLiD's mapping program). Is there a script that can take this file and return a big table/array in base-space in which the columns are the individual positions of the padded reference sequence and the rows are the individual reads (table entries are the nucleotides of the reads at each position, or blank/padding)?

    I know there are viewers that can produce graphs etc. with some of this information, but I actually need the full table itself which I could then process with R, Matlab etc.

    If there is no such script, are there at least scripts that can convert the color-space .csfasta.ma alignment file into some base-space format like .ace or SAM, which I could then parse myself to create the table I need?

    Thanks, g_solid
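Once the alignments are in SAM, the table described above can be built from just the POS, CIGAR and SEQ columns. Below is a minimal Python sketch. It assumes the records have already been pulled out of the SAM file (for example, by splitting lines on tabs) as (name, pos, cigar, seq) tuples. The helper names are hypothetical.

As a simplification, columns are unpadded reference positions. Inserted bases are dropped rather than given their own padding columns, and deletions are shown as '-'.

```python
import csv
import re
import sys

# CIGAR operations as defined in the SAM specification.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def read_to_columns(pos, cigar, seq):
    """Place one aligned read on 1-based reference coordinates.

    Returns {reference_position: base}. Deleted reference positions are
    marked '-'. Inserted and soft-clipped read bases are dropped, so the
    table has one column per reference position (no padding columns).
    """
    cols = {}
    ref = pos  # current 1-based reference position
    q = 0      # current index into the read sequence
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":        # aligned bases consume read and reference
            for i in range(n):
                cols[ref + i] = seq[q + i]
            ref += n
            q += n
        elif op in "IS":       # insertion / soft clip: consumes read only
            q += n
        elif op == "D":        # deletion: consumes reference only, mark '-'
            for i in range(n):
                cols[ref + i] = "-"
            ref += n
        elif op == "N":        # skipped reference region: leave blank
            ref += n
        # H and P consume neither read nor reference
    return cols


def build_table(records, start, end):
    """records: iterable of (name, pos, cigar, seq) from the SAM columns
    QNAME, POS, CIGAR, SEQ. Returns (header, rows): one row per read and
    one column per reference position in [start, end], '' where uncovered."""
    header = list(range(start, end + 1))
    rows = []
    for name, pos, cigar, seq in records:
        cols = read_to_columns(pos, cigar, seq)
        rows.append([name] + [cols.get(p, "") for p in header])
    return header, rows


def fraction_with_base(rows, header, position, base):
    """Fraction of reads covering `position` (deletions included) showing `base`."""
    idx = header.index(position) + 1  # +1 skips the read-name column
    covering = [row[idx] for row in rows if row[idx] != ""]
    if not covering:
        return 0.0
    return sum(1 for b in covering if b == base.upper()) / len(covering)


if __name__ == "__main__":
    # Toy records for illustration, not real data.
    records = [
        ("r1", 234, "5M", "ACGTA"),
        ("r2", 235, "2M1D2M", "CGTA"),
        ("r3", 236, "3M", "ATT"),
    ]
    header, rows = build_table(records, 234, 239)
    writer = csv.writer(sys.stdout)  # CSV that R or Matlab can read directly
    writer.writerow(["read"] + header)
    writer.writerows(rows)
    print("fraction G at 236:", fraction_with_base(rows, header, 236, "G"))
```

SAM stores SEQ on the forward reference strand, so reverse-strand reads need no extra handling here. Records whose SEQ is '*' (no stored sequence) should be filtered out before calling build_table.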

  • #2
    Originally posted by g_solid
    Hello,
    I'm starting with a SOLiD .csfasta.ma file (output of SOLiD's mapping program). Is there a script that can take this file and return a big table/array in base-space in which the columns are the individual positions of the padded reference sequence and the rows are the individual reads (table entries are the nucleotides of the reads at each position, or blank/padding)?
    You will not easily be able to get your information from the .csfasta.ma file. What you need to do is continue past the mapping step into the SNP calling step. That will produce .gff files with the information you need.

    The problem with parsing the .csfasta.ma file directly is that it does not record which parts of each read fail to match the reference, or whether those mismatches are SNPs, sequencing errors, indels, etc. All the .csfasta.ma file contains is (a) the read in colorspace, (b) the position number(s) on the reference where the read matches, and (c) how many mismatches the read has at each position. While it is theoretically possible to take the .csfasta.ma file and determine the bases in each read with your own SNP calling and error correction routine, it seems easier to just use the SOLiD SNP calling program.

    • #3
      AB has a tool (http://solidsoftwaretools.com/gf/project/sam/) to convert SOLiD alignments to SAM. It converts colorspace reads to base space in the process. You can then use samtools to generate a pileup. I have never used the tool myself, though.

      • #4
        lh3 points out a tool that I have not used before: conversion from GFF to SAM format. It is a newer tool. As far as I can tell, you will need to use the MaToGff.sh tool to convert the .csfasta.ma file to GFF, and then convert the GFF to SAM.

        Therefore, contrary to what I implied in my previous post, it does appear that you can go directly from the matching output to SAM without running the SNP calling step. I have not tried this, so let us know if it works for you.

        • #5
          SOLiD match files to SAM

          Hi all,
          Thanks for your suggestions. The conversion of SOLiD match files to SAM format worked with the tools provided at the SOLiD website (http://solidsoftwaretools.com/gf/):
          - Match to GFF conversion with the SOLiD™ System GFF Conversion Tool, using the modules MaToGff.sh and AnnotateChanges.sh. AnnotateChanges is needed to mark mismatches between the read sequence and the reference and to translate the sequence into base space.
          - GFF to SAM conversion with the SOLiD™ BaseQV Tool (GffToSam module). To make GffToSam compile correctly, I had to add a missing header include (#include <cstdlib>), as described in an earlier post by hingamp here on SEQanswers (post titled "Convertion (sic) of SOLiD3 gff to SAM/BAM for IGV browser", dated 09-01-2009).

          The resulting SAM output also worked with samtools pileup.

          Best, g_solid
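The original question, for example the percentage of reads with G at position 236, can be answered from a single pileup line. Below is a minimal Python sketch. It assumes the 6-column layout written by `samtools mpileup`: chrom, pos, ref base, depth, read bases, qualities. The consensus-calling mode of the older `samtools pileup` (`-c`) inserts extra columns, so the field indices would need adjusting for that output. Function names are hypothetical.

```python
from collections import Counter


def count_pileup_bases(ref_base, read_bases):
    """Count bases in a pileup read-bases string.

    '.' and ',' match the reference (forward/reverse strand), letters are
    mismatches (lowercase = reverse strand), '*' or '#' marks a deletion.
    Read-start markers ('^' plus a mapping-quality char), read ends ('$')
    and indel annotations such as '+2AC' or '-1T' are skipped.
    """
    counts = Counter()
    ref = ref_base.upper()
    i = 0
    while i < len(read_bases):
        ch = read_bases[i]
        if ch == "^":          # start of a read; next char is the MAPQ
            i += 2
            continue
        if ch in "+-":         # indel: sign, length digits, then that many bases
            j = i + 1
            while j < len(read_bases) and read_bases[j].isdigit():
                j += 1
            i = j + int(read_bases[i + 1:j])
            continue
        if ch in ".,":
            counts[ref] += 1
        elif ch in "*#":
            counts["*"] += 1
        elif ch.upper() in "ACGTN":
            counts[ch.upper()] += 1
        # '$' (end of read) and '>'/'<' (reference skips) are ignored
        i += 1
    return counts


def base_fraction(line, base):
    """Return (chrom, pos, fraction of pileup entries showing `base`)."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref, read_bases = fields[0], int(fields[1]), fields[2], fields[4]
    counts = count_pileup_bases(ref, read_bases)
    total = sum(counts.values())
    return chrom, pos, (counts[base.upper()] / total if total else 0.0)


if __name__ == "__main__":
    # Toy pileup line for illustration, not real data.
    line = "ref1\t236\tA\t7\t^F.,G$g+2CTc-1Aa*\tIIIIIII"
    print(count_pileup_bases("A", "^F.,G$g+2CTc-1Aa*"))
    print(base_fraction(line, "G"))
```

Deletions ('*') are counted in the denominator here. Drop them from `total` if the percentage should be over base calls only.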

          • #6
            ma2bam gone missing

            Hi,
            It seems that ABI has taken many of the software tools mentioned in this thread off their website.
            Does anyone have the ma2bam and/or ma2gff hanging around?
            What if I have csfasta.ma files from previous analysis and I want to do something with them? Any suggestions?
