  • kpboh
    Junior Member
    • Oct 2014
    • 2

    replacing specific positions in fasta from vcf/list

    I have a reference assembly (in fasta format) and a vcf file containing a list of specific sites. I'd like to edit the fasta file to change the bases at these positions to 'N'.

    Does anyone have any suggestions for a tool to accomplish this? I also have a trimmed down version of the vcf that just contains chrom# and position...

    Thanks in advance for any suggestions!
  • neavemj
    Member
    • Feb 2014
    • 58

    #2
    Sounds like a job for python or perl.

    You could read through the vcf file and gather the positions in a dictionary. Then read through the fasta file and change the base to N at every position that matches an entry in the dictionary.
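
    A minimal Python sketch of that approach, under a few assumptions: the first two whitespace-separated columns of each non-header line are chromosome and 1-based position (true for a standard vcf and for the trimmed chrom/position list), and the chromosome names match the first word of each fasta header. The function names `read_sites`, `mask_record`, and `mask_fasta` are made up for illustration. All sites are collected first, so a single masked fasta is produced rather than one per site.

```python
import itertools
from collections import defaultdict


def read_sites(vcf_lines):
    """Collect 1-based positions per chromosome from vcf (or chrom/pos) lines."""
    sites = defaultdict(set)
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and vcf header lines
        chrom, pos = line.split()[:2]
        sites[chrom].add(int(pos))
    return sites


def mask_record(header, chunks, sites, width):
    """Return one fasta record as lines, with listed positions set to 'N'."""
    fields = header[1:].split()
    name = fields[0] if fields else ""  # assumed to match the vcf CHROM column
    bases = list("".join(chunks))
    for pos in sites.get(name, ()):
        if 1 <= pos <= len(bases):
            bases[pos - 1] = "N"  # vcf positions are 1-based
    seq = "".join(bases)
    return [header] + [seq[i:i + width] for i in range(0, len(seq), width)]


def mask_fasta(fasta_lines, sites, width=60):
    """Yield masked fasta lines, one record at a time."""
    header, chunks = None, []
    # The trailing ">" is a sentinel that flushes the last record.
    for line in itertools.chain(fasta_lines, [">"]):
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield from mask_record(header, chunks, sites, width)
            header, chunks = line, []
        elif line:
            chunks.append(line)


if __name__ == "__main__":
    # Toy data standing in for real files.
    vcf = ["##fileformat=VCFv4.1", "chr1\t2\t.\tC\tT", "chr1\t5\t.\tA\tG", "chr2\t1\t.\tG\tA"]
    fasta = [">chr1 demo", "ACGTA", "CGT", ">chr2", "GGCC"]
    print("\n".join(mask_fasta(fasta, read_sites(vcf))))
    # >chr1 demo
    # ANGTNCGT
    # >chr2
    # NGCC
```

    Both functions accept any iterable of lines, so open file handles can be passed in directly. Sequence lines are re-wrapped at 60 characters, which may differ from the wrapping of the input file.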


    • kpboh
      Junior Member
      • Oct 2014
      • 2

      #3
      thanks for the reply. yeah--seems to be the way to go, but i'm unfortunately not fluent enough in either language.

      i did find this example but couldn't get it to run properly (it output an entire new fasta for each individual position as it looped through the vcf instead of accumulating all the changes in the vcf before printing a single, mutated fasta). i suspect it's a trivial change to get it to work properly.

      at any rate, i managed to hack a solution by changing the 'alt' allele in my vcf to 'N', modifying (using sed) all the GT values to "1/1", then feeding this file into GATK's FastaAlternateReferenceMaker tool. clearly far from elegant, but i checked the positions in question in the output and it seemed to have worked.
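
      The vcf rewrite described above could also be done in Python instead of sed. This is only a sketch, not the commands actually used: the function name `force_n_calls` is hypothetical, it assumes a tab-delimited vcf with standard column order, and it assumes every site is a single-base substitution (an 'N' ALT at an indel site would change the sequence length). Other fields such as INFO are left untouched, so they may no longer agree with the edited genotypes.

```python
def force_n_calls(vcf_lines):
    """Yield vcf lines with ALT set to 'N' and each sample GT set to 1/1."""
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            yield line  # pass header and blank lines through unchanged
            continue
        fields = line.split("\t")
        fields[4] = "N"  # column 5 is ALT
        if len(fields) > 9:  # column 9 is FORMAT, samples start at column 10
            keys = fields[8].split(":")
            if "GT" in keys:
                gt = keys.index("GT")
                for i in range(9, len(fields)):
                    values = fields[i].split(":")
                    if gt < len(values):
                        values[gt] = "1/1"  # homozygous for the new ALT
                    fields[i] = ":".join(values)
        yield "\t".join(fields)


if __name__ == "__main__":
    record = "chr1\t2\t.\tC\tT\t50\tPASS\t.\tGT:DP\t0/1:12"
    print(next(force_n_calls([record])))
```

      The rewritten vcf would then be given to FastaAlternateReferenceMaker together with the original reference, as in the workaround above.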


      • neavemj
        Member
        • Feb 2014
        • 58

        #4
        Clever solution!
