  • ykdang
    Junior Member
    • Oct 2011
    • 7

    How to generate GFF?

    I am working on a fungus called Neurospora. A previous lab member built a GBrowse (1.70) instance to visualize our deep-seq data from GFF2 files. Now I have some new deep-seq data and want to load it into that browser. I heard that he used bowtie for the alignment and then converted the output to GFF2, but I cannot find the conversion script. Does anyone have a similar script they could share, or could you point me to any tools that can do this? I am not a Perl person and don't think I could learn it in a short while.
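For anyone who, like the original poster, would rather avoid Perl, here is a minimal stdlib-only Python sketch of a SAM-to-GFF2 conversion. It is not the previous lab member's script and rests on several assumptions: bowtie was run with SAM output (`-S`), each mapped read becomes one GFF2 line with the hypothetical source `bowtie` and feature type `read`, MAPQ is used as the score, and the group column follows the GBrowse 1.x `Class name` style with a hypothetical `Read` class. The end coordinate is computed from the CIGAR string, counting only the operations that consume reference bases (M, D, N, =, X).

```python
import re
import sys
from typing import Iterable, Iterator

# One CIGAR operation: a length followed by an operation code.
_CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume reference bases, per the SAM specification.
_REF_CONSUMING = set("MDN=X")


def reference_length(cigar: str) -> int:
    """Return the number of reference bases spanned by a CIGAR string."""
    return sum(int(length) for length, op in _CIGAR_RE.findall(cigar)
               if op in _REF_CONSUMING)


def sam_to_gff2(lines: Iterable[str], source: str = "bowtie") -> Iterator[str]:
    """Yield one GFF2 line per mapped SAM record; header lines are skipped."""
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\r\n").split("\t")
        qname, flag, rname, pos, mapq, cigar = fields[:6]
        flag_bits = int(flag)
        # FLAG bit 0x4 marks an unmapped read
        if flag_bits & 0x4 or rname == "*" or cigar == "*":
            continue
        start = int(pos)  # SAM POS is 1-based, like GFF
        end = start + reference_length(cigar) - 1
        strand = "-" if flag_bits & 0x10 else "+"  # 0x10 = reverse strand
        # Assumed GBrowse 1.x group column "<Class> <name>"; "Read" is hypothetical
        group = "Read %s" % qname
        yield "\t".join([rname, source, "read", str(start), str(end),
                         mapq, strand, ".", group])


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python sam2gff2.py alignments.sam > alignments.gff
    print("##gff-version 2")
    with open(sys.argv[1]) as handle:
        for gff_line in sam_to_gff2(handle):
            print(gff_line)
```

The script name `sam2gff2.py` is hypothetical. The feature type and group class would need to match whatever the existing GBrowse configuration expects for the old tracks.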
  • gringer
    David Eccles (gringer)
    • May 2011
    • 845

    #2
    If you have a BAM file (bowtie can write SAM, which samtools can convert to BAM), this tutorial suggests that GBrowse can handle BAM directly:



    You could also have a look at the admin tutorial:


    • gringer
      David Eccles (gringer)
      • May 2011
      • 845

      #3
      FWIW, I've just today ended up writing a python script to convert from SAM/BAM files to GFF3 files using pysam. The code may be useful for you if you can't find anything else suitable for your conversion.

      For a bit of context, I broke up genomic contigs into 100bp fragments, naming the sequences <contig>#<start>-<end>, then used bowtie2 to map them to the genome -- that's why I've got the 'read.qname.find' bits in the code. My contigs started with 'v', which bowtie replaced with 'N' for some odd reason, so I had to do a bit of extra fiddling to add the 'v' back in.
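The naming scheme described above can be sketched in Python as a pair of helpers: one splits a contig into fixed-size fragments named `<contig>#<start>-<end>`, and the other parses such a read name back into its parts, replacing the first character with 'v'. The function names are hypothetical, and 1-based inclusive coordinates are an assumption; the post does not say which convention was used.

```python
from typing import Iterator, Tuple


def fragment_contig(name: str, seq: str, size: int = 100) -> Iterator[Tuple[str, str]]:
    """Split a contig into consecutive fragments named <contig>#<start>-<end>.

    Coordinates are assumed to be 1-based and inclusive; the final fragment
    may be shorter than `size`.
    """
    for offset in range(0, len(seq), size):
        chunk = seq[offset:offset + size]
        yield "%s#%d-%d" % (name, offset + 1, offset + len(chunk)), chunk


def parse_fragment_name(qname: str, prefix: str = "v") -> Tuple[str, int, int]:
    """Split a read name such as 'N12#101-200' into (contig, start, end).

    The first character (the 'N' substituted for 'v') is replaced by `prefix`.
    rfind picks the last '#' and '-', so contig names containing them still parse.
    """
    hash_pos = qname.rfind("#")
    dash_pos = qname.rfind("-")
    contig = prefix + qname[1:hash_pos]
    return contig, int(qname[hash_pos + 1:dash_pos]), int(qname[dash_pos + 1:])


if __name__ == "__main__":
    # Write the fragments of a toy contig as FASTA records
    for frag_name, frag_seq in fragment_contig("v12", "ACGT" * 60):
        print(">%s\n%s" % (frag_name, frag_seq))
```

Using `rfind` rather than `find` is a deliberate choice here: a contig name such as `ctg-7` would otherwise be split at its own hyphen.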

      Here's the relevant part of my code which does the SAM->GFF conversion:
      Code:
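      import csv
      import os
      import sys

      import pysam

      samFileName = sys.argv[1]  # SAM file written by bowtie2
      # pysam.AlignmentFile is the current name for pysam.Samfile
      samFile = pysam.AlignmentFile(samFileName, "r")
      totalCount = 0
      sys.stderr.write("Getting reads from SAM file...")
      sys.stdout.write("##gff-version 3\n")
      # csv.writer ends rows with "\r\n" by default; GFF expects "\n"
      gffWriter = csv.writer(sys.stdout, delimiter='\t', lineterminator='\n')
      for read in samFile:
          totalCount += 1
          # skip unmapped reads; reference_id 0 is a valid (first) reference
          if not read.is_unmapped:
              # read names look like N<contig>#<start>-<end>; put the 'v' back
              qname = read.query_name
              qContig = 'v' + qname[1:qname.find("#")]
              qContigStart = qname[qname.find("#") + 1:qname.find("-")]
              qContigEnd = qname[qname.find("-") + 1:]
              tContig = read.reference_name
              strand = "-" if read.is_reverse else "+"
              score = "."
              if read.has_tag("XS"):
                  score = str(read.get_tag("XS") + 1000)
              # GFF3 attributes are separated by ';'; the Target start is
              # converted from pysam's 0-based reference_start to 1-based
              gffWriter.writerow((qContig,
                                  "sam2gff3-" + os.path.basename(samFileName),
                                  "nucleotide_match", qContigStart, qContigEnd,
                                  score, strand, ".",
                                  "Name=SAM_%s;ID=%s-%s;Target=%s %d %d" %
                                  (tContig, qContig, tContig, tContig,
                                   read.reference_start + 1, read.reference_end)))
          if totalCount % 100000 == 0:
              sys.stderr.write(".")
      samFile.close()
      sys.stderr.write(" done\n")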
      Last edited by gringer; 11-02-2011, 12:17 PM. Reason: fixed up GFF3 parsing errors, added Name attribute to avoid contig clashes
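A frequent cause of GFF3 parsing errors is column 9: attributes must be separated by ';', and the reserved characters ';', '=', '&' and ',' (plus tab, newline, carriage return and '%') must be percent-encoded inside values. In `Target`, the ID, start and end are separated by spaces, so spaces in the target ID must be escaped as %20. A minimal Python sketch of those rules follows; the function names are hypothetical and are not part of pysam or any GFF library.

```python
# Reserved or unsafe characters in GFF3 column 9 and their percent-encodings.
# Other control characters (%00-%1F, %7F) are not handled in this sketch.
_GFF3_ESCAPES = {"%": "%25", ";": "%3B", "=": "%3D", "&": "%26", ",": "%2C",
                 "\t": "%09", "\n": "%0A", "\r": "%0D"}


def gff3_escape(value: str, escape_space: bool = False) -> str:
    """Percent-encode reserved characters in a GFF3 attribute value.

    Characters are mapped in a single pass, so the '%' signs introduced by
    encoding are not encoded again.
    """
    escaped = "".join(_GFF3_ESCAPES.get(ch, ch) for ch in value)
    return escaped.replace(" ", "%20") if escape_space else escaped


def match_attributes(name: str, feature_id: str, target_id: str,
                     target_start: int, target_end: int) -> str:
    """Build column 9 for a nucleotide_match line, with ';' between attributes."""
    # Target fields are space-separated, so spaces inside the ID are escaped.
    target = "%s %d %d" % (gff3_escape(target_id, escape_space=True),
                           target_start, target_end)
    return "Name=%s;ID=%s;Target=%s" % (gff3_escape(name),
                                         gff3_escape(feature_id), target)
```

For example, `match_attributes("SAM_chr1", "v12-chr1", "chr1", 101, 200)` gives `Name=SAM_chr1;ID=v12-chr1;Target=chr1 101 200`.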


      • ykdang
        Junior Member
        • Oct 2011
        • 7

        #4
        Thanks a lot, that could be very useful.
