  • How to generate GFF?

    I am working on a fungus called Neurospora. A previous lab member built a GBrowse (1.70) instance to visualize our deep-seq data from GFF2 files. Now I have some new deep-seq data and want to load it into that browser. I heard that he used bowtie for the alignment and then converted the output to GFF2, but I cannot find the conversion script. Does anyone have a similar script they could share, or could you point me to any tools that can do this? I am not a Perl person and don't think I could learn it in a short while.
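For readers in the same situation, a minimal sketch of a SAM-to-GFF2 converter in plain Python (no Perl, no pysam) might look like the following. It assumes bowtie was run with SAM output (`-S` in bowtie 1). The source `bowtie`, the feature type `read` and the `Read "<name>"` group column are placeholder choices, not necessarily what an existing GBrowse configuration expects; the helper names are hypothetical.

```python
# CIGAR operations that consume reference bases (per the SAM specification)
REF_CONSUMING = set("MDN=X")


def reference_span(cigar):
    """Return the number of reference bases covered by a CIGAR string."""
    span, digits = 0, ""
    for ch in cigar:
        if ch.isdigit():
            digits += ch
        else:
            if ch in REF_CONSUMING:
                span += int(digits)
            digits = ""
    return span


def sam_line_to_gff2(line, source="bowtie", feature="read"):
    """Convert one SAM alignment line to a GFF2 line; None for headers/unmapped."""
    if not line.strip() or line.startswith("@"):
        return None
    fields = line.rstrip("\n").split("\t")
    qname, flag, rname, pos, mapq, cigar = fields[:6]
    flag = int(flag)
    if flag & 4 or cigar == "*":  # 0x4 = read unmapped
        return None
    start = int(pos)  # SAM POS is 1-based, as GFF expects
    end = start + reference_span(cigar) - 1
    strand = "-" if flag & 16 else "+"  # 0x10 = read on reverse strand
    score = "." if mapq == "255" else mapq  # MAPQ 255 means "unavailable"
    group = 'Read "%s"' % qname  # hypothetical GFF2 group column
    return "\t".join([rname, source, feature, str(start), str(end),
                      score, strand, ".", group])


def convert_file(sam_path, out):
    """Write a GFF2 line for every mapped read in sam_path to the stream out."""
    with open(sam_path) as sam:
        for line in sam:
            gff = sam_line_to_gff2(line)
            if gff is not None:
                out.write(gff + "\n")
```

For example, `convert_file("alignments.sam", sys.stdout)` (after `import sys`) prints one GFF2 line per mapped read; `alignments.sam` is a placeholder file name. The end coordinate is computed from the CIGAR string rather than the read length so that deletions and spliced (`N`) alignments get the correct span.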

  • #2
    If you have a BAM file (bowtie writes SAM, which samtools can convert to BAM), this tutorial suggests that GBrowse can handle BAM directly:



    You could also have a look at the admin tutorial:



    • #3
      FWIW, just today I ended up writing a Python script that uses pysam to convert SAM/BAM files to GFF3. The code may be useful for you if you can't find anything else suitable for your conversion.

      For a bit of context, I broke up genomic contigs into 100bp fragments, naming the sequences <contig>#<start>-<end>, then used bowtie2 to map them to the genome; that's why the code parses the read names with find(). My contigs started with 'v', which bowtie replaced with 'N' for some odd reason, so I had to do a bit of extra fiddling to add the 'v' back in.
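The fragment naming scheme described above can be sketched as a pair of helpers. `make_fragments` and `parse_fragment_name` are hypothetical names, and the sketch assumes 1-based, inclusive coordinates in the names, which the post does not specify. Splitting with `rpartition` on the last `#` and `-`, rather than `find("-")` (which locates the first hyphen), keeps the parse correct even if a contig name itself contains a hyphen.

```python
def make_fragments(contig_name, sequence, size=100):
    """Yield (name, subsequence) pairs named <contig>#<start>-<end>.

    Assumes 1-based, inclusive coordinates; the last fragment may be shorter.
    """
    for offset in range(0, len(sequence), size):
        chunk = sequence[offset:offset + size]
        start = offset + 1
        end = offset + len(chunk)
        yield "%s#%d-%d" % (contig_name, start, end), chunk


def parse_fragment_name(name):
    """Split '<contig>#<start>-<end>' back into (contig, start, end).

    rpartition splits on the *last* '#' and '-', so contig names that
    contain '-' are still handled.
    """
    contig, _, coords = name.rpartition("#")
    start, _, end = coords.rpartition("-")
    return contig, int(start), int(end)
```

Writing the fragments out as FASTA (`>name` followed by the subsequence) gives reads that bowtie2 can map back to the genome.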

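      Here's the relevant part of my code, which does the SAM->GFF3 conversion:
      Code:
      import csv
      import os
      import sys

      import pysam

      samFileName = sys.argv[1]  # SAM or BAM file produced by bowtie2
      # mode "r" lets pysam detect SAM/BAM/CRAM from the file contents
      samFile = pysam.AlignmentFile(samFileName, "r")
      totalCount = 0
      sys.stderr.write("Getting reads from SAM file...")
      sys.stdout.write("##gff-version 3\n")
      # GFF3 lines must end in '\n'; csv's default terminator is '\r\n'
      gffWriter = csv.writer(sys.stdout, delimiter='\t', lineterminator='\n')
      for read in samFile:
          totalCount += 1
          # skip unmapped reads (reference index 0 is a valid reference)
          if not read.is_unmapped:
              qname = read.query_name
              # bowtie turned the leading 'v' into 'N', so restore it
              qContig = 'v' + qname[1:qname.find("#")]
              qContigStart = qname[qname.find("#")+1:qname.find("-")]
              qContigEnd = qname[qname.find("-")+1:]
              tContig = read.reference_name
              strand = "-" if read.is_reverse else "+"
              score = "."
              # note: XS:i is bowtie2's score for the best *other* alignment;
              # AS:i holds the score of the reported alignment
              if read.has_tag("XS"):
                  score = str(read.get_tag("XS") + 1000)
              gffWriter.writerow((qContig,
                                  "sam2gff3-" + os.path.basename(samFileName),
                                  "nucleotide_match", qContigStart, qContigEnd,
                                  score, strand, ".",
                                  # ';' separates attributes; pysam positions are
                                  # 0-based half-open, GFF3 is 1-based inclusive
                                  "Name=SAM_%s;ID=%s-%s;Target=%s %d %d" %
                                  (tContig, qContig, tContig, tContig,
                                   read.reference_start + 1, read.reference_end)))
          if totalCount % 100000 == 0:
              sys.stderr.write(".")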
      Last edited by gringer; 11-02-2011, 12:17 PM. Reason: fixed up GFF3 parsing errors, added Name attribute to avoid contig clashes
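On the GFF3 parsing side: column 9 is a `;`-separated list of `tag=value` pairs, and inside values the characters `;`, `=`, `&` and `,` (plus `%` itself and control characters) must be percent-encoded. A minimal sketch of helpers that do this follows; `escape_gff3`, `gff3_attributes` and `gff3_target` are hypothetical names, and `gff3_target` assumes 0-based, half-open input coordinates such as pysam's `reference_start`/`reference_end`, and target IDs without spaces.

```python
# Characters with reserved meaning in GFF3 column 9, plus '%' itself
RESERVED = set(";=&,%")


def escape_gff3(value):
    """Percent-encode characters that would break GFF3 attribute parsing."""
    return "".join(
        "%%%02X" % ord(ch) if ch in RESERVED or ord(ch) < 32 or ord(ch) == 127 else ch
        for ch in str(value)
    )


def gff3_attributes(pairs):
    """Join (tag, value) pairs into a column-9 string: tag=value;tag=value."""
    return ";".join("%s=%s" % (tag, escape_gff3(val)) for tag, val in pairs)


def gff3_target(target_id, start0, end0):
    """Build a Target value from 0-based, half-open coordinates.

    GFF3 is 1-based and inclusive: the start gets +1, while the half-open
    end already equals the inclusive 1-based end. Assumes no spaces in
    target_id, since spaces separate the Target fields.
    """
    return "%s %d %d" % (target_id, start0 + 1, end0)
```

For example, `gff3_attributes([("ID", "v1-chr1"), ("Target", gff3_target("chr1", 100, 200))])` yields `ID=v1-chr1;Target=chr1 101 200`.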



      • #4
        Thanks a lot, that could be very useful.

