  • shivshiv
    Junior Member
    • Sep 2012
    • 1

    Generating genome with splice junctions - STAR

    Hi,

    I am wondering how to go about including a splice junction database when generating a genome with STAR. I am using the bovine genome downloaded from UCSC (BosTau7/Btau_4.6.1).

    I have successfully generated this genome as it is and run mapping jobs against it. I'm just wondering if anyone can point me in the right direction for finding a .gtf file with annotated introns in the four-column format:

    Chr Start End Strand (+/-)

    I can't see one anywhere on the UCSC site, do I have to create one myself?

    Any help would be much appreciated, I'm new to this!
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    You can find the GFF files for (BosTau7/Btau_4.6.1) at NCBI: ftp://ftp.ncbi.nlm.nih.gov/genomes/Bos_taurus/GFF/

    • shivshiv
      Junior Member
      • Sep 2012
      • 1

      #3
      Thanks a million. I'm having a bit of trouble now with generating the genome though.

      code is:

      /shared/STAR_2.3.0e/STAR --runMode generateGenome --runThreadN 8 --genomeDir /data/shared/genomes/bosTau7 --genomeFastaFiles /data/shared/genomes/bosTau7/bosTau7.fa

      to generate the genome without splice junctions. I'm not quite sure how to include the GFF files, but I was going to add --sjdbFileChrStartEnd alt_BosTau_4.6.1_scaffolds.gff3 to the end of the above. Please let me know if this is wrong.

      However, whenever I try anything now all that comes up is:

      "Aug 06 13:29:03 ..... Started STAR run
      shivshiv@compute:/data/shared/genomes/bosTau7>"

      It's just returning to the command line without any error message.

      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        Check to make sure that the file from NCBI is in the 4-column format (quoted from the STAR manual):

        Chr <tab> Start <tab> End <tab> Strand (+ or -)
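One way to run that check is sketched below, assuming a POSIX shell and awk. The function name `check_sjdb_format` and the file name `sjdb.tsv` are hypothetical and not part of STAR, and the strand test accepts only `+` or `-`, as in the format quoted above.

```shell
# Hypothetical helper: flag lines that are not Chr<TAB>Start<TAB>End<TAB>Strand.
# Prints each offending line with its line number; returns non-zero if any exist.
check_sjdb_format() {
    awk -F'\t' '
        NF != 4 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $4 !~ /^[+-]$/ {
            printf "line %d: %s\n", NR, $0
            bad = 1
        }
        END { exit bad + 0 }
    ' "$1"
}

# Usage (file name is an example):
#   check_sjdb_format sjdb.tsv && echo "format looks OK"
```

A raw GFF3 file has nine columns, so it would fail this check on every line.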
        According to the manual, for GFF3 files you need to specify
        --sjdbGTFtagExonParentTranscript Parent
        along with:
        --sjdbOverhang <N>: the length of the "overhang" on each side of a splice junction. Ideally it should be equal to (MateLength - 1).
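If the read length is not known offhand, the (MateLength - 1) value can be estimated from the reads themselves. The sketch below assumes an uncompressed four-line-per-record FASTQ file; the function name `suggest_sjdb_overhang` and the file name are hypothetical.

```shell
# Hypothetical helper: print (longest read length - 1) for an uncompressed FASTQ.
# In FASTQ, the sequence is the 2nd line of every 4-line record.
suggest_sjdb_overhang() {
    awk 'NR % 4 == 2 { if (length($0) > max) max = length($0) }
         END { if (max > 0) print max - 1 }' "$1"
}

# Usage (file name is an example):
#   --sjdbOverhang "$(suggest_sjdb_overhang reads_1.fastq)"
```

For reads of 101 bases this prints 100.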

        • bruce01
          Senior Member
          • Mar 2011
          • 160

          #5
          You have "--runMode generateGenome" when it should be "--runMode genomeGenerate".

          • alexdobin
            Senior Member
            • Feb 2009
            • 161

            #6
            Hi @shivshiv,

            The advice from @bruce01 and @GenoMax was spot on. So your command for generating the genome with GFF3 annotation would look like:

            STAR --runThreadN 8 --runMode genomeGenerate --genomeDir /data/shared/genomes/bosTau7/ --genomeFastaFiles /data/shared/genomes/bosTau7/bosTau7.fa --sjdbGTFfile /data/shared/genomes/bosTau7/alt_BosTau_4.6.1_scaffolds.gff3 --sjdbGTFtagExonParentTranscript Parent --sjdbOverhang 100
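Since a mistyped option was what tripped up the original command, it can help to assemble the command in a small function and print it for review before running it. This is only a sketch: the function name `star_generate_cmd` is hypothetical, the flags are the ones from the command above, and the input check is an added assumption about what is worth verifying first.

```shell
# Hypothetical wrapper: check the inputs, then print the genomeGenerate command.
# Arguments: genome dir, genome FASTA, GFF3 annotation, overhang (MateLength - 1).
star_generate_cmd() {
    genome_dir=$1; fasta=$2; gff3=$3; overhang=$4
    # Refuse to continue if an input file is missing or unreadable.
    for f in "$fasta" "$gff3"; do
        [ -r "$f" ] || { echo "cannot read: $f" >&2; return 1; }
    done
    echo STAR --runThreadN 8 --runMode genomeGenerate \
        --genomeDir "$genome_dir" \
        --genomeFastaFiles "$fasta" \
        --sjdbGTFfile "$gff3" \
        --sjdbGTFtagExonParentTranscript Parent \
        --sjdbOverhang "$overhang"
}

# Usage (paths taken from this thread):
#   star_generate_cmd /data/shared/genomes/bosTau7/ \
#       /data/shared/genomes/bosTau7/bosTau7.fa \
#       /data/shared/genomes/bosTau7/alt_BosTau_4.6.1_scaffolds.gff3 100
```

Once the printed line looks right, it can be pasted into the shell as is.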

            • lfg
              Junior Member
              • Oct 2012
              • 1

              #7
              Hi @shivshiv, I realise you've already solved your original problem, but in case anyone else has the same problem, here is how I made my own table:

              1. Open the UCSC Table Browser for your genome.
              2. Extract the FASTA sequence for each intron (keep each one separate).
              3. Grep out the FASTA headers (they contain all the relevant details).
              4. Open the result in Excel, use text-to-columns to pull out the relevant details as separate columns, and delete the rest.

              Definitely not the most high-tech or efficient way of doing it, but it got the job done quickly!
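The grep and spreadsheet steps above could also be done in one pass with awk. The sketch below assumes the FASTA headers carry fields of the form `range=chr1:100-200` and `strand=+`. That layout is an assumption about the Table Browser output, so check it against your own file. The function name `headers_to_sjdb` and the file names are hypothetical.

```shell
# Hypothetical helper: turn FASTA headers into Chr<TAB>Start<TAB>End<TAB>Strand.
# Assumes each header has "range=chrN:start-end" and "strand=+" or "strand=-".
headers_to_sjdb() {
    awk '/^>/ {
        chr = ""; start = ""; stop = ""; strand = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^range=/) {
                split(substr($i, 7), a, ":")   # a[1] = chromosome, a[2] = start-end
                split(a[2], b, "-")
                chr = a[1]; start = b[1]; stop = b[2]
            } else if ($i ~ /^strand=/) {
                strand = substr($i, 8)
            }
        }
        if (chr != "" && strand != "") print chr "\t" start "\t" stop "\t" strand
    }' "$1" | sort -u   # drop introns shared by several transcripts
}

# Usage (file names are examples):
#   headers_to_sjdb introns.fa > sjdb.tsv
```

The coordinates are copied unchanged, so it is worth confirming that they mark the first and last base of each intron in the convention STAR expects.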

              • Brian Bushnell
                Super Moderator
                • Jan 2014
                • 2709

                #8
                If you want to do cross-species alignment, particularly RNA-seq, I suggest BBMap. I don't know of anything else capable of cross-species RNA-seq. BBMap does not use GFF files, but for RNA-seq you do need to set the maxindel flag appropriately, e.g. "maxindel=200000" if you expect most introns to be under 200 kb.
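If an intron table for a related genome is available, it gives a rough way to judge whether a given maxindel value covers "most introns". The sketch below assumes a tab-separated Chr, Start, End, Strand file in which Start and End are the first and last intron bases. The function name `pct_introns_within` and the file name are hypothetical, and this helper is not part of BBMap.

```shell
# Hypothetical helper: percentage of introns no longer than a given length.
# Arguments: length limit in bases, then a Chr<TAB>Start<TAB>End<TAB>Strand file.
pct_introns_within() {
    awk -F'\t' -v limit="$1" '
        { n++; if ($3 - $2 + 1 <= limit) ok++ }
        END { if (n > 0) printf "%.1f\n", 100 * ok / n }
    ' "$2"
}

# Usage (file name is an example):
#   pct_introns_within 200000 sjdb.tsv
```

A result close to 100 suggests the chosen limit spans nearly all annotated introns in that table.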

                • kurban910
                  Member
                  • Jul 2014
                  • 58

                  #9
                  Originally posted by Brian Bushnell
                  If you want to do cross-species alignment, particularly RNA-seq, I suggest BBMap. I don't know of anything else capable of cross-species RNA-seq. BBMap does not use GFF files, but for RNA-seq you do need to set the maxindel flag appropriately, e.g. "maxindel=200000" if you expect most introns to be under 200 kb.
                  Thanks. Let me see if I can find more information about BBMap to help me understand it better.
                  Last edited by kurban910; 09-27-2014, 02:31 AM.
