  • frymor
    Senior Member
    • May 2010
    • 151

    Identifying splice junction with STAR

    Hi,

    As mentioned here, we are mapping Drosophila samples with the STAR aligner. To get an impression of the mapping quality, we also mapped them with tophat2, using these commands:

    Code:
    ~/software/STAR-STAR_2.4.1c/STAR --runThreadN 15 \
        --genomeDir genomes/Drosophila_melanogaster/STARindex/Dmel/ \
        --readFilesIn $file --readFilesCommand zcat \
        --sjdbGTFfile genes.gtf \
        --outFilterType BySJout --outFilterMultimapNmax 1 \
        --alignSJoverhangMin 8 \
        --outFileNamePrefix $NEW_FILE.STAR. \
        --outSAMtype BAM Unsorted --outReadsUnmapped Fastx \
        --outFilterMismatchNoverLmax 0.05 \
        --outFilterScoreMinOverLread 0 --outFilterMatchNminOverLread 0 \
        --alignIntronMax 1

    tophat2 -p 15 -g 1 -G genes.gtf -o $NEW_FILE.tophat.out \
        genomes/Drosophila_melanogaster/Ensembl/BDGP6.80/bowtie2index/genome $file
    The mapping percentage is better, sometimes a lot better, with STAR. But when comparing the splice junction files, tophat2 identifies 56058 junctions while STAR identifies only 46855.
    I have looked at the BAM files in IGV (images attached below), and it is very clear that tophat2 identifies a lot of very long splice junctions that STAR does not report.

    As you can see (light blue lines in IGV), the short splice junctions are identified by both algorithms, but tophat2 finds many more of the longer ones.
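
To see which junctions differ, rather than only comparing the totals, the two junction files can be reduced to the same coordinate form and compared. This is a rough sketch, not something taken from the thread. It assumes STAR's SJ.out.tab layout (columns 2 and 3 are the first and last intron base, 1-based), TopHat's junctions.bed layout (BED12, with two blocks flanking the intron), and identical chromosome names in both runs. The function names are made up for illustration.

```shell
# Sketch: normalise both junction files to "chr:first_intron_base-last_intron_base"
# (1-based, inclusive) so they can be compared line by line.

# STAR SJ.out.tab: col 1 = chromosome, cols 2/3 = first/last intron base (1-based).
star_junctions() {
    awk -F'\t' '{ print $1 ":" $2 "-" $3 }' "$1" | LC_ALL=C sort -u
}

# TopHat junctions.bed: BED12, 0-based start, two blocks flanking the intron.
# Intron = (chromStart + first block size + 1) .. (chromEnd - second block size).
tophat_junctions() {
    awk -F'\t' '$1 !~ /^track/ {
        split($11, size, ",")
        print $1 ":" ($2 + size[1] + 1) "-" ($3 - size[2])
    }' "$1" | LC_ALL=C sort -u
}

# Print junctions present in the TopHat file but missing from the STAR file.
# Usage: tophat_only SJ.out.tab junctions.bed
tophat_only() {
    star_tmp=$(mktemp) && tophat_tmp=$(mktemp) || return 1
    star_junctions "$1" > "$star_tmp"
    tophat_junctions "$2" > "$tophat_tmp"
    LC_ALL=C comm -13 "$star_tmp" "$tophat_tmp"
    rm -f "$star_tmp" "$tophat_tmp"
}
```

With the prefixes from the commands above, the call would be something like tophat_only $NEW_FILE.STAR.SJ.out.tab $NEW_FILE.tophat.out/junctions.bed. The intron length of each reported junction is the second coordinate minus the first, plus one.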

    Is there a way to adjust the STAR parameters so that I can also find these junctions?

    thanks
    Assa

    Last edited by frymor; 08-14-2015, 12:29 AM.
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    Tophat2 does a two-pass search for junctions, while STAR does just a single pass. There's an option to have STAR do a second pass, though I don't recall what it is off-hand (have a look through the documentation).
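
For reference, the STAR manual documents this as --twopassMode Basic (available in the 2.4.1 series, as far as I know). Below is a minimal sketch that reuses the paths from the first post. The function name and the STAR_BIN override are hypothetical, added only so the command can be dry-run, and all filtering options are left at their defaults.

```shell
# Sketch: map one sample with STAR's built-in two-pass mode.
# STAR_BIN is a hypothetical override; it defaults to the "STAR" found on PATH.
# Usage: run_star_twopass reads.fastq.gz output_prefix
run_star_twopass() {
    reads=$1    # gzipped FASTQ, as in the original command
    prefix=$2   # output prefix, e.g. sample.STAR.
    "${STAR_BIN:-STAR}" \
        --runThreadN 15 \
        --genomeDir genomes/Drosophila_melanogaster/STARindex/Dmel/ \
        --readFilesIn "$reads" \
        --readFilesCommand zcat \
        --sjdbGTFfile genes.gtf \
        --twopassMode Basic \
        --outSAMtype BAM Unsorted \
        --outFileNamePrefix "$prefix"
}
```

Setting STAR_BIN=echo prints the assembled command line instead of running it, which is a cheap way to check the options before starting a long job.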

    • alexdobin
      Senior Member
      • Feb 2009
      • 161

      #3
      Hi Assa,

      You are using --alignIntronMax 1, i.e. a maximum intron size of 1 for unannotated junctions.
      This, of course, prevents STAR from reporting any unannotated junctions; my guess is that these "long" junctions are all unannotated.
      Please choose whatever maximum splice gap is reasonable for the fly; I would use --alignIntronMax 100000.
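
One way to test this guess is to summarise STAR's SJ.out.tab by annotation status before and after changing the option. The sketch below assumes the standard SJ.out.tab layout (columns 2 and 3 are the first and last intron base, column 6 is 1 for junctions present in the GTF and 0 otherwise); sj_summary is a made-up name.

```shell
# Sketch: count junctions and report the longest intron, split by annotation status.
# Columns used: 2/3 = first/last intron base (1-based), 6 = 1 if annotated, 0 if not.
# Usage: sj_summary SJ.out.tab
sj_summary() {
    awk -F'\t' '
        {
            len = $3 - $2 + 1              # intron length in bases
            n[$6]++                        # junction count per annotation status
            if (len > max[$6]) max[$6] = len
        }
        END {
            printf "annotated\t%d\tlongest_intron\t%d\n",   n[1], max[1]
            printf "unannotated\t%d\tlongest_intron\t%d\n", n[0], max[0]
        }' "$1"
}
```

If the diagnosis is right, I would expect the unannotated count to be at or near zero for the --alignIntronMax 1 run and to rise once a larger value is used.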

      If this does not solve the discrepancy, we could look at the alignments in more detail.

      Cheers
      Alex

      • frymor
        Senior Member
        • May 2010
        • 151

        #4
        Hi Alex,

        Thanks for the responses to both questions.
        The --alignIntronMax 1 was a mistake that I have already fixed. It is true that a lot of the junctions were then found, but the problem was a combination of multiple parameters. Both the multi-mapper setting and the thresholds for the number of mismatches, quality and quantity were the reason that a lot of the reads were unaccounted for.
        After changing the parameters a few times (even allowing multi-mappers up to 50, just for the fun of it), I could see that the reads are there but were apparently discarded because of specific parameters.
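
For anyone chasing the same problem, STAR's Log.final.out gives a quick view of why reads were dropped. The sketch below assumes the usual Log.final.out line labels ("% of reads mapped to too many loci", "% of reads unmapped: too short", and so on); the function name is made up. As far as I understand, with --outFilterMultimapNmax 1 the multi-mappers are counted under "too many loci".

```shell
# Sketch: print the multi-mapping and unmapped percentages from a STAR Log.final.out.
# Lines in that file look like "   % of reads unmapped: too short |<TAB>5.20%".
# Usage: star_discard_report Log.final.out
star_discard_report() {
    grep -E '% of reads (mapped to multiple loci|mapped to too many loci|unmapped)' "$1" |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*|[[:space:]]*/: /'
}
```

Running it on the logs from two parameter sets side by side shows which category the missing reads moved out of.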

        Assa
