Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • harrike
    Member
    • Jun 2010
    • 29

    STAR coupled with Cufflinks

    I am working on a plant, in which the genome has been sequenced but the gene-annotation quality are not as good as I expected. I am trying to use STAR and Cufflinks to annotate novel transcripts from pair-end RNA-seq data. When I used the default settings, I got many alternative-spliced transcripts, some of which don't look like real. Therefore, for each novel splicing junction, I would like to have a filter, that is, at least ten reads (unique mapped or multi-mapped) support the junction. I don't know how to do that. What I am currently using is to do 2-pass mapping using STAR. After the 1st pass mapping, I will have the unsupported (with less than 10 reads) junctions removed from the *SJ.out.tab, and use this modified *SJ.out.tab as "annotated" junctions (--sjdbFileChrStartEnd value) for the 2nd pass mapping. Will this step improve the the mapping and get a more reliable list of novel transcripts.

    Thanks!
  • alexdobin
    Senior Member
    • Feb 2009
    • 161

    #2
    Hi @harrike,

    --outSJfilterCountUniqueMin 10 10 10 10 --outFilterType BySJout
    will remove all junctions that are supported by <10 uniquely mapping reads. The 4 numbers are for different intron motifs: non-canonical, GT/AG, GC/AG, AT/AC.
    You can also use --outSJfilterCountTotalMin 10 10 10 10 --outFilterType BySJout, which will remove junctions supported by <10 reads including unique and multimappers.

    This option will work with 1-pass or 2-pass mapping.

    There are also other splice junction filtering options - please check the manual for outSJfilter* otpions, they can all be used together in "AND" fashion.

    Cheers
    Alex

    Comment

    • harrike
      Member
      • Jun 2010
      • 29

      #3
      Hi Alex,

      Great ! Thank you very much for your reply. I am testing those settings.

      I am curious about what happens to those uniquely mapped reads spanning a junction with <10 reads support. Are they considered as unmapped reads?

      Thanks,

      Rui

      Comment

      • harrike
        Member
        • Jun 2010
        • 29

        #4
        Hi Alex,

        I tried one of my libraries, using the "--outSJfilterCountTotalMin 10 10 10 10 --outFilterType BySJout". In the *SJ.out.tab file, some of the junctions are still of <10 reads support (see below). Should the sum of the column 7 and 8 be >=10?

        scaffold2131 22287 22614 1 1 0 62 0 39
        scaffold2131 22322 22614 1 1 0 3 0 23
        scaffold2131 22504 22614 1 1 0 1 0 35
        scaffold2131 22527 22614 1 1 0 1 0 16
        scaffold2131 22702 22827 1 1 0 40 0 44
        scaffold2131 22933 23197 1 1 0 31 0 44
        scaffold2131 23273 23348 1 1 0 157 0 45
        scaffold2131 23479 23582 1 1 0 94 61 44
        scaffold2131 23675 23875 1 1 0 134 24 44
        scaffold2131 23927 24382 1 1 0 150 0 44
        scaffold2131 24460 24565 1 1 0 131 23 44
        scaffold2131 24621 24694 1 1 0 2 0 34
        scaffold2131 24621 24699 1 1 0 1 0 40

        The command I used is:

        STAR --genomeDir /home/harrikex/workspot/litchi/RNAseq/star_index/ --runThreadN 24 --readFilesIn 121224_I663_FCC1L8RACXX_L6_LITgqjTABRAAPEI-62_1.fq 121224_I663_FCC1L8RACXX_L6_LITgqjTABRAAPEI-62_2.fq --outSAMtype BAM Unsorted --outFilterMultimapNmax 20 --alignIntronMax 10000 --alignMatesGapMax 10000 --outSAMstrandField intronMotif --alignEndsType EndToEnd --outFilterIntronMotifs RemoveNoncanonical --outSJfilterCountTotalMin 10 10 10 10 --outFilterType BySJout --outFileNamePrefix FMO_121224_

        Thanks,

        Rui

        Comment

        • Macspider
          Member
          • Feb 2016
          • 36

          #5
          The 2-pass pipeline for STAR works very fine, I used it many times. Moreover, STAR with Cufflinks scores among the best results in alignment-based quantification pipelines when dealing with not very specific samples, e.g. tissue samples or whole-organism samples.

          Comment

          • alexdobin
            Senior Member
            • Feb 2009
            • 161

            #6
            Originally posted by harrike View Post
            Hi Alex,

            I tried one of my libraries, using the "--outSJfilterCountTotalMin 10 10 10 10 --outFilterType BySJout". In the *SJ.out.tab file, some of the junctions are still of <10 reads support (see below). Should the sum of the column 7 and 8 be >=10?

            ...

            Thanks,

            Rui
            Hi Rui,

            sorry, I forgot that --outSJfilterCountUniqueMin and --outSJfilterCountTotalMin filters are used in the OR fashion, i.e. if one or the other is satisifed, the junction will be kept - so you have to specify both options. If you want only the Total filter, you need to make --outSJfilterCountUniqueMin numbers larger or -1, e.g. --outSJfilterCountUniqueMin -1 -1 -1 -1 --outSJfilterCountTotalMin 10 10 10 10 .

            The reads that cross the junctions that do not pass this filter will be re-mapped forcing them not to cross these junctions. Most often this will result in soft-clipped reads, i.e. donor or acceptor part of the read over the prohibited junction will be dropped.

            Cheers
            Alex

            Comment

            • harrike
              Member
              • Jun 2010
              • 29

              #7
              Originally posted by alexdobin View Post
              Hi Rui,

              sorry, I forgot that --outSJfilterCountUniqueMin and --outSJfilterCountTotalMin filters are used in the OR fashion, i.e. if one or the other is satisifed, the junction will be kept - so you have to specify both options. If you want only the Total filter, you need to make --outSJfilterCountUniqueMin numbers larger or -1, e.g. --outSJfilterCountUniqueMin -1 -1 -1 -1 --outSJfilterCountTotalMin 10 10 10 10 .

              The reads that cross the junctions that do not pass this filter will be re-mapped forcing them not to cross these junctions. Most often this will result in soft-clipped reads, i.e. donor or acceptor part of the read over the prohibited junction will be dropped.

              Cheers
              Alex

              Hi Alex,

              Thanks for your explanation. I tried the combination of "--outSJfilterCountUniqueMin -1 -1 -1 -1 --outSJfilterCountTotalMin 10 10 10 10", it works. It has only half of the junctions (using the default setting) reported. It seems to have greater reliability.

              One more question: After the 1st-pass mapping, using the above setting, only the junctions with >=10 reads support will be reported into the *SJ.out.tab file. As I have a couple of samples, I will get a few *SJ.out.tab files. In the 2nd-pass, all these *SJ.out.tab will be used as references (annotated junctions) for the mapping, do I still need to use the setting of "--outSJfilterCountUniqueMin -1 -1 -1 -1 --outSJfilterCountTotalMin 10 10 10 10" ? as it is possible, a few reads, <10 in one library, are mapped to a junction which is of >10 read support in another library; in this case, I still want those read not to be filtered out. On the other hand, if I don't use those settings, I feel many false positives will be introduced in the 2nd-pass mapping.

              In other words, in the 2nd-pass mapping, will all the reads be preferentially mapped according the annotated junctions (provide in the *SJ.out.tab files using --sjdbFileChrStartEnd)? If this is true, I feel it is likely good without the >10 filter (--outSJfilterCountUniqueMin -1 -1 -1 -1 --outSJfilterCountTotalMin 10 10 10 10) in the 2nd-pass mapping.

              Thanks,

              Rui

              Comment

              • alexdobin
                Senior Member
                • Feb 2009
                • 161

                #8
                Hi Rui,

                the --outSJfilterCountUniqueMin -1 -1 -1 -1 --outSJfilterCountTotalMin 10 10 10 10 filter will not apply to the "annotated" junctions, i.e. junctions that are introduced via --sjdbFileChrStartEnd or --sjdbGTFfile options. So in the 2nd pass the junctions from your combined-SJ file will not be filtered out.
                However, STAR will still be detecting the junctions which never made it to combined-SJ file, and you will have to use this filter to get rid of them.

                Cheers
                Alex

                Comment

                • harrike
                  Member
                  • Jun 2010
                  • 29

                  #9
                  Hi Alex,

                  Great, thanks for your clarification.

                  I decide to using the filter "--outSJfilterCountUniqueMin -1 -1 -1 -1 --outSJfilterCountTotalMin 10 10 10 10" for my 2nd-pass mapping.

                  Rui

                  Comment

                  • biokloot
                    Junior Member
                    • Feb 2016
                    • 2

                    #10
                    Dear Rui and Alex,
                    Many thanks for asking and explaining the SJout filter combinations so well!
                    I want to completely suppress detection of any and all SJ in the second pass. Do need to set --outSJfilterCountUniqueMin -1 -1 -1 -1 AND --outSJfilterCountTotalMin -1 -1 -1 -1 for the second passes to achieve that?

                    Comment

                    • harrike
                      Member
                      • Jun 2010
                      • 29

                      #11
                      Hi biokloot,

                      To me, all "-1" does suppress all SJ, as "-1 means no output for that motif". But I am not sure how a read spanning an intron will be mapped in this case.

                      I am sure Alex will give a good clarification.

                      Rui

                      Comment

                      • alexdobin
                        Senior Member
                        • Feb 2009
                        • 161

                        #12
                        Originally posted by biokloot View Post
                        Dear Rui and Alex,
                        Many thanks for asking and explaining the SJout filter combinations so well!
                        I want to completely suppress detection of any and all SJ in the second pass. Do need to set --outSJfilterCountUniqueMin -1 -1 -1 -1 AND --outSJfilterCountTotalMin -1 -1 -1 -1 for the second passes to achieve that?
                        Hi @biokloot,

                        the best way to suppress detection of any new junctions in 2nd pass (i.e. junctions not detected in the 1st pass) is to use --alignSJoverhang 100000 (or any number > read length).
                        Alternatively, you can use
                        --outSJfilterCountUniqueMin -1 -1 -1 -1 --outSJfilterCountTotalMin -1 -1 -1 -1 --outFilterType BySJout
                        or
                        --outSJfilterOverhangMin -1 -1 -1 -1 --outFilterType BySJout
                        but it's slightly slower.

                        Cheers
                        Alex

                        Comment

                        • biokloot
                          Junior Member
                          • Feb 2016
                          • 2

                          #13
                          Thanks for the quick replies, Rui and Alex!

                          Comment

                          Latest Articles

                          Collapse

                          • SEQadmin2
                            Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                            by SEQadmin2


                            I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.

                            Here are nine questions we think about, in roughly the order they matter, before...
                            06-18-2026, 07:11 AM
                          • SEQadmin2
                            From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                            by SEQadmin2


                            Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                            The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                            ...
                            06-02-2026, 10:05 AM

                          ad_right_rmr

                          Collapse

                          News

                          Collapse

                          Topics Statistics Last Post
                          Started by SEQadmin2, Today, 05:37 AM
                          0 responses
                          5 views
                          0 reactions
                          Last Post SEQadmin2  
                          Started by SEQadmin2, 06-26-2026, 11:10 AM
                          0 responses
                          16 views
                          0 reactions
                          Last Post SEQadmin2  
                          Started by SEQadmin2, 06-17-2026, 06:09 AM
                          0 responses
                          50 views
                          0 reactions
                          Last Post SEQadmin2  
                          Started by SEQadmin2, 06-09-2026, 11:58 AM
                          0 responses
                          110 views
                          0 reactions
                          Last Post SEQadmin2  
                          Working...