  • Originally posted by SamCurt View Post
    So, just for gene expression profiling purposes, should I keep my sjDb file set for second-pass alignment constant?

    Complete story: I have a set of ~40 samples that have already completed the entire double-pass alignment for both gene expression and variation analysis purposes. The sjDb files from the first passes of these samples were used for their second-pass alignments.

    Now I have received a further ~15 samples within the same project, for which I'd perform gene expression analysis only. I wonder whether I should do a first pass on these new samples and pool their sjDb's with the old ones for the second pass, or just do a "second pass" with the old sjDb's. My concern is obviously not about time, but rather whether using a different sjDb set would make the gene counts less comparable.
    Hi Sam,

    To avoid quantification bias, it's better to use the same splice junctions for the 2nd pass mapping. However, this affects only the novel (unannotated) junctions, so if you are quantifying only annotated genes, the bias is likely to be very small.

    The ideal solution is to combine splice junctions files (SJ.out.tab) from the 1st pass of all samples (old and new), and then run the 2nd pass on *all* samples.

    The 2nd best solution (for differential expression) is to use only the junctions from the old samples for the "2nd" pass mapping of the new samples (you would not need the 1st pass mapping for the new samples, nor another 2nd pass on the old samples). This way you would avoid bias for junctions detected only in the new samples.
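
A minimal sketch of the "combine SJ.out.tab files" step described above, not the author's own script. It assumes the standard tab-separated SJ.out.tab layout (chromosome, intron start, intron end, strand code 0/1/2, motif, annotated flag, unique read count, ...) and writes a four-column chr/start/end/strand list of the kind that can be passed to --sjdbFileChrStartEnd; check the manual for your STAR version. The names merge_junctions, format_sjdb and the min_unique_reads filter are hypothetical helpers introduced only for this illustration.

```python
def merge_junctions(tables, min_unique_reads=0):
    """Union of junctions across several SJ.out.tab tables.

    `tables` is an iterable of iterables of lines (for example open file
    handles). Junctions are keyed by (chr, start, end, strand code) and the
    unique-read counts (column 7) are summed across samples.
    """
    totals = {}
    for table in tables:
        for line in table:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 7:
                continue  # skip blank or malformed lines
            key = (fields[0], int(fields[1]), int(fields[2]), fields[3])
            totals[key] = totals.get(key, 0) + int(fields[6])
    # min_unique_reads is an optional, hypothetical filter; 0 keeps everything
    return sorted(k for k, n in totals.items() if n >= min_unique_reads)


def format_sjdb(junctions):
    """Render junctions as chr<TAB>start<TAB>end<TAB>strand lines."""
    strand_symbol = {"0": ".", "1": "+", "2": "-"}
    return [
        "\t".join([chrom, str(start), str(end), strand_symbol.get(strand, ".")])
        for chrom, start, end, strand in junctions
    ]


if __name__ == "__main__":
    # Two tiny in-memory tables standing in for old and new samples
    old_sample = ["chr1\t100\t200\t1\t1\t0\t5\t0\t30\n"]
    new_sample = ["chr1\t100\t200\t1\t1\t0\t2\t0\t25\n",
                  "chr2\t50\t90\t2\t2\t0\t1\t0\t12\n"]
    for row in format_sjdb(merge_junctions([old_sample, new_sample])):
        print(row)
```

With real data, the tables would be the opened SJ.out.tab files from the 1st pass of every sample, and the same merged list would then be supplied to the 2nd pass of all samples so that every sample sees an identical junction set.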

    Cheers
    Alex

    Comment


    • Hi everyone
      Do you think we can align with STAR on a laptop with an Intel Core Extreme i7-4940MX and 32GB RAM, even if it runs overnight? I will have about 130 million reads to align to the human genome.
      Thank you

      Comment


      • Originally posted by mdidish View Post
        Hi everyone
        Do you think we can align with STAR on a laptop with an Intel Core Extreme i7-4940MX and 32GB RAM, even if it runs overnight? I will have about 130 million reads to align to the human genome.
        Thank you
        Hi,

        depending on the read length, the speed should be 20-50M reads per hour per core, so it should be doable. 32GB is just enough for the human genome.
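
As a rough back-of-the-envelope check of the 20-50M reads per hour per core figure above, the sketch below estimates the mapping time for 130 million reads. The core count of 4 is an assumption about the laptop CPU, the function name is hypothetical, and the estimate ignores genome loading and any I/O bottleneck.

```python
def alignment_hours(total_reads_m, reads_per_hour_per_core_m, cores):
    """Estimated wall-clock hours, assuming throughput scales linearly with cores."""
    return total_reads_m / (reads_per_hour_per_core_m * cores)


READS_M = 130   # million reads, from the question
CORES = 4       # assumption: number of physical cores used by STAR

slow = alignment_hours(READS_M, 20, CORES)  # lower end of the quoted speed range
fast = alignment_hours(READS_M, 50, CORES)  # upper end of the quoted speed range

print(f"Estimated mapping time: between {fast:.2f} and {slow:.2f} hours")
```

Even at the slow end of the range and with fewer cores than assumed, the run fits comfortably within an overnight window.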

        Cheers
        Alex

        Comment


        • Hi,
          Thank you for your response. In the end, I should have a laptop with an Intel Core Extreme i7-4940MX and 64GB RAM.
          The duration is not important, I just wanted to make sure I can start the analysis.
          Marc

          Comment


          • dear alex, or other star experts

            which parameters should I set to get ALL non-canonical (i.e. back-) spliced reads in the unmapped SAM file? I want to call circular RNAs with these reads.

            I have 50 bp paired-end unstranded RNA-seq reads, and a genome index with a splice database from the same data (2-pass over all samples). I hope back-spliced junctions are NOT present in this (joined) splice database - or should I filter these databases to remove the back-splice junctions?...


            is

            --outFilterIntronMotifs RemoveNoncanonicalUnannotated

            the correct setting? I.e. will all spliced reads not present in the splice junctions database end up in the unmapped SAM file?

            best wishes and thanks in advance,

            dietmar

            Comment


            • Originally posted by dietmar13 View Post
              which parameters should I set to get ALL non-canonical (i.e. back-) spliced reads in the unmapped SAM file? I want to call circular RNAs with these reads.

              I have 50 bp paired-end unstranded RNA-seq reads, and a genome index with a splice database from the same data (2-pass over all samples). I hope back-spliced junctions are NOT present in this (joined) splice database - or should I filter these databases to remove the back-splice junctions?...


              is

              --outFilterIntronMotifs RemoveNoncanonicalUnannotated

              the correct setting? I.e. will all spliced reads not present in the splice junctions database end up in the unmapped SAM file?

              best wishes and thanks in advance,

              dietmar
              Hi Dietmar,

              the non-canonical junctions have non-canonical motifs, but they are still "linear" in the genome, i.e. the acceptor site follows the donor site. The circular junctions are classified as "chimeric", so you need to enable chimeric detection, e.g.: --chimSegmentMin 15 --chimJunctionOverhangMin 15. You can extract the circular junctions from the Chimeric.out.junction file (see this post); an example script is in the STAR source distribution: extras/scripts/filterCirc.awk. The chimeric alignments are also written to the SAM/BAM files.
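
To illustrate what "circular" means in terms of the chimeric output, here is a minimal sketch that is not a reproduction of filterCirc.awk. It assumes the first six tab-separated columns of Chimeric.out.junction are donor chromosome, donor position, donor strand, acceptor chromosome, acceptor position and acceptor strand, and it keeps junctions on the same chromosome and strand whose acceptor precedes the donor along the transcript. The function names and the 1,000,000-base span limit are hypothetical choices for this example.

```python
def is_backsplice(fields, max_span=1_000_000):
    """True if donor/acceptor order is reversed relative to a linear junction."""
    chr_donor, pos_donor, strand_donor = fields[0], int(fields[1]), fields[2]
    chr_acceptor, pos_acceptor, strand_acceptor = fields[3], int(fields[4]), fields[5]
    if chr_donor != chr_acceptor or strand_donor != strand_acceptor:
        return False  # inter-chromosomal or strand-switching chimera
    if strand_donor == "+":
        span = pos_donor - pos_acceptor   # back-splice: donor lies downstream
    else:
        span = pos_acceptor - pos_donor   # mirrored for the minus strand
    return 0 < span < max_span


def circular_junctions(lines, max_span=1_000_000):
    """Yield the split fields of lines that look like back-splice junctions."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip comments, header rows and malformed lines
        if line.startswith("#") or len(fields) < 6:
            continue
        if not (fields[1].isdigit() and fields[4].isdigit()):
            continue
        if is_backsplice(fields, max_span):
            yield fields


if __name__ == "__main__":
    example = [
        "chr1\t5000\t+\tchr1\t3000\t+\t1\t0\t0\tread1\n",  # donor after acceptor on +
        "chr1\t3000\t+\tchr1\t5000\t+\t1\t0\t0\tread2\n",  # linear order
    ]
    for fields in circular_junctions(example):
        print("\t".join(fields[:6]))
```

Real filtering would usually add further criteria (for example excluding the mitochondrial chromosome or requiring a canonical junction type), which are left out here.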

              Cheers
              Alex

              Comment


              • Hi Alex

                I've been using Star now for several weeks and I love it! Thanks for creating such a great tool.

                I'd like to use Star to try to align Macaque reads to the human genome. I think this might work best if I relax the alignment stringency. Do you have any recommendations for how I should do this?

                Comment


                • Hi Alex,
                  I understand that with --quantMode TranscriptomeSAM --quantTranscriptomeBan Singleend I can generate a transcript-coordinate BAM file with indels and soft-clips. But do you consider it acceptable for variant calling (e.g. for allele-specific expression purposes)?

                  Comment


                  • duplicated reference genomes

                    I don't understand why the genomeGenerate mode is creating a duplicated (concatenated) reference. This is resulting in at least two identical alignments for every read:

                    Command issued:
                    Code:
                    STAR --runMode genomeGenerate --genomeDir NPB_Pi9 --genomeFastaFiles NPB_Pi9.fasta --runThreadN 2 --genomeSAindexNbases 14
                    resulting SAM header and first two alignments:
                    Code:
                    @HD	VN:1.4
                    @SQ	SN:chr01	LN:43270923
                    @SQ	SN:chr02	LN:35937250
                    @SQ	SN:chr03	LN:36413819
                    @SQ	SN:chr04	LN:35502694
                    @SQ	SN:chr05	LN:29958434
                    @SQ	SN:chr06	LN:31248787
                    @SQ	SN:chr07	LN:29697621
                    @SQ	SN:chr08	LN:28443022
                    @SQ	SN:chr09	LN:23012720
                    @SQ	SN:chr10	LN:23207287
                    @SQ	SN:chr11	LN:29021106
                    @SQ	SN:chr12	LN:27531856
                    @SQ	SN:AC155918	LN:32941
                    @SQ	SN:AC156495	LN:88500
                    @SQ	SN:AC160949	LN:128256
                    @SQ	SN:AP008246	LN:206004
                    @SQ	SN:AP008247	LN:157458
                    @SQ	SN:AC174930	LN:15426
                    @SQ	SN:Syng_TIGR_002	LN:14476
                    @SQ	SN:Syng_TIGR_004	LN:19457
                    @SQ	SN:Syng_TIGR_005	LN:21787
                    @SQ	SN:Syng_TIGR_007	LN:7820
                    @SQ	SN:Syng_TIGR_008	LN:16676
                    @SQ	SN:Syng_TIGR_009	LN:10296
                    @SQ	SN:Syng_TIGR_010	LN:15493
                    @SQ	SN:Syng_TIGR_011	LN:10901
                    @SQ	SN:Syng_TIGR_012	LN:16417
                    @SQ	SN:Syng_TIGR_013	LN:10512
                    @SQ	SN:Syng_TIGR_014	LN:21421
                    @SQ	SN:Syng_TIGR_015	LN:10595
                    @SQ	SN:Syng_TIGR_016	LN:12792
                    @SQ	SN:Syng_TIGR_019	LN:10422
                    @SQ	SN:Syng_TIGR_020	LN:10699
                    @SQ	SN:Syng_TIGR_021	LN:17477
                    @SQ	SN:Syng_TIGR_022	LN:9889
                    @SQ	SN:Syng_TIGR_023	LN:24772
                    @SQ	SN:Syng_TIGR_024	LN:10060
                    @SQ	SN:Syng_TIGR_026	LN:19971
                    @SQ	SN:Syng_TIGR_027	LN:11522
                    @SQ	SN:Syng_TIGR_028	LN:31094
                    @SQ	SN:Syng_TIGR_029	LN:12884
                    @SQ	SN:Syng_TIGR_030	LN:10794
                    @SQ	SN:Syng_TIGR_031	LN:9548
                    @SQ	SN:Syng_TIGR_032	LN:9603
                    @SQ	SN:Syng_TIGR_033	LN:11093
                    @SQ	SN:Syng_TIGR_034	LN:10311
                    @SQ	SN:Syng_TIGR_035	LN:10686
                    @SQ	SN:Syng_TIGR_036	LN:10434
                    @SQ	SN:Syng_TIGR_037	LN:13061
                    @SQ	SN:Syng_TIGR_038	LN:8197
                    @SQ	SN:Syng_TIGR_039	LN:6269
                    @SQ	SN:Syng_TIGR_041	LN:10210
                    @SQ	SN:Syng_TIGR_042	LN:5510
                    @SQ	SN:Syng_TIGR_043	LN:4236
                    @SQ	SN:Syng_TIGR_044	LN:6000
                    @SQ	SN:Syng_TIGR_045	LN:22545
                    @SQ	SN:Syng_TIGR_046	LN:11447
                    @SQ	SN:Syng_TIGR_047	LN:20829
                    @SQ	SN:Syng_TIGR_048	LN:7140
                    @SQ	SN:Syng_TIGR_049	LN:6261
                    @SQ	SN:Syng_TIGR_050	LN:8529
                    @SQ	SN:Pi9_cDNA	LN:4650
                    @SQ	SN:chr01	LN:43270923
                    @SQ	SN:chr02	LN:35937250
                    @SQ	SN:chr03	LN:36413819
                    @SQ	SN:chr04	LN:35502694
                    @SQ	SN:chr05	LN:29958434
                    @SQ	SN:chr06	LN:31248787
                    @SQ	SN:chr07	LN:29697621
                    @SQ	SN:chr08	LN:28443022
                    @SQ	SN:chr09	LN:23012720
                    @SQ	SN:chr10	LN:23207287
                    @SQ	SN:chr11	LN:29021106
                    @SQ	SN:chr12	LN:27531856
                    @SQ	SN:AC155918	LN:32941
                    @SQ	SN:AC156495	LN:88500
                    @SQ	SN:AC160949	LN:128256
                    @SQ	SN:AP008246	LN:206004
                    @SQ	SN:AP008247	LN:157458
                    @SQ	SN:AC174930	LN:15426
                    @SQ	SN:Syng_TIGR_002	LN:14476
                    @SQ	SN:Syng_TIGR_004	LN:19457
                    @SQ	SN:Syng_TIGR_005	LN:21787
                    @SQ	SN:Syng_TIGR_007	LN:7820
                    @SQ	SN:Syng_TIGR_008	LN:16676
                    @SQ	SN:Syng_TIGR_009	LN:10296
                    @SQ	SN:Syng_TIGR_010	LN:15493
                    @SQ	SN:Syng_TIGR_011	LN:10901
                    @SQ	SN:Syng_TIGR_012	LN:16417
                    @SQ	SN:Syng_TIGR_013	LN:10512
                    @SQ	SN:Syng_TIGR_014	LN:21421
                    @SQ	SN:Syng_TIGR_015	LN:10595
                    @SQ	SN:Syng_TIGR_016	LN:12792
                    @SQ	SN:Syng_TIGR_019	LN:10422
                    @SQ	SN:Syng_TIGR_020	LN:10699
                    @SQ	SN:Syng_TIGR_021	LN:17477
                    @SQ	SN:Syng_TIGR_022	LN:9889
                    @SQ	SN:Syng_TIGR_023	LN:24772
                    @SQ	SN:Syng_TIGR_024	LN:10060
                    @SQ	SN:Syng_TIGR_026	LN:19971
                    @SQ	SN:Syng_TIGR_027	LN:11522
                    @SQ	SN:Syng_TIGR_028	LN:31094
                    @SQ	SN:Syng_TIGR_029	LN:12884
                    @SQ	SN:Syng_TIGR_030	LN:10794
                    @SQ	SN:Syng_TIGR_031	LN:9548
                    @SQ	SN:Syng_TIGR_032	LN:9603
                    @SQ	SN:Syng_TIGR_033	LN:11093
                    @SQ	SN:Syng_TIGR_034	LN:10311
                    @SQ	SN:Syng_TIGR_035	LN:10686
                    @SQ	SN:Syng_TIGR_036	LN:10434
                    @SQ	SN:Syng_TIGR_037	LN:13061
                    @SQ	SN:Syng_TIGR_038	LN:8197
                    @SQ	SN:Syng_TIGR_039	LN:6269
                    @SQ	SN:Syng_TIGR_041	LN:10210
                    @SQ	SN:Syng_TIGR_042	LN:5510
                    @SQ	SN:Syng_TIGR_043	LN:4236
                    @SQ	SN:Syng_TIGR_044	LN:6000
                    @SQ	SN:Syng_TIGR_045	LN:22545
                    @SQ	SN:Syng_TIGR_046	LN:11447
                    @SQ	SN:Syng_TIGR_047	LN:20829
                    @SQ	SN:Syng_TIGR_048	LN:7140
                    @SQ	SN:Syng_TIGR_049	LN:6261
                    @SQ	SN:Syng_TIGR_050	LN:8529
                    @SQ	SN:Pi9_cDNA	LN:4650
                    @PG	ID:STAR	PN:STAR	VN:STAR_2.5.4b	CL:STAR   --runThreadN 16   --genomeDir NPB_Pi9   --genomeFastaFiles NPB_Pi9.fasta      --genomeSAindexNbases 1   --readFilesIn STARFILES/MF046_S4_L002_R1_001.fastq.gz      --readFilesCommand gunzip   -c      --outFileNamePrefix STARFILES/MF046.NPB_Pi9   --outFilterMatchNmin 40
                    @CO	user command line: STAR --runThreadN 16 --genomeDir NPB_Pi9 --genomeFastaFiles NPB_Pi9.fasta --genomeSAindexNbases 1 --readFilesCommand gunzip -c --readFilesIn STARFILES/MF046_S4_L002_R1_001.fastq.gz --outFileNamePrefix STARFILES/MF046.NPB_Pi9 --outFilterMatchNmin 40
                    K00282:141:HJTJWBBXX:2:1101:2656:1068	16	chr01	12873883	3	50M1S	*	0	0	CTTGAGNCGANCACACTATAGCCATGTACATTAGTATAGGTTTACACTAGN	JJJJJJ#JJJ#J<FJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJAFAA#	NH:i:2	HI:i:1	AS:i:47	nM:i:0
                    K00282:141:HJTJWBBXX:2:1101:2656:1068	272	chr01	12873883	3	50M1S	*	0	0	CTTGAGNCGANCACACTATAGCCATGTACATTAGTATAGGTTTACACTAGN	JJJJJJ#JJJ#J<FJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJAFAA#	NH:i:2	HI:i:2	AS:i:47	nM:i:0
                    Last edited by GenoMax; 04-03-2018, 03:35 AM.
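
One quick way to confirm the duplication reported above is to count how often each sequence name appears in the @SQ lines of the SAM header. The sketch below is only a diagnostic under that assumption, with a hypothetical function name; it does not explain the cause. It may also be worth noting that the @PG line in the header records --genomeFastaFiles on the mapping command as well as at genomeGenerate, which could be ruled in or out by rerunning the mapping without that option.

```python
from collections import Counter


def duplicated_sq_names(header_lines):
    """Return {sequence name: count} for names that occur in more than one @SQ line."""
    counts = Counter()
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue  # only reference sequence records are of interest
        for tag in line.rstrip("\n").split("\t")[1:]:
            if tag.startswith("SN:"):
                counts[tag[3:]] += 1
    return {name: n for name, n in counts.items() if n > 1}


if __name__ == "__main__":
    # A shortened header modelled on the one posted above
    header = [
        "@HD\tVN:1.4\n",
        "@SQ\tSN:chr01\tLN:43270923\n",
        "@SQ\tSN:Pi9_cDNA\tLN:4650\n",
        "@SQ\tSN:chr01\tLN:43270923\n",
        "@SQ\tSN:Pi9_cDNA\tLN:4650\n",
    ]
    for name, n in duplicated_sq_names(header).items():
        print(f"{name} appears {n} times")
```

On a real file, the header lines could be read from the top of the SAM output until the first line that does not start with "@".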

                    Comment
