Hi folks,
I am relatively new to this kind of analysis. I am interested in discovering novel splice variants and differentially transcribed genes. I have several pairs of samples (pre- and post-relapse CLL for 3 individuals). I am using STAR to map paired-end reads to hg19 without a GTF file, and I am also using cufflinks/cuffcompare/cuffdiff to look for differential events. Is there an easy way to remove known transcript variants from the cuffdiff output so that I can focus on the novel ones? Or should I be running STAR against an annotated index and supplying a GTF file during mapping? For example, to create the index for STAR:
STAR --runMode genomeGenerate --sjdbFileChrStartEnd hg19_intron_loci.txt --sjdbOverhang 75 --genomeDir ../blabla.fa
then to run it on a pair of reads:
STAR --genomeDir /data/Genomes/UCSC/hg19/STAR_ANNOTATED --sjdbGTFfile /data/Genomes/UCSC/hg19/knownGene_standardchromonly.gtf --readFilesIn sample1_1.clipped.fastq sample1_2.clipped.fastq --runThreadN 32
Or is there a better way? BTW, I can already do the general differential expression analysis by mapping to the transcriptome (i.e., using BWA-eXpress and edgeR, or an inverse beta binomial for the statistical test).
I'm also running DiffSplice on the STAR output, and the results are fairly interesting. I annotate the resulting regions with ANNOVAR.
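To make the filtering question concrete, here is a minimal sketch of one possible approach, not a tested pipeline. It assumes cuffcompare was run against a reference annotation, so that each record in its combined GTF carries a `class_code` attribute, and it treats `=` (complete intron-chain match) and `c` (contained in a reference transcript) as "known". The function name, the choice of codes, and the toy records are all hypothetical.

```python
import re

# Assumption: these cuffcompare class codes count as "known" transcripts.
# '=' is a complete intron-chain match, 'c' is contained in a reference transcript.
KNOWN_CODES = {"=", "c"}

# Pulls the class_code value out of the GTF attribute column.
CLASS_CODE_RE = re.compile(r'class_code "([^"]*)"')


def novel_gtf_lines(lines, known_codes=KNOWN_CODES):
    """Yield GTF records whose class_code is present and not in known_codes."""
    for line in lines:
        # Skip comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # A GTF record has 9 tab-separated columns; attributes are the last one.
        if len(fields) < 9:
            continue
        match = CLASS_CODE_RE.search(fields[8])
        if match is None:
            continue  # no class code, so novelty cannot be judged here
        if match.group(1) not in known_codes:
            yield line


# Toy records (made up for illustration, not real output).
demo = [
    'chr1\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; class_code "=";\n',
    'chr1\tCufflinks\texon\t300\t400\t.\t+\t.\tgene_id "XLOC_000002"; transcript_id "TCONS_00000002"; exon_number "1"; class_code "j";\n',
    'chr2\tCufflinks\texon\t500\t600\t.\t-\t.\tgene_id "XLOC_000003"; transcript_id "TCONS_00000003"; exon_number "1"; class_code "u";\n',
]

if __name__ == "__main__":
    for record in novel_gtf_lines(demo):
        print(record, end="")
```

The transcript IDs that survive this filter could then be used to subset the cuffdiff isoform tables. Which codes count as "known" is a judgment call; `j` (shares at least one splice junction with a reference transcript) is probably the most relevant class for novel splice variants.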
cheers,
karl_s