Hello, I am new to analyzing RNA-seq data and I am confused by the terminology used when running the programs.
Right now, I am trying to run TopHat and Bowtie2 using the UGENE software's workflow.
It requires me to enter:
1. Bowtie index base name
2. Known transcript file
3. Raw junctions
UGENE's tutorial page does not give very detailed instructions, so I visited Bowtie's website.
I found these files available for download:
A) H. sapiens, NCBI GRCh38 (ftp://ftp.ncbi.nlm.nih.gov/genomes/a...e_index.tar.gz) - file size >3.5 GB
and also this:
B) H. sapiens, Ensembl GRCh37 (ftp://igenome:[email protected]..._GRCh37.tar.gz) - file size >18 GB
Is it correct to use (A) as the index file, and to enter GRCh38 as the Bowtie index base name?
And is it correct to use (B) as the known transcript file?
As for "raw junctions", where can I find the list of raw junctions?
I would really appreciate your help.