Greetings community! I am only a first year undergrad student, so please bare with my ignorance.
I have encountered some difficulties trying to use BBMap to align WGS pair end reads belonging to a Sheep to it's pertinent rna.fa file - I'm only interested in the organism's full mRNA sequences, but I do not know where else to get them.
The problem I'm facing surfaces after verifying Average Coverage and Standard Deviation calculations. While the coverage results are as expected, the standard deviation values are well over 600!
I will post the command line and the path to the reference data I am using below, to ease out the process of troubleshooting:
Reference Data: found under NCBI's ftp site, under the following path: genomes/Ovis_aries/RNA/rna.fa.gz
Here is a link that leads directly to the directory containing the reference file I am using: ftp://ftp.ncbi.nlm.nih.gov/genomes/Ovis_aries/RNA/
*If anyone else would kindly suggest a better way of obtaining only the full mRNA nucleotide sequences of the organism, please be kind enough to enlighten my poor soul*
Reads: I am using paired end fastq WGS reads that were used to build the organism's genome assembly. They have already undergone proper quality testing.
After I create my build using BBMap and my reference file (rna.fa), I proceed to align my reads to the reference build. Here is what my command line looks like:
bbmap.sh in1=read_1.fastq in2=read_2.fastq out=MappedFile.sam build=1 minratio=0.56
(I am trying to compare ambiguous values later on, so I need the min ratio. Making the "minratio=<>" parameter closer to 1 does lower the Standard Deviation, but it is still unusually high)
Cool, up to there everything runs smoothly. I take a look at my alignment results and proceed to calculate coverage and standard deviation values using pileup.sh. Again, here is my command line:
pileup.sh in=MappedFile.sam out=stats.txt hist=histogram.txt
After looking at the results, they look something like this:
Set USE_COVERAGE_ARRAYS to true
Set USE_BITSETS to false
Note: Coverage capped at 65535
Average Coverage: 37.84
Standard Deviation: 707.47
Percent scaffolds with any coverage: 80.76
Percent of reference bases covered: 35.87
I'd be incredibly thankful if anyone would shed some light into this little hindrance. What am I messing up?
I have encountered some difficulties trying to use BBMap to align WGS pair end reads belonging to a Sheep to it's pertinent rna.fa file - I'm only interested in the organism's full mRNA sequences, but I do not know where else to get them.
The problem I'm facing surfaces after verifying Average Coverage and Standard Deviation calculations. While the coverage results are as expected, the standard deviation values are well over 600!
I will post the command line and the path to the reference data I am using below, to ease out the process of troubleshooting:
Reference Data: found under NCBI's ftp site, under the following path: genomes/Ovis_aries/RNA/rna.fa.gz
Here is a link that leads directly to the directory containing the reference file I am using: ftp://ftp.ncbi.nlm.nih.gov/genomes/Ovis_aries/RNA/
*If anyone else would kindly suggest a better way of obtaining only the full mRNA nucleotide sequences of the organism, please be kind enough to enlighten my poor soul*
Reads: I am using paired end fastq WGS reads that were used to build the organism's genome assembly. They have already undergone proper quality testing.
After I create my build using BBMap and my reference file (rna.fa), I proceed to align my reads to the reference build. Here is what my command line looks like:
bbmap.sh in1=read_1.fastq in2=read_2.fastq out=MappedFile.sam build=1 minratio=0.56
(I am trying to compare ambiguous values later on, so I need the min ratio. Making the "minratio=<>" parameter closer to 1 does lower the Standard Deviation, but it is still unusually high)
Cool, up to there everything runs smoothly. I take a look at my alignment results and proceed to calculate coverage and standard deviation values using pileup.sh. Again, here is my command line:
pileup.sh in=MappedFile.sam out=stats.txt hist=histogram.txt
After looking at the results, they look something like this:
Set USE_COVERAGE_ARRAYS to true
Set USE_BITSETS to false
Note: Coverage capped at 65535
Average Coverage: 37.84
Standard Deviation: 707.47
Percent scaffolds with any coverage: 80.76
Percent of reference bases covered: 35.87
I'd be incredibly thankful if anyone would shed some light into this little hindrance. What am I messing up?
Comment