Hi,
I need some help with RNA-Seq transcriptome assembly using Cufflinks v2.1.1 against the bovine genome (bosTau7), with reads from deer RNA.
I ran the command below and got the errors listed after it.
It appears that some assembled transcripts extend past the end of their reference sequences.
I aligned the reads with RNA-STAR and have checked that none of my alignments extend past the end of the reference, so this may be a Cufflinks bug.
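For reference, the check on alignment ends described above could be done roughly as follows. This is only a sketch: `check_alignment_ends` is a hypothetical helper name, and it assumes SAM text with its header on stdin, where each `@SQ` line gives the contig length in `LN:`. It adds up the CIGAR operations that consume reference bases and reports any alignment whose last base lies beyond the contig.

```shell
# Hypothetical helper (not part of samtools or Cufflinks).
# Reads SAM text with header on stdin; prints read name, contig, start,
# computed end, and contig length for alignments that overrun the contig.
check_alignment_ends() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    # Header: remember each contig length from the @SQ SN:/LN: tags.
    $1 == "@SQ" {
      name = ""; len = 0
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) name = substr($i, 4)
        if ($i ~ /^LN:/) len = substr($i, 4) + 0
      }
      reflen[name] = len
      next
    }
    /^@/ { next }                      # other header lines
    $3 == "*" || $6 == "*" { next }    # unmapped reads
    {
      # Sum CIGAR operations that consume reference bases (M, D, N, =, X).
      cigar = $6; span = 0
      while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
        n  = substr(cigar, 1, RLENGTH - 1) + 0
        op = substr(cigar, RLENGTH, 1)
        if (op ~ /[MDN=X]/) span += n
        cigar = substr(cigar, RLENGTH + 1)
      }
      aln_end = $4 + span - 1
      if (($3 in reflen) && aln_end > reflen[$3])
        print $1, $3, $4, aln_end, reflen[$3]
    }'
}
```

With samtools installed, it would be run as: samtools view -h Aligned.out.bam | check_alignment_ends. No output means no alignment runs past the end of its contig according to the BAM header.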
I think I may be able to work around it by stripping the soft clips from the alignments with the following command:
samtools view -h Aligned.out.bam | awk 'BEGIN {OFS="\t"} {if (substr($1,1,1)!="@") {gsub(/[0-9]*S/,"",$6); $10=$11="*"}; print }' | samtools view -bS - > Aligned.out.noS.bam
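To see what this filter does before running it on the whole BAM, the awk step can be tried on SAM text alone. The sketch below wraps it in a hypothetical function, `strip_softclips`, and differs from the command above in one respect: it also sets the input separator to a tab, so that optional fields containing spaces are not split. Note that in the SAM format POS refers to the first aligned (non-clipped) base, so removing soft clips does not change an alignment's reference coordinates; whether it changes what Cufflinks does with the reads is not confirmed here.

```shell
# Hypothetical wrapper around the awk filter from the post.
# Reads SAM text on stdin and writes SAM text on stdout.
strip_softclips() {
  awk 'BEGIN { FS = OFS = "\t" }
    # Header lines pass through unchanged.
    /^@/ { print; next }
    {
      gsub(/[0-9]*S/, "", $6)   # drop soft-clip operations from the CIGAR
      $10 = "*"; $11 = "*"      # SEQ and QUAL would no longer match the CIGAR
      print
    }'
}
```

A record with CIGAR 5S10M3S comes out with CIGAR 10M and with SEQ and QUAL set to "*", while POS and all other fields stay the same.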
However, I would like to ask whether there are any other recommended ways of fixing this.
Thank you.
kdfe
/usr/bin/time -v cufflinks \
-p 8 \
-N \
-u \
-q \
--max-bundle-length 15000000 \
--mask-file bosTau7_rRNAtRNAChrM.gtf \
-b bosTau7.fa \
-g bosTau7refFlat \
-o /Tube01/ \
/Aligned.out.bam
$ You are using Cufflinks v2.1.1, which is the most recent release.
Command line:
[13:35:40] Loading reference annotation.
[13:35:42] Loading reference annotation.
[13:35:42] Inspecting reads and determining fragment length distribution.
Processed 66538 loci.
> Map Properties:
> Normalized Map Mass: 24789887.85
> Raw Map Mass: 24789887.85
> Number of Multi-Reads: 1911419 (with 4880055 total hits)
> Fragment Length Distribution: Empirical (learned)
> Estimated Mean: 198.31
> Estimated Std Dev: 56.61
[13:44:37] Assembling transcripts and initializing abundances for multi-read correction.
[]$ Processed 66538 loci.
[19:00:43] Loading reference annotation and sequence.
Error (GFaSeqGet): end coordinate (42748) cannot be larger than sequence length 42715
Error (GFaSeqGet): end coordinate (18544) cannot be larger than sequence length 18532
Error (GFaSeqGet): end coordinate (12842) cannot be larger than sequence length 12841
Error (GFaSeqGet): end coordinate (8427) cannot be larger than sequence length 8418
Error (GFaSeqGet): subsequence cannot be larger than 10788
Error getting subseq for CUFF.49431.3 (1..10799)!
Command exited with non-zero status 1
Command being timed:
User time (seconds): 144650.42
System time (seconds): 769.22
Percent of CPU this job got: 743%
Elapsed (wall clock) time (h:mm:ss or m:ss): 5:26:09
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 17234208
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 7
Minor (reclaiming a frame) page faults: 149060165
Voluntary context switches: 5421678
Involuntary context switches: 4297120
Swaps: 0
File system inputs: 13767472
File system outputs: 331264
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 1
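The GFaSeqGet errors report end coordinates that lie past the length of the sequence being read, so one thing that might be worth ruling out is a mismatch between the contig lengths in the BAM header and those in the FASTA passed with -b. The sketch below is an assumption about how that could be checked, not a confirmed diagnosis: `compare_contig_lengths` is a hypothetical helper that takes a SAM header file and a FASTA index (.fai, where column 1 is the contig name and column 2 its length) and prints every contig whose length differs or that is missing from the index.

```shell
# Hypothetical helper: compare_contig_lengths HEADER.sam GENOME.fa.fai
# Prints contig name, length in the SAM header, and length in the .fai
# (or "missing") for every contig that does not agree.
compare_contig_lengths() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    # First file read: FASTA index, column 1 = name, column 2 = length.
    NR == FNR { fai[$1] = $2 + 0; next }
    # Second file read: SAM header; only @SQ lines carry contig lengths.
    $1 == "@SQ" {
      name = ""; len = 0
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SN:/) name = substr($i, 4)
        if ($i ~ /^LN:/) len = substr($i, 4) + 0
      }
      if (!(name in fai))
        print name, len, "missing"
      else if (fai[name] != len)
        print name, len, fai[name]
    }' "$2" "$1"
}
```

The two inputs could be produced with samtools view -H Aligned.out.bam > header.sam and samtools faidx bosTau7.fa (which writes bosTau7.fa.fai), followed by: compare_contig_lengths header.sam bosTau7.fa.fai. Empty output would mean the lengths agree and the cause lies elsewhere.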