Hi,
I'm trying to run saet to correct miscalled reads on our RNA-seq data, but I'm not sure what to use for the refLength ("expected length of the assembled sequence"). Does anybody know what size would be appropriate for the Arabidopsis transcriptome?
I downloaded the TAIR10 exons, 5' UTR & 3' UTR files and ran them through "grep -v '^>' | wc -m", which gives a figure of around 77 x 10^6 bases. That seems a bit high: I was expecting around 50-60 x 10^6, based on ~25,000 genes x 2,000 bp per gene.
Any thoughts or experience with this? Is there a reference I've missed which gives a reliable number?
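For anyone checking the numbers, here is a minimal Python sketch of the same count. It is not part of saet. One known difference from the shell pipeline: "wc -m" also counts the newline at the end of every sequence line, so it slightly overstates the number of bases, whereas this version strips line endings before counting. The function name count_bases and the two FASTA records are made up for illustration.

```python
import io


def count_bases(fasta_lines):
    """Count sequence characters in FASTA text.

    Header lines (starting with '>') are skipped, and trailing
    newlines/whitespace are stripped so they are not counted as bases.
    """
    total = 0
    for line in fasta_lines:
        if line.startswith(">"):
            continue
        total += len(line.strip())
    return total


# Tiny made-up FASTA input (hypothetical records, not real TAIR10 data).
example = io.StringIO(
    ">record_1 hypothetical\n"
    "ACGTACGT\n"
    "ACG\n"
    ">record_2 hypothetical\n"
    "TTGA\n"
)
n_bases = count_bases(example)  # 8 + 3 + 4 sequence characters

# Back-of-envelope estimate from the post: ~25,000 genes x ~2,000 bp each.
rough_estimate = 25_000 * 2_000

print(n_bases, rough_estimate)
```

To run it on a real download, pass an open file handle instead of the in-memory example, e.g. count_bases(open(path)) where path points at one of the FASTA files. Note that this only removes the newline overcount; it does not account for any sequence that appears in more than one of the files.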