Hello,
as part of my work I have been given the task of analyzing miRNA-seq data. The data was already preprocessed by a facility, which did the initial QC and the adaptor and barcode trimming (it is made up of two pools), and it was given to me as a set of FASTA files.
As I'm quite inexperienced with sequencing (I come from the world of microarrays and only recently started studying NGS), I looked around (including these forums and the wiki) to find a way to align the data properly.
Based on what I read, I settled on bowtie. As I'm not doing any discovery, I picked the human hairpin sequences from miRBase as the reference. Before alignment, I collapsed identical sequences using the fastx toolkit.
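For anyone unfamiliar with the collapsing step, it amounts to roughly the following. This is only a minimal Python sketch of the idea, not the fastx toolkit's actual code: the function names (read_fasta, collapse, write_collapsed) are made up for illustration, the rank-count header format is an assumption about how the collapsed file is labelled, and the two reads in the demo are toy input rather than data from my samples.

```python
from collections import Counter
import io


def read_fasta(handle):
    """Yield one sequence string per FASTA record (multi-line records are joined)."""
    chunks = []
    in_record = False
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if in_record:
                yield "".join(chunks)
            chunks = []
            in_record = True
        elif in_record:
            chunks.append(line)
    if in_record:
        yield "".join(chunks)


def collapse(sequences):
    """Return (sequence, count) pairs, most abundant first."""
    counts = Counter(sequences)
    # Sort by descending count, then by sequence so the order is deterministic.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def write_collapsed(pairs, handle):
    """Write one record per unique sequence with an assumed rank-count header."""
    for rank, (seq, count) in enumerate(pairs, start=1):
        handle.write(f">{rank}-{count}\n{seq}\n")


if __name__ == "__main__":
    # Toy input: three reads, two of them identical.
    toy = io.StringIO(
        ">r1\nTGAGGTAGTAGGTTGTATAGTT\n"
        ">r2\nTGAGGTAGTAGGTTGTATAGTT\n"
        ">r3\nAACCCGTAGATCCGAACTTGTG\n"
    )
    out = io.StringIO()
    write_collapsed(collapse(read_fasta(toy)), out)
    print(out.getvalue(), end="")
```

One consequence worth keeping in mind: after collapsing, each FASTA record stands for a unique sequence rather than a single read, so the counts bowtie reports on such a file refer to unique sequences.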
Now, when aligning, I get a lot of non-aligned reads. As an example, allowing one mismatch on one of the samples:
Code:
bowtie Hs_miRBase_hairpin -f -n 1 -l 15 --best VB09121_Pool2/BarcodeCTTA.collapsed.fa -S Pool2_CTTA.sam
# reads processed: 56581
# reads with at least one reported alignment: 646 (1.14%)
# reads that failed to align: 55935 (98.86%)
Reported 646 alignments to 1 output stream(s)
Going for no mismatches gives even lower yields... As mentioned before (and also due to my inexperience), I'm not sure whether this is what I should expect.
Any pointers on what I should try or read would be appreciated. Thanks!