I am running a BWA -> Cufflinks pipeline for differential gene expression, and the cuffdiff output contains many repeated gene_ids, as in the example below.
Can anyone explain why there are repeats like this?
test_id gene_id gene locus sample_1 sample_2 status value_1 value_2 log2(fold_change) test_stat p_value q_value significant
XLOC_000665 XLOC_000665 Traes_1AL_34404D5D8 IWGSC_CSS_1AL_scaff_3881073:2596-3747 q3 q5 OK 42.1649 2.81737 -3.90362 -8.49777 5.00E-05 0.00921445 yes
XLOC_000665 XLOC_000665 Traes_1AL_34404D5D8 IWGSC_CSS_1AL_scaff_3881073:2596-3747 q4 q5 OK 16.8182 2.81737 -2.5776 -5.45754 0.00025 0.0386488 yes
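One thing I can see in the two rows is that they differ only in sample_1 (q3 vs q4) and the associated values, so the repeats may simply be one row per pairwise sample comparison. Here is a minimal Python sketch I used to check that, assuming whitespace-separated rows as pasted here (the real cuffdiff file is tab-delimited). The helper name `comparisons_by_gene` is my own, not part of Cufflinks.

```python
from collections import defaultdict

# Rows copied from the example in this post.
ROWS = """\
test_id gene_id gene locus sample_1 sample_2 status value_1 value_2 log2(fold_change) test_stat p_value q_value significant
XLOC_000665 XLOC_000665 Traes_1AL_34404D5D8 IWGSC_CSS_1AL_scaff_3881073:2596-3747 q3 q5 OK 42.1649 2.81737 -3.90362 -8.49777 5.00E-05 0.00921445 yes
XLOC_000665 XLOC_000665 Traes_1AL_34404D5D8 IWGSC_CSS_1AL_scaff_3881073:2596-3747 q4 q5 OK 16.8182 2.81737 -2.5776 -5.45754 0.00025 0.0386488 yes
"""


def comparisons_by_gene(text):
    """Map each gene_id to the list of (sample_1, sample_2) pairs it appears with."""
    lines = [line.split() for line in text.strip().splitlines()]
    header, records = lines[0], lines[1:]
    pairs = defaultdict(list)
    for fields in records:
        row = dict(zip(header, fields))
        pairs[row["gene_id"]].append((row["sample_1"], row["sample_2"]))
    return dict(pairs)


PAIRS = comparisons_by_gene(ROWS)
for gene_id, sample_pairs in PAIRS.items():
    print(gene_id, sample_pairs)
```

If every repeated gene_id shows a different sample pair, as it does here, then the rows are not true duplicates.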
Also, I need to BLAST the differentially expressed genes against similar plant species to find information about them, but I do not understand the "locus" column. In the above example, the locus is IWGSC_CSS_1AL_scaff_3881073:2596-3747. The reference file contains a sequence named IWGSC_CSS_1AL_scaff_3881073. Do I need to extract the subsequence from position 2596 to 3747 of that scaffold and use it as the BLAST query?
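To make the question concrete, this is roughly what I have in mind: a minimal Python sketch that splits the locus string into scaffold name, start and end, and then slices the scaffold sequence. The function names are my own, and whether the start coordinate is 0-based or 1-based is an assumption I have not confirmed, so it is a switch here. The short sequence is made up just to show the slicing, not real wheat data.

```python
def parse_locus(locus):
    """Split a 'name:start-end' locus string into (name, start, end)."""
    name, _, span = locus.rpartition(":")
    start, _, end = span.partition("-")
    return name, int(start), int(end)


def extract_region(sequence, start, end, one_based=True):
    """Return the region from start to end.

    one_based=True treats start and end as 1-based, inclusive coordinates.
    one_based=False treats start as 0-based and end as exclusive.
    """
    begin = start - 1 if one_based else start
    return sequence[begin:end]


LOCUS = parse_locus("IWGSC_CSS_1AL_scaff_3881073:2596-3747")
print(LOCUS)

# Made-up toy sequence, only to illustrate the two coordinate conventions.
TOY = "ACGTACGTAC"
REGION_ONE_BASED = extract_region(TOY, 3, 6)
REGION_ZERO_BASED = extract_region(TOY, 3, 6, one_based=False)
print(REGION_ONE_BASED, REGION_ZERO_BASED)
```

As far as I understand, the span is in genomic coordinates, so the extracted region could include introns as well as exons.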