Hi everyone,
I am using htseq-count to count the reads mapped to each feature. I do get results, but for some samples every gene name appears 12 times in the output, each time with the same count. The special counters are repeated 12 times as well; for reference, here are the repeated __no_feature lines from one sample:
__no_feature 1544569
__no_feature 1544569
__no_feature 1544569
__no_feature 1544569
__no_feature 1544569
__no_feature 1544569
__no_feature 1544569
__no_feature 1544569
__no_feature 1544569
__no_feature 1544569
__no_feature 1544569
__no_feature 1544569
It looks as if htseq-count performs one run and then repeats it, 12 times in total, reporting the same value for every gene each time. Why is this happening? Is it normal for the software to behave like this, or is it because I requested more nodes for this analysis? I checked my GTF file, and the transcript_id and gene_id values are different, so I do not think that is the issue. Here are the first few lines of the GTF file:
##gtf-version X
# GFF-like GTF i.e. not checked against any GTF specification. Conversion based on GFF input, standardised by AGAT.
# Liftoff v1.6.3
# /home/askhanal/miniconda3/envs/rna_seq/bin/liftoff -g ./Spurpurea_519_v5.1.gene_exons.gff3 -dir ./snigra_527M_annot/ -o ./snigra_527M_annot/snigra.527.liftoff.gff3 ./SN527M_sorted.fa ./Spurpurea_519_v5.0.fa
Chr01 Liftoff gene 804 3346 . + . gene_id "Sapur.001G001900.v5.1"; ID "Sapur.001G001900.v5.1"; Name "Sapur.001G001900"; copy_num_ID "Sapur.001G001900.v5.1_0"; coverage "0.995"; extra_copy_number "0"; sequence_ID "0.961"; valid_ORFs "0";
Chr01 Liftoff mRNA 804 3346 . + . gene_id "Sapur.001G001900.v5.1"; transcript_id "Sapur.001G001900.1.v5.1"; ID "Sapur.001G001900.1.v5.1"; Name "Sapur.001G001900.1"; Parent "Sapur.001G001900.v5.1"; extra_copy_number "0"; longest "1"; matches_ref_protein "False"; missing_stop_codon "True"; pacid "41822505"; valid_ORF "False";
Chr01 Liftoff exon 804 1502 . + . gene_id "Sapur.001G001900.v5.1"; transcript_id "Sapur.001G001900.1.v5.1"; ID "Sapur.001G001900.1.v5.1.exon.1"; Parent "Sapur.001G001900.1.v5.1"; extra_copy_number "0"; pacid "41822505";
Chr01 Liftoff exon 1541 3346 . + . gene_id "Sapur.001G001900.v5.1"; transcript_id "Sapur.001G001900.1.v5.1"; ID "Sapur.001G001900.1.v5.1.exon.2"; Parent "Sapur.001G001900.1.v5.1"; extra_copy_number "0"; pacid "41822505";
Chr01 Liftoff CDS 837 1502 . + 0 gene_id "Sapur.001G001900.v5.1"; transcript_id "Sapur.001G001900.1.v5.1"; ID "Sapur.001G001900.1.v5.1.CDS.1"; Parent "Sapur.001G001900.1.v5.1"; extra_copy_number "0"; pacid "41822505";
Chr01 Liftoff CDS 1541 3248 . + 0 gene_id "Sapur.001G001900.v5.1"; transcript_id "Sapur.001G001900.1.v5.1"; ID "Sapur.001G001900.1.v5.1.CDS.2"; Parent "Sapur.001G001900.1.v5.1"; extra_copy_number "0"; pacid "41822505";
Chr01 Liftoff five_prime_UTR 804 836 . + . gene_id "Sapur.001G001900.v5.1"; transcript_id "Sapur.001G001900.1.v5.1"; ID "Sapur.001G001900.1.v5.1.five_prime_UTR.1"; Parent "Sapur.001G001900.1.v5.1"; extra_copy_number "0"; pacid "41822505";
Chr01 Liftoff three_prime_UTR 3249 3346 . + . gene_id "Sapur.001G001900.v5.1"; transcript_id "Sapur.001G001900.1.v5.1"; ID "Sapur.001G001900.1.v5.1.three_prime_UTR.1"; Parent "Sapur.001G001900.1.v5.1"; extra_copy_number "0"; pacid "41822505";
Chr01 Liftoff gene 6845 9497 . - . gene_id "Sapur.001G002000.v5.1"; ID "Sapur.001G002000.v5.1"; Name "Sapur.001G002000"; copy_num_ID "Sapur.001G002000.v5.1_0"; coverage "0.982"; extra_copy_number "0"; sequence_ID "0.902"; valid_ORFs "1";
Chr01 Liftoff mRNA 6845 9497 . - . gene_id "Sapur.001G002000.v5.1"; transcript_id "Sapur.001G002000.1.v5.1"; ID "Sapur.001G002000.1.v5.1"; Name "Sapur.001G002000.1"; Parent "Sapur.001G002000.v5.1"; extra_copy_number "0"; longest "1"; matches_ref_protein "False"; pacid "41821890"; valid_ORF "True";
Chr01 Liftoff exon 6845 7216 . - . gene_id "Sapur.001G002000.v5.1"; transcript_id "Sapur.001G002000.1.v5.1"; ID "Sapur.001G002000.1.v5.1.exon.2"; Parent "Sapur.001G002000.1.v5.1"; extra_copy_number "0"; pacid "41821890";
Chr01 Liftoff exon 8416 9497 . - . gene_id "Sapur.001G002000.v5.1"; transcript_id "Sapur.001G002000.1.v5.1"; ID "Sapur.001G002000.1.v5.1.exon.1"; Parent "Sapur.001G002000.1.v5.1"; extra_copy_number "0"; pacid "41821890";
Chr01 Liftoff CDS 7153 7216 . - 1 gene_id "Sapur.001G002000.v5.1"; transcript_id "Sapur.001G002000.1.v5.1"; ID "Sapur.001G002000.1.v5.1.CDS.2"; Parent "Sapur.001G002000.1.v5.1"; extra_copy_number "0"; pacid "41821890";
Chr01 Liftoff CDS 8416 8918 . - 0 gene_id "Sapur.001G002000.v5.1"; transcript_id "Sapur.001G002000.1.v5.1"; ID "Sapur.001G002000.1.v5.1.CDS.1"; Parent "Sapur.001G002000.1.v5.1"; extra_copy_number "0"; pacid "41821890";
Chr01 Liftoff five_prime_UTR 8919 9497 . - . gene_id "Sapur.001G002000.v5.1"; transcript_id "Sapur.001G002000.1.v5.1"; ID "Sapur.001G002000.1.v5.1.five_prime_UTR.1"; Parent "Sapur.001G002000.1.v5.1"; extra_copy_number "0"; pacid "41821890";
Chr01 Liftoff three_prime_UTR 6845 7152 . - . gene_id "Sapur.001G002000.v5.1"; transcript_id "Sapur.001G002000.1.v5.1"; ID "Sapur.001G002000.1.v5.1.three_prime_UTR.1"; Parent "Sapur.001G002000.1.v5.1"; extra_copy_number "0"; pacid "41821890";
This is the command I used:
for i in {1..36}; do htseq-count -r name -s no -f bam -c htseq_no_strand/out_S${i}.tsv ../alignment_S${i}.bam ../snigra.527.liftoff.gtf ; done
I have 36 samples and see this issue in half of them; the others are fine. I also checked the BAM file sizes: the files that give redundant htseq-count output are around 3.5 GB, while the others are around 1.5 GB. I am not sure whether that is related. Any help would be appreciated.
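To make the repetition easier to quantify, here is a minimal sketch that lists the feature IDs occurring more than once in each output file. It assumes the outputs are the two-column, tab-separated files written by the loop above; the function name count_feature_repeats is hypothetical and used only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each feature ID that occurs more than once
# in an htseq-count output file (tab-separated: feature ID, count),
# followed by the number of times it occurs.
count_feature_repeats() {
    cut -f1 "$1" | sort | uniq -c | awk '$1 > 1 {print $2 "\t" $1}'
}

# Report, per sample, how many distinct feature IDs are repeated.
# Paths follow the loop above; missing files are skipped.
for i in {1..36}; do
    f="htseq_no_strand/out_S${i}.tsv"
    [ -f "$f" ] || continue
    n=$(count_feature_repeats "$f" | wc -l | tr -d ' ')
    echo "S${i}: ${n} repeated feature IDs"
done
```

A sample with no repeats should report 0; comparing this list against the BAM file sizes would show whether the affected samples are exactly the larger ones.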
Thank you