Seqanswers Leaderboard Ad
Collapse
X
-
@Ram you should not need to get the reference genome. The error above is not related to the original issue but an error with your DESeq2 command.
Leave a comment:
-
-
Thanks GenoMax and Micheal
I tried the other reference genome with same chromosome annotation but it still did not work
Below is the error message
> se <- summarizeOverlaps(features=ebg, reads=bamfiles,mode="Union",singleEnd=FALSE,ignore.strand=TRUE,fragments=TRUE,suppressWarnings())
Error in validObject(.Object) :
invalid class “SummarizedExperiment0” object: 'assays' nrow differs from 'mcols' nrow
the heads of both the bam and the gtf files are as below:
[rshukla@dev01 test]$ samtools view -H Ctrl13058.bam | head
@HD VN:1.0 SO:coordinate
@RG ID:0 SM:Ctrl13058-Trimmed PI:50
@SQ SN:chrM LN:16571
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
[rshukla@dev01 test]$ head Human_hg19.gtf
chr1 hg19_knownGene exon 11874 12227 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
chr1 hg19_knownGene exon 12613 12721 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
chr1 hg19_knownGene exon 13221 14409 0.000000 + . gene_id "uc001aaa.3"; transcript_id "uc001aaa.3";
chr1 hg19_knownGene exon 11874 12227 0.000000 + . gene_id "uc010nxr.1"; transcript_id "uc010nxr.1";
chr1 hg19_knownGene exon 12646 12697 0.000000 + . gene_id "uc010nxr.1"; transcript_id "uc010nxr.1";
chr1 hg19_knownGene exon 13221 14409 0.000000 + . gene_id "uc010nxr.1"; transcript_id "uc010nxr.1";
chr1 hg19_knownGene start_codon 12190 12192 0.000000 + . gene_id "uc010nxq.1"; transcript_id "uc010nxq.1";
chr1 hg19_knownGene CDS 12190 12227 0.000000 + 0 gene_id "uc010nxq.1"; transcript_id "uc010nxq.1";
chr1 hg19_knownGene exon 11874 12227 0.000000 + . gene_id "uc010nxq.1"; transcript_id "uc010nxq.1";
chr1 hg19_knownGene CDS 12595 12721 0.000000 + 1 gene_id "uc010nxq.1"; transcript_id "uc010nxq.1";
[rshukla@dev01 test]$
As per Genomaxs suggestion I am also downloading reference genome from basespace, surprisingly it is 42.3GB !
any further suggestions?
Ram
Leave a comment:
-
-
Can you download a corresponding GTF file from BaseSpace? That should sort this problem out.
It should be possible to change the GTF file you have (from 1 to chr1 etc) but if the above works then you would know that you have a file that exactly corresponds to the genome used for alignment. Using an incorrect GTF file will lead to erroneous results.
Edit: If changing 1 to chr 1 is the preferred option then use the solution suggested by @vivek in this thread.
Copying it here for reference (use your file names) :
Code:$ awk '{if($0 !~ /^#/) print "chr"$0; else print $0}' no_chr.file> with_chr.file
Last edited by GenoMax; 04-12-2016, 06:22 AM.
Leave a comment:
-
-
Thanks a lot Michael,
below is the output of the codes you suggested:
[rshukla@dev01 test]$ samtools view -H Ctrl13058.bam | head
@HD VN:1.0 SO:coordinate
@RG ID:0 SM:Ctrl13058-Trimmed PI:50
@SQ SN:chrM LN:16571
@SQ SN:chr1 LN:249250621
@SQ SN:chr2 LN:243199373
@SQ SN:chr3 LN:198022430
@SQ SN:chr4 LN:191154276
@SQ SN:chr5 LN:180915260
@SQ SN:chr6 LN:171115067
@SQ SN:chr7 LN:159138663
[rshukla@dev01 test]$ head Homo_sapiens.GRCh37.75.gtf
#!genome-build GRCh37.p13
#!genome-version GRCh37
#!genome-date 2009-02
#!genome-build-accession NCBI:GCA_000001405.14
#!genebuild-last-updated 2013-09
1 pseudogene gene 11869 14412 . + . gene_id "ENSG00000223972"; gene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene";
1 processed_transcript transcript 11869 14409 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; g ene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; transcript_source "havana";
1 processed_transcript exon 11869 12227 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_numb er "1"; gene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; transcript_source "havana"; exon _id "ENSE00002234944";
1 processed_transcript exon 12613 12721 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_numb er "2"; gene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; transcript_source "havana"; exon _id "ENSE00003582793";
1 processed_transcript exon 13221 14409 . + . gene_id "ENSG00000223972"; transcript_id "ENST00000456328"; exon_numb er "3"; gene_name "DDX11L1"; gene_source "ensembl_havana"; gene_biotype "pseudogene"; transcript_name "DDX11L1-002"; transcript_source "havana"; exon _id "ENSE00002312635";
The format in .bam file is chr1 where as in .gtf I guess it is just the number.
Do you have any suggestion how to proceed further, I am very new to all these.
Thanks again,
Ram
Leave a comment:
-
-
Hi Ram,
could you check if the chromosomes' name in the bam files are the same as the ones in the gtf?
Code:samtools view -H my_bam.bam | head
Code:head Homo_sapiens.GRCh37.75.gtf
Often, one annotation is 'chr1', another 'Chr1', and sometimes just '1'. If you are mixing these up, no program or tool will find an overlap.
Cheers,
Michael
Leave a comment:
-
-
BAM files from RNAseq -Alignment (Basespace) not working with DESEq2 in R
Hello experts,
With my recent run in illumina Hiseq2500 I aligned the FASTAQ files generated in basespace to human reference genome (Homosapieans (PAR masked)/hg19 (Refseq) using the RNA-seq Alignment App.I used Tophat (Bowtie2) aligner for the same. except for "NO" to the stranded information, No other parameters were selected.
I downloaded the BAM files generated and used them in the Bioconductor RNAseq workflow (Using DESeq2)
Here we walk through an end-to-end gene-level RNA-seq differential expression workflow using Bioconductor packages. We will start from the FASTQ files, show how these were aligned to the reference genome, and prepare a count matrix which tallies the number of RNA-seq reads/fragments within each gene for each sample. We will perform exploratory data analysis (EDA) for quality assessment and to explore the relationship between samples, perform differential gene expression analysis, and visually explore the results.
to create a summarized object I used "Homo_sapiens.GRCh37.75.gtf"
below is the script:
(txdb <- makeTxDbFromGFF("Homo_sapiens.GRCh37.75.gtf", format="gtf", circ_seqs=character()))
(ebg <- exonsBy(txdb, by="gene"))
se <- summarizeOverlaps(features=ebg,reads=bamfiles,mode="Union",singleEnd=FALSE,ignore.strand=TRUE,fragments=TRUE )
I am getting the following error message and am unable to get any count
Warning message:
In .Seqinfo.mergexy(x, y) :
The 2 combined objects have no sequence levels in common. (Use
suppressWarnings() to suppress this warning.)
Any suggestions?
RamTags: None
-
Latest Articles
Collapse
-
by seqadmin
This year’s Advances in Genome Biology and Technology (AGBT) General Meeting commemorated the 25th anniversary of the event at its original venue on Marco Island, Florida. While this year’s event didn’t include high-profile musical performances, the industry announcements and cutting-edge research still drew the attention of leading scientists.
The Headliner
The biggest announcement was Roche stepping back into the sequencing platform market. In the years since...-
Channel: Articles
03-03-2025, 01:39 PM -
-
by seqadmin
The human gut contains trillions of microorganisms that impact digestion, immune functions, and overall health1. Despite major breakthroughs, we’re only beginning to understand the full extent of the microbiome’s influence on health and disease. Advances in next-generation sequencing and spatial biology have opened new windows into this complex environment, yet many questions remain. This article highlights two recent studies exploring how diet influences microbial...-
Channel: Articles
02-24-2025, 06:31 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 03-20-2025, 05:03 AM
|
0 responses
17 views
0 reactions
|
Last Post
by seqadmin
03-20-2025, 05:03 AM
|
||
Started by seqadmin, 03-19-2025, 07:27 AM
|
0 responses
18 views
0 reactions
|
Last Post
by seqadmin
03-19-2025, 07:27 AM
|
||
Started by seqadmin, 03-18-2025, 12:50 PM
|
0 responses
19 views
0 reactions
|
Last Post
by seqadmin
03-18-2025, 12:50 PM
|
||
Started by seqadmin, 03-03-2025, 01:15 PM
|
0 responses
185 views
0 reactions
|
Last Post
by seqadmin
03-03-2025, 01:15 PM
|
Leave a comment: