Hi Wei,
Well, that was the problem, the HD was almost full.
Thanks again
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
Have you checked if there is enough disk space in your working directory?
Leave a comment:
-
Hi Wei,
Since yesterday an error message in thrown when running featureCount program on RNA-seq data.
Run1
//================================= Running ==================================\\
|| ||
|| Load annotation file ../Homo_sapiens.GRCh38.78.gtf ... ||
|| Features : 1232660 ||
|| Meta-features : 64253 ||
|| Chromosomes : 270 ||
|| ||
|| Process BAM file 36B_accepted_hits.bam... ||
|| Paired-end reads are included. ||
|| Assign fragments (read pairs) to features... ||
|| Found reads that are not properly paired. ||
|| (missing mate or the mate is not the next read) ||
Segmentation fault (core dumped)
Run2
//================================= Running ==================================\\
|| ||
|| Load annotation file ../Homo_sapiens.GRCh38.78.gtf ... ||
|| Features : 1232660 ||
|| Meta-features : 64253 ||
|| Chromosomes : 270 ||
|| ||
|| Process BAM file 36B1_accepted_hits.bam... ||
|| Paired-end reads are included. ||
|| Assign fragments (read pairs) to features... ||
|| Found reads that are not properly paired. ||
|| (missing mate or the mate is not the next read) ||
Cannot load read part from the tmp file!
featureCounts: input-files.c:2216: sort_SAM_finalise: Assertion `0' failed.
//================================= Running ==================================\\
|| ||
|| Load annotation file ../Homo_sapiens.GRCh38.78.gtf ... ||
|| Features : 1232660 ||
|| Meta-features : 64253 ||
|| Chromosomes : 270 ||
|| ||
|| Process BAM file 39B_accepted_hits.bam... ||
|| Paired-end reads are included. ||
|| Assign fragments (read pairs) to features... ||
|| Found reads that are not properly paired. ||
|| (missing mate or the mate is not the next read) ||
Cannot load read part from the tmp file!
featureCounts: input-files.c:2216: sort_SAM_finalise: Assertion `0' failed.
Aborted (core dumped)
Every different run a different error is flagged.Additionally about 225 files are dumped in working directory named
temp-core-002882-6B8B4567.sam temp-sort-002882-327B23C6-CHK00000000-BLK119.bin temp-sort-004072-327B23C6-CHK00000000-BLK007.bin temp-sort-004072-327B23C6-CHK00000000-BLK124.bin
temp-core-004072-6B8B4567.sam temp-sort-002882-327B23C6-CHK00000000-BLK120.bin temp-sort-004072-327B23C6-CHK00000000-BLK008.bin temp-sort-004072-327B23C6-CHK00000000-BLK125.bin
temp-sort-002882-327B23C6-CHK00000000-BLK004.bin
Parameters used in the program. With the same parameters I ran 40 Paired End reads aligned on bam files without any error.
featureCounts -p -s 2 -M -O \
-a ../Homo_sapiens.GRCh38.78.gtf -o ../../../Counts_overlap/${sampNo}_counts.txt -t exon -g gene_id -T 10 *accepted_hits.bam
Could you please let me know how do I troubleshoot this error.
Leave a comment:
-
Having a high mapping percentage does not mean you will have a high percentage of reads being assigned to genes included in your annotation. This is affected by many factors such as the completeness of your annotation, your mRNA enrichment quality and etc.
I typically see around 50-60% of reads successfully assigned to genes when NCBI RefSeq gene annotation is used.
Wei
Leave a comment:
-
total assigned fragments : 47.9%
Summary
Status 34b1_accepted_hits.bam
Assigned 28314328
Unassigned_Ambiguity 6375026
Unassigned_MultiMapping 0
Unassigned_NoFeatures 24480296
Unassigned_Unmapped 0
Unassigned_MappingQuality 0
Unassigned_FragementLength 0
Unassigned_Chimera 0
Unassigned_Secondary 0
Unassigned_Nonjunction 0
Unassigned_Duplicate 0
Any reason why there are so many unassigned Nofeature and ambiguous reads. When the aligned bam file was tested with samtools flagstats function it showed 100% mapped reads and ~86% properly paired reads.
Leave a comment:
-
I just found when when gene_name is used both the packages it provides consistent meta-features.
C version
featureCounts -p -s 2 -M \
-a ../Homo_sapiens.GRCh38.78.gtf -o counts.txt -t exon -g gene_name -T 10 *accepted_hits.bam
R subread
count<-featureCounts(files="accepted_hits.bam",
,annot.ext="Homo_sapiens.GRCh38.78.gtf",isGTFAnnotationFile=TRUE,
GTF.featureType="exon",GTF.attrType="gene_name",useMetaFeatures=TRUE,
allowMultiOverlap=FALSE,isPairedEnd=TRUE,requireBothEndsMapped=FALSE,
checkFragLength=FALSE,minFragLength=50,maxFragLength=1000,
nthreads=20,strandSpecific=2,minMQS=0,
readExtension5=0,readExtension3=0,read2pos=NULL,
minReadOverlap=1,countSplitAlignmentsOnly=FALSE,
countMultiMappingReads=TRUE,countPrimaryAlignmentsOnly=FALSE,
countChimericFragments=TRUE,ignoreDup=FALSE,chrAliases=NULL,reportReads=FALSE)
However, when multiple featurecounts ran on different tabs for different samples. Multi-threading does not get used up. However, for single run, its much faster.
Leave a comment:
-
Can you provide your commands and also the versions of programs you use? This will help find out the difference between your two runs.
Also, the percentages of assigned fragments suggest your data are reversely stranded reads. So you should use '-s 2', not '-s 1'.
Cheers,
Wei
Leave a comment:
-
Hi all,
Similar to the question posted by Leon regarding the chromosomes no: 270 with latest release 78, GRCh38.gtf from ensemble I found the same issue in my analysis. I am using RNA-seq paired end data illumina 75bp read outs. After QC and trimming adapters and low quality bp, I have aligned trimmed paired reads with Tophat2.0/Bowtie2. Before aligning, concatenated all .fa files for chromosomes 1:22, +M+X+Y, and then built index on the concatenated.GrCh38.fa. Using the bam file generated featureCount program, read summarisation was performed keeping parameter s -2 (52.1% assigned fragments) as s -1 (~1.5%) was giving significantly low successfully assigned fragments.
Result
Meta-features : 58674
Chromosomes : 270
This was tested in standalone FeatureCount program coded in C.
When the same data was used in Rsubread featureCount function
Meta-features : 64253
Chromosomes : 270
Questions
1. Is there any discrepancy in the program in C and R package.
2. Chromosomes 270 and meta feature 58674 makes sense ?
3. Increasing number of threads to 20 in both (C and R) is not increasing the speed of the program, and none of the cores/threads are being utilised.
Leave a comment:
-
Originally posted by shi View PostFor read mapping, you may also try Subread aligner. Its performance has been demonstrated in the recent SEQC/MAQC III study:
We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the US Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA …
This publication includes detailed results from the Subread-featureCounts-limma/voom pipeline and comparisons with other pipelines from using billions of RNA-seq reads generated in six different sequencing centers.
Leave a comment:
-
For read mapping, you may also try Subread aligner. Its performance has been demonstrated in the recent SEQC/MAQC III study:
We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the US Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA …
This publication includes detailed results from the Subread-featureCounts-limma/voom pipeline and comparisons with other pipelines from using billions of RNA-seq reads generated in six different sequencing centers.Last edited by shi; 01-15-2015, 02:46 PM.
Leave a comment:
-
Originally posted by dpryan View Posta) --clip3pAdapterSeq
b) You'd need to either run fastQC (or something like that) or just ask whomever did the sequencing. They can give you the adapter sequence (it's presumably just the standard Illumina adapter sequence).
TruSeq Universal Adapter
5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
Is this then the sequence, I should state after the '--clip3pAdapterSeq' option? Or should I reverse it like so:
3' TCTAGCCTTCTCGCAGCACATCCCTTTCTCACATCTAGAGCCACCAGCGGCATAGTAA
?
1. http://www.illumina.com/products/tru..._prep_kit.html
2. http://supportres.illumina.com/docum...nce-letter.pdf
Leave a comment:
-
a) --clip3pAdapterSeq
b) You'd need to either run fastQC (or something like that) or just ask whomever did the sequencing. They can give you the adapter sequence (it's presumably just the standard Illumina adapter sequence).
Leave a comment:
-
Originally posted by dpryan View PostJein, as my German colleagues would say. Aligners can nicely soft-clip low-quality or otherwise mismatched stretches of sequence. However, adapter sequence is going to be both good quality and easy to spot. The general worry is that by leaving it in there, you can bias what should otherwise be multimapping reads/pairs to aberrantly become uniquely mapping. Since adapter sequences can be spotted with high fidelity, they can just be trimmed off to avoid this. Quality trimming doesn't gain much for RNAseq (there are some papers on this), though it seems that trimming off the really low quality bases (<5 or so) seems to have some benefit). Anyway, I believe that STAR has an option to adapter trim for you, which is probably the most convenient way to go.
Looking in the manual, STAR does indeed have options for adapter removal:
----------------------------------------
--clip3pAdapterSeq
default: -
string(s): adapter sequences to clip from 3p of each mate. If one value is given,
it will be assumed the same for both mates.
--clip3pAdapterMMp
default: 0.1
double(s): max proportion of mismatches for 3p adpater clipping for each mate.
If one value is given, it will be assumed the same for both mates.
--clip3pAfterAdapterNbases
default: 0
int(s): number of bases to clip from 3p of each mate after the adapter clipping.
If one value is given, it will be assumed the same for both mates.
----------------------------------------
However I am not quite sure of:
a) Which would be the right option to use
b) How do I find out the appropriate adapter sequence to state to STAR?
Leave a comment:
-
Originally posted by LeonDK View PostRegarding 'trimming', I have been told, that this is not necessary, since the aligner will simply 'skip' adapters/bad quality bases, apparently using something called 'soft-clipping'?
Leave a comment:
Latest Articles
Collapse
-
by seqadmin
Spatial biology is an exciting field that encompasses a wide range of techniques and technologies aimed at mapping the organization and interactions of various biomolecules in their native environments. As this area of research progresses, new tools and methodologies are being introduced, accompanied by efforts to establish benchmarking standards and drive technological innovation.
3D Genomics
While spatial biology often involves studying proteins and RNAs in their...-
Channel: Articles
01-01-2025, 07:30 PM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, Today, 07:35 AM
|
0 responses
12 views
0 likes
|
Last Post
by seqadmin
Today, 07:35 AM
|
||
Started by seqadmin, Yesterday, 09:43 AM
|
0 responses
13 views
0 likes
|
Last Post
by seqadmin
Yesterday, 09:43 AM
|
||
Started by seqadmin, Yesterday, 08:36 AM
|
0 responses
17 views
0 likes
|
Last Post
by seqadmin
Yesterday, 08:36 AM
|
||
Started by seqadmin, 01-17-2025, 09:38 AM
|
0 responses
37 views
0 likes
|
Last Post
by seqadmin
01-17-2025, 09:38 AM
|
Leave a comment: