Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • suhaib58
    replied
    Hi Wei,

    Well, that was the problem, the HD was almost full.

    Thanks again

    Leave a comment:


  • shi
    replied
    Have you checked if there is enough disk space in your working directory?

    Leave a comment:


  • suhaib58
    replied
    Hi Wei,

    Since yesterday an error message in thrown when running featureCount program on RNA-seq data.

    Run1
    //================================= Running ==================================\\
    || ||
    || Load annotation file ../Homo_sapiens.GRCh38.78.gtf ... ||
    || Features : 1232660 ||
    || Meta-features : 64253 ||
    || Chromosomes : 270 ||
    || ||
    || Process BAM file 36B_accepted_hits.bam... ||
    || Paired-end reads are included. ||
    || Assign fragments (read pairs) to features... ||
    || Found reads that are not properly paired. ||
    || (missing mate or the mate is not the next read) ||
    Segmentation fault (core dumped)

    Run2
    //================================= Running ==================================\\
    || ||
    || Load annotation file ../Homo_sapiens.GRCh38.78.gtf ... ||
    || Features : 1232660 ||
    || Meta-features : 64253 ||
    || Chromosomes : 270 ||
    || ||
    || Process BAM file 36B1_accepted_hits.bam... ||
    || Paired-end reads are included. ||
    || Assign fragments (read pairs) to features... ||
    || Found reads that are not properly paired. ||
    || (missing mate or the mate is not the next read) ||
    Cannot load read part from the tmp file!
    featureCounts: input-files.c:2216: sort_SAM_finalise: Assertion `0' failed.


    //================================= Running ==================================\\
    || ||
    || Load annotation file ../Homo_sapiens.GRCh38.78.gtf ... ||
    || Features : 1232660 ||
    || Meta-features : 64253 ||
    || Chromosomes : 270 ||
    || ||
    || Process BAM file 39B_accepted_hits.bam... ||
    || Paired-end reads are included. ||
    || Assign fragments (read pairs) to features... ||
    || Found reads that are not properly paired. ||
    || (missing mate or the mate is not the next read) ||
    Cannot load read part from the tmp file!
    featureCounts: input-files.c:2216: sort_SAM_finalise: Assertion `0' failed.
    Aborted (core dumped)



    Every different run a different error is flagged.Additionally about 225 files are dumped in working directory named

    temp-core-002882-6B8B4567.sam temp-sort-002882-327B23C6-CHK00000000-BLK119.bin temp-sort-004072-327B23C6-CHK00000000-BLK007.bin temp-sort-004072-327B23C6-CHK00000000-BLK124.bin
    temp-core-004072-6B8B4567.sam temp-sort-002882-327B23C6-CHK00000000-BLK120.bin temp-sort-004072-327B23C6-CHK00000000-BLK008.bin temp-sort-004072-327B23C6-CHK00000000-BLK125.bin
    temp-sort-002882-327B23C6-CHK00000000-BLK004.bin

    Parameters used in the program. With the same parameters I ran 40 Paired End reads aligned on bam files without any error.

    featureCounts -p -s 2 -M -O \
    -a ../Homo_sapiens.GRCh38.78.gtf -o ../../../Counts_overlap/${sampNo}_counts.txt -t exon -g gene_id -T 10 *accepted_hits.bam


    Could you please let me know how do I troubleshoot this error.

    Leave a comment:


  • suhaib58
    replied
    Thank you very much for your advise and time.

    Leave a comment:


  • shi
    replied
    Having a high mapping percentage does not mean you will have a high percentage of reads being assigned to genes included in your annotation. This is affected by many factors such as the completeness of your annotation, your mRNA enrichment quality and etc.

    I typically see around 50-60% of reads successfully assigned to genes when NCBI RefSeq gene annotation is used.

    Wei

    Leave a comment:


  • suhaib58
    replied
    total assigned fragments : 47.9%

    Summary
    Status 34b1_accepted_hits.bam
    Assigned 28314328
    Unassigned_Ambiguity 6375026
    Unassigned_MultiMapping 0
    Unassigned_NoFeatures 24480296
    Unassigned_Unmapped 0
    Unassigned_MappingQuality 0
    Unassigned_FragementLength 0
    Unassigned_Chimera 0
    Unassigned_Secondary 0
    Unassigned_Nonjunction 0
    Unassigned_Duplicate 0

    Any reason why there are so many unassigned Nofeature and ambiguous reads. When the aligned bam file was tested with samtools flagstats function it showed 100% mapped reads and ~86% properly paired reads.

    Leave a comment:


  • suhaib58
    replied
    I just found when when gene_name is used both the packages it provides consistent meta-features.
    C version
    featureCounts -p -s 2 -M \
    -a ../Homo_sapiens.GRCh38.78.gtf -o counts.txt -t exon -g gene_name -T 10 *accepted_hits.bam

    R subread
    count<-featureCounts(files="accepted_hits.bam",
    ,annot.ext="Homo_sapiens.GRCh38.78.gtf",isGTFAnnotationFile=TRUE,
    GTF.featureType="exon",GTF.attrType="gene_name",useMetaFeatures=TRUE,
    allowMultiOverlap=FALSE,isPairedEnd=TRUE,requireBothEndsMapped=FALSE,
    checkFragLength=FALSE,minFragLength=50,maxFragLength=1000,
    nthreads=20,strandSpecific=2,minMQS=0,
    readExtension5=0,readExtension3=0,read2pos=NULL,
    minReadOverlap=1,countSplitAlignmentsOnly=FALSE,
    countMultiMappingReads=TRUE,countPrimaryAlignmentsOnly=FALSE,
    countChimericFragments=TRUE,ignoreDup=FALSE,chrAliases=NULL,reportReads=FALSE)

    However, when multiple featurecounts ran on different tabs for different samples. Multi-threading does not get used up. However, for single run, its much faster.

    Leave a comment:


  • shi
    replied
    Can you provide your commands and also the versions of programs you use? This will help find out the difference between your two runs.

    Also, the percentages of assigned fragments suggest your data are reversely stranded reads. So you should use '-s 2', not '-s 1'.

    Cheers,
    Wei

    Leave a comment:


  • suhaib58
    replied
    Hi all,

    Similar to the question posted by Leon regarding the chromosomes no: 270 with latest release 78, GRCh38.gtf from ensemble I found the same issue in my analysis. I am using RNA-seq paired end data illumina 75bp read outs. After QC and trimming adapters and low quality bp, I have aligned trimmed paired reads with Tophat2.0/Bowtie2. Before aligning, concatenated all .fa files for chromosomes 1:22, +M+X+Y, and then built index on the concatenated.GrCh38.fa. Using the bam file generated featureCount program, read summarisation was performed keeping parameter s -2 (52.1% assigned fragments) as s -1 (~1.5%) was giving significantly low successfully assigned fragments.

    Result
    Meta-features : 58674
    Chromosomes : 270
    This was tested in standalone FeatureCount program coded in C.

    When the same data was used in Rsubread featureCount function
    Meta-features : 64253
    Chromosomes : 270

    Questions
    1. Is there any discrepancy in the program in C and R package.
    2. Chromosomes 270 and meta feature 58674 makes sense ?
    3. Increasing number of threads to 20 in both (C and R) is not increasing the speed of the program, and none of the cores/threads are being utilised.

    Leave a comment:


  • LeonDK
    replied
    Originally posted by shi View Post
    For read mapping, you may also try Subread aligner. Its performance has been demonstrated in the recent SEQC/MAQC III study:

    We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the US Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA …


    This publication includes detailed results from the Subread-featureCounts-limma/voom pipeline and comparisons with other pipelines from using billions of RNA-seq reads generated in six different sequencing centers.
    Thank you for your input. The reason for using star is speed and that it "is-the-aligner-of-choice", where I am working

    Leave a comment:


  • shi
    replied
    For read mapping, you may also try Subread aligner. Its performance has been demonstrated in the recent SEQC/MAQC III study:

    We present primary results from the Sequencing Quality Control (SEQC) project, coordinated by the US Food and Drug Administration. Examining Illumina HiSeq, Life Technologies SOLiD and Roche 454 platforms at multiple laboratory sites using reference RNA samples with built-in controls, we assess RNA …


    This publication includes detailed results from the Subread-featureCounts-limma/voom pipeline and comparisons with other pipelines from using billions of RNA-seq reads generated in six different sequencing centers.
    Last edited by shi; 01-15-2015, 02:46 PM.

    Leave a comment:


  • LeonDK
    replied
    Originally posted by dpryan View Post
    a) --clip3pAdapterSeq
    b) You'd need to either run fastQC (or something like that) or just ask whomever did the sequencing. They can give you the adapter sequence (it's presumably just the standard Illumina adapter sequence).
    Ok, so the kit used is [1] and the standard Illumina adapter sequences can be found in [2], where it on p12 says:

    TruSeq Universal Adapter
    5’ AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT

    Is this then the sequence, I should state after the '--clip3pAdapterSeq' option? Or should I reverse it like so:

    3' TCTAGCCTTCTCGCAGCACATCCCTTTCTCACATCTAGAGCCACCAGCGGCATAGTAA

    ?


    1. http://www.illumina.com/products/tru..._prep_kit.html
    2. http://supportres.illumina.com/docum...nce-letter.pdf

    Leave a comment:


  • dpryan
    replied
    a) --clip3pAdapterSeq
    b) You'd need to either run fastQC (or something like that) or just ask whomever did the sequencing. They can give you the adapter sequence (it's presumably just the standard Illumina adapter sequence).

    Leave a comment:


  • LeonDK
    replied
    Originally posted by dpryan View Post
    Jein, as my German colleagues would say. Aligners can nicely soft-clip low-quality or otherwise mismatched stretches of sequence. However, adapter sequence is going to be both good quality and easy to spot. The general worry is that by leaving it in there, you can bias what should otherwise be multimapping reads/pairs to aberrantly become uniquely mapping. Since adapter sequences can be spotted with high fidelity, they can just be trimmed off to avoid this. Quality trimming doesn't gain much for RNAseq (there are some papers on this), though it seems that trimming off the really low quality bases (<5 or so) seems to have some benefit). Anyway, I believe that STAR has an option to adapter trim for you, which is probably the most convenient way to go.
    Ok, I think that makes sense! And 'Jein' would be 'Nja' in Danish

    Looking in the manual, STAR does indeed have options for adapter removal:
    ----------------------------------------
    --clip3pAdapterSeq
    default: -
    string(s): adapter sequences to clip from 3p of each mate. If one value is given,
    it will be assumed the same for both mates.
    --clip3pAdapterMMp
    default: 0.1
    double(s): max proportion of mismatches for 3p adpater clipping for each mate.
    If one value is given, it will be assumed the same for both mates.
    --clip3pAfterAdapterNbases
    default: 0
    int(s): number of bases to clip from 3p of each mate after the adapter clipping.
    If one value is given, it will be assumed the same for both mates.
    ----------------------------------------

    However I am not quite sure of:
    a) Which would be the right option to use
    b) How do I find out the appropriate adapter sequence to state to STAR?

    Leave a comment:


  • dpryan
    replied
    Originally posted by LeonDK View Post
    Regarding 'trimming', I have been told, that this is not necessary, since the aligner will simply 'skip' adapters/bad quality bases, apparently using something called 'soft-clipping'?
    Jein, as my German colleagues would say. Aligners can nicely soft-clip low-quality or otherwise mismatched stretches of sequence. However, adapter sequence is going to be both good quality and easy to spot. The general worry is that by leaving it in there, you can bias what should otherwise be multimapping reads/pairs to aberrantly become uniquely mapping. Since adapter sequences can be spotted with high fidelity, they can just be trimmed off to avoid this. Quality trimming doesn't gain much for RNAseq (there are some papers on this), though it seems that trimming off the really low quality bases (<5 or so) seems to have some benefit). Anyway, I believe that STAR has an option to adapter trim for you, which is probably the most convenient way to go.

    Leave a comment:

Latest Articles

Collapse

  • seqadmin
    Recent Innovations in Spatial Biology
    by seqadmin


    Spatial biology is an exciting field that encompasses a wide range of techniques and technologies aimed at mapping the organization and interactions of various biomolecules in their native environments. As this area of research progresses, new tools and methodologies are being introduced, accompanied by efforts to establish benchmarking standards and drive technological innovation.

    3D Genomics
    While spatial biology often involves studying proteins and RNAs in their...
    01-01-2025, 07:30 PM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, Today, 07:35 AM
0 responses
12 views
0 likes
Last Post seqadmin  
Started by seqadmin, Yesterday, 09:43 AM
0 responses
13 views
0 likes
Last Post seqadmin  
Started by seqadmin, Yesterday, 08:36 AM
0 responses
17 views
0 likes
Last Post seqadmin  
Started by seqadmin, 01-17-2025, 09:38 AM
0 responses
37 views
0 likes
Last Post seqadmin  
Working...
X