  • missing libraries with featureCounts?

    Hello!

    I’m working with 33 RNA-seq libraries and I’m having a problem with featureCounts. I start with sorted BAM files (named sorted_6346.bam, sorted_6347.bam, and so on through sorted_6378.bam), which I then pass to featureCounts with this command:

    featureCounts -a ~/genomes/Mouse/ensembl_genome/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf —t exon -g gene_id -s 2 -p -R -M sorted_63* -o output

    The individual output files look fine, but something seems to be wrong with the combined output table: the counts from the first two libraries appear to be missing. For example, take the gene ENSMUSG00000029614:

    >grep "ENSMUSG00000029614" output
    ENSMUSG00000029614 5;5;5;5;5;5 121204481;121205406;121206638;121207077;121208196;121208782 121204552;121206445;121206810;121207384;121208575;121209241 +;+;+;+;+;+ 2433 36508 47652 50431 11667 15455 75749 15577 27682 67064 14802 12306 26099 55411 17297 52910 22243 29685 18242 36564 21280 31884 10634 75043 22386 31312 17584 5298 27524 13846 14408 21197


    As you can see, the first 6 fields are the usual ones from featureCounts:
    Geneid Chr Start End Strand Length
    ENSMUSG00000029614 5;5;5;5;5;5 121204481;121205406;121206638;121207077;121208196;121208782 121204552;121206445;121206810;121207384;121208575;121209241 +;+;+;+;+;+ 2433
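    The Length column can be checked against the exon coordinates in the same row. The sketch below assumes Length is the number of distinct exonic bases of the gene, with overlapping exons counted once. For ENSMUSG00000029614 the six exons do not overlap, and their inclusive lengths (72 + 1040 + 173 + 308 + 380 + 460) add up to exactly 2433. The helper names meta_feature_length and parse_annotation are hypothetical, not part of featureCounts.

```python
def meta_feature_length(starts, ends):
    """Count distinct bases covered by 1-based, inclusive [start, end] intervals.

    Overlapping or touching intervals are merged first, so shared bases
    are counted only once.
    """
    intervals = sorted(zip(starts, ends))
    total = 0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start <= cur_end + 1:
            # Overlaps or abuts the current run: extend it.
            cur_end = max(cur_end, end)
        else:
            # Gap: close the current run and start a new one.
            total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    total += cur_end - cur_start + 1
    return total


def parse_annotation(fields):
    """Split the semicolon-joined Start/End columns of one table row.

    fields: [Geneid, Chr, Start, End, Strand, Length, ...]
    """
    starts = [int(x) for x in fields[2].split(";")]
    ends = [int(x) for x in fields[3].split(";")]
    return starts, ends, int(fields[5])


# The annotation part of the row for ENSMUSG00000029614 quoted above.
row = [
    "ENSMUSG00000029614",
    "5;5;5;5;5;5",
    "121204481;121205406;121206638;121207077;121208196;121208782",
    "121204552;121206445;121206810;121207384;121208575;121209241",
    "+;+;+;+;+;+",
    "2433",
]
starts, ends, length = parse_annotation(row)
print(meta_feature_length(starts, ends), length)  # 2433 2433
```

    Merging overlapping intervals before summing only matters for genes whose annotated exons overlap; for this row a plain sum of End - Start + 1 gives the same answer.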

    After this, there should be the counts from each of the 33 libraries (6346-6378), but there are only 31 (starting with 36508).
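    The mismatch can be detected mechanically: with 33 BAM files, every row of the combined table should have 6 + 33 = 39 fields. The sketch below assumes a tab-separated table with '#' comment lines, then a header row, then one row per gene; check_count_table is a hypothetical helper, and the demo uses a tiny made-up table with three samples rather than the real output.

```python
import io

# The six annotation columns featureCounts writes before the per-sample counts.
ANNOTATION_COLUMNS = ["Geneid", "Chr", "Start", "End", "Strand", "Length"]


def check_count_table(handle, expected_samples):
    """List structural problems in a featureCounts-style combined count table.

    Assumes tab-separated lines: '#' comment lines first, then a header row,
    then one row per gene with one count per sample.
    """
    problems = []
    header = None
    width = len(ANNOTATION_COLUMNS) + len(expected_samples)
    for lineno, line in enumerate(handle, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        fields = line.rstrip("\n").split("\t")
        if header is None:
            header = fields
            if len(header) != width:
                problems.append(f"header: {len(header)} fields, expected {width}")
            samples = header[len(ANNOTATION_COLUMNS):]
            missing = [s for s in expected_samples if s not in samples]
            if missing:
                problems.append(f"header: missing samples {missing}")
            continue
        if len(fields) != width:
            problems.append(f"line {lineno}: {len(fields)} fields, expected {width}")
    return problems


# Made-up demo table: the second gene row has only two of its three counts.
demo = io.StringIO(
    "# comment line\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\ta.bam\tb.bam\tc.bam\n"
    "G1\t5\t100\t200\t+\t101\t7\t8\t9\n"
    "G2\t5\t300\t400\t+\t101\t4\t5\n"
)
print(check_count_table(demo, ["a.bam", "b.bam", "c.bam"]))
# ['line 4: 8 fields, expected 9']
```

    To check the real table, pass open("output") as the handle and, assuming the header lists the BAM file names exactly as given on the command line, [f"sorted_{n}.bam" for n in range(6346, 6379)] as the expected samples.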

    To investigate further, I looked at the individual outputs:

    [ls299@themonster ensembl_genome]grep -c "ENSMUSG00000029614" sorted_63*.bam.featureCounts
    sorted_6346.bam.featureCounts:32761
    sorted_6347.bam.featureCounts:31802
    sorted_6348.bam.featureCounts:36508
    sorted_6349.bam.featureCounts:47652
    sorted_6350.bam.featureCounts:50431
    sorted_6351.bam.featureCounts:11667
    sorted_6352.bam.featureCounts:15455
    sorted_6353.bam.featureCounts:75749
    sorted_6354.bam.featureCounts:15577
    sorted_6355.bam.featureCounts:27682
    sorted_6356.bam.featureCounts:67064
    sorted_6357.bam.featureCounts:14802
    sorted_6358.bam.featureCounts:12306
    sorted_6359.bam.featureCounts:26099
    sorted_6360.bam.featureCounts:55411
    sorted_6361.bam.featureCounts:17297
    sorted_6362.bam.featureCounts:52910
    sorted_6363.bam.featureCounts:22243
    sorted_6364.bam.featureCounts:29685
    sorted_6365.bam.featureCounts:18242
    sorted_6366.bam.featureCounts:36564
    sorted_6367.bam.featureCounts:21280
    sorted_6368.bam.featureCounts:31884
    sorted_6369.bam.featureCounts:10634
    sorted_6370.bam.featureCounts:75043
    sorted_6371.bam.featureCounts:22386
    sorted_6372.bam.featureCounts:31312
    sorted_6373.bam.featureCounts:17584
    sorted_6374.bam.featureCounts:5298
    sorted_6375.bam.featureCounts:27524
    sorted_6376.bam.featureCounts:13846
    sorted_6377.bam.featureCounts:14408
    sorted_6378.bam.featureCounts:21197

    As you can see, there are in fact counts for the first two libraries (32761 and 31802), and the counts from sorted_6348 onward match the combined table exactly, so the first two libraries are simply missing from it.
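    The comparison above can be scripted. The sketch below reproduces grep -c in Python and then finds where the combined-table counts start inside the per-file list. In this thread the two sets of numbers line up one to one, which is what makes the offset visible; treat that correspondence as an observation from this post rather than a guarantee for every option combination. count_matching_lines and locate_run are hypothetical helper names.

```python
def count_matching_lines(lines, gene_id):
    """Python analogue of `grep -c GENE_ID FILE`: count lines containing gene_id.

    `lines` can be an open file or any iterable of strings.
    """
    return sum(1 for line in lines if gene_id in line)


def locate_run(per_file_counts, table_counts):
    """Return the offset at which table_counts occurs as a contiguous run
    inside per_file_counts, or None if it never does."""
    n = len(table_counts)
    for offset in range(len(per_file_counts) - n + 1):
        if per_file_counts[offset:offset + n] == table_counts:
            return offset
    return None


# First five per-file values from the grep -c output above (6346 to 6350)
# and the first three counts from the combined-table row.
per_file = [32761, 31802, 36508, 47652, 50431]
table_row = [36508, 47652, 50431]
offset = locate_run(per_file, table_row)
print(offset)  # 2: the table starts at the third library, 6348
```

    An offset of 2 means the columns for the first two libraries (6346 and 6347) never made it into the combined table.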

    Any ideas as to what’s going on?

    Thanks a lot!

  • #2
    Hi, are you using the latest version (1.5.0-p1)?



    • #3
      Hi, thanks for the reply.

      Yes, I am:
      featureCounts -v
      featureCounts v1.5.0-p1



      • #4
        I noticed that the '—t' option in your command starts with a long dash (an em dash) instead of a hyphen, which is invalid. Could you replace it with a hyphen ('-t') and rerun your command? This invalid option might prevent featureCounts from correctly processing the parameters that follow it.
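        A look-alike dash like this is easy to miss, for example when a command is copied from a formatted document or web page. Below is a minimal sketch, assuming Python 3 is available, that splits a command line with shlex and flags every argument starting with a Unicode dash other than the ASCII hyphen-minus. find_lookalike_dashes is a hypothetical helper, and genes.gtf stands in for the full annotation path.

```python
import shlex
import unicodedata

# Unicode characters that look like '-' but are not ASCII hyphen-minus (U+002D).
LOOKALIKE_DASHES = {
    "\u2010",  # HYPHEN
    "\u2011",  # NON-BREAKING HYPHEN
    "\u2012",  # FIGURE DASH
    "\u2013",  # EN DASH
    "\u2014",  # EM DASH
    "\u2015",  # HORIZONTAL BAR
    "\u2212",  # MINUS SIGN
}


def find_lookalike_dashes(command):
    """Return (position, argument, character name) for every argument in a
    shell command string that starts with a look-alike dash."""
    hits = []
    for position, arg in enumerate(shlex.split(command)):
        if arg and arg[0] in LOOKALIKE_DASHES:
            hits.append((position, arg, unicodedata.name(arg[0])))
    return hits


# The command from the first post, with the annotation path shortened.
cmd = ("featureCounts -a genes.gtf \u2014t exon -g gene_id "
       "-s 2 -p -R -M sorted_63* -o output")
for position, arg, name in find_lookalike_dashes(cmd):
    print(f"argument {position} {arg!r} starts with {name}")
```

        The sketch only reports the problem instead of rewriting it, because a look-alike dash may stand in for either '-' (as with -t here) or '--' in front of a long option.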



        • #5
          Yes, that seems to have worked! Thank you so much.
