**aggp11** · 07-09-2012, 10:24 AM

Hi,

You could try the FastQC package if you haven't already. It accepts FASTQ, BAM, and SAM files and reports most of the important statistics for an NGS run.

**husamia** · 07-09-2012, 10:32 AM

I suggest using native Linux tools such as grep, sed, and awk in a multithreaded environment. A 64-bit build may also be useful in applications where it is supported. There is also the option of using CUDA with a GPU for very fast calculations.

**JackieBadger** · 07-09-2012, 11:17 AM

PRINSEQ and FASTQC

**Richard Finney** · 07-09-2012, 11:36 AM

If you're up for modifying a couple of lines of code for your needs,
this should do the trick ...

Code:

#include <stdio.h>
#include <string.h>
#include <ctype.h>
unsigned long int sum[5];          // counts for A, C, G, T, N
unsigned long int basecount = 0;
unsigned long int readcount = 0;
char s[512];                       // assumes each input line fits in this buffer
int main(void)
{
    static const char base[] = "ACGTN";
    int i,j;
    int ch;
    memset(sum,0,sizeof(sum));
    // fgets() replaces gets(), which cannot limit how much it writes into s
    while (fgets(s,sizeof(s),stdin))
    {
        if (s[0] == '>') continue; // skip fasta entry header
        readcount++;               // each sequence line is counted as one read
        for (i=0;s[i] != '\0';i++) // walk to the end of the string
        {
            ch = toupper((unsigned char)s[i]);
            if (ch == 'A') { sum[0]++; basecount++; }
            else if (ch == 'C') { sum[1]++; basecount++; }
            else if (ch == 'G') { sum[2]++; basecount++; }
            else if (ch == 'T') { sum[3]++; basecount++; }
            else if (ch == 'N') { sum[4]++; basecount++; }
        }
    }
    for (j=0;j<5;j++)
    {
        printf("%c %lu \n",base[j],sum[j]); // %lu matches unsigned long
    }
    printf("bases = %lu \n",basecount);
    printf("reads = %lu \n",readcount);
    return 0;
}
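
The program above only handles FASTA. For FASTQ, the same counting idea can be sketched as below. This is a rough sketch, not tested on real data, and it assumes strict four-line records (header, sequence, `+`, quality) with no wrapped sequence lines. The names `fastq_stats` and `count_fastq` are made up for this example. Tracking the line position inside the record, instead of looking for `@`, avoids mistaking a quality line that happens to start with `@` for a header.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type (not from the thread): totals for one FASTQ stream. */
struct fastq_stats {
    unsigned long base[5];   /* counts for A, C, G, T, N */
    unsigned long bases;     /* total of the five counts above */
    unsigned long reads;     /* number of records seen */
};

/* Count reads and bases in a FASTQ stream, assuming 4 lines per record.
 * Returns 0 on success, -1 if a record does not start with '@' or the
 * stream ends in the middle of a record. */
int count_fastq(FILE *in, struct fastq_stats *st)
{
    static const char letters[] = "ACGTN";
    char buf[4096];
    unsigned long lineno = 0;   /* number of complete lines consumed */
    int fresh = 1;              /* 1 when buf holds the start of a line */

    assert(in != NULL && st != NULL);
    memset(st, 0, sizeof(*st));

    while (fgets(buf, sizeof(buf), in)) {
        size_t len = strlen(buf);
        int complete = (len > 0 && buf[len - 1] == '\n');
        unsigned long field = lineno % 4;  /* 0 header, 1 seq, 2 '+', 3 qual */

        /* tolerate blank lines between records */
        if (fresh && field == 0 && (buf[0] == '\n' || buf[0] == '\r'))
            continue;
        if (fresh && field == 0) {
            if (buf[0] != '@') return -1;
            st->reads++;
        }
        if (field == 1) {
            for (size_t i = 0; i < len; i++) {
                int ch = toupper((unsigned char)buf[i]);
                const char *p = strchr(letters, ch);
                if (p) { st->base[p - letters]++; st->bases++; }
            }
        }
        /* a line longer than buf arrives in several chunks */
        fresh = complete;
        if (complete) lineno++;
    }
    if (!fresh) lineno++;       /* last line had no trailing newline */
    return (lineno % 4 == 0) ? 0 : -1;
}
```

To use it, call `count_fastq(stdin, &st)` from a `main()` and print `st.reads`, `st.bases`, and the five `st.base` entries the same way as in the FASTA program.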

**maubp** · 07-09-2012, 12:20 PM

If you don't need error checking, Heng Li has a very fast FASTA/FASTQ parser in C that could easily be used for the basic information you requested (read count and total bases):

FASTA/FASTQ Parser in C

http://lh3lh3.users.sourceforge.net/parsefastq.shtml

fastest way to 'parse' fasta or fastq?
