**aggp11** · 07-09-2012, 10:24 AM

Hi,

You could try the FASTQC package if you haven't already. It can take FASTQ/BAM/SAM files and reports most of the important statistics for an NGS run.

**husamia** · 07-09-2012, 10:32 AM

I suggest using native Linux tools such as grep, sed, and awk in a multithreaded environment. A 64-bit build may also help in applications where it is supported. There is also the option of using CUDA on a GPU for very fast calculations.

**JackieBadger** · 07-09-2012, 11:17 AM

PRINSEQ and FASTQC

**Richard Finney** · 07-09-2012, 11:36 AM

If you're up for modifying a couple of lines of code for your needs,
this should do the trick ...

Code:

#include <stdio.h>
#include <string.h>
#include <ctype.h>

unsigned long int sum[5];        // counts for A, C, G, T, N (zero-initialized)
unsigned long int basecount = 0;
unsigned long int readcount = 0;
char s[512];

int main(void)
{
    int i, j;
    int ch;

    // fgets bounds the read to the buffer; gets() cannot and was removed in C11
    while (fgets(s, sizeof(s), stdin))
    {
        if (s[0] == '>') continue; // skip fasta entry header
        // assumes one sequence line per record, shorter than the buffer
        readcount++;
        for (i = 0; s[i] != '\0'; i++) // stop at the end of the string
        {
            ch = toupper((unsigned char)s[i]);
            if (ch == 'A') { sum[0]++; basecount++; }
            else if (ch == 'C') { sum[1]++; basecount++; }
            else if (ch == 'G') { sum[2]++; basecount++; }
            else if (ch == 'T') { sum[3]++; basecount++; }
            else if (ch == 'N') { sum[4]++; basecount++; }
        }
    }
    for (j = 0; j < 5; j++)
    {
        if (j == 0) printf("A ");
        else if (j == 1) printf("C ");
        else if (j == 2) printf("G ");
        else if (j == 3) printf("T ");
        else if (j == 4) printf("N ");
        printf("%lu ", sum[j]); // %lu matches unsigned long
        printf("\n");
    }
    printf("bases = %lu \n", basecount);
    printf("reads = %lu \n", readcount);
    return 0;
}
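
The program above only skips FASTA headers, so FASTQ input would need different handling: quality strings can contain the letters A, C, G, T and N, and a quality line may even start with '@'. A rough sketch of a FASTQ variant might look like the following. It assumes strict four-line records (header, sequence, '+', quality) with no wrapped sequence lines, and the names `fastq_stats` and `count_fastq` are made up for illustration, not taken from any library.

```c
#include <stdio.h>
#include <ctype.h>

// Hypothetical helper type for this sketch.
struct fastq_stats {
    unsigned long reads;
    unsigned long bases;
    unsigned long counts[5]; // A, C, G, T, N
};

// Assumption: every record is exactly four lines (@header, sequence, +, quality).
static void count_fastq(FILE *in, struct fastq_stats *st)
{
    unsigned long line = 0; // zero-based index of the current line
    int at_line_start = 1;
    int c;

    st->reads = 0;
    st->bases = 0;
    for (int k = 0; k < 5; k++) st->counts[k] = 0;

    // Reading character by character avoids any fixed line-length limit.
    while ((c = getc(in)) != EOF) {
        if (c == '\n') { line++; at_line_start = 1; continue; }
        // The first character of every fourth line starts a new record.
        if (line % 4 == 0 && at_line_start) st->reads++;
        at_line_start = 0;
        // Only the second line of each record holds bases.
        if (line % 4 != 1) continue;
        switch (toupper(c)) {
        case 'A': st->counts[0]++; st->bases++; break;
        case 'C': st->counts[1]++; st->bases++; break;
        case 'G': st->counts[2]++; st->bases++; break;
        case 'T': st->counts[3]++; st->bases++; break;
        case 'N': st->counts[4]++; st->bases++; break;
        default: break; // ignore '\r' and any other character
        }
    }
}
```

To use it the same way as the program above, call `count_fastq(stdin, &st)` from `main` and print the fields of `st`. Because the record position is tracked by line number rather than by the leading character, quality lines never leak into the base counts.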

**maubp** · 07-09-2012, 12:20 PM

If you don't need error checking, Heng Li has a very fast FASTA/FASTQ parser in C that could easily be used for the basic information you requested (read count and total bases):

FASTA/FASTQ Parser in C

http://lh3lh3.users.sourceforge.net/parsefastq.shtml

fastest way to 'parse' fasta or fastq?