  • Number of reads in RNA-seq?

    Another biologist here.

    I got my RNA-seq data (in .txt format) back from the sequencing facility. Each file from the 2 lanes of high-throughput sequencing is nearly 25 GB.

    I want to know the number of reads in each file. Which program or code should I use?

    Thank you.

  • #2
    Try the Linux command "wc -l <filename>". This counts the number of lines in the file. If the file is in fastq format, each read takes up four lines, so divide the number of lines by 4 to get the number of reads.
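A minimal sketch of the approach in #2, assuming an uncompressed fastq file with exactly four lines per read (no wrapped sequence or quality lines). The function name count_reads and the demo file contents are made up for illustration.

```shell
# Hypothetical helper: number of reads = number of lines / 4.
count_reads() {
    local lines
    # Redirect the file into wc so it prints only the number, not the filename.
    lines=$(wc -l < "$1")
    echo $(( lines / 4 ))
}

# Tiny demo file with two reads (made-up contents).
demo=$(mktemp)
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' > "$demo"
count_reads "$demo"   # prints 2
rm -f "$demo"
```

For a real file this would be called as count_reads <filename>. On a 25 GB file, wc -l still has to read the whole file, so expect it to take a while.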

    • #3
      Why do we use -l in the command?
      Last edited by ashiq.hussain; 07-12-2011, 12:17 PM.

      • #4
        It's not -1 (the digit one), it is -l (lowercase L), and the command wc -l counts the number of lines in the file. The easiest way to find out the number of reads in a fastq file with one command is grep "@" -c <filename>. Hope this helps.

        • #5
          Originally posted by upendra_35
          It's not -1 (the digit one), it is -l (lowercase L), and the command wc -l counts the number of lines in the file. The easiest way to find out the number of reads in a fastq file with one command is grep "@" -c <filename>. Hope this helps.
          This will not work! The '@' character may appear in fastq quality lines as well as in the seq-id line. See this thread for a discussion of the problems with using grep to count reads in fastq files.
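To illustrate the objection in #5, here is a small sketch with a made-up two-read file in which the second read's quality string happens to start with '@'. It assumes Phred+33 quality encoding, where '@' is a legal quality character. Both an unanchored and a line-anchored grep overcount, while the line count divided by 4 does not.

```shell
demo=$(mktemp)
# Two reads; the quality line of read2 begins with '@' (made-up data).
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\n@III\n' > "$demo"

grep_any=$(grep -c '@' "$demo")             # lines containing '@' anywhere
grep_start=$(grep -c '^@' "$demo")          # lines starting with '@'
by_lines=$(( $(wc -l < "$demo") / 4 ))      # line count divided by 4

echo "grep -c '@'  : $grep_any"      # 3, one too many
echo "grep -c '^@' : $grep_start"    # 3, still one too many
echo "lines / 4    : $by_lines"      # 2, the true number of reads
rm -f "$demo"
```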

          • #6
            Thank you all.

            • #7
              Sorry, I left out the "^" before the @. It should be grep "^@" -c <filename>. However, I agree with kmcarr that '@' is a character that may appear in fastq quality lines as well (sometimes at the start of the line), so even that may not work. The best way to do it is in two steps: wc -l <filename>, followed by expr <no.of.lines> / 4
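Dividing by 4 is only right if the file really has four lines per read. A possible sanity check, sketched below, confirms that every first line of a record starts with '@' and every third line with '+' before reporting the count. The function name check_and_count is hypothetical, and the check is deliberately simple: it does not validate sequence or quality content.

```shell
# Hypothetical helper: print the read count only if the file looks like
# a clean four-line-per-record FASTQ; otherwise complain and return 1.
check_and_count() {
    awk '
        # Line 1 of each record must be a header starting with "@".
        NR % 4 == 1 && substr($0, 1, 1) != "@" { bad++ }
        # Line 3 of each record must be the "+" separator.
        NR % 4 == 3 && substr($0, 1, 1) != "+" { bad++ }
        END {
            if (bad > 0 || NR % 4 != 0) exit 1
            print NR / 4
        }
    ' "$1" || { echo "$1: not a clean four-line FASTQ" >&2; return 1; }
}

# Demo with made-up data; the first quality line starts with "@" on purpose.
demo=$(mktemp)
printf '@read1\nACGT\n+\n@III\n@read2\nTTGA\n+\nIIII\n' > "$demo"
check_and_count "$demo"   # prints 2
rm -f "$demo"
```

Because the check goes by line position rather than by pattern matching alone, a quality line that starts with '@' does not confuse it.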

              • #8
                compressed files

                Does it work with .gz files, or do I have to unpack them?

                • #9
                  No need to unpack them, use the power of piping. The command below prints the number of lines, so divide it by 4 as before:

                  gzip -d -c input.fastq.gz | wc -l
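The pipe in #9 can be combined with the division into a single step. This is a minimal sketch assuming four lines per read and that gzip is on the PATH; count_reads_gz and the demo file name are made up for illustration.

```shell
# Hypothetical helper: count reads in a gzip-compressed FASTQ file.
count_reads_gz() {
    # gzip -dc decompresses to stdout and leaves the .gz file untouched.
    echo $(( $(gzip -dc "$1" | wc -l) / 4 ))
}

# Demo: compress a made-up two-read file and count it without unpacking.
demo_dir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' | gzip -c > "$demo_dir/demo.fastq.gz"
count_reads_gz "$demo_dir/demo.fastq.gz"   # prints 2
rm -rf "$demo_dir"
```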

                  • #10
                    I don't mean to revive a closed thread, but just in case someone needs to know the number of reads in multiple compressed fastq files (bzip2 here):

                    for file in *.bz2; do echo "$file"; b=$(bzcat "$file" | wc -l); echo $((b/4)); done
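If a directory holds a mix of plain, gzip and bzip2 files, the loop in #10 can be generalized by choosing the decompressor from the file extension. This is only a sketch: stream_fastq and count_all are hypothetical names, the file naming scheme (*.fastq, *.fastq.gz, *.fastq.bz2) is an assumption, and it still relies on four lines per read.

```shell
# Hypothetical helper: write the uncompressed contents of a file to stdout.
stream_fastq() {
    case "$1" in
        *.gz)  gzip -dc "$1" ;;
        *.bz2) bzip2 -dc "$1" ;;
        *)     cat "$1" ;;
    esac
}

# Hypothetical helper: print "<file name><TAB><read count>" for each
# FASTQ file in the given directory.
count_all() {
    local f
    for f in "$1"/*.fastq "$1"/*.fastq.gz "$1"/*.fastq.bz2; do
        [ -e "$f" ] || continue   # skip patterns that matched nothing
        printf '%s\t%s\n' "$(basename "$f")" "$(( $(stream_fastq "$f" | wc -l) / 4 ))"
    done
}

# Demo with made-up data: one plain file and one gzip-compressed file.
demo_dir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' > "$demo_dir/a.fastq"
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' | gzip -c > "$demo_dir/b.fastq.gz"
count_all "$demo_dir"
rm -rf "$demo_dir"
```

Redirecting the output of count_all to a file gives a tab-separated table that opens directly in a spreadsheet.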

                    • #11
                      Great. Thanks. Very useful for me.

                      • #12
                        I didn't intend to revive this thread, but I was searching for a way to automate the fastq read counting process, and this solution may come in handy for some:

                        find . -name "*.fastq" | while IFS= read -r i; do echo "$i" >> project_nbread.txt; grep -c "^@$(head -n 1 "$i" | awk -F '[@:]' '{ print $2 }')" "$i" >> project_nbread.txt; done
                        In the case of only one file, you can use this:
                        grep -c "^@$(head -n 1 file.fastq | awk -F '[@:]' '{ print $2 }')" file.fastq
                        This takes the id from the first header of the file (the text between '@' and the first ':') and counts the lines that start with '@' followed by that id, i.e. the number of fastq reads. It assumes that every read header in the file begins with the same id followed by a colon.
